System Architecture

The system is composed of three distinct components: the Data Downloader, Data Processor, and Data Visualizer.
The Data Downloader acquires datasets from DesInventar and converts them to CSV format if they are in XML. After the download, the filenames of all sub-type files are stored in separate text files for future reference. For example, ‘flash flood.csv’ is stored in ‘flood.txt’ as one line of text.
The Data Processor is responsible for cleaning, aggregating/merging, and slicing the datasets to shape the data for visualisation.
The Data Visualizer generates loss exceedance curves with significant points marked, together with the corresponding loss exceedance tables.
Data refers to the datasets downloaded from DesInventar, which are stored under the folder named 'data'. See the Data Storage section for more information.
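
To make the visualiser's output concrete, the sketch below computes an empirical loss exceedance table from a list of per-event losses. The function name `exceedance_table`, its parameters, and the rank-based estimate are assumptions chosen for illustration; they are not taken from the project's code and the actual Data Visualizer may compute its curves differently.

```python
def exceedance_table(losses, years_observed):
    """Return (loss, annual exceedance rate) pairs, largest loss first.

    Rank-based empirical estimate (an assumption, not the project's method):
    the k-th largest loss is equalled or exceeded by k events during the
    observation period, so its annual rate is k / years_observed.
    """
    if years_observed <= 0:
        raise ValueError("years_observed must be positive")
    ordered = sorted(losses, reverse=True)
    return [(loss, rank / years_observed)
            for rank, loss in enumerate(ordered, start=1)]


# Illustrative numbers only: four events recorded over ten years.
table = exceedance_table([120, 15, 300, 40], years_observed=10)
for loss, rate in table:
    print(f"loss >= {loss}: {rate:.1f} events per year")
```

Plotting the rate against the loss gives the curve, and the list of pairs is the corresponding table.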

Class Diagrams

CSV Downloader

XML downloader

Converter

Categoriser

Design Patterns

Under the _util package, two classes use the Adapter design pattern. The File class wraps the built-in file read() and readlines() methods: since we do not need to process files line by line, callers no longer have to open and close the file themselves. The Directory class wraps os.scandir() so that the contents of a directory are easy to access.
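
A minimal sketch of what such wrappers might look like is given below. Only the class names File and Directory and the wrapped calls (read(), readlines(), os.scandir()) come from the description above; the constructor arguments and the `names()` method are hypothetical.

```python
import os


class File:
    """Hypothetical wrapper that hides opening and closing a file."""

    def __init__(self, path, encoding="utf-8"):
        self._path = path
        self._encoding = encoding

    def read(self):
        # The context manager closes the file even if reading fails.
        with open(self._path, encoding=self._encoding) as handle:
            return handle.read()

    def readlines(self):
        with open(self._path, encoding=self._encoding) as handle:
            return handle.readlines()


class Directory:
    """Hypothetical wrapper around os.scandir()."""

    def __init__(self, path):
        self._path = path

    def names(self):
        # Sorted so the result does not depend on the file system's order.
        with os.scandir(self._path) as entries:
            return sorted(entry.name for entry in entries)
```

With wrappers of this kind, a caller can write `File(path).read()` without managing a file handle.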

Record Converter Submodule: Factory pattern

We have a Record class containing many attributes, and the original XML tags are written in Spanish. Therefore, we use a RecordFactory class to translate the XML tags into English and generate a Record instance.
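
The idea can be sketched as follows. The class names Record and RecordFactory come from the text above; the three tag names, the `<record>` element, the `TAG_TRANSLATION` mapping and the `from_xml` method are hypothetical stand-ins, since the real tag set and attributes are not listed here.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET

# Hypothetical Spanish-to-English mapping; the real records have many more tags.
TAG_TRANSLATION = {
    "evento": "event",
    "muertos": "deaths",
    "heridos": "injured",
}


@dataclass
class Record:
    event: str = ""
    deaths: int = 0
    injured: int = 0


class RecordFactory:
    @staticmethod
    def from_xml(element):
        """Build a Record from an XML element whose child tags are in Spanish."""
        values = {}
        for child in element:
            english = TAG_TRANSLATION.get(child.tag)
            if english is None:
                continue  # skip tags without a known translation
            values[english] = (child.text or "").strip()
        return Record(
            event=values.get("event", ""),
            deaths=int(values.get("deaths") or 0),
            injured=int(values.get("injured") or 0),
        )


xml_text = "<record><evento>FLOOD</evento><muertos>3</muertos><heridos>12</heridos></record>"
record = RecordFactory.from_xml(ET.fromstring(xml_text))
print(record)
```

Keeping the translation table in one place means the rest of the code only ever sees English attribute names.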

Categoriser Submodule: Adapter Pattern

Since we need to interact with a number of files, we use a CategorisationFileGetter, which follows the Adapter pattern, to get the files needed and put them into a list for later access.
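
As a rough illustration, the function below reads a category text file such as ‘flood.txt’, in which each line holds one sub-type filename, and returns the listed files as paths. The function name `get_categorised_files` and its parameters are hypothetical; the real CategorisationFileGetter interface is not shown in this document.

```python
from pathlib import Path


def get_categorised_files(category_file, data_dir):
    """Return the paths of the sub-type CSV files listed in category_file.

    Assumes one filename per line; blank lines are ignored.
    """
    lines = Path(category_file).read_text(encoding="utf-8").splitlines()
    return [Path(data_dir) / name.strip() for name in lines if name.strip()]
```

The returned list can then be iterated over whenever the files of one category are needed.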

Adapter pattern


EventTypeAdapter: We need to merge the records by their event types. Instead of manipulating lists manually, we created an EventTypeAdapter to handle the merging.

Adapter Pattern


DataFrameAdapter: Given an event name and a directory containing the events of a country, this class makes it easier to get the pandas DataFrame for that event in that country.

CountryAdapter: A class for working with countries represented as directories. We can get a list of the names of all available countries, and get the directory representing a given country.

FolderSelector: A class for selecting and accessing a specific folder.

Data Storage

Since we chose Python to manipulate the data, and pandas is the third-party module we use most, storing the data as CSV spreadsheets makes processing and visualisation straightforward.

Folder structure

Data Flow

The flow of data from download to display, including cleaning, categorization, aggregation, and presentation as tables and curves, is depicted in the Data Flow diagram.

Packages and API Defined

xml_downloader

start_process(target_dir, clean_zip=False)

- target_dir: the folder in which the data will be stored
- clean_zip: whether to delete the downloaded zip files. Defaults to False.

csv_downloader

start_download(target_dir, mode=0b000)

- target_dir: the folder in which the data will be stored
- mode (int): an integer from 0 to 7. The highest bit determines whether existing spreadsheets are ignored, and the lowest two bits determine which caches are ignored.
Let `ignore_cache = mode & 0b011`. If `ignore_cache` is greater than 0, the crawler ignores the cache in `caches/disasters.pkl`. If `ignore_cache` is greater than 1, the crawler also ignores the cache in `caches/disasters/*`. If `ignore_cache` is greater than 2, all caches are ignored.
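
The bit layout of `mode` can be illustrated with a small decoder. The helper `decode_mode` and the dictionary keys are hypothetical and exist only to spell out the rules above; they are not part of the csv_downloader API.

```python
def decode_mode(mode):
    """Split a 3-bit mode value into the flags described above."""
    if not 0 <= mode <= 0b111:
        raise ValueError("mode must be an integer from 0 to 7")
    ignore_cache = mode & 0b011  # lowest two bits: cache level, 0 to 3
    return {
        "ignore_existing_spreadsheets": bool(mode & 0b100),  # highest bit
        "ignore_disasters_pkl": ignore_cache > 0,   # caches/disasters.pkl
        "ignore_disasters_dir": ignore_cache > 1,   # caches/disasters/*
        "ignore_all_caches": ignore_cache > 2,
    }


# 0b110: ignore existing spreadsheets, cache level 2.
print(decode_mode(0b110))
```

For instance, `mode=0b001` keeps existing spreadsheets and ignores only `caches/disasters.pkl`, while `mode=0b111` ignores both the spreadsheets and every cache.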

record_converter

RecordsConverter: A class to convert XML to CSV

- set_data_folder(target_dir)
- start_conversion()

categoriser

start_categorise(target_dir)

processor

set_data_dir(target_dir)


process(option)

- option = { 'desinventar': { 'merge': True, 'slice': True, }, 'emdat': { 'process': True, }, }

visualiser

set_data_folder(target_dir)


get_available_countries()


plot_exceedance_curves(countries, events, losses, years_required)

- countries: A string or list of strings specifying the countries.
- events: A string or list of strings specifying the events.
- losses: A Loss enum or list of Loss enums specifying the losses.
- years_required: An int specifying the minimum number of years of data required. Default is -1.

get_exceedance_table(countries, events, years_required)

- countries: A string or list of strings specifying the countries.
- events: A string or list of strings specifying the events.
- years_required: An int specifying the minimum number of years of data required. Default is -1.