The system is composed of three distinct components: the Data Downloader, Data Processor, and Data Visualizer.
The primary function of the Data Downloader is to acquire datasets from DesInventar and, where they arrive as XML, convert them to CSV format. After the download, the filenames of all sub-type files are stored in separate text files for future reference. For example, ‘flash flood.csv’ will be stored in ‘flood.txt’ as a single line of text.
The Data Processor is responsible for cleaning, aggregating or merging, and slicing datasets to shape the data for visualisation.
The Data Visualizer generates loss exceedance curves with significant points marked, along with the corresponding loss exceedance tables.
Data refers to the datasets downloaded from DesInventar, which are stored under the folder named 'data'. See the Data Storage section for more information.
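The filename bookkeeping described for the Data Downloader can be sketched roughly as follows. This is only an illustration: the function name record_subtype, its parameters, and the duplicate check are assumptions, not part of the project's code.

```python
from pathlib import Path


def record_subtype(index_dir, category, filename):
    """Append `filename` as one line of '<category>.txt', skipping duplicates.

    Hypothetical helper: the name and signature are assumptions for illustration.
    """
    index_dir = Path(index_dir)
    index_dir.mkdir(parents=True, exist_ok=True)
    index_file = index_dir / f"{category}.txt"

    # Read the lines already recorded, if the index file exists.
    existing = []
    if index_file.exists():
        existing = index_file.read_text(encoding="utf-8").splitlines()

    # Each sub-type filename occupies exactly one line.
    if filename not in existing:
        with index_file.open("a", encoding="utf-8") as fh:
            fh.write(filename + "\n")
    return index_file
```

Calling record_subtype(some_dir, "flood", "flash flood.csv") would leave 'flood.txt' containing the line 'flash flood.csv'.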
System Architecture
Class Diagrams
CSV Downloader
XML downloader
Converter
Categoriser
Design Patterns
The _util package contains two classes that follow the Adapter design pattern. The File class wraps the built-in read() and readlines() file methods; because we never need to process a file line by line, callers do not have to open or close the file themselves. The Directory class wraps os.scandir() so that the contents of a directory are easy to access.
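A minimal sketch of what these two wrappers might look like is given below. The method names files() and subdirectories(), the encoding parameter, and the choice to strip trailing newlines are assumptions made for illustration; the actual classes may differ.

```python
import os


class File:
    """Wraps built-in file access so callers never open or close a handle."""

    def __init__(self, path, encoding="utf-8"):
        self.path = path
        self.encoding = encoding

    def read(self):
        # Return the whole file as one string.
        with open(self.path, encoding=self.encoding) as fh:
            return fh.read()

    def readlines(self):
        # Return all lines; unlike the built-in, trailing newlines are stripped
        # (an assumption made here for convenience).
        with open(self.path, encoding=self.encoding) as fh:
            return [line.rstrip("\n") for line in fh]


class Directory:
    """Wraps os.scandir() to expose a directory's contents by name."""

    def __init__(self, path):
        self.path = path

    def files(self):
        # Names of regular files directly inside the directory, sorted.
        with os.scandir(self.path) as entries:
            return sorted(e.name for e in entries if e.is_file())

    def subdirectories(self):
        # Names of sub-folders directly inside the directory, sorted.
        with os.scandir(self.path) as entries:
            return sorted(e.name for e in entries if e.is_dir())
```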
Record Converter Submodule: Factory pattern
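As a rough illustration of the factory idea in this submodule, the sketch below builds an English-named record from an XML element whose child tags are in Spanish. The tag mapping, the three Record fields, and the method name from_element are all assumptions; the real Record class has many more attributes.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Assumed Spanish-to-English mapping, for illustration only.
TAG_TRANSLATIONS = {
    "evento": "event",
    "muertos": "deaths",
    "heridos": "injured",
}


@dataclass
class Record:
    # A deliberately small stand-in for the real Record class.
    event: str = ""
    deaths: str = ""
    injured: str = ""


class RecordFactory:
    @staticmethod
    def from_element(element):
        """Translate known child tags to English and build a Record."""
        fields = {}
        for child in element:
            english = TAG_TRANSLATIONS.get(child.tag)
            if english is not None:  # tags without a translation are skipped
                fields[english] = (child.text or "").strip()
        return Record(**fields)
```

Keeping the translation table in one place means the rest of the code only ever sees English attribute names.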
We have a Record class containing many attributes, and the original XML tags are written in Spanish. Therefore, we use a RecordFactory class to translate the XML tags into English and generate a Record instance.
Categoriser Submodule: Adapter Pattern
Since we need to interact with a number of files, we use a CategorisationFileGetter, which follows the Adapter pattern, to fetch the files needed and put them into a list for later access.
Adapter pattern
EventTypeAdapter: We need to merge records by their event types. Rather than manipulating lists by hand, we created an EventTypeAdapter to handle the merging.
Adapter Pattern
DataFrameAdapter: Given an event name and a directory containing the events of a country, this class returns the pandas DataFrame for that event in that country.
CountryAdapter: A class for working with countries represented as directories. It can list the names of all available countries and return the directory representing a given country.
FolderSelector: A class for selecting and accessing a specific folder.
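The idea of countries represented as directories can be sketched as below. This is not the project's CountryAdapter: the constructor argument, the method name get_country_dir, and the error raised for a missing country are assumptions.

```python
from pathlib import Path


class CountryAdapter:
    """Treats each sub-folder of the data folder as one country (assumed layout)."""

    def __init__(self, data_dir):
        self.data_dir = Path(data_dir)

    def get_available_countries(self):
        # Every sub-directory name is taken to be a country name.
        return sorted(p.name for p in self.data_dir.iterdir() if p.is_dir())

    def get_country_dir(self, country):
        # Return the folder for one country, failing loudly if it is missing.
        country_dir = self.data_dir / country
        if not country_dir.is_dir():
            raise FileNotFoundError(f"No data folder for country: {country}")
        return country_dir
```

A DataFrameAdapter could then take the returned folder and an event name and load the matching CSV with pandas.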
Data Storage
Since we chose Python to manipulate the data, and pandas is the third-party module we use most, storing the data as CSV spreadsheets makes processing and visualising straightforward.
Folder structure
Data Flow
The flow of data from download to display, including cleaning, categorization, aggregation, and presentation as tables and curves, is depicted in the Data Flow diagram.
Packages and API Defined
xml_downloader
start_process(target_dir, clean_zip=False)
- target_dir: the folder in which the data will be stored
- clean_zip: whether to delete the downloaded zip files. Defaults to False.
csv_downloader
start_download(target_dir, mode=0b000)
- target_dir: the folder in which the data will be stored
- mode (int): an integer from 0 to 7. The highest bit determines whether existing spreadsheets are ignored, and the lowest two bits determine how many levels of cache are ignored.
Let `ignore_cache = mode & 0b011`. If `ignore_cache` is greater than 0, the crawler will ignore cache in `caches/disasters.pkl`. If `ignore_cache` is greater than 1, the crawler will ignore cache in `caches/disasters.pkl` and `caches/disasters/*`. If `ignore_cache` is greater than 2, all caches will be ignored.
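The bit layout of mode can be made concrete with a small decoder. The function decode_mode and the dictionary keys are hypothetical names for illustration, and the sketch assumes that a set highest bit means existing spreadsheets are ignored.

```python
def decode_mode(mode):
    """Split a 3-bit mode value into the flags described above (illustrative)."""
    if not 0 <= mode <= 7:
        raise ValueError("mode must be an integer from 0 to 7")

    ignore_cache = mode & 0b011  # lowest two bits: cache level, 0 to 3
    return {
        # Highest bit (assumed: set means existing spreadsheets are ignored).
        "ignore_existing_spreadsheets": bool(mode & 0b100),
        # Level 1 and above: ignore caches/disasters.pkl
        "ignore_disasters_pkl": ignore_cache > 0,
        # Level 2 and above: also ignore caches/disasters/*
        "ignore_disasters_dir": ignore_cache > 1,
        # Level 3: ignore every cache
        "ignore_all_caches": ignore_cache > 2,
    }
```

For example, mode=0b101 ignores existing spreadsheets and the cache in `caches/disasters.pkl`, but still uses `caches/disasters/*`.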
record_converter
RecordsConverter: A class to convert xml to csv
- set_data_folder(target_dir)
- start_conversion()
categoriser
start_categorise(target_dir)
processor
set_data_dir(target_dir)
process(option)
- option = { 'desinventar': { 'merge': True, 'slice': True, }, 'emdat': { 'process': True, }, }
visualiser
set_data_folder(target_dir)
get_available_countries()
plot_exceedance_curves(countries, events, losses, years_required)
- countries: A string or list of strings specifying the countries.
- events: A string or list of strings specifying the events.
- losses: A Loss enum or list of Loss enums specifying the losses.
- years_required: An int specifying the minimum number of years of data required. Default is -1.
get_exceedance_table(countries, events, years_required)
- countries: A string or list of strings specifying the countries.
- events: A string or list of strings specifying the events.
- years_required: An int specifying the minimum number of years of data required. Default is -1.
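For readers unfamiliar with loss exceedance tables, the sketch below shows one common empirical construction: rank the event losses from largest to smallest and divide each rank by the number of years observed. This document does not state how the visualiser computes its tables, so the function exceedance_table, its parameters, and the method itself are assumptions rather than a description of the package.

```python
def exceedance_table(losses, years_observed):
    """Build an empirical loss exceedance table (illustrative method only).

    losses: one loss value per recorded event.
    years_observed: number of years the record covers.
    """
    if years_observed <= 0:
        raise ValueError("years_observed must be positive")

    ranked = sorted(losses, reverse=True)  # largest loss first
    table = []
    for rank, loss in enumerate(ranked, start=1):
        table.append({
            "loss": loss,
            # Events per year with a loss at least this large
            # (tied losses receive consecutive ranks).
            "exceedance_frequency": rank / years_observed,
            # Average number of years between such events.
            "return_period": years_observed / rank,
        })
    return table
```

Plotting loss against exceedance_frequency for each row gives the corresponding loss exceedance curve.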