Comprehensive IFRC Data Analysis System (CIDAS) User Manual

Introduction:

Welcome to CIDAS. This user manual aims to help developers use our product effectively.

CIDAS is a powerful tool for downloading, processing, and analysing data from DesInventar. With its robust set of features and intuitive APIs, you can analyse the data quickly and easily.

System Requirements:

Installation:

To install CIDAS, please follow the deployment manual for detailed instructions.

Getting Started:

Using CIDAS

The data-downloader module

Activate the environment

Usage

This module provides functionality for downloading data from DesInventar into a target directory.

Note for the csv_downloader module

In this code snippet, target_dir is the directory to which the CSV files will be downloaded.

mode is an integer from 0 to 7 (three bits): the highest bit determines whether to ignore existing spreadsheets, and the lowest two bits determine the level of cache ignoring.

Let ignore_cache = mode & 0b011. If ignore_cache is 1, the crawler ignores the cache in caches/disasters.pkl. If it is 2, the crawler ignores the caches in caches/disasters.pkl and caches/disasters/*. If it is 3, all caches are ignored.
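A minimal sketch of how these bits decompose (illustrative only; the downloader's entry point itself is not shown here):

    # Sketch only: decomposing the three-bit `mode` described above.
    mode = 0b101  # highest bit set: ignore existing spreadsheets;
                  # low bits = 0b01: ignore only caches/disasters.pkl

    ignore_spreadsheets = bool(mode & 0b100)  # highest of the three bits
    ignore_cache = mode & 0b011               # cache-ignoring level, 0 to 3
    print(ignore_spreadsheets, ignore_cache)  # True 1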

Example

See example.py for details.

The data-processor module

Activate the environment

Usage

This module provides functionality for processing data from a data directory.

APIs
Detailed information

To use this module, first call set_data_dir() to set the data directory to be used by the processor. Then call process() with an options dictionary containing the following keys:
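As a sketch of the call pattern (the import path is an assumption about the package layout, and the option keys are placeholders for the documented ones):

    # Sketch only: the import path is an assumption, and the option
    # keys below are placeholders.
    from processor import set_data_dir, process

    set_data_dir("data/")  # directory produced by the data-downloader
    process({
        # ... the documented option keys and values go here ...
    })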

Example:

See example.py for details.

The data-visualiser module

Activate the environment

Usage

To run the example

The example shows a typical case, producing the return period vs. deaths and affected people graphs for floods and earthquakes in Albania and Pakistan, using data from the past 15 years.

A typical process can be done in three steps:

  1. set data folder path

  2. plot graph(s)

  3. get table(s)

1. Set input data

To use the default processed data, first set the data folder. You can then list the countries available for analysis by calling visualiser.get_available_countries(), as sketched below.
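A minimal sketch of these two calls; set_data_folder() is a hypothetical name for the folder-setting step (the folder can also be selected via __SELECTED_FOLDER in _config.py, described under Customising CIDAS), while get_available_countries() is the documented call:

    # Sketch only: `set_data_folder` is a hypothetical name.
    import visualiser

    visualiser.set_data_folder("data/processed")  # illustrative path
    print(visualiser.get_available_countries())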

2. Plot graph(s)

API for plotting exceedance curves:

Args:
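The argument list is not reproduced in this manual. The sketch below therefore uses hypothetical names throughout, based on the Albania/Pakistan example above; consult the module's docstrings for the real API:

    # Sketch only: the function name and every parameter here are
    # hypothetical illustrations, not the documented API.
    visualiser.plot_exceedance_curves(
        countries=["Albania", "Pakistan"],
        disaster_types=["flood", "earthquake"],
    )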

3. Get table(s)

The tool also provides a function that extracts the key return periods for all defined metrics and organises them as a table. The table can be accessed by calling visualiser.get_exceedance_table():
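For example (a minimal sketch; how the returned table prints depends on its actual type, which is not specified here):

    # get_exceedance_table() is the documented call; everything else
    # here is illustrative.
    table = visualiser.get_exceedance_table()
    print(table)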

Customising CIDAS

data-downloader

xml_downloader

The country information was obtained from the DesInventar Download page (https://www.desinventar.net/download.html). If you want to maintain the list of countries, you need to go to the webpage manually and inspect the hyperlinks to get each country's code.

For example, for Comoros, the HTML tag is:

Its country code is com.

The code containing country information is located in xml_downloader/_country_info.py. If DesInventar adds a country in the future, with the name CountryName and country code ctn, append ctn to the country_code list and CountryName to the country_name list, as sketched below.
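A sketch of that change; the list names come from _country_info.py as described above, and the values are the hypothetical country from the example:

    # xml_downloader/_country_info.py (sketch): appending the
    # hypothetical country described above.
    country_code.append("ctn")
    country_name.append("CountryName")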

csv_downloader

You can delete the remove_empty_databases() call in the start_clean() function of _cleaner.py if you do not want empty CSV files to be deleted (the contents of those files are not used).

Future development

After __get_country_disaster_dict() runs in _csv_crawler.py, the disaster types have already been acquired from DesInventar, so there is no need to download the CSV files by disaster type. However, the categoriser must be changed to adopt this new way of acquiring the disaster types rather than reading them from disk.

categoriser

Categorisation information is stored in _categorisations.py. If you want to move some subtypes to another major type, you need to modify this file.

record_converter

Currently, the record converter reads the entire XML file into memory. Therefore, for large XML files like Sri Lanka.xml (1.2 GB), processing may take more than 60 GB of RAM.

For future development, you may want to change it to parse the file element by element. Here is the information you may need:
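As a starting point, here is a minimal sketch of element-by-element parsing with the standard library's xml.etree.ElementTree.iterparse; the tag name "record" is a placeholder, since the real DesInventar element names are not reproduced here:

    # Sketch only: "record" is a placeholder tag name.
    import xml.etree.ElementTree as ET

    for _event, elem in ET.iterparse("Sri Lanka.xml", events=("end",)):
        if elem.tag == "record":
            ...  # convert this record here
            elem.clear()  # release the element to keep memory usage flat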

data-processor

When implementing the algorithm for merging records into events, we referred to THE HYBRID LOSS EXCEEDANCE CURVE, section 4.2.1, "Algorithm for grouping events together". The code related to this implementation is located in processor/_models/_event_builder.py and processor/_apps/_combiner.py.

Slice

The slicing algorithm is __slice_for_one_event() in processor/_apps/_slicer.py. Currently, we just slice out the first 5% of the events.
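As an illustration, here is a simplified reading of that behaviour (a sketch only; the real __slice_for_one_event() may differ in detail):

    # Sketch only: drops the first 5% of the events, one reading of
    # "slice out the first 5%".
    def slice_events(events):
        cutoff = int(len(events) * 0.05)
        return events[cutoff:]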

data-visualiser

Add loss metrics

Currently, we have only defined deaths and affected people (directly affected + indirectly affected). If you want to add more metrics, you can modify visualiser/_models/_loss.py.
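As a sketch, the two existing metrics map onto the record columns listed later in this manual (deaths, directly_affected, indirectly_affected); the actual structure of _loss.py is an assumption:

    # Sketch only: the real _loss.py may define metrics differently.
    def deaths(record):
        return record["deaths"]

    def affected_people(record):
        return record["directly_affected"] + record["indirectly_affected"]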

Change folder to conduct analysis

In visualiser/_config.py, you can set __SELECTED_FOLDER to the folder on which you want to conduct the analysis.

Change labels and highlight points

You can find the relevant code in the __add_label() and __highlight() methods of the Plotter class.

Add new data source

If you want to use another data source, you need to put the data source under the data directory and ensure the folder structure is:

For each CSV file, the data should be parsed to contain these columns: deaths, directly_affected, indirectly_affected, start_date, and secondary_end.

For example:

deaths  directly_affected  indirectly_affected  start_date  secondary_end
0       100                200                  1911-02-18  1911-02-21
5       60                 300                  1912-02-18  1912-02-21
3       100                100                  1914-02-18  1914-02-21
10      220                400                  1916-02-18  1916-02-21

Next, you need to add a member in visualiser/_adapters/_folders.py, with its value being the name of the data-source folder.

Then, you need to modify __SELECTED_FOLDER in _config.py.
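A sketch of both changes; whether _folders.py defines an Enum is an assumption based on "add a member ... with value", and the folder name my_new_source is illustrative:

    # visualiser/_adapters/_folders.py (sketch; Enum is an assumption)
    from enum import Enum

    class Folders(Enum):
        MY_NEW_SOURCE = "my_new_source"  # name of the data-source folder

    # visualiser/_config.py (sketch)
    __SELECTED_FOLDER = "my_new_source"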

Note: if you are working with new data sources, you need to ignore or remove the labels after plotting the curves.

Troubleshooting