Back-End Features Development
This page presents the experiments and prototypes we produced for the various back-end features and details the important decisions made while developing them.
Prototypes
Since we had established that the system would use Pandas, an open-source Python data analysis library, to analyse the data, we decided to experiment with Pandas by implementing some data cleaning features. This gave us the chance to familiarise ourselves with Pandas and to get an early view of the problems we might run into when developing these back-end features. We used IPython Notebook as the interface for experimenting with Pandas. The prototypes we produced, presented as IPython notebooks, are listed below.
- Missing Values - This prototype focuses on handling missing values in a dataset and demonstrates Pandas features such as interpolation.
- Data Normalisation and Standardisation - This prototype focuses on normalising and standardising the data in a dataset.
- Detecting Outliers - This prototype focuses on detecting potential outliers in a dataset using various measures of centre and dispersion.
- Grouping Alternative Representations of the Same Entity - This prototype focuses on grouping multiple representations of the same entity in a dataset using the clustering method implemented by OpenRefine.
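For the Missing Values prototype, the kind of Pandas calls involved can be illustrated with a minimal sketch. This is not the notebook's code: the `readings` series is made-up data, and the example simply compares dropping, mean-filling and linear interpolation using the standard Pandas `dropna`, `fillna` and `interpolate` methods.

```python
import numpy as np
import pandas as pd

# Hypothetical readings with gaps (made-up data, not from the prototype).
readings = pd.Series([1.0, np.nan, 3.0, np.nan, np.nan, 6.0])

# Count the missing values before cleaning.
print(readings.isna().sum())  # 3

# Option 1: drop entries that are missing.
dropped = readings.dropna()

# Option 2: replace gaps with a constant, here the mean of the observed values.
filled = readings.fillna(readings.mean())

# Option 3: estimate each gap linearly from its nearest observed neighbours.
interpolated = readings.interpolate(method="linear")
print(interpolated.tolist())  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
```

Which option suits a column depends on the data: linear interpolation assumes values change smoothly between neighbouring observations (as in an ordered time series), which rarely holds for unordered or categorical columns.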
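Normalisation and standardisation each take only a line of Pandas. The following is a minimal sketch assuming min-max normalisation to the range [0, 1] and z-score standardisation; the helper names `min_max_normalise` and `z_score_standardise` and the sample data are hypothetical and not taken from the prototype.

```python
import pandas as pd

# Hypothetical numeric columns (made-up data, not from the prototype).
df = pd.DataFrame({
    "height_cm": [150.0, 160.0, 170.0, 180.0],
    "weight_kg": [50.0, 60.0, 70.0, 100.0],
})


def min_max_normalise(frame: pd.DataFrame) -> pd.DataFrame:
    """Rescale each column linearly so its minimum becomes 0 and its maximum 1."""
    return (frame - frame.min()) / (frame.max() - frame.min())


def z_score_standardise(frame: pd.DataFrame) -> pd.DataFrame:
    """Centre each column on its mean and divide by its standard deviation.

    DataFrame.std() uses the sample standard deviation (ddof=1) by default.
    """
    return (frame - frame.mean()) / frame.std()


normalised = min_max_normalise(df)
standardised = z_score_standardise(df)
print(normalised)
print(standardised.round(3))
```

A column whose values are all equal has zero range and zero standard deviation, so both helpers divide zero by zero for it and Pandas returns NaN; a real implementation would need to handle such columns explicitly.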
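For the Detecting Outliers prototype, two common ways of turning measures of centre and dispersion into an outlier rule are sketched below: a z-score rule based on the mean and standard deviation, and Tukey's fences based on the quartiles and the interquartile range (IQR). These standard techniques are chosen for illustration only; the thresholds (2 standard deviations, k = 1.5), the helper names and the sample data are assumptions, and the prototype may use different metrics.

```python
import pandas as pd

# Hypothetical measurements with one suspicious value (made-up data).
values = pd.Series([10, 12, 11, 13, 12, 11, 95], dtype=float)


def zscore_outliers(s: pd.Series, threshold: float = 2.0) -> pd.Series:
    """Flag values more than `threshold` sample standard deviations from the mean."""
    z = (s - s.mean()) / s.std()
    return s[z.abs() > threshold]


def iqr_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Flag values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return s[(s < lower) | (s > upper)]


print(zscore_outliers(values))                 # flags 95 (z is about 2.27)
print(zscore_outliers(values, threshold=3.0))  # flags nothing
print(iqr_outliers(values))                    # flags 95 (fences 8.75 and 14.75)
```

The example also shows why quartile-based measures are useful: the outlier inflates the mean and standard deviation, so with only seven points no value can reach a z-score of 3 (the largest possible sample z-score here is 6/√7 ≈ 2.27), whereas Tukey's fences flag 95 comfortably.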
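OpenRefine offers several clustering methods for grouping alternative representations of the same entity. As an assumption about which one is meant, the sketch below approximates its fingerprint key-collision method: each value is reduced to a key (trim whitespace, lowercase, remove punctuation and control characters, fold accented characters to ASCII, split into tokens, sort and de-duplicate the tokens, rejoin), and values sharing a key are grouped. This is our own Python approximation, not OpenRefine's code; the `fingerprint` helper and the sample names are hypothetical.

```python
import re
import unicodedata

import pandas as pd


def fingerprint(value: str) -> str:
    """Approximate OpenRefine's fingerprint key (hypothetical helper)."""
    # Trim surrounding whitespace and lowercase.
    value = value.strip().lower()
    # Remove punctuation (including underscores) and non-whitespace control
    # characters; \w keeps Unicode letters and digits.
    value = re.sub(r"[^\w\s]|_", "", value)
    # Fold accented characters to ASCII, e.g. "ö" -> "o"; characters with no
    # ASCII decomposition are dropped.
    value = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    # Split into tokens, de-duplicate and sort them, then rejoin with single spaces.
    return " ".join(sorted(set(value.split())))


# Hypothetical column containing alternative spellings of the same entities.
names = pd.Series(["New York", "new york ", "York, New", "Boston", "boston.", "Chicago"])

# Compute a key per value and group values that collide on the same key.
keys = names.map(fingerprint)
clusters = {
    key: group.unique().tolist()
    for key, group in names.groupby(keys)
    if group.nunique() > 1  # keep only keys shared by distinct spellings
}
print(clusters)
```

Key collision is fast but only catches variants that reduce to exactly the same key; a misspelling such as "Bostn" would need one of OpenRefine's nearest-neighbour methods (for example Levenshtein distance) instead.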
Datasets Used for Experimentation
Various datasets from the internet were used for experimentation during this project. Datasets used within a prototype are credited in that prototype. Below is a list, in no particular order, of the datasets used during the course of this project.