
Bi-Weekly Report 4

November 27, 2015

Meeting 9

Date: 17/11/2015 Time: 14-15

This was our fifth meeting with our client since the project began. As we had not met our client for two weeks due to the scenario week and reading week, we decided to start the meeting by updating the client on our progress. This included a summary of the research we carried out on existing programs for data analysis and cleaning, namely JASP, Apache Zeppelin and Google OpenRefine.

We then went on to discuss the initial set of system requirements, which we had prioritised using the MoSCoW technique. The client was happy that we had captured most of the requirements; however, he felt it was necessary to switch the prioritisation of some features from the should-have to the could-have category and vice versa. With that completed, we were able to finalise the requirements of the system.

We concluded the meeting by discussing the basic architecture of our system. The client suggested we use Pandas for developing the backend of our system, as it provides data structures and data-analysis tools. We then decided upon the tasks we would add to our new week-long sprint; these included:

  1. Design wireframes of potential UI/UX
  2. Investigate Pandas (a short sketch follows this list)
  3. Outline the architecture of the system
  4. Investigate graph libraries to minimise the front-end UI build
  5. Map out the Python modules for the backend features
  6. Review Seldon 0.99 Python pipelines
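
As a rough illustration of task 2 only, the first steps of investigating Pandas might look something like the sketch below; the file name is a hypothetical placeholder rather than our actual data.

    # Sketch: first look at a data-set with Pandas ("survey.csv" is a placeholder).
    import pandas as pd

    df = pd.read_csv("survey.csv")

    print(df.head())          # first few rows
    print(df.dtypes)          # inferred column types
    print(df.describe())      # basic summary statistics
    print(df.isnull().sum())  # missing values per column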

Meeting 10

Date: 19/11/2015 Time: 11-13

We carried out this meeting during our lab session. We spent most of our time developing a user-friendly UI for our program. Initially we thought this would be quite simple; however, it proved quite challenging as we all had different ideas. We wanted our UI to be very simple to use for a first-time user while at the same time incorporating a lot of complex functionality, so finding the right balance in the UI proved quite difficult. After many sketches we settled on an interface consisting of three static tabs for cleaning, analysing and visualising the data. With the time we had left over we briefly designed the architecture of our system and decided what we would use to build the frontend: Flask as our web framework and AngularJS for the client-side interface. We chose AngularJS because it allows us to create more expressive, responsive and dynamic interfaces than is achievable using HTML on its own.
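
For illustration, the sketch below shows one minimal way the three static tabs could be served by Flask; the route names and placeholder responses are hypothetical and not our final implementation.

    # Minimal sketch: one Flask route per static tab (clean, analyse, visualise).
    # Route names and placeholder responses are hypothetical.
    from flask import Flask

    app = Flask(__name__)

    @app.route("/clean")
    def clean():
        return "Cleaning tab"       # would render the cleaning page

    @app.route("/analyse")
    def analyse():
        return "Analysis tab"       # would render the analysis page

    @app.route("/visualise")
    def visualise():
        return "Visualisation tab"  # would render the visualisation page

    if __name__ == "__main__":
        app.run(debug=True)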

Meeting 11

Date: 24/11/2015 Time: 14-15

This was a meeting we had with our client. We started off by showing the client an HTML5 simulation we had created of the UI; the simulation allowed the user to click through the different pages of the interface as well as toggle certain features, which in turn displayed a mock-up of the desired effect. The client seemed fairly happy with the prototype and felt that it was sufficient for the time being.

We then described to our client the architecture we had decided upon. As stated earlier, we decided to use Pandas for the development of our backend and AngularJS to create our frontend web app. We chose Flask as our web framework as it is somewhat simpler to use than alternatives such as Django. Between our Flask web framework and our Pandas backend we are considering using Celery, an asynchronous task queue based on distributed message passing. Finally, we decided to use Socket.IO, a JavaScript library for real-time web apps that allows bi-directional, event-based communication, letting us push data to our web app instantaneously. The client was quite happy with our architecture; he suggested that in the coming week we build a skeleton of the architecture and deploy it into a virtual machine.
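
As a rough sketch of how these pieces could fit together (the event names, the Redis broker URL and the clean_data task are hypothetical placeholders, not our actual code):

    # Sketch only: Flask serves the web app, Celery runs a long cleaning job off
    # the request thread, and Flask-SocketIO pushes updates back to the browser.
    from flask import Flask
    from flask_socketio import SocketIO, emit
    from celery import Celery

    app = Flask(__name__)
    socketio = SocketIO(app)
    # Hypothetical broker URL; any message broker supported by Celery would do.
    celery = Celery(__name__, broker="redis://localhost:6379/0")

    @celery.task
    def clean_data(path):
        # Placeholder for the Pandas-based cleaning pipeline.
        return {"status": "cleaned", "file": path}

    @socketio.on("start_cleaning")
    def handle_start_cleaning(message):
        # Enqueue the job asynchronously so the web app stays responsive,
        # then acknowledge the request over the WebSocket connection.
        clean_data.delay(message["path"])
        emit("cleaning_started", {"file": message["path"]})

    if __name__ == "__main__":
        socketio.run(app, debug=True)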

Finally, we showed our client the work we had done so far with Pandas, and how we had attempted to use it to carry out data cleaning on a messy data-set. This included data normalisation and standardisation, identifying outliers, identifying similar words and clustering data. Once again the client was quite impressed with this work, and was happy with the progress we had made.
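
The sketch below illustrates roughly the kind of cleaning we experimented with; the sample column and values are made up for illustration.

    # Sketch: z-score standardisation and simple outlier flagging with Pandas.
    # The data and column name ("value") are hypothetical examples.
    import pandas as pd

    df = pd.DataFrame({"value": [10.0, 12.0, 11.5, 9.8, 250.0, 10.7]})

    # Standardise the column to zero mean and unit variance.
    df["value_std"] = (df["value"] - df["value"].mean()) / df["value"].std()

    # Flag rows more than two standard deviations from the mean as outliers.
    df["outlier"] = df["value_std"].abs() > 2

    print(df)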

We concluded the meeting by deciding upon a set of tasks for our next sprint; these included:

  1. Investigate Flask
  2. Build skeleton infrastructure
  3. Deploy skeleton into VM
  4. First GitHub commit
  5. Investigate graphing libraries
  6. Map out the Python modules for the backend

Shivam Dhall

Over the course of the last two weeks, my teammates and I set out to design the user interface of our web-app. We first familiarized ourselves with the UIs of programs that provide similar functionality, as we thought this would enable us to design a good UI by incorporating the best features from the different programs we investigated. Once we had the initial sketches, I was tasked with building a prototype of our UI. I did this using a prototyping tool called ForeUI; I chose this tool because it allows a user to define behaviours within the prototype using intuitive flow charts consisting of events, conditional branching and actions. I was then able to export the prototype into a working HTML5 simulation.

I have also been working actively on improving my existing knowledge of Pandas by watching online tutorials and by using some of the functions within the Pandas library to carry out simple data-cleaning operations on a messy data-set. Having finalized our architecture last week, I am now learning about Flask and AngularJS, which we will be using as our web framework and front-end respectively.

Bandi Enkh-Amgalan

During the last two weeks, we made significant progress as a group in the planning and design stage of the project. My work focused on researching and planning the architecture. I came up with a rough outline of an architecture based on a client-server model, in which the user interacts with a web front-end that connects to a back-end web server using Apache Spark. However, our client advised us to use Pandas instead of Spark for a number of reasons, and with their feedback I created a finalized first iteration of the architecture design. In summary, it will use Angular.js as a front-end that connects via HTTP and WebSocket to a Flask backend as well as an IPython kernel and notebook server, both of which in turn depend on our custom data-cleaning package developed on top of Pandas in Python. The client seemed happy with the proposed architecture when we presented it last Tuesday, so I spent the remainder of the week researching the components, particularly Pandas and Socket.IO (a server- and client-side WebSocket library). I also managed to build a working skeleton that contains all the components outlined in the architecture design.
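
As an illustrative sketch only, the shared cleaning package could expose a small API that both the Flask backend and the notebook server import; the module and function names below are hypothetical placeholders.

    # Hypothetical module for the shared cleaning package, e.g. datacleaner/core.py;
    # both the Flask backend and the notebook would import the same functions.
    import pandas as pd

    def load_dataset(path):
        """Read a raw CSV file into a DataFrame."""
        return pd.read_csv(path)

    def clean(df):
        """Entry point of the cleaning pipeline: drop fully empty rows and
        strip whitespace from column names."""
        df = df.dropna(how="all")
        df.columns = [c.strip() for c in df.columns]
        return df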

Gordon Cheng

During the first of these two weeks, I worked on the user interface for our system with my teammates. After sketching some early ideas in our meeting session on Thursday, I was assigned to produce the initial wireframes of these sketches. I produced a total of three wireframes, one for each page we worked on (i.e. cleaning, visualisation, analysis). These wireframes show the overall layout of the pages, including the placement of the toolboxes, options and navigation, and they helped with the creation of the first prototype of the user interface, which was done by Shivam. In the second week, I focused on familiarising myself with Python and Pandas, an open-source data-analysis library. I experimented with Pandas by trying to implement a version of some data-cleaning features using the functions that Pandas provides. These features included detecting outliers, dealing with missing values, data normalisation and standardisation, and grouping multiple text representations of the same entity. These experiments were quite successful and I was able to produce the first set of prototypes of these back-end features. As the editor of the team, I also continued to update our team's project website; I uploaded both the UI and back-end feature prototypes onto the website along with the full MoSCoW requirements list.
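
For illustration only, the sketch below shows roughly the kind of Pandas code involved in handling missing values and grouping different text representations of the same entity; the sample data and column names are made up.

    # Sketch: fill missing values and group different spellings of the same entity.
    # The sample data and column names are hypothetical.
    import pandas as pd

    df = pd.DataFrame({
        "city": ["London", "london ", "LONDON", "Paris", None],
        "sales": [10, 12, None, 8, 5],
    })

    # Replace missing numeric values with the column mean.
    df["sales"] = df["sales"].fillna(df["sales"].mean())

    # Normalise text so that different representations map to one entity.
    df["city"] = df["city"].str.strip().str.title()

    # Group the cleaned representations together.
    print(df.groupby("city")["sales"].sum())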