Bi-Weekly Report 1

October 16, 2015

Meeting 1

Date 6/10/2015 Time 14-15

As this was the first time our group members had met, we spent most of the meeting getting to know one another's skills, strengths and weaknesses. We also used this meeting to assign team roles among ourselves. They are as follows:

Team Leader and Project Manager: Shivam Dhall
Chief Editor / Documentation lead: Gordon Cheng
Chief Researcher: Shivam Dhall
Technical Lead: Bandi Enkh-Amgalan
Client Liaison: Bandi Enkh-Amgalan
Deputy Group Manager: Gordon Cheng

We chose our team roles based on programming experience, experience of handling projects such as this, and familiarity with the various parts of the design and development cycle. This ensured that everyone was happy and comfortable with their assigned roles. Once this was completed, we contacted our client to schedule our first meeting, which was set for 9/10/2015. In the meantime, we all decided that it would be a good idea to read up on the company and understand a little about what they do.

Meeting 2

Date 9/10/2015 Time 16-17

This was our first meeting with our client, Seldon. We met with Alex Housley, Clive Cox and Gurminder Sunner, who gave us an in-depth introduction to the company. This included a detailed explanation of what Seldon is, how it works, how it is to be used and what is expected of it. We were then told what our project would be and how it would be incorporated and used within the existing platform.

Our project objective is to investigate and implement improvements to the data collection and analysis phase that the company carry out at the start of most projects. The implementation should also include a functional web-based UI which would allow for data loading, data analysis and visualisation.

During this meeting we were also able to identify important resources and tools which the company currently use for data collection, analysis and processing. These included Pandas, JASP and Apache Zeppelin, to name a few, while iPython could be used to help create a UI which would allow for easy data visualisation.

Following the meeting, we had gathered all the information we could regarding the program and how we were to develop it. We scheduled our next meeting for 13/10/2015. Before this meeting we all decided that it would be beneficial to install the Seldon virtual machine and test it using an online demo, to improve our understanding of how it worked and what was expected of it. We also downloaded some sample data to give us an idea of what we would be dealing with.

Meeting 3

Date 13/10/15 Time 15-17

This was our second meeting with our client, which we carried out over Skype. During this meeting the client helped reinforce our understanding of the project and answered some questions we had lined up. The client also gave us an insight into how we should manage the project using an agile approach. We therefore agreed to have weekly sprints, each followed by a sprint review meeting where we could reflect on our current progress and plan for the next sprint.

During the meeting we were also able to set up an account on JIRA (a tool for flexible issue and project tracking) and populate our first sprint. Collectively we decided upon a number of tasks which we thought would be appropriate to add. These included:

Investigate ML ETL features on Amazon and Azure
Research data analysis methods
Run an initial data analysis test in Apache Zeppelin and JASP
Research Kaggle Knowledge (contains some sample data)
Research data cleaning methods
Create a requirements questionnaire for Seldon and Seldon’s users

Rather than splitting up these tasks, we thought it would be a good idea for all of us to research them, as this would give each of us a greater understanding of what is required. This knowledge would also act as the building blocks for the rest of the project. We scheduled our next client meeting for 20/10/15.

In the following week we hope to finish all of the tasks stated above and to carry out some additional research that could help us design and create the best possible system. We are also aiming to receive a fully detailed set of client requirements for the system (the result of a questionnaire we are currently preparing). Finally, we hope to improve our existing website and start to populate it with documentation of our project.

Gordon Cheng

During the first week, the other group members and I worked on setting up the group and the project. This included assigning roles to each member of the group. As I was assigned the role of Chief Editor, who is responsible for the group's documentation, I was tasked with creating a template for our group’s website. By the end of the first week, I had completed a simple one-column website template for our group, written in pure HTML and CSS, ready to have content added to it. Also at the end of the first week, our group had a meeting with our client at Seldon; afterwards, I looked through the documentation on Seldon’s website to familiarise myself with the work Seldon does, which helped me better understand the problem we will be trying to solve. In the second week, after a second meeting with our client, I did some research on ETL features on existing cloud computing platforms, including Microsoft Azure and Amazon Web Services. I also researched some data analysis methods to get a better sense of the kinds of issues we may be dealing with during the development of the system.

Shivam Dhall

Over the course of the first week, my teammates and I spent a large amount of time researching the company and acquainting ourselves with the open source machine learning platform known as Seldon. I did this by reading some of the documentation provided on their website; subsequently I downloaded the Seldon virtual machine and followed the steps of a demo on their website to obtain a multi-class prediction. Following our second meeting with the client, I have been busy researching machine learning ETL features on Amazon Web Services and Microsoft Azure, which our client pointed out would be particularly useful. I have also downloaded some sample data available on Kaggle and have investigated a few data analysis and cleaning methods. Finally, I have started to familiarise myself with the Python programming language, as it will be required when visualising data using the iPython command shell.

Bandi Enkh-Amgalan

In the last two weeks, we primarily focused on familiarizing ourselves with Seldon’s machine learning platform as well as conducting research into the problem. I personally worked on installing, configuring and running my own local instance of Seldon’s prediction engine by referring to the online documentation. However, as I understood that our final product is a tool to be used externally alongside Seldon, I focused more on researching data analysis tools and methods, starting with the iPython and Pandas links given to us by the clients. Our clients recommended that we perform data cleaning and analysis ourselves on some data from a particular Kaggle competition they specified, so that we may gain first-hand insight into the domain of our problem. However, I was not very successful in doing this, as I tried to code the data loader and analyzer myself in Python, so I will instead be using the existing tools the clients mentioned, such as Apache Zeppelin.
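
For illustration only, a minimal data loader and analyzer of the kind described above might look like the following sketch. It is not the code from the attempt mentioned in this report. It assumes only the Python standard library, and the sample data, the column names and the function names load_rows, to_float and summarise are all hypothetical stand-ins rather than anything taken from the Kaggle data or from Seldon.

```python
import csv
import io
import statistics

# Hypothetical sample standing in for a downloaded CSV file;
# the columns and values are invented for this sketch.
SAMPLE_CSV = """id,age,fare
1,22,7.25
2,,71.28
3,26,
4,35,53.10
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, one per data row."""
    return list(csv.DictReader(io.StringIO(text)))


def to_float(value):
    """Return value as a float, or None if it is missing or not numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def summarise(rows, column):
    """Count missing entries and compute basic statistics for one column."""
    parsed = [to_float(row.get(column)) for row in rows]
    present = [v for v in parsed if v is not None]
    return {
        "count": len(present),
        "missing": len(parsed) - len(present),
        "mean": statistics.mean(present) if present else None,
        "min": min(present) if present else None,
        "max": max(present) if present else None,
    }


rows = load_rows(SAMPLE_CSV)
for column in ("age", "fare"):
    print(column, summarise(rows, column))
```

Counting missing values separately from the statistics keeps the cleaning step visible: blank or non-numeric cells are reported rather than silently dropped. Tools such as Pandas or Apache Zeppelin provide the same kind of summary with far less code, which is one reason for moving to them.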