Project Requirements

The requirements are listed below in MoSCoW format:

Must Have:

Train Service Real Time Pattern Matching Algorithm
- Use data from GPS, real time train service data (TRUST), train schedule data and train diagrams
- Operate in real-time
- Return a list of units which have been identified as not matching their planned services
- Support only basic scenarios
Prototype Application that uses the algorithm to match Rolling Stock to Services in real time
- Display the output of the algorithm

Should Have:

The prototype application simulates a real-time environment
The user can select any Rolling Stock to get more detailed information
Track all Rolling Stock over the long term to improve accuracy
Use additional information in the headcode of the Service to make predictions
Deploy system on Heroku

Could Have:

Predict a more accurate mileage record
Support more complex scenarios, as discussed under Discoveries
Visualise the output data
Use additional data streams (from Network Rail data feeds) to generate more accurate results
Test system using real data instead of test data

Would Like:

Return the data in the same form as the diagrams and highlight where changes have happened
Compare the mileage record obtained from the algorithm to the mileage record calculated from all GPS events
Handle situations where Rolling Stock go to the maintenance depot after running the Services

Currently, our Proof of Concept covers all of the Must Have requirements except for supporting a real-time environment and outputting a list of the units identified as not matching their planned services. Instead, it returns the probability of a Rolling Stock running a Service, so in its current state it still needs human assistance.

However, we believe that we are on track to cover all the requirements. We are already working on improving the algorithm and simulating a real-time environment.

Use Cases

To describe the system and its requirements in more detail, we use the following use cases.

Use Case: ReceiveReport
ID: 1
Brief description: The System receives a real-time Report (Trust or GPS) from a Data Source.
Primary actor(s): Data Source
Secondary actor(s): None
Precondition(s): System is connected to Data Source using correct protocol (STOMP)
Main flow:
1. The use case starts when the Data Source generates a Report and sends it to the System.
2. The System identifies the type of Report and adds a row to the database.
3. If the Report is a GPS Report, the System looks up the Rolling Stock associated with the Report and the Service it is currently running.
4. If the Report is a Trust Report, the System looks up the Service associated with the Report and the Rolling Stock the System thinks is running the Service.
5. The System checks whether the Rolling Stock and Service are still going in the same direction and whether they still match.
6. If they still match, the System updates the stored position of the Rolling Stock and/or Service.
7. If they no longer match, the System looks for another nearby Rolling Stock that might now be running the Service, and for another Service that the current Rolling Stock might be running.
Postcondition(s): Values updated
Alternative flows: None
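
The main flow above can be sketched roughly as follows. This is a minimal in-memory illustration under stated assumptions, not our implementation: the `Report` fields, the `Matcher` class and its attributes are hypothetical names, a plain list stands in for the database table, and the direction/match check is reduced to "were both last seen at the same Tiploc".

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Report:
    kind: str                       # "gps" or "trust" (hypothetical field names)
    tiploc: str                     # location code where the event happened
    unit: Optional[str] = None      # set on GPS Reports
    service: Optional[str] = None   # set on Trust Reports


class Matcher:
    """Holds the System's current belief about which unit runs which service."""

    def __init__(self, unit_to_service: Dict[str, str]) -> None:
        self.unit_to_service = dict(unit_to_service)
        self.last_seen: Dict[Tuple[str, Optional[str]], str] = {}
        self.log: List[Report] = []                    # stands in for the database
        self.needs_rematch: List[Tuple[str, str]] = []

    def _unit_for(self, service: Optional[str]) -> Optional[str]:
        return next((u for u, s in self.unit_to_service.items() if s == service), None)

    def receive(self, report: Report) -> None:
        self.log.append(report)                        # "adds a row to the database"
        if report.kind == "gps":
            unit = report.unit
            service = self.unit_to_service.get(unit)
            self.last_seen[("unit", unit)] = report.tiploc
        elif report.kind == "trust":
            service = report.service
            unit = self._unit_for(service)
            self.last_seen[("service", service)] = report.tiploc
        else:
            raise ValueError(f"unknown report type: {report.kind}")
        if unit is None or service is None:
            return                                     # nothing to compare against yet
        unit_at = self.last_seen.get(("unit", unit))
        service_at = self.last_seen.get(("service", service))
        # Placeholder for the real check: flag the pair once both sides have
        # reported and their last known locations disagree.
        if unit_at is not None and service_at is not None and unit_at != service_at:
            self.needs_rematch.append((unit, service))
```

Pairs collected in `needs_rematch` are the ones for which the final step of the flow (searching for an alternative Rolling Stock or Service) would be triggered.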

Use Case: DownloadSchedule
ID: 2
Brief description: Every morning, the System downloads a new Schedule.
Primary actor(s): Time
Secondary actor(s): Network Rail Data Feed
Precondition(s): System is connected to Network Rail Data Feeds using the STOMP Protocol.
Main flow:
1. The use case starts at 6 am every morning.
2. The System sends a request to the Network Rail Data Feeds server to download the new Schedule.
3. The server sends back the file containing the Schedule.
4. The System deletes the current Schedule from its database.
5. The System imports the new Schedule into the database.
Postcondition(s): New Schedule in Database
Alternative flows: None
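
The delete-then-import steps could be sketched as below. This is only an illustration: the `schedule` table, its columns and the `replace_schedule` function are hypothetical, SQLite is used purely so the example is self-contained, and the sample rows are made-up values. Wrapping both steps in one transaction is our suggestion here, so that a failed import does not leave the database without a Schedule.

```python
import sqlite3
from typing import Iterable, Tuple

# (service_id, tiploc, planned_time): hypothetical columns for illustration
ScheduleRow = Tuple[str, str, str]


def replace_schedule(conn: sqlite3.Connection, rows: Iterable[ScheduleRow]) -> int:
    """Swap the stored Schedule for a new one; roll back if the import fails."""
    rows = list(rows)
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("DELETE FROM schedule")
        conn.executemany(
            "INSERT INTO schedule (service_id, tiploc, planned_time) VALUES (?, ?, ?)",
            rows,
        )
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE schedule (service_id TEXT, tiploc TEXT, planned_time TEXT)")
replace_schedule(conn, [("1A23", "EUSTON", "06:30")])
replace_schedule(conn, [("2B45", "CREWE", "07:10"), ("2B45", "STAFFRD", "07:32")])
```

After the second call only the two newer rows remain in the table.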

Use Case: LoadHomePage
ID: 3
Brief description: Whenever a User visits the home page, the System displays all of the Units that have been identified as not matching their planned service.
Primary actor(s): User
Secondary actor(s): Browser
Precondition(s): None
Main flow:
1. The use case starts when the User visits the Home Page.
2. The Browser makes a request to the System to get the Home Page.
3. The System finds the list of Rolling Stock that have been identified as not matching their planned Services.
4. The System renders an HTML page together with the data.
5. The System sends the HTML to the Browser, which in turn displays it.
Postcondition(s): User can see current output of algorithm.
Alternative flows: None

Use Case: UserSelectsUnit
ID: 4
Brief description: The GUI allows the User to get more information on a Unit.
Primary actor(s): User
Secondary actor(s): Browser
Precondition(s): None
Main flow:
1. The use case starts when the User selects a Rolling Stock.
2. The Browser makes a request to the server to get the appropriate page.
3. The server fetches all GPS data on the Rolling Stock and the Trust data of the Services it might have run.
4. The System renders an HTML page with the correct data.
5. The System sends the HTML to the Browser, which in turn displays it.
Postcondition(s): User can see information on a particular Rolling Stock.
Alternative flows: None

Prototype

The first iteration of our prototype implemented the statistical algorithm. Essentially, it matched the Trust Reports to the GPS Reports by comparing the Tiploc codes and Event Types within a certain time limit. Next, the algorithm checked whether the Unit was planned to run that Service and gave preference to those Units. Finally, we calculated, as a percentage, how likely it was that a given Rolling Stock ran each Service.

Algorithms

The first algorithm we used was a simple statistical one that was also used in last year’s project. It estimates how likely it is that a Service was run by a particular Rolling Stock by counting how many of the Service’s Trust Reports match a GPS Report. We then combined that algorithm with our visualization to see how accurate it was. After careful evaluation, we concluded that it was not accurate enough, so we had to move on to a more complex version.

Statistical Algorithm (see demo)

Like the first visualization, the statistical algorithm uses D3.js in order to display the results. The interface looks similar as well. In the top left corner, you can select the gps_car_id of the Rolling Stock you want to analyze. Then the algorithm lists all of the GPS Reports in the first column.

Next, for each Service, we compare its Trust Reports to the GPS Reports. From this we count how many Trust Reports match a GPS Report and calculate the probability that the Rolling Stock ran the Service.

Finally, we display all of the Services with a high enough probability. Each column represents a Service and lists all of its Trust events. As before, a green circle is shown if the Reports match and a red circle if no match is found.
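
The matching and scoring steps described above can be sketched as follows. This is a simplified illustration rather than the prototype's code: the `Event` fields, the function names, the event type strings and the threshold are all assumptions, and the preference given to planned Units is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass(frozen=True)
class Event:
    tiploc: str
    event_type: str   # e.g. "ARRIVAL" or "DEPARTURE" (assumed values)
    time: datetime


def events_match(trust: Event, gps: Event, tolerance: timedelta) -> bool:
    """A Trust Report matches a GPS Report on Tiploc, Event Type and time window."""
    return (
        trust.tiploc == gps.tiploc
        and trust.event_type == gps.event_type
        and abs(trust.time - gps.time) <= tolerance
    )


def match_probability(trust_events: List[Event], gps_events: List[Event],
                      tolerance: timedelta) -> float:
    """Fraction of a Service's Trust Reports that have a matching GPS Report."""
    if not trust_events:
        return 0.0
    matched = sum(
        1 for t in trust_events
        if any(events_match(t, g, tolerance) for g in gps_events)
    )
    return matched / len(trust_events)


def likely_services(services: Dict[str, List[Event]], gps_events: List[Event],
                    tolerance: timedelta, threshold: float) -> Dict[str, float]:
    """Keep only the Services whose probability is high enough to display."""
    scores = {sid: match_probability(evts, gps_events, tolerance)
              for sid, evts in services.items()}
    return {sid: p for sid, p in scores.items() if p >= threshold}
```

In the visualisation, each Trust event for which `events_match` finds a GPS Report would get a green circle, and the rest a red one.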

API Endpoints

Our prototype uses two main API endpoints to serve data, one for GPS data and the other for TRUST data. These feed the data visualisations and are located at /events/gps.json and /events/trust.json.

UML

Database Model


Application Model


User Interface

The user interface of our application contains a map that visualises all of the reported data points, with colours indicating whether the algorithm has found a match. The section to the right lists all of the units (of rolling stock) that have been identified as not matching their services. The algorithm parameter option allows the user to change the matching algorithm’s parameters, such as the time and distance tolerance between Tiplocs. The user can select a particular mismatched service and the system will display details of the mismatched reports and its predictions.
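
As an illustration of the adjustable parameters, the time and distance tolerances could be grouped and applied as in the sketch below. The `MatchParams` class, its default values and the `within_tolerance` function are assumptions made for this example; the great-circle distance uses the standard haversine formula with a mean Earth radius of 6,371 km.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MatchParams:
    # Default values are placeholders, not tuned values from the project.
    time_tolerance_s: float = 300.0
    distance_tolerance_m: float = 500.0


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two latitude/longitude points."""
    r = 6_371_000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def within_tolerance(gps_pos: Tuple[float, float], tiploc_pos: Tuple[float, float],
                     dt_seconds: float, params: MatchParams) -> bool:
    """True if a GPS point is close enough to a Tiploc in both time and space."""
    close_in_time = abs(dt_seconds) <= params.time_tolerance_s
    close_in_space = haversine_m(*gps_pos, *tiploc_pos) <= params.distance_tolerance_m
    return close_in_time and close_in_space
```

Changing a value in the UI would then amount to building a new `MatchParams` and re-running the matching with it.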

A rough UI wireframe is shown below:

HCI Considerations

As the client did not have any explicit usability requirements (as the project itself is an algorithm rather than an application), we explored a few UI styles that may help make interpreting the output of the algorithm easier.

The map is a solution to a common problem that we have experienced with train service data. It helps the user see the whole picture: where the GPS or TRUST reports happened and what the algorithm has decided. In odd situations where trains loop around, the user can also see this graphically rather than as a string of text that conveys little.

The right section displays the most important information the client needs in this project: which services are not being run by the planned rolling stock. The details pane below is initially hidden, but once the user selects a particular mismatched service, the details of what the algorithm has found are displayed.

The button used to export the corrected diagrams is easily accessible and is the only button on the page, which helps emphasise the most important feature.

User Testing

We will provide a group of users with background and context and ask them to perform common tasks on the test application and then observe their behaviour. After this we will have a brief interview with them to gather feedback on specific points of the application.

Some open questions we would ask:

How easy was it to find the new corrected diagrams?
Do you feel like you could easily find mismatched services?
What do you feel about the representation of data points on the map?
Are you able to tell what is happening to the services as a whole at a particular time?

As the main target audience for this algorithm is people working in the train industry, we will focus our testing on people from this audience, the most important being our client.

Testing Strategies

We have researched techniques and tools for testing which we plan to use once we start developing and experimenting with more complex algorithms. Our plan is to automate testing as much as possible, by connecting it to our version control system and testing on every change, so we detect errors early.

For continuous integration, we plan to use Travis CI, which is popular, free and integrates easily with GitHub.

Below are some of our test plans for Term 2:

Unit Tests

We will have a test suite written for the main algorithm itself which includes:

The matching of station names and the geographical locations
The matching of the time within x minutes
Matching of event type
Matching headcodes with specific train types and rough destinations
Matching services to rolling stock with Genius Allocations
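
As an example of what one of these unit tests might look like, the sketch below covers the "time within x minutes" check. The `within_minutes` helper is a hypothetical stand-in for the corresponding part of the algorithm, and treating the boundary as inclusive is an assumption.

```python
import unittest
from datetime import datetime, timedelta


def within_minutes(a: datetime, b: datetime, x: int) -> bool:
    """True if the two timestamps are at most x minutes apart (inclusive)."""
    return abs(a - b) <= timedelta(minutes=x)


class WithinMinutesTest(unittest.TestCase):
    def setUp(self) -> None:
        self.base = datetime(2016, 1, 4, 8, 0)

    def test_inside_window(self) -> None:
        self.assertTrue(within_minutes(self.base, self.base + timedelta(minutes=3), 5))

    def test_boundary_is_inclusive(self) -> None:
        self.assertTrue(within_minutes(self.base, self.base + timedelta(minutes=5), 5))

    def test_outside_window(self) -> None:
        self.assertFalse(within_minutes(self.base, self.base + timedelta(minutes=6), 5))

    def test_order_does_not_matter(self) -> None:
        later = self.base + timedelta(minutes=4)
        self.assertEqual(within_minutes(self.base, later, 5),
                         within_minutes(later, self.base, 5))
```

Saved in a test module, this can be run with `python -m unittest`, which is also the kind of command the continuous integration job would execute on every change.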

Functional Tests

Aside from the algorithm itself, we plan on using Selenium to automatically test the web application.

Once we receive the dataset, generated manually by the TOCs, that outlines what actually happened for the sample data we were given, we can also write tests that take multiple data streams and the results from historic data to test the accuracy of the algorithm.