Summary of achievements
This is a summary of the major achievements of our project:
  • Web application-based UI which can be accessed through any web browser
  • Streamlined data collection system which can be remotely controlled through the UI
  • Automatic feature extraction
  • Machine learning model which can distinguish between four scents and air with up to 80% accuracy
  • UI to create and train models with customisable parameters
  • Identification of scents in a file using any user-created model

MoSCoW Table
Requirement | Priority | Completed? | Contributors
ML system to identify gas based on pattern of change in resistance across different heater temperatures, at different concentration levels | Must | Yes | Stanley
Data-synthesis system which collects data to establish unique fingerprints for different scents through feature extraction pipeline | Must | Yes | Stanley
Interface for interacting with olfactometer, reading from sensor, and using ML algorithm for identification | Must | Yes | Charlotte, Divyanshu
Feature extraction through different wavelengths of gases recorded* | Should | Yes | Divyanshu, Charlotte
Scents database to uniquely identify and store scents | Should | Yes | Charlotte
Support vector machines to identify gases | Could | No | N/A
ML algorithm to recognize and identify scents in real time | Could | No | N/A

Percentage of key functionalities (MUST & SHOULD) completed: 100%

Percentage of optional functionalities (COULD) completed: 0%

*Note: After testing various wavelengths, we found a better approach, in which the algorithm learns a representation of the environmental readings when exposed to scent stimuli. This approach is more generally applicable across use cases than hand-crafted features. After a discussion with our client, we agreed to focus on implementing it instead.

Known bugs
ID | Bug description | Priority
1 | Stopping the data collection cycle from the UI does not account for the time taken for all threads to be killed, so the UI may consider the system stopped before it actually has. | Low
2 | Model training occasionally underestimates the remaining training time in the loading progress message. | Low
3 | The relevance table may be set to None when none of the features are relevant to the set number of classes. | Low
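
One possible way to address bug 1 is to have the stop routine signal the workers and then wait for them before the UI is told that the system has stopped. The following is a minimal sketch using only the Python standard library. The Collector class, its worker loop, and the timeout value are hypothetical stand-ins for illustration and do not reflect the project's actual code.

```python
import threading
import time


class Collector:
    """Hypothetical stand-in for the data collection cycle."""

    def __init__(self, n_workers=3):
        self._stop = threading.Event()
        self._threads = [
            threading.Thread(target=self._work, daemon=True)
            for _ in range(n_workers)
        ]

    def _work(self):
        # Check the stop flag between short units of work.
        while not self._stop.is_set():
            time.sleep(0.01)

    def start(self):
        for t in self._threads:
            t.start()

    def stop(self, timeout=2.0):
        """Signal all workers, wait for them, and report whether all exited."""
        self._stop.set()
        deadline = time.monotonic() + timeout
        for t in self._threads:
            # Share one overall deadline across all joins.
            t.join(max(0.0, deadline - time.monotonic()))
        return not any(t.is_alive() for t in self._threads)
```

With this shape, the UI would only mark the system as stopped when stop() returns True, and could show a "stopping" state in the meantime.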


Individual contribution
Task | Stanley | Divyanshu | Charlotte
Client liaison | 30 | 10 | 60
Requirements analysis | 33 | 33 | 34
HCI | 20 | 40 | 40
Pitch presentations | 33 | 33 | 33
Research | 50 | 25 | 25
Coding | 40 | 30 | 30
Testing | 60 | 20 | 20
Project website | 33 | 34 | 33
Blog | 0 | 50 | 50
Video editing | 0 | 0 | 100
Overall | 34 | 33 | 33
Main roles | Data synthesis system, research, data analysis, machine learning | Frontend development, systems integration, website and blog | Frontend development, database development, video editor


Evaluation
Functionality
The package as a whole functions soundly and offers a generalised process for any scent identification task. After each session we save the version of the model which had the best test accuracy, together with only the attributes needed to pick up the task from the previous session, add further training to the existing model, or conduct further analysis. Less important attributes are not saved, so that the saves do not overload the lightweight hardware's storage. We recognise that better hardware could speed up training and potentially yield better performance, as it could leverage more data. However, constrained by the cost of better hardware, we have delivered on the requirements and demonstrated the feasibility of the automated system to a sufficient degree. Clients may take the project further with appropriate hardware if they find use cases which call for it.
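
The idea of keeping only the best model and only the attributes needed to resume can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: the function name save_if_best, the attribute names in keep, and the JSON file format are all hypothetical.

```python
import json
import os


def save_if_best(path, state, test_accuracy,
                 keep=("weights", "classes", "params")):
    """Write a slim checkpoint only when test_accuracy beats the saved one."""
    best = -1.0
    if os.path.exists(path):
        with open(path) as f:
            best = json.load(f)["test_accuracy"]
    if test_accuracy <= best:
        return False

    # Keep only the attributes needed to resume; drop everything else.
    slim = {k: state[k] for k in keep if k in state}
    slim["test_accuracy"] = test_accuracy

    # Write to a temporary file first, then swap it in, so an interrupted
    # write does not damage the previous checkpoint.
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(slim, f)
    os.replace(tmp, path)
    return True
```

Writing to a temporary file before replacing the old one is a design choice that suits hardware which may lose power unexpectedly.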

Stability
In our testing, test accuracy remained stable across sessions when given a sufficient amount of data, and consistently low test accuracy could always be resolved by collecting more high-quality data. Constrained by time, we have yet to experiment in a more diverse set of environments to confirm that this accuracy holds up where sensor readings are more variable, where temperature and humidity change substantially, or where there is more air flow, especially given our constraint of using a single MOS sensor as opposed to an array of sensors. However, throughout our testing in air-conditioned indoor environments, the test accuracy held up well even near windows, which are particularly prone to temperature changes between periods when the heaters are on and the building is occupied and periods when the building is vacant. We do note that the Raspberry Pi can sometimes shut off without warning when left on for months, after which it is inaccessible remotely.

Maintainability
Our code is extensively documented with docstrings describing the role of every function, its parameters, how it interacts with other functions in the class, and any prerequisites which are not apparent from the function itself. We have also written the code to be modular, so that each function can be taken out and tested in isolation in Jupyter Notebook, which is particularly useful if an extension is necessary at a later date. Where function docstrings are not sufficient, we describe the roles of each component and how they interact with each other in the documentation and implementation sections of this website.

Project management
Each step of this project is extensively documented in our shared workspace, which we use to delegate tasks, organise and share information, track progress, and set deadlines. We would also like to express our gratitude to our clients and the team at OWidgets for their frequent, timely, and detailed feedback. They have had tremendous patience and shown us a good spirit of collaboration.

Future Work
As our time working on this project was limited, there are some features which we believe would improve the project further but which we were unable to implement.

Experimenting with larger datasets
Data collection for this project was time-consuming, so we worked with a relatively small dataset. We managed to achieve relatively high accuracies with the data we collected, but future developers could collect more data and build a model from a larger dataset, which could produce even more accurate results.

Live identification
Our project currently only allows for identification of scents from data stored in a file. Future work could implement a feature which identifies scents in real time, without the data first having to be stored in a file, which could be useful in other practical applications as well.
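
As a rough illustration of what live identification might look like, the sketch below keeps a sliding window of the most recent sensor readings in memory and asks a model for a prediction each time the window is full. It assumes readings arrive one at a time from some iterable and that a predict callable exists. Both names, as well as the window and step sizes, are hypothetical.

```python
from collections import deque


def identify_stream(readings, predict, window=5, step=1):
    """Yield one prediction per full window of the most recent readings."""
    buf = deque(maxlen=window)  # oldest reading is dropped automatically
    for i, reading in enumerate(readings, start=1):
        buf.append(reading)
        # Predict once the window is full, then every `step` readings.
        if len(buf) == window and (i - window) % step == 0:
            yield predict(list(buf))
```

Because the function is a generator, a UI could consume predictions as they are produced rather than waiting for a complete recording.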

Hyperparameter search
Due to time constraints, we did not test a wide range of hyperparameters. Future work could involve testing a wider range of hyperparameters to find the best-performing model. This could also involve training the model to recognise the concentration of a single scent.
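
A simple starting point for such a search is an exhaustive grid search. The sketch below is generic and makes no claim about how the project's models are trained: train_and_score is a hypothetical callable that trains a model with the given hyperparameters and returns a score, ideally measured on a validation set kept separate from the final test data.

```python
from itertools import product


def grid_search(train_and_score, grid):
    """Try every combination in grid; return (best_params, best_score)."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = train_and_score(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

For example, calling grid_search with a grid such as {"lr": [0.1, 0.01, 0.001], "hidden": [16, 32, 64]} would train nine models, so on lightweight hardware the grid needs to be kept small.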

More heater profiles
Due to time constraints, we were not able to experiment with many heater profiles for the BME680 sensor, and used an existing heater profile from Bosch. Future work could involve experimenting with different heater profiles to see whether one can be found that yields better data for training a model.

Multi-scent identification
Currently, our system can only identify one scent at a time. Future developers could build on our project to allow for the identification of a mixture of scents, as well as the concentrations of each scent in the mixture.