OWidgets

This is a summary of the major achievements of our project:

Web application-based UI which can be accessed through any web browser
Streamlined data collection system which can be remotely controlled through the UI
Automatic feature extraction
Machine learning model which can distinguish between 4 scents and air at up to 80% accuracy
UI to create and train models with customisable parameters
Identification of scents in a file using any user-created model

Requirement	Priority	Completed?	Contributors
ML system to identify gas based on pattern of change in resistance across different heater temperatures, at different concentration levels	Must	✓	Stanley
Data-synthesis system which collects data to establish unique fingerprints for different scents through feature extraction pipeline	Must	✓	Stanley
Interface for interacting with olfactometer, reading from sensor, and using ML algorithm for identification	Must	✓	Charlotte, Divyanshu
Feature extraction through different wavelengths of gases recorded*	Should	✓	Divyanshu, Charlotte
Scents database to uniquely identify and store scents	Should	✓	Charlotte
Support vector machines to identify gases	Could	✗	N/A
ML algorithm to recognize and identify scents in real time	Could	✗	N/A

Percentage of key functionalities (MUST & SHOULD) completed: 100%

Percentage of optional functionalities (COULD) completed: 0%

*Note: After testing feature extraction over various wavelengths, we found a better approach, in which the algorithm learns a representation of the environmental readings under exposure to scent stimuli. This is more generally applicable across use cases than hand-crafted features. After a discussion with our client, we agreed to focus on implementing that approach instead.

ID	Bug description	Priority
1	Stopping the data collection cycle from the UI does not account for the time taken for all threads to be killed, so the UI considers the system stopped before the threads have finished.	Low
2	The loading progress message occasionally underestimates the time remaining for model training.	Low
3	The relevance table may be set to None when none of the features are relevant to the set number of classes.	Low
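
One common way to address bug 1 is to have the stop handler wait for every worker thread to finish before reporting the stopped state. The following is a minimal sketch under assumptions, not our implementation: the `Collector` class, its `start` and `stop` methods, and the `is_stopped` flag are hypothetical names used only for illustration.

```python
import threading
import time


class Collector:
    """Hypothetical data collection controller with cooperative shutdown."""

    def __init__(self, n_workers=3):
        self._stop_event = threading.Event()
        self._threads = []
        self._n_workers = n_workers
        self.is_stopped = True

    def _worker(self):
        # Poll the stop flag between (simulated) sensor reads.
        while not self._stop_event.is_set():
            time.sleep(0.01)

    def start(self):
        self._stop_event.clear()
        self._threads = [
            threading.Thread(target=self._worker, daemon=True)
            for _ in range(self._n_workers)
        ]
        for t in self._threads:
            t.start()
        self.is_stopped = False

    def stop(self, timeout=2.0):
        """Signal all workers, then wait for them before reporting stopped."""
        self._stop_event.set()
        deadline = time.monotonic() + timeout
        for t in self._threads:
            t.join(timeout=max(0.0, deadline - time.monotonic()))
        # Only report "stopped" once no worker thread is still alive.
        self.is_stopped = not any(t.is_alive() for t in self._threads)
        return self.is_stopped
```

With this pattern, a UI would only display the stopped state after `stop()` returns `True`; if it returns `False`, some thread outlived the timeout and the UI could keep showing a "stopping" state instead.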

Task	Stanley	Divyanshu	Charlotte
Client liaison	30	10	60
Requirements analysis	33	33	34
HCI	20	40	40
Pitch presentations	33	33	33
Research	50	25	25
Coding	40	30	30
Testing	60	20	20
Project website	33	34	33
Blog	0	50	50
Video editing	0	0	100
Overall	34	33	33
Main roles	Data synthesis system, research, data analysis, machine learning	Frontend development, systems integration, website and blog	Frontend development, database development, video editing

Functionality
The functionality of the entire package is sound and offers a generalised process for any scent identification task. We save the attributes needed to continue training an existing model or to conduct further analysis, using the version of the model that achieved the best test accuracy in the previous session. Only the attributes needed to resume from the previous session are stored; less important attributes are discarded so that the saves do not overload the lightweight hardware's storage. We recognise that better hardware could speed up training and potentially yield better performance, since it could leverage more data. However, constrained by the cost of better hardware, we have delivered on the requirements and demonstrated the feasibility of the automated system to a sufficient degree. Clients may take the project further with appropriate hardware if they find use cases that require it.
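
The idea of keeping only the best model state, trimmed to its essential attributes, can be sketched as follows. This is an illustration under assumptions rather than our actual save format: the keys in `ESSENTIAL_KEYS` and the `update_best` and `to_json` helpers are hypothetical.

```python
import json

# Hypothetical set of attributes needed to resume training later.
ESSENTIAL_KEYS = ("weights", "test_accuracy", "epoch")


def update_best(best, candidate):
    """Return the state with the higher test accuracy, trimmed to essentials."""
    if best is None or candidate["test_accuracy"] > best["test_accuracy"]:
        # Drop everything except the essential keys to keep the save small.
        return {key: candidate[key] for key in ESSENTIAL_KEYS}
    return best


def to_json(state):
    """Serialise the trimmed state for storage on disk."""
    return json.dumps(state)
```

Calling `update_best` after each evaluation keeps a single compact record, so bulky attributes such as a full training history never reach storage.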

Stability
From our testing, we observe that test accuracy remains stable across sessions when given a sufficient amount of data; in our experience, consistently low test accuracy could be resolved by collecting more high-quality data. Constrained by time, we have yet to experiment in a more diverse set of environments to confirm that this test accuracy holds up where sensor readings are more variable, where temperature and humidity change substantially, or where there is more air flow, especially given our constraint of using a single MOS sensor rather than an array of sensors. However, throughout our testing in air-conditioned indoor environments, the test accuracy held up well even near windows, a location particularly prone to temperature changes between periods when the building is occupied with heaters on and periods when it is vacant. We do note that the Raspberry Pi can sometimes shut off without warning when left on for months, which makes it inaccessible remotely.

Maintainability
Our code is extensively documented with docstrings describing the role of every function, its parameters, how it interacts with other functions in the class, and any prerequisites that are not apparent from the function itself. We have also written the code to be modular, so that each function can be taken out and tested in isolation in Jupyter Notebook, which is particularly useful if an extension is needed at a later date. Where function docstrings are not sufficient, we describe the role of each component and how the components interact in the documentation and implementation sections of this website.

Project management
Each step of this project is extensively documented in our shared workspace, which we use to delegate tasks, organise and share information, track progress, and set deadlines. We would also like to express our gratitude to our clients and the team at OWidgets for their frequent, timely, and detailed feedback; they showed tremendous patience and a good spirit of collaboration.

As our time working on this project was limited, there are some features that we believe would improve the project further but that we were unable to implement.

Experimenting with larger datasets
Data collection for this project was time-consuming, so we worked with a relatively small dataset. We achieved relatively high accuracies with the data we collected, but future developers could collect more data and build a model from a larger dataset, which could produce even more accurate results.

Live identification
Our project currently only allows for identification of scents from data stored in a file. In future work, it would be beneficial to implement a feature which allows for scents to be identified in real time without the data having to be stored in a file, which could be useful in other practical applications as well.
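
One possible shape for live identification is a sliding window over incoming sensor readings, classifying each full window as it arrives. The sketch below is only an illustration of that idea: `identify_stream`, the window size, and the `toy_classify` stand-in are all assumptions, and a real system would pass the window to a trained model instead.

```python
from collections import deque


def identify_stream(readings, classify, window_size=5):
    """Yield one prediction per full window of the most recent readings.

    `readings` is any iterable of sensor values; `classify` is a
    hypothetical callable mapping a list of readings to a label.
    """
    window = deque(maxlen=window_size)
    for reading in readings:
        window.append(reading)
        if len(window) == window_size:
            yield classify(list(window))


# Stand-in classifier for illustration only: thresholds the window mean.
def toy_classify(window):
    return "scent" if sum(window) / len(window) > 0.5 else "air"


labels = list(
    identify_stream([0.1, 0.2, 0.1, 0.9, 0.9, 0.9], toy_classify, window_size=3)
)
```

Because `identify_stream` is a generator, it works equally well on a finite list, as here, or on an unbounded stream of readings from a sensor loop.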

Hyperparameter search
Due to time constraints, we did not test a wide range of hyperparameters. Future work could involve a broader hyperparameter search to find a better-performing model. This could also involve training the model to recognise the concentration of a single scent.
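
A simple starting point for such a search is an exhaustive grid search over a small set of candidate values. The sketch below is an assumption-laden illustration: `grid_search` is a hypothetical helper, the hyperparameter names `learning_rate` and `hidden_units` are examples rather than our model's actual parameters, and `toy_score` stands in for a function that would train a model and return its validation accuracy.

```python
from itertools import product


def grid_search(train_and_score, grid):
    """Evaluate every combination in `grid` and return the best one.

    `train_and_score` is a hypothetical callable that trains a model with
    the given hyperparameters and returns its validation accuracy.
    """
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = train_and_score(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Stand-in scoring function for illustration only; a real one would train
# and validate a model on held-out data.
def toy_score(learning_rate, hidden_units):
    return 1.0 - abs(learning_rate - 0.01) * 10 - abs(hidden_units - 64) / 1000


best_params, best_score = grid_search(
    toy_score,
    {"learning_rate": [0.001, 0.01, 0.1], "hidden_units": [32, 64, 128]},
)
```

The number of combinations grows multiplicatively with each added hyperparameter, so on lightweight hardware a coarse grid or random sampling of the grid may be more practical.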

More heater profiles
Due to time constraints, we were not able to experiment with many heater profiles for the BME680 sensor and used an existing heater profile from Bosch. Future work could involve experimenting with different heater profiles to see whether one yields better data for training a model.

Multi-scent identification
Currently, our system can only identify one scent at a time. Future developers could build on our project to allow for the identification of a mixture of scents, as well as the concentrations of each scent in the mixture.