Related Projects Review

What are the existing technologies that we could learn from?

Before starting a large project, it is good practice to look for existing solutions to its sub-problems. Here we present the key reviews of related work that we conducted for several aspects of our project.

How to form a dataset

Before the project started, our client, Professor Joseph, showed us the HAM10000 dataset on Kaggle, a large collection of dermatoscopic images of pigmented lesions. We were also directed to the paper about this dataset published by Nature [1], from which we gained more insight into the pipeline used to build a proper medical dataset.

The goal of creating datasets at this scale is to let researchers apply classification algorithms to their topics of interest efficiently, reducing the time spent on manual analysis.

Through this preliminary research on a sample dataset, we came to understand the difficulties that clinicians, and researchers in general, face when gathering, cleaning and formalizing large datasets: the process requires ground-truth annotation and quality review by professionals and is very time-consuming. We also learned about the long-term benefits of this practice and how modern technologies could speed it up.

In addition, we now understand the demand for our project, an online platform that allows NHS clinicians and data scientists to exchange datasets and models easily.

Backend

For the backend architecture, we referred to IBM's Model Asset eXchange [2], which allows on-site predictions in applications such as the MAX-Object-Detector-Web-App [3]. This app retrieves ROI (region of interest) objects and predicts the class of each one along with the confidence of the prediction. As the image below shows, the user can also adjust the confidence threshold to obtain more or less adventurous predictions.
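The effect of the confidence threshold can be illustrated with a minimal sketch. The `Detection` class, its field layout and the sample values are hypothetical and are not taken from the MAX-Object-Detector-Web-App source; the sketch only shows the general idea of discarding predictions below a user-chosen confidence.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One detected object (hypothetical structure, for illustration only)."""
    label: str          # predicted class name
    confidence: float   # model confidence in [0, 1]
    box: tuple          # assumed layout: (y_min, x_min, y_max, x_max)


def filter_by_confidence(detections, threshold):
    """Keep only detections whose confidence is at least `threshold`."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return [d for d in detections if d.confidence >= threshold]


# Made-up sample predictions.
detections = [
    Detection("dog", 0.92, (0.1, 0.1, 0.6, 0.5)),
    Detection("cat", 0.55, (0.2, 0.5, 0.7, 0.9)),
    Detection("ball", 0.31, (0.8, 0.4, 0.9, 0.5)),
]

strict = filter_by_confidence(detections, 0.7)   # cautious: fewer results
lenient = filter_by_confidence(detections, 0.3)  # adventurous: more results
```

A high threshold keeps only the predictions the model is most sure about, while a low threshold also shows less certain ones.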

In the MAX-Object-Detector-Web-App design, the user accesses the web app through a web user interface, which communicates with Flask REST API endpoints. The REST API in turn interacts with a trained TensorFlow model. Both components are dockerized and served on IBM's public cloud.
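The division of responsibilities between the REST layer and the model can be sketched in a framework-agnostic way. The sketch below uses only the Python standard library rather than Flask, and `stub_model` is a placeholder standing in for the trained TensorFlow model; the function names, response fields and status codes are assumptions made for illustration, not the actual MAX API.

```python
import json


def stub_model(image_bytes):
    """Placeholder for the trained model: returns fixed, made-up predictions.

    A real model would decode the image and run inference on it.
    """
    return [
        {"label": "lesion", "confidence": 0.87},
        {"label": "background", "confidence": 0.40},
    ]


def handle_predict(image_bytes, threshold=0.5, model=stub_model):
    """Play the role of a REST endpoint.

    Validates the input, calls the model and returns a (status code, JSON body)
    pair, which a web framework would turn into an HTTP response.
    """
    if not image_bytes:
        body = {"status": "error", "message": "empty image"}
        return 400, json.dumps(body)
    predictions = [p for p in model(image_bytes) if p["confidence"] >= threshold]
    return 200, json.dumps({"status": "ok", "predictions": predictions})


# The web user interface would send the uploaded image to the endpoint.
status, body = handle_predict(b"fake image bytes", threshold=0.5)
```

Keeping the handler separate from the model means either component can be replaced or containerized independently.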

After discussing this initial design with our client, we realised that it does not fully match the client's requirements, as our project focuses on exchanging datasets and models rather than on creating image classification models. We would therefore like to build on top of this design with more emphasis on exchange operations.

Our backend will support uploading datasets and models, and it will dockerize each classification model automatically when it is uploaded. Uploaded datasets and models can be downloaded, and a downloaded model should be a Docker image that can be booted up locally to make predictions. Because predictions run on the user's machine, this puts less stress on the server's compute power than the MAX-Object-Detector-Web-App approach and is therefore cheaper to scale up.
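One possible shape for the automatic dockerization step is sketched below, under several assumptions: the base image, the `serve.py` and `requirements.txt` files, the port and the image tag are all hypothetical placeholders, and the sketch only writes a Dockerfile and lists the Docker CLI commands rather than running them.

```python
from pathlib import Path
import tempfile

# Hypothetical template: serve.py and requirements.txt are assumed to be
# supplied by the platform and copied next to the uploaded model file.
DOCKERFILE_TEMPLATE = """\
FROM python:3.10-slim
WORKDIR /app
COPY {model_file} /app/model/{model_file}
COPY serve.py requirements.txt /app/
RUN pip install --no-cache-dir -r requirements.txt
EXPOSE {port}
CMD ["python", "serve.py", "--model", "/app/model/{model_file}", "--port", "{port}"]
"""


def write_dockerfile(build_dir, model_file, port=5000):
    """Write a Dockerfile for an uploaded model into `build_dir`."""
    build_dir = Path(build_dir)
    if not (build_dir / model_file).is_file():
        raise FileNotFoundError(f"{model_file} not found in {build_dir}")
    dockerfile = build_dir / "Dockerfile"
    dockerfile.write_text(DOCKERFILE_TEMPLATE.format(model_file=model_file, port=port))
    return dockerfile


def docker_commands(image_tag, build_dir, archive):
    """Return the CLI commands that build the image and export it for download."""
    return [
        ["docker", "build", "-t", image_tag, str(build_dir)],
        ["docker", "save", "-o", str(archive), image_tag],
    ]


# Demonstration with a placeholder model file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "model.h5").write_bytes(b"placeholder")
    dockerfile_text = write_dockerfile(tmp, "model.h5").read_text()

commands = docker_commands("uploaded-model:1", "build", "uploaded-model.tar")
```

Under these assumptions, a user who downloads the exported archive could restore it with `docker load -i uploaded-model.tar` and start it locally with `docker run -p 5000:5000 uploaded-model:1`, so inference runs on their own machine rather than on the server.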

Frontend

Some existing web applications offer functionality similar to our project; one worth mentioning is Kaggle. Kaggle is an "online community of data scientists and machine learning practitioners that allows users to find and publish data sets, explore and build models in a web-based data-science environment, work with other data scientists and machine learning engineers, and enter competitions to solve data science challenges." [4]

We believe that Kaggle's UI, which looks clean, professional and easy to use, is a great starting point for our own UI design.

References

[1] P. Tschandl, C. Rosendahl, and H. Kittler, “The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions,” Scientific Data, vol. 5, Art. no. 180161, 14-Aug-2018. [Online]. Available: https://www.nature.com/articles/sdata2018161. [Accessed: 05-Nov-2021].

[2] “Model Asset eXchange,” IBM Developer. [Online]. Available: https://developer.ibm.com/exchanges/models/. [Accessed: 05-Nov-2021].

[3] IBM, “IBM/MAX-object-detector-web-app: Create a web app to visually interact with objects detected using machine learning,” GitHub. [Online]. Available: https://github.com/IBM/MAX-Object-Detector-Web-App. [Accessed: 05-Nov-2021].

[4] “Kaggle,” Wikipedia, 04-Mar-2022. [Online]. Available: https://en.wikipedia.org/wiki/Kaggle. [Accessed: 05-Nov-2021].
