Clinical Tabular Data QnA Bot

Related Projects Review

General Chatbots

Chatbot applications enable interaction between humans and technology. They have evolved beyond traditional question-and-answer logic into automation and self-service, and are now used in areas ranging from business and customer support to improving an organisation's operational efficiency.

Companies are adopting these tools to improve many aspects of their business, and usage is expected to keep growing. By 2020, 85% of interactions with customers were expected to be handled without human agents, and about 40% of millennials say they chat with chatbots on a daily basis. [1]

Established companies such as Google, Apple, Microsoft and Meta have developed successful chatbots, and there are 300,000 chatbots on Facebook Messenger alone [1]. With so many existing chatbots, we can learn from their UI designs and technical solutions.

There are two types of chatbots:

Rule-Based Chatbot

Rule-based chatbots hold conversations using ‘if/then’ logic without understanding the context or intent. They are the most common type of chatbot that users interact with on a daily basis. [1]
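As a minimal sketch of what 'if/then' logic looks like in practice, the snippet below matches keywords in a message against a fixed list of rules. The rules, replies and the reply function are hypothetical and chosen only for illustration; they are not taken from any of the chatbots reviewed.

```python
# Hypothetical rules: each pairs a set of trigger keywords with a canned reply.
RULES = [
    ({"hello", "hi"}, "Hello! How can I help you?"),
    ({"opening", "hours"}, "We are open from 9am to 5pm."),
]
FALLBACK = "Sorry, I did not understand that."


def reply(message: str) -> str:
    """Return the reply of the first rule whose keywords appear in the message."""
    cleaned = message.lower().replace("?", " ").replace("!", " ")
    words = set(cleaned.split())
    for keywords, response in RULES:
        if words & keywords:  # "if" any keyword matches, "then" answer
            return response
    return FALLBACK


print(reply("What are your opening hours?"))
```

Because the bot only matches whole keywords, a message such as "high fever" falls through to the fallback reply: there is no understanding of context or intent.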

AI-Powered Chatbots

AI-powered chatbots are based on NLP (Natural Language Processing) and ML (Machine Learning) algorithms, and they learn as they go. [1]

We chose to make our bot AI-powered. It uses TaPas, a model pre-trained on 6.2 million tables that learns to answer natural language questions over semi-structured tables using semantic parsing. With TaPas embedded, our bot will be able to recognise the intents and entities of a question and learn from its answer.

Medical Chatbots

To adapt a general chatbot to the medical domain, the bot must be familiar with medical terminology and knowledge, so training the model with medical dialogue is likely to yield better results. Among the existing medical chatbots we researched, some use datasets while others obtain their data from external APIs. To build the chatbot, some use tools such as Dialogflow or LINE/Telegram channels (Android or Web), while others hand-code the bot in languages such as Java or Python. [2]

Algorithms Review

TaPas Table Parsing Algorithm

Research groups around the globe have been working on tabular data processing for several years, and successful semantic parsing and question answering models have been developed during this time. TaPas, the model we chose for our project, performs best among the available models we considered. Its architecture extends the BERT model [3], which only supports text and paragraphs, by adding embeddings that preserve information about the table structure of the data. This lets the model determine the relationship between rows and columns, which makes TaPas compatible with tabular data.
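To illustrate the idea of the additional embeddings, the sketch below flattens a question and a table into one token sequence and tags every token with a segment, column and row index. It is a simplified illustration, not the TaPas implementation: real TaPas uses WordPiece tokenization and further embedding types, and the Token and flatten names, as well as the sample table, are hypothetical.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    text: str
    segment_id: int  # 0 = question, 1 = table
    column_id: int   # 0 for question tokens, 1-based inside the table
    row_id: int      # 0 for question and header tokens, 1-based for data rows


def flatten(question: str, header: List[str], rows: List[List[str]]) -> List[Token]:
    """Flatten a question and a table into one sequence with structure ids."""
    # Assumption: whitespace splitting stands in for real subword tokenization.
    tokens = [Token(word, 0, 0, 0) for word in question.split()]
    for col, name in enumerate(header, start=1):
        tokens += [Token(word, 1, col, 0) for word in name.split()]
    for row_index, row in enumerate(rows, start=1):
        for col, cell in enumerate(row, start=1):
            tokens += [Token(word, 1, col, row_index) for word in cell.split()]
    return tokens


tokens = flatten(
    "What is the glucose value",
    ["Test", "Value"],
    [["Glucose", "5.4"], ["Sodium", "140"]],
)
for token in tokens:
    print(token)
```

Each index would be looked up in its own embedding table and added to the token embedding, which is how a flat sequence can still carry row and column information.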

Various datasets of different sizes, such as SQA, WikiSQL and WTQ, are used to tune the model for different purposes. On the conversational Sequential Question Answering (SQA) dataset, TaPas improves state-of-the-art accuracy from 55.1 to 67.2, and it achieves on-par performance on the WikiSQL and WikiTableQuestions (WTQ) datasets. [3] At the time of writing, TaPas is the newest and best available tool for our goal, which is processing questions and retrieving specific answers from various tabular data.

Various pretrained TaPas models are available on Hugging Face, each trained on datasets of different kinds and sizes. The model name describes the variant: for instance, tiny-tapas-random-wtq denotes a tiny TaPas variant associated with the WTQ dataset. Since our project focuses on real-life conversation, it is important that a user can ask questions that build on previous questions and on the answers provided by the model. Moreover, we allow users to ask aggregation questions that involve, for example, a sum, an average or a count. We therefore chose the large TaPas model tuned on WTQ.
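The aggregation questions mentioned above can be pictured as a two-step process: the model selects table cells and predicts an operator, and the operator is then applied to the selected cells. The sketch below shows only the second step in plain Python. The aggregate function is hypothetical, and the operator names are assumed to follow the NONE, SUM, AVERAGE and COUNT set used by WTQ-tuned TaPas models.

```python
from typing import List, Union


def aggregate(operator: str, cells: List[str]) -> Union[str, int, float]:
    """Apply a predicted aggregation operator to the selected cell values."""
    if operator == "NONE":
        # No aggregation: the selected cells themselves are the answer.
        return ", ".join(cells)
    if operator == "COUNT":
        return len(cells)
    values = [float(cell) for cell in cells]
    if not values:
        raise ValueError("numeric aggregation needs at least one cell")
    if operator == "SUM":
        return sum(values)
    if operator == "AVERAGE":
        return sum(values) / len(values)
    raise ValueError(f"unknown operator: {operator}")


# Hypothetical selection: two numeric cells picked out by the model.
print(aggregate("AVERAGE", ["5.5", "4.5"]))
```

A model tuned without aggregation support corresponds to always using NONE, which is why such a model can only return cell contents.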

To explore whether the accuracy of the TaPas model could be improved further, we researched and tested the approaches listed below:

Google default WTQ-tuned vs. Custom fine-tuned

As the TaPas models currently available are all in the general domain, we attempted fine-tuning with medical data to test whether the accuracy of the replies could be raised further by tuning TaPas for the clinical domain. To begin, we generated around 1000 CSV documents containing medical tabular data. We then created fine-tuning data consisting of a few dozen questions and answers drawn from the relevant generated documents.
Unfortunately, the custom-tuned model did not perform well: its accuracy was lower than that of the default models. After analysing the outcomes, we concluded that there were two main reasons. First, there was not enough data, and there were too few question-and-answer pairs, for the model to learn new patterns during fine-tuning. Second, reliable medical reports are hard to obtain because of their private and confidential nature.

TaPas vs. Clinical BERT

After some research, we found that it might be possible to integrate Clinical BERT into TaPas, because the architecture of TaPas extends BERT's architecture. In principle, a pre-trained BERT model could be fed into TaPas to generate a new version.
Clinical BERT is a BERT model trained for the clinical domain using all note types from the electronic records of ICU patients. It is initialised from Google's standard BERT-Base model, then trained and tuned on the prepared notes. [4]
However, with the limited resources and information available, the experiment was not successful, because the Clinical BERT embeddings and configuration differ from those of TaPas. TaPas embeddings are designed to represent a table, whereas Clinical BERT's embeddings are designed to support text and paragraphs. This difference between table and text tokenization and embedding is probably the main obstacle to the integration and conversion.

In summary, following the failed experiments, we kept the large TaPas model provided by Google and tuned on WTQ. We chose this version of the model as it has the highest accuracy among its variations. [5] Furthermore, we chose WTQ over SQA because aggregation operators are supported in the WTQ-tuned version, which allows users to ask more types of questions.

Possible future work for exploring the TaPas model in the clinical domain can be found in the Evaluation section.

Frameworks and API Review

Rasa Open Source

Before deciding on the final bot framework, we researched the available frameworks to find the most suitable one. We shortlisted two, Rasa Open Source and Microsoft Bot Framework, and compared their pros and cons. For easier comparison and understanding, we present the information in a table as shown below:

Based on the information collected, we preferred Rasa Open Source, mainly because of its built-in NLP/NLU support. Furthermore, a Rasa chatbot can be integrated into Microsoft Bot Framework if needed, but not the other way round. Using Rasa Open Source also makes the project more independent, as an Azure account is not needed and NLP/NLU support does not have to be added through external resources.

Django

To implement post-processing that enhances the answer returned by the bot, a database is required to store information about lab tests, their respective acceptable value ranges and test comments. We chose Django to manage this database for the following reasons:

Supports different databases and data migrations

Django supports several databases, such as PostgreSQL, MariaDB, MySQL, Oracle and SQLite [9], and can be configured to suit the developers' preferences and the project's needs. Furthermore, data migration is available if the user wants to move data from another database or from a collection of semi-structured data. Once the files are prepared, a few simple commands, which act as equivalents of the corresponding SQL, can be executed to migrate external data into the internal database. [10]

Built-in administrator support

Django is friendly to non-technical people as it provides an administrator page, a built-in user interface for administrative tasks such as viewing, adding, editing and deleting objects. [11] This feature is important because the application is meant to be friendly to medical staff, who might not be tech-savvy.

Flexibility

If additional information has to be kept for advanced post-processing or other processing pipelines, Django makes it convenient to create new sub-databases within the same project, and only one server is required. [12]
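The post-processing step that motivates this database can be sketched in plain Python. The example below is only an assumption about how a stored reference range and comment might be used to enrich a raw answer: a dataclass and a dictionary stand in for the Django model and its table, and the test name, range, unit and comment are placeholders rather than clinical values.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class LabTest:
    """Stand-in for a database record describing one lab test."""
    name: str
    low: float
    high: float
    unit: str
    comment: str


# Placeholder data for illustration only; not real reference ranges.
REFERENCE: Dict[str, LabTest] = {
    "test_a": LabTest("test_a", 1.0, 2.0, "units", "Placeholder comment."),
}


def enhance(test_name: str, raw_answer: str) -> str:
    """Append range information to a numeric answer when the test is known."""
    test = REFERENCE.get(test_name.lower())
    if test is None:
        return raw_answer  # unknown test: leave the answer untouched
    try:
        value = float(raw_answer)
    except ValueError:
        return raw_answer  # non-numeric answer: nothing to compare
    if value < test.low:
        status = "below"
    elif value > test.high:
        status = "above"
    else:
        status = "within"
    return (
        f"{raw_answer} {test.unit} ({status} the reference range "
        f"{test.low}-{test.high}). {test.comment}"
    )


print(enhance("test_a", "2.5"))
```

Keeping the ranges and comments in a database rather than in code means that staff can update them through the administrator page without touching the bot.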

REST API

As our chatbot is available in the browser, user questions and predicted answers are transported between the frontend and the backend over HTTP. To achieve this, we implemented a REST API, which is an interface that conforms to the rules of the representational state transfer (REST) architectural style. Its conventions include using a GET request to retrieve records, a POST request to create a record, a PUT request to update an existing record, and a DELETE request to remove one [13]. This gives a uniform interface, as requests made to an endpoint work the same way every time.

Moreover, a REST API promotes client-server decoupling, meaning that clients and servers are independent of each other. Clients only send requests to the server's endpoints, without modifying or interacting with the server's internal mechanisms. Likewise, the server only returns the requested data and does not depend on how the client is implemented.
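The request conventions described in this section can be sketched with a small in-memory handler that maps each HTTP method to one operation on a record. This is an illustration of the convention only, not the project's API: the RecordStore class, its handle method and the sample record are hypothetical, and a real server would sit behind a web framework.

```python
from typing import Dict, Optional, Tuple

Response = Tuple[int, Optional[dict]]  # (HTTP status code, response body)


class RecordStore:
    """Maps HTTP methods to create, read, update and delete operations."""

    def __init__(self) -> None:
        self._records: Dict[int, dict] = {}
        self._next_id = 1

    def handle(self, method: str, record_id: Optional[int] = None,
               body: Optional[dict] = None) -> Response:
        if method not in {"GET", "POST", "PUT", "DELETE"}:
            return 405, None  # method not allowed
        if method == "POST":  # create a new record
            if body is None:
                return 400, None
            new_id = self._next_id
            self._next_id += 1
            self._records[new_id] = dict(body)
            return 201, {"id": new_id, **body}
        if record_id not in self._records:
            return 404, None  # the remaining methods need an existing record
        if method == "GET":  # retrieve
            return 200, {"id": record_id, **self._records[record_id]}
        if method == "PUT":  # replace the stored record
            if body is None:
                return 400, None
            self._records[record_id] = dict(body)
            return 200, {"id": record_id, **body}
        del self._records[record_id]  # DELETE
        return 204, None


store = RecordStore()
print(store.handle("POST", body={"question": "What is the glucose value?"}))
print(store.handle("GET", record_id=1))
```

The client only sees status codes and response bodies, never the dictionary inside the store, which is the decoupling described above.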
