Research

Related Project Review

Project Name
HeartBot - A Key Statistics Chatbot for The BHF

Main Features
HeartBot speeds up answering questions by searching FAQ data. Its Data Retrieval feature filters the BHF compendium according to the query and outputs the appropriate table of results. Because HeartBot is a web app, it can be integrated into any website.

What could we learn from this application?
HeartBot's main focus is ease of use and intuitiveness: even users with limited technical experience and knowledge can easily use this highly functional application. Because it is a web application, HeartBot can also be integrated easily into any website. A further feature that sets the chatbot apart is its Data Retrieval, which connects with the FAQs feature. Both features are built on Python libraries such as FuzzyWuzzy, NLTK, and pandas. For Data Retrieval, the team used the Levenshtein distance, the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another, to decide whether two strings are similar enough. This technology stack makes the chatbot very useful, especially for BHF website users.
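To make the Levenshtein idea concrete, here is a minimal sketch of how a distance can be turned into a fuzzy FAQ match. It is our own illustration, not HeartBot's code: the 0-100 normalisation, the function names, and the threshold of 70 are assumptions, and FuzzyWuzzy's own scorers normalise differently.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `a` into `b`."""
    # Only the previous row of the dynamic-programming table is kept.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Normalise the distance to a 0-100 score (100 = identical)."""
    a, b = a.lower().strip(), b.lower().strip()
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / longest)


def best_faq_match(query, faqs, threshold=70.0):
    """Return the FAQ question most similar to `query`, or None if no
    question reaches the (hypothetical) threshold."""
    scored = [(similarity(query, q), q) for q in faqs]
    score, question = max(scored, default=(0.0, None))
    return question if score >= threshold else None
```

For example, a misspelt query such as "what is heart diseese" is only two edits away from the FAQ "What is heart disease?", so it still clears the threshold, while unrelated questions do not.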

Website Link: http://students.cs.ucl.ac.uk/2021/group20/index.html

Project Research & Technology Review

General Solution

After analysing the three options for developing a chatbot generation service for the NHS, we opted for the third option since it has significant benefits over the other two.

Given the short development period of four months, the first option, building a chatbot framework from scratch, was discarded: four months is not enough time for us to build both a chatbot framework and a chatbot generation service from scratch. Furthermore, designing, testing, and maintaining a bespoke chatbot framework requires a substantial amount of time and effort.

The second option was to use an existing chatbot framework provided by IBM Watson or Microsoft Azure and let our users generate and manage a bot directly in our service, without creating an account with an external cloud service. This option was also rejected because it would require us to develop an account management system and a subscription system so that NHS trusts could register accounts and pay a subscription fee. Securing both the accounts and the payments would demand a substantial investment of time and resources.

The third, and preferred, option was to use an existing chatbot framework and let users build a chatbot file that they publish to the cloud service using their own accounts. We chose this option because it allowed us to concentrate on developing a simple service that delivers dependable chatbots at minimal cost. By using an existing framework supplied by IBM Watson or Microsoft Azure, we benefit from the provider's expertise and development resources. Furthermore, because users upload their chatbot files to their own cloud accounts, we do not need to build account management or subscription systems at all. This makes it an efficient way to provide a chatbot generation service for NHS trust websites.
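As an illustration of the "chatbot file" in this option, the sketch below builds a minimal workspace-style JSON document that a trust could import into its own Watson Assistant instance. It assumes the classic (v1) Watson Assistant workspace layout (`intents`, `examples`, `dialog_nodes`, `previous_sibling`, and the special `anything_else` condition); the field names should be checked against IBM's documentation, and the `(answer, [example questions])` input format and function names are hypothetical.

```python
import json


def _text_output(text):
    # Plain-text response in the assumed v1 dialog node format.
    return {"generic": [{"response_type": "text", "values": [{"text": text}]}]}


def build_workspace(bot_name, faqs, fallback="Sorry, I can't answer that yet."):
    """Build a Watson Assistant-style workspace dict.

    `faqs` is a list of (answer, [example questions]) pairs, a
    hypothetical input format produced by an earlier scraping step."""
    intents, nodes = [], []
    for i, (answer, examples) in enumerate(faqs, start=1):
        intent = f"faq_{i}"
        intents.append({"intent": intent,
                        "examples": [{"text": q} for q in examples]})
        node = {"dialog_node": f"node_{i}",
                "conditions": f"#{intent}",
                "output": _text_output(answer)}
        if nodes:
            # Sibling order is expressed by pointing at the previous node.
            node["previous_sibling"] = nodes[-1]["dialog_node"]
        nodes.append(node)
    # Catch-all node, placed last, for questions no intent recognises.
    fallback_node = {"dialog_node": "anything_else",
                     "conditions": "anything_else",
                     "output": _text_output(fallback)}
    if nodes:
        fallback_node["previous_sibling"] = nodes[-1]["dialog_node"]
    nodes.append(fallback_node)
    return {"name": bot_name, "language": "en", "intents": intents,
            "entities": [], "dialog_nodes": nodes}


def save_workspace(workspace, path):
    # Write the JSON file the trust uploads to its own cloud account.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(workspace, f, indent=2, ensure_ascii=False)
```

The service only ever produces this file; publishing it happens in the trust's own account, which is what removes the need for our own account and payment handling.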


Bot Framework

For the bot framework, we compared IBM Watson Assistant and Microsoft Azure Health Bot. After researching and trying both chatbot frameworks, we found that Watson Assistant was a better fit for our NHS Chatbot Generation Service.

One major advantage is its affordable price. For a relatively large hospital such as Great Ormond Street Hospital (GOSH), which receives 280,000 patients annually [1], we estimate roughly 1,200 monthly active users and 4,000 questions per month. At this usage, hosting a Watson Assistant (Plus tier) chatbot on the hospital's website costs only £168 per month [2], whereas the Azure Health Bot Standard plan costs £419 per month for the chatbot itself [3], plus £151 per month for an Azure App Service Premium v2 plan to host it [4], or £570 per month in total.
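The comparison above reduces to simple arithmetic. The short script below uses only the monthly prices quoted in this section (as of February 2023) and works out the monthly and annual totals for each option.

```python
# Monthly prices (GBP) quoted above for ~1,200 monthly active users
# and ~4,000 questions per month.
WATSON_PLUS = 168          # Watson Assistant Plus tier [2]
AZURE_HEALTH_BOT = 419     # Azure Health Bot Standard plan [3]
AZURE_APP_SERVICE = 151    # App Service Premium v2 plan to host it [4]

watson_monthly = WATSON_PLUS
azure_monthly = AZURE_HEALTH_BOT + AZURE_APP_SERVICE

print(f"Watson Assistant: £{watson_monthly}/month, £{watson_monthly * 12}/year")
print(f"Azure Health Bot: £{azure_monthly}/month, £{azure_monthly * 12}/year")
print(f"Saving with Watson: £{azure_monthly - watson_monthly}/month "
      f"(Azure costs {azure_monthly / watson_monthly:.1f}x as much)")
```

At these prices, Azure costs £570 per month against £168 for Watson Assistant, a saving of £402 per month or £4,824 per year.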

Another advantage is that IBM Watson Assistant is easy to monitor and maintain, even for non-technical staff, thanks to its user-friendly interface and robust monitoring tools. By contrast, Azure Health Bot has a much more complex interface that is harder to learn, especially for someone with no experience of Azure services.

Finally, IBM Watson Assistant is easy to integrate into a website, with a wide range of plugins and integrations available to make the process quick and seamless. To embed a fully working Watson Assistant chatbot into a website, we only need to copy and paste a pre-generated code snippet from the Watson Assistant management portal [5]. For Azure Health Bot, by contrast, we must first deploy it on Azure App Service, manually modify some code on the App Service server, embed the chatbot in the website with an iframe, and program the logic for opening and closing the chatbot [6].

Overall, although Azure Health Bot is also one of the best chatbot solutions on the market, IBM Watson Assistant is the better choice for our chatbot generation service, which is aimed at NHS trusts that want a powerful, affordable, and easy-to-use chatbot.


Real-time Information Retrieval Tool

We chose the Bing API and Azure Cognitive Service for Language as the real-time information retrieval tools for Watson Assistant, rather than Watson Discovery or Azure Cognitive Search, because of cost. Watson Discovery and Azure Cognitive Search have high monthly fees of £500 [7] and £410 [8] respectively, while the Bing API and Azure Language Service together cost only £17 per month [9], [10].

In addition, rather than requiring information to be collected during the chatbot construction phase, integrating the Bing API and Azure Cognitive Service for Language enables live online searches and information retrieval during the conversation itself. Using these two services considerably reduces costs while still giving the Watson Assistant chatbots we create a dependable and effective real-time information retrieval solution.
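The sketch below outlines one possible shape of this search-then-answer flow, without making any network calls: it builds a Bing Web Search request, pulls snippets out of the JSON response, and builds a question answering request for Azure's Language service over those snippets. The endpoints, the `Ocp-Apim-Subscription-Key` header, the `webPages.value` response fields, and the `:query-text` request body are our assumptions based on Microsoft's public APIs and should be verified; restricting results with Bing's `site:` operator and all function names are hypothetical choices for illustration.

```python
import json
from urllib.parse import urlencode

# Assumed endpoints; verify both against current Microsoft documentation.
BING_SEARCH_URL = "https://api.bing.microsoft.com/v7.0/search"
QA_PATH = "/language/:query-text?api-version=2021-10-01"


def build_bing_request(question, site, key, count=5):
    """Return (url, headers) for a Bing Web Search limited to one site."""
    params = {"q": f"site:{site} {question}", "count": count, "mkt": "en-GB"}
    headers = {"Ocp-Apim-Subscription-Key": key}
    return f"{BING_SEARCH_URL}?{urlencode(params)}", headers


def extract_snippets(bing_response):
    """Pull page snippets out of a Bing Web Search JSON response."""
    pages = bing_response.get("webPages", {}).get("value", [])
    return [{"id": str(i), "text": p["snippet"], "url": p["url"]}
            for i, p in enumerate(pages) if p.get("snippet")]


def build_answer_request(question, snippets, endpoint, key):
    """Return (url, headers, body) asking Azure's question answering
    feature to pick an answer out of the retrieved snippets."""
    body = {
        "question": question,
        "language": "en",
        "records": [{"id": s["id"], "text": s["text"]} for s in snippets],
    }
    headers = {"Ocp-Apim-Subscription-Key": key,
               "Content-Type": "application/json"}
    return endpoint.rstrip("/") + QA_PATH, headers, json.dumps(body)
```

In a deployment, a webhook called by Watson Assistant (for example, an Azure Function) would send these two requests with an HTTP client and return the best answer to the conversation.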


Web Scraping and Information Filtering

For this part, we first explored a library called Selenium because it provides a web driver that simulates a real web browser visiting the website we want to scrape, instead of fetching pages with HTTP requests in code. This can prevent our program from being blocked when a website requires cookies or builds content dynamically in ways that plain HTTP requests cannot retrieve [11]. However, we eventually switched to our current library, Scrapy, because this part also involves web crawling, and Scrapy's fast, high-level web crawling framework makes our crawling tool much easier to implement [12]. Although Scrapy visits webpage URLs with plain HTTP requests and responses [13], our testing showed that few NHS trust websites actually require a web driver. Furthermore, our research shows that Selenium's web driver can be integrated into Scrapy [14], so if this becomes a problem in future development, we can quickly combine the two libraries.
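In Scrapy, crawling is configured rather than hand-written (for example through `CrawlSpider` rules with a `LinkExtractor`, the `allowed_domains` attribute, and the `DEPTH_LIMIT` setting). To show what that crawling logic does, the standard-library sketch below performs a breadth-first, same-domain crawl with a depth limit and duplicate filtering. It is not Scrapy code; `fetch` is injected so the sketch runs offline, and all names are hypothetical.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkParser(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_depth=2):
    """Breadth-first crawl that stays on the start URL's domain.

    `fetch(url)` returns the page HTML or None. Returns {url: html}
    for every page successfully fetched."""
    domain = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([(start_url, 0)])
    pages = {}
    while queue:
        url, depth = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        if depth == max_depth:
            continue  # similar in spirit to Scrapy's DEPTH_LIMIT
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments before deduplicating.
            link = urljoin(url, href).split("#")[0]
            if urlparse(link).netloc == domain and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return pages
```

Scrapy adds what this sketch leaves out, such as concurrent requests, retries, and politeness settings, which is a large part of why it made our crawling tool easier to build.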

For information filtering, we chose to use the IBM NLU API. The raw text we scrape contains many spaces, new lines, and tab characters, and this unstructured text makes it difficult for us to filter out the useful information. We therefore need a well-trained model to tokenize it into pieces. We tested two commonly used libraries, spaCy and NLTK, but neither could tokenize this kind of raw text in a way that suited our needs. We eventually found that IBM NLU could.
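A minimal sketch of the preparation step is shown below: it collapses the runs of whitespace left by scraping and builds a JSON body for an IBM NLU analyze call. The whitespace rules are our own choice, and requesting the `syntax` feature (tokens and sentences) is an assumption about how tokenization could be requested; the exact request fields should be checked against the NLU API reference.

```python
import json
import re


def clean_scraped_text(raw: str) -> str:
    """Collapse the spaces, tabs and newlines left by scraping.

    Blank-line-separated blocks become separate lines; whitespace
    inside a block collapses to single spaces; empty blocks are dropped."""
    blocks = re.split(r"\n\s*\n", raw)
    lines = [" ".join(block.split()) for block in blocks]
    return "\n".join(line for line in lines if line)


def build_nlu_syntax_request(text: str) -> str:
    """JSON body for an NLU analyze call asking for tokens and
    sentence boundaries (assumed `syntax` feature options)."""
    body = {
        "text": text,
        "features": {
            "syntax": {
                "tokens": {"lemma": True, "part_of_speech": True},
                "sentences": True,
            }
        },
    }
    return json.dumps(body)
```

Cleaning before sending also keeps each request shorter, which matters because NLU usage is billed per block of characters.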

IBM NLU's pricing is very reasonable: the first 30,000 NLU items per month are free. In our case, one NLU item covers tokenizing up to 10,000 characters. Most pages contain 1,000 to 2,000 characters, and we only need to extract information from about 5 to 10 pages, so 30,000 NLU items per month is comfortably enough for most small clinics. Users who exceed this limit pay only £0.0025 for each additional NLU item, which is still affordable [15].
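The estimate above can be checked with a few lines of arithmetic. This sketch assumes each page is sent as its own request with a single feature, so even a 2,000-character page counts as one item; the prices are the ones quoted above.

```python
import math

FREE_ITEMS = 30_000        # free NLU items per month [15]
PRICE_PER_ITEM = 0.0025    # GBP per item beyond the free allowance [15]
CHARS_PER_ITEM = 10_000    # characters covered by one item


def items_for_page(chars: int, features: int = 1) -> int:
    # Assumption: one request per page, billed per started 10,000
    # characters and per feature requested.
    return max(1, math.ceil(chars / CHARS_PER_ITEM)) * features


def monthly_cost(items_used: int) -> float:
    """Cost in GBP after the free allowance is used up."""
    return max(0, items_used - FREE_ITEMS) * PRICE_PER_ITEM


# Worst case from the text: 10 pages of 2,000 characters each.
per_chatbot = sum(items_for_page(2_000) for _ in range(10))
print(per_chatbot, "items per chatbot;",
      FREE_ITEMS // per_chatbot, "chatbots per month within the free tier")
```

Under these assumptions one chatbot needs at most 10 items, so the free allowance covers about 3,000 chatbot generations a month, and using 31,000 items would cost only £2.50.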


Web Framework and Database

For the NHS chatbot generation service, we chose React.js and Django as the frontend and backend frameworks and MongoDB as the database for storing chatbot generation history. React.js was selected because it lets us use Material UI components, which can be readily customised to create a clean, simple interface. Moreover, React's component model, written in JSX, lets us build reusable components, which speeds up development. Django was chosen because our scraping relies on Scrapy, a Python web scraping framework, so a Python backend can use it directly. The chatbot files created by our service are in JSON format, which maps naturally onto a document-oriented NoSQL database such as MongoDB; this is why we chose MongoDB. Together, these technologies let us build a scalable and reliable chatbot generation solution that satisfies NHS requirements.
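To illustrate why JSON chatbot files suit a document database, the sketch below models one generation-history entry whose generated chatbot is stored as a nested document, with no schema mapping needed. The record fields and class name are hypothetical; the comment about PyMongo refers to its standard `insert_one` method, which accepts a plain dict.

```python
import json
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone


@dataclass
class GenerationRecord:
    """One entry in a hypothetical chatbot generation history collection."""
    trust_name: str
    source_url: str
    chatbot: dict  # the generated chatbot JSON, stored as-is
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

    def to_document(self) -> dict:
        # A plain dict is what a MongoDB driver such as PyMongo accepts,
        # e.g. collection.insert_one(record.to_document()).
        return asdict(self)
```

Because the chatbot file is already a JSON object, it is stored and retrieved whole, so the history view can return exactly the file a trust downloaded.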

Technical Decisions

After researching a variety of technologies and methodologies and weighing the advantages and disadvantages of each, we reached the following final decisions about how NHS Auto-chatbot should be built.

Category                  Decision
Languages                 Python & JavaScript
Bot Framework             IBM Watson Assistant
WebApp Framework          Django & React
Database                  MongoDB
Web Scraping Framework    Scrapy
External APIs             Azure Cognitive Service for Language, Bing API, IBM NLU, Watson Assistant API
External Tool             Azure Functions

References

[1] “Great Ormond Street Hospital - Who We Are,” GOSH | About. [Online]. Available: https://www.gosh.nhs.uk/about-us/who-we-are/. [Accessed: 10-Feb-2023].

[2] “IBM Watson assistant - pricing,” IBM. [Online]. Available: https://www.ibm.com/products/watson-assistant/pricing. [Accessed: 13-Feb-2023].

[3] “Pricing - health bot: Microsoft Azure,” Pricing - Health Bot | Microsoft Azure. [Online]. Available: https://azure.microsoft.com/en-us/pricing/details/bot-services/health-bot/. [Accessed: 13-Feb-2023].

[4] “App Service Pricing: Microsoft Azure,” App Service Pricing | Microsoft Azure. [Online]. Available: https://azure.microsoft.com/en-gb/pricing/details/app-service/windows/. [Accessed: 13-Feb-2023].

[5] “Deploying your assistant | IBM cloud docs,” IBM Cloud Docs. [Online]. Available: https://cloud.ibm.com/docs/watson-assistant?topic=watson-assistant-deploy-assistant. [Accessed: 15-Feb-2023].

[6] “Embedding covid-19 healthcare bot into a web page,” YouTube, 07-Apr-2020. [Online]. Available: https://www.youtube.com/watch?v=NSWgYx1byYo. [Accessed: 15-Feb-2023].

[7] “IBM Watson Discovery - Pricing: IBM Cloud,” IBM Watson Discovery Pricing | IBM Cloud, 15-Oct-2017. [Online]. Available: https://www.ibm.com/cloud/watson-discovery/pricing/. [Accessed: 20-Feb-2023].

[8] “Pricing - search: Microsoft Azure,” Pricing - Search | Microsoft Azure. [Online]. Available: https://azure.microsoft.com/en-us/pricing/details/search/. [Accessed: 20-Feb-2023].

[9] “Pricing - language service: Microsoft Azure,” Pricing - Language Service | Microsoft Azure. [Online]. Available: https://azure.microsoft.com/en-us/pricing/details/cognitive-services/language-service/. [Accessed: 20-Feb-2023].

[10] “Bing Search API Pricing: Microsoft Bing,” Bing APIs. [Online]. Available: https://www.microsoft.com/en-us/bing/apis/pricing. [Accessed: 20-Feb-2023].

[11] I. Ramen, “Scrape content from dynamic websites,” GeeksforGeeks, 05-Sep-2020. [Online]. Available: https://www.geeksforgeeks.org/scrape-content-from-dynamic-websites/. [Accessed: 10-Jan-2023].

[12] “Scrapy 2.8 documentation,” Scrapy 2.8 documentation - Scrapy 2.8.0 documentation, 02-Feb-2023. [Online]. Available: https://docs.scrapy.org/en/latest/. [Accessed: 10-Jan-2023].

[13] “Requests and responses,” Requests and Responses - Scrapy 2.8.0 documentation, 02-Feb-2023. [Online]. Available: https://docs.scrapy.org/en/latest/topics/request-response.html. [Accessed: 15-Jan-2023].

[14] “The Scrapy Selenium Guide: ScrapeOps,” ScrapeOps RSS, 10-Feb-2022. [Online]. Available: https://scrapeops.io/python-scrapy-playbook/scrapy-selenium/. [Accessed: 17-Feb-2023].

[15] “Natural language understanding - IBM cloud,” IBM Cloud. [Online]. Available: https://cloud.ibm.com/catalog/services/natural-language-understanding. [Accessed: 17-Feb-2023].