Project Name
HeartBot - A Key Statistics Chatbot for The BHF
Main Features
Faster answers to common questions by looking up FAQ data; Data Retrieval, which filters the BHF's compendium based on the query and outputs the appropriate table of results; and integration into any website, since HeartBot is a web app.
What could we learn from this application?
HeartBot focuses on ease of use and intuitiveness: even users with limited technical experience can use it without difficulty. Because it is a web application, it can also be integrated into any website. A further feature that sets the chatbot apart is Data Retrieval, which works alongside the FAQs feature. Both features are built on Python libraries such as FuzzyWuzzy, NLTK, and pandas. For Data Retrieval, the team used the Levenshtein distance to decide whether two strings are similar enough. This distance is the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other. This technology stack makes the chatbot genuinely useful, especially for BHF website users.
Website Link: http://students.cs.ucl.ac.uk/2021/group20/index.html
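The Levenshtein-based matching described above can be sketched as follows. This is not HeartBot's actual code. It is a minimal, dependency-free illustration that assumes a simple normalisation (1 − distance / length of the longer string). FuzzyWuzzy's own `fuzz.ratio` scores strings differently. The 0.6 threshold and the FAQ strings are hypothetical.

```python
from typing import Iterable, Optional


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions turning a into b."""
    # prev[j] is the distance between the first i-1 characters of a and the first j characters of b
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into an empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca with cb (free if equal)
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Edit distance normalised to [0, 1]; 1.0 means identical after lower-casing."""
    a, b = a.lower().strip(), b.lower().strip()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1 - levenshtein(a, b) / longest


def best_match(query: str, candidates: Iterable[str], threshold: float = 0.6) -> Optional[str]:
    """Return the most similar candidate, or None if nothing reaches the (hypothetical) threshold."""
    scored = [(similarity(query, c), c) for c in candidates]
    if not scored:
        return None
    score, candidate = max(scored, key=lambda pair: pair[0])
    return candidate if score >= threshold else None


if __name__ == "__main__":
    # Hypothetical FAQ entries, for illustration only
    faqs = ["how many people have heart disease", "what causes a heart attack", "how do i donate"]
    print(best_match("How many people have heart diseases", faqs))  # how many people have heart disease
```

The threshold trades recall for precision. A lower value tolerates more typos but risks answering with an unrelated FAQ.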
General Solution
After analysing the three options for developing a chatbot generation service for the NHS,
we opted for the third option since it has significant benefits over the other two.
Given the short development period of four months, the first approach, creating a chatbot framework from scratch, was discarded. Four months was not enough time to build both a chatbot framework and a chatbot generation service from scratch. Furthermore, designing, testing, and maintaining a bespoke chatbot framework requires a substantial amount of time and effort.
The second option was to use an existing chatbot framework from IBM Watson or Microsoft Azure. Users would then generate and manage a bot directly in our service, without creating an account with an external cloud service. This option was also rejected. It would have required us to build an account management system and a subscription system so that NHS trusts could register and pay a subscription fee. Securing those accounts and the associated payments would have demanded a substantial investment of time and resources.
The third and preferred option was to use an existing chatbot framework and let users build a chatbot file that they publish to those cloud services with their own accounts. We chose this option because it let us concentrate on developing a simple service that delivers dependable chatbots at minimal cost. By building on an existing framework from IBM Watson or Microsoft Azure, we benefit from those vendors' expertise and development resources. Furthermore, because users upload their chatbot files to their own cloud accounts, we do not need to build account management or subscription systems at all. This makes it an efficient and effective way to provide a chatbot generation service for NHS trust websites.
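To make "construct a chatbot file" concrete, here is a minimal sketch of turning FAQ data into a JSON chatbot file. It assumes the Watson Assistant v1 dialog skill (workspace) format as we understand it, with `intents`, `examples`, and `dialog_nodes` whose `conditions` reference `#intent` names. A real exported skill contains more fields, so compare the output against an export before relying on it. The input shape, node IDs, and fallback text are hypothetical, and this is not our service's actual generator.

```python
import json
from typing import Dict, List, Tuple

# Hypothetical input shape: intent name -> (example user questions, answer text)
FaqSpec = Dict[str, Tuple[List[str], str]]


def _text_output(text: str) -> dict:
    """A single plain-text response in the dialog node 'generic' output format."""
    return {"generic": [{"response_type": "text", "values": [{"text": text}]}]}


def build_workspace(bot_name: str, faqs: FaqSpec, language: str = "en") -> dict:
    """Build a minimal Watson Assistant v1 workspace (dialog skill) dict from FAQ data."""
    intents: List[dict] = []
    nodes: List[dict] = []
    for index, (intent, (examples, answer)) in enumerate(faqs.items(), start=1):
        intents.append({"intent": intent, "examples": [{"text": q} for q in examples]})
        node = {
            "dialog_node": f"node_{index}",
            "type": "standard",
            "conditions": f"#{intent}",  # fires when this intent is recognised
            "output": _text_output(answer),
        }
        if nodes:
            node["previous_sibling"] = nodes[-1]["dialog_node"]  # keeps nodes in order
        nodes.append(node)
    # Fallback node, placed last so it only fires when no intent matched
    fallback = {
        "dialog_node": "anything_else",
        "type": "standard",
        "conditions": "anything_else",
        "output": _text_output("Sorry, I don't have an answer to that yet."),
    }
    if nodes:
        fallback["previous_sibling"] = nodes[-1]["dialog_node"]
    nodes.append(fallback)
    return {"name": bot_name, "language": language, "intents": intents,
            "entities": [], "dialog_nodes": nodes}


def write_workspace(workspace: dict, path: str) -> None:
    """Save the chatbot file that the user uploads to their own Watson Assistant instance."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(workspace, f, indent=2)
```

Because the output is a plain JSON file, our service never needs credentials for the user's cloud account. The user imports the file into their own instance.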
Bot Framework
For the bot framework, we compared IBM Watson Assistant and Microsoft Azure Health Bot. After researching and trialling both, we found that Watson Assistant was the better fit for our NHS Chatbot Generation Service.
One major advantage is price. Great Ormond Street Hospital (GOSH) is a relatively large hospital that receives 280,000 patients annually [1]. For a hospital of this size, we estimate roughly 1,200 monthly active users asking about 4,000 questions per month. In this case, hosting a Watson Assistant (Plus tier) chatbot on the hospital's website costs £168 per month [2]. The Azure Health Bot Standard plan, by contrast, would cost £419 per month for the chatbot itself [3]. Hosting it would need an Azure App Service Premium v2 plan at a further £151 per month [4], bringing the total to £570 per month.
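The comparison above can be checked with a few lines of arithmetic. This sketch uses only the monthly figures quoted in this section and ignores any other usage-based charges either platform might add.

```python
# Monthly prices quoted above in GBP (pricing pages accessed Feb 2023), for the GOSH-sized estimate
WATSON_ASSISTANT_PLUS = 168
AZURE_HEALTH_BOT_STANDARD = 419
AZURE_APP_SERVICE_PREMIUM_V2 = 151


def monthly_costs() -> dict:
    """Total monthly cost of each option; Azure needs both the bot and an App Service plan."""
    return {
        "watson": WATSON_ASSISTANT_PLUS,
        "azure": AZURE_HEALTH_BOT_STANDARD + AZURE_APP_SERVICE_PREMIUM_V2,
    }


costs = monthly_costs()
monthly_saving = costs["azure"] - costs["watson"]
print(costs)                                        # {'watson': 168, 'azure': 570}
print(monthly_saving, monthly_saving * 12)          # 402 4824
print(round(costs["azure"] / costs["watson"], 2))   # 3.39
```

At these prices, the Azure option costs about 3.4 times as much per month.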
Another advantage is that IBM Watson Assistant is easy to monitor and maintain, even for non-technical staff, thanks to its user-friendly interface and robust monitoring tools. The Azure Health Bot, on the other hand, has a much more complex interface that is harder to learn, especially for someone with no experience of Azure services.
Finally, IBM Watson Assistant is easy to integrate into a website, with a wide range of plugins and integrations available to make the process quick and seamless. To embed a fully working Watson Assistant chatbot in a website, one only needs to copy and paste a pre-generated code snippet from the Watson Assistant management portal [5]. The Azure Health Bot involves several more steps [6]:
1. Deploy the bot on an Azure App Service.
2. Manually modify some code on the App Service server.
3. Embed the chatbot in the website with an iframe.
4. Write the logic for opening and closing the chat window.
Overall, Azure Health Bot is also a strong chatbot solution. However, IBM Watson Assistant is the better choice for a chatbot generation service aimed at NHS trusts that want a powerful, affordable, and easy-to-use chatbot.
Real-time Information Retrieval Tool
We chose the Bing API and Azure Cognitive Service for Language as the real-time information retrieval tool for Watson Assistant. We preferred them to Watson Discovery or Azure Cognitive Search because of cost. Watson Discovery and Azure Cognitive Search cost £500 [7] and £410 [8] per month respectively, while the Bing API and Azure Language Service together cost only £17 per month [9], [10].
Additionally, the Bing API and Azure Cognitive Service for Language search for and retrieve information online in real time during the conversation. Information therefore does not all have to be collected while the chatbot is being built. With these two services, we can considerably reduce costs while still offering a dependable and effective real-time information retrieval solution for the Watson Assistant chatbots we generate.
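Here is a minimal sketch of the search half of this pipeline, using the Bing Web Search v7 REST endpoint with only the Python standard library. The `site:` restriction to a single trust's domain, the result count, and the helper names are our assumptions. This section does not say which Azure Language feature processes the results, so the step that passes these snippets to Azure Cognitive Service for Language is not shown.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List, Optional

BING_ENDPOINT = "https://api.bing.microsoft.com/v7.0/search"


def build_search_url(question: str, site: Optional[str] = None, count: int = 5) -> str:
    """Build a Bing Web Search v7 URL; the optional site: filter is our assumption."""
    query = f"site:{site} {question}" if site else question
    params = {"q": query, "count": count, "responseFilter": "Webpages"}
    return f"{BING_ENDPOINT}?{urllib.parse.urlencode(params)}"


def extract_snippets(response: dict) -> List[Dict[str, str]]:
    """Keep only the title, URL and snippet of each web page result."""
    pages = response.get("webPages", {}).get("value", [])
    return [{"title": p.get("name", ""), "url": p.get("url", ""), "snippet": p.get("snippet", "")}
            for p in pages]


def search(question: str, api_key: str, site: Optional[str] = None,
           count: int = 5) -> List[Dict[str, str]]:
    """Run a live search; the key goes in the Ocp-Apim-Subscription-Key header."""
    request = urllib.request.Request(
        build_search_url(question, site, count),
        headers={"Ocp-Apim-Subscription-Key": api_key},
    )
    with urllib.request.urlopen(request, timeout=10) as resp:
        return extract_snippets(json.load(resp))
```

The URL building and response parsing are kept separate from the network call so they can be tested without an API key.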
Web Scraping and Information Filtering
For this part, we first explored Selenium, a library whose web driver simulates a real web browser visiting the target website instead of requesting pages over HTTP in code. This prevents our program from being blocked when a site requires cookies or builds content dynamically, since such content cannot be retrieved with plain HTTP requests [11]. However, we eventually switched to our current library, Scrapy, because this part also requires web crawling. Scrapy's fast, high-level web crawling framework makes our crawling tool much easier to implement [12]. Scrapy visits pages with plain HTTP requests and responses [13], but our testing found that few NHS trust websites actually need a web driver. Furthermore, our research shows that Selenium's web driver can be integrated into Scrapy [14]. If dynamic content becomes a problem in future development, we can therefore combine the two libraries quickly.
For information filtering, we chose the IBM Natural Language Understanding (NLU) API. The raw scraped text contains many spaces, new lines, and tab characters, and this kind of unstructured text makes it difficult to filter out useful information. We therefore needed a well-trained model to tokenize it into meaningful pieces. We tested two commonly used libraries, spaCy and NLTK, but neither could tokenize this kind of raw text satisfactorily. IBM NLU, in the end, did what we needed.
IBM NLU is also reasonably priced: the first 30,000 NLU items per month are free. In our case, one NLU item covers up to 10,000 characters of text. Most pages contain 1,000 to 2,000 characters, so each page fits within a single item. Generating a chatbot needs information extracted from only about 5 to 10 pages, so 30,000 NLU items per month are more than enough for most small clinics. Beyond this limit, each extra NLU item costs only £0.0025, which is still affordable [15].
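The estimate above can be worked through in code. This sketch assumes that each page is sent as its own request, rounded up to whole 10,000-character units. It also assumes that, as we understand IBM's pricing, each enrichment feature applied is counted separately (the `features` parameter). The page lengths below are the illustrative figures from the text, not measurements.

```python
import math
from typing import Iterable

CHARS_PER_ITEM = 10_000          # one NLU item covers up to 10,000 characters
FREE_ITEMS_PER_MONTH = 30_000    # free monthly allowance quoted above
PRICE_PER_EXTRA_ITEM_GBP = 0.0025


def items_for_pages(page_lengths: Iterable[int], features: int = 1) -> int:
    """NLU items used, assuming one request per page and each feature billed separately."""
    units = sum(math.ceil(n / CHARS_PER_ITEM) for n in page_lengths)
    return units * features


def monthly_overage_cost(total_items: int) -> float:
    """Cost in GBP of the items used beyond the free monthly allowance."""
    extra = max(0, total_items - FREE_ITEMS_PER_MONTH)
    return extra * PRICE_PER_EXTRA_ITEM_GBP


# Worst case from the text: 10 pages of 2,000 characters each, one feature
per_bot = items_for_pages([2_000] * 10)
print(per_bot)                          # 10
print(FREE_ITEMS_PER_MONTH // per_bot)  # 3000 chatbot generations fit in the free allowance
```

Under these assumptions, even the worst case of 10 items per chatbot leaves room for about 3,000 generations per month before any charge applies.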
Web Framework and Database
For the NHS chatbot generation service, we chose React.js and Django as the frontend and backend frameworks, and MongoDB as the database for storing chatbot generation history. React.js was selected because it lets us use Material UI components, which can be readily customised to create a clean and simple interface. Moreover, React.js's use of JSX makes it easy to build reusable components, which speeds up development. Django was chosen because our web scraping tool, Scrapy, is a Python library, so a Python backend can call it directly. The chatbot files created by our service are in JSON format. MongoDB, a document-oriented NoSQL database, stores JSON-like documents natively, which is why we chose it. Together, these technologies let us build a scalable and reliable chatbot generation service that meets NHS requirements.
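To illustrate why a document database suits JSON chatbot files, here is a minimal sketch of storing one generation-history record with PyMongo. The generated chatbot JSON is embedded as-is, with no table schema to map it onto. The field names, database name, collection name, and connection URI are hypothetical. PyMongo is imported inside `save_history` so the rest of the sketch runs without it installed.

```python
from datetime import datetime, timezone
from typing import Any, Dict


def build_history_record(user_id: str, source_url: str, chatbot_json: Dict[str, Any]) -> Dict[str, Any]:
    """One generation-history document; the chatbot file is nested unchanged."""
    return {
        "user_id": user_id,
        "source_url": source_url,  # the trust website that was scraped
        "created_at": datetime.now(timezone.utc),
        "chatbot": chatbot_json,   # the generated JSON chatbot file
    }


def save_history(record: Dict[str, Any],
                 mongo_uri: str = "mongodb://localhost:27017",
                 db_name: str = "nhs_autochatbot") -> Any:
    """Insert the record and return its generated _id (requires a running MongoDB)."""
    from pymongo import MongoClient  # imported lazily so the sketch runs without pymongo

    client = MongoClient(mongo_uri)
    try:
        result = client[db_name]["generation_history"].insert_one(record)
        return result.inserted_id
    finally:
        client.close()
```

A query such as `find({"user_id": ...})` on the same collection would then return a user's past chatbots, with each file ready to download again.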
After researching a variety of technologies and methodologies and weighing the advantages and disadvantages of each, we reached the following decisions on how NHS Auto-chatbot should be built:
Tech | Decision |
---|---|
Languages | Python & JavaScript |
Bot Framework | IBM Watson Assistant |
WebApp Framework | Django & React |
Database | MongoDB |
Web Scraping Framework | Scrapy |
External API | Azure Cognitive Service for Language, Bing API, IBM NLU, Watson Assistant API |
External Tool | Azure Functions |
[1] “Great Ormond Street Hospital - Who We Are,” GOSH | About. [Online]. Available: https://www.gosh.nhs.uk/about-us/who-we-are/. [Accessed: 10-Feb-2023].
[2] “IBM Watson assistant - pricing,” IBM. [Online]. Available: https://www.ibm.com/products/watson-assistant/pricing. [Accessed: 13-Feb-2023].
[3] “Pricing - health bot: Microsoft Azure,” Pricing - Health Bot | Microsoft Azure. [Online]. Available: https://azure.microsoft.com/en-us/pricing/details/bot-services/health-bot/. [Accessed: 13-Feb-2023].
[4] “App Service Pricing: Microsoft Azure,” App Service Pricing | Microsoft Azure. [Online]. Available: https://azure.microsoft.com/en-gb/pricing/details/app-service/windows/. [Accessed: 13-Feb-2023].
[5] “Deploying your assistant | IBM cloud docs,” IBM Cloud Docs. [Online]. Available: https://cloud.ibm.com/docs/watson-assistant?topic=watson-assistant-deploy-assistant. [Accessed: 15-Feb-2023].
[6] “Embedding covid-19 healthcare bot into a web page,” YouTube, 07-Apr-2020. [Online]. Available: https://www.youtube.com/watch?v=NSWgYx1byYo. [Accessed: 15-Feb-2023].
[7] “IBM Watson Discovery - Pricing: IBM Cloud,” IBM Watson Discovery Pricing | IBM Cloud, 15-Oct-2017. [Online]. Available: https://www.ibm.com/cloud/watson-discovery/pricing/. [Accessed: 20-Feb-2023].
[8] “Pricing - search: Microsoft Azure,” Pricing - Search | Microsoft Azure. [Online]. Available: https://azure.microsoft.com/en-us/pricing/details/search/. [Accessed: 20-Feb-2023].
[9] “Pricing - language service: Microsoft Azure,” Pricing - Language Service | Microsoft Azure. [Online]. Available: https://azure.microsoft.com/en-us/pricing/details/cognitive-services/language-service/. [Accessed: 20-Feb-2023].
[10] “Bing Search API Pricing: Microsoft Bing,” Bing APIs. [Online]. Available: https://www.microsoft.com/en-us/bing/apis/pricing. [Accessed: 20-Feb-2023].
[11] I. Ramen, “Scrape content from dynamic websites,” GeeksforGeeks, 05-Sep-2020. [Online]. Available: https://www.geeksforgeeks.org/scrape-content-from-dynamic-websites/. [Accessed: 10-Jan-2023].
[12] “Scrapy 2.8 documentation,” Scrapy 2.8 documentation - Scrapy 2.8.0 documentation, 02-Feb-2023. [Online]. Available: https://docs.scrapy.org/en/latest/. [Accessed: 10-Jan-2023].
[13] “Requests and responses,” Requests and Responses - Scrapy 2.8.0 documentation, 02-Feb-2023. [Online]. Available: https://docs.scrapy.org/en/latest/topics/request-response.html. [Accessed: 15-Jan-2023].
[14] “The Scrapy Selenium Guide: ScrapeOps,” ScrapeOps RSS, 10-Feb-2022. [Online]. Available: https://scrapeops.io/python-scrapy-playbook/scrapy-selenium/. [Accessed: 17-Feb-2023].
[15] “Natural language understanding - IBM cloud,” IBM Cloud. [Online]. Available: https://cloud.ibm.com/catalog/services/natural-language-understanding. [Accessed: 17-Feb-2023].