The NHS Chatbot Generation Service has two main parts: the chatbot generation service
itself and the Watson Assistant chatbot it generates. The generation service uses React.js
for the frontend, so users can access it from a browser, and Django for the backend. Django
handles HTTP requests from the React.js frontend, scrapes the text content of an NHS
trust's website with Scrapy, and uses information retrieval tools together with regular
expression (regex) matching to filter the target information that needs to be stored in
the chatbot cache. Once the chatbot JSON file has been produced, Django saves it to the
MongoDB database as part of the generation history.
Users can upload the chatbot file generated by our service to IBM Watson Assistant to get
a customised chatbot for their trust's website. This chatbot is built with a webhook that
combines the Bing API and Azure Cognitive Service for Language. During a real-time
conversation, the webhook handles questions that cannot be answered from the cached
information. It first uses the Bing API to run an instant web search over the trust's
domain. It then applies Azure Language Service to the top search results from Bing and
summarises them into an answer.
In addition to these two main components, Azure Functions are a key part of the NHS Chatbot
Generation Service because they exchange information between the various services we use.
For example, a set of Azure Functions connects the frontend to a Watson Assistant instance
for previewing. Moreover, every Watson Assistant chatbot we generate is connected to the
instant information-retrieving webhook by an Azure Function.
Overview
Chatbot Generation Service
Chatbot Framework
The chatbot generated by our service answers predefined questions from cached information and general questions using the Bing API and Azure Cognitive Service for Language. For a general question, the chatbot evaluates the search results from the Bing API. If the confidence score is high, it replies with an answer that Azure Language Service summarises from the top 5 search results. If the confidence score is moderate, it replies with the top search result and a source link. If the confidence score is low, it gives a standard response telling the user that it cannot answer the question.
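The three confidence tiers described above can be sketched as a small selection function. This is a minimal illustration, not the service's actual webhook code: the threshold values, the `SearchResult` type, the `choose_reply` name, the fallback wording and the `summarise` callable (a stand-in for the Azure Language Service call) are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical thresholds; the actual cut-offs are not stated in this document.
HIGH_CONFIDENCE = 0.7
MODERATE_CONFIDENCE = 0.4

# Hypothetical wording for the standard "cannot answer" response.
FALLBACK = "Sorry, I cannot answer that question."


@dataclass
class SearchResult:
    """One search hit: a text snippet and the page it came from."""
    snippet: str
    url: str


def choose_reply(
    confidence: float,
    results: List[SearchResult],
    summarise: Callable[[List[str]], str],
) -> str:
    """Pick a reply according to the confidence tier."""
    # Low confidence (or nothing found): give the standard response.
    if not results or confidence < MODERATE_CONFIDENCE:
        return FALLBACK
    # High confidence: summarise the top 5 results.
    if confidence >= HIGH_CONFIDENCE:
        return summarise([r.snippet for r in results[:5]])
    # Moderate confidence: return the top result with its source link.
    top = results[0]
    return f"{top.snippet} (source: {top.url})"
```

Passing the summariser in as a parameter keeps the tier logic testable without calling any external service.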
If the user's question is recognised as a predefined question, the chatbot answers it from
our web scraping results, as explained in the Dialogue Flow diagram. For this part, we
created an information-retrieving process that fetches all the information we need from
the given NHS trust's website. For specific information such as phone numbers, opening
times and addresses, the process starts by taking the domain from the frontend and crawling
all the URLs under it. Next, we get content from the domain. This step has two
sub-processes: filtering URLs by their titles, and scraping the pages at the filtered URLs.
The following step, Data Extraction, also has two sub-processes: tokenisation by IBM NLU,
which splits the unstructured webpage data into pieces, and regex matching, which picks out
the key information we need.
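As an illustration of the regex matching step, the sketch below pulls a phone number, opening hours and a postcode out of a piece of scraped text. The patterns, the function name `extract_contact_details` and the sample text are assumptions made for this example. They are deliberately simple, they use a UK postcode as a stand-in for a full address, and they are not the patterns used by the service.

```python
import re
from typing import Dict, Optional

# Illustrative patterns only; real pages need broader and better-tested ones.
PHONE_RE = re.compile(r"(?:\+44\s?|0)\d{2,4}[\s-]?\d{3,4}[\s-]?\d{3,4}")
HOURS_RE = re.compile(
    r"\b\d{1,2}(?::\d{2})?\s?(?:am|pm)\s?(?:-|to)\s?\d{1,2}(?::\d{2})?\s?(?:am|pm)\b",
    re.IGNORECASE,
)
POSTCODE_RE = re.compile(r"\b[A-Z]{1,2}\d[A-Z\d]?\s?\d[A-Z]{2}\b")


def extract_contact_details(text: str) -> Dict[str, Optional[str]]:
    """Return the first match for each field, or None if nothing matches."""
    def first(pattern: "re.Pattern[str]") -> Optional[str]:
        match = pattern.search(text)
        return match.group(0) if match else None

    return {
        "phone": first(PHONE_RE),
        "opening_hours": first(HOURS_RE),
        "postcode": first(POSTCODE_RE),
    }


# Made-up sample text standing in for one tokenised piece of a scraped page.
sample = (
    "Call us on 020 7946 0000. Open Monday to Friday, 9am to 5pm. "
    "Example Hospital, 1 Example Road, London NW1 2BU."
)
details = extract_contact_details(sample)
```

Fields that return None can then be left out of the cache, so those questions fall through to the general-question path.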
Finally, we combine everything into our DialogJSON. We also cache some general answers
locally with the aid of our Azure Functions to avoid further Bing API costs. Caching also
helps keep our answers consistent.
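The idea of caching general answers can be sketched as a small in-memory cache keyed by a normalised form of the question. This is only an illustration under assumptions: the `AnswerCache` class, its normalisation rule and the `fetch_answer` callable (a stand-in for the search-and-summarise path) are hypothetical, and the real service keeps its cache with the aid of Azure Functions rather than in process memory.

```python
from typing import Callable, Dict


class AnswerCache:
    """Cache answers so repeated questions do not trigger a new search."""

    def __init__(self, fetch_answer: Callable[[str], str]) -> None:
        # fetch_answer is called only when a question is not cached yet.
        self._fetch_answer = fetch_answer
        self._answers: Dict[str, str] = {}

    @staticmethod
    def _key(question: str) -> str:
        # Lower-case and collapse whitespace so trivial variants share an entry.
        return " ".join(question.lower().split())

    def answer(self, question: str) -> str:
        key = self._key(question)
        if key not in self._answers:
            self._answers[key] = self._fetch_answer(question)
        return self._answers[key]
```

Because a cached question always returns the stored answer, repeated questions cost no extra search calls and receive the same reply each time.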