Implementation of key features

Following on from the system overview given on the system design page, this page describes the implementation of key features in greater depth.

Voice assistant

Full code documentation for the Ask Bob voice assistant framework has been included within this report and is available by clicking the “code documentation” link on the sidebar.

Speech module

Audio listening

The role of the audio listeners (“utterance services”) is to collect together audio frames comprising complete spoken utterances.

There is a base class askbob.speech.listener.listener.UtteranceService containing logic shared across all utterance services, most importantly the code used to yield audio frames from the audio frame buffer queue that are deemed to be speech by the WebRTC voice activity detector; concrete utterance service implementations extend this base class. The base class also resamples audio frames should they be recorded at a frequency other than the 16kHz used by Mozilla DeepSpeech, and applies a bandpass filter to the audio signal to taper off frequencies below 65Hz and above 4000Hz. For this, we used the 4th-order Butterworth bandpass filter included within scipy with a Nyquist frequency of half the sample rate (i.e. 8kHz).
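
As an illustration of this filtering step, the snippet below shows how such a filter can be constructed with scipy; the helper name bandpass_filter is our own for this sketch rather than part of the Ask Bob codebase.

import numpy as np
from scipy import signal


def bandpass_filter(frame: np.ndarray, rate: int = 16000) -> np.ndarray:
    """Taper off frequencies below 65Hz and above 4000Hz using a 4th-order
    Butterworth bandpass filter."""
    nyquist = rate / 2  # 8kHz for the 16kHz sample rate expected by DeepSpeech
    b, a = signal.butter(4, [65 / nyquist, 4000 / nyquist], btype="bandpass")
    return signal.lfilter(b, a, frame)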

The askbob.speech.listener.file.FileUtteranceService takes a single-channel RIFF WAV file – preferably recorded at a 16kHz sample rate so as to avoid resampling artifacts – in the 16-bit signed integer format and reads its audio frames in chunks of 320 into the audio frame buffer queue used by the base class.
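
For illustration, reading such a file into the buffer queue might look roughly like the sketch below; the function name and queue handling are simplified assumptions rather than the exact implementation.

import queue
import wave


def enqueue_wav_frames(path: str, frames: queue.Queue) -> None:
    """Read a 16-bit, single-channel RIFF WAV file into the audio frame
    buffer queue in chunks of 320 frames."""
    with wave.open(path, "rb") as wav:
        chunk = wav.readframes(320)
        while chunk:
            frames.put(chunk)
            chunk = wav.readframes(320)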

The askbob.speech.listener.mic.MicUtteranceService uses pyaudio to open an input stream to the configured microphone input device that reads audio frames into the aforementioned buffer queue. This stream is destroyed when the voice assistant application is torn down.
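
A simplified sketch of how such a stream can be opened with pyaudio is shown below; the callback-based approach and helper name are illustrative assumptions, and device selection and error handling are omitted.

import queue

import pyaudio


def open_microphone_stream(frames: queue.Queue):
    """Open a 16kHz, 16-bit, mono input stream whose callback pushes
    audio frames into the shared buffer queue."""
    def callback(in_data, frame_count, time_info, status):
        frames.put(in_data)
        return None, pyaudio.paContinue

    return pyaudio.PyAudio().open(format=pyaudio.paInt16,
                                  channels=1,
                                  rate=16000,
                                  input=True,
                                  frames_per_buffer=320,
                                  stream_callback=callback)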

Speech transcription

The speech transcriber (askbob.speech.transcriber.Transcriber) processes speech audio frames yielded by a concrete utterance service implementation (either for file or microphone listening) and yields any determinable speech that could be transcribed as text, as well as either a START_UTTERANCE event or an END_UTTERANCE event encoded within a TranscriptionEvent enum to provide additional information about the transcription state. To transcribe the speech audio, we used Mozilla DeepSpeech in conjunction with their English pre-trained model and external scorer.
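
In essence, transcribing one utterance with the DeepSpeech streaming API looks like the following sketch; the model and scorer file names are placeholders, and the frames argument stands for the speech audio frames yielded by an utterance service.

import deepspeech
import numpy as np

# Placeholder paths for the pre-trained English model and external scorer
model = deepspeech.Model("deepspeech-0.9.3-models.pbmm")
model.enableExternalScorer("deepspeech-0.9.3-models.scorer")


def transcribe_utterance(frames) -> str:
    """Feed the speech audio frames of one utterance into a DeepSpeech
    stream and return the transcribed text."""
    stream = model.createStream()
    for frame in frames:  # frames yielded by an utterance service
        stream.feedAudioContent(np.frombuffer(frame, dtype=np.int16))
    return stream.finishStream()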

For debugging purposes, the speech transcriber also has the ability to output recognised speech as WAV files. This could be useful for users wishing to test other DeepSpeech models in conjunction with Ask Bob.

Speech synthesis

The text-to-speech service class (askbob.speech.synthesiser.TextToSpeechService) within the speech synthesis submodule uses pyttsx3 to vocalise text passed to its say(text: str) method using a voice configured by the user in the config.ini file found at the base of an Ask Bob project folder (or alternatively, the default voice otherwise).
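
A minimal sketch of this behaviour using pyttsx3 is given below; the class name and voice handling are simplified assumptions rather than the exact TextToSpeechService implementation.

import pyttsx3


class SimpleSpeaker:
    """Simplified sketch of a text-to-speech wrapper around pyttsx3."""

    def __init__(self, voice_id: str = None):
        self.engine = pyttsx3.init()
        if voice_id:
            # Voice identifier taken from config.ini (placeholder here)
            self.engine.setProperty("voice", voice_id)

    def say(self, text: str) -> None:
        self.engine.say(text)
        self.engine.runAndWait()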

Query response using Rasa and SpaCy

There is a base askbob.action.responder.ResponseService class defining the interface for Ask Bob response services; it is extended by the askbob.action.responder.RasaResponseService class, which uses Rasa and SpaCy for intent classification and entity extraction, as described in our research. The RasaResponseService first loads the Rasa model trained by the user when running the Ask Bob setup tool (python -m askbob --setup), which trains a model from the plugins installed in a particular Ask Bob project folder together with the configured SpaCy model, and then starts accepting queries from Ask Bob’s voice or server interfaces. The Rasa custom action server is used to execute custom plugin action code, as detailed in the next section.
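
Conceptually, query handling reduces to loading the trained model and passing each query to it. The following is a simplified sketch using the Rasa Agent API rather than the exact RasaResponseService internals; the model path shown is a placeholder for wherever the setup tool saves the trained model.

from rasa.core.agent import Agent

# Placeholder path for the model produced by `python -m askbob --setup`
agent = Agent.load("models/model.tar.gz")


async def handle_query(message: str, sender: str = "askbob"):
    """Yield the trained Rasa model's responses to a single textual query."""
    for response in await agent.handle_text(message, sender_id=sender):
        yield response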

We are using the following Rasa natural language processing task pipeline:

More information on the available Rasa pipeline components is available at https://rasa.com/docs/rasa/components/. We selected the pipeline components that, in our opinion, best matched our use case.

Plugins

An Ask Bob plugin consists of a plugin folder containing, at the very minimum, a config.json file with plugin metadata and all of the data needed to train a Rasa model. Plugins may also contain Rasa custom action code, which allows Ask Bob plugins to run arbitrary Python code when certain spoken intents are triggered by the user (in either interactive or server mode). Plugins are installed by copying their plugin folders into the plugins folder of an Ask Bob project folder (which contains the plugins and configuration of a particular Ask Bob project).

Files containing such custom action code must be listed within a Python list inside the __init__.py file at the root of the plugin folder in the following way (e.g. for a file containing custom action code called actions.py):

__all__ = ["actions"]

Plugin action code must both extend the rasa_sdk.Action class and be decorated by @askbob.plugin.action, which is a Python class decorator that we use to properly name the plugin action according to Rasa naming conventions (with the plugin name appended to the end to prevent naming conflicts).
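
For illustration only, such a class decorator could be sketched roughly as follows; the real decorator lives in askbob.plugin and may differ in detail.

def action(plugin_name: str, action_name: str):
    """Illustrative sketch: rename a rasa_sdk.Action subclass so that its
    Rasa action name has the plugin name appended to prevent conflicts."""
    def decorator(cls):
        cls.name = lambda self: "{0}_{1}".format(action_name, plugin_name)
        return cls
    return decorator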

Continuing with the example, within actions.py, custom action code takes the following form:

from typing import Any, Text, Dict, List
from rasa_sdk import Action, Tracker
from rasa_sdk.executor import CollectingDispatcher

import askbob.plugin


@askbob.plugin.action("plugin_name", "action_name")
class ActionHelloWorld(Action):

    def run(self, dispatcher: CollectingDispatcher,
            tracker: Tracker,
            domain: Dict[Text, Any]) -> List[Dict[Text, Any]]:
        dispatcher.utter_message(text="Hello, world")

        return []

More detail on Rasa custom actions may be found in the Rasa SDK documentation.

On startup, the query responder launches the askbob.action.server module, which starts the Rasa custom action server. This registers all of the Rasa custom action handlers in preparation to invoke actions triggered by users’ queries.

Plugin example

One simple example is the creation of a time plugin that adds a single skill that tells the user the current time.

Within plugins/time/config.json, there would be the following plugin JSON metadata and training data:

{
    "plugin": "time",
    "intents": [
        {
            "intent_id": "ask_time",
            "examples": [
                "What time is it?",
                "What time is it right now?",
                "What time is it now?",
                "Tell me the time",
                "Tell me the time right now",
                "Tell me the time now"
            ]
        }
    ],
    "actions": [
        "fetch_time"
    ],
    "skills": [
        {
            "description": "give the system time",
            "intent": "ask_time",
            "actions": [
                "fetch_time"
            ]
        }
    ]
}

The plugins/time/__init__.py file would contain __all__ = ["actions"] and plugins/time/actions.py would contain the following Python code:

from typing import Any, Text, Dict, List
from rasa_sdk import Action, Tracker
from rasa_sdk.executor import CollectingDispatcher

import askbob.plugin


@askbob.plugin.action("time", "fetch_time")
class ActionFetchTime(Action):

    def run(self, dispatcher: CollectingDispatcher,
            tracker: Tracker,
            domain: Dict[Text, Any]) -> List[Dict[Text, Any]]:
        from datetime import datetime

        dispatcher.utter_message(text="The time is {0}.".format(
            datetime.now().strftime("%H:%M")))

        return []

This would give the following file structure inside plugins/time:
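
plugins/time
├── __init__.py
├── actions.py
└── config.json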

Further examples of plugins may be found in our repository within the examples/plugins folder.

Interfaces

Ask Bob, once installed, must be run from the root of an Ask Bob project folder containing at least the following:
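
- a config.ini configuration file; and
- a plugins folder containing the installed skills plugins.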

There may also be a Dockerfile, an additional README.md specific to that Ask Bob project (where the Ask Bob voice assistant framework is used with a particular set of skills plugins within a third-party developer's project), and potentially additional requirements related to the plugins used in that project.

We have included an example Ask Bob project folder in our main repository found at the following link: https://github.com/UCL-COMP0016-2020-Team-39/AskBob/tree/main/examples/demo

Another example of an Ask Bob project folder is the usage of Ask Bob by team 25 (concierge web services), found at the following link: https://github.com/UCLComputerScience/COMP0016_2020_21_Team25/tree/main/AskBob%20Concierge

Interactive mode

Ask Bob may be run in interactive mode (python -m askbob) from an Ask Bob project folder. In interactive mode, users can speak directly to the voice assistant and hear audible responses as part of a read-eval-print loop (REPL)-style interaction.

At its core, the interactive loop boils down to the following code snippet:

print("Listening (press Ctrl-C to exit).")
for state, text in transcriber.transcribe():
    if state == TranscriptionEvent.START_UTTERANCE:
        spinner.start()
    elif state == TranscriptionEvent.END_UTTERANCE:
        spinner.stop()

        if text:
            print("==", text)
            async for response in responder.handle(text):
                if "text" in response:
                    print("=>", response["text"])
                    speaker.say(response["text"])

A spinner is used to show whether the voice assistant has determined that someone may be speaking. When the transcriber yields an END_UTTERANCE event and the speech was discernible, the transcribed text is passed to the query responder, and any textual responses are then synthesised and played by the TextToSpeechService.

The interactive loop can handle responses yielded by the asynchronous handle generator of the ResponseService with the text type, as in the following example:

[
    {
        "text": "One joke, coming right up!"
    },
    {
        "text": "Without geometry life is pointless."
    }
]

Web server mode

We used the Sanic Python web microframework to create a web server interface for Ask Bob written using asynchronous, non-blocking code. The endpoints for the web server are documented both in our repository README and in our Ask Bob user manual.

When the server is running in “voiceless” mode, users can send requests to the Ask Bob server with a sender identifier and a message in text form to be interpreted. When the server is running in “voice-enabled” mode, users can additionally upload 16kHz single-channel RIFF WAV files to the Ask Bob server instead of a textual message. In this case, the audio is stored in a temporary file and then fed through the transcriber using the FileUtteranceService for transcription. The transcribed speech is then used just as in the “voiceless” flow and passed to the query responder to produce a response.
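
As a rough sketch of the “voiceless” flow using Sanic (the handler below is illustrative rather than the exact server code, responder is assumed to be an already-constructed ResponseService instance, and sender handling is omitted for brevity):

from sanic import Sanic, response

app = Sanic("askbob")


@app.route("/query", methods=["POST"])
async def query(request):
    """Interpret a textual message and return the assistant's responses."""
    message = request.form.get("message")

    # `responder` is assumed to be a ResponseService created at startup
    messages = [r async for r in responder.handle(message)]
    return response.json({"query": message, "messages": messages})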

As an example, a request to the /query endpoint with the sender as "askbob" and the message as "tell me a joke" could produce the following response from an Ask Bob server with the example puns plugin installed:

{
    "query": "tell me a joke",
    "messages": [
        {
            "text": "One joke, coming right up!"
        },
        {
            "text": "Without geometry life is pointless."
        }
    ]
}

Some plugins may emit responses with the custom type, allowing actions to emit custom JSON responses, which could be useful when integrating Ask Bob into the projects of third-party developers. This was the approach taken when developing the FISE Lounge and FISE Concierge integration plugins; for example, the FISE Lounge integration plugin produces the following response for the query “call John”:

{
    "query": "call John",
    "messages": [
        {
            "text": "Calling John."
        },
        {
            "custom": {
                "type": "call_user",
                "callee": "John"
            }
        }
    ]
}

Configuration generator web app

Using Formik in forms

Most of the form components are built using Formik, which helps handle form state, and Material-UI, which provides styled, responsive components and icons. The AddSkillForm.js, AddSlotForm.js, and WithForm.js files all use Formik and Material-UI.

The Formik component imported from Formik accepts an object representing the initial values of the form, submit and validate functions, and, as a child node, a function that returns the actual form. Changes to input fields are handled by Formik, and error messages are passed to the appropriate inputs that caused them.

The FormSelect.js and FormTextField.js files export components that use Formik and receive props (values passed down from a parent component) that include error messages and error state.

Using Formik has allowed us to reduce bugs, as the complex error-state handling code is delegated to Formik.

Using Material UI

Material-UI was used in most files that exported components. We used this library to create a user-friendly, responsive user interface that provided clear feedback with stylish hover and focus animations.

Sortable.js and the story form component

Sortable.js was used in the AddStoryForm.js file to add drag-and-drop features to the AddStoryForm component exported from that file. The ReactSortable component is imported from react-sortable. ReactSortable accepts the steps variable, which represents the steps of a story, and the functions used to change those steps. It also accepts as a child node the steps mapped out as a list of components. ReactSortable then allows users to reorder the steps of a story by dragging and dropping the input fields, which updates the steps variable.

Redux and state management

We used Redux to manage the state of the configuration file. We use our custom reducer factory to make the reducers that control the state; the reducerFactory.js file exports this reducer factory. In index.js in the reducers folder, the combineReducers function from redux is used to make the main reducer. This main reducer is a combination of reducers for intents, synonyms, regexes, lookups, responses, skills, stories, and slots, each made using the reducerFactory function.

The state of each of these reducers is an object with an items property, a mode property, and a currentItem property. The items property holds the current data for that part of the configuration file. For example, for the intents reducer, items is a list of objects that each contain a name and a list of examples; the items there represent a list of intents.

The currentItem property of the reducer's state represents the item currently being edited in the form. If a user is editing an intent, the name and examples of that intent are stored in the state of the intents reducer as the currentItem. The mode property of the reducer is either ADD_MODE_{name} or EDIT_MODE_{name}, where {name} is the name of the reducer (intents, responses, etc.). This value controls whether forms add or edit values.

In the store.js file, the Redux store is created and exported. In the src/index.js file, the Provider component from react-redux is wrapped around the App component to allow Redux, and the functions for getting and changing state, to be used. The Provider component takes in the store exported from store.js.

The result is that all components can use the useSelector Redux hook to get the current state and the useDispatch hook to dispatch actions and change the state.

The reducer factory makes reducers that can add items to the state, update items in the state, or delete items from the state. These reducers can also change the mode of the state and the current item. Lastly, the reducers can save the state to and load it from local storage.

These reducers handle actions dispatched to them. Here is an example of an action being dispatched to the store.

const action = {
  type: "ADD_INTENT",
  intent: { name: "greeting", examples: ["hi", "hello there"] },
};

dispatch(action);

The result is that the current state and this action are passed to the reducer, and the new state is returned. The src/actions folder contains files that export action creators, which are functions that return actions. Using action creators is mainly for code organisation; actions can also be passed directly to the dispatch function.

The combineReducers function passes the action to the correct reducer, which handles it depending on its type (or, if the action has an unrecognised type, simply returns the current state).

The result is that the state is controlled by a series of reducers made by the reducer factory that all handle similar actions.

Skills Viewer

Using Material UI

Material-UI was used in most files that exported components. We used this library to create a user-friendly, responsive user interface that provided clear feedback with stylish hover and focus animations.

The Accordion component from Material-UI is used to make the Category.js and Plugin.js components accordions. An Accordion is a piece of UI with a visible title that the user can click on to show more data. This provides a user-friendly way to see all user skills.

The Typography component from Material-UI is also used to style the Selector.js component.

Using Query String and React Router DOM

React-Router-DOM is used to get the search query from the current URL. The query-string library is used to parse the search query before the value of the ‘URL’ search term is entered into the search input on the main page.

As a result, the skills viewer is easy to use, as it is built from responsive, well-designed components (thanks to Material-UI and custom CSS). It can also easily be embedded as an iframe: if there is a ‘URL’ search term, the app will automatically enter that value into the search input and then fetch and display the skills from that URL.

Reducers for the configuration generator web app

Model of Reducers

For the configuration generator web app, we decided to use a reducer to manage the state. A reducer is a function that takes in the current state and an action and returns the new state. An action is an object that has a type and may have a payload (data attached to it). The reducer uses the action type to determine how it should return the new state.

For example, a reducer may take a list of first names (such as ["John", "Steve", "Liza"]) as the current state and an action of type "ADD" with a payload of "Mary". The reducer would then return the new list ["John", "Steve", "Liza", "Mary"].

This way, complex state can be managed and changed consistently. As reducers are pure functions (they always return the same state given the same previous state and the same action), using them allows us to change the state in a reliable, predictable way. The specific way reducer and action-creator code is organised means Redux scales well to larger applications. It is also easy to separate Redux code from the rest of the app and to debug it.

The Redux Developer Tools also help, as the actions a user takes in a session can be tracked and any errors easily spotted. Coding with Redux made testing and debugging considerably easier.

Problems with the configuration file

If there are not enough examples for an intent, the intent may not be properly recognised by Ask Bob. A note is present in the app advising users to provide at least five examples per intent.