Design

System Architecture

Our system is mainly composed of the SOTA robot, the Google Speech service, the MySQL database, the Oxford Dictionary API, the Watson Conversation API, and the Watson Text-To-Speech API.

The SOTA robot is responsible for recording the child’s response and creating an audio file to store this response. It is also responsible for playing the response audio file to speak back, as well as moving its different body parts using multi-threading.

The Google Speech API receives the audio file that is generated by SOTA, connects to the Google Cloud service, and converts the speech into text.
The MySQL database is then responsible for storing the response of the child to identify improvements in their language.
The Oxford Dictionary API will define words which the child does not recognise.
The Watson Conversation facility is responsible for identifying the child’s response and deciding which branch of the story to follow.
The Watson Text-To-Speech service receives the text to be spoken and generates the appropriate audio file.

Class Diagram

Design Patterns

Singleton

The Singleton design pattern prevents a class from having more than one object instantiated from it during the program life cycle. This is beneficial in scenarios where at most one instance should exist. In our project, we used the Singleton design pattern in the classes which control access to paid services, namely the GoogleSpeechToText, WatsonConversation and WatsonTTS classes. We created these classes with private constructors and with static methods which return their single instances. As a result, each paid service is initialised only once, so efficiency is not compromised and multiple services which incur costs do not run at the same time.
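The private constructor and static accessor described above can be sketched as follows. This is a minimal illustration rather than the project's actual code: PaidServiceClient is a hypothetical stand-in for classes such as WatsonTTS, and the initialisation counter exists only to make the single initialisation visible.

```java
// Hypothetical stand-in for a paid-service wrapper such as WatsonTTS.
final class PaidServiceClient {
    private static PaidServiceClient instance;
    private static int initialisations = 0;

    // Private constructor: no other class can create an instance directly.
    private PaidServiceClient() {
        initialisations++; // a real class would set up the paid service here
    }

    // Creates the single instance on first use and returns it on every later call.
    static synchronized PaidServiceClient getInstance() {
        if (instance == null) {
            instance = new PaidServiceClient();
        }
        return instance;
    }

    // Illustration only: reports how many times the constructor has run.
    static synchronized int initialisationCount() {
        return initialisations;
    }
}
```

Every caller of getInstance() receives the same object, so the costly set-up in the constructor runs once regardless of how many classes use the service.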

Chain of Responsibility

The Chain of Responsibility design pattern allows objects in our project to be linked together like a chain, where each object does not need to be aware of the other objects in the chain that will receive its request. This is beneficial as it allows loose coupling between objects, and also adheres to the Single Responsibility Principle, which states that each object should oversee only one particular function of the entire project. In our case, we have implemented the Chain of Responsibility design pattern by creating a chain of command where the Main class interacts with the GoogleSpeechToText class, which in turn decides whether the request needs to be sent to the OxfordDictionary class or the WatsonConversation class. Ultimately, the WatsonTTS class receives the command at the end of the chain and produces the appropriate audio file.
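The routing decision in such a chain can be sketched as below. The class names RequestHandler, DictionaryStep and ConversationStep are hypothetical, the handlers only return a label naming which step accepted the request, and the final text-to-speech step is omitted for brevity.

```java
import java.util.Locale;

// Base handler: holds a reference to the next link and forwards by default.
abstract class RequestHandler {
    private RequestHandler next;

    // Links this handler to the next one and returns it, so calls can be chained.
    RequestHandler linkTo(RequestHandler next) {
        this.next = next;
        return next;
    }

    // Default behaviour: pass the request on, or report that nobody handled it.
    String handle(String text) {
        return next == null ? "UNHANDLED: " + text : next.handle(text);
    }
}

// Accepts dictionary questions and forwards everything else.
class DictionaryStep extends RequestHandler {
    @Override
    String handle(String text) {
        if (text.toLowerCase(Locale.ROOT).contains("meaning of")) {
            return "DICTIONARY: " + text;
        }
        return super.handle(text);
    }
}

// Accepts any request that reaches it (the story conversation).
class ConversationStep extends RequestHandler {
    @Override
    String handle(String text) {
        return "CONVERSATION: " + text;
    }
}
```

The caller only holds the first handler; whether a request ends up at the dictionary or the conversation step is decided inside the chain.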

Facade

The Facade design pattern allows us to wrap the source code of a complex sub-system into a class which acts as a simpler interface to this body of code. As a result, it becomes much easier to use the operations provided by the facade class from other classes, which do not need to be aware of specific implementation details to make use of the class’s functionality. These classes only need to interact with the simple interface provided by the facade class. We used the Facade design pattern in our implementation of the Database class, which provides simple methods to add and print the records stored in our database, hiding the series of complex operations that these involve.
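A facade of this kind might look like the sketch below. It is illustrative only: DatabaseFacade and ResponseRecord are hypothetical names, and an in-memory list stands in for the remote MySQL table so that the example is self-contained.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// One stored response: story branch, the response text, and the date received.
final class ResponseRecord {
    final int noOfBranch;
    final String response;
    final LocalDate date;

    ResponseRecord(int noOfBranch, String response, LocalDate date) {
        this.noOfBranch = noOfBranch;
        this.response = response;
        this.date = date;
    }
}

// Facade: callers see only addRecord and printRecords, not the storage details.
final class DatabaseFacade {
    // Assumption: an in-memory list replaces the real remote database here.
    private final List<ResponseRecord> rows = new ArrayList<>();

    void addRecord(int noOfBranch, String response, LocalDate date) {
        rows.add(new ResponseRecord(noOfBranch, response, date));
    }

    // Builds one line per record in the form "branch | response | date".
    String formatRecords() {
        StringBuilder sb = new StringBuilder();
        for (ResponseRecord r : rows) {
            sb.append(r.noOfBranch).append(" | ")
              .append(r.response).append(" | ")
              .append(r.date).append('\n');
        }
        return sb.toString();
    }

    void printRecords() {
        System.out.print(formatRecords());
    }
}
```

Other classes would call addRecord and printRecords without knowing how the records are transported or stored.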

Mediator

The Mediator design pattern allows a Mediator class to be created which handles and encapsulates how different classes communicate with each other. In our project, communication between the Speech-To-Text, Dictionary, Conversation, and Text-To-Speech classes was required. Each of these classes has a particular role in the flow of the conversation, and the output of one class is the input of the next. A Mediator class would provide a centralised point which handles all of this communication, reducing coupling between the aforementioned classes. The Mediator class would therefore communicate between the different objects on their behalf. We strongly considered using this design pattern. However, we felt that it would place too much responsibility on the Main class and would not delegate responsibility effectively.

Private Class Data

The Private Class Data design pattern enables a programmer to restrict access to the attributes of a class by encapsulating them in a Data class. Essentially, this means that attributes which are intended to be immutable can be protected, as even methods of the class they belong to will not have write access to them. We considered using this design pattern in our project, as certain attributes such as the access tokens for IBM Watson services could benefit from it. However, we decided not to use this pattern as it would create unnecessarily large classes which would be difficult to read.

Data Storage

Our database schema for storing the user responses is very simple and contains three fields: NoOfBranch, which specifies the stage in the story at which the response was received; the actual response; and the Date on which the response was received.

Key Functionalities

Speech To Text Service

In order to implement the Speech To Text service, we created a class called GoogleSpeechToText. This class makes use of the environment variable, GOOGLE_APPLICATION_CREDENTIALS, which stores the path to the JSON file which contains the account details of the Google Cloud subscription. The class has a function which instantiates a new client based on the environment variable, and sends a stream of audio bytes to be recognised by Google Speech. The function then returns a String containing the recognised result.

Dictionary Service

For the dictionary function, we wanted the child to be able to trigger a lookup simply by asking, “What is the meaning of …?”, with SOTA answering with the dictionary definition. To achieve this, we made use of the Oxford Dictionary API. Our code sends an HTTP GET request to the API, which returns the meaning of the word in a JSON array. We parse the user’s sentence and pass the word after “meaning of” as the parameter to look up in the dictionary. We then pass the returned definition to Watson Text To Speech so that SOTA says it aloud.
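The step of extracting the word after “meaning of” can be sketched as follows. DictionaryQuery and extractWord are hypothetical names, and the regular expression is one possible way to do the parsing, not necessarily the approach used in the project. The call to the Oxford Dictionary API itself is not shown.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DictionaryQuery {
    // Matches "meaning of" (any letter case) and captures the next word.
    private static final Pattern MEANING_OF =
            Pattern.compile("meaning of\\s+([\\p{L}'-]+)", Pattern.CASE_INSENSITIVE);

    private DictionaryQuery() {}

    // Returns the first word after "meaning of", or empty if the phrase is absent.
    static Optional<String> extractWord(String sentence) {
        Matcher m = MEANING_OF.matcher(sentence);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }
}
```

An empty result signals that the sentence is not a dictionary question and can be passed on to the conversation service instead.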

Natural Language Processing

To understand the response of the child, and match it to a list of intents that we expect, we needed to perform Natural Language Processing (NLP) on the child’s response. We identified the Watson Conversation facility as an appropriate tool to perform NLP, as it is able to identify the meaning of a response, regardless of which words or sentence structure are used. As long as some key words or their synonyms are included in the response, it will be recognised correctly. Once the meaning of the input is identified, Watson Conversation is then able to follow a given tree which represents the flow of the conversation. In our case, this is the story that SOTA is telling. We have created a class WatsonConversation for this task. This class creates a new Conversation service and sends the text generated by Google Speech To Text to be processed, in order to receive a JSON response which contains the part of the story which SOTA will now speak.

Database

To connect our Java code to the MySQL database, we first intended to use JDBC. However, our database is hosted on a free web host, 000webhost, which does not support JDBC connections unless we upgrade to a premium account. To keep development costs down, we found another way to achieve this. We built a Java/PHP bridge in which the Java code sends an HTTP request to a PHP file hosted on the website. The PHP file then accesses the database and, depending on the request, either returns the requested data as a JSON array or inserts new data into the database.
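The Java side of such a bridge could be sketched as below, using the java.net.http client available from Java 11. Everything specific here is an assumption made for illustration: the class name PhpBridge, the form field names branch and response, the use of POST, and any endpoint URL passed in. The example only builds the request, so it runs without network access.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

final class PhpBridge {
    private PhpBridge() {}

    // Encodes the fields as a form body; field names are hypothetical.
    static String buildInsertBody(int noOfBranch, String response) {
        return "branch=" + noOfBranch
                + "&response=" + URLEncoder.encode(response, StandardCharsets.UTF_8);
    }

    // Builds (but does not send) a POST request addressed to the PHP script.
    static HttpRequest buildInsertRequest(URI endpoint, int noOfBranch, String response) {
        return HttpRequest.newBuilder(endpoint)
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(buildInsertBody(noOfBranch, response)))
                .build();
    }
}
```

Sending the request would then be a matter of passing it to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()) and reading the PHP script's reply from the response body.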

Text To Speech Service

To implement the Text To Speech service, we used IBM Watson Text To Speech. We created a class called WatsonTTS which creates a new service based on the account details of the particular user. The class has a function which receives the text to be spoken from either the Dictionary class or the WatsonConversation class. This function generates an audio file in WAV format from the given text, based on the voice selected, which in our case is a female, American English voice. We are then able to use the SOTA library CPlayWave, which uses SOTA’s speakers to play the audio file.

SOTA Movement

To be able to use SOTA to its full capabilities, we needed to make SOTA move its hands, face and body at the same time as speaking. Therefore, in order to implement this, we needed to use the Java multi-threading facilities. We used the SOTA movement libraries CSotaMotion and CRobotPose to control which body parts we move and the coordinates of the movements. The SOTA virtual robot facility helped us visualise the movements before implementing them on the real robot. We then used multi-threading by placing the SOTA movement and speech in separate threads which are started simultaneously, and are timed such that the overall time taken for the movements is the same as the time taken for the speech.
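The idea of starting speech and movement in separate threads and waiting for both can be sketched as follows. ParallelActions and runTogether are hypothetical names, and plain Runnable tasks stand in for the SOTA calls, since the CSotaMotion, CRobotPose and CPlayWave APIs are not reproduced here.

```java
final class ParallelActions {
    private ParallelActions() {}

    // Starts both tasks on their own threads and waits until both have finished.
    // Returns false if the waiting thread was interrupted before that point.
    static boolean runTogether(Runnable speech, Runnable motion) {
        Thread speechThread = new Thread(speech, "speech");
        Thread motionThread = new Thread(motion, "motion");
        speechThread.start();
        motionThread.start();
        try {
            speechThread.join();
            motionThread.join();
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
    }
}
```

In this sketch the two tasks simply run side by side; matching the duration of the movements to the duration of the speech, as described above, is left to the tasks themselves.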