We have summarised our research for Sota on this page. As our project focuses on creating new ideas and use cases for the Sota robot, most of our research was conducted while evaluating ideas and identifying user needs.
To implement our Cooking Assistant idea, we require three different types of APIs for the following functions:
As Sota runs on Java, we decided to develop in Java using the Eclipse IDE, adhering to the choice of IDE used at NTT Data.
In this section, we look at the research we conducted into several APIs for each area and justify why we chose the APIs we did.
Google Cloud Speech API [source]
Microsoft Cognitive Services Bing Speech API [source]
API.AI [source]
API.AI provides natural language understanding tools and supports integration with external apps, Alexa, Cortana and messaging platforms.
Conclusion - For the speech API we decided to use the Google Cloud Speech API. Although Microsoft offers several add-ons, such as voice authentication, we feel these are not core or necessary requirements for our project. The Google Cloud Speech API supports the widest variety of languages, which would benefit our client if they later decide to expand the market for this product. It is described as being “accurate in noisy environments”, which matters in a kitchen setting. Furthermore, reports suggest that the Google Cloud Speech API is the best speech API on the market.
Google Cloud Vision API
This API includes “image labeling, face and landmark detection, optical character recognition (OCR), and tagging of explicit content.” [source]
Clarifai [source]
Conclusion - For the image recognition API we chose Clarifai over the Google Cloud Vision API because Clarifai has a dedicated “Food Recognition Model”, which covers the most important image recognition feature we need. Moreover, Clarifai can estimate how healthy food is from a photo, something that would make Sota even more helpful to users. Although the Google Cloud Vision API offers many different image recognition abilities, most are not needed for our program. Additionally, Clarifai’s ability to automatically tag images makes scanning Sota’s environment quicker and more accurate.
Spoonacular API
Yummly API
The Yummly API is backed by the world’s largest and most powerful recipe search site. It includes:
Conclusion - For the recipe API we decided to use the Spoonacular API. It offers the numerous features described above and is considered one of the best recipe APIs available. The Yummly API gathers its data from a single site, while Spoonacular draws on multiple sites, which enables a more extensive search and returns a more diverse collection of recipes.
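As a sketch of how we query the recipe API, the snippet below builds a Spoonacular `findByIngredients` request URL. The endpoint path and parameter names follow Spoonacular's public documentation at the time of writing, and the API key is a placeholder, so treat the exact URL shape as an assumption rather than our final implementation.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class SpoonacularQuery {
    private static final String BASE =
        "https://api.spoonacular.com/recipes/findByIngredients";

    // Builds a findByIngredients request URL from a list of ingredients.
    // Parameter names are taken from Spoonacular's public docs; the key
    // is a placeholder supplied by the caller.
    static String buildUrl(String apiKey, String... ingredients) {
        String joined = URLEncoder.encode(
            String.join(",", ingredients), StandardCharsets.UTF_8);
        return BASE + "?ingredients=" + joined + "&number=5&apiKey=" + apiKey;
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("YOUR_API_KEY", "sweet corn", "flour"));
    }
}
```

The response would then be parsed as JSON; we only show URL construction here because that is the part our parsing logic feeds into.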
Problem: To parse strings accurately and allow more natural conversations between Sota and the user, we needed to analyse each sentence so that the program could understand the request.
For example, when parsing for ingredients to use in querying the API, some ingredients consist of two words. A naive word-level parse drops the modifier:
“Find a recipe using sweet corn and flour” -> “corn” and “flour”
This is not ideal, as adjectives can be essential to the description of an ingredient.
Solution: We used an NLP API to recognise the parts of speech that each word belongs to and use these tags in the logic of our functions.
For example, in the case above, we used NLP to recognise parts of speech and avoid dropping adjectives that come directly before nouns in the output.
“Find a recipe using sweet corn and flour” -> “sweet corn” and “flour”
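The adjective-grouping logic above can be sketched as follows. The part-of-speech tags are hand-written here for illustration (in our program they come from an NLP tagger), and we assume the sentence has already been trimmed to the ingredient phrase:

```java
import java.util.ArrayList;
import java.util.List;

public class IngredientExtractor {
    // Groups an adjective (JJ) with the noun (NN) that directly follows it,
    // so "sweet corn" survives as a single ingredient instead of "corn".
    static List<String> extract(String[] tokens, String[] tags) {
        List<String> ingredients = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (tags[i].startsWith("NN")) {
                if (i > 0 && tags[i - 1].equals("JJ")) {
                    // Prepend the directly preceding adjective.
                    ingredients.add(tokens[i - 1] + " " + tokens[i]);
                } else {
                    ingredients.add(tokens[i]);
                }
            }
        }
        return ingredients;
    }

    public static void main(String[] args) {
        String[] tokens = {"sweet", "corn", "and", "flour"};
        String[] tags   = {"JJ",    "NN",   "CC",  "NN"};
        System.out.println(extract(tokens, tags)); // [sweet corn, flour]
    }
}
```

Real sentences also contain nouns that are not ingredients (e.g. “recipe”), so in practice the extracted candidates are still checked against the recipe API.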
While exploring the idea of using NLP, we looked at several NLP APIs.
Apache Open NLP [source]
OpenNLP is an Apache-licensed, machine-learning-based toolkit for “the processing of natural language text”. It supports part-of-speech tagging, which is the NLP feature we require, and since it is open source we can use it at no additional cost to the project.
As our client has specified the Three Pillars that their company focuses on, we have researched these pillars and how they are used in products similar to the NTT Sota.
Sensors and actuators embedded in physical objects—from roadways to pacemakers—are linked through wired and wireless networks, often using the same Internet Protocol (IP) that connects the Internet [source].
Information & Analysis - uses the information gathered from sensors in IoT devices to find patterns and enhance the decision-making process
Diagram 1 - A visualisation of how an Agent interacts with its environment [link]
A voice-user interface (VUI) makes human interaction with computers possible through a voice/speech platform in order to initiate an automated service or process. The elements of a VUI include prompts, grammars, and dialog logic (also referred to as call flow). The prompts, or system messages, are all the recordings or synthesised speech played to the user during the dialog.
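As a minimal illustration of these three elements, the sketch below pairs two prompts with one grammar and a trivial piece of dialog logic. The prompt strings and the grammar pattern are invented for this example; they are not our actual call flow.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VuiSketch {
    // Prompts: the messages the system speaks to the user.
    static final Map<String, String> PROMPTS = Map.of(
        "welcome", "What would you like to cook today?",
        "notUnderstood", "Sorry, could you rephrase that?");

    // Grammar: the inputs the system accepts at this point in the call flow.
    static final Pattern RECIPE_REQUEST =
        Pattern.compile("find (?:a|me a) recipe using (.+)",
                        Pattern.CASE_INSENSITIVE);

    // Dialog logic: choose the next prompt/action from the user's utterance.
    static String handle(String utterance) {
        Matcher m = RECIPE_REQUEST.matcher(utterance);
        if (m.matches()) {
            return "Searching for recipes with " + m.group(1) + "...";
        }
        return PROMPTS.get("notUnderstood");
    }

    public static void main(String[] args) {
        System.out.println(PROMPTS.get("welcome"));
        System.out.println(handle("find a recipe using sweet corn and flour"));
    }
}
```

A production VUI replaces the regex grammar with the speech/NLU API's intent recognition, but the prompt/grammar/dialog-logic split stays the same.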
The following are a few of the competitors that we have identified for the Sota robot and, specifically, our use cases.
Following evaluation of our research, and using the Decision Criteria detailed in the Requirements section, we have decided as a team on one use case and have chosen JUnit as our main unit-testing framework. We have decided to prioritise unit testing and user acceptance testing, as we feel these tests would add the most value in identifying flaws in and improving our project. Furthermore, we have used our research into external APIs to narrow our list down to the Google Cloud Speech API, Clarifai and Spoonacular for implementing the voice user interface, image recognition and recipe retrieval respectively.
[1] Whitenton, K. (2016) The most important design principles of voice UX. Available at: https://www.fastcodesign.com/3056701/the-most-important-design-principles-of-voice-ux (Accessed: 1 March 2017).
[2] Benefit from a good user manual (no date) Available at: http://technicalwriting.eu/benefit-from-a-good-user-manual/ (Accessed: 1 March 2017).
[3] Pearl, C. (2016) Basic principles for designing voice user interfaces. Available at: https://www.oreilly.com/ideas/basic-principles-for-designing-voice-user-interfaces (Accessed: 4 March 2017).