Testing

Testing Strategy


Our day-to-day testing strategy consisted of three main methods:
  • Manually testing webhook responses. Oliver did this via the Azure Function command line, which allowed him
    to enter different JSON queries and check that the webhook returned the appropriate response (a sketch of how
    such checks could be scripted is included after this list).

  • Manually testing the chatbot using the Watson GUI. Lilly did this, using the use case maps in the design section.
    Each query was entered, both with a recognized entity and with an unrecognized entity, and the response checked
    against the expected behaviour. A screenshot of testing via the Watson GUI is included below. With more time we
    might have been able to build an automated test suite for Watson using a tool like Postman (a sketch of what such
    a test might look like also follows this list).


  • Manually testing the AR functionality by repeatedly building the app to a mobile device and using development-build
    debugging in the Unity console. This was our main method of testing the pipeline of the entire app and of solving
    issues with the AR interface. It was quite a clunky way of testing because each build takes some time; however, the
    Unity live debugging platform for mobile unfortunately does not yet support AR, so we had no other way of debugging
    and testing this functionality.
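
As an illustration of the webhook checks described in the first bullet, the short sketch below shows how the same
sort of check could be scripted rather than entered by hand. The URL and payload fields are placeholders rather than
our real deployment, so this is a sketch of the approach, not our actual test code:

    # Minimal sketch of scripted webhook checks (placeholder URL and payloads).
    import json
    import requests

    WEBHOOK_URL = "https://example-function.azurewebsites.net/api/webhook"  # placeholder

    # Sample queries of the kind entered by hand during testing, one with a
    # recognized entity and one with an unrecognized entity (illustrative only).
    sample_queries = [
        {"intent": "find_book", "entities": {"author": "Agatha Christie"}},
        {"intent": "find_book", "entities": {"author": "Unknown Author"}},
    ]

    for payload in sample_queries:
        response = requests.post(WEBHOOK_URL, json=payload, timeout=10)
        print(payload, "->", response.status_code)
        try:
            print(json.dumps(response.json(), indent=2))
        except ValueError:
            print("Non-JSON response:", response.text)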
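Similarly, the automated test suite mentioned in the second bullet could have been built against the Watson Assistant
v2 API rather than the GUI. The sketch below assumes the ibm-watson Python SDK and uses placeholder credentials,
assistant ID and intent names; it sends a few test utterances and prints the intents Watson detects so they can be
compared with the expected ones:

    # Minimal sketch of an automated Watson Assistant check (placeholder
    # credentials, IDs and intent names; assumes the ibm-watson Python SDK).
    from ibm_watson import AssistantV2
    from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

    authenticator = IAMAuthenticator("YOUR_API_KEY")  # placeholder
    assistant = AssistantV2(version="2020-04-01", authenticator=authenticator)
    assistant.set_service_url("https://api.eu-gb.assistant.watson.cloud.ibm.com")  # placeholder region

    ASSISTANT_ID = "YOUR_ASSISTANT_ID"  # placeholder
    session_id = assistant.create_session(assistant_id=ASSISTANT_ID).get_result()["session_id"]

    # Utterances drawn from the use case maps, paired with the intent we would expect.
    test_cases = [
        ("Can you find a book by Agatha Christie?", "find_book"),
        ("Tell me about IBM", "general_information"),
    ]

    for utterance, expected_intent in test_cases:
        result = assistant.message(
            assistant_id=ASSISTANT_ID,
            session_id=session_id,
            input={"message_type": "text", "text": utterance},
        ).get_result()
        intents = [i["intent"] for i in result["output"].get("intents", [])]
        print(f"{utterance!r} -> {intents} (expected: {expected_intent})")

    assistant.delete_session(assistant_id=ASSISTANT_ID, session_id=session_id)
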
Towards the end of the project, we conducted user testing. We had planned to do this with members of our class at
university in the final week of the project, but we were unable to because UCL had closed and moved our classes online.
However, we did manage to test the app with five friends and family members, and below you can see some images of our
user testing.


  • We stress-tested our app in a noisy environment, the lobby of a concert hall. This was definitely less successful
    than testing in a quieter environment, with the avatar having difficulty recognizing users’ speech. This is probably
    a limitation of Watson Speech to Text, which does not handle this level of background noise well.






  • To the best of our ability, we tested the app with members of different generations, with an age range of 19–59.
  • We tested our app on a Samsung Galaxy S8, a OnePlus, and a Moto G6.

Observations from user testing:

  • People quickly understood what the avatar was, how to use it, and how to get help if necessary. Apart from
    a few crashes (see known bugs), users were mostly able to complete the tasks given to them, as long as the task
    was sufficiently specific. General tasks like “find out about IBM” were less successful because people asked
    questions that were not recognized by the avatar.

  • The range of spoken requests was much larger than expected. Even when given a specific task like “find a
    book by Agatha Christie”, people would phrase this in different ways, some of which were not recognized by
    the chatbot. In future we would need to expand our avatar’s vocabulary of recognized requests in order for it
    to be fully deployable in a real environment.

  • The response times from Watson are sometimes too slow for people’s patience, especially when Watson is
    resolving a complicated webhook query (like checking in for an appointment at a doctor’s surgery). This is
    unsurprising given the number of different data channels involved in one of these complex requests.
    In future, we would want to find a way to optimize this further.

  • People didn’t move around the avatar as much as we expected. We had thought people would try to look at the
    avatar from different angles or walk around it, but most were focused on asking questions and listening, and
    seemed to want to keep a distance of about 1–2 m from the avatar, typical of a normal human conversation.

  • The conversation flow is quite stilted, with generally one spoken request and one spoken answer. People seemed to
    disengage after having asked a few questions and getting over the novelty factor of the avatar. In the future we
    could use more complex animation or back-and-forth conversation flow to make the avatar more engaging.

Performance and Efficiency Testing

We took a sample of response times for Watson, timing the period from the end of the user’s question to the moment
the avatar began speaking the answer aloud. The average was about 4 seconds, which shows that there is still quite a
bit of latency in our avatar that could be improved. The results of these tests are below:
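
One way to partially automate these measurements in future would be to time the backend portion of the round trip
(the webhook call itself) with a short script. A minimal sketch, reusing the placeholder webhook URL from the earlier
example; note that this measures only the webhook latency, not speech recognition or synthesis:

    # Minimal sketch of timing the webhook round trip (placeholder URL and payload;
    # this excludes the speech-to-text and text-to-speech stages).
    import statistics
    import time
    import requests

    WEBHOOK_URL = "https://example-function.azurewebsites.net/api/webhook"  # placeholder
    payload = {"intent": "find_book", "entities": {"author": "Agatha Christie"}}  # illustrative

    timings = []
    for _ in range(10):
        start = time.perf_counter()
        requests.post(WEBHOOK_URL, json=payload, timeout=10)
        timings.append(time.perf_counter() - start)

    print(f"mean: {statistics.mean(timings):.2f}s, max: {max(timings):.2f}s")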

Testing Difficulties

We did not manage to do any unit or integration testing, stress testing, or responsive design testing. Our testing strategy
was one of the least successful parts of our project. Both external and internal difficulties caused this:

External difficulties: we found it very hard to work out how to test an AR app that involves voice recognition. Our project
cannot be run in the Unity simulator, and despite research we could not find any information on AR unit testing. It seems
that in this relatively new field, user testing is the most widely used form of testing. Additionally, we had planned to do user,
performance and responsive testing in the last two weeks of the project, a plan that was significantly disrupted by COVID-19.
We did not have access to an Android tablet or multiple Android devices to do stress or responsiveness testing during this time.

Internal difficulties: we did not place as much importance on testing throughout as we should have, and we left our user
testing until the end of the project, by which point it was difficult to carry out because our classes had been cancelled. We did
this because the app was not fully usable until the end and we also had our hands full with development work, but ultimately
we should have been user testing throughout. We should have recognized that we did not need a finished product in order to
test it: we could, for example, have tested the chatbot backend with users via the Watson GUI without having finished the AR
functionality of the app.