Virtual Therapy

"Cognitive Rewiring through mixed-reality"

Virtual Therapy is a 3D audio feedback system created in Unity to help people with visual impairment and lower-back disability. It focuses on improving and correcting users' posture through iterations of pleasant sounds.

The problem is the need for automatic computer feedback in therapy sessions that lets the user understand the alignment of their joints. Our approach uses sensors to gather the user's joint positions and rule-based algorithms for pose detection. 3D sound is used as intuitive feedback on how to correct the pose.

We successfully developed a system that tackles this open research problem using innovative technologies. We hope it can be used globally and positively affect those who need it.



Key Features

3D-Sound



3D sound: replicating the way natural sound is heard by the left and right ears.
Pose Correction



Helps the user correct his/her posture through pleasant sound iterations, for example when trying to stand straight.
Store Positions



Store the current posture into a database.
Voice Commands



A simple user interface with voice-activated commands


Features video


Meet the Team


  • Rares Dolga
  • Team Leader
  • Front End Developer
  • Back End Developer
  • < rares.dolga.16@ucl.ac.uk >

  • Fazaan Hassan
  • Tester
  • Back End Developer
  • Head of Research
  • < fazaan.hassan.16@ucl.ac.uk >

  • Cavan Black
  • Audio and Visual Lead
  • Researcher
  • Documentation
  • < cavan.black.16@ucl.ac.uk >

Requirements

Project Background

The assigned name of the project is “Cognitive rewiring through mixed reality”. The affiliated organisation is Microsoft, which requires an application for their user Mark Pollock. VirtualTherapy (the name of the application) is designed so that it can be used alongside the current mechanism of a user correcting his/her position with the help of a physiatrist. The user interface is developed with further improvement by software engineers in mind, providing a friendly interface when using the application, but also allowing the trainer/physiatrist to start and stop the program if required.

Client and User

The people involved in this project, and for whom the solution is intended, are as follows: Jarnail Chudge – client and project support; Mark Pollock – client and intended user; and Dimitry Sayenko – subject-matter expert. Through email communication and Skype meetings, we were guided towards the correct technologies and the intended hardware for this project.

Project Goals

The project is designed to tackle the issue of correcting a user’s posture given that they are visually impaired and paralysed from the waist down. The application aims to give the user more freedom by relying only on pleasant sound iterations to indicate whether they are correctly aligned.

Requirement Gathering

Approach 1 - Structured Interview

To obtain client requirements, we used email communication to gain more knowledge of the project. Below we discuss why we chose this approach and why it was appropriate for our situation.

Reasons for choosing this method:
  • Our client was situated in California
  • Questions and answers were delivered and responded to quickly
  • A record was kept of our communication and any misconceptions were handled quickly.
  • We were able to link (‘Cc’) other people related to the project, for example our supervisor Simon.
Below is an example of the questions we asked:
  1. What should we aim to develop?
    • If the targeted users have sight problems, should we use just audio to guide them?

  2. From the research papers, we understood that you want a game that helps disabled people exercise for reconditioning.
    • How should VR or mixed reality have a role into this?

  3. From the brief and papers you sent, we understood that the end user is Mark.
    • Should we build the app just for disabled people who are blind?
    • Or should the target users be people with a spinal cord injury?
Reasons for choosing this Initial Method
  • Due to no prior knowledge, the questions started off basic. This allowed us to gain an in-depth overview from the client of what was expected
  • The structure of the questions meant we could pinpoint specific areas of the project
  • Allowed us to be prepared for our Skype interview, as we had gained some knowledge of the client requirements
What Next?
  • To be able to create abstract user requirements from the information gathered
  • Have a face-to-face interview over Skype, allowing a more informal discussion and a better understanding of the project
Approach 2 - Online Questionnaire

An online questionnaire was sent to Mark's physiatrist to retrieve the specific information needed to create the solution required to help him. The information was valuable because our project would revolve around these requirements. Based on this we could focus on certain joints of the human body, as opposed to the whole skeleton, which would have been beyond the scope of the project.

Approach 3 - Skype Interview

After gaining an insight into what the project entails, we arranged a suitable date to Skype our client. Questions were prepared prior to the interview. These Skype meetings were then set every fortnight to update our client and user on our progress.

Why did we choose this?

  • Build a friendly and trustworthy relationship with the client
  • Clear any doubts about current mock-requirements
  • Clarify our budget, skills and what is doable in the given time frame

Personas

Mark Pollock
  • Occupation: Athlete
  • Role: End User

When Mark was five, he lost the sight of his right eye and was forced during the remainder of his childhood to avoid contact team sports to preserve the vision in his left eye. He later went on to study Business and Economics at Trinity College, Dublin, where he became a champion rower for the institution. At age 22 he lost the sight in his left eye and was left completely blind. In 2012, just weeks before his wedding, Pollock fell from an upstairs window, injuring his back and fracturing his skull. This caused internal bleeding on the brain and resulted in long-term paralysis.

James Gordon
  • Occupation: Researcher
  • Role: Secondary End User

James Gordon was 8 years old when he suffered a severe head trauma, leaving him blind in both eyes and paralysed from the waist down. He was educated at Nottingham High School before graduating and beginning research into guide dogs and the potential traffic dangers for the blind. He has conducted extensive research in the domain of accessibility for patients who are visually impaired. He has contributed to the "ParaEye" society in the UK, giving up much of his time to young kids. His work is renowned in many communities.



Storyboard

MoSCoW Requirements

ID | Requirement | Type

Must
R1 | Gather 3D coordinates of joints from the environment | Functional
R2 | Filter out inaccurate data from the sensors | Functional
R3 | Reconstruct the skeleton in the Unity environment and map the data received from the sensors | Non-Functional
R4 | Consider the following parts of the body: knee, heel, toe, ankles, hips and mid-back | Non-Functional
R5 | Develop an algorithm that recognises wrong alignment of specific joints | Non-Functional
R6 | Transform coordinates and angles into 3D sound modulation | Functional
R7 | Guide the user to stand correctly in relation to: trunk/mid-back extension; hip extension; knee extension | Functional

Should
R1 | Create an audio user interface - main menu | Functional
R2 | Guide the user to stand correctly in relation to: trunk/mid-back extension; hip extension; knee extension | Functional

Could
R1 | Consider other body parts such as shoulders and chest | Non-Functional
R2 | Save multiple target postures | Non-Functional
R3 | Create a server for connecting to the database | Functional
R4 | Create a MongoDB database | Non-Functional

Use Cases

Use Case Diagram
Use Cases
Use Case 1

Use Case: Calibration of the application
ID: UC1
Brief Description: Ensures headphones are correctly worn and 3D sound can be heard distinctly in each ear
Primary Actors: User
Secondary Actors: System, Trainer
Preconditions: None
Main Flow:
1. The user runs the application on the machine
2. The system displays the main page with voice instructions guiding the user to select an option
3. Calibration is chosen (vocally or by clicking)
Postconditions: None
Alternative Flows: None

    Use Case 2

Use Case: Correct User's Posture
ID: UC2
Brief Description: The user moves his joints according to sounds in 3D space until he is standing correctly
Primary Actors: User
Secondary Actors: System, Trainer
Preconditions: The Kinect sensor must be enabled and working so that it recognises movement. It must also be placed at an appropriate angle and height.
Main Flow:
1. After the calibration process, the user says "Start" as indicated by the instructions
2. The screen changes to an avatar, with a smaller screen showing the image of the user's joints. This information is for the trainer/physiatrist.
3. A quick rundown of the sounds is played, indicating which sound is for which joint
4. The pose correction algorithm starts and the user is guided to stand correctly
Postconditions: None
Alternative Flows: Invalid query - another option is selected
ID: UC2.1
Brief Description: The user exits the system
Primary Actors: User
Secondary Actors: System
Preconditions: None
Main Flow:
1. The user says "Exit"
2. The application closes

    Use Case 3

Use Case: Store 3D coordinates of the current position
ID: UC3
Brief Description: The system captures the current position of the user and stores it in a database.
Primary Actors: User, Trainer
Secondary Actors: System
Preconditions: None
Main Flow:
1. The user says "Save"
2. Rotation, localRotation, position and localPosition are captured for each joint
3. A JSON document is created for each "Save" and stored in a collection on a MongoDB server
Postconditions: None
Alternative Flows: None


    Research

    Potential Devices

Sensors are a key aspect of our project. We required high-quality six-degrees-of-freedom devices to get the position of the user's joints in 3D space. If the data received is not accurate enough, the feedback given will be faulty.

    Polhemus Fast-Track

This is a promising motion-tracking solution that uses a magnetic source to detect the position of small devices mounted on the user's joints. Data received from this sensor is in real time, and the manufacturer's experiments suggest that there is no latency. Also, occlusion is not a problem in this scenario, because the main joints have individual sensors. [1]

Although this is a good solution in terms of data precision, it is very expensive and exceeds our budget for the project. Furthermore, it would require a considerable amount of time to integrate with software that can produce 3D sound.

    XSENS MV

This is a full-body motion analysis system capable of giving 3D joint angles, the orientation of bones and the centre of mass [2]. The first disadvantage is cost: at 35,350 Euro it is far beyond our budget. Secondly, it has a small delay when sending data over wireless communication, and using cables would not be practical for the user. However, compared with Polhemus it has the advantage of providing centre-of-mass information.

    Notch Sensors

Our supervisor Simon suggested these as they are cheap and offer reasonable results when gathering joint positions. The main obstacle we faced with this sensor is that it displays real-time data only through an Android application. The lack of an existing API makes it considerably harder to take a continuous flow of body-segment positions and use them in our own software for producing 3D sound. Being composed of only 6 sensors, it would also be impossible for us to separately track all 20 joints that we need for calculating the pose of a user [3].

    Kinect V2

Kinect is a depth sensor created by Microsoft that gives the skeleton position of a user in 3D space using infrared light and an RGB camera. It is a cheap and robust solution, with good documentation and online support. We decided to go with Kinect because, compared with all the previous solutions, it offers the possibility of connecting with Unity and provides a good balance between cost and performance. This was vital for us because we needed a game engine or other software capable of producing mixed-reality applications. It also provides an SDK that returns the position of up to 25 joints [4]. The device has high-quality audio capture as well, an important aspect when designing a voice-command-based user interface for visually impaired persons. Compared with the first solutions it does not handle occlusion as well; however, we try to reduce this impact by approximating the position of occluded joints and filtering the data.

    Potential Frameworks

The VirtualTherapy project is a mixed-reality application, so we required an editor that would facilitate the construction of such software. This can be done with different game engines, described below:

    Unreal

Unreal is a powerful game engine with support for VR/AR applications. However, we did not use it because it does not offer support for integrating Kinect. This is a major downside, since Kinect needs wrapper functions to transform the raw data from the sensors.

    Unity

Unity is a game engine with great support for virtual reality applications. The biggest advantage is that Kinect has plugins for Unity, which makes it possible to transfer data from the real world to the virtual world. We chose Unity because it has great built-in functions for creating immersive 3D sound feedback and for working with the accurate depth imagery. The choice was also heavily influenced by the large amount of documentation available for Unity and the support given on forums: problems could be solved more quickly as they may have been encountered before.

    Potential Programming languages

    Front End Development

    • Boo
    • JavaScript
    • C#

The last two are the most popular, hence we chose to program in C#. The reason is that we are familiar with object-oriented programming languages such as Java, which is similar to C#. If we had chosen to write in JavaScript, there would have been issues with script compilation for the pre-made libraries, which we discuss below. Apart from this, there are no performance advantages of one language over the other.

    Server Side Development

    • Node JS
    • PHP
    • Ruby

For the server part we used Node.js. Alternatives include Django, PHP and Ruby, but Node.js has multiple advantages over them. We decided to use Node.js due to previous experience, which saved time as we did not have to learn a new backend programming language. It also allows the construction of real-time web applications, in which both the client and the server can initiate requests. Being based on JavaScript, its documentation showed that database drivers could be connected easily. Also, it can handle a huge number of simultaneous requests, which means high scalability. This is important as Microsoft would like to develop the application further in the future.

    Potential Databases

    MySQL Relational Database:

MySQL is a powerful database, known for high performance and data security. It is one of the best options for complex queries. However, all data must follow the same structure. The structure of our data might change based on the pose chosen by the user, meaning the relationships between joints and their number will differ a lot [7]. Also, we have many relationships in the data, which would mean introducing many new tables and linking them together efficiently. Because of this, running queries on this form of data would be expensive.

    MongoDB

This database is an advantage for us because it allows the structure of the data to change. Each pose we save is stored as a JSON-style document, making it easier to process. Also, this database scales horizontally, in comparison with MySQL, which scales vertically; this means that more traffic is handled by adding more servers. This is more appropriate for our type of data, so we decided to choose MongoDB [8].

Libraries and APIs

    Retrieving Joint Positions From The User:

    OpenNI NITE

OpenNI is an API used to initialise devices that contain depth sensors compatible with PrimeSense from application code. We need it to start the depth sensor when the Unity application is run, get data from it, and stop it on the user's command [6]. For detecting the human body and accurate joint positions from Kinect depth images, the NITE library is used.

    Unity Kinect MS-SDK

This is a library that allows us to map data from the Kinect to an avatar that represents the human body in the Unity world. It contains wrapper classes that make this transformation using basic mathematical operations such as matrix multiplications. As our application needs to initialise and stop the device, it also contains the drivers necessary for OpenNI and our sensor, and it provides higher-level functions for using Kinect features. Currently there are no competitor libraries that use Kinect for skeleton tracking in Unity, hence we are limited to this one.

    OpenPose plus SMPLify

An alternative to the OpenNI, NITE and MS-SDK combination is OpenPose plus SMPLify. OpenPose is a 2D machine-learning, real-time pose estimation library that determines up to 18 body key points [9]. This library yields great accuracy and hence deals well with occlusion. A rule-based algorithm can detect joint alignment with 2D data alone, for example detecting how bent the knees are without depth data by calculating the relative positions of the joints.

    However, we have not chosen this solution, because the creation of 3D audio feedback involves applying sound on objects in 3D space. Without the third axis, it is impossible to create objects that mimic our user and apply sound towards it from different directions. A solution to this problem would be the use of SMPLify. This is a machine learning software that can create 3D realistic human models from 2D data.

A common problem with this software is its lack of integration with Unity, which is required for the augmented part of our project; no documentation is provided on how to connect the output of the library with the game engine we use. It might be considered as an alternative solution in the future of the project, because SMPLify has announced that they are working on a Unity plugin. The remaining big problem would then be connecting the avatar to continuous data. However, this cannot be achieved, as SMPLify only works on a single image [10]. It is also worth mentioning that the library relies on heavy optimisation and the computation cost for each image is high; computing a continuous stream of images would not give real-time results, as the lag in the application would be too high.

    Speech Recognition

    Google Speech API

This is a well-known and powerful API that recognises around 100 languages. It runs on cloud services, which we considered a disadvantage: if no internet connection is present, the application becomes impractical for visually impaired users, as they have no way of interacting with it.

    Microsoft Runtime Speech recognition

We needed a service for speech recognition to implement the voice-command functionality. A disadvantage is that the user must install the runtime on the local device to make our app work. We chose this SDK because it runs directly in Windows 10 and does not require any calls over the internet, so the delay in response time is minimal. This is an important factor for a reliable user interface [11].

    3D Sound Effect

    Oculus Rift Spatializer

This plugin can be used in many game engines and has good documentation. It also offers the possibility to customise the sound effects. Its disadvantage is that it does not imitate the reflection of sounds from walls very well, in which case the virtual experience we are trying to achieve would not be immersive.

    Microsoft HRTF (Head related transfer function) spatializer

Unity already contains this plugin, making it easy to use and configure. We chose the HRTF spatializer because it offers functions to amplify and attenuate sounds for different room sizes. Sounds can be heard from the left and right, with walls also reflecting the signals, so the user can easily locate the source of a given sound. This is a clear advantage over the Oculus spatializer, making it suitable for our requirements.

Algorithms

Pose prediction algorithms are vital for any VR/AR application that includes an avatar. Using only the current position of the user to compute new skeleton-tracking images for each frame might result in desynchronisation [12]. To solve this, we considered some well-known mathematical algorithms.

    Exponential Filter

This is a smoothing filter used when sudden changes appear in the data. It applies a single exponential function to smooth the input from the Kinect. After a period of experimentation, we observed that this filter does not follow the data trend well when the joints are in motion [13].

    Double Exponential Filter

This method predicts joint positions using a simple linear trend. We estimate the parameters of the equation while observing the movement of the user; the weight given to older observations decreases exponentially over time, prioritising new data from the sensor. This method is preferred because, by applying the exponential function twice, the output follows the trend of the real input [12].
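As a rough illustration of the idea, the sketch below applies Holt-style double exponential smoothing to a joint position; the class name and the smoothing constants are illustrative rather than the project's actual implementation.

```csharp
using UnityEngine;

// Illustrative double exponential (Holt) smoothing for one joint position.
public class DoubleExponentialFilter
{
    private readonly float alpha;   // weight given to the newest observation
    private readonly float beta;    // weight given to the newest trend estimate
    private Vector3 level;          // smoothed position
    private Vector3 trend;          // smoothed change between frames
    private bool initialised;

    public DoubleExponentialFilter(float alpha = 0.5f, float beta = 0.5f)
    {
        this.alpha = alpha;
        this.beta = beta;
    }

    // Feed the raw joint position for the current frame; returns a one-step prediction.
    public Vector3 Update(Vector3 observed)
    {
        if (!initialised)
        {
            level = observed;
            trend = Vector3.zero;
            initialised = true;
            return observed;
        }

        Vector3 previousLevel = level;
        level = alpha * observed + (1f - alpha) * (previousLevel + trend);
        trend = beta * (level - previousLevel) + (1f - beta) * trend;
        return level + trend;
    }
}
```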

    Final Decision

Our final solution consists of a Kinect V2 sensor to get joint positions from the user, and Unity for creating the 3D sound effect. Our languages of choice were C# for the local application and Node.js for the backend. We chose to use the Kinect MS-SDK from the Unity Asset Store with OpenNI to control the Kinect sensor and retrieve data from it. This library also contains wrappers for transforming the input into Unity spatial coordinates, allowing us to map it to an avatar. We also decided to use the double exponential filter to predict the continuous flow of data and reduce the effect of occlusion as much as possible. We used the Microsoft HRTF spatializer to create the 3D sound effect.

    HCI

    Sketches

As part of the HCI requirement we created sketches to visualise how our solution would work. However, as the app provides little visual interaction for the user, we instead made an overview of the components it may consist of. The sketches represent how (before realising that we would require a different approach) the sensors would interact with movement. The initial high-level sketch shows the position of all the sensors that would be used to track the body in 3D space, allowing us to determine from those positions whether or not the user is standing in a correct pose. The sensor representation shows how movement would be processed by the sensor and then output as useful data that we can manipulate to make the user stand correctly. Below are the two main sketches.


    Component Sketch

The sketches then allowed us to create additional architectural designs. It is important to note that this is not the way the solution is implemented; it is in fact a visualisation of two main design iterations. The final design was created after conversing with the client and project support; further information regarding the final iteration is in our Design section. In the designs shown, our initial sketch had the sensors outputting their data to an Arduino, an open-source platform based on simple hardware and software, ideal for a project such as this where the amount of data collected is not too large. The Arduino would then process the data using algorithms our team created, to produce 3D sound and guide the user into a standing position. After some communication with the client, we decided to make use of the HoloLens as a 3D sound device to output sounds to the user.

    Wireframes

The wireframes below show our final design iteration. The sketches were a preliminary design, displaying our initial idea of how to approach the project. We must stress that the application is not based on visual interaction, as it is a 3D sound project; hence the lack of suitable wireframes.

    Prototype Screenshots

    Below are the prototype images of our application.

    Design

    System Architecture Diagram

    Component A - User

This part of the system architecture diagram shows how the user interacts with the overall system. The headphones indicate the use of 3D sound as well as granting the immersive experience. The user here is intended to be Mark Pollock.

    Component B - Kinect Sensor

This represents the Kinect V2 sensor, which retrieves data from component A (the user) using its depth sensor. It captures continuous frames of movement. Note that this piece of hardware must be placed at a suitable height and distance from the user to fully capture the skeleton image.

    Component C - Database Module

    It has the role of organising the bones of the avatar in serializable data structures that can be transformed into JSON format. In addition, it has classes that handle the transmission of the JSON to the server component in an asynchronous way.

    Component D - 3D Sound

    This component represents the output of the previous module and is what the patient hears. It can be modelled separately from the other parts, by choosing different sound scripts and by modifying their settings in the Unity Editor.

    Component E - Avatar and Therapy Module

This is the core module of our application. It has the role of analysing the posture of the avatar and giving feedback based on that. Furthermore, it coordinates which sound scripts to play and when, allowing the correction of one joint at a time. In this way, we avoid a mixture of different noises that would not convey anything useful to the user.

Component F - Server

This part receives requests from the main application and saves the JSON documents in the database. It acts as a secure bridge between the app and the database. Because it runs on the server, it makes it difficult for third parties to access the database credentials.

    Component G - Database

We used the database to store the poses a user might want to achieve during the exercises. It contains just the positions and rotations of the user in relation to the avatar's bones.

    Component H - Kinect Module

This section gets the data from the sensor and transforms it into the Unity format. We also map it to our avatar so that it can mimic the patient’s movements.

    Class Diagram

    Sequence Diagram

    Design Patterns used

    Observer Pattern

This pattern is used to notify observers when an event has occurred. We create loosely coupled code by simply notifying the listeners when a change has occurred, without calling specific methods. In our implementation, we used the Observer pattern to update the avatar when a change is seen in the patient’s position. We also needed it for classes that wait for the user’s command in order to execute some code: once a word is recognised, the state of the subject changes and all its observers are alerted.
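A minimal sketch of this relationship using C# events; the class and member names are illustrative rather than the project's actual ones.

```csharp
using System;

// Subject: raises an event whenever the speech engine detects a word.
public class SpeechSubject
{
    public event Action<string> WordRecognised;   // observers subscribe here

    public void OnWordDetected(string word)
    {
        // Notify every listener without knowing who they are or what they do.
        WordRecognised?.Invoke(word);
    }
}

// Example observer: reacts only to the word it cares about.
public class CalibrationObserver
{
    public CalibrationObserver(SpeechSubject subject)
    {
        subject.WordRecognised += word =>
        {
            if (word == "Calibrate") StartCalibration();
        };
    }

    private void StartCalibration()
    {
        // e.g. play a test tone in the left ear, then the right
    }
}
```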

    Command Pattern

This pattern was used to separate the invoker of an action from the receiver. The code is separated such that the object that calls the command has no idea about its specific implementation. A concrete example is the voice-command menu: when something needs to be saved to the database, the code that recognises the user’s voice invokes the “save pose” call on the ButtonManager class, which in turn calls the execute method of the “DataSender” class containing the code for sending information to the server.
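A rough sketch of this dispatch, assuming simplified stand-ins for the ButtonManager and DataSender classes mentioned above:

```csharp
using System.Collections.Generic;

public interface ICommand { void Execute(); }

// Receiver: knows how to send the pose to the server (simplified stand-in).
public class DataSender
{
    public void Send() { /* post the current pose as JSON to the server */ }
}

// Concrete command: wraps the receiver call.
public class SavePoseCommand : ICommand
{
    private readonly DataSender sender;
    public SavePoseCommand(DataSender sender) { this.sender = sender; }
    public void Execute() => sender.Send();
}

// Invoker: maps recognised words to commands without knowing their implementation.
public class ButtonManager
{
    private readonly Dictionary<string, ICommand> commands = new Dictionary<string, ICommand>();

    public void Register(string word, ICommand command) => commands[word] = command;

    public void Invoke(string word)
    {
        if (commands.TryGetValue(word, out ICommand command)) command.Execute();
    }
}
```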

    Singleton

A singleton was used when it was unnecessary to have more than one object of the same type. For example, having multiple speech managers at runtime would mean multiple calls to the code that executes commands, while the user wanted to perform the action only once. Although the pattern has a clear scope, it is dangerous and not beneficial to overuse it - for example, it can break the SOLID principles of object-oriented design. Also, if not used adequately it can cause numerous bugs, especially in multithreaded applications.
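A minimal, thread-safe singleton sketch (the project's real speech manager is a Unity component, so its implementation differs):

```csharp
using System;

public sealed class SpeechManagerSingleton
{
    // Lazy<T> guarantees a single instance even with multiple threads.
    private static readonly Lazy<SpeechManagerSingleton> instance =
        new Lazy<SpeechManagerSingleton>(() => new SpeechManagerSingleton());

    public static SpeechManagerSingleton Instance => instance.Value;

    // Private constructor prevents any additional copies being created.
    private SpeechManagerSingleton() { }

    public void HandleCommand(string word)
    {
        // execute the recognised command exactly once
    }
}
```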

    Strategy Pattern

We used this as a stronger alternative to the Template Method pattern because it allows the code to be more flexible, respecting the open-closed principle. This means that future developers should just add new implementations that obey the current interfaces, instead of changing existing work, which considerably reduces maintenance costs. The pattern was applied to the algorithm that recognises different poses and to the part that chooses which sound is played, making it possible to easily change the approach if better solutions are found in the future.
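A simplified sketch of a strategy interface for pose recognition; the names and the rule inside the concrete strategy are illustrative.

```csharp
using UnityEngine;

// Strategy: each pose has its own rule set behind a common interface.
public interface IPoseStrategy
{
    bool IsCorrect(Vector3[] jointPositions);
}

// Example concrete strategy; real rules would inspect trunk, hip and knee angles.
public class StandingStraightStrategy : IPoseStrategy
{
    public bool IsCorrect(Vector3[] jointPositions)
    {
        // rule-based checks would go here
        return true;
    }
}

// Context: can be given any strategy without changing its own code (open-closed principle).
public class TherapySession
{
    private readonly IPoseStrategy strategy;
    public TherapySession(IPoseStrategy strategy) { this.strategy = strategy; }
    public bool CheckPose(Vector3[] joints) => strategy.IsCorrect(joints);
}
```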

    Adapter Pattern

The adapter was required for converting the sensor's raw data format into Unity spatial coordinates. We used existing wrapper classes, which act as the adapters, to call a specific function on the Kinect interface and change the returned result in a way that fulfilled our needs.

    Decorator Pattern

This pattern allowed us to regain more control over sounds played at runtime. The simple Unity behaviours “Play()” and “Stop()” were not enough to coordinate multiple sound sources on the same GameObject. We created a decorator class called “SoundSettings” that adds extra functions around the original methods of Unity’s AudioSource component. By doing this we avoided direct modification of existing work, which would have been impossible anyway.
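A sketch of the wrapping idea (not the project's actual SoundSettings class): the decorator keeps a reference to the AudioSource and layers extra behaviour around Play() and Stop().

```csharp
using UnityEngine;

// Wraps Unity's AudioSource and adds behaviour without modifying the component itself.
public class SoundSettings
{
    private readonly AudioSource source;

    public SoundSettings(AudioSource source) { this.source = source; }

    // Extra behaviour layered over the plain Play() call.
    public void PlayAtVolume(float volume)
    {
        source.volume = Mathf.Clamp01(volume);
        source.Play();
    }

    // Extra behaviour layered over the plain Stop() call.
    public void StopIfQuieterThan(float threshold)
    {
        if (source.volume <= threshold) source.Stop();
    }
}
```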

    Dependency Injection Pattern

TDD (test-driven development) requires loosely coupled code. We used dependency injection to avoid creating objects inside concrete classes; instead, we passed references through the constructors, making the code more flexible, reusable and testable.
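As a sketch, assuming a hypothetical IJointProvider abstraction over the sensor, the dependency is passed in through the constructor so it can be substituted in tests:

```csharp
using UnityEngine;

// Hypothetical abstraction over the sensor, so pose logic can run without hardware.
public interface IJointProvider
{
    Vector3 GetPosition(string jointName);
}

public class PoseChecker
{
    private readonly IJointProvider joints;

    // The dependency is injected rather than created inside the class.
    public PoseChecker(IJointProvider joints) { this.joints = joints; }

    public bool IsLeftKneeStraight()
    {
        Vector3 hip   = joints.GetPosition("LeftHip");
        Vector3 knee  = joints.GetPosition("LeftKnee");
        Vector3 ankle = joints.GetPosition("LeftAnkle");
        float angle = Vector3.Angle(hip - knee, ankle - knee);
        return Mathf.Abs(angle - 180f) <= 10f;   // tolerance value is illustrative
    }
}
```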

    Data Storage

Although this was not one of the main requirements, we needed a way to store the different postures that the user might want to achieve during the exercise. We used a Mongo database deployed on mLab (a cloud service for MongoDB) because of its fast and easy configuration. To connect to it we created a server file in Node.js that was deployed on Azure. For deployment we created a GitHub webhook that would push the new code to the cloud each time a push was made to the production branch.

    Key functionalities

    Gather 3D joints from the environment and map them to an Avatar

In order to track the movement of the user’s body in real time we used the Kinect device and the Kinect SDK created by Microsoft. It must be mentioned that the tracking algorithm implemented in the sensor works with face detection, which means that the patient must face the Kinect to have his limb positions monitored. Once data is received from the device, we must process it and transform it from the raw format into Unity spatial coordinates. This transformation is done using wrapper classes from the MS-SDK library we imported.
Furthermore, we filtered the converted data using the double exponential filter method described in the research section. The filtering is necessary because occlusion (misalignment of joints) can occur, meaning that Kinect gives inaccurate results compared with the real coordinates. An approximation might also be needed when a joint changes its position between frames: if no approximation is done, a sudden jump in the avatar’s position will be noticeable, making it desynchronised from the user’s movements.
After correctly configuring the above steps, we must map the retrieved information to game objects which represent the bones of the avatar. All virtual bones are arranged into a hierarchy which denotes the human skeleton.
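A rough sketch of this mapping step is shown below; it reuses the DoubleExponentialFilter sketch from the research section, and the bone ordering and the GetRawJointPosition helper are assumptions rather than the MS-SDK's actual API.

```csharp
using UnityEngine;

public class AvatarMapper : MonoBehaviour
{
    public Transform[] bones;   // avatar bones arranged to mirror the sensor's joint order
    private DoubleExponentialFilter[] filters;

    void Awake()
    {
        filters = new DoubleExponentialFilter[bones.Length];
        for (int i = 0; i < filters.Length; i++) filters[i] = new DoubleExponentialFilter();
    }

    void Update()
    {
        for (int i = 0; i < bones.Length; i++)
        {
            Vector3 raw = GetRawJointPosition(i);         // in the project this comes from the MS-SDK wrappers
            bones[i].position = filters[i].Update(raw);   // smoothed, predicted position drives the bone
        }
    }

    private Vector3 GetRawJointPosition(int jointIndex)
    {
        // Placeholder: the real implementation reads the converted Kinect data.
        return bones[jointIndex].position;
    }
}
```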

    Develop an algorithm that detects incorrect pose of the user

This requirement was quite challenging due to the numerous ways of tackling the problem; we required a unique approach that applied to a simplified version of our task. There are many machine-learning algorithms that can perceive the pose of the user, but they are computationally expensive, which is an obstacle to giving real-time feedback. To avoid this problem, we designed a rule-based system that gives feedback on the pose by considering the relation between multiple joints. For example, to determine how bent the left knee is, we calculate the angle between three points: the left hip, ankle and knee. These joints form two vectors in 3D space, which makes it easy to calculate the angle using this formula:

$\theta = \cos^{-1}\!\left(\dfrac{P \cdot Q}{|P|\,|Q|}\right)$, where $P$ and $Q$ are the vectors defined by the joint points.

    We know that the knee has a correct position when the value of the angle is inside an interval of error.
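A minimal sketch of this check in Unity (method names and the tolerance are illustrative):

```csharp
using UnityEngine;

public static class JointAngles
{
    // Angle at the knee between the knee->hip and knee->ankle vectors, in degrees.
    public static float KneeAngle(Vector3 hip, Vector3 knee, Vector3 ankle)
    {
        Vector3 p = hip - knee;
        Vector3 q = ankle - knee;
        return Vector3.Angle(p, q);   // equivalent to acos((P·Q) / (|P||Q|))
    }

    // True when the measured angle lies inside the accepted interval of error.
    public static bool WithinTolerance(float angle, float target, float tolerance)
    {
        return Mathf.Abs(angle - target) <= tolerance;
    }
}
```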

As the definition of correct joint alignment depends on the user’s preferences, we define the conditions as variables which are calculated from the predefined choice of avatar model. In our case the clients clearly defined what a correct pose means. Below is an example of what an imbalanced and a balanced skeleton look like.
Some body parts can affect the human position independently of the other limbs, as is the case for the spine base; the solution is to simply take their rotation in space. Although the conditions are distinct, our system can handle all the different approaches, because there are only a limited number of possibilities to consider.

    Transform feedback in 3D sound modulation

As our end user is visually impaired, he required audio guidance to know how to correctly align his joints. Simple verbal instructions were not enough, because they could not reflect how the user was progressing from his current state; in other words, verbal instructions could not distinguish between knees bent at an angle of 5 degrees and knees bent at 60 degrees. Instead, we took the approach of localising the joints with 3D sound. In real life, we distinguish sources of noise by how far away they are and from which direction they come. To avoid an unnecessarily large learning curve, we also allocate keywords to each joint, combining them with the 3D sound principle to create a tailored experience for the user. To accomplish an immersive experience, we used Microsoft’s HRTF spatializer (discussed in the research section) and attached a sound to each bone of the avatar; when the user moves, the sound source moves as well.

As the role of the avatar has been clearly defined, we need to create the 3D sound effects that represent the sources of noise present in the scene. Taking the head of the humanoid (the avatar) as the listener, the position of those objects can easily be deduced from the sound modulation. The patient has the feeling of being substituted in place of the avatar in the virtual world, and is therefore able to realise where his joints are localised in space.
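As a sketch of how a sound source can be attached to a bone and routed through the spatializer (clip, volume and component names are illustrative):

```csharp
using UnityEngine;

// Attach to a bone GameObject so its tone is rendered as a 3D, spatialised source.
public class JointSound : MonoBehaviour
{
    public AudioClip tone;   // the pleasant tone mapped to this joint

    void Start()
    {
        AudioSource source = gameObject.AddComponent<AudioSource>();
        source.clip = tone;
        source.loop = true;
        source.spatialBlend = 1f;   // fully 3D: volume and panning depend on the listener
        source.spatialize = true;   // route through the configured spatializer plugin (MS HRTF)
        source.Play();
    }
}
```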

Despite the solution being good, we faced numerous challenges. Firstly, the spatializer was built to imitate reality as much as possible, which means that only large distances have a noticeable effect on the sound volume. In our case, even 5 degrees of motion were extremely important and needed to be signalled. To solve this, we created mock joints. In the Unity scene, these move exponentially in relation to the user's joints on a specific axis. For example, if the left hip is not properly aligned with the rest of the body (pushed too far to the left), then there is a difference $D_x$ between the correct location and the current location on the x axis. Considering that the value of $D_x$ is constrained by the human body shape (i.e. it belongs to $[-1, 1]$), we can compute a new position for a mock hip by applying the function $\mathrm{Exp}(D_x) = e^{10 D_x}$. The newly created point has the same values on the y and z axes as the human joint, while its x value is given by the previously mentioned function. The new object then has a sound source attached to it. The schema below explains the concept:
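A small sketch of the exaggeration step for the left-hip example, following the $e^{10 D_x}$ formula above (names and the clamping are illustrative):

```csharp
using UnityEngine;

public static class MockJoints
{
    // Builds the mock hip used purely as a sound source.
    public static Vector3 ForLeftHip(Vector3 realHip, float correctX)
    {
        float dx = Mathf.Clamp(realHip.x - correctX, -1f, 1f);   // error bounded by body shape
        float mockX = Mathf.Exp(10f * dx);                       // small errors become clearly audible offsets
        return new Vector3(mockX, realHip.y, realHip.z);         // y and z track the real joint
    }
}
```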

Secondly, a wrong pose can be the result of multiple joints that need correction. In this case, we cannot play all the noises mapped to those joints at once, as this would cause a large amount of confusion for the user. Instead we choose to correct the joint that has the maximal error. However, different limbs have acceptance intervals of different ranges. We solved the issue by considering the proportions of joints and limbs and calculating the percentage error obtained from each proportion; the joint with the largest percentage error is corrected first.
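A sketch of that selection, assuming a hypothetical JointError container:

```csharp
using System.Linq;
using UnityEngine;

public struct JointError
{
    public string Name;
    public float Error;       // current deviation for this joint
    public float Tolerance;   // size of this joint's acceptance interval
}

public static class FeedbackSelector
{
    // Normalise each error by its own tolerance and correct the worst joint first.
    public static JointError Worst(JointError[] errors)
    {
        return errors.OrderByDescending(e => Mathf.Abs(e.Error) / e.Tolerance).First();
    }
}
```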

    Percentage Error
Main Menu with Voice-Activated Commands

All users should be able to interact with the application in a friendly and easy-to-use manner. Since our target audience is visually impaired people, we agreed with the client to construct a menu based on speech recognition. Our solution utilises a small grammar of several instructions combined with the speech-recognition features of Windows 10. Furthermore, the Kinect has a high-quality microphone array that allowed us to create a reliable user interface. The only problem with the speech-recognition system is its vulnerability to external sounds, which is the case for all speech engines. Below is the getVoice command.
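The project's getVoice implementation is not reproduced here; as a rough sketch of the same idea, the example below uses Unity's KeywordRecognizer (part of UnityEngine.Windows.Speech on Windows 10) with an illustrative keyword list.

```csharp
using UnityEngine;
using UnityEngine.Windows.Speech;

public class VoiceMenu : MonoBehaviour
{
    private KeywordRecognizer recognizer;
    private readonly string[] keywords = { "Calibrate", "Start", "Save", "Repeat", "Main Menu", "Exit" };

    void Start()
    {
        recognizer = new KeywordRecognizer(keywords);
        recognizer.OnPhraseRecognized += OnPhraseRecognized;
        recognizer.Start();
    }

    private void OnPhraseRecognized(PhraseRecognizedEventArgs args)
    {
        Debug.Log("Recognised: " + args.text);
        // dispatch the matching command here, e.g. via the ButtonManager
    }

    void OnDestroy()
    {
        if (recognizer != null)
        {
            recognizer.Stop();
            recognizer.Dispose();
        }
    }
}
```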

    Database and Server

We needed a database to store the possible postures that the user might want to achieve, and a server to connect it to our application. On the client side we send POST requests to the server using C# WWW forms. The Node.js part has the role of parsing the JSON received and saving it to the database if the correct form is recognised. This consists of each joint having four features: rotation, local rotation, position and local position. The local position and local rotation are just the position and rotation with respect to the parent bone.
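A rough sketch of the client-side save request using a coroutine and a WWWForm; the server URL and field name are placeholders, not the project's real endpoint.

```csharp
using System.Collections;
using UnityEngine;

public class PoseUploader : MonoBehaviour
{
    private const string ServerUrl = "https://example.azurewebsites.net/poses";   // placeholder URL

    // Sends the serialised pose (rotation, local rotation, position, local position per joint).
    public IEnumerator SendPose(string poseJson)
    {
        WWWForm form = new WWWForm();
        form.AddField("pose", poseJson);

        using (WWW request = new WWW(ServerUrl, form))
        {
            yield return request;   // wait for the server response
            if (!string.IsNullOrEmpty(request.error))
                Debug.LogError("Save failed: " + request.error);
        }
    }
}
```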

    Testing

Various testing strategies were used to ensure that functionality operated smoothly. We discuss the testing techniques below and the situations in which they were performed.

    Functional Testing

This testing strategy is used to ensure that every function of the system works in relation to the initial requirements. The main technique used is called black-box testing [16], which does not involve any source code: the test is run on the requirement's input, and the result is checked against a defined expected output.
We performed functional testing using a step-by-step procedure [16]:

1. Identify the requirements: we had done this in a previous phase of our project iteration
2. Create appropriate input data for the test cases: because the number of combinations of joint alignments is large, we had to identify which joint misalignments are the most common. We then stored their positions and provided them as input for our tests
3. Determine the output depending on how the functionality was defined: we had to consult Dr Sayenko to find out the possible ways a patient tries to correct the pose, because the order in which joints are corrected affects the output of our application. In addition, we had a clear vision of what a corrected posture would look like thanks to numerous previous meetings with our supervisors
4. Run the test: depending on whether the test was automatic or manual, we executed the test cases and obtained an output
5. Compare the output against the expected values: here we could decide whether the test was failing or not, giving us an idea of what was wrong with our software

    How we approached Testing

By now we had tested the functional requirements (described in the Requirements section) and the UI. We mainly used the black-box technique, as we were interested in how certain interfaces behave and not in how they are implemented. When we discovered a test that failed and was hard to debug, we turned to the transparent technique (white-box testing) and investigated the internal behaviour of the code. The main argument for not using it as much is that it tends to focus on implementation details rather than on the main functionality. In the near future we plan to test the application with our end user; his location in Ireland has prevented us from doing so, but we have discussed it with our clients and hope to ship the software to him or find a way to arrange a meeting.
Different companies have their own ways of defining the following test types, so we follow the ISTQB (International Software Testing Qualifications Board) definitions.

    System Testing

We had to test our application from end to end, so we performed this type of testing after finishing the requirements, unit and integration tests. In Unity and other game engines it is very hard to automate this, because you need to check interactions between objects. Unity provides a Test Runner tool, but it is appropriate only for unit and integration tests. To automate the verification of the entire app we would have had to recreate the runtime scenes, which would have been time-consuming and inefficient. Instead, we manually checked the behaviour by running through all features once. When the faults found were fixed, we repeated the cycle, gradually improving the software.

    Stress Testing

Stress testing was used to observe results for parameters near the breaking point. For this project, it was used to verify whether the system handles the POST requests from the Unity application and successfully stores the data in the database under thousands of requests per second. On the Unity part of the app, we tested how well the algorithm gives feedback on small and fast changes in pose. Each test case has four graphs: Performance, Throughput, Errors and Tests. The user load is static throughout, with a test duration of 1 minute. The performance graph indicates the average response time, while the throughput graph shows how many requests are received per second. The error and test graphs are self-explanatory.

    Post request to server hosted on Azure
    Post request to server hosted locally

On the Unity side, we tested how the program detects normal movements of the joints at a constant rate. Furthermore, we examined how the speech recognition API behaves with different voices and accents.

    Performance Testing

We carried out performance tests to see how the system behaves under the conditions we expect, as given in the specification of the application. They were used to determine the QoS (quality of service) under a large number of user requests. We created virtual users which simultaneously accessed the URL, giving us critical information such as the average response time and the number of errors generated. This increased our confidence in the scalability and reliability of the system, as it could handle a large number of requests. The results below show a total of 2923 requests received over a span of two minutes.

    URL receiving large amount of requests
    Unit Testing

We followed the TDD (test-driven development) approach to design our application: we first defined unit tests to check each behaviour and then created code to pass the tests. By doing so we developed code with fewer bugs and a better design. The tests were automated from the beginning using the NSubstitute framework [17] and Unity Test Edit Mode [18]. To test the components of the Node.js part we used Google's ARC tool to examine how it handles POST requests.
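A minimal sketch of such a test, reusing the hypothetical IJointProvider/PoseChecker from the dependency-injection sketch and the NUnit attributes used by the Unity Test Runner:

```csharp
using NSubstitute;
using NUnit.Framework;
using UnityEngine;

public class PoseCheckerTests
{
    [Test]
    public void BentKnee_IsReportedAsNotStraight()
    {
        // Substitute the joint provider so the test runs without a Kinect attached.
        var joints = Substitute.For<IJointProvider>();
        joints.GetPosition("LeftHip").Returns(new Vector3(0f, 1f, 0f));
        joints.GetPosition("LeftKnee").Returns(new Vector3(0f, 0.5f, 0.2f));
        joints.GetPosition("LeftAnkle").Returns(Vector3.zero);

        var checker = new PoseChecker(joints);

        Assert.IsFalse(checker.IsLeftKneeStraight());
    }
}
```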

Unit tests were also implemented for the Node.js file to ensure correctness and robustness. Two main libraries were used: Chai and Mocha. Chai is an assertion library which checks whether the given input matches the predefined output, following the style of test-driven development, while Mocha is a JavaScript framework allowing one to create asynchronous tests which run serially. Below are example test cases which were run on the server file, showing each test, its expected result and whether it failed or not (indicated by a green tick or a red cross).

    Integration Testing

We used integration tests to see how the Kinect module binds with the component that gives feedback based on the pose. In addition, we had to check the integration with the database. The bottom-up method was used: the lower-level components were coded first and then joined together to form bigger clusters. These combined components were tested, which allowed us to move upwards in the hierarchical structure of the program, creating a fully functional app.

    Acceptance Testing

    Since we could not reach our end user we had to test the app on our colleagues and friends.

Test ID | Test Scenario
Test 1 | Start pose correction
Test 2 | Exit the application
Test 3 | Save pose
Test 4 | Calibrate headphones
Test 5 | Repeat instructions
Test 6 | Navigate menu
Test 7 | Left hip in wrong position
Test 8 | Right knee in wrong position
Test 9 | Correct pose achieved


    Client and User Feedback

    From the Client (Jarnail Chudge)

    "From my perspective, I think you and the team, but you in particular, as the leader of your group, have done an outstanding job. You were set a very difficult and challenging project, something new, something exploratory... and not only did you engage really well and sensitively with your key client contacts Mark and Dimitry, but the way in which you absorbed the information they provided and rose to the challenge to create a working prototype has been fantastic. You handled the technical challenges extremely well... reached out with questions and concerns when you had them, made sure there was a regular stream of project updates and statuses, which from a communication point of view is really important... because it can be all too easy to bury your head in the technology and lose sight of the impact on a person’s life you are trying to have. To the great credit of you and your team, your passion and commitment has been unrelenting over the course of this project. You set yourselves a high and challenging goal, and have made tremendous progress which I think has exceeded what we thought was possible given the challenges you were dealing with "

    From the User (Mark Pollock)

“The complexities of multiple sensory impairment are difficult to understand for most people. Dolga and his team managed to appreciate those complexities and apply logical thinking to the problem. Both the concept and practical solution developed are way beyond what I expected. I believe that this solution will be a significant step along the path towards a cure for paralysis.” – Mark Pollock, Explorer & Collaboration Catalyst at the Mark Pollock Trust

    From Microsoft Supervisor (Michael Vermeersch)

    " Passion, seeking to understand, research, attention for detail, keeping us informed and engaged every step of the way, going beyond in helping us to next steps beyond the current project. I loved the outcome and I want to make sure that we can give it due attention for next steps. I can see it helping with the original use case, but also in other applications, such as vestibular rehabilitation, proprioception, convalescence,…"


    Compatibility Testing

Compatibility testing was carried out to ensure the application did not have any discrepancies when working on a different OS. As the application is only intended for the Windows platform, we decided to test it on two of the main releases, Windows 8.1 and Windows 10. The results are shown below.

Operating System | Version | Architecture | Result
Windows | 8.1 | 64-bit | Successful
Windows | 10 | 64-bit | Successful
    Responsive Design Testing

We tested our app on two main devices with different features (e.g. screen size, memory), to make sure the app responds well to these differences. The results are summarised in the table below.

Device Model | Resolution | RAM (GB) | Processor | Result
Lenovo G-50 | 1600 x 900 | 4 | Intel® Core™ i7-8700K | Successful
Dell 8080 | 2560 x 1440 | 8 | Intel® Core™ i5-8600K | Successful
Automated and Continuous Integration Testing

As previously mentioned in Unit and Integration Testing, we used NSubstitute to create automated tests. In addition, we wanted to have automatic builds. To achieve this we used Travis CI, a continuous-integration tool. We linked the GitHub project with our Travis account and created bash scripts to build and run the tests every time we pushed to GitHub. If the build or at least one test failed, a notification email with details was sent to the developer who pushed to the branch. After fixing the failed test we reran all the tests to ensure that no new bugs were introduced.

    Evaluation

    Bug Table

    Achievement Table

    Contribution Table

    Critical Evaluation

The user interface and user experience are among the most important things to bear in mind when developing any application; the interface and overall feel of the application can determine whether it is ultimately successful or not. During the course of the project there were several changes to the user interface and immersive experience, both stylistic and structural. Our iterative design led us to choose a vocal user interface specially designed for visually impaired users.

    Functionality

The user interface is what the user sees and interacts with; functionality is what the user is able to do with the application. Both are equally weighted and are important aspects to consider when developing an application. The main purpose of our application is to correct the posture of paralysed patients during therapy sessions. This was achieved by breaking down the overall functionality into several subsections. First we had to integrate the Kinect sensor with Unity and process the noisy input data. This was used alongside the pose analysis algorithms we developed - another important aspect of the application's functionality. Lastly, generating 3D sound from the output of the aforementioned algorithms was a core functionality, which was successfully integrated for a more immersive experience. Since we approached an open research problem, our solution is specific to our scenario and subject area and is therefore not perfect. Voice commands are another feature we added as an extra requirement.

    Stability

The application we developed is stable and does not crash; testing has helped ensure error-free software. Depending on the hardware of the computer used, the application can run more slowly than normal, but it remains stable. Regarding reliability, our application gives thorough, reliable feedback for pose correction. The only problem, as explained in previous sections, is occlusion: if this phenomenon appears, the output from our algorithms is not completely correct, because the input data received from the sensor is not accurate.

    Efficiency

VirtualTherapy was designed to be as efficient as possible, especially since we have to give feedback in real time. When researching potential pose recognition algorithms and 3D sound generation, we could not afford to use methods which were expensive in terms of time. Furthermore, we optimised the algorithms we used as much as possible, from both a time and a memory perspective.

    Compatibility

    Our software was designed using Microsoft technologies and is therefore targeted for their operating system. This means Windows 10 is the only OS on which the application will run. In terms of hardware, the computer running the application should have at least 4GB of RAM and a USB3 port.

    Maintainability

    We tried to develop the code following the SOLID design principles. This implies that our code is flexible, reusable and maintainable. By separating the code in multiple components we created a structure that is easy to understand. Furthermore, it is easy to change some components like the Kinect section, as technology is rapidly improving.

    Project Management

We organised our project around three major deadlines. The first was on the 12th December 2017, by which we had to complete the research phase of our project and propose a credible solution. The second point in our timeline was the 3rd of March 2018, on which date we had to present a prototype of our work. Finally, the last major due date is the 22nd of March 2018, by which a complete version of the application needs to be submitted. Along with these milestones, we created internal deadlines to ensure we would deliver the product on time. We created a Gantt chart that shows how we organised our time.

    Future Work

The project could be improved in various ways if an extra three months were given. The quality of the sounds used would be adjusted to better replicate natural audio. With regard to the avatar, we would replace the anime-style avatar with a realistic representation of a human. Although this would not benefit the end user directly, it would give the application a more professional feel. One of the main problems we faced was the obscurity caused by occlusion; with extra time we could improve the way we deal with it. One possible solution would be deep learning; however, such an algorithm would have to be developed from scratch and use RGB depth images as input. Another way we could take advantage of those three months would be if a more cost-effective sensor became available: a more efficient and accurate sensor would give us more accurate data from the user, creating a more effective application. Although not required by the client, a possible addition would be a wider variety of physiotherapeutic exercises, such as taking a step forward or backwards. This would add more functionality to the application, opening it up to a larger user base.

    Management

    User Manual

The step-by-step instructions below show how the application is run. As the user is visually impaired, the steps demonstrated are for the physiatrist/trainer to undertake.

    Getting the Application running
    1. Run the executable file called 'VirtualTherapy.exe' with the Unity logo. This will take you to the next screen.
    2. Select one of the given screen sizes and then click Play. The Unity application will start running and a blue screen with a user menu will appear
    Step 1: Calibrating the headphones
    1. When the application reaches the main scene, it will vocally walk through the options of what the user can say
    2. The visually impaired user will say "Calibrate"
    3. Headphones can now be adjusted based on which side the sound is emitting from.
    Step 1.1: Repeat Instructions
    1. If the user has forgotten which command he intends to run, or did not understand the audio, he can simply say "Repeat"
    2. The instructions will then be repeated
    Step 2: Start Pose Correction
    1. The user says "Start"
    2. A replication of the user is modelled into an avatar, with another screen in the bottom left showing the user himself.
    3. Sounds are initially played indicating to the user which sound is related to which joint
    4. The algorithm has now begun. Sounds increase in intensity if the user is distant from the correct posture. Conversely, they decrease if the current state of the user is improving.
    Step 3: Exit Application
    1. From the Start Pose screen, the user must vocally say "Main Menu" to return to the initial screen
    2. The user will now say "Exit"

    Deployment Manual

    1. Choose a computer/laptop that has a USB 3 port and a minimum of 4 GB RAM. It must also be running Windows 10.
    2. Install Kinect SDK: Click here
    3. Install Microsoft runtime speech recognition: Click here
    4. Install Kinect Language Pack: Click here
    5. To play the application, just run the VirtualTherapy.exe file
    6. If you want to open it in the development environment - Unity, choose Open Project and select the “VirtualTherapy” folder

    Gantt Chart

    References

    © All rights reserved