Virtual Therapy is a 3D audio feedback system built in Unity to help people with visual impairment and lower-back disability. It focuses on improving and correcting the user's posture through sequences of pleasant sounds.
The problem addressed is the need for automatic computer feedback in therapy sessions that lets the user understand the alignment of his joints. Our approach uses sensors to gather the user's joint positions and rule-based algorithms for pose detection, with 3D sound as an intuitive form of feedback on how to correct the pose.
We successfully developed a system that tackles this open research problem using innovative technologies. We hope it can be used globally and positively affect those who need it.
The assigned name of the project is “Cognitive rewiring through mixed reality”. The affiliated organisation is Microsoft, who require an application for their user Mark Pollock. VirtualTherapy (the name of the application) is designed so that it can be used alongside the current practice of a user correcting his or her position with the help of a physiatrist. The user interface is developed with further improvement by software engineers in mind, providing a friendly interface both for the user of the application and for the trainer/physiatrist to start and stop the program when required.
The people involved in this project, and for whom the solution is intended, are as follows: Jarnail Chudge – client and project support; Mark Pollock – client and intended user; and Dimitry Sayenko – subject matter expert. Through email communication and Skype meetings they guided us towards the correct technologies and the intended hardware for this project.
The project is designed to tackle the issue of correcting the posture of a user who is visually impaired and paralysed from the waist down. The application aims to give the user more freedom by relying only on pleasant sound cues to indicate whether they are correctly aligned.
To obtain the client requirements we relied primarily on email communication. Below we explain why we chose this method and why it was appropriate for our situation.
Reasons for choosing this method: an online questionnaire was sent to Mark's physiatrist to retrieve the specific information needed to create the required solution. This information was valuable because our project would revolve around these requirements. Based on it we could focus on certain joints of the human body rather than the whole skeleton, which would have been beyond the scope of the project.
After gaining an insight into what the project entails, we arranged a suitable date for a Skype call with our client. Questions were prepared prior to the interview, and these Skype meetings were then held every fortnight to update our client and user on our progress. We chose Skype calls because they let us demonstrate progress directly and receive immediate clarification on requirements.
ID | Requirement | Type |
---|---|---|
Must | | |
R1 | Gather 3D coordinates of joints from the environment | Functional |
R2 | Filter out inaccurate data received from the sensors | Functional |
R3 | Reconstruct the skeleton in the Unity environment and map the data received from the sensors | Non-Functional |
R4 | Consider the following parts of the body: knees, heels, toes, ankles, hips and mid-back | Non-Functional |
R5 | Develop an algorithm that recognises wrong alignment of specific joints | Non-Functional |
R6 | Transform coordinates and angles into 3D sound modulation | Functional |
R7 | Guide the user to stand correctly in relation to: Trunk / mid-back Extension; Hip Extension; Knee Extension | Functional |
Should | | |
R1 | Create an Audio User Interface – Main Menu | Functional |
R2 | Guide the user to stand correctly in relation to: Trunk / mid-back Extension; Hip Extension; Knee Extension | Functional |
Could | | |
R1 | Consider other body parts such as shoulders and chest | Non-Functional |
R2 | Save multiple target postures | Non-Functional |
R3 | Create a server for connecting to the database | Functional |
R4 | Create a MongoDB database | Non-Functional |
Use Case | Calibration of the application |
---|---|
ID | UC1 |
Brief Description | Ensures the headphones are correctly worn and 3D sound can be heard distinctly from each ear |
Primary Actors | User |
Secondary Actors | None |
Preconditions | None |
Main Flow | 1. The user runs the application on the machine. 2. The system displays the main page, with voice instructions guiding the user to select an option. 3. Calibration is chosen (vocally or by clicking). |
Postconditions | None |
Alternative Flows | None |
Use Case | Correct User's Posture |
---|---|
ID | UC2 |
Brief Description | The user moves his joints according to sounds in 3D space until he is standing correctly |
Primary Actors | User |
Secondary Actors | None |
Preconditions | The Kinect sensor must be enabled and working so that it can recognise movement. It must also be placed at an appropriate angle and height. |
Main Flow | 1. After the calibration process, the user says "Start" as indicated by the instructions. 2. The screen changes to an avatar and a smaller window showing an image of the user's joints; this information is for the trainer/physiatrist. 3. A quick run-down of the sounds is played, indicating which sound corresponds to which joint. 4. The pose correction algorithm starts and the user is guided to stand correctly. |
Postconditions | None |
Alternative Flows | Invalid query: another option is selected |
ID | UC2.1 |
Brief Description | The user exits the system |
Primary Actors | User |
Secondary Actors | System |
Preconditions | None |
Main Flow | 1. The user says "Exit". 2. The application closes. |
Use Case | Store 3D coordinates of the current position |
---|---|
ID | UC3 |
Brief Description | The system captures the current position of the user and stores it in a database |
Primary Actors | User |
Secondary Actors | System |
Preconditions | None |
Main Flow | 1. The user says "Save". 2. The rotation, local rotation, position and local position are captured for each joint. 3. A JSON document is created for each "save" and stored in a collection on a MongoDB server. |
Postconditions | None |
Alternative Flows | None |
Sensors are a key aspect of our project. We required high-quality six-degrees-of-freedom devices to obtain the position of the user's joints in 3D space; if the data received is not accurate enough, the feedback given will be faulty.
Polhemus is a promising motion tracking solution that uses a magnetic source to detect the position of smaller devices mounted on the user's joints. Data received from this sensor is delivered in real time, and the manufacturer's experiments suggest that there is no latency. Occlusion is also not a problem in this scenario, because each main joint has an individual sensor. [1]
Although this is a good solution in terms of data precision, it is very expensive and exceeds our project budget. Furthermore, it would require a considerable amount of time to integrate with software capable of producing 3D sound.
This is a full-body motion analysis system capable of giving 3D joint angles, the orientation of bones and the centre of mass [2]. The first disadvantage is its cost; 35,350 Euro is far beyond our budget. Secondly, it has a small delay when sending data over wireless communication, and using cables would not be practical for the user. Compared with Polhemus, however, it has the advantage of providing centre-of-mass information.
Our supervisor Simon suggested these sensors as they are cheap and offer reasonable results when gathering joint positions. The main obstacle we faced is that they display real-time data only through an Android application. The lack of an existing API makes it considerably harder to take a continuous flow of body-segment positions and use it in our own software for producing 3D sound. Being composed of only 6 sensors, the kit would also make it impossible for us to separately track all 20 joints that we need for calculating the user's pose [3].
Kinect is a depth sensor created by Microsoft that gives the skeleton position of a user in 3D space, using infrared light and an RGB camera. It is a cheap and robust solution, with good documentation and online support. We decided to go with the Kinect because, compared with the previous solutions, it offers the possibility of connecting with Unity and provides a good balance between cost and performance. This was vital for us, because we needed a game engine or other software capable of producing mixed reality applications. It also provides an SDK that returns the position of up to 25 joints [4]. The device has very capable audio-capture functions as well, which is important when designing a voice-command-based user interface for visually impaired people. Compared with the first solutions it does not handle occlusion as well; however, we try to reduce this impact by approximating the position of occluded joints and filtering the data.
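As a minimal sketch of how joint data can be read from the sensor each frame (assuming the Windows.Kinect namespace exposed by the Kinect v2 Unity plugin; error handling is omitted and only a few joints are shown):

```csharp
using Windows.Kinect;
using UnityEngine;

public class JointReader : MonoBehaviour
{
    private KinectSensor sensor;
    private BodyFrameReader reader;
    private Body[] bodies;

    void Start()
    {
        sensor = KinectSensor.GetDefault();
        reader = sensor.BodyFrameSource.OpenReader();
        bodies = new Body[sensor.BodyFrameSource.BodyCount];
        if (!sensor.IsOpen) sensor.Open();
    }

    void Update()
    {
        using (BodyFrame frame = reader.AcquireLatestFrame())
        {
            if (frame == null) return;
            frame.GetAndRefreshBodyData(bodies);
            foreach (Body body in bodies)
            {
                if (body == null || !body.IsTracked) continue;
                // Raw camera-space coordinates (in metres) for the joints used by the pose rules.
                CameraSpacePoint hip = body.Joints[JointType.HipLeft].Position;
                CameraSpacePoint knee = body.Joints[JointType.KneeLeft].Position;
                CameraSpacePoint ankle = body.Joints[JointType.AnkleLeft].Position;
                Debug.Log("Left knee at " + knee.X + ", " + knee.Y + ", " + knee.Z);
            }
        }
    }

    void OnDestroy()
    {
        if (reader != null) reader.Dispose();
        if (sensor != null && sensor.IsOpen) sensor.Close();
    }
}
```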
The VirtualTherapy project is a mixed reality application, so we required an editor that would facilitate the construction of such software. This can be done with several game engines, described below:
Unreal is a powerful game engine and has support for VR/AR applications. However, we have not used it because it does not offer support for integrating the Kinect. This is a major downside, since the Kinect needs wrapper functions to transform the raw data from the sensors.
Unity is a game engine with great support for virtual reality applications. Its biggest advantage is that the Kinect has plugins for Unity, making it possible to transfer data from the real world to the virtual world. We chose Unity because it has excellent built-in functions for creating immersive 3D sound feedback as well as support for an accurate depth imager. The choice was also heavily influenced by the large amount of documentation available for Unity and the support given on forums: problems could be solved more quickly because they had often been encountered before.
C# and JavaScript are the most popular Unity scripting languages, and we chose to program in C#. The reason is that we are familiar with object-oriented languages such as Java, which is similar to C#. Had we written in JavaScript, there would have been issues with script compilation for pre-made libraries, which we discuss later. Apart from this, there are no performance advantages of one language over the other.
Server Side Development
For the server part we used Node.js. Alternatives include Django, PHP and Ruby, but Node.js has several advantages over them. We decided to use Node.js partly because of previous experience, which saved time as we did not have to learn a new back-end language. It also allows the construction of real-time web applications in which both the client and the server can initiate requests. Because it is JavaScript, the documentation showed that database drivers could be connected easily. It can also handle a huge number of simultaneous requests, which means high scalability; this is important, as Microsoft would like to develop the application further in the future.
Potential Databases
MySQL is a powerful database, known for high performance and data security, and one of the best options for complex queries. However, the data must follow a fixed structure, whereas the structure of our data may change depending on the pose chosen by the user: the relationships between joints and their number will differ considerably [7]. We also have many relationships between the data, which would mean introducing many new tables and linking them together efficiently. Because of this, running queries on this form of data would be expensive.
MongoDB is advantageous for us because it allows the structure of the data to change. Each pose we save is stored as a JSON-style document, making it easier to process. This database also scales horizontally, in contrast to MySQL, which scales vertically; more traffic is handled simply by adding more servers. This is more appropriate for our type of data, so we decided to choose MongoDB [8].
OpenNI is an API used to initialise, from application code, devices that contain depth sensors compatible with PrimeSense. We need it to start the depth sensor when the Unity application runs, get data from it, and stop it on the user's command [6]. For detecting the human body and accurate joint positions from Kinect depth images, the NITE library is used.
This library allows us to map data from the Kinect to an avatar that represents the human body in the Unity world. It contains wrapper classes that perform this transformation using basic mathematical operations such as matrix multiplications. As our application needs to initialise and stop the device, it also contains the drivers necessary for OpenNI and our sensor, and it provides higher-level functions for using Kinect features. Currently there are no competing libraries that use the Kinect for skeleton tracking in Unity, hence we are limited to this one.
An alternative to the OpenNI, NITE and MS-SDK combination is OpenPose plus SMPL. OpenPose is a 2D machine-learning, real-time pose estimation library that determines up to 18 body key points [9]. It yields great accuracy and therefore deals well with occlusion. A rule-based algorithm can detect joint alignment with 2D data alone, for example detecting how bent the knees are without depth data by calculating the relative angles between joints.
However, we have not chosen this solution, because creating 3D audio feedback involves applying sound to objects in 3D space. Without the third axis it is impossible to create objects that mimic our user and apply sound to them from different directions. A solution to this problem would be SMPLify, a machine-learning tool that can create realistic 3D human models from 2D data.
A common problem with this software is its lack of integration with Unity, which is required for the augmented part of our project; no documentation is provided on how to connect the output of the library to the game engine we use. It might be considered as an alternative in the future, because SMPLify have announced that they are working on a Unity plugin. The remaining problem would then be connecting the avatar to continuous data, which cannot currently be achieved as SMPLify only works on a single image [10]. It is also worth mentioning that this library relies on heavy optimisation, suggesting that the computational cost for each image is high; computing a continuous stream of images would therefore not give real-time results, as the lag in the application would be large.
Speech Recognition
The first option is a well-known and powerful API that recognises around 100 languages. However, it runs on cloud services, which we considered a disadvantage: if no internet connection is present, the application becomes impractical for visually impaired users, as they have no way of interacting with it.
We needed a speech recognition service to implement the voice command functionality, and chose the speech SDK built into Windows 10. A disadvantage is that the user must have it installed on the local device for our app to work. We chose it because it runs directly in Windows 10 and does not require any calls over the internet, so the delay in response time is minimal. This is an important factor for a reliable user interface [11].
3D Sound Effect
The first plugin we considered can be used in many game engines and has good documentation. It also offers the possibility to customise sound effects. Its disadvantage is that it does not imitate the reflection of sounds from walls very well; in that case the virtual experience we are trying to achieve would not be immersive.
Unity already contains the Microsoft HRTF spatializer, making it easy to use and configure. We chose it because it offers functions to amplify and attenuate sound for different room sizes. Sounds can be heard from the left and right, with walls also reflecting the signals, so the user can easily locate the source of a given sound. This is a clear advantage over the Oculus technology, making it suitable for our requirements.
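A minimal sketch of how a per-joint sound source can be configured for spatialised playback in Unity is shown below (it assumes the MS HRTF Spatializer is selected in the project's audio settings; the clip and distance values are illustrative):

```csharp
using UnityEngine;

public class JointSound : MonoBehaviour
{
    public AudioClip jointClip;   // the tone assigned to this joint

    void Start()
    {
        AudioSource source = gameObject.AddComponent<AudioSource>();
        source.clip = jointClip;
        source.loop = true;
        source.spatialBlend = 1.0f;   // fully 3D positioning
        source.spatialize = true;     // route through the selected spatializer plugin
        source.minDistance = 0.1f;
        source.maxDistance = 3.0f;    // small virtual room around the avatar
        source.Play();
    }
}
```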
Pose prediction algorithms are vital for any VR/AR application that includes an avatar. Using only the current position of the user to compute the skeleton for each frame can result in desynchronisation [12]. To solve this, we considered some well-known mathematical methods.
This is a smoothing filter applied to data when sudden changes appear. It applies a single exponential function to smooth the input from the Kinect. After a period of experimentation we observed that this filter does not follow the data trend well during abrupt joint movements [13].
This method predicts joint positions using a simple linear regression function, estimating the parameters of the equation while studying the movement of the user. The weight given to older observations decreases exponentially over time, prioritising new data from the sensor. This method is preferred because, by applying the exponential function twice, the smoothed data follows the trend of the real input [12].
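For reference, one common formulation of double exponential (Holt) smoothing is shown below; the filtering literature for the Kinect uses equivalent parameterisations, so the exact constants in our filter may differ:

$$
\begin{aligned}
s_t &= \alpha x_t + (1-\alpha)\,(s_{t-1} + b_{t-1})\\
b_t &= \beta\,(s_t - s_{t-1}) + (1-\beta)\,b_{t-1}\\
\hat{x}_{t+1} &= s_t + b_t
\end{aligned}
$$

Here $x_t$ is the raw joint position, $s_t$ the smoothed position, $b_t$ the estimated trend, and $\alpha, \beta \in (0,1)$ control how strongly new sensor data is prioritised over the previous estimate.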
Our final solution consists of a Kinect v2 sensor to obtain the user's joint positions and Unity for creating the 3D sound effects. Our language of choice was C# for the local application and Node.js for the back end. We chose the MS-SDK Kinect package from the Unity Asset Store, with OpenNI to control the Kinect sensor and retrieve data from it. This library also contains wrappers for transforming the input into Unity spatial coordinates, allowing us to map it to an avatar. We also decided to use the double exponential filter to predict the continuous flow of data and reduce the effect of occlusion as much as possible, and the Microsoft HRTF spatializer to create the 3D sound effect.
As part of the HCI requirement we created sketches to visualise how our solution would work. Although the app does not provide visual interaction for the user, we made an overview of the components it might consist of. The sketches show how (before we realised a different approach would be required) the sensors would interact with movement. The initial high-level sketch shows the positions of all the sensors that would be used to track the body in 3D space, allowing us to determine from those positions whether or not the user is standing in a correct pose. The sensor representation shows how movement would be processed by the sensor and then output as useful data that we can manipulate to make the user stand correctly. Below are the two main sketches.
The sketches then allowed us to create additional architectural designs. It is important to note that this is not the way the solution is implemented; it is a visualisation of two main design iterations. The final design was created after conversing with the client and project support, and further information about it can be found in the Design section. In the designs shown, we initially had the sensors outputting their data to an Arduino, an open-source platform based on simple hardware and software, ideal for a project such as this where the amount of data collected is not too large. The Arduino would then process the data using algorithms our team created, to produce 3D sound and guide the user into a standing position. After some communication with the client, we decided to use the HoloLens as a 3D sound device to output sounds to the user.
The wireframes below show the final iteration. The sketches were a preliminary design, displaying our initial idea for approaching the project. We must stress that the application is not based on visual input, as it is a 3D sound project; hence the lack of suitable wireframes.
Below are the prototype images of our application.
Component A - User
This part of the system architecture diagram shows how the user interacts with the overall system. The headphones indicate the use of 3D sound as well as providing the immersive experience. The user here is intended to be Mark Pollock.
Component B - Kinect Sensor
This represents the Kinect v2 sensor, which retrieves data from component A (the user) using its depth sensor. It captures continuous frames of movement. It must be noted that this piece of hardware must be placed at a suitable height and distance from the user to fully capture the skeleton image.
Component C - Database Module
This module has the role of organising the bones of the avatar into serialisable data structures that can be transformed into JSON format. In addition, it has classes that handle the transmission of the JSON to the server component asynchronously.
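A sketch of the kind of serialisable structures this module can use is shown below; the field names are illustrative rather than the exact schema of our collection:

```csharp
using System;
using System.Collections.Generic;
using UnityEngine;

[Serializable]
public class JointData
{
    public string name;
    public Vector3 position;
    public Vector3 localPosition;
    public Quaternion rotation;
    public Quaternion localRotation;
}

[Serializable]
public class PoseDocument
{
    public string poseName;
    public List<JointData> joints = new List<JointData>();
}

// Example usage: serialise a captured pose before sending it to the server.
// string json = JsonUtility.ToJson(poseDocument);
```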
Component D - 3D Sound
This component represents the output of the previous module and is what the patient hears. It can be modelled separately from the other parts by choosing different sound scripts and modifying their settings in the Unity Editor.
Component E - Avatar and Therapy Module
This is the core module of our application. Its role is to analyse the posture of the avatar and give feedback based on it. Furthermore, it coordinates which sound scripts to play and when, allowing the correction of one joint at a time. In this way we avoid a mixture of different noises that would tell the user nothing useful.
Component F - Server
This part receives requests from the main application and saves the JSON documents in the database. It acts as a secure bridge between the app and the database; by running on the server it makes it difficult for a third party to access the security information about the database.
Component G - Database
We use the database to store the poses a user might want to achieve during the exercises. It contains just the positions and rotations of the user's joints in relation to the avatar's bones.
Component H - Kinect Module
This section gets the data from the sensor and transforms it into the Unity format. We also map it to our avatar so that it can mimic the patient's movements.
Observer Pattern
This pattern is used to notify observers when an event has occurred. We create loosely coupled code by simply notifying the listeners when a change has occurred, without calling specific methods. In our implementation we used an Observer pattern to update the avatar when a change is detected in the patient's position. We also needed it for classes that wait for the user's command before executing some code: once a word is recognised, the state of the subject changes and all its observers are alerted.
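A simplified sketch of this relationship is shown below; class and event names are illustrative:

```csharp
using System;

public class SpeechSubject
{
    // Observers subscribe to this event instead of being called by name.
    public event Action<string> WordRecognised;

    public void NotifyWordRecognised(string word)
    {
        if (WordRecognised != null) WordRecognised(word);
    }
}

public class MenuObserver
{
    public MenuObserver(SpeechSubject subject)
    {
        subject.WordRecognised += OnWordRecognised;
    }

    private void OnWordRecognised(string word)
    {
        // React to the command, e.g. start calibration when "Start" is heard.
    }
}
```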
Command Pattern
This pattern was used to separate the invoker of an action from the receiver. The code is separated such that the object that calls a command knows nothing about its specific implementation. A concrete example is the voice command menu: when we need to save something to the database, the code that recognises the user's voice invokes the "save pose" call on the Button Manager class, which in turn calls the execute method of the DataSender class containing the code for sending information to the server.
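A simplified sketch of this structure is shown below; the method bodies are illustrative and the class names mirror those mentioned above:

```csharp
public interface ICommand
{
    void Execute();
}

public class DataSender
{
    public void SendCurrentPose()
    {
        // POST the serialised pose to the server (see the Database and Server section).
    }
}

public class SavePoseCommand : ICommand
{
    private readonly DataSender sender;
    public SavePoseCommand(DataSender sender) { this.sender = sender; }
    public void Execute() { sender.SendCurrentPose(); }
}

public class ButtonManager
{
    private readonly ICommand savePose;
    public ButtonManager(ICommand savePose) { this.savePose = savePose; }

    // Invoked by the speech recogniser when the word "Save" is heard.
    public void OnSaveRequested() { savePose.Execute(); }
}
```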
Singleton
A singleton was used when it was unnecessary to have more than one object of the same type. For example, having multiple speech managers at runtime would mean multiple calls to the code that executes commands when the user wanted to perform an action only once. Although this design pattern has a clear scope, it is dangerous and not beneficial to overuse it; for example, it can break the SOLID principles of object-oriented design, and if not used carefully it can cause numerous bugs, especially in multithreaded applications.
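A minimal Unity-style sketch of the kind of singleton used for the speech manager, kept deliberately small because of the drawbacks above:

```csharp
using UnityEngine;

public class SpeechManager : MonoBehaviour
{
    public static SpeechManager Instance { get; private set; }

    void Awake()
    {
        // Destroy any duplicate instance so only one speech manager ever runs.
        if (Instance != null && Instance != this) { Destroy(gameObject); return; }
        Instance = this;
    }
}
```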
Strategy Pattern
We used this as a stronger alternative to the Template Method pattern because it allows the code to be more flexible and respects the open-closed principle. Future developers should simply add new implementations that obey the current interfaces, instead of changing existing work; this considerably reduces maintenance costs. The pattern was applied to the algorithm that recognises different poses and to the part that chooses which sound is played, making it possible to easily change the approach if more advanced solutions are found in the future.
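A simplified sketch of the strategy interface is shown below; the interface, class and method names are illustrative:

```csharp
using UnityEngine;

public interface IPoseRecogniser
{
    // Returns true when the tracked joints satisfy the target pose.
    bool IsPoseCorrect(Vector3[] jointPositions);
}

public class StandingPoseRecogniser : IPoseRecogniser
{
    public bool IsPoseCorrect(Vector3[] jointPositions)
    {
        // Rule-based checks on knee, hip and trunk angles would go here.
        return false;
    }
}

public class TherapySession
{
    private readonly IPoseRecogniser recogniser;

    // The concrete recogniser is injected, so new rules can be swapped in later
    // without modifying this class.
    public TherapySession(IPoseRecogniser recogniser) { this.recogniser = recogniser; }
}
```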
Adapter Pattern
The adapter was required for converting the sensor's raw data format into Unity spatial coordinates. We used existing wrapper classes, which act as adapters, to call a specific function on the Kinect interface and change the returned result into the form we needed.
Decorator Pattern
This pattern allowed us to regain more control over sounds played during runtime. The simple Unity behaviours Play() and Stop() were not enough to coordinate multiple sound sources on the same GameObject. We therefore created a decorator class called SoundSettings that adds extra functionality to the original methods of Unity's AudioSource component. In doing so we avoided direct modification of existing work, which would in any case have been impossible.
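A simplified sketch of the SoundSettings idea is shown below; the extra behaviour added on top of Play()/Stop() is illustrative:

```csharp
using UnityEngine;

public class SoundSettings
{
    private readonly AudioSource source;

    public SoundSettings(AudioSource source) { this.source = source; }

    public void Play(float volume)
    {
        source.volume = Mathf.Clamp01(volume);  // behaviour added by the decorator
        if (!source.isPlaying) source.Play();   // delegate to the wrapped component
    }

    public void Stop()
    {
        source.Stop();
    }
}
```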
Dependency Injection Pattern
TDD (test-driven development) makes use of loosely coupled code. We used dependency injection to avoid creating objects inside concrete classes; instead we passed references in through constructors, making the code more flexible, reusable and testable.
Although this was not one of the main requirements, we needed some way to store the different postures a user might want to achieve during the exercise. We used a Mongo database deployed on mLab (a cloud service for MongoDB) because of its fast and easy configuration. To connect to it we created a server file in Node.js that was deployed on Azure. For deployment we created a GitHub webhook that would push the new code to the cloud each time a push was made to the production branch.
In order to track the movement of the user's body in real time, we used the Kinect device and the Kinect SDK created by Microsoft. It must be mentioned that the tracking algorithm implemented in the sensor relies on face detection, which means the patient must face the Kinect to have his limb positions monitored. Once data is received from the device, we must process it and transform it from the raw format into Unity spatial coordinates. This transformation is done using wrapper classes from the MS-SDK library we imported.
Furthermore, we filter the converted data using the double exponential filter described in the research section. Filtering is necessary because occlusion (joints hidden from the sensor's view) can occur, meaning that the Kinect gives inaccurate results compared with the real coordinates. An approximation may also be needed when a joint changes its position between frames; without it, a sudden jump in the avatar's position becomes noticeable, making it desynchronised from the user's movements.
After correctly completing the above steps, we must map the retrieved information to game objects that represent the bones of the avatar. All virtual bones are arranged into a hierarchy which denotes the human skeleton.
This requirement was quite challenging because of the numerous ways of tackling the problem. We required an approach that applied to a simplified version of our task. There are many machine-learning algorithms that can perceive the pose of the user, but they are computationally expensive, which is an obstacle to giving real-time feedback. To avoid this problem we designed a rule-based system that gives feedback on the pose by considering the relation between multiple joints. For example, to determine how bent the left knee is, we calculate the angle between three points: the left hip, ankle and knee. These joints form two vectors in 3D space, making it easy to calculate the angle using this formula:
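The formula in question is the standard angle between two vectors, with u pointing from the knee to the hip and v from the knee to the ankle:

$$
\theta = \arccos\!\left(\frac{\vec{u}\cdot\vec{v}}{\lVert\vec{u}\rVert\,\lVert\vec{v}\rVert}\right),
\qquad \vec{u} = P_{\text{hip}} - P_{\text{knee}},
\quad \vec{v} = P_{\text{ankle}} - P_{\text{knee}}
$$

In Unity this corresponds directly to `Vector3.Angle(hip - knee, ankle - knee)`, which returns the angle in degrees.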
As our end user is visually impaired, he requires audio guidance to know how to correctly align his joints. Simple verbal instructions were not enough, because they could not reflect the user's progress relative to his current state; for example, they did not distinguish between knees bent at 5 degrees and knees bent at 60 degrees. Instead, we took the approach of localising the joints with 3D sound. In real life we can distinguish sources of noise by how far away they are and from which direction they come. To avoid an unnecessarily large learning curve, we also allocate a keyword to each joint, combining the 3D sound principle with keywords to create a tailored experience for the user.
To accomplish an immersive experience, we used Microsoft's HRTF spatializer (discussed in the research section) and attached a sound to each bone of the avatar, so that when a bone moves the sound source moves with it.
With the role of the avatar clearly defined, we need to create the 3D sound effects that represent the sources of noise present in the scene. Taking the head of the humanoid (the avatar) as the listener, the positions of those objects can easily be deduced from the sound modulation. The patient will have the feeling of being placed inside the avatar in the virtual world and will therefore be able to tell where his joints are located in space.
Despite the solution working well, we faced numerous challenges. Firstly, the spatializer was built to imitate reality as closely as possible, which means that only large changes in distance have a noticeable effect on the sound volume. In our case, even 5 degrees of motion are important and need to be signalled. To solve this we created mock joints, which in the Unity scene move exponentially in relation to the user's joints on a specific axis. For example, if the left hip is not properly aligned with the rest of the body (pushed too far to the left), there is a difference Dx between the correct location and the current location on the x axis. Since Dx is constrained by the shape of the human body (i.e. it lies in [-1, 1]), we can compute a new position for a mock hip by applying an exponential amplification function to Dx.
The newly created point has the same values on the y and z axes as the human joint, while its x value is given by that function. The new object then has a sound source attached to it. The schema below explains the concept:
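In addition to the schema, the sketch below shows one possible way such a mock joint could be computed in Unity; the gain constant and the exact exponential form are illustrative assumptions, not the project's actual function:

```csharp
using UnityEngine;

public class MockJoint : MonoBehaviour
{
    public Transform realJoint;   // the tracked hip bone on the avatar
    public float correctX;        // x position of the correctly aligned hip
    public float gain = 3.0f;     // assumed amplification factor (illustrative)

    void Update()
    {
        // Deviation on the x axis, constrained by body shape to roughly [-1, 1].
        float dx = realJoint.position.x - correctX;

        // Exaggerate small deviations exponentially so they are audible.
        float amplified = Mathf.Sign(dx) * (Mathf.Exp(gain * Mathf.Abs(dx)) - 1f);

        // Keep y and z identical to the real joint; only x is exaggerated.
        transform.position = new Vector3(correctX + amplified,
                                         realJoint.position.y,
                                         realJoint.position.z);
    }
}
```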
All users should be able to interact with the application in a friendly and easy-to-use manner. Since our target audience is visually impaired people, we agreed with the client to construct a menu based on speech recognition. Our solution utilises a small grammar of several instructions combined with the speech recognition features of Windows 10. Furthermore, the Kinect has a high-quality microphone that allowed us to create a reliable user interface. The only problem with the speech recognition system is its vulnerability to external sounds, which is the case for all speech engines. Below is the getVoice command.
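The getVoice listing itself is not reproduced here; the sketch below shows one way such a keyword grammar can be wired to Windows 10 speech recognition through Unity's KeywordRecognizer (the keyword list and handler names are illustrative):

```csharp
using UnityEngine;
using UnityEngine.Windows.Speech;

public class VoiceMenu : MonoBehaviour
{
    private KeywordRecognizer recognizer;
    private readonly string[] keywords = { "Start", "Save", "Calibrate", "Exit" };

    void Start()
    {
        recognizer = new KeywordRecognizer(keywords);
        recognizer.OnPhraseRecognized += OnPhraseRecognized;
        recognizer.Start();
    }

    private void OnPhraseRecognized(PhraseRecognizedEventArgs args)
    {
        Debug.Log("Heard: " + args.text);
        // Dispatch to the corresponding command object here.
    }

    void OnDestroy()
    {
        if (recognizer != null) recognizer.Dispose();
    }
}
```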
Database and Server
We needed a database to store the possible postures a user might want to achieve, and a server to connect it to our application. On the client side we send a POST request to the server using C# WWW forms. The Node.js part parses the received JSON and saves it to the database if the correct form is recognised: each joint has four features, rotation, local rotation, position and local position, where the local values are simply the rotation and position with respect to the parent bone.
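As an illustration, the client-side POST described above might look like the following sketch; the endpoint URL and field name are placeholders, and the legacy WWW/WWWForm API is assumed:

```csharp
using System.Collections;
using UnityEngine;

public class PosePoster : MonoBehaviour
{
    private const string Url = "https://example.azurewebsites.net/poses"; // placeholder endpoint

    public void SendPose(string poseJson)
    {
        StartCoroutine(Post(poseJson));
    }

    private IEnumerator Post(string poseJson)
    {
        WWWForm form = new WWWForm();
        form.AddField("pose", poseJson);   // the serialised joint data

        WWW www = new WWW(Url, form);      // legacy WWW form POST
        yield return www;

        if (!string.IsNullOrEmpty(www.error))
            Debug.LogError("Save failed: " + www.error);
    }
}
```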
This testing strategy is used to ensure that every function of the system works as set out in the initial requirements. The main technique used is black-box testing [16], which does not involve any source code: the test is run on an input derived from a requirement and the result is checked against a defined expected output.
We performed functional testing using a step-by-step procedure [16]:
By this point we had tested the functional requirements (described in the Requirements section) and the UI. We mainly used the black-box technique because we were interested in how certain interfaces behave, not how they are implemented. When we discovered a failing test that was hard to debug, we turned to the transparent technique (white-box testing) and investigated the internal behaviour of the code. The main argument for not using white-box testing more is that it tends to focus on implementation details rather than on the main functionality.
In the near future we plan to test the application with our end user; his location in Ireland has so far prevented us from doing so. However, we have discussed it with our clients and hope to ship the software to him or find a way to arrange a meeting.
Different companies have their own ways of defining the following test types, so we follow the ISTQB (International Software Testing Qualifications Board) definitions.
We had to test our application from end to end, so we performed this type of testing after finishing the requirements, unit and integration tests. In Unity and other game engines it is very hard to automate testing, because you need to check interactions between objects. Unity provides a Test Runner tool, but it is appropriate only for unit and integration tests. To automate the verification of the entire app we would have had to recreate the runtime scenes, which would have been time consuming and inefficient. Instead, we manually checked the behaviour by running through all features once; when the faults found were fixed, we repeated the cycle, gradually improving the software.
Stress Testing
Stress testing was used to observe how the system behaves with parameters near the breaking point. In this project it was used to verify whether the system handles the POST requests from the Unity application and successfully stores the data in the database under thousands of requests per second. On the Unity side of the app, we tested how well the algorithm gives feedback on small and fast changes in pose.
Each test case has four graphs: Performance, Throughput, Errors and Tests. The user load is static throughout, with each test lasting one minute. The performance graph indicates the average response time, while the throughput graph shows how many requests are received per second. The errors and tests graphs are self-explanatory.
On the Unity side, we tested how the program detects normal movements of the joints at a constant rate. Furthermore, we examined how the speech recognition API behaves with different voices and accents.
Performance Testing
We carried out performance tests to see how the system behaves under the conditions we expect, as stated in the specification of the application. They were used to determine the QoS (quality of service) under a large number of user requests. We created virtual users which simultaneously accessed the URL, giving us critical information such as the average response time and the number of errors generated. The tests increased our confidence in the scalability and reliability of the system, as it could handle a large number of requests. The results below show a total of 2923 requests being received over a span of two minutes.
We followed the TDD (test-driven development) approach to design our application: we first defined unit tests to check each behaviour and then wrote code to pass the tests. By doing so we developed code with fewer bugs and a better design. The tests were automated from the beginning using the NSubstitute framework [17] and Unity Test Runner in Edit Mode [18]. To test the components of the Node.js part we used Google's ARC tool to examine how they handle POST requests.
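As an illustrative example, an edit-mode unit test using NUnit and NSubstitute could look like the sketch below; the interface, class and joint names are hypothetical, not the exact ones in our code base:

```csharp
using NUnit.Framework;
using NSubstitute;
using UnityEngine;

public interface IJointProvider
{
    Vector3 GetPosition(string jointName);
}

public class KneeRule
{
    private readonly IJointProvider joints;
    public KneeRule(IJointProvider joints) { this.joints = joints; }

    public bool IsBent()
    {
        Vector3 hip = joints.GetPosition("HipLeft");
        Vector3 knee = joints.GetPosition("KneeLeft");
        Vector3 ankle = joints.GetPosition("AnkleLeft");
        float angle = Vector3.Angle(hip - knee, ankle - knee);
        return angle < 160f;   // threshold is illustrative
    }
}

public class KneeRuleTests
{
    [Test]
    public void KneeIsReportedAsBentWhenAngleIsSmall()
    {
        // The Kinect input is substituted so the rule can be checked in isolation.
        var joints = Substitute.For<IJointProvider>();
        joints.GetPosition("HipLeft").Returns(new Vector3(0f, 1.0f, 0f));
        joints.GetPosition("KneeLeft").Returns(new Vector3(0f, 0.5f, 0.2f));
        joints.GetPosition("AnkleLeft").Returns(new Vector3(0f, 0.1f, 0f));

        var rule = new KneeRule(joints);

        Assert.IsTrue(rule.IsBent());
    }
}
```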
Unit tests were also implemented on the Node.js file to ensure correctness and robustness. Two main libraries were used: Chai and Mocha. Chai is an assertion library which checks whether a given input matches the pre-defined output, following the style of test-driven development, while Mocha is a JavaScript framework that allows asynchronous tests which run serially. Below are example test cases run on the server file, showing each test, its expected result and whether it passed or failed (indicated by a green tick or red cross).
We used integration tests to see how the Kinect module binds with the component that gives feedback based on the pose. In addition, we had to check the integration with the database. The bottom-up method was used: the lower-level components were coded first and then joined together to form bigger clusters. These combined components were tested, allowing us to move upwards in the hierarchical structure of the program until a fully functional app was created.
Since we could not reach our end user, we tested the app on our colleagues and friends.
Test ID | Test Scenario |
---|---|
Test 1 | Start Pose Correction |
Test 2 | Exit the application |
Test 3 | Save Pose |
Test 4 | Calibrate Headphones |
Test 5 | Repeat Instructions |
Test 6 | Navigate Menu |
Test 7 | Left hip wrong position |
Test 8 | Right knee wrong position |
Test 9 | Pose Correct achieved |
From the Client (Jarnail Chudge)
"From my perspective, I think you and the team, but you in particular, as the leader of your group, have done an outstanding job. You were set a very difficult and challenging project, something new, something exploratory... and not only did you engage really well and sensitively with your key client contacts Mark and Dimitry, but the way in which you absorbed the information they provided and rose to the challenge to create a working prototype has been fantastic. You handled the technical challenges extremely well... reached out with questions and concerns when you had them, made sure there was a regular stream of project updates and statuses, which from a communication point of view is really important... because it can be all too easy to bury your head in the technology and lose sight of the impact on a person’s life you are trying to have. To the great credit of you and your team, your passion and commitment has been unrelenting over the course of this project. You set yourselves a high and challenging goal, and have made tremendous progress which I think has exceeded what we thought was possible given the challenges you were dealing with "
From the User (Mark Pollock)
“The complexities of multiple sensory impairment are difficult to understand for most people. Dolga and his team managed to appreciate those complexities and apply logical thinking to the problem. Both the concept and practical solution developed are way beyond what I expected. I believe that this solution will be a significant step along the path towards a cure for paralysis.” Mark Pollock – Explorer & Collaboration Catalyst at the Mark Pollock Trust
From Microsoft Supervisor (Michael Vermeersch)
" Passion, seeking to understand, research, attention for detail, keeping us informed and engaged every step of the way, going beyond in helping us to next steps beyond the current project. I loved the outcome and I want to make sure that we can give it due attention for next steps. I can see it helping with the original use case, but also in other applications, such as vestibular rehabilitation, proprioception, convalescence,…"
Compatibility testing was carried out to ensure the application did not behave differently on different operating systems. As the application is only intended for the Windows platform, we decided to test it on two of the main releases, Windows 8.1 and Windows 10. The results are shown below.
Operating System | Version | Architecture | Result |
---|---|---|---|
Windows | 8.1 | 64 bit | Successful |
Windows | 10 | 64 bit | Successful |
We tested our app on two main devices which have different features (e.g. screen size, memory). We had to make sure our app responds well to these differences. The results are summarised in the table below.
Device Model | Resolution | RAM | Processor | Result |
---|---|---|---|---|
Lenovo G-50 | 1600 x 900 | 4 GB | Intel® Core™ i7-8700K | Successful |
Dell 8080 | 2560 x 1440 | 8 GB | Intel® Core™ i5-8600K | Successful |
As previously mentioned in Unit and Integration Testing, we used NSubstitute to create automated tests. In addition, we wanted automatic builds. To achieve this we used Travis CI, a continuous integration tool: we linked the GitHub project with our Travis account and created bash scripts to build and run the tests every time we pushed to GitHub. If the build or at least one test failed, a notification email with details was sent to the developer who pushed to the branch. After fixing the failed test we reran all the tests to ensure that no new bugs were introduced.
The user interface and the user's experience are among the most important things to bear in mind when developing any application; they can determine whether the application is ultimately successful or not. During the course of the project there have been several changes to the user interface and the immersive experience, both stylistic and structural. Our iterative design led us to choose a vocal user interface specially designed for visually impaired users.
Functionality
The user interface is what the user sees and interacts with; functionality is what the user is able to do with the application. Both are equally important aspects to consider when developing an application. The main purpose of our application is to correct the posture of paralysed patients during therapy sessions. This was achieved by breaking the overall functionality into several subsections. First we had to integrate the Kinect sensor with Unity and process the noisy input data, which is then used by the pose analysis algorithms we developed, another important aspect of the application's functionality. Finally, generating 3D sounds from the output of those algorithms was a core piece of functionality, successfully integrated for a more immersive experience. Since we approached an open research problem, our solution is specific to our scenario and subject area and is therefore not perfect. Voice commands are a further feature we added as an extra requirement.
Stability
The application we developed is stable and does not crash; testing has helped ensure the software is error free. Depending on the hardware of the computer used, the application can run slower than normal, but it remains stable. Regarding reliability, our application gives thorough, reliable feedback for pose correction. The only problem, as explained in previous sections, is occlusion: if it occurs, the output of our algorithms is not completely correct, because the input data received from the sensor is inaccurate.
Efficiency
VirtualTherapy was designed to be as efficient as possible, especially since we must give feedback in real time. When researching potential pose recognition algorithms and 3D sound generation we could not afford methods that were expensive in terms of time. Furthermore, we optimised the algorithms we used as much as possible, from both a time and a memory perspective.
Compatibility
Our software was designed using Microsoft technologies and is therefore targeted at their operating system. This means Windows 10 is the only OS on which the application will run. In terms of hardware, the computer running the application should have at least 4 GB of RAM and a USB 3.0 port.
Maintainability
We tried to develop the code following the SOLID design principles, which means our code is flexible, reusable and maintainable. By separating the code into multiple components we created a structure that is easy to understand. Furthermore, it is easy to change individual components, such as the Kinect section, as the technology is rapidly improving.
Project Management
We organised our project around three major deadlines. The first was 12th December 2017, by which we had to complete the research phase of the project and propose a credible solution. The second point in our timeline was 3rd March 2018, when we had to present a prototype of our work. Finally, the last major due date is 22nd March 2018, when a complete version of the application needs to be submitted. Alongside these milestones we created internal deadlines to ensure we would deliver the product on time, and a Gantt chart that shows how we organised our time.
The project could be improved in various ways if an extra three months were given. The quality of the sounds used would be improved to replicate natural audio more closely. Regarding the avatar, we would replace the anime-style avatar with a realistic representation of a human; although this would not directly benefit the user, it would give the application a more professional feel. One of the main problems we faced was the inaccuracy caused by occlusion, and with extra time we could improve the way we deal with it. One possible solution would be deep learning; however, such an algorithm would have to be developed from scratch and use RGB-depth images as input. Another way to take advantage of those three months would be if a more cost-effective sensor became available: a more efficient and accurate sensor would give us better data from the user and therefore a more effective application. Although not required by the client, a possible addition would be a wider variety of physiotherapeutic exercises, such as taking a step forwards or backwards. This would add more functionality to the application, opening it up to a larger user base.
The step-by-step instructions below show how the application is run. As the user is visually impaired, the steps demonstrated are for the physiatrist/trainer to undertake.
Getting the Application Running