Project Background
UCL MotionInput is the first major software package launched globally for touchless computing interactions, created in response to the global COVID-19 pandemic. As lockdown forced staff and students away from campus, UCL launched a range of initiatives to make remote teaching work for everyone. In particular, UCL began conversations with the NHS, Microsoft and IBM about possible replacements for the NEWS2 protocol, a system used by the NHS to triage patients quickly, but one that requires physical touch and examination. At the same time, ML technologies such as TensorFlow, and computer vision techniques such as convolutional neural network (CNN) training models, were developing rapidly.

This led to MotionInput being proposed as a final year project in September 2020, with V1 released in January 2021, developed by two final-year BSc Computer Science students. MotionInput V1 supported two modes of interaction: a Desk mode, which allowed the user to replace mouse control by holding a pen in the air, and an Exercises mode, which allowed the user to interact through exercises such as walking on the spot. The project was then taken over by a group of MSc Computer Science students, leading to the release of MotionInput V2, which redeveloped and expanded the functionality to four modes of interaction: Hands, full Body, Eyes and Head. We are now leading the development of the third generation of MotionInput, with our primary stakeholders being Prof Dean Mohamedally, Sheena Visram and Sibghah Khan (UCL), and Sinead Tattan (GOSH/UCL), and with additional clients Lee Stott (Microsoft), John McNamara (IBM), and Rakshita Kumar, Karunya Selvaratma and Eesha Irfan (UCL).
Initial Project Objectives
From our initial project brief, we drew up a set of tentative project objectives before carrying out our research into MotionInput.
- Examine the current codebase of MotionInput v2 and develop reduced-size compiled executables for Windows, improving performance and reducing latency
- Create an apps library of MotionInput extended software, allowing access from a single application
- Produce four builds for different user profiles: low-power, regular desktop, gaming, and industry-specific users
- Work with the final-year students doing their industry projects with MotionInput to integrate new features
Project Analysis
As we were tasked with developing the next iteration of MotionInput, our requirements dictated that we analyse the current v2 codebase in order to re-engineer it. In our analysis, we documented a number of issues lying beneath the surface of v2, to ensure that our re-engineered MotionInput v3 rectified them at every level, from architecture to functionality. It is important to state that MotionInput v2 was developed as a proof of concept, demonstrating the feasibility of touchless computing with multiple modes of interaction. As a result, many of the points we make in this analysis stem from the project being designed as a proof of concept rather than as release software backed by robust, high-quality code and architecture.
Issues found
- Since each student’s individual dissertation was based on implementing a particular piece of functionality for MotionInput, each focused on coding that functionality within a single module, with little code shared between their work. Combined with differing coding styles, this resulted in an inconsistent codebase.
- The code is clunky, with overly long functions, repeated code and misapplied design patterns.
- The design of the previous architecture makes extensibility, and the addition of future requirements and functionality, more difficult. Adding new gestures, for example, would have required modifying the module source code itself, which makes it difficult for different developers to work separately.
- The implementation of the system does not allow multiple modules to be used together; only one can be selected at any time.
- There is a lack of documentation to guide future developers in extending MotionInput’s capabilities easily and quickly. Some guides exist, but they can only be found within the dissertation papers themselves.
- Compilation for developers is a complicated and convoluted process, requiring manual installation of dependencies as well as complicated manual alteration and movement of files.
- The system is not very configurable: many values are hard-coded, and the configuration files are difficult to access, read and edit.
Final Requirements
After analysing the MotionInput v2 codebase and discussing with our clients, we drew up a set of final requirements at the start of our project development phase, focused on a complete re-engineering of the old codebase:
- Design and implement an entirely new system architecture, created with developers in mind, ensuring that MotionInput is highly configurable and extensible so that use-case-specific apps can be developed efficiently on top of the architecture codebase (a sketch of this style of architecture follows this list).
- Ensure the new codebase retains all of the functionality of MotionInput v2 across all four existing modules, and also develop new functionality for those modules.
- Ensure that our architecture implementation uses good programming practices, including consistent, non-repeating code and design patterns to ease maintainability and future development.
- Allow for the simultaneous use of multiple ML libraries and technologies within a single module and the simultaneous use of multiple modules together. For example, using the full body module together with the hand module.
- With regard to image calculation, introduce frame-by-frame processing that is efficient and minimal in terms of the logic used to detect and calculate gestures, thereby increasing scalability and reducing latency.
- Create detailed and usable developer-oriented documentation, complementing the codebase and allowing new developers to quickly get up to speed, easing the process of creating and configuring gestures or modules and of compiling their code.
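To illustrate the kind of developer-facing extensibility these requirements describe, the sketch below shows a registry-based approach in which new gestures are registered against handlers and enabled purely through a JSON file, rather than by editing module source code. It is a minimal illustration only; the names, decorator and configuration schema are assumptions and not the actual MotionInput v3 API.

```python
# Hypothetical sketch only: the class, function and file names below are
# illustrative, not the actual MotionInput v3 API.
import json
from typing import Callable, Dict, List

# Registry mapping gesture names to handler callables. New gestures are
# added by registering a handler, not by editing module source code.
GESTURE_HANDLERS: Dict[str, Callable[[dict], None]] = {}

def register_gesture(name: str):
    """Decorator that registers a gesture handler under a given name."""
    def decorator(func: Callable[[dict], None]):
        GESTURE_HANDLERS[name] = func
        return func
    return decorator

@register_gesture("index_pinch")
def on_index_pinch(event: dict) -> None:
    # Placeholder handler: a real one would trigger a mouse or key event.
    print("pinch detected", event)

def load_active_gestures(config_path: str) -> List[Callable[[dict], None]]:
    """Read a JSON config listing which gestures are enabled for this mode."""
    with open(config_path) as f:
        config = json.load(f)
    return [GESTURE_HANDLERS[name] for name in config["enabled_gestures"]
            if name in GESTURE_HANDLERS]
```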
MoSCoW Project Goals
After finalising our requirements, we created a MoSCoW requirement list to inform our goals for the project. This allowed us to easily check our progress by counting how many points on the list we had successfully implemented or achieved.
Must Have
- Use multiple ML libraries (modules) at once
- Switching of events by gestures at runtime
- Increased frame-by-frame performance
- Decreased startup time
- Functionality widely configurable through JSON files alone (an example follows this list)
- Extendable code: easy to add events, gestures and handlers
- Clear documentation
- Ability to create reduced functionality compiled builds, e.g. removing an unused module and its model files
- Simple API for the frontend to use
- Reduced storage size of compiled builds
- Work with the final-year students to integrate the new features into MotionInput v3.0
- System can be closed without causing a window freeze
- Move over all Hand module functionality from v2
- Move over all Head module functionality from v2
- Move over all Eye module functionality from v2
- Move over all Body module functionality from v2
- Ensuring that elements displayed in the view for a mode can be removed and redisplayed as required when switching modes
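As a minimal illustration of the "configurable through JSON files alone" goal, the snippet below parses a hypothetical event-binding file that maps gestures to actions, so behaviour can be changed without touching Python source. The schema and key names are assumptions, not the shipped MotionInput v3 configuration format.

```python
# Illustrative only: the JSON schema and keys shown are assumptions, not the
# actual MotionInput v3 configuration format.
import json

EXAMPLE_EVENTS_JSON = """
{
  "events": [
    {"gesture": "hand_pinch", "action": "mouse_left_click"},
    {"gesture": "palm_swipe", "action": "key_press", "key": "space"}
  ]
}
"""

def parse_event_bindings(raw: str) -> dict:
    """Map gesture names to action dictionaries defined purely in JSON."""
    data = json.loads(raw)
    return {entry["gesture"]: entry for entry in data["events"]}

print(parse_event_bindings(EXAMPLE_EVENTS_JSON))
```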
Should Have
- Add functionality for an in-air virtual keyboard
- Add functionality (a module) for speech recognition
- Add functionality for gesture recording
- Add functionality for a new mode combining the extremity triggers with walking on the spot, allowing walking on the spot to trigger a key hold and the extremity triggers to change the keybind set
- Add functionality for a new "Gamepad" mode, similarly allowing triggers to be used in conjunction with walking on the spot, but with the triggers acting as gamepad buttons, optimised for gaming
- Add functionality for a new "FPS" mode, adapting the "Gamepad" mode to allow the cursor to be controlled by the hands (potentially best suited to FPS games)
- Optimise the system using multithreading
- Optimise gesture detection in each module by only performing calculations on the primitives required by the loaded events (a sketch follows this list)
- Allow the exercise detection to switch between equipment and no-equipment modes without restarting MotionInput
- Ability to detect exercises and extremity triggers simultaneously, allowing for combined modes
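The primitive-based optimisation mentioned above could, for example, look something like the following sketch: each loaded event declares the primitives it needs, and the per-frame loop only computes those. The primitive names, landmark layout and data structures here are hypothetical.

```python
# Sketch under assumptions: primitive names and the per-frame flow are
# hypothetical, illustrating "only compute what the loaded events need".
from typing import Callable, Dict, List, Set

# Each primitive (e.g. a fingertip distance) has a calculation function.
PRIMITIVE_CALCULATORS: Dict[str, Callable[[dict], float]] = {
    "index_thumb_distance": lambda lm: abs(lm["index_tip"] - lm["thumb_tip"]),
    "palm_height":          lambda lm: lm["wrist"] - lm["middle_mcp"],
}

def required_primitives(loaded_events: List[dict]) -> Set[str]:
    """Union of the primitives declared by the currently loaded events."""
    return {p for event in loaded_events for p in event["uses"]}

def process_frame(landmarks: dict, loaded_events: List[dict]) -> Dict[str, float]:
    """Compute only the primitives that at least one loaded event uses."""
    needed = required_primitives(loaded_events)
    return {name: PRIMITIVE_CALCULATORS[name](landmarks)
            for name in needed if name in PRIMITIVE_CALCULATORS}

# Example: one event that only needs the pinch distance.
events = [{"name": "pinch_click", "uses": ["index_thumb_distance"]}]
frame = {"index_tip": 0.42, "thumb_tip": 0.40, "wrist": 0.9, "middle_mcp": 0.6}
print(process_frame(frame, events))  # only index_thumb_distance is computed
```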
Could Have
- Automated compilation of the source code
- Automated creation of micro-builds, which are compiled executables of individual modes (a sketch follows this list)
- Compilation into a DLL
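Micro-build automation could plausibly be scripted around a packaging tool; the sketch below assumes PyInstaller and uses placeholder entry-point, mode and module names, so it should be read as an illustration of the idea rather than the actual build pipeline.

```python
# Illustrative micro-build script, assuming PyInstaller as the packaging tool
# (the real MotionInput build pipeline may differ). The entry point, mode
# names and excluded modules below are placeholders.
import PyInstaller.__main__
from typing import List

def build_micro_build(mode: str, excluded_modules: List[str]) -> None:
    """Compile a single-mode executable, excluding unused modules to reduce
    the size of the resulting build."""
    args = ["motioninput_main.py", "--onefile", "--noconfirm",
            f"--name=MotionInput_{mode}"]
    for module in excluded_modules:
        args.append(f"--exclude-module={module}")
    PyInstaller.__main__.run(args)

if __name__ == "__main__":
    # e.g. a hands-only build that leaves out the eye-tracking code
    build_micro_build("hands", ["eye_module"])
```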
Won't Have