Summary of Achievements

Overview of project accomplishments and implementation status

MoSCoW Requirements Achievement

For brevity, each member will be referred to by their initials.

| ID | Priority | Requirement | Status | Contributors |
|----|----------|-------------|--------|--------------|
| 1 | M | The system must have offline processing for all core functions, including speech recognition and lyric matching. | Success. Other than YouTube access, the system requires no internet connection to process speech, as the Whisper model is packaged in the executable. | YA, EC, AN |
| 2 | M | The system must have local audio preprocessing optimized for Intel hardware using OpenVINO. | Success. The OpenVINO pipeline is utilised, which improves model inference time. | YA, EC |
| 3 | M | The system must have real-time voice-to-lyric matching with support for children’s accents and varied pronunciations. | Success. The algorithm was first implemented with the Whisper model in November and has been improved upon since. | YA, EC, AN |
| 4 | M | The system must have a native Windows application optimized for Intel-based systems with NPUs. | Success. | YA, EC, AN, JW |
| 5 | M | The system must have support for YouTube link transcription, allowing teachers to select songs. | Success. | EC, AN |
| 6 | M | The system must have a high-usability interface, accessible to both neurodivergent children and non-technical users. | Success. | JW, AN |
| 7 | S | The system should allow teachers to create and save playlists for the classroom. | Success. Fully functional playlist manager with playlists stored locally. | AN |
| 8 | S | The system should have adjustable playback or difficulty settings to suit individual learners. | Success. | YA |
| 9 | S | The system should have a non-scoring mode for pressure-free learning experiences. | Success. The score may be toggled hidden or shown at any point. | AN |
| 10 | S | The system should have a scoring mode to provide feedback when desired. | Success. | YA |
| 11 | C | The system could have customizable visual themes to personalize the user experience. | Success. Background selector implemented in focus mode. | JW |
| 12 | C | The system could have a reward system (e.g., stars) to increase engagement. | Success. | YA, JW |
| 13 | C | The system could have a focus mode where distracting UI components are removed. | Success. | JW |
| 14 | W | The system won’t store the voices of users, to protect user privacy. | Success. Audio files are kept solely locally. | - |
| 15 | W | The system won’t use copyrighted songs, to ensure copyright compliance. | Success. No work-around attempted. | - |

Known Bugs

No known bugs.

Individual Contribution Distribution

System Artefacts

| Work package | Yusuf | Ediz | Anthony | Jerry |
|--------------|-------|------|---------|-------|
| Research and experiments | 30% | 25% | 30% | 15% |
| UI Design | 10% | 20% | 25% | 45% |
| Coding | 30% | 30% | 25% | 15% |
| Testing | 30% | 25% | 20% | 25% |
| Overall Contribution | 25% | 25% | 25% | 25% |

Website

| Work package | Yusuf | Ediz | Anthony | Jerry |
|--------------|-------|------|---------|-------|
| Website template and setup | 25% | 25% | 25% | 25% |
| Home | 10% | 10% | 10% | 70% |
| Video | 25% | 25% | 25% | 25% |
| Requirement | 20% | 10% | 10% | 60% |
| Research | 15% | 65% | 10% | 10% |
| Algorithm | 10% | 10% | 70% | 10% |
| UI design | 10% | 10% | 0% | 80% |
| System design | 10% | 80% | 0% | 10% |
| Implementation | 70% | 10% | 10% | 10% |
| Testing | 15% | 0% | 85% | 0% |
| Evaluation and future work | 10% | 10% | 70% | 10% |
| User and deployment manuals | 10% | 70% | 10% | 10% |
| Legal issues | 70% | 10% | 10% | 10% |
| Blog and monthly video | 50% | 15% | 15% | 20% |
| Overall contribution | 25% | 25% | 25% | 25% |

Critical Evaluation

Detailed analysis of project components and performance

User Interface / User Experience

Our app provides a smooth and streamlined user experience, and we believe we have achieved our goal of an exciting, accessible game. At its foundation, React Native has allowed us to build an intuitive and responsive application. On the main game screen, bright colours and big buttons, all within familiar app layouts, mean players need just one click to begin singing. Meanwhile, our focus mode and theme selector for performances take gameplay to a new level, allowing players to lock fully into their performance. ReadingStar also offers settings to adjust fonts, sizing, colours, and more, and scoring visibility can be toggled at any moment.

Functionality

Our aim was to provide an application that allows users of any demographic to improve their speech easily by performing karaoke with their desired song, with real-time feedback and rewards based on similarity. We achieved this, and built on it with further analysis of the voice recording from the song to score the overall performance at the end.

The Whisper model we utilised for transcription and analysis in this project is world-leading in performance and word error rate (WER) when standardised across demographics, but its limitations for live ASR unfortunately emerged in this project.

One point of note was its occasional inaccuracy for medium-length or incomplete phrases. Due to this, perfectly valid speech to the human ear would not be deemed similar enough to the lyric in real-time, and subsequently not be awarded in-game points. This is because the model adapts transcriptions based on the surrounding context, which it is starved of in the short window for the lyric matching (no more than a few seconds). Therefore, the final accuracy score presented at the end of a song is always the better (and correct) performance assessment, compared to the points score.
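To make this concrete, the real-time check can be thought of as a fuzzy string comparison over a short window. The sketch below is illustrative only: ReadingStar's actual matcher, normalisation, and scoring threshold are not shown in this report, so the function names and the 0.8 cutoff are our assumptions, using Python's standard-library difflib.

```python
from difflib import SequenceMatcher
import re

def normalise(text: str) -> str:
    # Lower-case and strip punctuation so "Twinkle," matches "twinkle".
    return re.sub(r"[^a-z0-9 ]", "", text.lower()).strip()

def lyric_matches(heard: str, lyric: str, threshold: float = 0.8) -> bool:
    # Award points when the short-window transcription is close enough
    # to the expected lyric, tolerating minor transcription drift.
    score = SequenceMatcher(None, normalise(heard), normalise(lyric)).ratio()
    return score >= threshold
```

A threshold below 1.0 is what absorbs small context-starved transcription errors; valid speech that drifts further than the threshold allows is exactly the failure mode described above.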

Stability

The project as a whole is now very stable, though at times during development it demonstrated quite volatile behaviour. For example, React Native for Windows offers a limited range of libraries, which required tricky workarounds for seemingly simple tasks such as microphone access. Furthermore, the app was prone to crashes in the early stages of UI development, usually due to errors such as failed library imports or buttons disappearing at runtime despite being present in the code. However, as we became familiar with the platform, these spontaneous errors decreased in frequency, and we are confident that the current application is robust enough to handle a vast range of events.

Efficiency

For our project, additional efficiency was provided by Intel’s OpenVINO inference pipeline. As demonstrated in the performance testing section, OpenVINO reduces power consumption at no cost to speed. This is critical for users, especially on laptops, who can still run the model quickly without worrying about excessive power usage.

| Tested Option | Average Total Power Draw per Session (W) |
|---------------|------------------------------------------|
| CPU utilised | 11.10 |
| GPU utilised | 14.42 |
| NPU utilised | 9.33 |

ReadingStar aims for low-latency inference for real-time ASR, making the NPU the most suitable inference device. The NPU excels at low-latency, energy-efficient inference, whereas the GPU is optimised for parallel, high-throughput tasks, which suits training more than real-time inference.
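This preference can be expressed as a simple fallback order when choosing an inference device. The helper below is a hypothetical sketch: OpenVINO reports devices as name strings (e.g. "NPU", "GPU", "CPU") via its runtime Core, but the function itself and its name are ours.

```python
def choose_inference_device(available: list[str]) -> str:
    # Prefer the NPU for low-latency, energy-efficient ASR inference;
    # fall back to GPU, then CPU, mirroring the trade-offs above.
    for device in ("NPU", "GPU", "CPU"):
        if device in available:
            return device
    raise RuntimeError("no supported inference device found")
```

On a machine without an NPU, the same code transparently picks the next-best device, which is how the app stays usable on a wide range of Intel hardware.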

Compatibility

We chose React Native for our UI with compatibility foremost in mind. In a world where Windows PCs are ubiquitous in households and educational institutions alike, and with the long-term aim of distribution as a Microsoft Store installable, it is imperative that the app is supported by as many of them as possible.

Most Windows PCs run on some form of Intel hardware, meaning the included OpenVINO pipeline will improve efficiency on almost every device that runs ReadingStar. Furthermore, on devices that possess an NPU, the app can run at peak performance thanks to OpenVINO’s NPU compatibility.

In the future, a mobile version could be considered, and as we already have a React Native framework it should not be difficult to port it to iOS- or Android-specific usage. This would allow for a wider range of users to access the app, such as parents of neurodivergent children or even the children themselves.

Maintainability

Throughout our project, our team strove to write quality, self-documenting code, ensuring an intuitive “pick-up-and-go” development process. Because of this effort, we are confident that future engineers can continue development without difficulty understanding the algorithms.

We also worked hard to devise an optimal workflow which was reflected in the code. This means that there are no convoluted algorithms or redundant functions to trawl through before a change is made.

Project Management

Throughout the course of our project, our team was highly interactive over every channel. Outside of the lab sessions, which we all attended consistently, we planned and discussed solutions in our WhatsApp group chat and logged our progress on GitHub with issues, descriptive commits, and thorough code reviews.

Through all these media, we would delegate or share responsibilities for upcoming tasks, holding each other accountable and working quickly to maintain steady progress. In addition, our team was very flexible in terms of roles, with each member contributing to all aspects of the project. This was particularly useful when one member was struggling with a task, as another could step in and provide assistance.

As the requirements evolved, from feedback at the BETT Conference to the comments of pupils at the Helen Allison School, our agile approach remained streamlined through the same receptive communication in the WhatsApp group chat. We also made sure to check off objectives at regular points in development and to keep the Gantt chart updated.

Future of the ReadingStar Project

Potential improvements and extensions for the project

Proposed Improvements

Music streaming service integration

The embedded YouTube player is extremely accessible and simple to use, requiring just a URL to start singing a song. However, its drawbacks unfortunately lie with YouTube’s own policies. Many songs, such as pop tracks, are blocked from the embedded player, presumably due to copyright or ad revenue status. Even when a song is available, it may be unusable in the context of ReadingStar because it has no captions, or YouTube cannot auto-generate them accurately for the full song.

As a team, we explored Spotify integration but quickly abandoned it because of the cost and restrictions placed on lower-tier access to its API. Were those constraints lifted, we would consider it an additional music source alongside YouTube. A music streaming service sync solves multiple problems for us, namely:

  • Incomplete lyrics: full and timed lyrics are available for almost every song.
  • Limited range of songs: over 70 million songs streamable on Spotify, and music videos available too.
  • Dependency on YouTube: e.g. if a song is removed from YouTube, an alternative platform exists to stream.

It could operate similarly to Genius, the music lyrics platform, which has a music player on each song page, linked to a streaming service (currently Apple Music).

Unfortunately, for our project, React Native was one of the only feasible options because we required a responsive, modern, and flexible UI, without sacrificing compatibility. We do note that…

Playlist storage and handling

Playlists are stored as atomic songs inside a JSON file with fully functional CRUD operations. However, the file is not normalised, meaning each song entry must be duplicated to add it to a new playlist. To improve this limited storage scheme, we could split the file in two, one storing songs and the other containing playlist names and entries, similar to a database. Each song would be assigned a unique key and referred to only by ID when loaded from a playlist, so no duplicated entries are required.
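A minimal sketch of the proposed normalised layout follows; all names, IDs, and fields here are hypothetical, and in practice each dictionary would live in its own JSON file.

```python
# Hypothetical normalised layout: one store for songs (keyed by ID),
# one for playlists (lists of song IDs) -- no duplicated song entries.
songs = {
    "s1": {"title": "Twinkle Twinkle", "url": "https://youtube.com/watch?v=ex1"},
    "s2": {"title": "Wheels on the Bus", "url": "https://youtube.com/watch?v=ex2"},
}
playlists = {
    "Morning class": ["s1", "s2"],
    "Warm-up": ["s1"],
}

def load_playlist(name: str) -> list[dict]:
    # Resolve song IDs to full entries; each song is stored exactly once,
    # so adding it to another playlist only appends its ID.
    return [songs[song_id] for song_id in playlists[name]]
```

Because playlists hold only IDs, editing a song's URL or title in one place updates it everywhere, which is precisely the benefit of normalisation described above.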

Moreover, a song’s YouTube URL is not shown to the user when loaded from a pre-saved playlist, which makes it difficult to add an already-stored song to a new playlist. To solve this, we could add a simple option (perhaps a checkbox in settings or near the URL input bar) to show the URL if the user so wishes.


Better visual rewards / feedback

Our current reward system is a simple star pop-up, plus points based on similarity accumulated over each performance. While this is effective, it may not be particularly engaging after multiple performances. Indeed, this was our lowest-scoring area on the feedback form, indicating that users could grow indifferent to or bored of the existing rewards after a few games. To counter this, we propose including AI-generated visuals related to the song that appear after a matched lyric, or colourful animations at the end of a song. This would be a great way to keep users engaged and motivated to continue playing, especially younger users.

Furthermore, this is another way to increase accessibility and inclusivity, as it was noted that users who have difficulty reading prefer to use images to understand the contents of the media source, in this case a song. Thus, the new visuals would allow them to keep track and sing along by associating the images with words, which in turn could help them learn to read. It also means that they would not be alienated by ReadingStar, which presumes a basic knowledge of reading and speaking English.

Extension Possibilities

Additional performance feedback

Our final score provides very accurate feedback on a user’s singing performance. In the future, we could expand on this by reporting accuracy per lyric; the per-lyric transcriptions are already stored after the Whisper model runs. We could once again use a sequence matcher to highlight the best and worst speech of the song and present them in the application with appropriate colour grading. This would allow educators and students alike to identify the most difficult pronunciations, aiding future speech practice.
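The per-lyric feedback could be sketched as below, with difflib's SequenceMatcher as an illustrative stand-in for whichever matcher is actually used; the function names and data shapes are assumptions.

```python
from difflib import SequenceMatcher

def per_lyric_accuracy(lyrics: list[str], heard: list[str]) -> list[tuple[str, float]]:
    # Score each expected lyric against what was transcribed (0.0 - 1.0).
    return [
        (lyric, SequenceMatcher(None, lyric.lower(), spoken.lower()).ratio())
        for lyric, spoken in zip(lyrics, heard)
    ]

def best_and_worst(scores: list[tuple[str, float]]):
    # Candidates for green/red colour grading on the results screen.
    ranked = sorted(scores, key=lambda pair: pair[1])
    return ranked[-1], ranked[0]
```

Sorting once and taking the extremes keeps the results-screen logic trivial, while the full ranked list could later drive a per-lyric colour gradient rather than just two highlights.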

Automatic difficulty classification by song

Note this differs from the difficulty selector already featured in-game, and would complement rather than replace it.

Throughout the course of our project, and underscored in our user acceptance testing phase, we noticed the range in difficulty of songs played. Slower songs, such as nursery rhymes, were easier to perform than high-tempo songs, such as those featured on Disney soundtracks.

One way to improve the performing UX is to calculate the speed of the song upon receipt of the lyrics, using a measure such as the average of lyric length divided by lyric interval. We would then compare this against preset thresholds to automatically classify the song as, say, starter, challenging, or ultimate difficulty. This would be presented to the user at a suitable point, such as before playing the song or next to the score graphic, and would help them select a suitable in-game difficulty scoring threshold. Furthermore, it means users can "work up" to the more difficult songs, instead of attempting them first and possibly being put off by a poor performance on a song that is currently too challenging for them.
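The classification step could be sketched as follows; the characters-per-second measure, the thresholds, and the category names are all placeholders that would need tuning against real songs.

```python
def classify_difficulty(lyrics: list[str], start_times: list[float]) -> str:
    # Speed proxy: lyric length divided by the interval to the next lyric,
    # averaged over the song (start_times has one extra entry: the end time).
    intervals = [b - a for a, b in zip(start_times, start_times[1:])]
    speeds = [len(lyric) / gap for lyric, gap in zip(lyrics, intervals)]
    average_speed = sum(speeds) / len(speeds)
    # Illustrative thresholds in characters per second.
    if average_speed < 8:
        return "starter"
    if average_speed < 14:
        return "challenging"
    return "ultimate"
```

Because the timings already arrive with the captions, this classification costs essentially nothing at song-load time.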

Leaderboard, and Multiplayer Versus Mode

A locally stored leaderboard would provide additional competitive incentive when practising with the game. We could add an arcade-style leaderboard in a JSON file, where the top 5 performances per song and difficulty are stored, and the player is prompted to add theirs if it ranks high enough. This would be a great way to encourage replayability and self-competition for the individual using ReadingStar.
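A minimal sketch of the proposed top-5 logic follows; the in-memory dictionary stands in for the JSON file, and all names are hypothetical.

```python
TOP_N = 5  # arcade-style: keep only the best five per (song, difficulty)

def submit_score(board: dict, song: str, difficulty: str,
                 player: str, score: int) -> bool:
    # Insert a performance, trim to the top five, and report whether
    # this performance made the board (i.e. whether to prompt the player).
    key = f"{song}|{difficulty}"
    entries = board.setdefault(key, [])
    entries.append({"player": player, "score": score})
    entries.sort(key=lambda entry: entry["score"], reverse=True)
    board[key] = entries[:TOP_N]
    return {"player": player, "score": score} in board[key]
```

Keying by song and difficulty keeps a starter-mode score from crowding out ultimate-mode entries; persisting would just be a json.dump of the dictionary.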

Additionally, a heavily-requested feature from our visit to the Helen Allison School was a multiplayer mode, where users could compete with each other on who sings the most accurately. This could be implemented with a multiple-microphone setup, and modifications to the back-end to record multiple performances concurrently. We would only need the existing model pipeline for transcription, as it already processes vocal input into transcribed data and could incorporate multiple inputs into its queue. A shared results screen at the end of each session could highlight individual performances and declare a winner based on overall accuracy. One point of note is that further noise reduction techniques may be necessary, particularly in environments with multiple audio sources, such as a noisy classroom.

Conclusion

Final thoughts on the project

Overall, we can undoubtedly classify our project as a success: we hit every objective set out, extended several of them, and the application is nearing a stable release to the Microsoft Store. Most importantly, feedback from users, especially at the Helen Allison School, one of the institutions most invested in the app’s development, was overwhelmingly positive. Anthony and Jerry had the wonderful opportunity to see the children who will be using ReadingStar having fun performing songs with it, and we hope this is the seed for a revolution in accessible speech therapy.