Algorithms

Description of key algorithms and their implementation

Algorithms Overview

As mentioned previously, core features of our program, such as lyric matching and the range of songs available for selection, were somewhat constrained by legal restrictions. However, our technical decisions left us well equipped to approach the task at hand.

Real-time Transcription

Real-time vocal transcription and feedback require two inputs: a stream of audio and a similarity tolerance (threshold). Recording starts when the user's chosen song starts, and the stream is then segmented into chunks at regular intervals based on the length of the lyric to be compared. These chunks are posted to the API and added to the transcription queue.
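The chunking step can be sketched roughly as follows. This is a minimal illustration rather than our exact implementation: the function name `split_by_intervals`, the 16 kHz sample rate, and the lyric timings are all assumptions made for the example, and the API call is replaced by putting chunks straight onto an in-process queue.

```python
from queue import Queue


def split_by_intervals(samples, sample_rate, intervals):
    """Cut one chunk of samples per (start_s, end_s) lyric interval."""
    chunks = []
    for start_s, end_s in intervals:
        start = int(start_s * sample_rate)
        end = int(end_s * sample_rate)
        chunks.append(samples[start:end])
    return chunks


# Hypothetical data: 3 seconds of silence at 16 kHz and two lyric lines.
sample_rate = 16_000
samples = [0] * (3 * sample_rate)
lyric_intervals = [(0.0, 1.25), (1.25, 3.0)]

chunks = split_by_intervals(samples, sample_rate, lyric_intervals)

# Stand-in for posting to the API: each chunk is queued with its lyric index
# so the transcription can later be matched against the right lyric.
transcription_queue = Queue()
for index, chunk in enumerate(chunks):
    transcription_queue.put((index, chunk))
```

Keeping the lyric index alongside each chunk means the order in which transcriptions come back does not matter when scoring.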

Asynchronously from the front end, chunks are dequeued one at a time and transcribed into written English using the Whisper model, run through an OpenVINO pipeline. The OpenVINO pipeline optimizes inference so that transcription returns almost instantly, which is essential for quick feedback. This is the cornerstone of our live transcription method and is what allows points to be awarded live.
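A minimal sketch of such an asynchronous queue is shown below, assuming a single background worker thread. The `fake_transcribe` function is a placeholder for the Whisper/OpenVINO call and is not part of either library; the names `transcription_worker`, `jobs`, and `results` are likewise hypothetical.

```python
import threading
from queue import Queue


def fake_transcribe(chunk):
    # Placeholder (assumption): the real system would run the Whisper model
    # through OpenVINO here. For the sketch, the "audio" is just encoded text.
    return chunk.decode("utf-8")


def transcription_worker(jobs, results, transcribe):
    """Dequeue chunks until a None sentinel arrives, storing each transcript."""
    while True:
        job = jobs.get()
        if job is None:  # sentinel: no more chunks will arrive
            jobs.task_done()
            break
        index, chunk = job
        results[index] = transcribe(chunk)
        jobs.task_done()


jobs = Queue()
results = {}
worker = threading.Thread(
    target=transcription_worker,
    args=(jobs, results, fake_transcribe),
    daemon=True,
)
worker.start()

# The producer (here, the main thread) keeps running while the worker transcribes.
jobs.put((0, b"first line"))
jobs.put((1, b"second line"))
jobs.put(None)

jobs.join()    # wait until every queued item has been processed
worker.join()
```

Because the producer only calls `put`, the front end never blocks on the model; it can poll `results` for the transcript of a given lyric index.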

We then compare the transcription to the actual song lyric for the given interval using a sequence matcher, which returns a similarity score between 0 and 1. If the score is above the threshold, the player is awarded points for accurate singing and a star reward appears during the song. The points awarded scale with the similarity score.
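The comparison can be illustrated with Python's standard `difflib.SequenceMatcher`, whose `ratio()` method returns a value between 0 and 1. Whether this is the exact matcher used is an assumption, as are the text normalization, the threshold of 0.6, and the 100-point maximum in this sketch.

```python
from difflib import SequenceMatcher


def similarity(transcribed, lyric):
    """Return a 0..1 similarity score, ignoring case and extra whitespace."""
    a = " ".join(transcribed.lower().split())
    b = " ".join(lyric.lower().split())
    return SequenceMatcher(None, a, b).ratio()


def points_for(score, threshold=0.6, max_points=100):
    """Award points that scale with the score, but only above the threshold."""
    if score > threshold:
        return round(score * max_points)
    return 0


# Hypothetical transcription compared against the expected lyric.
score = similarity("Hello   World", "hello world")
points = points_for(score)
```

Normalizing case and whitespace before matching keeps trivial formatting differences in the transcription from lowering the score.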

Final Rating

During the song, the chunks are stored sequentially, and when the song ends, the stored stream is saved as a WAV file. This complete vocal recording allows a more accurate rating to be computed. It is passed through the same Whisper OpenVINO pipeline, which transcribes the full audio again in just a few seconds. Finally, the transcription is compared to the full lyrics of the song using the sequence matcher, and the resulting score is shown as the "accuracy percentage" in the user interface.
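Saving the stored chunks and converting the final score can be sketched with Python's standard `wave` module. The audio format (16-bit mono PCM at 16 kHz), the file name, and the helper names `save_recording` and `accuracy_percentage` are assumptions for illustration; the transcription step itself is omitted.

```python
import os
import tempfile
import wave


def save_recording(chunks, path, sample_rate=16_000):
    """Write sequential 16-bit mono PCM chunks to a single WAV file."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)       # mono (assumed)
        wav_file.setsampwidth(2)       # 2 bytes per sample = 16-bit PCM
        wav_file.setframerate(sample_rate)
        for chunk in chunks:
            wav_file.writeframes(chunk)


def accuracy_percentage(score):
    """Convert a 0..1 similarity score into a percentage for display."""
    return round(score * 100, 1)


# Hypothetical chunks: 1 s and 0.5 s of silence as raw 16-bit samples.
stored_chunks = [b"\x00\x00" * 16_000, b"\x00\x00" * 8_000]
path = os.path.join(tempfile.mkdtemp(), "performance.wav")
save_recording(stored_chunks, path)

# Read the file back to confirm the chunks were concatenated in order.
with wave.open(path, "rb") as wav_file:
    duration_s = wav_file.getnframes() / wav_file.getframerate()
```

Writing the chunks in the order they were stored yields one continuous recording, which gives the model full context for the second, more accurate transcription.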

Key Components

Whisper speech recognition model
OpenVINO optimization pipeline
Sequence matching algorithm for text comparison
Real-time audio processing and chunking
Asynchronous transcription queue system