System Design

Designing the System

Table of Contents

Flowchart

Intended Usage

The Journey of MotionInput

System Architecture

MotionInput 3.4 Speech Configuration Mode

System Architecture Diagram

This segment of the MotionInput system applies NLP techniques to let users customize their gaming controls using voice commands. The system architecture diagram illustrates the process of converting spoken words into actionable game controls, improving both accessibility and the overall gaming experience.

User Interaction

Users interact with the system by first using the Microsoft Foundation Class (MFC) front end provided by MotionInput to select this customization mode; they can then enter their commands by speaking. These commands arrive as raw audio data that must be processed and understood by the system. The "Whisper/VOSK" component represents a voice recognition system that transcribes spoken words into text without requiring any specific syntax (i.e., users do not need to dictate commas or other separators).
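
To make the transcription step concrete, here is a minimal sketch using the openai-whisper package and a pre-recorded audio file; the model size and file name are illustrative assumptions, and the real system may instead use VOSK or a live microphone stream.

```python
# A minimal sketch of the transcription step, assuming the openai-whisper
# package and a pre-recorded audio file (both hypothetical choices here).
import whisper

model = whisper.load_model("base")              # small general-purpose model
result = model.transcribe("voice_command.wav")  # raw audio in, text out
print(result["text"])  # free-form text; the user need not dictate punctuation
```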

NLP Processing Unit

The core of the system is the NLP unit, which processes the transcribed text.

  • Similarity Matching: Within the NLP unit, the transcribed text is analyzed for similarity to known commands using the SentenceTransformer model. This model uses embeddings to calculate the cosine similarity between the user's input and a set of predefined phrases, ensuring that variations of phrases and synonyms are accurately interpreted (see the sketch after this list).
  • Named Entity Recognition (NER): This step involves recognizing specific terms and phrases related to gaming and real-life actions. It uses the spaCy NLP Model trained on custom data (TRAIN_DATA) to identify and classify entities within the transcribed text. This model has been enhanced with custom entity matchers (GAME matcher, POSES matcher, GESTURE matcher) tailored for gaming contexts.
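
The following is a minimal sketch of the similarity-matching step, using the sentence_transformers API; the base model name and the command phrases are illustrative assumptions, not the project's actual fine-tuned checkpoint or phrase set.

```python
# A minimal sketch of similarity matching, assuming the "all-MiniLM-L6-v2"
# base model and a hypothetical set of predefined command phrases.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
known_commands = ["punch", "kick", "jump", "crouch"]  # hypothetical phrase set

def best_match(user_phrase: str) -> str:
    # Embed the user's phrase and every known command.
    query = model.encode(user_phrase, convert_to_tensor=True)
    corpus = model.encode(known_commands, convert_to_tensor=True)
    # Cosine similarity between the query and each command embedding.
    scores = util.cos_sim(query, corpus)[0]
    # Return the command with the highest similarity score.
    return known_commands[int(scores.argmax())]

print(best_match("throw a punch"))  # expected to pick "punch"
```

This is how variations such as "throw a punch" can resolve to the canonical command "punch" without any exact string match.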

JSON Config Generation

After processing the voice command, the system generates a JSON configuration file. This file includes the in-game key mappings and recognized entities such as game names, orientations, and body landmarks; it is here that the NER outputs are translated into actual control configurations. The Mode Config is a byproduct of this process, possibly a mode state or profile that the user can switch to within the MotionInput system.
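
For illustration, the generated file might be produced along these lines; the schema, field names, entity values, and file name below are assumptions for the sketch, not the project's confirmed format.

```python
# A hedged sketch of JSON config generation; every field name and value
# here is a hypothetical stand-in for the real NER outputs and mappings.
import json

config = {
    "mode": "speech_custom",       # hypothetical mode/profile identifier
    "game": "Tekken",              # GAME entity recognized by NER
    "mappings": [
        {
            "pose": "left_punch",  # POSES/GESTURE entity
            "orientation": "left", # orientation entity
            "key": "a",            # in-game key binding
        }
    ],
}

with open("mode_config.json", "w") as f:
    json.dump(config, f, indent=4)
```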

Output to User

Once the JSON file is generated, it is used to reconfigure the MotionInput system, mapping voice commands to game controls. While the diagram does not show the explicit use of this JSON file, it implies that the user would then run MotionInput with the newly created configuration. The output to the user is deliberately open-ended, represented by the cloud symbol in the diagram: the user experiences the effects of the configuration through the updated behavior of the MotionInput system. Dynamic loading of the JSON configuration builds on the dissertation work of Joseph, a senior member of the MotionInput team, who integrated our component alongside his own.
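
A minimal sketch of how such a profile could be loaded back at runtime follows; the file name and lookup structure are assumptions carried over from the previous sketch, not MotionInput's actual loading code.

```python
# A minimal sketch of dynamically loading the generated profile; the
# file name and "mappings" structure are hypothetical, matching the
# earlier config sketch rather than MotionInput's real schema.
import json

def load_mode_config(path: str = "mode_config.json") -> dict:
    # Read the generated profile back in so gesture/pose handlers
    # can look up their key bindings.
    with open(path) as f:
        return json.load(f)

config = load_mode_config()
bindings = {m["pose"]: m["key"] for m in config["mappings"]}
print(bindings)  # e.g. {"left_punch": "a"}
```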

UML Diagram

Bridging the Gap: The MotionInput Journey

The Journey of MotionInput

Method Descriptions for ModelFineTuner:

  • __init__: Constructor initializing the SentenceTransformer with the specified base model and setting the output path for the fine-tuned model.
  • prepare_data: Converts a list of dictionaries containing sentence pairs and similarity scores into a DataLoader with InputExample objects.
  • fine_tune: Fine-tunes the model on the provided dataset and saves the fine-tuned model to the output path.
  • load_fine_tuned_model: Loads the fine-tuned model from the output path into the model attribute.

Relationships and Dependencies for ModelFineTuner:

  • Uses SentenceTransformer, InputExample, and losses.CosineSimilarityLoss from the sentence_transformers library.
  • Uses DataLoader from the torch.utils.data library for creating training data batches.
  • Uses pandas.DataFrame for handling and converting the input data into the format required for model training.
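
Putting the method descriptions and dependencies together, a condensed sketch of the class might look like the following; the batch size, epoch count, and input-row schema are illustrative assumptions, and the pandas.DataFrame conversion step is omitted for brevity.

```python
# A condensed sketch of ModelFineTuner based on the descriptions above;
# hyperparameters and the row schema are assumptions, not project values.
from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader

class ModelFineTuner:
    def __init__(self, base_model: str, output_path: str):
        # Initialize the base SentenceTransformer and remember where
        # the fine-tuned model should be saved.
        self.model = SentenceTransformer(base_model)
        self.output_path = output_path

    def prepare_data(self, rows: list[dict]) -> DataLoader:
        # Convert {"sentence1", "sentence2", "score"} dicts into
        # InputExample objects batched by a DataLoader.
        examples = [
            InputExample(texts=[r["sentence1"], r["sentence2"]],
                         label=float(r["score"]))
            for r in rows
        ]
        return DataLoader(examples, shuffle=True, batch_size=16)

    def fine_tune(self, rows: list[dict], epochs: int = 1) -> None:
        # Train with cosine-similarity loss, then save the result.
        loader = self.prepare_data(rows)
        loss = losses.CosineSimilarityLoss(self.model)
        self.model.fit(train_objectives=[(loader, loss)], epochs=epochs)
        self.model.save(self.output_path)

    def load_fine_tuned_model(self) -> None:
        # Reload the fine-tuned weights from disk into the model attribute.
        self.model = SentenceTransformer(self.output_path)
```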

Method Descriptions for MotionGameMapper:

  • __init__: Constructor initializing the SentenceTransformer model and spaCy NER model.
  • _similarities_match: Static method that calculates the cosine similarity between a target phrase and a list of possible phrases using embeddings.
  • motion_to_action_mapping: Maps a user motion to an in-game action using the similarity match function.
  • action_to_key_input: Static method that maps a game action to the corresponding keyboard input.
  • initialize_output_structure: Static method that initializes the data structure for the output JSON.
  • predict_to_json: Processes sentences to predict game-related actions and outputs to a JSON file.
  • _predict_without_comma: Processes input sentences to identify actions without the need for comma separation.

Relationships and Dependencies for MotionGameMapper:

  • Uses SentenceTransformer for sentence embeddings.
  • Uses spacy.Language for NER processing.
  • Uses global constants games_actions, game_key_mappings, available_gestures, and available_poses for data processing.
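
A hedged sketch of the core mapping flow follows, with stand-in tables for the games_actions and game_key_mappings constants; the real data, the fine-tuned model, and the NER and JSON-output steps are omitted or simplified here.

```python
# A hedged sketch of MotionGameMapper's mapping flow; the global tables
# below are hypothetical stand-ins for the project's real constants.
from sentence_transformers import SentenceTransformer, util

games_actions = {"Tekken": ["punch", "kick", "jump"]}                  # stand-in
game_key_mappings = {"Tekken": {"punch": "a", "kick": "s", "jump": "w"}}

class MotionGameMapper:
    def __init__(self, model_name: str = "all-MiniLM-L6-v2"):
        # The real class also loads a spaCy NER model; omitted here.
        self.model = SentenceTransformer(model_name)

    def motion_to_action_mapping(self, motion: str, game: str) -> str:
        # Pick the in-game action whose embedding is closest to the motion.
        actions = games_actions[game]
        query = self.model.encode(motion, convert_to_tensor=True)
        corpus = self.model.encode(actions, convert_to_tensor=True)
        scores = util.cos_sim(query, corpus)[0]
        return actions[int(scores.argmax())]

    @staticmethod
    def action_to_key_input(action: str, game: str) -> str:
        # Look up the keyboard input bound to this action.
        return game_key_mappings[game][action]

mapper = MotionGameMapper()
action = mapper.motion_to_action_mapping("throw a fist forward", "Tekken")
print(action, "->", MotionGameMapper.action_to_key_input(action, "Tekken"))
```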

Method Descriptions for NERTrainer:

  • __init__: Constructor that initializes the model_dir and sets up the NER model by calling _load_or_create_model and _add_entity_matchers.
  • _load_or_create_model: Private method that either loads an existing spaCy model from the specified directory or creates a new one if it doesn't exist.
  • _add_entity_matchers: Private method that adds custom entity matcher components to the NLP pipeline if they are not already added.
  • train: Public method that trains the NER model using the provided training data and iteration count. It shuffles the data, updates the model with the examples, and then saves the model using _save_model.
  • _save_model: Private method that saves the trained NER model to the disk.

Relationships and Dependencies for NERTrainer:

  • Depends on the spacy.Language class.
  • Uses spacy.training.Example for training data examples.
  • Uses spacy.matcher.Matcher and spacy.util.filter_spans for entity matching functionality.
  • Uses external data TRAIN_DATA for training purposes.
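
To tie these pieces together, here is a compact sketch of the load-or-create and training flow, assuming spaCy v3 and a tiny inline TRAIN_DATA sample; the entity labels and annotations are illustrative, and the custom entity matchers are omitted for brevity.

```python
# A compact sketch of NERTrainer's training loop, assuming spaCy v3;
# TRAIN_DATA below is a tiny hypothetical sample, not the project's data.
import random
import spacy
from spacy.training import Example

TRAIN_DATA = [
    ("open tekken and punch with the left hand",
     {"entities": [(5, 11, "GAME"), (16, 21, "POSES")]}),
]

class NERTrainer:
    def __init__(self, model_dir: str):
        self.model_dir = model_dir
        try:
            # Load an existing spaCy model from the directory if present.
            self.nlp = spacy.load(model_dir)
            self._blank = False
        except OSError:
            # Otherwise create a blank English pipeline with an NER component.
            self.nlp = spacy.blank("en")
            self.nlp.add_pipe("ner")
            self._blank = True
        # The real class also adds custom Matcher components here; omitted.

    def train(self, data, iterations: int = 20) -> None:
        # Register every entity label seen in the training data.
        ner = self.nlp.get_pipe("ner")
        for _, annotations in data:
            for _, _, label in annotations["entities"]:
                ner.add_label(label)
        # Fresh optimizer for a blank model; resume for a loaded one.
        optimizer = (self.nlp.initialize() if self._blank
                     else self.nlp.resume_training())
        for _ in range(iterations):
            random.shuffle(data)
            losses = {}
            for text, annotations in data:
                example = Example.from_dict(self.nlp.make_doc(text), annotations)
                self.nlp.update([example], sgd=optimizer, losses=losses)
        # Persist the trained model to disk.
        self.nlp.to_disk(self.model_dir)

NERTrainer("ner_model").train(TRAIN_DATA)
```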