Research

Our research enabled us to explore existing solutions and examine current codebases, which in turn helped us identify and update our requirements.

Existing Solution Review

We began by examining the current repository of the existing eye gaze module within MI v3.2. This build was primarily the work of Guari Desai, whom we also met with to clarify any points of confusion about her codebase. The basic version is built around the file eye_tracking_mediapipe.py, which she evaluated as able to “track different parts of the eye accurately”. The following is a breakdown of how the code in this file operates:

  • The code reads configuration data from a JSON file and initialises necessary modules like the webcam, FaceMesh from MediaPipe, and PyAutoGUI for cursor control.
  • It creates a small window with a red dot at the centre of the screen, prompting the user to focus on it for calibration purposes.
  • During calibration, the code tracks the movement of the user's eyes and records their average position as the default cursor centre.
  • After calibration, the code continuously captures video frames from the webcam, processes them to detect facial landmarks, and calculates the position of the user's eyes.
  • Depending on the user's settings, the code adjusts the cursor position based on the detected eye movement. It also implements features such as double-click control via eye gestures like winking or smiling.
  • The code displays the processed video frames with visualisations of the detected eye landmarks and cursor movement.
  • The program runs indefinitely until the user closes the window or exits the application.
  • Overall, this code provides a basic framework for implementing eye gaze navigation functionality using facial landmark detection and cursor control, catering to various user preferences and interaction patterns.
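The calibration and cursor-mapping steps above can be sketched as two small pure functions. This is our illustrative reconstruction, not Desai's actual code: the function names, the sensitivity constant, and the clamping scheme are assumptions, and the real script feeds the result to PyAutoGUI rather than returning it.

```python
import statistics

def calibrate(eye_positions):
    """Average the (x, y) eye positions sampled while the user fixates
    on the central red dot; the mean becomes the neutral cursor centre."""
    xs, ys = zip(*eye_positions)
    return (statistics.mean(xs), statistics.mean(ys))

def eye_to_cursor(eye_pos, centre, screen_size, sensitivity=2000):
    """Map the offset of the current eye position from the calibrated
    centre to an absolute cursor position, clamped to the screen.
    The sensitivity value is a hypothetical gain, standing in for the
    user-configurable settings read from the JSON file."""
    dx = (eye_pos[0] - centre[0]) * sensitivity
    dy = (eye_pos[1] - centre[1]) * sensitivity
    x = min(max(screen_size[0] / 2 + dx, 0), screen_size[0] - 1)
    y = min(max(screen_size[1] / 2 + dy, 0), screen_size[1] - 1)
    return (x, y)
```

In the actual script, the returned coordinates would be passed to pyautogui.moveTo() on every frame of the main loop.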

The advanced version operates across two files: gaze_tracking.py, “the main file to run to detect the gaze vector”, and gaze.py, “the code that takes landmarks and other values to return the gaze vector”.

  • Starting with 'gaze_tracking.py', this script utilises the MediaPipe library to detect facial landmarks and track the user's gaze. It initialises a camera stream and processes each frame using the MediaPipe Face Mesh model to identify relevant facial landmarks. The detected landmarks are then passed to the 'gaze.py' script for gaze estimation.
  • In 'gaze.py', the gaze estimation process begins by calculating the relative positions of facial landmarks and mapping them to 3D model points. These points are used to estimate the rotation and translation vectors, which describe the orientation and position of the user's head relative to the camera. By projecting the gaze vector onto the image plane, the script determines the direction of the user's gaze and visualises it by drawing a line from the pupil to the estimated gaze point on the screen.
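The projection step described above can be illustrated with a minimal pinhole-camera sketch. This is our own stand-in for what OpenCV's cv2.projectPoints performs inside a pipeline like this; the function names and the focal-length approximation are assumptions, not the actual contents of gaze.py.

```python
import numpy as np

def camera_matrix(frame_w, frame_h):
    """Approximate pinhole intrinsics from the frame size, a common
    shortcut when the webcam is uncalibrated: focal length roughly
    equal to the frame width, principal point at the frame centre."""
    focal = float(frame_w)
    return np.array([[focal, 0.0, frame_w / 2],
                     [0.0, focal, frame_h / 2],
                     [0.0, 0.0, 1.0]])

def project_point(point_3d, rvec, tvec, cam):
    """Project a 3D model point into pixel coordinates using a
    Rodrigues rotation vector, a translation vector, and the camera
    intrinsics - the same transform cv2.projectPoints applies."""
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        R = np.eye(3)                       # no rotation
    else:
        k = rvec / theta                    # rotation axis
        K = np.array([[0.0, -k[2], k[1]],   # cross-product matrix
                      [k[2], 0.0, -k[0]],
                      [-k[1], k[0], 0.0]])
        R = np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)
    p_cam = R @ point_3d + tvec             # model -> camera coordinates
    p_img = cam @ p_cam                     # camera -> homogeneous pixels
    return (p_img[0] / p_img[2], p_img[1] / p_img[2])
```

Once the pupil position and the projected gaze point are both in pixel coordinates, drawing the gaze line is a single cv2.line call between them.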

While the code demonstrates a functional implementation of gaze tracking, there are several areas that warrant critical evaluation and improvement:

  • Firstly, the code lacks extensive documentation and comments, making it challenging for developers to understand the underlying logic and functionality. Improved documentation would enhance code readability and facilitate easier maintenance and debugging.
  • Furthermore, the gaze estimation algorithm implemented in 'gaze.py' may not be optimised for accuracy and robustness in various lighting conditions or user environments. As a critical component of an eye gaze tracking system, the accuracy of gaze estimation is paramount for reliable user interaction. Therefore, rigorous testing and validation procedures are necessary to assess the algorithm's performance across diverse scenarios and datasets.
  • Additionally, the code could benefit from refactoring and modularisation to improve code organisation and maintainability. Breaking down the functionality into smaller, reusable components would enhance code reusability and facilitate future enhancements or modifications.

Overall, while the provided code lays the groundwork for an eye gaze tracking system, critical evaluation and refinement are essential to address potential limitations and ensure the system's effectiveness, accuracy, and usability in real-world applications.

Summary of Technical Decisions