Main System Design Diagrams




Description

The user interacts with Gesture Recorder MFC, which has various selections like, whether to record live or import video, record a gesture or pose, etc.

The front-end MFC writes user selections into backEndConfig.json file, which can be read by the gesture recorder back-end python programs.

Click the launch button to run the gesture recorder with selected options and do the recording.

When recording is done, the gesture recorder will generate raw data of recorded pose or gesture, with corresponding pictures or gifs of it. The data will be loaded by MotionInput MFC.

Pose & Gesture Recorder Architecture

Pose: a single shape that the user can make with their hand. These are trained from a single image and each frame of the camera feed is compared to the data recorded from the training image.

Gesture: a motion made by the user’s body over a time period, set to 25 frames in our current implementation. The data is trained from this series of frames to construct a motion graph, of which a buffer of an equivalent number of frames is compared to when filled.

The recorders are composed of a single class (each) that handle their respective functions. They initialize a camera and detect hand/body landmarks using the MediaPipe library. The class has several methods, including:

  • __init__: Initializes the class and sets up the camera, number of hands to detect, etc.
  • __del__: Destroys the camera and closes any open windows.
  • get_color: Returns the color of a hand landmark based on whether it is detected or not.
  • draw_landmarks: Draws the landmarks on the frame.
  • draw_info: Draws information such as FPS and detected gestures/poses on the image.
  • save: Saves a gesture/pose to be checked later.
  • load: Loads all gestures/poses from the models’ directory.

A key point regarding this design was how to compose the JSON output files. Our philosophy was that a file should be represented by the minimum data necessary. This was to reduce file size (and by extension, complexity during processing), and to allow flexibility if further fields were required later in development – by not defining them now, we allow for them to be added later.