Implementation

Mar 7, 2023

The main feature of our solution is the algorithm that calculates the angle between the camera and the user's hand. To achieve this, we represent the hand as a flat surface so that we can calculate its normal vector. A triangle is a natural shape for this because its three vertices always lie in a single plane, so it has exactly one normal direction. We build the triangle from three landmarks: the tip of the index finger, the tip of the middle finger and the tip of the pinky. This reduces the hand to a single flat surface, which makes the calculations much simpler.

After this, we calculate the normal vector of the triangle using standard 3D vector maths (for example, the cross product of two of its edges). We represent the camera's view direction as a front vector pointing forward, towards the hand. The angle between these two vectors gives us the angle of the hand to the camera.
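
The calculation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not our exact code: the function names, the choice of (0, 0, 1) as the front vector, and the use of the absolute value of the dot product (so the result stays between 0 and 90 degrees whichever way the triangle is wound) are all assumptions made for the example. The landmarks are assumed to be (x, y, z) tuples in a coordinate frame where z points away from the camera.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def triangle_normal(p0, p1, p2):
    """Normal of the triangle p0, p1, p2: cross product of two edges."""
    return _cross(_sub(p1, p0), _sub(p2, p0))

def hand_angle_degrees(index_tip, middle_tip, pinky_tip, front=(0.0, 0.0, 1.0)):
    """Angle between the hand triangle's normal and the camera front vector."""
    normal = triangle_normal(index_tip, middle_tip, pinky_tip)
    length = math.sqrt(_dot(normal, normal)) * math.sqrt(_dot(front, front))
    if length == 0.0:
        raise ValueError("degenerate triangle or zero front vector")
    # Assumption: fold the sign away so winding order does not matter.
    cos_theta = min(1.0, abs(_dot(normal, front)) / length)
    return math.degrees(math.acos(cos_theta))
```

With this sketch, a hand whose triangle lies flat in the image plane gives an angle of 0 degrees, and a hand seen exactly edge-on gives 90 degrees.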

Once the angle reaches 70 degrees, we assume the user wishes to switch to a different camera. We use the same value to choose the next camera: of all the other cameras, the one with the lowest angle to the hand (provided the hand is in its frame) is the next best camera. To find it, we sample each source in turn, one per frame, and store the calculated angle. Once every camera has been sampled, we take the index of the camera with the lowest angle and switch to it.
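
The selection step can be sketched as follows. Again, this is an illustrative sketch rather than our actual implementation: the names `should_switch` and `next_best_camera` are hypothetical, and we assume that the sampled angles are stored in a dictionary keyed by camera index, with `None` recorded for any camera that did not see the hand.

```python
from typing import Dict, Optional

SWITCH_THRESHOLD_DEG = 70.0  # threshold stated in the text

def should_switch(angle_deg: float,
                  threshold: float = SWITCH_THRESHOLD_DEG) -> bool:
    """True once the hand's angle to the current camera reaches the threshold."""
    return angle_deg >= threshold

def next_best_camera(angles: Dict[int, Optional[float]],
                     current: int) -> Optional[int]:
    """Index of the other camera with the lowest angle to the hand.

    Cameras whose sample is None (hand not in frame) are skipped.
    Returns None if no other camera saw the hand.
    """
    candidates = {
        index: angle
        for index, angle in angles.items()
        if index != current and angle is not None
    }
    if not candidates:
        return None
    return min(candidates, key=candidates.get)

# One full round of samples, one camera per frame (made-up values).
samples = {0: 75.0, 1: 40.0, 2: None, 3: 55.0}
if should_switch(samples[0]):
    target = next_best_camera(samples, current=0)  # camera 1 in this example
```

Returning `None` when no other camera sees the hand lets the caller stay on the current camera instead of switching blindly.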

Initially, the camera index and the display index were not connected to each other. To find the display that corresponds to the new camera, we mapped each display to a camera, combining displays and cameras under a single index. A camera switch now resolves directly to its matching display, which improves the accuracy of the project.
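
One way to picture this mapping is a single list of camera and display pairs, so that one index identifies both. The sketch below is only an assumption about how such a structure could look; the `Station` type and the device numbers are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Station:
    """A camera paired with the display it belongs to (hypothetical type)."""
    camera_id: int
    display_id: int

# Made-up device numbers: the list position is the shared index.
stations: List[Station] = [
    Station(camera_id=0, display_id=1),
    Station(camera_id=2, display_id=0),
]

def display_for(index: int) -> int:
    """Display that corresponds to the camera at the given shared index."""
    return stations[index].display_id
```

Because the pairing is stored in one place, the index chosen by the camera selection step can be used unchanged to look up the display.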

Overall, our solution provides an efficient and intuitive way for users to control multiple cameras using hand gestures. By simplifying the hand shape down to a flat surface and calculating its angle to the camera, we can detect when the user wants to switch to a different camera. This feature is particularly useful for video conferencing or live streaming scenarios where multiple cameras are set up in a room. Our algorithm also selects the next best camera based on its angle to the hand, which further enhances the user experience.