Research

Related Projects Review

Our research led us to platforms similar to our project in that they aimed to provide computer vision analysis for a specific sport.

Clutch

A badminton analytics application that uses computer vision and machine learning, with the goal of becoming a fully automated virtual badminton coach. [1]
Main features:
  • Recognizes which shots players are making
  • Tracks more than one player on the court
  • Determines how much “pressure” the player is under
  • Tracks the type of grip on the racket handle
As badminton is also a racket sport, we theorized that this application’s focus on outputting specific metrics would also be of interest to tennis players.

BMVC tennis analytics

A project that aims to extract high-level features from tennis match video and use them to analyze a player’s play patterns. [2]
Main features:
  • Tracks the position of the ball using a 2D data-association method
  • Calculates the 3D ball trajectory while accounting for ball physics
  • Detects players by enclosing candidate players in bounding boxes with a Faster R-CNN model
While this project differs from ours in that shot recognition is one of our main functions, it still introduced us to the useful idea of first enclosing players in bounding boxes before performing avatar extraction on them.

Zepp Tennis 2

This application is a powerful swing analyser that tracks practice and match statistics, allowing players to gain insight into their own performance. [3]
Main features:
  • Requires an external device to be attached to the racket
  • Generates clips of shots
  • Allows comparing statistics with other players
  • Allows tracking and sharing progress with others
A major difference between this application and ours is that it requires an external product to be attached to the racket, whereas ours uses computer vision to extract the necessary player data. Despite this, we took interest in the application’s focus on sharing user progress and on clipping specific shots, the latter of which we implemented.


Technology Review

While we did receive advice from our supervisors on which languages and frameworks to use, we still conducted in-depth research into what would suit us best.

Solutions

One of the key points of our project was to recognise the shots taken by tennis players, and to achieve this we needed to make use of machine learning. From the initial research we conducted, we narrowed our scope down to two models: a Convolutional Neural Network (CNN) [4] and a Spatial-Temporal Graph Convolutional Network (ST-GCN) [5][6]. One difference to note between the two is the domain they operate on; CNN models work on data in a Euclidean space, such as regular image grids, whereas GCN models work on graph-structured data, which is not Euclidean [7].
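
To make this distinction concrete, the sketch below contrasts the two input representations; the frame size, joint count and skeleton edge shown are illustrative values only, not the ones we used:

    import numpy as np

    # CNN input: a video frame lives on a regular Euclidean grid,
    # e.g. height x width x colour channels.
    frame = np.zeros((224, 224, 3), dtype=np.float32)

    # ST-GCN input: a skeleton is a graph of joints, described by
    # per-joint coordinates plus an adjacency matrix of bone connections.
    num_joints = 17                        # illustrative joint count
    joints = np.zeros((num_joints, 3))     # (x, y, z) per joint
    adjacency = np.zeros((num_joints, num_joints), dtype=np.int8)
    adjacency[0, 1] = adjacency[1, 0] = 1  # e.g. an edge between joints 0 and 1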

Through our experiments, we found the CNN to work better for us than the ST-GCN model: it gave us 85% accuracy on average, whereas the ST-GCN proved more difficult to implement. As our project had already been significantly delayed, we chose to proceed with the CNN model as it showed promising results.

Subsequently, to train our model we used the THETIS dataset, as it contains a wide variety of tennis players performing different shots. [8]
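
As a rough sketch of how such a dataset can be turned into labelled training examples, the snippet below samples a fixed number of frames from each clip; the directory layout, class subset and frame-sampling parameters are assumptions for illustration, not the actual THETIS structure or our exact preprocessing:

    import os
    import cv2
    import numpy as np

    # Hypothetical layout: one sub-folder of clips per shot class.
    DATA_DIR = "thetis_clips"
    CLASSES = ["forehand", "backhand", "serve", "volley"]  # illustrative subset

    def load_clip(path, num_frames=16, size=(64, 64)):
        """Sample a fixed number of frames from a clip and stack them."""
        cap = cv2.VideoCapture(path)
        frames = []
        while len(frames) < num_frames:
            ok, frame = cap.read()
            if not ok:
                break
            frames.append(cv2.resize(frame, size).astype(np.float32) / 255.0)
        cap.release()
        while len(frames) < num_frames:            # pad clips that are too short
            frames.append(np.zeros((*size, 3), dtype=np.float32))
        return np.stack(frames)

    X, y = [], []
    for label, cls in enumerate(CLASSES):
        for name in os.listdir(os.path.join(DATA_DIR, cls)):
            X.append(load_clip(os.path.join(DATA_DIR, cls, name)))
            y.append(label)
    X, y = np.array(X), np.array(y)                # X: (clips, frames, H, W, 3)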

Programming Languages

We chose Python for our backend as its simplicity and readability would help external developers understand the code more efficiently, as opposed to C++, which has a steeper learning curve. Moreover, Python is one of the best-supported languages for machine learning [9] and computer vision, as the frameworks and libraries we looked at, such as MediaPipe and TensorFlow, have extensive Python support. Lastly, Python has many readily available libraries that we needed, such as NumPy and SciPy for data processing.
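
As a small illustration of the kind of data processing we mean, the snippet below smooths a synthetic per-frame joint coordinate with SciPy; the Savitzky-Golay filter and its window settings are an example, not necessarily the exact processing we applied:

    import numpy as np
    from scipy.signal import savgol_filter

    # Synthetic wrist height over 100 frames, with added noise.
    frames = np.arange(100)
    wrist_y = np.sin(frames / 10.0) + np.random.normal(0, 0.05, size=100)

    # Savitzky-Golay filter: 11-frame window, cubic polynomial fit.
    smoothed = savgol_filter(wrist_y, window_length=11, polyorder=3)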

For our front end we turned to JavaScript, as it offers many frameworks and libraries we could use in our program, such as Vue.js and Three.js, the latter of which we used for our 3D reconstruction.

Frameworks

The backend of our project was split into three main parts: player extraction, shot recognition analysis, and the API that serves the analysis results.

For player data extraction, we considered two possible frameworks: MediaPipe and TensorFlow. After careful consideration we decided to use MediaPipe, as it contains ready-to-use solutions for our problem and works well with OpenCV, which we used to capture video-feed frames. The pose estimation MediaPipe performs also gave us more joint coordinates to work with, a total of 33, compared with some other frameworks that only provide around half. [11]
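
A minimal sketch of this extraction step is shown below; the input path and the MediaPipe options are illustrative placeholders rather than our exact configuration:

    import cv2
    import mediapipe as mp

    cap = cv2.VideoCapture("match.mp4")              # placeholder input clip
    pose = mp.solutions.pose.Pose(static_image_mode=False)

    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # MediaPipe expects RGB; OpenCV reads frames as BGR.
        results = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.pose_landmarks:
            # 33 normalised landmarks per frame, each with x, y, z coordinates.
            joints = [(lm.x, lm.y, lm.z) for lm in results.pose_landmarks.landmark]

    cap.release()
    pose.close()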

Regarding shot recognition, we settled on TensorFlow [10] as it is well supported in the machine learning field and allowed us to easily train our models and apply them to our program. Furthermore, its abstraction of machine learning processes made it much easier to use. [12]
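
As a minimal sketch of the kind of Keras classifier this makes straightforward to build (the layer sizes, the 16x64x64x3 clip shape and the four-class output are illustrative, not our final architecture):

    import tensorflow as tf

    # Small convolutional classifier over short clips of frames.
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(16, 64, 64, 3)),
        tf.keras.layers.Conv3D(16, kernel_size=3, activation="relu"),
        tf.keras.layers.MaxPooling3D(pool_size=2),
        tf.keras.layers.Conv3D(32, kernel_size=3, activation="relu"),
        tf.keras.layers.GlobalAveragePooling3D(),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(4, activation="softmax"),   # one output per shot class
    ])

    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

    # Training on the (X, y) arrays built earlier would then be e.g.:
    # model.fit(X, y, epochs=20, validation_split=0.2)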

Lastly, for our API we mainly considered two frameworks: Flask and Django. In the end we went with Flask due to its flexibility and simplicity, which were far more suited to the scale of our project. [13]
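
A minimal sketch of the style of Flask endpoint this gives us follows; the route name, upload handling and analyse_video helper are hypothetical placeholders for our actual API:

    from flask import Flask, jsonify, request

    app = Flask(__name__)

    def analyse_video(path):
        # Placeholder for the extraction + shot-recognition pipeline above,
        # e.g. returning [{"frame": 120, "shot": "forehand"}, ...].
        return []

    @app.route("/analyse", methods=["POST"])
    def analyse():
        # Hypothetical endpoint: accept an uploaded clip and return the
        # recognised shots as JSON.
        video = request.files["video"]
        video.save("upload.mp4")
        return jsonify({"shots": analyse_video("upload.mp4")})

    if __name__ == "__main__":
        app.run(debug=True)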

In terms of the front end of the application, we considered a few frameworks such as React.js, Vue.js and Flutter. Vue.js was chosen due to its comparatively low learning curve as well as its widely available online documentation. [14] We also chose Bootstrap, a popular CSS framework, to obtain a simple, clean UI and create a responsive website, simplifying its use on mobile phones. Three.js was selected for 3D reconstruction as it requires no extra plugins and runs natively in any browser that supports WebGL, thus providing additional support for mobile phones. [15]


Technology Decisions


Technology              Component            Decision
ML model                                     CNN
Language                Back end             Python
                        Front end            JavaScript
Back-end frameworks     Avatar extraction    MediaPipe and OpenCV
                        Shot recognition     TensorFlow
                        API                  Flask
Front-end frameworks                         Vue.js
Libraries               Python               NumPy and SciPy

References

[1] https://www.clutchapp.io/ [Accessed on 10th January 2022]
[2] https://github.com/vishaltiwari/bmvc-tennis-analytics [Accessed on 10th January 2022]
[3] http://www.zepplabs.com/en-us/tennis/ [Accessed on 10th January 2022]
[4] https://doi.org/10.48550/arXiv.1704.07595 [Accessed on 3rd January 2022]
[5] M. Skublewska-Paszkowska, P. Powroznik, and E. Lukasik, “Learning Three Dimensional Tennis Shots Using Graph Convolutional Networks,” Sensors, vol. 20, no. 21, p. 6094, Oct. 2020, doi: 10.3390/s20216094. [Accessed on 3rd January 2022]
[6] https://graphneural.network/ [Accessed on 3rd January 2022]
[7] https://medium.com/@christinatan0704/an-overview-on-spatial-temporal-graph-convolutional-networks-for-skeleton-based-action-recognition-2181c0c0a0f0 [Accessed on 3rd January 2022]
[8] Sofia Gourgari, Georgios Goudelis, Konstantinos Karpouzis and Stefanos Kollias. THETIS: THree Dimensional Tennis Shots A human action dataset. In CVPR, International workshop on Behavior Analysis in Games and modern Sensing devices, June 2013. [Accessed on 3rd January 2022]
[9] https://dev.to/imagescv/top-3-programming-languages-for-implementing-a-computer-vision-system-4jk5 [Accessed on 10th January 2022]
[10] https://www.tensorflow.org/ [Accessed on 10th January 2022]
[11] https://google.github.io/mediapipe/solutions/pose.html [Accessed on 10th January 2022]
[12] https://www.infoworld.com/article/3278008/what-is-tensorflow-the-machine-learning-library-explained.html [Accessed on 10th January 2022]
[13] https://www.interviewbit.com/blog/flask-vs-django/ [Accessed on 2nd March 2022]
[14] https://flutter.dev/ [Accessed on 2nd March 2022]
[15] https://threejs.org/docs/#manual/en/introduction/WebGL-compatibility-check [Accessed on 2nd March 2022]