Our CNN achieves approximately 90% accuracy on the testing dataset after 50 training epochs, using an 80%/20% training/validation split of the THETIS dataset. As the graph shows, the model neither over-fits nor under-fits the training data.
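For readers who want to reproduce an 80%/20% split like the one described above, the following is a minimal sketch rather than our actual data-loading code. The function name `split_train_val`, the fixed seed, and the placeholder clip identifiers are all assumptions made for illustration; the real THETIS file layout is not shown here.

```python
import random


def split_train_val(samples, val_fraction=0.2, seed=0):
    """Shuffle a copy of `samples` and split it into (train, val) lists.

    Hypothetical helper: `val_fraction` and `seed` are illustrative defaults.
    """
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be strictly between 0 and 1")
    shuffled = list(samples)               # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded for a repeatable split
    n_val = round(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


# Placeholder identifiers standing in for video clips (not real THETIS names).
clips = [f"clip_{i:03d}" for i in range(100)]
train_clips, val_clips = split_train_val(clips)
print(len(train_clips), len(val_clips))  # 80 20
```

Shuffling before splitting avoids putting all clips of one shot type or one player into the same partition when the source list is ordered. A split stratified by shot class would be a reasonable refinement.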
We show our analysis pipeline working on videos of expert tennis matches. In the following videos, we jointly demonstrate our shot interval detection heuristic (the periods when the pose skeleton is visible) as well as our shot recognition and confidence (the label and percentage in the upper left corner).
We further demonstrate in the following videos how our analysis pipeline generalises across atypical contexts for tennis play, such as practising tennis shots at home, as well as across users of different ages.
However, our shot detection heuristic sometimes misses shots entirely or detects only part of them, as the following clips show.
Furthermore, our shot recognition model performs consistently well at distinguishing between backhand and non-backhand shots, but it sometimes confuses forehand, smash, and service shots.
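One way to quantify this pattern is to compare accuracy over the individual shot classes with accuracy after collapsing every label to backhand or non-backhand. The sketch below illustrates the idea only. The helper names are hypothetical, and the labels in `y_true` and `y_pred` are invented toy values, not results from our model.

```python
from collections import Counter


def confusion_counts(true_labels, predicted_labels):
    """Count how often each (true, predicted) label pair occurs."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    return Counter(zip(true_labels, predicted_labels))


def accuracy(true_labels, predicted_labels):
    """Fraction of positions where the predicted label equals the true one."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def collapse_to_backhand(labels):
    """Map every label to 'backhand' or 'non-backhand'."""
    return ["backhand" if label == "backhand" else "non-backhand" for label in labels]


# Invented toy labels for illustration only; NOT measured results.
y_true = ["backhand", "backhand", "forehand", "forehand",
          "smash", "smash", "service", "service"]
y_pred = ["backhand", "backhand", "forehand", "smash",
          "service", "smash", "service", "forehand"]

four_way = accuracy(y_true, y_pred)
binary = accuracy(collapse_to_backhand(y_true), collapse_to_backhand(y_pred))
pairs = confusion_counts(y_true, y_pred)

print(four_way, binary)              # 0.625 1.0
print(pairs[("forehand", "smash")])  # 1
```

In this toy data every error falls among the non-backhand classes, so the collapsed accuracy is perfect while the per-class accuracy is not. The off-diagonal entries of `pairs` show which classes are being mixed up.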