Overview
My master's thesis at Vrije Universiteit Amsterdam, developed at the Sports Intelligence Lab. Body orientation at the moment a player receives the ball, and scanning checks performed before receiving, are two off-the-ball statistics that coaching staff care about. Both are still annotated manually by professional analysts. This thesis proposes a single computer vision pipeline that automates them from broadcast match footage.
Validated on a 2024/25 Women's Super League match (Brighton vs Aston Villa), with predictions compared against Vantage analyst annotations. The body orientation system reached 75% accuracy. Scan counting did not work well at broadcast resolution and is reported as a partial result; details below.

Pipeline
The pipeline chains six components, with rule-based logic wrapped around the learned detection and pose-estimation models:
- Player and ball detection. Two separate YOLOv8 models, one for players and one for the ball. Combining them into a single multi-class model was tried first and produced a model heavily biased toward the player class because of the annotation imbalance: 11,368 player annotations versus 886 ball annotations in the Roboflow Universe data.
- Multi-object tracking. Norfair (Kalman-filter-based) for identity-stable player tracks across frames, with re-identification logic to recover after occlusions.
- Automatic team selection. HSV color sampling at the player's bounding box centre plus K-means clustering into two teams. Documented failure mode: kits with white numbers on the centre stripe (e.g. ID 26 in evaluation) misclassify because the sampled pixel is white, not the team color.
- Pass detection. Rule-based: a pass event is registered when ball possession transfers between two players on the same team, using a foot-to-ball cosine distance and a 1m possession threshold. 8 of 14 ground-truth passes were detected on the evaluation footage; the misses were almost all caused by the ball being occluded by the passer's body, so the ball detector lost the ball at the moment of release.
- Body orientation. GluonCV ResNet-152 pose estimation extracts shoulder keypoints. The shoulder vector, its perpendicular (forward direction), and the angular difference to a user-set forward vector pointing at the opponent goal classify orientation as Open, Half-Open, or Closed. 75% accuracy vs analyst annotations on detected passes.
- Scanning. Euler angles from facial keypoints, with a 45 deg/sec angular velocity threshold to count rapid head movements as scans.
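
The pass-detection rule above can be illustrated with a minimal sketch. It is not the thesis implementation: the `Player` record, the assumption that foot and ball positions are already in pitch metres, and the use of plain Euclidean distance in place of the foot-to-ball measure are all simplifications made for illustration. Only the 1 m possession threshold and the same-team transfer rule come from the description.

```python
from dataclasses import dataclass
from math import hypot
from typing import Iterable, List, Optional, Sequence, Tuple

POSSESSION_RADIUS_M = 1.0  # possession threshold from the description

Point = Tuple[float, float]


@dataclass(frozen=True)
class Player:
    # Hypothetical record; field names are illustrative only.
    track_id: int
    team: int
    foot_xy: Point  # assumed: foot position in pitch metres


def possessor(players: Sequence[Player], ball_xy: Optional[Point]) -> Optional[Player]:
    """Return the nearest player within the possession radius, or None."""
    if ball_xy is None:  # ball not detected in this frame
        return None
    best, best_dist = None, POSSESSION_RADIUS_M
    for player in players:
        dist = hypot(player.foot_xy[0] - ball_xy[0], player.foot_xy[1] - ball_xy[1])
        if dist <= best_dist:
            best, best_dist = player, dist
    return best


def detect_passes(
    frames: Iterable[Tuple[Sequence[Player], Optional[Point]]]
) -> List[Tuple[int, int]]:
    """Return (from_id, to_id) for each same-team change of possession."""
    passes = []
    last = None
    for players, ball_xy in frames:
        current = possessor(players, ball_xy)
        if current is None:
            continue  # ball in flight or undetected: keep the last possessor
        if last is not None and current.track_id != last.track_id and current.team == last.team:
            passes.append((last.track_id, current.track_id))
        last = current
    return passes


squad = [Player(1, 0, (0.0, 0.0)), Player(2, 0, (10.0, 0.0)), Player(3, 1, (5.0, 5.0))]
ball_track = [(0.3, 0.0), (5.0, 0.0), (9.5, 0.0), (5.0, 4.5), None, (0.2, 0.1)]
print(detect_passes([(squad, ball) for ball in ball_track]))  # [(1, 2)]
```

In this sketch a pass can only be registered if the ball is seen near the passer before it leaves, so a detector that loses the ball at the moment of release (the failure described above) can cause the event to be missed.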
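
The body-orientation geometry can be sketched in a few lines. This is an illustration under stated assumptions, not the published method: the shoulder keypoints are assumed to be in a 2D top-down frame with the y axis pointing up, the two angle cut-offs (`OPEN_MAX_DEG`, `HALF_OPEN_MAX_DEG`) are placeholder values, and the mapping of small angles to Open is likewise assumed. The source does not state the thresholds.

```python
import math
from typing import Tuple

Point = Tuple[float, float]

# Placeholder cut-offs; the actual class boundaries are not given in the text.
OPEN_MAX_DEG = 60.0
HALF_OPEN_MAX_DEG = 120.0


def facing_direction(left_shoulder: Point, right_shoulder: Point) -> Point:
    """Perpendicular to the left-to-right shoulder vector (rotated 90 degrees CCW).

    In a top-down frame with y pointing up, this is the direction the chest faces.
    """
    sx = right_shoulder[0] - left_shoulder[0]
    sy = right_shoulder[1] - left_shoulder[1]
    if sx == 0 and sy == 0:
        raise ValueError("shoulder keypoints coincide; orientation is undefined")
    return (-sy, sx)


def angle_between_deg(u: Point, v: Point) -> float:
    """Unsigned angle between two 2D vectors, in the range [0, 180] degrees."""
    dot = u[0] * v[0] + u[1] * v[1]
    cross = u[0] * v[1] - u[1] * v[0]
    return abs(math.degrees(math.atan2(cross, dot)))


def classify_orientation(left_shoulder: Point, right_shoulder: Point,
                         goal_direction: Point) -> str:
    """Classify by the angle between the facing direction and the goal direction."""
    angle = angle_between_deg(facing_direction(left_shoulder, right_shoulder),
                              goal_direction)
    if angle <= OPEN_MAX_DEG:
        return "Open"
    if angle <= HALF_OPEN_MAX_DEG:
        return "Half-Open"
    return "Closed"


goal = (0.0, 1.0)  # user-set forward vector toward the opponent goal
print(classify_orientation((-1.0, 0.0), (1.0, 0.0), goal))  # Open
print(classify_orientation((0.0, 1.0), (0.0, -1.0), goal))  # Half-Open
print(classify_orientation((1.0, 0.0), (-1.0, 0.0), goal))  # Closed
```

Using `atan2` on the cross and dot products avoids the domain errors that `acos` can hit when floating-point rounding pushes a normalised dot product slightly outside [-1, 1].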



Results

| Component | Result | Notes |
| :-- | :-- | :-- |
| Player + ball detection | Two-model pipeline | Required by 12.8x annotation imbalance |
| Pass detection | 8 / 14 events recovered | Ball occlusion at the moment of release was the dominant failure |
| Body orientation | 75% accuracy | Validated on Brighton vs Aston Villa, WSL 2024/25 |
| Scan counting | Partial / not viable at broadcast resolution | No perfectly matching counts; most estimates off by more than 3 scans |

The honest finding on scan counting: at the resolution of broadcast football footage, facial keypoints are not reliable enough for the Euler-angle approach to work. The thesis treats this as a negative result and proposes higher resolution capture or hybrid head-pose / optical-flow approaches as the next step.
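
To make the angular-velocity rule concrete, here is a minimal sketch of threshold-based scan counting. It assumes a single yaw angle per frame (in degrees, without wrap-around) and groups consecutive above-threshold frames into one rapid movement. Both choices are simplifications for illustration, and the 25 fps frame rate in the example is an assumed value; only the 45 deg/sec threshold comes from the description.

```python
from typing import Sequence

SCAN_THRESHOLD_DEG_PER_S = 45.0  # angular velocity threshold from the description


def count_scans(yaw_deg: Sequence[float], fps: float,
                threshold: float = SCAN_THRESHOLD_DEG_PER_S) -> int:
    """Count runs of consecutive frames whose yaw velocity exceeds the threshold."""
    scans = 0
    in_scan = False
    for prev, curr in zip(yaw_deg, yaw_deg[1:]):
        velocity = abs(curr - prev) * fps  # degrees per second
        fast = velocity > threshold
        if fast and not in_scan:
            scans += 1  # a new rapid head movement starts here
        in_scan = fast
    return scans


FPS = 25.0  # assumed frame rate for this example

# Synthetic data: one deliberate head turn of 15 degrees over three frames.
head_turn = [0, 0, 0, 5, 10, 15, 15, 15]
print(count_scans(head_turn, FPS))  # 1

# Synthetic data: a still head whose yaw estimate jitters by 3 degrees.
jitter = [0, 3, 3, 0, 0, 3, 3]
print(count_scans(jitter, FPS))  # 3
```

At 25 fps the threshold corresponds to only 1.8 degrees of change between consecutive frames, so in this toy setup a small frame-to-frame error in the yaw estimate is enough to register as a scan. That is one plausible way to see why unreliable facial keypoints make the counts drift.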
Publication
The body orientation portion of this work was published at ACM MMSports '25 in Dublin (October 2025) as Semi-Automatic Estimation of Body Orientation in Football, with Mauricio Verano Merino and Elixabete Sarasola Nieto.
Stack
PyTorch, OpenCV, YOLOv8, GluonCV (MXNet, ResNet-152), Norfair, Roboflow Universe data, Python.


