Bird's-Eye View Transformation System
Bachelor's thesis: developing a robust BEV dataset for autonomous driving by combining drone footage, object detection, and semantic segmentation.
Overview
My bachelor's thesis at the Universiteit van Amsterdam, completed for Saivvy, a company building a phone app that gives cyclists a bird's-eye view of their surroundings by transforming ground-view camera footage into a top-down perspective. The project focused on creating a robust BEV dataset from drone and side-view cameras, then training detection and segmentation models on this real-world data.
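The ground-to-top-down transformation described above is typically done with a planar homography estimated from point correspondences (e.g. road markings visible in both the ground-view frame and the drone frame). The thesis code is not shown here; the sketch below is a minimal NumPy implementation of the standard direct linear transform (DLT), with hypothetical function names, assuming at least four correspondences on the road plane.

```python
import numpy as np

def estimate_homography(src, dst):
    """Estimate the 3x3 homography mapping src points to dst points (DLT).

    src, dst: (N, 2) point correspondences with N >= 4, e.g. road
    markings located in both the ground-view and the top-down image.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    # The homography is the null vector of A: the right singular
    # vector belonging to the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]

def warp_point(H, pt):
    """Apply homography H to a single (x, y) image point."""
    u, v, w = H @ np.array([pt[0], pt[1], 1.0])
    return (u / w, v / w)
```

In practice one would estimate `H` once per camera setup and then warp every frame (e.g. with `cv2.warpPerspective`); the DLT above is the math underneath that call.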

Data Collection
- Captured drone footage at 60 meters altitude using a DJI Mini 3 Pro with auto-lock tracking
- Synchronized ground-view cameras with drone footage using NTP server time calibration (10-20ms accuracy)
- Collected footage across different urban environments: intersections, traffic lights, occluded roads
- 70/20/10 train/validation/test split with geographically separated test locations
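With both clocks disciplined against the same NTP server (10-20 ms accuracy, per the bullet above), drone and ground-view frames can be paired by nearest timestamp within that tolerance. A minimal sketch of such a matcher, with hypothetical function and parameter names:

```python
import bisect

def match_frames(drone_ts, ground_ts, tolerance=0.020):
    """Pair each drone frame with its nearest ground-view frame.

    drone_ts, ground_ts: sorted capture timestamps in seconds, both
    taken from clocks synced to the same NTP server.
    tolerance: maximum allowed offset; 20 ms here to match the
    measured sync accuracy. Returns (drone_index, ground_index) pairs;
    drone frames with no ground frame inside the tolerance are dropped.
    """
    pairs = []
    for i, t in enumerate(drone_ts):
        j = bisect.bisect_left(ground_ts, t)
        best = None
        # Only the frames immediately before and after t can be nearest.
        for k in (j - 1, j):
            if 0 <= k < len(ground_ts):
                if best is None or abs(ground_ts[k] - t) < abs(ground_ts[best] - t):
                    best = k
        if best is not None and abs(ground_ts[best] - t) <= tolerance:
            pairs.append((i, best))
    return pairs
```

For 30 fps streams (33 ms between frames), a 20 ms window guarantees at most one candidate match per drone frame.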
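The geographic separation in the split above means whole recording locations, not individual clips, are assigned to train, validation, or test. A sketch of one way to do that, assuming clips are tagged with a location id (the greedy target-deficit strategy here is illustrative, not the thesis procedure):

```python
from collections import defaultdict

def split_by_location(clips, ratios=(0.7, 0.2, 0.1)):
    """Assign whole locations to train/val/test splits.

    clips: list of (clip_id, location) tuples. Every clip from one
    location lands in the same split, so test locations never appear
    in training data. Ratios are targets; achieved proportions depend
    on how clips distribute over locations.
    """
    by_loc = defaultdict(list)
    for clip_id, loc in clips:
        by_loc[loc].append(clip_id)
    # Place the largest locations first, each into the split that is
    # currently furthest below its target fraction.
    locations = sorted(by_loc.values(), key=len, reverse=True)
    splits = {"train": [], "val": [], "test": []}
    targets = dict(zip(splits, ratios))
    total = len(clips)
    for group in locations:
        name = max(splits, key=lambda s: targets[s] - len(splits[s]) / total)
        splits[name].extend(group)
    return splits
```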

Models and Results
Three segmentation architectures were evaluated on 7 semantic classes, including road, bike lane, sidewalk, pedestrian crossing, and continuous/non-continuous lines:
- SegFormer (MiT-B1): Best overall performance — 59.6% recall, 41.8% precision, 30.5% mIoU
- FCN (ResNet): Strong on pedestrian crossings (94.4% recall) but lower overall precision
- PointRend (ResNet): Competitive on sidewalks and continuous lines
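The per-class recall, precision, and mIoU figures above all derive from a pixel-level confusion matrix between predicted and ground-truth label maps. A minimal NumPy sketch of that computation (not the thesis evaluation code, which presumably used OpenMMLab's built-in metrics):

```python
import numpy as np

def segmentation_metrics(pred, target, num_classes):
    """Per-class IoU, recall, precision and mIoU from label maps.

    pred, target: integer arrays of the same shape holding class ids
    in 0 .. num_classes - 1 per pixel.
    """
    conf = np.zeros((num_classes, num_classes), dtype=np.int64)
    # Rows index the ground-truth class, columns the predicted class.
    np.add.at(conf, (target.ravel(), pred.ravel()), 1)
    tp = np.diag(conf).astype(float)
    fn = conf.sum(axis=1) - tp
    fp = conf.sum(axis=0) - tp
    # np.maximum(..., 1) avoids division by zero; a class absent from
    # both pred and target then scores 0 rather than NaN.
    iou = tp / np.maximum(tp + fp + fn, 1)
    recall = tp / np.maximum(tp + fn, 1)
    precision = tp / np.maximum(tp + fp, 1)
    return {"iou": iou, "recall": recall,
            "precision": precision, "miou": iou.mean()}
```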
For object detection, models were evaluated with pre-training on the Stanford Drone Dataset. A tracking script maintained bounding box identity across frames, feeding into a post-processing pipeline for Saivvy's mapping model.
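The core idea of maintaining box identity across frames can be sketched as a greedy IoU matcher: each detection in the new frame is assigned the id of the previous-frame box it overlaps most, or a fresh id if nothing overlaps enough. This is a hypothetical stand-in for the thesis tracking script, with an assumed `iou_threshold` parameter:

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

class IoUTracker:
    """Greedy IoU matcher carrying box ids across frames."""

    def __init__(self, iou_threshold=0.3):
        self.iou_threshold = iou_threshold
        self.tracks = {}      # id -> box from the previous frame
        self.next_id = 0

    def update(self, boxes):
        """Match this frame's boxes to tracks; returns {id: box}."""
        assigned = {}
        free = dict(self.tracks)
        for box in boxes:
            best_id, best_iou = None, self.iou_threshold
            for tid, prev in free.items():
                iou = box_iou(box, prev)
                if iou > best_iou:
                    best_id, best_iou = tid, iou
            if best_id is None:
                best_id = self.next_id   # unmatched: start a new track
                self.next_id += 1
            else:
                free.pop(best_id)        # matched: consume the track
            assigned[best_id] = box
        self.tracks = assigned
        return assigned
```

Production trackers usually add motion prediction and optimal (Hungarian) matching on top of this; the greedy version is enough to show how identity persists between frames.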

Technologies
Python, OpenMMLab, SegFormer, FCN, PointRend, CVAT, DJI Mini 3 Pro