Gender Bias in Model Distillation
Investigating how knowledge distillation from GPT-2 to DistilGPT-2 amplifies gender bias in occupation-gender associations.
Overview
Group project for Statistical Language and Data Processing (SLDP) at Universiteit van Amsterdam (B.Sc. AI). Investigated whether knowledge distillation — the process of compressing a large "teacher" model into a smaller "student" model — amplifies gender bias in language models.
Research Question
How does knowledge distillation affect the degree of gender bias in language models when predicting gender associations for occupations?
Methodology
- Models: GPT-2 (teacher) and DistilGPT-2 (student), both from Hugging Face
- Evaluation: For 100 occupations, prompted each model with "The gender of [occupation] is" and extracted conditional probabilities for "male" vs "female" as the next token
- Ground truth: UK government dataset with actual male/female worker percentages across 318 occupations
- Metric: Compared how closely each model's predicted gender ratios matched real-world employment statistics
Key Findings
Distilled models exhibit greater gender bias in assigning genders to occupations than their non-distilled counterparts. The compression process amplifies existing biases from the training data rather than preserving them proportionally. These findings raise important questions about deploying distilled models in production where fairness matters.
Technologies
Python, Hugging Face Transformers, GPT-2, DistilGPT-2