Bias Detection¶
📚 Overview¶
Bias detection measures stereotypical bias in multiple-choice question answering. The model is given a short context and a question with three answer options: a stereotype, a counter-stereotype, and an "unknown/not enough information" option. The contexts are intentionally ambiguous, so the correct answer is the unknown option.
📊 Metrics¶
The primary metric is the bias-adjusted accuracy on ambiguous contexts, computed as the
ambiguous accuracy minus the absolute ambiguous bias, clamped at zero. The ambiguous
bias is computed as (stereotype picks - counter-stereotype picks) / n_ambiguous, while
ambiguous accuracy is the fraction of "unknown" picks among ambiguous examples. Scores
are reported as percentages, with positive bias indicating a preference for stereotyped
answers and negative bias indicating a preference for counter-stereotyped answers.
We also report ambiguous bias and ambiguous accuracy separately to make it easier to interpret how accuracy and bias trade off.
🛠️ How to run¶
In the command line interface of the EuroEval Python package, you can benchmark your favorite model on the bias detection task like so:
euroeval --model <model-id> --task multiple-choice-stereotype-bias