euroeval.scores
source module euroeval.scores
Aggregation of raw scores into the mean and a confidence interval.
Functions
- log_scores — Log the scores.
- aggregate_scores — Helper function to compute the mean with confidence intervals.
source log_scores(dataset_name: str, metrics: c.Sequence['Metric'], scores: c.Sequence[dict[str, float]], model_id: str, model_revision: str, model_param: str | None) → ScoreDict
Log the scores.
Parameters
- dataset_name : str — Name of the dataset.
- metrics : c.Sequence['Metric'] — List of metrics to log.
- scores : c.Sequence[dict[str, float]] — The scores to be logged, given as a list of dictionaries mapping metric names to metric values.
- model_id : str — The model ID of the model that was evaluated.
- model_revision : str — The revision of the model.
- model_param : str | None — The model parameter, if any.
Returns
- ScoreDict — A dictionary with keys 'raw_scores' and 'total', where 'raw_scores' is identical to scores and 'total' is a dictionary with the aggregated scores (means and standard errors).
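A minimal usage sketch follows; the metric object, dataset name, model ID, and score keys are illustrative assumptions rather than values taken from the library.

```python
from euroeval.scores import log_scores

# `f1_metric` stands in for a `Metric` instance; how to construct one is
# library-specific and not shown here.
f1_metric = ...  # a Metric whose values appear under e.g. "test_f1"

# Hypothetical raw scores from two evaluation iterations.
raw_scores = [
    {"test_f1": 0.81},
    {"test_f1": 0.83},
]

score_dict = log_scores(
    dataset_name="example-dataset",        # assumed dataset name
    metrics=[f1_metric],
    scores=raw_scores,
    model_id="example-org/example-model",  # assumed model ID
    model_revision="main",
    model_param=None,
)

# score_dict["raw_scores"] is identical to `raw_scores`, while
# score_dict["total"] holds the aggregated means and standard errors.
```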
source aggregate_scores(scores: c.Sequence[dict[str, float]], metric: Metric) → tuple[float, float]
Helper function to compute the mean with confidence intervals.
Parameters
- scores : c.Sequence[dict[str, float]] — A list of dictionaries with the names of the metrics as keys, of the form "<split>_<metric_name>" such as "val_f1", and the metric values as values.
- metric : Metric — The metric, which is used to collect the correct metric values from scores.
Returns
- tuple[float, float] — A pair of floats containing the mean score and the radius of its 95% confidence interval.
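Continuing the sketch above, aggregate_scores picks one metric out of the raw score dictionaries and reduces it to a mean and an interval radius. (For reference, a 95% confidence interval under a normal approximation has radius about 1.96 · s/√n for sample standard deviation s over n iterations; the module's exact computation may differ.)

```python
from euroeval.scores import aggregate_scores

# Reusing `raw_scores` and `f1_metric` from the sketch above.
mean, ci_radius = aggregate_scores(scores=raw_scores, metric=f1_metric)
print(f"f1: {mean:.3f} ± {ci_radius:.3f} (95% CI)")
```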