euroeval.generation
module euroeval.generation
Functions related to text generation of models.
Functions
- generate — Evaluate a model on a dataset through generation.
- generate_single_iteration — Evaluate a model on a dataset in a single iteration through generation.
- debug_log — Log inputs and outputs for debugging purposes.
generate(model: BenchmarkModule, datasets: list[DatasetDict], model_config: ModelConfig, dataset_config: DatasetConfig, benchmark_config: BenchmarkConfig) → list[dict[str, float]]
Evaluate a model on a dataset through generation.
Parameters
- model : BenchmarkModule — The model to evaluate.
- datasets : list[DatasetDict] — The datasets to evaluate on.
- model_config : ModelConfig — The configuration of the model.
- dataset_config : DatasetConfig — The configuration of the dataset.
- benchmark_config : BenchmarkConfig — The configuration of the benchmark.
Returns
- list[dict[str, float]] — A list of dictionaries containing the test scores.
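The documented return type is a list of score dictionaries. As a minimal sketch of how such a list could be summarised, the snippet below averages each metric across the dictionaries. It assumes every dictionary has the same keys; the helper `mean_scores` and the metric names `test_f1` and `test_accuracy` are hypothetical and not part of euroeval.

```python
from statistics import mean


def mean_scores(scores: list[dict[str, float]]) -> dict[str, float]:
    """Average each metric across a list of score dictionaries.

    Hypothetical helper, not part of euroeval. Assumes every dictionary
    contains the same metric keys.
    """
    if not scores:
        return {}
    return {metric: mean(d[metric] for d in scores) for metric in scores[0]}


# Made-up values, shaped like the documented return type list[dict[str, float]].
example_scores = [
    {"test_f1": 0.5, "test_accuracy": 0.75},
    {"test_f1": 1.0, "test_accuracy": 0.25},
]
summary = mean_scores(example_scores)
```

Here `summary` holds one averaged value per metric key found in the first dictionary.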
generate_single_iteration(dataset: Dataset, model: BenchmarkModule, dataset_config: DatasetConfig, benchmark_config: BenchmarkConfig, cache: ModelCache) → dict[str, float]
Evaluate a model on a dataset in a single iteration through generation.
Parameters
- dataset : Dataset — The dataset to evaluate on.
- model : BenchmarkModule — The model to evaluate.
- dataset_config : DatasetConfig — The configuration of the dataset.
- benchmark_config : BenchmarkConfig — The configuration of the benchmark.
- cache : ModelCache — The model output cache.
Returns
- dict[str, float] — A dictionary containing the scores for each metric.
Raises
- ValueError
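This page does not describe how the `cache` argument is used. One common pattern for an output cache, sketched here purely as an assumption, is to generate only for inputs that are not yet cached and to reuse stored outputs for the rest. A plain `dict` stands in for `ModelCache`, and `generate_with_cache` and `fake_generate` are hypothetical names, not euroeval functions.

```python
from typing import Callable


def generate_with_cache(
    prompts: list[str],
    generate_fn: Callable[[list[str]], list[str]],
    cache: dict[str, str],
) -> list[str]:
    """Return one output per prompt, calling generate_fn only for cache misses.

    Hypothetical sketch: a plain dict stands in for a model output cache.
    """
    # Deduplicate while preserving order, then keep only the uncached prompts.
    missing = [p for p in dict.fromkeys(prompts) if p not in cache]
    if missing:
        for prompt, output in zip(missing, generate_fn(missing)):
            cache[prompt] = output
    return [cache[p] for p in prompts]


# Record every batch passed to the stand-in model so cache hits are visible.
calls: list[list[str]] = []


def fake_generate(batch: list[str]) -> list[str]:
    """Stand-in for a model call: upper-cases each prompt."""
    calls.append(list(batch))
    return [text.upper() for text in batch]


cache = {"a": "A"}
outputs = generate_with_cache(["a", "b", "a"], fake_generate, cache)
```

In this sketch only the uncached prompt `"b"` reaches `fake_generate`, and the cache is updated in place so later calls can reuse it.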
debug_log(batch: dict[str, t.Any], model_output: GenerativeModelOutput, extracted_labels: list[dict | str | list[str]], dataset_config: DatasetConfig) → None
Log inputs and outputs for debugging purposes.
Parameters
- batch : dict[str, t.Any] — The batch of examples to evaluate on.
- model_output : GenerativeModelOutput — The output of the model.
- extracted_labels : list[dict | str | list[str]] — The extracted labels from the model output.
- dataset_config : DatasetConfig — The configuration of the dataset.
Raises