

module euroeval.finetuning

Functions related to the finetuning of models.


finetune(model: BenchmarkModule, datasets: list[DatasetDict], model_config: ModelConfig, dataset_config: DatasetConfig, benchmark_config: BenchmarkConfig) -> list[dict[str, float]]

Evaluate a model on a dataset through finetuning.


  • model : BenchmarkModule

    The model to evaluate.

  • datasets : list[DatasetDict]

    The datasets to use for training and evaluation.

  • model_config : ModelConfig

    The configuration of the model.

  • dataset_config : DatasetConfig

    The dataset configuration.

  • benchmark_config : BenchmarkConfig

    The benchmark configuration.


  • list[dict[str, float]]

    A list of dicts containing the scores for each metric for each iteration.
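
Since the return value is one dict of metric scores per iteration, a common next step is to aggregate across iterations. The following is a minimal sketch of that aggregation, not part of `euroeval` itself: the helper `summarise_scores` and the metric names `test_accuracy` and `test_f1` are hypothetical and chosen only for illustration.

```python
from statistics import mean, stdev


def summarise_scores(
    scores: list[dict[str, float]],
) -> dict[str, tuple[float, float]]:
    """Return (mean, sample standard deviation) per metric across iterations.

    Hypothetical helper: it only assumes the list-of-dicts shape described
    in the return value above.
    """
    if not scores:
        return {}
    summary: dict[str, tuple[float, float]] = {}
    for metric in scores[0]:
        # Collect this metric's value from every iteration that reports it.
        values = [s[metric] for s in scores if metric in s]
        # A sample standard deviation needs at least two values.
        spread = stdev(values) if len(values) > 1 else 0.0
        summary[metric] = (mean(values), spread)
    return summary


# Made-up per-iteration scores; the metric names are placeholders.
example_scores = [
    {"test_accuracy": 0.5, "test_f1": 0.25},
    {"test_accuracy": 0.75, "test_f1": 0.5},
    {"test_accuracy": 1.0, "test_f1": 0.75},
]
summary = summarise_scores(example_scores)
for metric, (avg, spread) in summary.items():
    print(f"{metric}: mean={avg:.3f}, std={spread:.3f}")
```

Reporting a spread alongside the mean makes it easier to judge whether a difference between two models exceeds the iteration-to-iteration variation.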


finetune_single_iteration(model: BenchmarkModule | None, dataset: DatasetDict, iteration_idx: int, training_args: TrainingArguments, model_config: ModelConfig, dataset_config: DatasetConfig, benchmark_config: BenchmarkConfig) -> dict[str, float]

Run a single iteration of a benchmark.


  • model : BenchmarkModule | None

    The model to use in the benchmark. If None, a new model is loaded.

  • dataset : DatasetDict

    The dataset to use for training and evaluation.

  • iteration_idx : int

    The index of the iteration.

  • training_args : TrainingArguments

    The training arguments.

  • model_config : ModelConfig

    The model configuration.

  • dataset_config : DatasetConfig

    The dataset configuration.

  • benchmark_config : BenchmarkConfig

    The benchmark configuration.


  • dict[str, float]

    The scores for the test dataset.


get_training_args(benchmark_config: BenchmarkConfig, model_config: ModelConfig, iteration_idx: int, dtype: DataType, batch_size: int | None = None) -> TrainingArguments

Get the training arguments for the current iteration.


  • benchmark_config : BenchmarkConfig

    The benchmark configuration.

  • model_config : ModelConfig

    The model configuration.

  • iteration_idx : int

    The index of the current iteration. This is only used to generate a unique random seed for the current iteration.

  • dtype : DataType

    The data type to use for the model weights.

  • batch_size : int | None

    The batch size to use for the current iteration, or None if the batch size in the benchmark config should be used.


  • TrainingArguments

    The training arguments for the current iteration.
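
Two of the parameters above describe simple derivations: `iteration_idx` yields a per-iteration random seed, and `batch_size` falls back to the benchmark config when it is None. The sketch below shows one plausible way to express both. The base seed `1234`, the `base + index` formula, and the `SketchConfig` class are assumptions made for illustration, not the library's actual values or types.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical base seed; the value the library uses is not stated here.
BASE_SEED = 1234


@dataclass
class SketchConfig:
    """Hypothetical stand-in for BenchmarkConfig, holding only a batch size."""

    batch_size: int = 32


def iteration_seed(iteration_idx: int, base_seed: int = BASE_SEED) -> int:
    """Derive a distinct, reproducible seed for each iteration (assumed formula)."""
    return base_seed + iteration_idx


def effective_batch_size(
    config: SketchConfig, batch_size: Optional[int] = None
) -> int:
    """Use the explicit batch size if given, else the one in the config."""
    return config.batch_size if batch_size is None else batch_size


config = SketchConfig(batch_size=32)
seeds = [iteration_seed(i) for i in range(3)]
print(seeds)
print(effective_batch_size(config))  # falls back to the config value
print(effective_batch_size(config, batch_size=8))  # explicit override wins
```

Offsetting a fixed base by the iteration index keeps runs reproducible while ensuring no two iterations share a seed.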