Skip to content

euroeval.generation_utils

source module euroeval.generation_utils

Utility functions related to generative models.

Functions

source extract_few_shot_examples(dataset: DatasetDict, dataset_config: DatasetConfig, benchmark_config: BenchmarkConfig, itr_idx: int)list[dict[str, t.Any]]

Extract few-shot examples from a dataset.

This will always extract the examples from the training split.

We ensure that the few-shot examples are unique by picking them one at a time.

Parameters

  • dataset : DatasetDict The dataset to extract the few-shot examples from.

  • dataset_config : DatasetConfig The dataset configuration.

  • benchmark_config : BenchmarkConfig The benchmark configuration.

  • itr_idx : int The index of the dataset in the iterator.

Returns

  • list[dict[str, t.Any]] The few-shot examples.

Raises

  • InvalidBenchmark If there are not enough short examples for few-shot learning.

  • NotImplementedError

source apply_prompt(examples: dict[str, t.Any], few_shot_examples: list[dict[str, t.Any]], model_config: ModelConfig, dataset_config: DatasetConfig, generative_type: GenerativeType | None, always_populate_text_field: bool, tokeniser: PreTrainedTokenizer | None)dict[str, t.Any]

Apply prompt template to an example, potentially with few-shot examples.

Parameters

  • examples : dict[str, t.Any] The examples to apply the few-shot examples to.

  • few_shot_examples : list[dict[str, t.Any]] The few-shot examples to apply.

  • model_config : ModelConfig The model configuration.

  • dataset_config : DatasetConfig The dataset configuration.

  • generative_type : GenerativeType | None The generative type of the model.

  • always_populate_text_field : bool Whether to always populate the 'text' field in the examples, as opposed to the 'messages' field.

  • tokeniser : PreTrainedTokenizer | None The tokeniser to use for the model. If None, the tokeniser is not used.

Returns

  • dict[str, t.Any] The example with the few-shot examples applied.

Raises

  • ValueError

  • NotImplementedError

source raise_if_wrong_params(model_config: ModelConfig, allowed_params: dict[re.Pattern, list[str]])None

Raise an error if the model configuration has invalid parameters.

Parameters

  • model_config : ModelConfig The model configuration.

  • allowed_params : dict[re.Pattern, list[str]] The allowed parameters for the model, being a dictionary mapping a regex pattern matching the model ID to a list of allowed parameters for those models.

Raises

  • InvalidModel If the model configuration has invalid parameters.