Skip to content

euroeval.generation_utils

source module euroeval.generation_utils

Utility functions related to generative models.

Functions

source extract_few_shot_examples(dataset: DatasetDict, dataset_config: DatasetConfig, itr_idx: int)list[dict[str, t.Any]]

Extract few-shot examples from a dataset.

This will always extract the examples from the training split.

We ensure that the few-shot examples are unique by picking them one at a time.

Parameters

  • dataset : DatasetDict The dataset to extract the few-shot examples from.

  • dataset_config : DatasetConfig The dataset configuration.

  • itr_idx : int The index of the dataset in the iterator.

Returns

  • list[dict[str, t.Any]] The few-shot examples.

Raises

source apply_prompt(examples: dict[str, t.Any], few_shot_examples: list[dict[str, t.Any]], model_config: ModelConfig, dataset_config: DatasetConfig, instruction_model: bool, always_populate_text_field: bool, tokenizer: PreTrainedTokenizer | None)dict[str, t.Any]

Apply prompt template to an example, potentially with few-shot examples.

Parameters

  • examples : dict[str, t.Any] The examples to apply the few-shot examples to.

  • few_shot_examples : list[dict[str, t.Any]] The few-shot examples to apply.

  • dataset_config : DatasetConfig The dataset configuration.

  • instruction_model : bool Whether the model is instruction-tuned.

  • always_populate_text_field : bool Whether to always populate the 'text' field in the examples, as opposed to the 'messages' field.

  • tokenizer : PreTrainedTokenizer | None The tokenizer to use for the model. If None, the tokenizer is not used.

Returns

  • dict[str, t.Any] The example with the few-shot examples applied.

Raises

  • ValueError

  • NotImplementedError