euroeval.generation_utils
source module euroeval.generation_utils
Utility functions related to generative models.
Functions
-
extract_few_shot_examples — Extract few-shot examples from a dataset.
-
apply_prompt — Apply prompt template to an example, potentially with few-shot examples.
source extract_few_shot_examples(dataset: DatasetDict, dataset_config: DatasetConfig, itr_idx: int) → list[dict[str, t.Any]]
Extract few-shot examples from a dataset.
This will always extract the examples from the training split.
We ensure that the few-shot examples are unique by picking them one at a time.
Parameters
-
dataset : DatasetDict — The dataset to extract the few-shot examples from.
-
dataset_config : DatasetConfig — The dataset configuration.
-
itr_idx : int — The index of the dataset in the iterator.
Returns
-
list[dict[str, t.Any]] — The few-shot examples.
Raises
-
NotImplementedError
source apply_prompt(examples: dict[str, t.Any], few_shot_examples: list[dict[str, t.Any]], model_config: ModelConfig, dataset_config: DatasetConfig, instruction_model: bool, always_populate_text_field: bool, tokenizer: PreTrainedTokenizer | None) → dict[str, t.Any]
Apply prompt template to an example, potentially with few-shot examples.
Parameters
-
examples : dict[str, t.Any] — The examples to apply the few-shot examples to.
-
few_shot_examples : list[dict[str, t.Any]] — The few-shot examples to apply.
-
dataset_config : DatasetConfig — The dataset configuration.
-
instruction_model : bool — Whether the model is instruction-tuned.
-
always_populate_text_field : bool — Whether to always populate the 'text' field in the examples, as opposed to the 'messages' field.
-
tokenizer : PreTrainedTokenizer | None — The tokenizer to use for the model. If None, the tokenizer is not used.
Returns
-
dict[str, t.Any] — The example with the few-shot examples applied.
Raises
-
ValueError
-
NotImplementedError