euroeval.generation_utils
source module euroeval.generation_utils
Utility functions related to generative models.
Functions
-
extract_few_shot_examples — Extract few-shot examples from a dataset.
-
apply_prompt — Apply prompt template to an example, potentially with few-shot examples.
-
raise_if_wrong_params — Raise an error if the model configuration has invalid parameters.
source extract_few_shot_examples(dataset: DatasetDict, dataset_config: DatasetConfig, benchmark_config: BenchmarkConfig, itr_idx: int) → list[dict[str, t.Any]]
Extract few-shot examples from a dataset.
This will always extract the examples from the training split.
We ensure that the few-shot examples are unique by picking them one at a time.
Parameters
-
dataset : DatasetDict — The dataset to extract the few-shot examples from.
-
dataset_config : DatasetConfig — The dataset configuration.
-
benchmark_config : BenchmarkConfig — The benchmark configuration.
-
itr_idx : int — The index of the dataset in the iterator.
Returns
-
list[dict[str, t.Any]] — The few-shot examples.
Raises
-
InvalidBenchmark — If there are not enough short examples for few-shot learning.
-
NotImplementedError
source apply_prompt(examples: dict[str, t.Any], few_shot_examples: list[dict[str, t.Any]], model_config: ModelConfig, dataset_config: DatasetConfig, generative_type: GenerativeType | None, always_populate_text_field: bool, tokeniser: PreTrainedTokenizer | None) → dict[str, t.Any]
Apply prompt template to an example, potentially with few-shot examples.
Parameters
-
examples : dict[str, t.Any] — The examples to apply the few-shot examples to.
-
few_shot_examples : list[dict[str, t.Any]] — The few-shot examples to apply.
-
model_config : ModelConfig — The model configuration.
-
dataset_config : DatasetConfig — The dataset configuration.
-
generative_type : GenerativeType | None — The generative type of the model.
-
always_populate_text_field : bool — Whether to always populate the 'text' field in the examples, as opposed to the 'messages' field.
-
tokeniser : PreTrainedTokenizer | None — The tokeniser to use for the model. If None, the tokeniser is not used.
Returns
-
dict[str, t.Any] — The example with the few-shot examples applied.
Raises
-
ValueError
-
NotImplementedError
source raise_if_wrong_params(model_config: ModelConfig, allowed_params: dict[re.Pattern, list[str]]) → None
Raise an error if the model configuration has invalid parameters.
Parameters
-
model_config : ModelConfig — The model configuration.
-
allowed_params : dict[re.Pattern, list[str]] — The allowed parameters for the model, being a dictionary mapping a regex pattern matching the model ID to a list of allowed parameters for those models.
Raises
-
InvalidModel — If the model configuration has invalid parameters.