euroeval.yaml_config
source module euroeval.yaml_config
Load dataset configurations from YAML files.
This module handles all YAML-related functionality for loading dataset
configurations, including Inspect AI-compatible eval.yaml files from
Hugging Face Hub repositories.
Functions
-
load_yaml_config — Load a dataset config from an eval.yaml file in a Hugging Face repo.
-
load_dataset_config_from_yaml — Load a dataset config from a YAML file.
-
promote_field_spec_fields — Promote column names from field_spec to top-level keys.
-
validate_and_get_task — Validate the task field or infer it from Inspect AI hints.
-
infer_task_from_inspect_ai — Try to infer the EuroEval task from Inspect AI YAML fields.
-
parse_languages — Parse language codes from YAML or use fallbacks.
-
build_kwargs — Build keyword arguments for
DatasetConfig from YAML fields.
-
parse_string_field — Parse and validate a string field from YAML.
-
parse_int_field — Parse and validate an integer field from YAML.
-
load_yaml_file — Load a YAML file and return its contents as a dictionary.
source load_yaml_config(hf_api: HfApi, dataset_id: str, cache_dir: Path) → DatasetConfig | None
Load a dataset config from an eval.yaml file in a Hugging Face repo.
Parameters
-
hf_api : HfApi — The Hugging Face API object.
-
dataset_id : str — The ID of the dataset to get the config for.
-
cache_dir : Path — The directory to store the cache in.
Returns
-
DatasetConfig | None — The dataset config if it exists, otherwise None.
source load_dataset_config_from_yaml(yaml_path: Path, fallback_language_codes: list[str] | None = None) → DatasetConfig | None
Load a dataset config from a YAML file.
The file is fully compatible with the Inspect AI eval.yaml format
(https://inspect.aisi.org.uk/tasks.html#hugging-face). The EuroEval-specific
task and languages keys are optional:
- task -- if absent, the task is inferred from Inspect AI hints: a solver with name: multiple_choice and a field_spec.choices entry each map to the multiple-choice task. If the task cannot be inferred, an error is logged and None is returned.
- languages -- if absent, the fallback_language_codes argument (a list of ISO 639-1 codes) is used. When called from try_get_dataset_config_from_repo, the Hugging Face Hub repo metadata supplies this fallback automatically. If neither source provides a language list, English ("en") is used as the final fallback and a warning is logged.
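The language fallback order described above can be sketched in a few lines. This is a minimal illustration, not the module's actual code: the helper name resolve_language_codes is hypothetical, and it returns plain ISO 639-1 strings rather than Language objects.

```python
from __future__ import annotations

import logging

logger = logging.getLogger(__name__)


def resolve_language_codes(
    raw: dict[str, object], fallback_codes: list[str] | None
) -> list[str]:
    """Pick language codes: YAML key first, then the fallback, then English.

    Hypothetical helper for illustration only; not part of euroeval.
    """
    yaml_codes = raw.get("languages")
    if isinstance(yaml_codes, list) and yaml_codes:
        # 1. The YAML file names its languages explicitly.
        return [str(code) for code in yaml_codes]
    if fallback_codes:
        # 2. The caller supplied codes, e.g. from Hub repo metadata.
        return list(fallback_codes)
    # 3. Nothing available: warn and default to English.
    logger.warning("No languages found; defaulting to English ('en').")
    return ["en"]
```

For example, resolve_language_codes({}, ["de"]) takes the second branch, while resolve_language_codes({}, None) falls through to the English default.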
Column mappings may be specified either as flat top-level keys
(input_column / target_column / choices_column) or via a
tasks[0].field_spec block using the Inspect AI input / target /
choices sub-keys. Top-level keys take precedence when both are present.
tasks[0].split is used as the test split. try_get_dataset_config_from_repo
auto-detects the train and val splits from the repository, and also uses
tasks[0].config as the Hugging Face dataset config/subset name.
When reading field_spec:
- field_spec.input is used as input_column.
- field_spec.target is used as target_column only when it is a plain column name. Inspect AI also allows "literal:<value>" (a hard-coded answer string) and bare integers (which Inspect AI maps to letters A, B, C ...); both are silently skipped because they are not column names.
- field_spec.choices is used as choices_column (a single column name or a list of column names).
Example -- EuroEval flat format
task: classification
languages:
- en
labels:
- positive
- negative
Example -- pure Inspect AI format (task and languages are inferred automatically)
# eval.yaml -- no EuroEval-specific keys required
name: My Dataset
tasks:
  - id: my_dataset
    split: test
    field_spec:
      input: question
      target: answer
      choices: options
    solvers:
      - name: multiple_choice
    scorers:
      - name: choice
Example -- Inspect AI format with optional EuroEval overrides
# eval.yaml
name: My Dataset
tasks:
  - id: my_dataset
    split: test
    field_spec:
      input: text
      target: label
    solvers:
      - name: multiple_choice
    scorers:
      - name: choice
# EuroEval-specific keys (optional; ignored by Inspect AI)
task: multiple-choice
languages:
- en
Parameters
-
yaml_path : Path — Path to the YAML config file.
-
fallback_language_codes : list[str] | None — ISO 639-1 language codes to use when the YAML file does not contain a
languages key. Typically supplied from Hugging Face Hub repo metadata by try_get_dataset_config_from_repo.
Returns
-
DatasetConfig | None — A
DatasetConfig built from the YAML data, or None if the file could not be parsed or contains invalid values.
source promote_field_spec_fields(raw: dict[str, object]) → None
Promote column names from field_spec to top-level keys.
Promotes the following mappings when the top-level key is not already set:
- field_spec.input -> input_column
- field_spec.target -> target_column (only if plain, not literal/int)
- field_spec.choices -> choices_column
- tasks[0].split -> test_split
Parameters
-
raw : dict[str, object] — The parsed YAML data to modify in place.
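A rough sketch of the promotion rules may help make the precedence concrete. The function names below are hypothetical and the real implementation may differ in detail; the sketch only assumes what the description states: top-level keys win, and a target is promoted only when it is a plain column name.

```python
from __future__ import annotations


def is_plain_column_name(value: object) -> bool:
    """True for a string that is not an Inspect AI "literal:<value>" target."""
    return isinstance(value, str) and not value.startswith("literal:")


def promote_field_spec_sketch(raw: dict[str, object]) -> None:
    """Illustrative stand-in for promote_field_spec_fields (not the real code)."""
    tasks = raw.get("tasks")
    if not isinstance(tasks, list) or not tasks or not isinstance(tasks[0], dict):
        return  # No usable tasks[0] block, so there is nothing to promote.
    first_task = tasks[0]
    field_spec = first_task.get("field_spec")
    if not isinstance(field_spec, dict):
        field_spec = {}

    candidates: dict[str, object] = {
        "input_column": field_spec.get("input"),
        "choices_column": field_spec.get("choices"),
        "test_split": first_task.get("split"),
    }
    target = field_spec.get("target")
    if is_plain_column_name(target):
        # Literals and bare integers are skipped: they are not column names.
        candidates["target_column"] = target

    for key, value in candidates.items():
        if value is not None:
            # setdefault leaves an existing top-level key untouched.
            raw.setdefault(key, value)
```

Because the dictionary is modified in place, a caller would inspect raw afterwards rather than use a return value.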
source validate_and_get_task(raw: dict[str, object], yaml_path: Path) → Task | None
Validate the task field or infer it from Inspect AI hints.
Parameters
-
raw : dict[str, object] — The parsed YAML data.
-
yaml_path : Path — Path to the YAML config file (for error messages).
Returns
-
Task | None — A valid Task object, or None if validation failed.
source infer_task_from_inspect_ai(raw: dict[str, object], task_map: dict[str, Task]) → Task | None
Try to infer the EuroEval task from Inspect AI YAML fields.
Currently detects:
- A solver with name: multiple_choice in tasks[0].solvers -> multiple-choice
- A choices key in tasks[0].field_spec -> multiple-choice
- A scorer with name: model_graded_fact in tasks[0].scorers -> open-ended-qa task with an LLM-as-a-judge metric. The judge model is read from scorers[0].args.model; when absent, the default judge defined in OPEN_ENDED_QA is used.
Parameters
-
raw : dict[str, object] — The raw YAML data.
-
task_map : dict[str, Task] — The mapping from task names to task objects.
Returns
-
Task | None — The inferred task, or None if the task cannot be inferred.
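The detection rules listed for this function can be approximated as follows. This is a simplified sketch under stated assumptions: infer_task_name and has_named_entry are hypothetical names, the function returns task name strings instead of Task objects from task_map, the order of the checks is a guess, and the judge-model handling for open-ended-qa is left out.

```python
from __future__ import annotations


def has_named_entry(entries: object, name: str) -> bool:
    """Check whether a list of {"name": ...} mappings contains the given name."""
    if not isinstance(entries, list):
        return False
    return any(
        isinstance(entry, dict) and entry.get("name") == name for entry in entries
    )


def infer_task_name(raw: dict[str, object]) -> str | None:
    """Illustrative stand-in for infer_task_from_inspect_ai (not the real code)."""
    tasks = raw.get("tasks")
    if not isinstance(tasks, list) or not tasks or not isinstance(tasks[0], dict):
        return None
    first_task = tasks[0]
    field_spec = first_task.get("field_spec")

    if has_named_entry(first_task.get("solvers"), "multiple_choice"):
        return "multiple-choice"
    if isinstance(field_spec, dict) and "choices" in field_spec:
        return "multiple-choice"
    if has_named_entry(first_task.get("scorers"), "model_graded_fact"):
        return "open-ended-qa"
    return None  # No recognised hint, so the task cannot be inferred.
```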
source parse_languages(raw: dict[str, object], fallback_codes: list[str] | None, yaml_path: Path) → list[Language] | None
Parse language codes from YAML or use fallbacks.
Parameters
-
raw : dict[str, object] — The parsed YAML data.
-
fallback_codes : list[str] | None — ISO 639-1 language codes to use as a fallback.
-
yaml_path : Path — Path to the YAML config file (for error messages).
Returns
-
list[Language] | None — A list of Language objects, or None if validation failed.
source build_kwargs(raw: dict[str, object], yaml_path: Path) → dict[str, str | int | list[str] | dict[str, str]] | None
Build keyword arguments for DatasetConfig from YAML fields.
Reads the following optional fields from raw and maps them to the
corresponding DatasetConfig constructor arguments:
- String fields: prompt_prefix, prompt_template, instruction_prompt, input_column, target_column, test_split.
- Integer fields: num_few_shot_examples, max_generated_tokens.
- labels -- a list of strings.
- prompt_label_mapping -- a mapping from strings to strings.
- choices_column -- a string or list of strings.
Parameters
-
raw : dict[str, object] — The parsed YAML data.
-
yaml_path : Path — Path to the YAML config file (for error messages).
Returns
-
dict[str, str | int | list[str] | dict[str, str]] | None — A dictionary suitable for unpacking into
DatasetConfig(...), or None if any field fails validation.
source parse_string_field(raw: dict[str, object], field_name: str, yaml_path: Path) → str | None
Parse and validate a string field from YAML.
Parameters
-
raw : dict[str, object] — The parsed YAML data.
-
field_name : str — The name of the field to parse.
-
yaml_path : Path — Path to the YAML config file (for error messages).
Returns
-
str | None — The field value as a string, or None if validation failed.
source parse_int_field(raw: dict[str, object], field_name: str, yaml_path: Path) → int | None
Parse and validate an integer field from YAML.
Parameters
-
raw : dict[str, object] — The parsed YAML data.
-
field_name : str — The name of the field to parse.
-
yaml_path : Path — Path to the YAML config file (for error messages).
Returns
-
int | None — The field value as an integer, or None if validation failed.
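As an illustration of the validate-or-return-None pattern shared by the two field parsers, here is a minimal sketch for the integer case. The name parse_int_field_sketch is hypothetical, and two behaviours are assumptions rather than documented facts: booleans are rejected, and a missing key is treated like any other invalid value.

```python
from __future__ import annotations

import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def parse_int_field_sketch(
    raw: dict[str, object], field_name: str, yaml_path: Path
) -> int | None:
    """Illustrative stand-in for parse_int_field (not the real code)."""
    value = raw.get(field_name)
    # bool is a subclass of int in Python, so it is excluded explicitly
    # (an assumption about the desired behaviour).
    if isinstance(value, int) and not isinstance(value, bool):
        return value
    # The YAML path is included so the log message points at the faulty file.
    logger.error(
        "Field %r in %s must be an integer, got %r.", field_name, yaml_path, value
    )
    return None
```

A string variant would follow the same shape with an isinstance(value, str) check.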
source load_yaml_file(yaml_path: Path) → dict[str, object] | None
Load a YAML file and return its contents as a dictionary.
Parameters
-
yaml_path : Path — Path to the YAML config file.
Returns
-
dict[str, object] | None — The parsed YAML content as a dictionary, or None if parsing failed.