euroeval.yaml_config

Load dataset configurations from YAML files.

This module handles all YAML-related functionality for loading dataset configurations, including Inspect AI-compatible eval.yaml files from Hugging Face Hub repositories.

Functions

load_yaml_config — Load a dataset config from an eval.yaml file in a Hugging Face repo.
load_dataset_config_from_yaml — Load a dataset config from a YAML file.
promote_field_spec_fields — Promote column names from field_spec to top-level keys.
validate_and_get_task — Validate the task field or infer it from Inspect AI hints.
infer_task_from_inspect_ai — Try to infer the EuroEval task from Inspect AI YAML fields.
parse_languages — Parse language codes from YAML or use fallbacks.
build_kwargs — Build keyword arguments for DatasetConfig from YAML fields.
parse_string_field — Parse and validate a string field from YAML.
parse_int_field — Parse and validate an integer field from YAML.
load_yaml_file — Load a YAML file and return its contents as a dictionary.

source load_yaml_config(hf_api: HfApi, dataset_id: str, cache_dir: Path) → DatasetConfig | None

Load a dataset config from an eval.yaml file in a Hugging Face repo.

Parameters

hf_api : HfApi — The Hugging Face API object.
dataset_id : str — The ID of the dataset to get the config for.
cache_dir : Path — The directory to store the cache in.

Returns

DatasetConfig | None — The dataset config if it exists, otherwise None.

source load_dataset_config_from_yaml(yaml_path: Path, fallback_language_codes: list[str] | None = None) → DatasetConfig | None

Load a dataset config from a YAML file.

The file is fully compatible with the Inspect AI eval.yaml format (https://inspect.aisi.org.uk/tasks.html#hugging-face). The EuroEval-specific task and languages keys are optional:

task -- if absent, the task is inferred from Inspect AI hints: a solver with name: multiple_choice and a field_spec.choices entry each map to the multiple-choice task (see infer_task_from_inspect_ai for the full list of hints). If the task cannot be inferred, an error is logged and None is returned.
languages -- if absent, the fallback_language_codes argument (a list of ISO 639-1 codes) is used. When called from try_get_dataset_config_from_repo, the Hugging Face Hub repo metadata supplies this fallback automatically. If neither source provides a language list, English ("en") is used as the final fallback and a warning is logged.

Column mappings may be specified either as flat top-level keys (input_column / target_column / choices_column) or via a tasks[0].field_spec block using the Inspect AI input / target / choices sub-keys. Top-level keys take precedence when both are present.

tasks[0].split is used as the test split. try_get_dataset_config_from_repo auto-detects the train and val splits from the repository, and also uses tasks[0].config as the Hugging Face dataset config/subset name.

When reading field_spec:

field_spec.input is used as input_column.
field_spec.target is used as target_column only when it is a plain column name. Inspect AI also allows "literal:<value>" (a hard-coded answer string) and bare integers (which Inspect AI maps to letters A, B, C ...); both are silently skipped because they are not column names.
field_spec.choices is used as choices_column (a single column name or a list of column names).
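
The rule for field_spec.target can be sketched as a small predicate. This is only an illustration of the behaviour described above, not the module's actual code; the function name plain_target_column is hypothetical.

```python
from __future__ import annotations


def plain_target_column(target: object) -> str | None:
    """Return `target` if it looks like a plain column name, else None.

    Hypothetical helper: it mirrors the documented rule, not EuroEval's code.
    """
    # Bare integers are answer indices in Inspect AI, not column names.
    # Any other non-string value is rejected here as well.
    if not isinstance(target, str):
        return None
    # "literal:<value>" is a hard-coded answer string, not a column name.
    if target.startswith("literal:"):
        return None
    return target
```

With this sketch, "answer" is kept as a column name, while "literal:yes" and 2 are both skipped.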

Example -- EuroEval flat format

task: classification
languages:
  - en
labels:
  - positive
  - negative

Example -- pure Inspect AI format (task and languages are inferred automatically)

# eval.yaml -- no EuroEval-specific keys required
name: My Dataset
tasks:
  - id: my_dataset
    split: test
    field_spec:
      input: question
      target: answer
      choices: options
    solvers:
      - name: multiple_choice
    scorers:
      - name: choice

Example -- Inspect AI format with optional EuroEval overrides

# eval.yaml
name: My Dataset
tasks:
  - id: my_dataset
    split: test
    field_spec:
      input: text
      target: label
    solvers:
      - name: multiple_choice
    scorers:
      - name: choice
# EuroEval-specific keys (optional; ignored by Inspect AI)
task: multiple-choice
languages:
  - en

Parameters

yaml_path : Path — Path to the YAML config file.
fallback_language_codes : list[str] | None — ISO 639-1 language codes to use when the YAML file does not contain a languages key. Typically supplied from HuggingFace Hub repo metadata by try_get_dataset_config_from_repo.

Returns

DatasetConfig | None — A DatasetConfig built from the YAML data, or None if the file could not be parsed or contains invalid values.

source promote_field_spec_fields(raw: dict[str, object]) → None

Promote column names from field_spec to top-level keys.

Promotes the following mappings when the top-level key is not already set:

field_spec.input -> input_column
field_spec.target -> target_column (only when it is a plain column name, not a "literal:<value>" string or a bare integer)
field_spec.choices -> choices_column
tasks[0].split -> test_split

Parameters

raw : dict[str, object] — The parsed YAML data to modify in place.
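
A minimal sketch of the promotion step, assuming the YAML has already been parsed into a dictionary. The name promote_sketch is hypothetical, and the use of setdefault (which leaves any existing top-level key untouched, even one set to None) is an assumption about how "not already set" is interpreted.

```python
from __future__ import annotations


def promote_sketch(raw: dict[str, object]) -> None:
    """Copy Inspect AI column hints to flat top-level keys, in place."""
    tasks = raw.get("tasks")
    if not isinstance(tasks, list) or not tasks or not isinstance(tasks[0], dict):
        return  # nothing to promote
    first_task = tasks[0]
    field_spec = first_task.get("field_spec")
    if not isinstance(field_spec, dict):
        field_spec = {}

    candidates: dict[str, object] = {
        "input_column": field_spec.get("input"),
        "choices_column": field_spec.get("choices"),
        "test_split": first_task.get("split"),
    }
    # Only plain column names are promoted for the target.
    target = field_spec.get("target")
    if isinstance(target, str) and not target.startswith("literal:"):
        candidates["target_column"] = target

    for key, value in candidates.items():
        if value is not None:
            # Top-level keys take precedence over field_spec values.
            raw.setdefault(key, value)


raw = {
    "input_column": "prompt",
    "tasks": [
        {
            "split": "test",
            "field_spec": {
                "input": "question",
                "target": "literal:A",
                "choices": ["a", "b"],
            },
        }
    ],
}
promote_sketch(raw)
```

In this example the existing input_column ("prompt") is kept, target_column is not created because the target is a literal, and choices_column and test_split are filled in from the first task.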

source validate_and_get_task(raw: dict[str, object], yaml_path: Path) → Task | None

Validate the task field or infer it from Inspect AI hints.

Parameters

raw : dict[str, object] — The parsed YAML data.
yaml_path : Path — Path to the YAML config file (for error messages).

Returns

Task | None — A valid Task object, or None if validation failed.

source infer_task_from_inspect_ai(raw: dict[str, object], task_map: dict[str, Task]) → Task | None

Try to infer the EuroEval task from Inspect AI YAML fields.

Currently detects:

A solver with name: multiple_choice in tasks[0].solvers -> multiple-choice
A choices key in tasks[0].field_spec -> multiple-choice
A scorer with name: model_graded_fact in tasks[0].scorers -> open-ended-qa task with an LLM-as-a-judge metric. The judge model is read from scorers[0].args.model; when absent, the default judge defined in OPEN_ENDED_QA is used.

Parameters

raw : dict[str, object] — The raw YAML data.
task_map : dict[str, Task] — The mapping from task names to task objects.

Returns

Task | None — The inferred task, or None if the task cannot be inferred.
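
The detection rules listed above can be sketched as follows. To stay self-contained, this version returns task names as strings instead of looking up Task objects in task_map. It also ignores the judge model in scorers[0].args.model. The function name infer_task_name and the order in which the hints are checked are assumptions.

```python
from __future__ import annotations


def infer_task_name(raw: dict[str, object]) -> str | None:
    """Return an inferred task name from Inspect AI hints, or None."""
    tasks = raw.get("tasks")
    if not isinstance(tasks, list) or not tasks or not isinstance(tasks[0], dict):
        return None
    first_task = tasks[0]

    def entry_names(key: str) -> set[object]:
        # Collect the `name` values of the solver or scorer entries.
        entries = first_task.get(key)
        if not isinstance(entries, list):
            return set()
        return {e.get("name") for e in entries if isinstance(e, dict)}

    field_spec = first_task.get("field_spec")
    if "multiple_choice" in entry_names("solvers"):
        return "multiple-choice"
    if isinstance(field_spec, dict) and "choices" in field_spec:
        return "multiple-choice"
    if "model_graded_fact" in entry_names("scorers"):
        return "open-ended-qa"
    return None
```

A file whose first task has a multiple_choice solver or a field_spec.choices entry yields "multiple-choice" here, one with only a model_graded_fact scorer yields "open-ended-qa", and anything else yields None.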

source parse_languages(raw: dict[str, object], fallback_codes: list[str] | None, yaml_path: Path) → list[Language] | None

Parse language codes from YAML or use fallbacks.

Parameters

raw : dict[str, object] — The parsed YAML data.
fallback_codes : list[str] | None — ISO 639-1 language codes to use as a fallback.
yaml_path : Path — Path to the YAML config file (for error messages).

Returns

list[Language] | None — A list of Language objects, or None if validation failed.
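
The fallback chain (languages key, then fallback codes, then English with a warning) can be sketched like this. The sketch works on plain language-code strings instead of Language objects, and the name resolve_language_codes is hypothetical. Treating a present but malformed languages value as a validation failure is an assumption.

```python
from __future__ import annotations

import logging

logger = logging.getLogger(__name__)


def resolve_language_codes(
    raw: dict[str, object], fallback_codes: list[str] | None
) -> list[str] | None:
    """Pick language codes from the YAML data, a fallback list, or 'en'."""
    if "languages" in raw:
        languages = raw["languages"]
        # Assumption: a present key must be a non-empty list of strings.
        if (
            not isinstance(languages, list)
            or not languages
            or not all(isinstance(code, str) for code in languages)
        ):
            return None
        return list(languages)
    if fallback_codes:
        return list(fallback_codes)
    logger.warning("No languages found; falling back to English ('en').")
    return ["en"]
```

The explicit languages key wins when present, the fallback list is used only when the key is absent, and English is the last resort.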

source build_kwargs(raw: dict[str, object], yaml_path: Path) → dict[str, str | int | list[str] | dict[str, str]] | None

Build keyword arguments for DatasetConfig from YAML fields.

Reads the following optional fields from raw and maps them to the corresponding DatasetConfig constructor arguments:

String fields: prompt_prefix, prompt_template, instruction_prompt, input_column, target_column, test_split.
Integer fields: num_few_shot_examples, max_generated_tokens.
labels -- a list of strings.
prompt_label_mapping -- a mapping from strings to strings.
choices_column -- a string or list of strings.

Parameters

raw : dict[str, object] — The parsed YAML data.
yaml_path : Path — Path to the YAML config file (for error messages).

Returns

dict[str, str | int | list[str] | dict[str, str]] | None — A dictionary suitable for unpacking into DatasetConfig(...), or None if any field fails validation.
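
As a rough illustration of the field handling described above, the sketch below validates the listed fields on an already-parsed dictionary and returns None as soon as one has the wrong type. It is not the module's implementation: the name build_kwargs_sketch is hypothetical, logging is omitted, and rejecting booleans for integer fields is an assumption.

```python
from __future__ import annotations

STRING_FIELDS = (
    "prompt_prefix", "prompt_template", "instruction_prompt",
    "input_column", "target_column", "test_split",
)
INT_FIELDS = ("num_few_shot_examples", "max_generated_tokens")


def is_str_list(value: object) -> bool:
    return isinstance(value, list) and all(isinstance(v, str) for v in value)


def build_kwargs_sketch(raw: dict[str, object]) -> dict[str, object] | None:
    """Collect validated optional fields; return None on the first bad one."""
    kwargs: dict[str, object] = {}
    for name in STRING_FIELDS:
        if name in raw:
            if not isinstance(raw[name], str):
                return None
            kwargs[name] = raw[name]
    for name in INT_FIELDS:
        if name in raw:
            value = raw[name]
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, int):
                return None
            kwargs[name] = value
    if "labels" in raw:
        if not is_str_list(raw["labels"]):
            return None
        kwargs["labels"] = raw["labels"]
    if "prompt_label_mapping" in raw:
        mapping = raw["prompt_label_mapping"]
        if not isinstance(mapping, dict) or not all(
            isinstance(k, str) and isinstance(v, str) for k, v in mapping.items()
        ):
            return None
        kwargs["prompt_label_mapping"] = mapping
    if "choices_column" in raw:
        choices = raw["choices_column"]
        if not (isinstance(choices, str) or is_str_list(choices)):
            return None
        kwargs["choices_column"] = choices
    return kwargs
```

Keys outside the documented set, such as task or languages, are simply ignored by this sketch, since the module describes them as being handled by other functions.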

source parse_string_field(raw: dict[str, object], field_name: str, yaml_path: Path) → str | None

Parse and validate a string field from YAML.

Parameters

raw : dict[str, object] — The parsed YAML data.
field_name : str — The name of the field to parse.
yaml_path : Path — Path to the YAML config file (for error messages).

Returns

str | None — The field value as a string, or None if validation failed.

source parse_int_field(raw: dict[str, object], field_name: str, yaml_path: Path) → int | None

Parse and validate an integer field from YAML.

Parameters

raw : dict[str, object] — The parsed YAML data.
field_name : str — The name of the field to parse.
yaml_path : Path — Path to the YAML config file (for error messages).

Returns

int | None — The field value as an integer, or None if validation failed.

source load_yaml_file(yaml_path: Path) → dict[str, object] | None

Load a YAML file and return its contents as a dictionary.

Parameters

yaml_path : Path — Path to the YAML config file.

Returns

dict[str, object] | None — The parsed YAML content as a dictionary, or None if parsing failed.