euroeval.yaml_config

module euroeval.yaml_config

Load dataset configurations from YAML files.

This module handles all YAML-related functionality for loading dataset configurations, including Inspect AI-compatible eval.yaml files from Hugging Face Hub repositories.

Functions

load_yaml_config(hf_api: HfApi, dataset_id: str, cache_dir: Path) -> DatasetConfig | None

Load a dataset config from an eval.yaml file in a Hugging Face repo.

Parameters

  • hf_api : HfApi -- The Hugging Face API object.

  • dataset_id : str -- The ID of the dataset to get the config for.

  • cache_dir : Path -- The directory to store the cache in.

Returns

  • DatasetConfig | None -- The dataset config if it exists, otherwise None.

load_dataset_config_from_yaml(yaml_path: Path, fallback_language_codes: list[str] | None = None) -> DatasetConfig | None

Load a dataset config from a YAML file.

The file is fully compatible with the Inspect AI eval.yaml format (https://inspect.aisi.org.uk/tasks.html#hugging-face). The EuroEval-specific task and languages keys are optional:

  • task -- if absent, the task is inferred from Inspect AI hints: a solver with name: multiple_choice or a field_spec.choices entry maps to the multiple-choice task, and a scorer with name: model_graded_fact maps to the open-ended-qa task. If the task cannot be inferred, an error is logged and None is returned.
  • languages -- if absent, the fallback_language_codes argument (a list of ISO 639-1 codes) is used. When called from try_get_dataset_config_from_repo, the Hugging Face Hub repo metadata supplies this fallback automatically. If neither source provides a language list, English ("en") is used as the final fallback and a warning is logged.
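The language fallback chain described above can be sketched as follows. This is a minimal illustration, not EuroEval's parse_languages: it works on the already-parsed YAML dict and returns plain code strings rather than Language objects. The function name resolve_language_codes, and the choice to reject a languages value that is present but malformed or empty, are assumptions.

```python
from __future__ import annotations

import logging

logger = logging.getLogger(__name__)


def resolve_language_codes(
    raw: dict[str, object], fallback_codes: list[str] | None
) -> list[str] | None:
    """Return ISO 639-1 codes: YAML `languages`, else fallback, else English."""
    if "languages" in raw:
        languages = raw["languages"]
        # An explicit key must be a non-empty list of strings (assumption).
        if (
            not isinstance(languages, list)
            or not languages
            or not all(isinstance(code, str) for code in languages)
        ):
            logger.error("'languages' must be a non-empty list of ISO 639-1 codes.")
            return None
        return list(languages)
    if fallback_codes:
        # Typically the language codes from the Hugging Face Hub repo metadata.
        return list(fallback_codes)
    logger.warning("No language information found; falling back to English ('en').")
    return ["en"]
```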

Column mappings may be specified either as flat top-level keys (input_column / target_column / choices_column) or via a tasks[0].field_spec block using the Inspect AI input / target / choices sub-keys. Top-level keys take precedence when both are present.

tasks[0].split is used as the test split. try_get_dataset_config_from_repo auto-detects the train and val splits from the repository and uses tasks[0].config as the Hugging Face dataset config (subset) name.

When reading field_spec:

  • field_spec.input is used as input_column.
  • field_spec.target is used as target_column only when it is a plain column name. Inspect AI also allows "literal:<value>" (a hard-coded answer string) and bare integers (which Inspect AI maps to letters A, B, C ...); both are silently skipped because they are not column names.
  • field_spec.choices is used as choices_column (a single column name or a list of column names).

Example -- EuroEval flat format

task: classification
languages:
  - en
labels:
  - positive
  - negative

Example -- pure Inspect AI format (the task is inferred automatically; languages fall back as described above)

# eval.yaml -- no EuroEval-specific keys required
name: My Dataset
tasks:
  - id: my_dataset
    split: test
    field_spec:
      input: question
      target: answer
      choices: options
    solvers:
      - name: multiple_choice
    scorers:
      - name: choice

Example -- Inspect AI format with optional EuroEval overrides

# eval.yaml
name: My Dataset
tasks:
  - id: my_dataset
    split: test
    field_spec:
      input: text
      target: label
    solvers:
      - name: multiple_choice
    scorers:
      - name: choice
# EuroEval-specific keys (optional; ignored by Inspect AI)
task: multiple-choice
languages:
  - en

Parameters

  • yaml_path : Path -- Path to the YAML config file.

  • fallback_language_codes : list[str] | None -- ISO 639-1 language codes to use when the YAML file does not contain a languages key. Typically supplied from Hugging Face Hub repo metadata by try_get_dataset_config_from_repo.

Returns

  • DatasetConfig | None -- A DatasetConfig built from the YAML data, or None if the file could not be parsed or contains invalid values.

promote_field_spec_fields(raw: dict[str, object]) -> None

Promote column names from field_spec, and the split from tasks[0], to top-level keys.

Promotes the following mappings when the top-level key is not already set:

  • field_spec.input -> input_column
  • field_spec.target -> target_column (only if it is a plain column name, not a "literal:" value or an integer)
  • field_spec.choices -> choices_column
  • tasks[0].split -> test_split

Parameters

  • raw : dict[str, object] -- The parsed YAML data to modify in place.
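A minimal sketch of these promotion rules on a plain dict, assuming the parsed YAML has the Inspect AI shape shown in the examples above. The helper is_column_name is hypothetical, and silently doing nothing when tasks is missing or malformed is an assumption rather than documented behavior.

```python
from __future__ import annotations


def is_column_name(value: object) -> bool:
    """True if a field_spec.target value names a column.

    Strings starting with "literal:" and integers (answer indices) are not
    column names, so they are skipped.
    """
    return isinstance(value, str) and not value.startswith("literal:")


def promote_field_spec_fields(raw: dict[str, object]) -> None:
    """Copy field_spec entries and tasks[0].split to top-level keys, in place."""
    tasks = raw.get("tasks")
    if not isinstance(tasks, list) or not tasks or not isinstance(tasks[0], dict):
        return  # No Inspect AI task block: nothing to promote.
    first_task = tasks[0]

    field_spec = first_task.get("field_spec")
    if isinstance(field_spec, dict):
        # setdefault never overwrites, so existing top-level keys take precedence.
        if "input" in field_spec:
            raw.setdefault("input_column", field_spec["input"])
        if is_column_name(field_spec.get("target")):
            raw.setdefault("target_column", field_spec["target"])
        if "choices" in field_spec:
            # A single column name or a list of column names.
            raw.setdefault("choices_column", field_spec["choices"])

    if "split" in first_task:
        raw.setdefault("test_split", first_task["split"])
```

Using dict.setdefault is what gives top-level keys precedence: an input_column, target_column, choices_column or test_split that is already present is never overwritten.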

validate_and_get_task(raw: dict[str, object], yaml_path: Path) -> Task | None

Validate the task field or infer it from Inspect AI hints.

Parameters

  • raw : dict[str, object] -- The parsed YAML data.

  • yaml_path : Path -- Path to the YAML config file (for error messages).

Returns

  • Task | None -- A valid Task object, or None if validation failed.
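One way to combine an explicit task key with inference is sketched below. The Task dataclass is a hypothetical stand-in for EuroEval's Task object, and unlike the documented signature this version takes task_map and an infer callable as extra parameters so that it is self-contained; the log messages are illustrative.

```python
from __future__ import annotations

import logging
from dataclasses import dataclass
from pathlib import Path
from typing import Callable

logger = logging.getLogger(__name__)


@dataclass(frozen=True)
class Task:
    """Hypothetical stand-in for EuroEval's Task object."""

    name: str


def validate_and_get_task(
    raw: dict[str, object],
    yaml_path: Path,
    task_map: dict[str, Task],
    infer: Callable[[dict[str, object]], Task | None],
) -> Task | None:
    """Return the explicit `task`, or an inferred one; log and return None on failure."""
    if "task" not in raw:
        # No EuroEval key: fall back to Inspect AI hints.
        task = infer(raw)
        if task is None:
            logger.error("%s: no 'task' key and no task could be inferred.", yaml_path)
        return task

    task_name = raw["task"]
    if not isinstance(task_name, str) or task_name not in task_map:
        logger.error(
            "%s: unknown task %r; expected one of %s.",
            yaml_path,
            task_name,
            sorted(task_map),
        )
        return None
    return task_map[task_name]
```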

infer_task_from_inspect_ai(raw: dict[str, object], task_map: dict[str, Task]) -> Task | None

Try to infer the EuroEval task from Inspect AI YAML fields.

Currently detects:

  • A solver with name: multiple_choice in tasks[0].solvers -> multiple-choice
  • A choices key in tasks[0].field_spec -> multiple-choice
  • A scorer with name: model_graded_fact in tasks[0].scorers -> open-ended-qa task with an LLM-as-a-judge metric. The judge model is read from scorers[0].args.model; when absent, the default judge defined in OPEN_ENDED_QA is used.

Parameters

  • raw : dict[str, object] -- The raw YAML data.

  • task_map : dict[str, Task] -- The mapping from task names to task objects.

Returns

  • Task | None -- The inferred task, or None if the task cannot be inferred.
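The detection rules listed above can be sketched as follows. For self-containment the sketch returns task names ("multiple-choice", "open-ended-qa") instead of looking them up in task_map, and the helper names infer_task_name and judge_model are hypothetical. Checking the rules in the order listed, so that multiple-choice hints win over a model_graded_fact scorer, is an assumption. In judge_model, the default parameter stands in for the default judge defined in OPEN_ENDED_QA, whose actual value is not shown here.

```python
from __future__ import annotations


def _first_task(raw: dict[str, object]) -> dict[str, object] | None:
    """Return tasks[0] if it exists and is a mapping."""
    tasks = raw.get("tasks")
    if isinstance(tasks, list) and tasks and isinstance(tasks[0], dict):
        return tasks[0]
    return None


def _entry_names(task: dict[str, object], key: str) -> set[object]:
    """Collect the `name` values from a list such as tasks[0].solvers."""
    entries = task.get(key)
    if not isinstance(entries, list):
        return set()
    return {entry.get("name") for entry in entries if isinstance(entry, dict)}


def infer_task_name(raw: dict[str, object]) -> str | None:
    """Map Inspect AI hints to a EuroEval task name, or None if nothing matches."""
    task = _first_task(raw)
    if task is None:
        return None
    if "multiple_choice" in _entry_names(task, "solvers"):
        return "multiple-choice"
    field_spec = task.get("field_spec")
    if isinstance(field_spec, dict) and "choices" in field_spec:
        return "multiple-choice"
    if "model_graded_fact" in _entry_names(task, "scorers"):
        return "open-ended-qa"
    return None


def judge_model(raw: dict[str, object], default: str) -> str:
    """Read the judge from tasks[0].scorers[0].args.model, else return `default`."""
    try:
        model = raw["tasks"][0]["scorers"][0]["args"]["model"]
    except (KeyError, IndexError, TypeError):
        return default
    return model if isinstance(model, str) else default
```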

parse_languages(raw: dict[str, object], fallback_codes: list[str] | None, yaml_path: Path) -> list[Language] | None

Parse language codes from YAML or use fallbacks.

Parameters

  • raw : dict[str, object] -- The parsed YAML data.

  • fallback_codes : list[str] | None -- ISO 639-1 language codes to use as a fallback.

  • yaml_path : Path -- Path to the YAML config file (for error messages).

Returns

  • list[Language] | None -- A list of Language objects, or None if validation failed.

build_kwargs(raw: dict[str, object], yaml_path: Path) -> dict[str, str | int | list[str] | dict[str, str]] | None

Build keyword arguments for DatasetConfig from YAML fields.

Reads the following optional fields from raw and maps them to the corresponding DatasetConfig constructor arguments:

  • String fields: prompt_prefix, prompt_template, instruction_prompt, input_column, target_column, test_split.
  • Integer fields: num_few_shot_examples, max_generated_tokens.
  • labels -- a list of strings.
  • prompt_label_mapping -- a mapping from strings to strings.
  • choices_column -- a string or list of strings.

Parameters

  • raw : dict[str, object] -- The parsed YAML data.

  • yaml_path : Path -- Path to the YAML config file (for error messages).

Returns

  • dict[str, str | int | list[str] | dict[str, str]] | None -- A dictionary suitable for unpacking into DatasetConfig(...), or None if any field fails validation.
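The compound fields (labels, prompt_label_mapping, choices_column) need more than a single type check. A minimal sketch of their validation follows; the function name build_compound_kwargs and the log messages are hypothetical, and the scalar string and integer fields are left out because they would go through parse_string_field and parse_int_field.

```python
from __future__ import annotations

import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def _is_str_list(value: object) -> bool:
    """True for a list whose items are all strings."""
    return isinstance(value, list) and all(isinstance(item, str) for item in value)


def build_compound_kwargs(
    raw: dict[str, object], yaml_path: Path
) -> dict[str, object] | None:
    """Validate labels, prompt_label_mapping and choices_column; None on any error."""
    kwargs: dict[str, object] = {}

    if "labels" in raw:
        if not _is_str_list(raw["labels"]):
            logger.error("%s: 'labels' must be a list of strings.", yaml_path)
            return None
        kwargs["labels"] = raw["labels"]

    if "prompt_label_mapping" in raw:
        mapping = raw["prompt_label_mapping"]
        # Both keys and values must be strings.
        if not isinstance(mapping, dict) or not all(
            isinstance(key, str) and isinstance(value, str)
            for key, value in mapping.items()
        ):
            logger.error("%s: 'prompt_label_mapping' must map strings to strings.", yaml_path)
            return None
        kwargs["prompt_label_mapping"] = mapping

    if "choices_column" in raw:
        choices = raw["choices_column"]
        # Either one column name or a list of column names.
        if not (isinstance(choices, str) or _is_str_list(choices)):
            logger.error("%s: 'choices_column' must be a string or a list of strings.", yaml_path)
            return None
        kwargs["choices_column"] = choices

    return kwargs
```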

parse_string_field(raw: dict[str, object], field_name: str, yaml_path: Path) -> str | None

Parse and validate a string field from YAML.

Parameters

  • raw : dict[str, object] -- The parsed YAML data.

  • field_name : str -- The name of the field to parse.

  • yaml_path : Path -- Path to the YAML config file (for error messages).

Returns

  • str | None -- The field value as a string, or None if validation failed.

parse_int_field(raw: dict[str, object], field_name: str, yaml_path: Path) -> int | None

Parse and validate an integer field from YAML.

Parameters

  • raw : dict[str, object] -- The parsed YAML data.

  • field_name : str -- The name of the field to parse.

  • yaml_path : Path -- Path to the YAML config file (for error messages).

Returns

  • int | None -- The field value as an integer, or None if validation failed.
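A possible shape for the two scalar parsers, sketched under two assumptions that the docstrings above do not confirm: a missing key yields None without logging (so a caller tells "absent" from "invalid" by checking field_name in raw first), and booleans are rejected as integers, because a YAML loader such as PyYAML's safe_load turns true/false into Python bool, which is a subclass of int.

```python
from __future__ import annotations

import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def parse_string_field(raw: dict[str, object], field_name: str, yaml_path: Path) -> str | None:
    """Return raw[field_name] if it is a string; None if missing or invalid."""
    value = raw.get(field_name)
    if value is None:
        return None  # Absent: not an error in this sketch.
    if not isinstance(value, str):
        logger.error("%s: %r must be a string, got %s.", yaml_path, field_name, type(value).__name__)
        return None
    return value


def parse_int_field(raw: dict[str, object], field_name: str, yaml_path: Path) -> int | None:
    """Return raw[field_name] if it is an integer; None if missing or invalid."""
    value = raw.get(field_name)
    if value is None:
        return None  # Absent: not an error in this sketch.
    # bool is a subclass of int, so `true` would otherwise pass as 1.
    if isinstance(value, bool) or not isinstance(value, int):
        logger.error("%s: %r must be an integer, got %s.", yaml_path, field_name, type(value).__name__)
        return None
    return value
```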

load_yaml_file(yaml_path: Path) -> dict[str, object] | None

Load a YAML file and return its contents as a dictionary.

Parameters

  • yaml_path : Path -- Path to the YAML config file.

Returns

  • dict[str, object] | None -- The parsed YAML content as a dictionary, or None if parsing failed.