euroeval.custom_dataset_configs

module euroeval.custom_dataset_configs

Load custom dataset configs.

This module provides the main entry point for loading dataset configurations from Hugging Face repositories, including Python-based configs. YAML-specific loading logic lives in the yaml_config module.

Functions

load_custom_datasets_module(custom_datasets_file: Path) -> ModuleType | None

Load the custom datasets module if it exists.

Parameters

  • custom_datasets_file : Path The path to the custom datasets module.

Returns

  • ModuleType | None The custom datasets module, or None if it does not exist.
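The "load if it exists" behaviour described above can be illustrated with the standard library alone. The following is a minimal sketch, not EuroEval's actual implementation: the function name `load_module_if_exists` and the module name `custom_datasets_module` are hypothetical, and the real function may handle errors or module registration differently.

```python
import importlib.util
from pathlib import Path
from types import ModuleType
from typing import Optional


def load_module_if_exists(module_path: Path) -> Optional[ModuleType]:
    """Hypothetical stand-in: import a Python file by path, or return None."""
    # A missing file is not an error; it just means there is nothing to load.
    if not module_path.exists():
        return None

    # Build an import spec from the file location. The module name is an
    # arbitrary label chosen for this sketch.
    spec = importlib.util.spec_from_file_location(
        "custom_datasets_module", module_path
    )
    if spec is None or spec.loader is None:
        return None

    # Create the module object and execute the file's code inside it.
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module
```

Note that `exec_module` runs the file's top-level code, so this pattern should only be pointed at files you trust.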

try_get_dataset_config_from_repo(dataset_id: str, api_key: str | None, cache_dir: Path, trust_remote_code: bool, run_with_cli: bool) -> DatasetConfig | None

Try to get a dataset config from a Hugging Face dataset repository.

The function first looks for a YAML config file (eval.yaml), which can be loaded without executing any remote code. If no YAML file is present, the function falls back to euroeval_config.py, which requires trust_remote_code=True.

Parameters

  • dataset_id : str The ID of the dataset to get the config for.

  • api_key : str | None The Hugging Face API key used to check whether the repository contains a custom dataset config.

  • cache_dir : Path The directory to store the cache in.

  • trust_remote_code : bool Whether to trust remote code. Only required when loading a Python config (euroeval_config.py). YAML configs never require this flag.

  • run_with_cli : bool Whether the code is being run with the CLI.

Returns

  • DatasetConfig | None The dataset config if it exists, otherwise None.
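The lookup order described above (YAML first, Python only as a fallback and only with trust_remote_code=True) can be sketched as a small pure function. This is an illustration under stated assumptions, not the library's code: `choose_config_source` is a hypothetical helper, and returning None when a Python config exists but is not trusted is an assumption, since the real function may instead log a message or raise.

```python
from typing import Optional, Set

YAML_CONFIG_NAME = "eval.yaml"
PYTHON_CONFIG_NAME = "euroeval_config.py"


def choose_config_source(
    repo_files: Set[str], trust_remote_code: bool
) -> Optional[str]:
    """Hypothetical helper: pick which config file, if any, to load."""
    # YAML is preferred because it can be parsed without running remote code,
    # so the trust flag is irrelevant here.
    if YAML_CONFIG_NAME in repo_files:
        return YAML_CONFIG_NAME

    # The Python config executes code from the repository, so it is only
    # eligible when the caller has explicitly opted in.
    if PYTHON_CONFIG_NAME in repo_files and trust_remote_code:
        return PYTHON_CONFIG_NAME

    # Assumption for this sketch: anything else yields no config.
    return None
```

For example, a repository containing both files resolves to eval.yaml regardless of the trust flag, while a repository containing only euroeval_config.py resolves to that file only when trust_remote_code is True. In practice the set of filenames could come from a call such as `HfApi.list_repo_files(dataset_id, repo_type="dataset")` in huggingface_hub, though whether EuroEval uses that call is not stated here.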

load_python_config(hf_api: HfApi, dataset_id: str, cache_dir: Path, trust_remote_code: bool, run_with_cli: bool) -> DatasetConfig | None

Load a dataset config from a euroeval_config.py file in a Hugging Face repo.

Parameters

  • hf_api : HfApi The Hugging Face API object.

  • dataset_id : str The ID of the dataset to get the config for.

  • cache_dir : Path The directory to store the cache in.

  • trust_remote_code : bool Whether to trust remote code.

  • run_with_cli : bool Whether the code is being run with the CLI.

Returns

  • DatasetConfig | None The dataset config if it exists, otherwise None.