euroeval.split_utils

module euroeval.split_utils

Utilities for detecting and mapping dataset splits.

Functions

  • find_split Return the shortest split name containing keyword, or None.

  • get_repo_split_names Extract split names from a Hugging Face dataset repo.

  • get_repo_splits Return the (train, val, test) split names for a Hugging Face dataset repo.

find_split(splits: list[str], keyword: str) → str | None

Return the shortest split name containing keyword, or None.

Parameters

  • splits : list[str] A list of split names.

  • keyword : str The keyword to search for.

Returns

  • str | None The shortest split name containing keyword, or None if no such split exists.
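As an illustration of the behaviour described above, here is a minimal sketch. The name find_split_sketch is hypothetical and this is not the library's own code; in particular, it assumes plain, case-sensitive substring matching, which the real function may or may not use.

```python
from __future__ import annotations


def find_split_sketch(splits: list[str], keyword: str) -> str | None:
    """Hypothetical stand-in for find_split, written from its description."""
    # Assumption: a split "contains" the keyword if it is a case-sensitive
    # substring of the split name.
    candidates = [name for name in splits if keyword in name]
    # min() with default=None returns None when no split matches;
    # on ties, the first shortest name in the input order is returned.
    return min(candidates, key=len, default=None)


print(find_split_sketch(["train_small", "train", "validation"], "train"))  # train
print(find_split_sketch(["train", "test"], "val"))  # None
```

Preferring the shortest match means that a plain "train" split wins over variants such as "train_small" when both are present.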

get_repo_split_names(hf_api: HfApi, dataset_id: str) → list[str]

Extract split names from a Hugging Face dataset repo.

Parameters

  • hf_api : HfApi The Hugging Face API object.

  • dataset_id : str The ID of the dataset to get the split names for.

Returns

  • list[str] A list of split names.

get_repo_splits(hf_api: HfApi, dataset_id: str) → tuple[str | None, str | None, str | None]

Return the (train, val, test) split names for a Hugging Face dataset repo.

Parameters

  • hf_api : HfApi The Hugging Face API object.

  • dataset_id : str The ID of the dataset to get the split names for.

Returns

  • tuple[str | None, str | None, str | None] A 3-tuple (train_split, val_split, test_split) where each element is either the name of the matching split or None if no such split exists.
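The shape of the returned tuple can be sketched without network access. The example below is an assumption-laden illustration, not the library's implementation: the helper names shortest_match and pick_splits_sketch are hypothetical, the keywords "train", "val" and "test" are guesses at how splits might be matched, and the list of split names is passed in directly instead of being fetched through HfApi.

```python
from __future__ import annotations


def shortest_match(splits: list[str], keyword: str) -> str | None:
    """Hypothetical helper: shortest split name containing keyword, or None."""
    candidates = [name for name in splits if keyword in name]
    return min(candidates, key=len, default=None)


def pick_splits_sketch(
    split_names: list[str],
) -> tuple[str | None, str | None, str | None]:
    """Hypothetical stand-in for the mapping step of get_repo_splits."""
    # Assumption: the three roles are found via these keywords.
    return (
        shortest_match(split_names, "train"),
        shortest_match(split_names, "val"),
        shortest_match(split_names, "test"),
    )


train_split, val_split, test_split = pick_splits_sketch(["train", "validation"])
# Any element may be None, so callers should check before using it.
if test_split is None:
    print("No test split found")
```

Because each element can be None independently, a caller would typically check the splits it needs before loading data.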