euroeval.split_utils
source module euroeval.split_utils
Utilities for detecting and mapping dataset splits.
Functions
- find_split — Return the shortest split name containing keyword, or None.
- get_repo_split_names — Extract split names from a Hugging Face dataset repo.
- get_repo_splits — Return the (train, val, test) split names for a Hugging Face dataset repo.
source find_split(splits: list[str], keyword: str) → str | None
Return the shortest split name containing keyword, or None.
Parameters
- splits : list[str] — A list of split names.
- keyword : str — The keyword to search for.
Returns
- str | None — The shortest split name containing keyword, or None if no such split exists.
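The documented behavior can be illustrated with a minimal sketch. This is not the library's source: `find_split_sketch` is a hypothetical stand-in, and it assumes a case-sensitive substring match with ties broken by list order, neither of which is confirmed by the documentation above.

```python
from typing import Optional


def find_split_sketch(splits: list[str], keyword: str) -> Optional[str]:
    """Hypothetical stand-in for find_split, written from its docstring only."""
    # Assumption: "containing" means a case-sensitive substring match.
    candidates = [name for name in splits if keyword in name]
    # min() with key=len picks the shortest name; on ties it keeps the
    # first one in list order (an assumption about tie-breaking).
    return min(candidates, key=len) if candidates else None


print(find_split_sketch(["train", "train_small", "test"], "train"))  # train
print(find_split_sketch(["train", "test"], "val"))  # None
```

Preferring the shortest match means a plain `train` split is chosen over variants such as `train_small` when both are present.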
source get_repo_split_names(hf_api: HfApi, dataset_id: str) → list[str]
Extract split names from a Hugging Face dataset repo.
Parameters
- hf_api : HfApi — The Hugging Face API object.
- dataset_id : str — The ID of the dataset to get the split names for.
Returns
- list[str] — A list of split names.
source get_repo_splits(hf_api: HfApi, dataset_id: str) → tuple[str | None, str | None, str | None]
Return the (train, val, test) split names for a Hugging Face dataset repo.
Parameters
- hf_api : HfApi — The Hugging Face API object.
- dataset_id : str — The ID of the dataset to get the split names for.
Returns
- tuple[str | None, str | None, str | None] — A 3-tuple (train_split, val_split, test_split) where each element is either the name of the matching split or None if no such split exists.
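The shape of the returned tuple can be sketched offline, starting from a list of split names rather than a live repo. The names `shortest_match` and `map_splits_sketch` are hypothetical, and the keywords `"train"`, `"val"` and `"test"` are an assumption inferred from the tuple's element names. The actual function may look up splits differently.

```python
from typing import Optional


def shortest_match(names: list[str], keyword: str) -> Optional[str]:
    """Return the shortest name containing keyword, or None (hypothetical helper)."""
    candidates = [name for name in names if keyword in name]
    return min(candidates, key=len) if candidates else None


def map_splits_sketch(
    split_names: list[str],
) -> tuple[Optional[str], Optional[str], Optional[str]]:
    """Map split names to a (train, val, test) tuple; keywords are assumed."""
    return (
        shortest_match(split_names, "train"),
        shortest_match(split_names, "val"),  # also matches "validation"
        shortest_match(split_names, "test"),
    )


print(map_splits_sketch(["train", "validation", "test"]))
print(map_splits_sketch(["train"]))
```

Under these assumptions, a repo exposing only a `train` split yields `None` in the second and third positions, so callers need to handle missing splits explicitly.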