euroeval.preprocessing
source module euroeval.preprocessing
Preprocessing utilities for custom dataset column mapping.
Functions
-
merge_input_and_choices — Merge input text and choices into a single text field.
-
build_preprocessing_func — Build a preprocessing function from column mapping arguments.
source merge_input_and_choices(example: dict, input_column: str, choices_column: str | list[str], choices_label: str) → dict
Merge input text and choices into a single text field.
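The merge can be pictured with a small stand-alone sketch. It is not the library's implementation: the helper name merge_input_and_choices_sketch is hypothetical, and the single-newline separators and lettered prefixes are assumptions based on the documented format (input text, then the label followed by a colon, then "a.", "b.", ...). It handles both forms of choices_column, a single column holding a list or several columns holding one choice each.

```python
from __future__ import annotations

from string import ascii_lowercase


def merge_input_and_choices_sketch(
    example: dict,
    input_column: str,
    choices_column: str | list[str],
    choices_label: str,
) -> dict:
    """Illustrative re-implementation; not the library's actual code."""
    if isinstance(choices_column, str):
        # One column that holds a list of answer-choice strings.
        choices = list(example[choices_column])
    else:
        # Several columns, each holding a single answer-choice string.
        choices = [example[column] for column in choices_column]

    # Letter the choices "a.", "b.", ... (zip stops after 26 choices).
    choice_lines = [
        f"{letter}. {choice}" for letter, choice in zip(ascii_lowercase, choices)
    ]

    # Assumption: the sections are separated by single newlines.
    text = "\n".join([example[input_column], f"{choices_label}:", *choice_lines])

    # Return a copy with the new "text" key; the input dict is left untouched.
    return {**example, "text": text}


example = {"question": "What is 2 + 2?", "options": ["3", "4", "5"]}
merged = merge_input_and_choices_sketch(example, "question", "options", "Choices")
print(merged["text"])
```

With a list of column names instead, for instance ["option_a", "option_b"], the same sketch reads one choice from each named column and letters them in the order given.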
Parameters
-
example : dict — A single dataset example with at least the
input_column and the column(s) named by choices_column.
-
input_column : str — The name of the column containing the input text.
-
choices_column : str | list[str] — Either the name of a single column containing a list of answer-choice strings, or a list of column names each containing a single answer-choice string.
-
choices_label : str — The language-specific label for the choices section (e.g.
"Choices").
Returns
-
dict — The example with a new
"text" key containing the merged input and formatted choices.
source build_preprocessing_func(dataset_name: str, task_group: TaskGroup, input_column: str, target_column: str | None, choices_column: str | list[str] | None, choices_label: str) → c.Callable[[DatasetDict], DatasetDict]
Build a preprocessing function from column mapping arguments.
The returned function renames or merges columns in a DatasetDict to match the framework's standard column names:
- If input_column differs from "text" (without choices_column), it is renamed to "text".
- If choices_column is given, input_column and choices_column are merged into a single "text" column formatted as:
<input_text>
<choices_label>:
a. <choice_0>
b. <choice_1>
...
- If target_column is given, it is renamed to the task-group standard: "labels" for token classification, "target_text" for text-to-text, and "label" for everything else.
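The three rules above can be sketched as a closure factory. This is an illustration under stated assumptions, not the library's code: build_preprocessing_func_sketch is a hypothetical name, task groups are plain strings because the TaskGroup members are not listed here, a dict mapping split names to lists of row dicts stands in for DatasetDict, and only the single-column form of choices_column is handled. Whether the real function keeps or drops the original input and choices columns after a merge is not stated, so the sketch simply keeps them.

```python
from __future__ import annotations

from typing import Callable

# Stand-in for DatasetDict: split name -> list of row dicts (assumption).
Splits = dict[str, list[dict]]

# Hypothetical task-group names; any other group falls back to "label".
TARGET_COLUMN_BY_TASK_GROUP = {
    "token_classification": "labels",
    "text_to_text": "target_text",
}


def build_preprocessing_func_sketch(
    task_group: str,
    input_column: str = "text",
    target_column: str | None = None,
    choices_column: str | None = None,
    choices_label: str = "Choices",
) -> Callable[[Splits], Splits]:
    """Illustrative sketch of the documented renaming and merging rules."""
    standard_target = TARGET_COLUMN_BY_TASK_GROUP.get(task_group, "label")

    def preprocess_row(row: dict) -> dict:
        row = dict(row)  # copy so the caller's data is not mutated
        if choices_column is not None:
            # Rule 2: merge the input text and lettered choices into "text".
            lines = [
                f"{chr(ord('a') + index)}. {choice}"
                for index, choice in enumerate(row[choices_column])
            ]
            row["text"] = "\n".join(
                [row[input_column], f"{choices_label}:", *lines]
            )
        elif input_column != "text":
            # Rule 1: plain rename of the input column to "text".
            row["text"] = row.pop(input_column)
        if target_column is not None and target_column != standard_target:
            # Rule 3: rename the target to the task-group standard name.
            row[standard_target] = row.pop(target_column)
        return row

    def preprocess(splits: Splits) -> Splits:
        return {
            name: [preprocess_row(row) for row in rows]
            for name, rows in splits.items()
        }

    return preprocess


splits = {"train": [{"sentence": "Great film", "sentiment": "positive"}]}
preprocess = build_preprocessing_func_sketch(
    "sequence_classification", input_column="sentence", target_column="sentiment"
)
processed = preprocess(splits)
print(processed)
```

Returning a closure means the column mapping is resolved once, and the resulting callable can then be applied to any dataset that has the same layout.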
Parameters
-
dataset_name : str — The name of the dataset, used in error messages.
-
task_group : TaskGroup — The task group, used to determine the standard target column name.
-
input_column : str — Column to rename to
"text". When combined with choices_column, the two are merged into a formatted "text" column instead. Defaults to "text" (no rename).
-
target_column : str | None — Column to rename to the task-appropriate standard target column name.
-
choices_column : str | list[str] | None — Either the name of a single column containing a list of answer-choice strings, or a list of column names each containing a single answer-choice string, to merge with the input column.
-
choices_label : str — The language-specific label for the choices section (e.g.
"Choices").
Returns
-
c.Callable[[DatasetDict], DatasetDict] — A callable that accepts a
DatasetDict and returns a preprocessed DatasetDict.
Raises