euroeval.preprocessing

module euroeval.preprocessing

Preprocessing utilities for custom dataset column mapping.

Functions

merge_input_and_choices(example: dict, input_column: str, choices_column: str | list[str], choices_label: str) → dict

Merge input text and choices into a single text field.

Parameters

  • example : dict A single dataset example with at least the input_column and the column(s) named by choices_column.

  • input_column : str The name of the column containing the input text.

  • choices_column : str | list[str] Either the name of a single column containing a list of answer-choice strings, or a list of column names each containing a single answer-choice string.

  • choices_label : str The language-specific label for the choices section (e.g. "Choices").

Returns

  • dict The example with a new "text" key containing the merged input and formatted choices.
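
The merge described above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the function name merge_input_and_choices_sketch is hypothetical, and the exact output layout (newline separators, lowercase letter prefixes) is assumed from the format documented for build_preprocessing_func.

```python
import string


def merge_input_and_choices_sketch(
    example: dict,
    input_column: str,
    choices_column,  # a column name (str) or a list of column names
    choices_label: str,
) -> dict:
    """Hypothetical re-implementation, for illustration only."""
    if isinstance(choices_column, str):
        # One column that holds a list of answer-choice strings.
        choices = list(example[choices_column])
    else:
        # Several columns, each holding a single answer-choice string.
        choices = [example[column] for column in choices_column]

    # Prefix each choice with a lowercase letter: "a. ...", "b. ...", and so on.
    choice_lines = [
        f"{letter}. {choice}"
        for letter, choice in zip(string.ascii_lowercase, choices)
    ]

    # Copy the example so the caller's dict is left unchanged.
    merged = dict(example)
    merged["text"] = "\n".join(
        [str(example[input_column]), f"{choices_label}:", *choice_lines]
    )
    return merged


example = {"question": "What is 2 + 2?", "options": ["3", "4"]}
merged = merge_input_and_choices_sketch(example, "question", "options", "Choices")
print(merged["text"])
```

Both forms of choices_column yield the same "text" value in this sketch; only the way the choices are looked up in the example differs.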

build_preprocessing_func(dataset_name: str, task_group: TaskGroup, input_column: str, target_column: str | None, choices_column: str | list[str] | None, choices_label: str) → c.Callable[[DatasetDict], DatasetDict]

Build a preprocessing function from column mapping arguments.

The returned function renames or merges columns in a DatasetDict to match the framework's standard column names:

  • If input_column is not "text" and no choices_column is given, it is renamed to "text".
  • If choices_column is given, input_column and choices_column are merged into a single "text" column formatted as:

      <input_text>
      <choices_label>:
      a. <choice_0>
      b. <choice_1>
      ...

  • If target_column is given, it is renamed to the task-group standard: "labels" for token classification, "target_text" for text-to-text, and "label" for everything else.
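
The target-column rule in the last bullet amounts to a small lookup. The sketch below uses plain strings as stand-ins for TaskGroup members; those strings and the function name standard_target_column are assumptions made for illustration, not the library's identifiers.

```python
def standard_target_column(task_group: str) -> str:
    """Return the standard target column name for a task group.

    The string values compared against here are hypothetical stand-ins
    for the members of the TaskGroup enum.
    """
    if task_group == "token-classification":
        return "labels"
    if task_group == "text-to-text":
        return "target_text"
    # Every other task group uses the singular "label".
    return "label"


for group in ("token-classification", "text-to-text", "sequence-classification"):
    print(group, "->", standard_target_column(group))
```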

Parameters

  • dataset_name : str The name of the dataset, used in error messages.

  • task_group : TaskGroup The task group, used to determine the standard target column name.

  • input_column : str Column to rename to "text". When combined with choices_column, the two are merged into a formatted "text" column instead. Defaults to "text" (no rename).

  • target_column : str | None Column to rename to the task-appropriate standard target column name.

  • choices_column : str | list[str] | None Either the name of a single column containing a list of answer-choice strings, or a list of column names each containing a single answer-choice string, to merge with the input column.

  • choices_label : str The language-specific label for the choices section (e.g. "Choices").

Returns

  • c.Callable[[DatasetDict], DatasetDict] A callable that accepts a DatasetDict and returns a preprocessed DatasetDict.
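
The "build a callable from column mapping arguments" pattern can be illustrated with a closure. This sketch covers only the renaming case and works on a plain dict of split name to list of row dicts rather than a real DatasetDict; build_rename_func_sketch, its target_name parameter, and the Splits alias are all hypothetical names introduced here.

```python
from __future__ import annotations

from collections.abc import Callable

# Stand-in for a DatasetDict: split name -> list of row dicts.
Splits = dict[str, list[dict]]


def build_rename_func_sketch(
    input_column: str,
    target_column: str | None,
    target_name: str,
) -> Callable[[Splits], Splits]:
    """Hypothetical factory that returns a column-renaming function."""
    # Work out the renames once, when the function is built.
    renames: dict[str, str] = {}
    if input_column != "text":
        renames[input_column] = "text"
    if target_column is not None and target_column != target_name:
        renames[target_column] = target_name

    def preprocess(splits: Splits) -> Splits:
        # Apply the same renames to every row of every split.
        return {
            split: [
                {renames.get(key, key): value for key, value in row.items()}
                for row in rows
            ]
            for split, rows in splits.items()
        }

    return preprocess


preprocess = build_rename_func_sketch("question", "answer", "label")
print(preprocess({"train": [{"question": "Is water wet?", "answer": "yes"}]}))
```

Resolving the mapping up front keeps the returned callable cheap to apply and lets invalid arguments be rejected at build time, before any data is touched.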

Raises