euroeval.task_group_utils.multiple_choice_classification
source module euroeval.task_group_utils.multiple_choice_classification
Utility functions related to the multiple-choice classification task group.
Classes
-
MultipleChoiceClassificationTrainer — Trainer subclass for multiple-choice classification tasks.
Functions
-
prepare_examples — Prepare the examples as features for the model.
-
postprocess_predictions_and_labels — Postprocess the predictions and labels.
source class MultipleChoiceClassificationTrainer(model: Union[PreTrainedModel, nn.Module, None] = None, args: TrainingArguments = None, data_collator: Optional[DataCollator] = None, train_dataset: Optional[Union[Dataset, IterableDataset, 'datasets.Dataset']] = None, eval_dataset: Optional[Union[Dataset, dict[str, Dataset], 'datasets.Dataset']] = None, processing_class: Optional[Union[PreTrainedTokenizerBase, BaseImageProcessor, FeatureExtractionMixin, ProcessorMixin]] = None, model_init: Optional[Callable[[], PreTrainedModel]] = None, compute_loss_func: Optional[Callable] = None, compute_metrics: Optional[Callable[[EvalPrediction], dict]] = None, callbacks: Optional[list[TrainerCallback]] = None, optimizers: tuple[Optional[torch.optim.Optimizer], Optional[torch.optim.lr_scheduler.LambdaLR]] = (None, None), optimizer_cls_and_kwargs: Optional[tuple[type[torch.optim.Optimizer], dict[str, Any]]] = None, preprocess_logits_for_metrics: Optional[Callable[[torch.Tensor, torch.Tensor], torch.Tensor]] = None)
Bases : Trainer
Trainer subclass for multiple-choice classification tasks.
Methods
-
evaluate — Evaluate the model on the given dataset.
source method MultipleChoiceClassificationTrainer.evaluate(eval_dataset: Dataset | None = None, ignore_keys: list[str] | None = None, metric_key_prefix: str = 'eval') → dict[str, float]
Evaluate the model on the given dataset.
Parameters
-
eval_dataset : Dataset | None — The dataset to evaluate on. If None, then use the stored evaluation dataset.
-
ignore_keys : list[str] | None — The keys to ignore when computing the metrics.
-
metric_key_prefix : str — The prefix to use for the metric keys.
Returns
-
dict[str, float] — The metrics computed on the evaluation dataset.
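The `metric_key_prefix` parameter controls how the keys of the returned dictionary are named, following the standard `transformers.Trainer` convention. A minimal sketch of that naming behaviour (the helper function here is hypothetical, not part of EuroEval):

```python
def prefix_metrics(
    metrics: dict[str, float], metric_key_prefix: str = "eval"
) -> dict[str, float]:
    """Prefix each metric name, as Trainer.evaluate does for its return value."""
    return {f"{metric_key_prefix}_{name}": value for name, value in metrics.items()}

# With the default prefix, an accuracy of 0.87 is reported as "eval_accuracy";
# passing metric_key_prefix="test" yields "test_accuracy" instead.
prefix_metrics({"accuracy": 0.87}, metric_key_prefix="test")
```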
source prepare_examples(examples: BatchEncoding, tokenizer: PreTrainedTokenizer) → BatchEncoding
Prepare the features.
Parameters
-
examples : BatchEncoding — The examples to prepare.
-
tokenizer : PreTrainedTokenizer — The tokenizer to use to prepare the examples.
Returns
-
BatchEncoding — The prepared examples.
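A common way to prepare multiple-choice examples is to pair each question with every candidate choice so the model can score each pair. The sketch below illustrates this pattern under that assumption; the field names (`question`, `choices`) and the separator handling are hypothetical, and a stub stands in for a real `PreTrainedTokenizer`:

```python
from typing import Callable

def prepare_examples_sketch(
    examples: dict[str, list], tokenizer: Callable[[list[str]], dict]
) -> dict:
    """Tokenize every (question, choice) pair in the batch."""
    texts = [
        f"{question} [SEP] {choice}"
        for question, choices in zip(examples["question"], examples["choices"])
        for choice in choices
    ]
    return tokenizer(texts)

# Stub tokenizer standing in for a Hugging Face PreTrainedTokenizer.
stub_tokenizer = lambda texts: {"input_ids": [[ord(c) for c in t] for t in texts]}

batch = prepare_examples_sketch(
    {"question": ["What is 2 + 2?"], "choices": [["3", "4"]]}, stub_tokenizer
)
# One tokenized sequence per (question, choice) pair: 2 sequences here.
```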
source postprocess_predictions_and_labels(predictions: np.ndarray, dataset: Dataset) → tuple['Predictions', 'Labels']
Postprocess the predictions and labels.
Parameters
-
predictions : np.ndarray — The model predictions, of shape (num_examples, 2).
-
dataset : Dataset — The dataset containing the examples.
Returns
-
tuple['Predictions', 'Labels'] — The postprocessed predictions and labels.
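Since the predictions have shape `(num_examples, 2)`, postprocessing amounts to taking the argmax over the two columns and mapping the resulting indices to labels. A minimal sketch of this step, assuming a binary `id2label` mapping (the mapping and function name are illustrative, not EuroEval's actual implementation):

```python
import numpy as np

def postprocess_sketch(
    predictions: np.ndarray, labels: list[str]
) -> tuple[list[str], list[str]]:
    """Map (num_examples, 2) logits to string labels via argmax."""
    id2label = {0: "no", 1: "yes"}  # assumed binary label mapping
    predicted = [id2label[int(idx)] for idx in predictions.argmax(axis=1)]
    return predicted, labels

# Column 1 wins for the first example, column 0 for the second.
preds, labels = postprocess_sketch(
    np.array([[0.1, 0.9], [2.0, -1.0]]), ["yes", "no"]
)
```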