euroeval.task_group_utils.multiple_choice_classification¶
source module euroeval.task_group_utils.multiple_choice_classification
Utility functions related to the multiple-choice classification task group.
Classes
-
MultipleChoiceClassificationTrainer — Trainer subclass for multiple-choice classification tasks.
Functions
-
prepare_examples — Prepare the examples by tokenising them.
-
postprocess_predictions_and_labels — Postprocess the predictions and labels.
source class MultipleChoiceClassificationTrainer(model: PreTrainedModel | nn.Module | None = None, args: TrainingArguments | None = None, data_collator: DataCollator | None = None, train_dataset: Dataset | IterableDataset | datasets.Dataset | None = None, eval_dataset: Dataset | dict[str, Dataset] | datasets.Dataset | None = None, processing_class: PreTrainedTokenizerBase | BaseImageProcessor | FeatureExtractionMixin | ProcessorMixin | None = None, model_init: Callable[..., PreTrainedModel] | None = None, compute_loss_func: Callable | None = None, compute_metrics: Callable[[EvalPrediction], dict] | None = None, callbacks: list[TrainerCallback] | None = None, optimizers: tuple[torch.optim.Optimizer | None, torch.optim.lr_scheduler.LambdaLR | None] = (None, None), optimizer_cls_and_kwargs: tuple[type[torch.optim.Optimizer], dict[str, Any]] | None = None, preprocess_logits_for_metrics: Callable[[torch.Tensor, torch.Tensor], torch.Tensor] | None = None)
Bases : Trainer
Trainer subclass for multiple-choice classification tasks.
Methods
-
evaluate — Evaluate the model on the given dataset.
source method MultipleChoiceClassificationTrainer.evaluate(eval_dataset: Dataset | None = None, ignore_keys: list[str] | None = None, metric_key_prefix: str = 'eval') → dict[str, float]
Evaluate the model on the given dataset.
Parameters
-
eval_dataset : Dataset | None — The dataset to evaluate on. If None, then use the stored evaluation dataset.
-
ignore_keys : list[str] | None — The keys to ignore when computing the metrics.
-
metric_key_prefix : str — The prefix to use for the metric keys.
Returns
-
dict[str, float] — The metrics computed on the evaluation dataset.
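As with the base Hugging Face Trainer, the keys in the returned dictionary are prefixed with metric_key_prefix. The helper below is a hypothetical sketch of that prefixing convention, not part of euroeval; the metric names are placeholders:

```python
# Hypothetical illustration of how `metric_key_prefix` shapes the
# returned metric dictionary, mirroring the Hugging Face Trainer
# convention of prefixing every metric key.

def prefix_metrics(
    metrics: dict[str, float], metric_key_prefix: str = "eval"
) -> dict[str, float]:
    """Prefix each metric key, as Trainer.evaluate does for its output."""
    return {
        key if key.startswith(f"{metric_key_prefix}_")
        else f"{metric_key_prefix}_{key}": value
        for key, value in metrics.items()
    }

raw_metrics = {"mcc": 0.72, "macro_f1": 0.85}
print(prefix_metrics(raw_metrics))
# {'eval_mcc': 0.72, 'eval_macro_f1': 0.85}
```

Keys that already carry the prefix are left untouched, so the helper is safe to apply to an already-prefixed dictionary.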
source prepare_examples(examples: BatchEncoding, tokeniser: PreTrainedTokenizer) → BatchEncoding
Prepare the examples by tokenising them with the given tokeniser.
Parameters
-
examples : BatchEncoding — The examples to prepare.
-
tokeniser : PreTrainedTokenizer — The tokeniser to use to prepare the examples.
Returns
-
BatchEncoding — The prepared examples.
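The exact preparation logic is internal to euroeval. As a minimal sketch of what tokenising a batch of examples might look like, the snippet below uses a toy stand-in tokeniser instead of a real PreTrainedTokenizer, and plain dicts instead of BatchEncoding; the field names "text" and "label" are assumptions for illustration only:

```python
# Hypothetical sketch: tokenise a batch of multiple-choice examples.
# `toy_tokeniser` stands in for a real `PreTrainedTokenizer`.

def toy_tokeniser(texts: list[str]) -> dict[str, list[list[int]]]:
    """Map each whitespace-separated token to a toy integer id."""
    vocab: dict[str, int] = {}
    input_ids = []
    for text in texts:
        ids = [
            vocab.setdefault(token, len(vocab) + 1)
            for token in text.lower().split()
        ]
        input_ids.append(ids)
    return {"input_ids": input_ids}

def prepare_examples_sketch(examples: dict, tokeniser) -> dict:
    """Tokenise the example texts, keeping any labels alongside."""
    prepared = tokeniser(examples["text"])
    if "label" in examples:
        prepared["labels"] = examples["label"]
    return prepared

batch = {"text": ["Is the sky blue? yes", "Is the sky blue? no"], "label": [1, 0]}
print(prepare_examples_sketch(batch, toy_tokeniser))
```

The real function returns a BatchEncoding whose tensors feed directly into the trainer above.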
source postprocess_predictions_and_labels(predictions: np.ndarray, dataset: Dataset) → tuple['Predictions', 'Labels']
Postprocess the predictions and labels.
Parameters
-
predictions : np.ndarray — The model predictions, of shape (num_examples, 2), corresponding to the False/True probabilities for each example.
-
dataset : Dataset — The dataset containing the examples.
Returns
-
tuple['Predictions', 'Labels'] — The postprocessed predictions and labels.
Raises
-
InvalidBenchmark — If the predictions are not a 2D array with shape (num_examples, 2).
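A hedged sketch of the validation and conversion the docstring implies: check that the predictions form a (num_examples, 2) array of False/True probabilities, then take the argmax over the last axis as the predicted label. The exception class and the labels argument (a plain list, standing in for the dataset's gold labels) are simplifications, not euroeval's actual internals:

```python
import numpy as np

class InvalidBenchmark(Exception):
    """Stand-in for euroeval's InvalidBenchmark exception."""

def postprocess_sketch(
    predictions: np.ndarray, labels: list[int]
) -> tuple[list[int], list[int]]:
    """Validate the prediction shape and convert probabilities to labels."""
    if predictions.ndim != 2 or predictions.shape[1] != 2:
        raise InvalidBenchmark(
            f"Expected predictions of shape (num_examples, 2), "
            f"got {predictions.shape}."
        )
    # Column 0 holds the False probability and column 1 the True
    # probability, so the argmax over the last axis is the prediction.
    predicted = predictions.argmax(axis=-1).tolist()
    return predicted, labels

probs = np.array([[0.9, 0.1], [0.2, 0.8]])
print(postprocess_sketch(probs, labels=[0, 1]))
# ([0, 1], [0, 1])
```

Passing a 1D array (or a 2D array with a second dimension other than 2) raises the stand-in InvalidBenchmark, matching the shape requirement stated above.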