euroeval.human_evaluation

source module euroeval.human_evaluation

Gradio app for conducting human evaluation of the tasks.

Classes

  • HumanEvaluator An app for evaluating human performance on the EuroEval benchmark.

Functions

  • main Start the Gradio app for human evaluation.

source class HumanEvaluator(annotator_id: int, title: str, description: str, dummy_model_id: str = 'mistralai/Mistral-7B-v0.1')

An app for evaluating human performance on the EuroEval benchmark.

Initialize the HumanEvaluator.

Parameters

  • annotator_id : int

    The annotator ID for the evaluation.

  • title : str

    The title of the app.

  • description : str

    The description of the app.

  • dummy_model_id : str

    The model ID to use for generating prompts.
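
A minimal usage sketch of constructing the evaluator, assuming the module is importable as documented above; the annotator ID, title, and description values are placeholders.

```python
from euroeval.human_evaluation import HumanEvaluator

# Instantiate the evaluator for a single annotator; the title and description
# are shown at the top of the Gradio interface (placeholder values here).
evaluator = HumanEvaluator(
    annotator_id=0,
    title="EuroEval Human Evaluation",
    description="Answer each question as accurately as you can.",
)
```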

Methods

source method HumanEvaluator.create_app() → gr.Blocks

Create the Gradio app for human evaluation.

Returns

  • gr.Blocks The Gradio app for human evaluation.
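
A hedged sketch of building and serving the app locally, assuming the returned gr.Blocks can be started with Gradio's standard launch() call; the constructor arguments are placeholders.

```python
import gradio as gr

from euroeval.human_evaluation import HumanEvaluator

evaluator = HumanEvaluator(
    annotator_id=0,
    title="EuroEval Human Evaluation",
    description="Answer each question as accurately as you can.",
)

# Build the Blocks app and serve it on a local port; launch() is Gradio's
# standard entry point for starting an interface.
app: gr.Blocks = evaluator.create_app()
app.launch()
```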

source method HumanEvaluator.update_dataset_choices(language: str | None, task: str | None) → Dropdown

Update the dataset choices based on the selected language and task.

Parameters

  • language : str | None

    The language selected by the user.

  • task : str | None

    The task selected by the user.

Returns

  • Dropdown A Dropdown populated with the dataset names that match the selected language and task.
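
A minimal sketch of refreshing the dataset dropdown, assuming the method can be called directly outside of a Gradio event; the language code and task name are illustrative values only.

```python
from euroeval.human_evaluation import HumanEvaluator

evaluator = HumanEvaluator(annotator_id=0, title="Demo", description="Demo run")

# Refresh the dropdown for an illustrative language/task combination; the
# returned gr.Dropdown carries the matching dataset names as its choices.
dropdown = evaluator.update_dataset_choices(
    language="da", task="named-entity-recognition"
)
print(dropdown.choices)
```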

source method HumanEvaluator.update_dataset(dataset_name: str, iteration: int) → tuple[Markdown, Markdown, Dropdown, Textbox, Button, Button, Textbox, Button]

Update the dataset based on a selected dataset name.

Parameters

  • dataset_name : str

    The dataset name selected by the user.

  • iteration : int

    The iteration index of the datasets to evaluate.

Returns

  • tuple[Markdown, Markdown, Dropdown, Textbox, Button, Button, Textbox, Button] A tuple (task_examples, question, entity_type, entity, entity_add_button, entity_reset_button, answer, submit_button) for the selected dataset.

Raises

  • NotImplementedError
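
A hedged sketch of loading the first iteration of a dataset and unpacking the returned Gradio components; the dataset name is illustrative, and in the running app these components are refreshed through Gradio events rather than called directly.

```python
from euroeval.human_evaluation import HumanEvaluator

evaluator = HumanEvaluator(annotator_id=0, title="Demo", description="Demo run")

# Load the first iteration of an illustrative dataset and unpack the Gradio
# components that the app updates for the annotator.
(
    task_examples,
    question,
    entity_type,
    entity,
    entity_add_button,
    entity_reset_button,
    answer,
    submit_button,
) = evaluator.update_dataset(dataset_name="angry-tweets", iteration=0)
```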

source method HumanEvaluator.add_entity_to_answer(question: str, entity_type: str, entity: str, answer: str) → tuple[Textbox, Textbox]

Add an entity to the answer.

Parameters

  • question : str

    The current question.

  • entity_type : str

    The entity type selected by the user.

  • entity : str

    The entity provided by the user.

  • answer : str

    The current answer.

Returns

  • tuple[Textbox, Textbox] A tuple (entity, answer), where the entity textbox is blanked and the answer textbox contains the updated answer.
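
A minimal sketch of appending a single entity to the running answer in a NER-style task; the question, entity type, and entity are illustrative, and the returned Gradio textboxes are assumed to expose their new contents via the usual value attribute.

```python
from euroeval.human_evaluation import HumanEvaluator

evaluator = HumanEvaluator(annotator_id=0, title="Demo", description="Demo run")

# Append an illustrative person entity to an empty answer. The entity textbox
# comes back blank so the next entity can be typed, while the answer textbox
# accumulates everything added so far.
entity_box, answer_box = evaluator.add_entity_to_answer(
    question="Hans Christian Andersen blev født i Odense.",
    entity_type="person",
    entity="Hans Christian Andersen",
    answer="",
)
print(answer_box.value)
```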

source method HumanEvaluator.reset_entities() → Textbox

Reset the entities in the answer.

Returns

  • Textbox A blank answer.

source method HumanEvaluator.submit_answer(dataset_name: str, question: str, answer: str, annotator_id: int) → tuple[str, str]

Submit an answer to the dataset.

Parameters

  • dataset_name : str

    The name of the dataset.

  • question : str

    The question for the dataset.

  • answer : str

    The answer to the question.

  • annotator_id : int

    The annotator ID for the evaluation.

Returns

  • tuple[str, str] A tuple (question, answer), with question being the next question, and answer being an empty string.
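
A hedged sketch of submitting one answer, assuming the method records the answer for the given annotator and returns the next question; the dataset name, question, and answer are illustrative.

```python
from euroeval.human_evaluation import HumanEvaluator

evaluator = HumanEvaluator(annotator_id=0, title="Demo", description="Demo run")

# Submit an illustrative answer; the return value is the next question to
# show the annotator together with an empty answer string.
next_question, blank_answer = evaluator.submit_answer(
    dataset_name="angry-tweets",
    question="Hvad er sentimentet i dette tweet?",
    answer="negativ",
    annotator_id=0,
)
```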

source method HumanEvaluator.example_to_markdown(example: dict) → tuple[str, str]

Convert an example to a Markdown string.

Parameters

  • example : dict

    The example to convert.

Returns

  • tuple[str, str] A tuple (task_examples, question) for the example.
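
A minimal sketch of rendering a single example as Markdown; the example dict used here is illustrative, and the exact keys the method expects depend on the dataset being evaluated.

```python
from euroeval.human_evaluation import HumanEvaluator

evaluator = HumanEvaluator(annotator_id=0, title="Demo", description="Demo run")

# Convert an illustrative example to the Markdown strings shown in the app;
# the dict keys are an assumption and may differ per dataset.
task_examples, question = evaluator.example_to_markdown(
    example={"text": "Jeg elsker denne film!", "label": "positiv"}
)
print(question)
```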

source method HumanEvaluator.compute_and_log_scores() → None

Compute and log the scores for the dataset.

source main(annotator_id: int) → None

Start the Gradio app for human evaluation.

Raises