euroeval.metrics.llm_as_a_judge
Metrics based on LLM-as-a-judge.
Classes
-
LLMAsAJudgeMetric — Use an LLM to judge the quality of the predictions.
-
Fluency — Response format for the fluency metric.
class LLMAsAJudgeMetric(name: str, pretty_name: str, judge_id: str, judge_kwargs: dict[str, t.Any], user_prompt: str, response_format: t.Type[BaseModel], scoring_fn: t.Callable[[BaseModel | None], float], condition_formatting_fn: t.Callable[[str], str] = lambda x: x, system_prompt: str | None = None)
Bases : Metric
Use an LLM to judge the quality of the predictions.
Initialise the LLM as a judge metric.
Parameters
-
name : str — The name of the metric in snake_case.
-
pretty_name : str — The pretty name of the metric, used for display purposes.
-
judge_id : str — The model ID of the LLM to use as a judge.
-
judge_kwargs : dict[str, t.Any] — Generation parameters for the judge model, such as temperature.
-
user_prompt : str — The user prompt to use for the judge model. The prompt should be formatted with the variables `prediction` and `condition`, to include the model predictions and a description of what the prediction should be judged on, respectively. If the condition is not needed, it can be omitted from the prompt, but the `prediction` variable must still be present.
-
response_format : t.Type[BaseModel] — The response format to use for the judge model. This should be a Pydantic model that defines the expected structure of the judge's response.
-
scoring_fn : t.Callable[[BaseModel | None], float] — A function that takes the judge's response and returns a score.
-
condition_formatting_fn : optional — A function to format the condition string before it is included in the user prompt. Defaults to a no-op function that returns the input unchanged.
-
system_prompt : optional — The system prompt to use for the judge model. If not provided, no system prompt will be used.
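As an illustration of how user_prompt and condition_formatting_fn might fit together, here is a minimal sketch that does not import euroeval. The brace-style placeholders, the template text, and the helper names format_condition and build_user_prompt are all assumptions made for this example; the documentation above only states that the prompt uses the variables `prediction` and `condition`.

```python
from typing import Optional

# Hypothetical template. The brace-style placeholders are an assumption;
# the documentation only names the variables `prediction` and `condition`.
USER_PROMPT = (
    "Rate the following answer.\n"
    "Answer: {prediction}\n"
    "Judge it on: {condition}\n"
)


def format_condition(condition: str) -> str:
    """Hypothetical condition_formatting_fn: trim whitespace and lowercase."""
    return condition.strip().lower()


def build_user_prompt(
    template: str, prediction: str, condition: Optional[str] = None
) -> str:
    """Fill the template; a template without {condition} simply ignores it."""
    formatted = format_condition(condition) if condition is not None else ""
    return template.format(prediction=prediction, condition=formatted)


prompt = build_user_prompt(
    USER_PROMPT, "Paris is the capital of France.", "  Factual accuracy "
)
print(prompt)
```

A template that leaves out the condition still works in this sketch, since str.format ignores unused keyword arguments, whereas a template that leaves out the prediction would silently drop the text to be judged.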
class Fluency(**data: Any)
Bases : BaseModel
Response format for the fluency metric.
Create a new model by parsing and validating input data from keyword arguments.
Raises ValidationError (pydantic_core.ValidationError) if the input data cannot be validated to form a valid model.
`self` is explicitly positional-only to allow `self` as a field name.
Attributes
-
fluency : t.Annotated[int, Field(ge=1, le=5)] — The fluency rating, an integer between 1 and 5.
-
model_config : ClassVar[ConfigDict] — Configuration for the model, which should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).
-
model_extra : dict[str, Any] | None — Get extra fields set during validation.
-
model_fields_set : set[str] — Returns the set of fields that have been explicitly set on this model instance.
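The following sketch shows how a response format like Fluency could pair with a scoring_fn. It redeclares a stand-in model with the documented fluency field rather than importing euroeval. The scoring rule (a linear map from 1-5 onto 0-1, with 0.0 when no response is available) and the helper parse_judge_output are assumptions for illustration, not necessarily what the library does.

```python
from typing import Annotated, Optional

from pydantic import BaseModel, Field, ValidationError


class Fluency(BaseModel):
    """Stand-in mirroring the documented attribute: an integer from 1 to 5."""

    fluency: Annotated[int, Field(ge=1, le=5)]


def fluency_score(response: Optional[Fluency]) -> float:
    """Hypothetical scoring_fn: map a 1-5 rating onto [0, 1]; None scores 0.0."""
    if response is None:
        return 0.0
    return (response.fluency - 1) / 4.0


def parse_judge_output(raw_json: str) -> Optional[Fluency]:
    """Hypothetical helper: validate raw judge output, returning None on failure."""
    try:
        return Fluency.model_validate_json(raw_json)
    except ValidationError:
        return None


print(fluency_score(parse_judge_output('{"fluency": 4}')))  # 0.75
print(fluency_score(parse_judge_output('{"fluency": 9}')))  # 0.0 (out of range)
print(Fluency(fluency=3).model_fields_set)  # {'fluency'}
```

Accepting None in the scoring function matches the documented signature t.Callable[[BaseModel | None], float]; treating None as "the judge's output could not be validated" is this sketch's interpretation.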