euroeval.eee_utils

Utility functions for the Every Eval Ever (EEE) output format.

Functions

benchmark_result_to_eee_dict — Convert a BenchmarkResult to the Every Eval Ever (EEE) format.
benchmark_result_from_eee_dict — Create a BenchmarkResult from an Every Eval Ever format dictionary.
parse_optional_str — Parse a string-encoded optional string value.
parse_optional_bool — Parse a string-encoded optional boolean value.

benchmark_result_to_eee_dict(result: BenchmarkResult) → dict

Convert a BenchmarkResult to the Every Eval Ever (EEE) format.

Produces a dictionary conforming to the Every Eval Ever JSON schema v0.2.1 (https://github.com/evaleval/every_eval_ever/blob/main/eval.schema.json). The resulting dict can be serialized as one JSON line in euroeval_benchmark_results.jsonl and later reconstructed without loss via benchmark_result_from_eee_dict.

The mapping is as follows:

Top-level fields: schema_version, evaluation_id, evaluation_timestamp, retrieved_timestamp, source_metadata.
model_info: model id/name plus EuroEval-specific details (num_model_parameters, max_sequence_length, vocabulary_size, merge, generative, generative_type) in additional_details.
eval_library: name="euroeval", library version, and evaluation context (languages, task, shot config, library versions, raw per-iteration scores) in additional_details.
evaluation_results: one entry per metric. Each metric's 95% confidence interval half-width, stored under the keys ending in _se, is exposed as a confidence_interval with confidence_level: 0.95. Speed metrics (test_speed, test_speed_short) omit score_type, min_score, and max_score because a tokens-per-second value has no fixed upper bound.
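
The confidence-interval mapping in the last item can be sketched as follows. This is a minimal illustration, assuming the interval is symmetric around the score. The function name half_width_to_interval and the key names "lower" and "upper" are hypothetical; they are not taken from the EEE schema or from EuroEval, and the numbers are made up.

```python
def half_width_to_interval(score: float, half_width: float) -> dict:
    """Turn a score and a 95% CI half-width into an interval record.

    The key names "lower" and "upper" are assumptions for illustration;
    only "confidence_level" is named in the documentation above.
    """
    return {
        "lower": score - half_width,
        "upper": score + half_width,
        "confidence_level": 0.95,
    }


# Made-up example values: a score of 62.5 with a half-width of 1.5.
interval = half_width_to_interval(score=62.5, half_width=1.5)
```

Consult the linked schema for the actual field layout of a confidence interval entry.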

Parameters

result : BenchmarkResult — The benchmark result to convert.

Returns

dict — A dictionary matching the EEE JSON schema v0.2.1.
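
Storing the returned dict in a JSON Lines file could look like the sketch below. It uses only the standard library and a placeholder dict in place of a real conversion result. The helpers append_jsonl and read_jsonl are hypothetical names, and appending (rather than overwriting) is an assumption about how the results file is maintained.

```python
import json
import tempfile
from pathlib import Path


def append_jsonl(path: Path, record: dict) -> None:
    """Append one record to a JSON Lines file as a single line."""
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def read_jsonl(path: Path) -> list[dict]:
    """Read every non-empty line of a JSON Lines file back into dicts."""
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Placeholder standing in for the output of benchmark_result_to_eee_dict;
# a real EEE dict has many more fields.
record = {"schema_version": "0.2.1", "evaluation_id": "example-id"}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "euroeval_benchmark_results.jsonl"
    append_jsonl(path, record)
    loaded = read_jsonl(path)
```

Each dict read back this way is what benchmark_result_from_eee_dict expects as its config argument.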

benchmark_result_from_eee_dict(config: dict) → BenchmarkResult

Create a BenchmarkResult from an Every Eval Ever format dictionary.

Reconstructs a full BenchmarkResult from a dictionary conforming to the Every Eval Ever (EEE) JSON schema v0.2.1. This function is the inverse of benchmark_result_to_eee_dict and enables lossless round-trips.

Parameters

config : dict — A dictionary conforming to the EEE JSON schema v0.2.1, as produced by benchmark_result_to_eee_dict.

Returns

BenchmarkResult — The reconstructed benchmark result.

parse_optional_str(value: str | None) → str | None

Parse a string-encoded optional string value.

Parameters

value : str | None — The string to parse. None maps to None.

Returns

str | None — None if value is None, otherwise the original string.

parse_optional_bool(value: str | None) → bool | None

Parse a string-encoded optional boolean value.

Parameters

value : str | None — The string to parse. None maps to None; any other value is compared case-insensitively to "true".

Returns

bool | None — None if value is None, otherwise a boolean.
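
The documented behaviour can be approximated by the sketch below. It is a reimplementation written from the description above, not the library's source; the name parse_optional_bool_sketch is hypothetical, and whether the real function strips surrounding whitespace is not stated, so this version does not.

```python
from typing import Optional


def parse_optional_bool_sketch(value: Optional[str]) -> Optional[bool]:
    """Mirror the documented contract of parse_optional_bool."""
    # None passes through unchanged.
    if value is None:
        return None
    # Any other value is True only if it equals "true", ignoring case.
    return value.lower() == "true"
```

Under this reading, "True" and "TRUE" both give True, while "false", "yes", and the empty string all give False.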