euroeval.eee_utils
source module euroeval.eee_utils
Utility functions for the Every Eval Ever (EEE) output format.
Functions
- benchmark_result_to_eee_dict — Convert a BenchmarkResult to the Every Eval Ever (EEE) format.
- benchmark_result_from_eee_dict — Create a BenchmarkResult from an Every Eval Ever format dictionary.
- parse_optional_str — Parse a string-encoded optional string value.
- parse_optional_bool — Parse a string-encoded optional boolean value.
source benchmark_result_to_eee_dict(result: BenchmarkResult) → dict
Convert a BenchmarkResult to the Every Eval Ever (EEE) format.
Produces a dictionary conforming to the Every Eval Ever JSON schema v0.2.1
(https://github.com/evaleval/every_eval_ever/blob/main/eval.schema.json).
The resulting dict can be written directly to
euroeval_benchmark_results.jsonl and later reconstructed without loss via
benchmark_result_from_eee_dict.
The mapping is as follows:
- Top-level fields: schema_version, evaluation_id, evaluation_timestamp, retrieved_timestamp, source_metadata.
- model_info: model id/name plus EuroEval-specific details (num_model_parameters, max_sequence_length, vocabulary_size, merge, generative, generative_type) in additional_details.
- eval_library: name="euroeval", library version, and evaluation context (languages, task, shot config, library versions, raw per-iteration scores) in additional_details.
- evaluation_results: one entry per metric. The 95% confidence interval half-width stored in the _se keys is exposed as a confidence_interval with confidence_level: 0.95. Speed metrics (test_speed, test_speed_short) do not include score_type, min_score, or max_score because tokens-per-second has no fixed upper bound.
Parameters
- result : BenchmarkResult — The benchmark result to convert.
Returns
- dict — A dictionary matching the EEE JSON schema v0.2.1.
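To illustrate the evaluation_results mapping described above, here is a minimal sketch of how one metric entry could be assembled from a score and its 95% confidence interval half-width. The helper name sketch_metric_entry, the metric_name, lower, and upper key names, and the score_type and bound values are all assumptions made for illustration. They are not taken from the EuroEval source or the EEE schema, so the schema itself remains the authority on the exact layout.

```python
# Speed metrics named in the mapping above; they carry no score bounds.
SPEED_METRICS = {"test_speed", "test_speed_short"}


def sketch_metric_entry(metric_name: str, score: float, ci_half_width: float) -> dict:
    """Build an illustrative metric entry (hypothetical layout, not the real schema)."""
    entry = {
        "metric_name": metric_name,  # assumed key name
        "score": score,
        "confidence_interval": {
            # Assumed key names: the interval is the score plus/minus the half-width.
            "lower": score - ci_half_width,
            "upper": score + ci_half_width,
            "confidence_level": 0.95,
        },
    }
    if metric_name not in SPEED_METRICS:
        # Assumed values: bounded metrics get a score type and a fixed range.
        entry["score_type"] = "continuous"
        entry["min_score"] = 0.0
        entry["max_score"] = 100.0
    return entry


# Made-up numbers, used only to exercise the sketch.
accuracy_entry = sketch_metric_entry("test_accuracy", score=62.5, ci_half_width=1.5)
speed_entry = sketch_metric_entry("test_speed", score=1200.0, ci_half_width=50.0)
```

In this sketch the speed entry keeps its confidence interval but omits score_type, min_score, and max_score, mirroring the rule that tokens-per-second has no fixed upper bound.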
source benchmark_result_from_eee_dict(config: dict) → BenchmarkResult
Create a BenchmarkResult from an Every Eval Ever format dictionary.
Reconstructs a full BenchmarkResult from a dictionary conforming to the
Every Eval Ever (EEE) JSON schema v0.2.1. This function is the inverse of
benchmark_result_to_eee_dict and enables lossless round-trips.
Parameters
- config : dict — A dictionary conforming to the EEE JSON schema v0.2.1, as produced by benchmark_result_to_eee_dict.
Returns
- BenchmarkResult — The reconstructed benchmark result.
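The file side of the round trip can be sketched with the standard library alone. The example below appends one record to a euroeval_benchmark_results.jsonl file in a temporary directory and reads it back. The eee_record dictionary is a made-up stand-in for the output of benchmark_result_to_eee_dict, and its values are placeholders rather than a schema-complete record.

```python
import json
import tempfile
from pathlib import Path

# Stand-in for benchmark_result_to_eee_dict(result); the values are placeholders.
eee_record = {
    "schema_version": "0.2.1",
    "evaluation_id": "example-evaluation",
    "evaluation_results": [],
}

with tempfile.TemporaryDirectory() as tmp_dir:
    results_path = Path(tmp_dir) / "euroeval_benchmark_results.jsonl"

    # JSONL stores one JSON object per line, so appending adds one record.
    with results_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(eee_record) + "\n")

    # Read every non-empty line back as a separate record.
    with results_path.open(encoding="utf-8") as f:
        loaded_records = [json.loads(line) for line in f if line.strip()]
```

With EuroEval installed, each dictionary in loaded_records could then be passed to benchmark_result_from_eee_dict to recover the corresponding BenchmarkResult.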
source parse_optional_str(value: str | None) → str | None
Parse a string-encoded optional string value.
Parameters
- value : str | None — The string to parse. None maps to None.
Returns
- str | None — None if value is None, otherwise the original string.
source parse_optional_bool(value: str | None) → bool | None
Parse a string-encoded optional boolean value.
Parameters
- value : str | None — The string to parse. None maps to None; any other value is compared case-insensitively to "true".
Returns
- bool | None — None if value is None, otherwise True when value equals "true" ignoring case and False for any other string.
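The documented behaviour can be reproduced in a few lines. The function below is a sketch written from the description above, not the library's actual implementation, and the name parse_optional_bool_sketch is hypothetical. It assumes no whitespace stripping, since the description does not mention any.

```python
from typing import Optional


def parse_optional_bool_sketch(value: Optional[str]) -> Optional[bool]:
    """Mirror the documented rule: None stays None, otherwise compare to "true"."""
    if value is None:
        return None
    # Case-insensitive comparison: "True", "TRUE", and "true" all yield True.
    return value.lower() == "true"


missing = parse_optional_bool_sketch(None)
enabled = parse_optional_bool_sketch("True")
disabled = parse_optional_bool_sketch("false")
unrecognised = parse_optional_bool_sketch("yes")
```

Under this rule any string other than a case variant of "true" yields False, so a value such as "yes" does not count as true.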