euroeval.benchmark_modules.litellm
source module euroeval.benchmark_modules.litellm
Generative models from an inference API, using the LiteLLM framework.
Classes
- LiteLLMModel — A generative model from LiteLLM.
Functions
- raise_if_wrong_params — Raise an error if the model configuration has invalid parameters.
- try_download_ollama_model — Try to download an Ollama model.
source class LiteLLMModel(model_config: ModelConfig, dataset_config: DatasetConfig, benchmark_config: BenchmarkConfig)
Bases : BenchmarkModule
A generative model from LiteLLM.
Initialise the model.
Parameters
- model_config : ModelConfig — The model configuration.
- dataset_config : DatasetConfig — The dataset configuration.
- benchmark_config : BenchmarkConfig — The benchmark configuration.
Attributes
- generative_type : GenerativeType | None — The generative type of the model, or None if it has not been set yet.
- data_collator : c.Callable[[list[t.Any]], dict[str, t.Any]] — The data collator used to prepare samples during finetuning.
- compute_metrics : ComputeMetricsFunction — The function used to compute the metrics.
- extract_labels_from_generation : ExtractLabelsFunction — The function used to extract the labels from the generated output.
- trainer_class : t.Type['Trainer'] — The Trainer class to use for finetuning.
Methods
- generate — Generate outputs from the model.
- num_params — The number of parameters in the model.
- vocab_size — The vocabulary size of the model.
- model_max_length — The maximum length of the model.
- model_exists — Check if a model exists.
- get_model_config — Fetch the model configuration.
- prepare_dataset — Prepare the dataset for the model.
source property LiteLLMModel.generative_type: GenerativeType | None
Get the generative type of the model.
Returns
- GenerativeType | None — The generative type of the model, or None if it has not been set yet.
source method LiteLLMModel.generate(inputs: dict) → GenerativeModelOutput
Generate outputs from the model.
Parameters
- inputs : dict — A batch of inputs to pass through the model.
Returns
- GenerativeModelOutput — The generated model outputs.
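As a rough illustration of the batch-in, outputs-out shape of generate, the sketch below uses a simplified stand-in for GenerativeModelOutput and an echo-style stub in place of the real API call. The field name `sequences`, the `"messages"` key, and the stub itself are assumptions for illustration only; the real dataclass and input batch format live in euroeval and may differ.

```python
from dataclasses import dataclass


# Hypothetical stand-in for euroeval's GenerativeModelOutput;
# the real dataclass may carry different or additional fields.
@dataclass
class GenerativeModelOutput:
    sequences: list[str]  # one generated completion per input in the batch


def generate_stub(inputs: dict) -> GenerativeModelOutput:
    # Echo-style stub standing in for LiteLLMModel.generate: a real
    # implementation would send each conversation to the inference API.
    return GenerativeModelOutput(
        sequences=[f"response to: {m}" for m in inputs["messages"]]
    )


batch = {"messages": ["Translate 'hej' to English.", "What is 2 + 2?"]}
output = generate_stub(batch)
```

The point is only that inputs are batched and the result pairs one generated sequence with each input.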
source method LiteLLMModel.num_params() → int
The number of parameters in the model.
Returns
- int — The number of parameters in the model.
source method LiteLLMModel.vocab_size() → int
The vocabulary size of the model.
Returns
- int — The vocabulary size of the model.
source method LiteLLMModel.model_max_length() → int
The maximum length of the model.
Returns
- int — The maximum length of the model.
source property LiteLLMModel.data_collator: c.Callable[[list[t.Any]], dict[str, t.Any]]
The data collator used to prepare samples during finetuning.
Returns
- c.Callable[[list[t.Any]], dict[str, t.Any]] — The data collator.
source property LiteLLMModel.extract_labels_from_generation: ExtractLabelsFunction
The function used to extract the labels from the generated output.
Returns
- ExtractLabelsFunction — The function used to extract the labels from the generated output.
source property LiteLLMModel.trainer_class: t.Type['Trainer']
The Trainer class to use for finetuning.
Returns
- t.Type['Trainer'] — The Trainer class.
source classmethod LiteLLMModel.model_exists(model_id: str, benchmark_config: BenchmarkConfig) → bool | NeedsExtraInstalled | NeedsEnvironmentVariable
Check if a model exists.
Parameters
- model_id : str — The model ID.
- benchmark_config : BenchmarkConfig — The benchmark configuration.
Returns
- bool | NeedsExtraInstalled | NeedsEnvironmentVariable — Whether the model exists, or an error describing why we cannot check whether the model exists.
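Since model_exists can return either a plain bool or a sentinel explaining why the check could not be performed, callers need to branch on the result type. The sketch below uses simplified stand-ins for the two sentinel classes; their actual constructors and fields in euroeval may differ.

```python
# Illustrative stand-ins for euroeval's sentinel types; the real
# classes live in the library and may carry different fields.
class NeedsExtraInstalled:
    def __init__(self, extra: str) -> None:
        self.extra = extra


class NeedsEnvironmentVariable:
    def __init__(self, env_var: str) -> None:
        self.env_var = env_var


def describe_existence(result) -> str:
    # model_exists returns bool | NeedsExtraInstalled | NeedsEnvironmentVariable,
    # so check for the sentinels before treating the result as a bool.
    if isinstance(result, NeedsExtraInstalled):
        return f"install the '{result.extra}' extra to check this model"
    if isinstance(result, NeedsEnvironmentVariable):
        return f"set {result.env_var} to check this model"
    return "model exists" if result else "model not found"
```

Checking the sentinels first matters: a truthiness test alone would treat both sentinel objects as True.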
source classmethod LiteLLMModel.get_model_config(model_id: str, benchmark_config: BenchmarkConfig) → ModelConfig
Fetch the model configuration.
Parameters
- model_id : str — The model ID.
- benchmark_config : BenchmarkConfig — The benchmark configuration.
Returns
- ModelConfig — The model configuration.
source method LiteLLMModel.prepare_dataset(dataset: DatasetDict, task: Task, itr_idx: int) → DatasetDict
Prepare the dataset for the model.
This includes things like tokenisation.
Parameters
- dataset : DatasetDict — The dataset to prepare.
- task : Task — The task to prepare the dataset for.
- itr_idx : int — The index of the dataset in the iterator.
Returns
- DatasetDict — The prepared dataset.
source raise_if_wrong_params(model_config: ModelConfig, allowed_params: dict[str, list[str]]) → None
Raise an error if the model configuration has invalid parameters.
Parameters
- model_config : ModelConfig — The model configuration.
- allowed_params : dict[str, list[str]] — The allowed parameters for the model.
Raises
- InvalidModel — If the model configuration has invalid parameters.
source try_download_ollama_model(model_id: str) → bool
Try to download an Ollama model.
Parameters
- model_id : str — The model ID. If the model ID does not start with "ollama/" or "ollama_chat/", this function returns False.
Returns
- bool — Whether the model was downloaded successfully.
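The documented precondition, that only IDs prefixed with "ollama/" or "ollama_chat/" are candidates for an Ollama download, can be sketched as a small predicate. The helper name `is_ollama_model_id` is not part of euroeval's public API; it only illustrates the prefix check.

```python
def is_ollama_model_id(model_id: str) -> bool:
    # Only model IDs with an "ollama/" or "ollama_chat/" prefix are
    # candidates for an Ollama download; anything else short-circuits
    # to False, mirroring try_download_ollama_model's documented behaviour.
    return model_id.startswith(("ollama/", "ollama_chat/"))
```

Checking the prefix up front keeps the download path from being attempted for models served by other providers.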