euroeval.languages

source module euroeval.languages

List of languages and their language codes.

The language codes comprise all the ISO 639-1 codes, plus the ISO 639-3 codes for languages that do not have an ISO 639-1 code.

Classes

  • Language A benchmarkable language.

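Functions

  • get_all_languages Get a mapping of all the languages.

  • get_correct_language_codes Get correct language code(s).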

source dataclass Language(code: str, name: str, _and_separator: str | None = field(repr=False, default=None), _or_separator: str | None = field(repr=False, default=None))

A benchmarkable language.

Attributes

  • code : str The ISO 639-1 language code of the language, or its ISO 639-3 code if it has no ISO 639-1 code.

  • name : str The name of the language.

  • and_separator : optional The word 'and' in the language.

  • or_separator : optional The word 'or' in the language.
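As a rough illustration of how these attributes fit together, the sketch below defines a stand-in dataclass with the same fields as the documented signature and uses the separators to join a list of labels. It is not the library's source: the `join_labels` helper is hypothetical, and raising `ValueError` for an unset separator is an assumption, since this page does not show what the properties raise.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Language:
    """Stand-in mirroring the documented signature; not the library's source."""

    code: str
    name: str
    _and_separator: str | None = field(repr=False, default=None)
    _or_separator: str | None = field(repr=False, default=None)

    @property
    def and_separator(self) -> str:
        # Assumption: this page does not show what is raised when the
        # separator is unset, so ValueError is only a placeholder.
        if self._and_separator is None:
            raise ValueError(f"No 'and' separator defined for {self.name}.")
        return self._and_separator

    @property
    def or_separator(self) -> str:
        # Same placeholder behaviour as and_separator (an assumption).
        if self._or_separator is None:
            raise ValueError(f"No 'or' separator defined for {self.name}.")
        return self._or_separator


def join_labels(labels: list[str], separator: str) -> str:
    """Hypothetical helper: join labels as 'a, b <separator> c'."""
    if len(labels) <= 1:
        return "".join(labels)
    return f"{', '.join(labels[:-1])} {separator} {labels[-1]}"


danish = Language(code="da", name="Danish", _and_separator="og", _or_separator="eller")

# The separator fields are declared with repr=False, so they are hidden here.
print(danish)  # Language(code='da', name='Danish')
print(join_labels(["positiv", "neutral", "negativ"], danish.and_separator))
# positiv, neutral og negativ
```

Keeping the separators in private fields with public properties lets a language be defined without them, while still failing loudly if code later asks for a separator that was never set.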

source property Language.and_separator: str

Get the word 'and' in the language.

Returns

  • str The word 'and' in the language.

Raises

source property Language.or_separator: str

Get the word 'or' in the language.

Returns

  • str The word 'or' in the language.

Raises

source get_all_languages() -> dict[str, Language]

Get a mapping of all the languages, keyed by language code.

Returns

  • dict[str, Language] A mapping between language codes and their configurations.
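The returned mapping can be used like any dictionary keyed by language code. The sketch below shows that lookup pattern with a hypothetical stand-in, `get_all_languages_sketch`, and a three-language sample. The real function returns the library's full set of languages, which is not reproduced here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Language:
    """Reduced stand-in with only the two required fields."""

    code: str
    name: str


def get_all_languages_sketch() -> dict[str, Language]:
    """Hypothetical stand-in returning a tiny code -> Language mapping."""
    sample = [
        Language(code="da", name="Danish"),
        Language(code="nb", name="Norwegian Bokmål"),
        Language(code="nn", name="Norwegian Nynorsk"),
    ]
    return {language.code: language for language in sample}


languages = get_all_languages_sketch()

print(languages["da"].name)  # Danish
print("xx" in languages)  # False
print(sorted(languages))  # ['da', 'nb', 'nn']
```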

source get_correct_language_codes(language_codes: str | c.Sequence[str]) -> c.Sequence[str]

Get the correct language code(s), taking the special values 'no' and 'all' into account.

Parameters

  • language_codes : str | c.Sequence[str] The language codes of the languages to include, both for models and datasets. Here 'no' means both Bokmål (nb) and Nynorsk (nn). Set this to 'all' if all languages should be considered.

Returns

  • c.Sequence[str] The correct language codes.
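The behaviour described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: the function name `normalise_language_codes` and the `KNOWN_CODES` list are hypothetical, and it is assumed that 'no' is kept in the result alongside the added 'nb' and 'nn'.

```python
from __future__ import annotations

from collections.abc import Sequence

# Hypothetical subset of language codes, for illustration only.
KNOWN_CODES = ["da", "en", "nb", "nn", "no", "sv"]


def normalise_language_codes(language_codes: str | Sequence[str]) -> list[str]:
    """Sketch of the documented behaviour for 'all' and 'no'."""
    # Accept either a single code or a sequence of codes.
    if isinstance(language_codes, str):
        codes = [language_codes]
    else:
        codes = list(language_codes)

    # 'all' means that every known language is considered.
    if "all" in codes:
        return list(KNOWN_CODES)

    # 'no' means both Bokmål (nb) and Nynorsk (nn).
    # Assumption: 'no' itself stays in the result.
    if "no" in codes:
        for extra in ("nb", "nn"):
            if extra not in codes:
                codes.append(extra)

    return codes


print(normalise_language_codes("da"))  # ['da']
print(normalise_language_codes(["no", "sv"]))  # ['no', 'sv', 'nb', 'nn']
print(normalise_language_codes("all"))  # ['da', 'en', 'nb', 'nn', 'no', 'sv']
```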