euroeval.languages
module euroeval.languages
List of languages and their language codes.
The language codes include all ISO 639-1 codes, plus the ISO 639-3 codes for languages that do not have an ISO 639-1 code.
Classes
- Language — A benchmarkable language.
Functions
- get_all_languages — Get a mapping of all the languages, keyed by language code.
- get_correct_language_codes — Get correct language code(s).
dataclass Language(code: str, name: str, _and_separator: str | None = field(repr=False, default=None), _or_separator: str | None = field(repr=False, default=None))
A benchmarkable language.
Attributes
- code : str — The language code of the language: its ISO 639-1 code, or its ISO 639-3 code if it has no ISO 639-1 code.
- name : str — The name of the language.
- and_separator : optional — The word 'and' in the language.
- or_separator : optional — The word 'or' in the language.
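The documented signature can be mirrored with a small stand-in dataclass to see how the fields behave. This is only a sketch under stated assumptions, not EuroEval's own code: `LanguageSketch` is a hypothetical name, and the Danish values are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class LanguageSketch:
    """Hypothetical stand-in that mirrors the documented Language signature."""

    code: str
    name: str
    # repr=False keeps the separators out of the printed representation.
    _and_separator: str | None = field(repr=False, default=None)
    _or_separator: str | None = field(repr=False, default=None)


# Illustrative values (assumption): Danish with its words for 'and' and 'or'.
danish = LanguageSketch(
    code="da", name="Danish", _and_separator="og", _or_separator="eller"
)

print(danish)  # LanguageSketch(code='da', name='Danish')
```

Because both separator fields default to None, a language can be declared with only a code and a name.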
property Language.and_separator: str
Get the word 'and' in the language.
Returns
- str — The word 'and' in the language.
Raises
- NotImplementedError — If and_separator is None.
property Language.or_separator: str
Get the word 'or' in the language.
Returns
- str — The word 'or' in the language.
Raises
- NotImplementedError — If or_separator is None.
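The two properties follow the same guard pattern: return the stored word, or raise NotImplementedError when none was configured. The sketch below illustrates that pattern for 'and' only. `LanguageSketch`, `join_with_and`, and the error message wording are hypothetical and are not taken from EuroEval.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class LanguageSketch:
    """Hypothetical stand-in; not EuroEval's own class."""

    code: str
    name: str
    _and_separator: str | None = field(repr=False, default=None)

    @property
    def and_separator(self) -> str:
        # Fail loudly instead of returning None when the word is not configured.
        if self._and_separator is None:
            raise NotImplementedError(
                f"No 'and' separator is defined for {self.name}."  # assumed wording
            )
        return self._and_separator


def join_with_and(items: list[str], language: LanguageSketch) -> str:
    """Join items as 'A, B <and> C' using the language's word for 'and'."""
    if len(items) < 2:
        return "".join(items)
    return f"{', '.join(items[:-1])} {language.and_separator} {items[-1]}"


danish = LanguageSketch(code="da", name="Danish", _and_separator="og")
unconfigured = LanguageSketch(code="xx", name="Example")

print(join_with_and(["A", "B", "C"], danish))  # A, B og C

try:
    unconfigured.and_separator
except NotImplementedError as error:
    # Callers decide how to recover, for example by using a default separator.
    print(f"Falling back: {error}")
```

Raising rather than returning None means a missing separator surfaces at the point of use, so callers never silently format text with the string "None".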
get_all_languages() → dict[str, Language]
Get a mapping of all the languages, keyed by language code.
Returns
- dict[str, Language] — A mapping between language codes and their configurations.
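Since the return value is a dictionary keyed by language code, lookups and membership tests work directly on it. The sketch below uses a hypothetical stand-in, `get_all_languages_sketch`, with a three-entry illustrative subset. The real function's contents are not reproduced here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class LanguageSketch:
    """Hypothetical stand-in for the documented Language dataclass."""

    code: str
    name: str


def get_all_languages_sketch() -> dict[str, LanguageSketch]:
    """Return a code-to-language mapping (illustrative subset, an assumption)."""
    entries = [
        LanguageSketch("da", "Danish"),
        LanguageSketch("nb", "Norwegian Bokmål"),
        LanguageSketch("nn", "Norwegian Nynorsk"),
    ]
    return {language.code: language for language in entries}


languages = get_all_languages_sketch()

print(languages["da"].name)  # Danish
print("xx" in languages)  # False
print(sorted(languages))  # ['da', 'nb', 'nn']
```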
get_correct_language_codes(language_codes: str | c.Sequence[str]) → c.Sequence[str]
Get correct language code(s).
Parameters
- language_codes : str | c.Sequence[str] — The language codes of the languages to include, both for models and datasets. Here 'no' means both Bokmål (nb) and Nynorsk (nn). Set this to 'all' if all languages should be considered.
Returns
- c.Sequence[str] — The correct language codes.
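The parameter description suggests three cases: a single code, a sequence of codes, and the special value 'all', with 'no' expanding to cover nb and nn. The sketch below is one possible reading of that description, not the library's implementation. The name `normalise_language_codes_sketch`, the `KNOWN_CODES` subset, the output ordering, and the choice to keep 'no' alongside 'nb' and 'nn' are all assumptions.

```python
from __future__ import annotations

from collections.abc import Sequence

# Illustrative subset of known codes (assumption); the real list is much longer.
KNOWN_CODES = ("da", "nb", "nn", "no", "sv")


def normalise_language_codes_sketch(
    language_codes: str | Sequence[str],
    known_codes: Sequence[str] = KNOWN_CODES,
) -> list[str]:
    """Hypothetical normalisation of user-supplied language codes."""
    # Accept a bare string as shorthand for a one-element sequence.
    if isinstance(language_codes, str):
        codes = [language_codes]
    else:
        codes = list(language_codes)

    # 'all' selects every known language.
    if "all" in codes:
        return list(known_codes)

    result: list[str] = []
    for code in codes:
        # Assumption: 'no' is kept and also expands to Bokmål and Nynorsk.
        expanded = ["no", "nb", "nn"] if code == "no" else [code]
        for item in expanded:
            if item not in result:  # preserve order, drop duplicates
                result.append(item)
    return result


print(normalise_language_codes_sketch("da"))  # ['da']
print(normalise_language_codes_sketch(["no", "da"]))  # ['no', 'nb', 'nn', 'da']
print(normalise_language_codes_sketch("all"))  # ['da', 'nb', 'nn', 'no', 'sv']
```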