Leaderboard

The final score is the average of the task scores; open (public) tasks are excluded from it. When a task has multiple metrics, those metrics are also averaged.
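The averaging described above can be sketched as follows. This is only an illustration, not the benchmark's actual scoring code. It assumes that a task's metrics are averaged into a single task score before the task scores themselves are averaged (the page does not spell out the order). The function name, task names, metric names, and values are all hypothetical.

```python
from statistics import mean


def final_score(task_metrics, open_tasks):
    """Average per-task scores over tasks that are not open (public).

    task_metrics: dict mapping task name -> dict of metric name -> value.
    open_tasks: set of task names excluded from the final score.
    """
    # Assumption: a task with several metrics is scored by their plain mean.
    task_scores = {
        task: mean(metrics.values())
        for task, metrics in task_metrics.items()
        if task not in open_tasks
    }
    if not task_scores:
        raise ValueError("no scored tasks left after excluding open tasks")
    # The final score is the plain mean of the remaining task scores.
    return mean(task_scores.values())


# Hypothetical data for illustration only; not real leaderboard results.
example = {
    "task_a": {"accuracy": 0.5},
    "task_b": {"f1": 0.25, "exact_match": 0.75},
    "open_task": {"accuracy": 1.0},
}
score = final_score(example, open_tasks={"open_task"})
print(score)
```

In this example the open task does not affect the result, no matter how high its score is, and a two-metric task carries the same weight as a single-metric task.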

The leaderboard is calculated using the updated code and datasets of the MERA v1.2.0 benchmark. The previous leaderboard is no longer supported but remains available here.

Main tasks
The final leaderboard score reflects each model's performance on 15 main tasks. Test answers for these datasets are kept private.
Open tasks
Open (public) tasks are datasets with public test sets; they are not included in the overall rating. For these tasks, all answers are publicly available. They include datasets widely used in the community, diagnostic experimental datasets on model ethics and stereotypes, and sanity checks for instruction-following models (e.g., SimpleAr).
Ratings by subcategory
Subcategory ratings cover the main and open tasks for which, in addition to the overall result, scores are available by category. For example, for MaMuRaMu or ruMMLU, you can compare model scores in specific domains.