Leaderboard

The aggregate score used for the ranking allows a fair comparison of models even when they were evaluated on different task sets. A model receives a zero for every task it did not attempt, and all task scores, including those zeros, are averaged with equal weight per task. The result is a single, consistent score, so scores derived from different task sets can be compared directly.
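The scoring rule described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the leaderboard's actual code: the function name `aggregate_score`, the task names, and the score values are all hypothetical.

```python
from typing import Mapping, Sequence


def aggregate_score(task_scores: Mapping[str, float], tasks: Sequence[str]) -> float:
    """Equal-weight mean over `tasks`; a task the model did not attempt counts as 0."""
    if not tasks:
        raise ValueError("tasks must not be empty")
    return sum(task_scores.get(task, 0.0) for task in tasks) / len(tasks)


# Hypothetical task list and per-model results (illustrative values only).
tasks = ["task_a", "task_b", "task_c", "task_d"]
model_x = {"task_a": 0.8, "task_b": 0.6, "task_c": 0.7, "task_d": 0.5}
model_y = {"task_a": 0.9, "task_b": 0.9}  # skipped task_c and task_d

print(round(aggregate_score(model_x, tasks), 2))
print(round(aggregate_score(model_y, tasks), 2))
```

In this sketch `model_y` scores higher on the two tasks it attempted, yet ranks below `model_x` because the two skipped tasks enter the average as zeros.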

The public ranking is customizable. Use the filters to select the tasks and models you are interested in and compare only those.
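One way to picture a filtered view is to recompute the ranking over the selected tasks and models only. The sketch below assumes that filtering works this way, which the page does not confirm; the function `rank`, the model and task names, and the scores are hypothetical.

```python
from typing import Mapping, Optional, Sequence


def rank(
    scores: Mapping[str, Mapping[str, float]],  # model name -> {task name -> score}
    tasks: Sequence[str],
    models: Optional[Sequence[str]] = None,
) -> list[tuple[str, float]]:
    """Rank the selected models by their equal-weight mean over the selected tasks.

    A task that a model did not attempt counts as 0 (assumed to match the
    aggregate scoring rule).
    """
    if not tasks:
        raise ValueError("select at least one task")
    selected = list(scores) if models is None else [m for m in models if m in scores]
    table = [
        (model, sum(scores[model].get(task, 0.0) for task in tasks) / len(tasks))
        for model in selected
    ]
    # Highest score first; sorted() is stable, so ties keep their input order.
    return sorted(table, key=lambda row: row[1], reverse=True)


# Hypothetical results (illustrative values only).
scores = {
    "model_x": {"task_a": 0.8, "task_b": 0.6, "task_c": 0.7},
    "model_y": {"task_a": 0.9, "task_b": 0.9},
    "model_z": {"task_c": 0.95},
}

for position, (model, score) in enumerate(rank(scores, ["task_a", "task_b"]), start=1):
    print(position, model, round(score, 2))
```

Calling `rank(scores, ["task_c"], models=["model_x", "model_z"])` would restrict the comparison to two models on a single task.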

