Leaderboard

The leaderboard ranks models by a single aggregated score, which allows a fair comparison even when models were evaluated on different sets of tasks. A model receives a zero for every task it did not attempt, and its results on the tasks it did attempt are averaged with equal weight for all tasks. The resulting score is consistent across submissions, so scores derived from different task sets can be compared directly.
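A minimal sketch of one way to read this scoring rule, assuming that the average is taken over the full task list and that skipped tasks contribute zero to it. The function name `aggregate_score`, the task names, and the score values are hypothetical and used only for illustration; the leaderboard's actual implementation may differ.

```python
from typing import Mapping, Optional, Sequence


def aggregate_score(
    task_scores: Mapping[str, Optional[float]],
    all_tasks: Sequence[str],
) -> float:
    """Equal-weight mean over all tasks; tasks the model skipped count as 0."""
    if not all_tasks:
        raise ValueError("all_tasks must not be empty")
    total = 0.0
    for task in all_tasks:
        score = task_scores.get(task)  # None when the task was not attempted
        total += 0.0 if score is None else score
    return total / len(all_tasks)


# Hypothetical task names and scores, for illustration only.
tasks = ["task_a", "task_b", "task_c", "task_d"]
model_x = {"task_a": 0.75, "task_b": 0.5, "task_c": 0.5, "task_d": 0.25}
model_y = {"task_a": 1.0, "task_b": 0.75}  # skipped task_c and task_d

print(aggregate_score(model_x, tasks))  # 0.5
print(aggregate_score(model_y, tasks))  # 0.4375
```

Under this assumption, `model_y` scores higher than `model_x` on every task it attempted but still ranks lower overall, because the two skipped tasks pull its average down. This is what lets scores from different task sets sit on one scale.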

The public leaderboard is customizable: use the filters to select the tasks and models you are interested in and compare only those.
