Tasks

The benchmark comprises 23 tasks across various domains, assessing different skills of the model.

Dataset name | Model ability | Top score / Human baseline | Metric