Tasks

A catalog of multimodal tasks for evaluating modern LLMs.
The taxonomy indicates which model skills each task tests.

Filters: modality (All tasks, Private, or a specific modality) and tested skills (All taxa or a specific skill).
Dataset name | Top score | Human baseline | Metric | Tested skills
For tasks with two or more metrics, the score is the average of all metrics.
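The averaging rule for multi-metric tasks can be sketched as follows. This is a minimal illustration, not the catalog's own scoring code. It assumes an unweighted arithmetic mean over metrics reported on the same scale. The function name `task_score` and the metric names `EM`, `F1` and `Acc` are hypothetical.

```python
from statistics import fmean


def task_score(metrics: dict[str, float]) -> float:
    """Return a task's score: the mean of all its metrics (assumed unweighted)."""
    if not metrics:
        raise ValueError("a task needs at least one metric")
    # With a single metric, the mean is simply that metric's value.
    return fmean(metrics.values())


# Hypothetical task that reports two metrics.
print(task_score({"EM": 0.5, "F1": 0.75}))  # 0.625
```

A single-metric task keeps its metric value unchanged, so one function covers both cases.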