Tasks

A catalog of multimodal tasks for evaluating modern LLMs.
The taxonomy shows which model skills each task tests.

Dataset name | Top score | Human baseline | Metric | Tested skills