The benchmark comprises 23 tasks:
Task name | Modality | Task type | Output format | Class | Metric |
---|---|---|---|---|---|
BPS | Code | Algorithms | Binary classification | Examination | Accuracy |
CheGeKa | Text | World Knowledge | Open question | Examination | F1 / EM |
LCS | Code | Algorithms | Multiclass classification | Examination | Accuracy |
MaMuRaMu | Text | Reasoning | Multiclass classification | Examination | Accuracy |
MathLogicQA | Text | Maths, Logic | Choosing an answer | Problematic | Accuracy |
MultiQ | Text | Reasoning QA | Open question | Problematic | F1 / EM |
PARus | Text | Common Sense | Binary classification | Problematic | Accuracy |
RCB | Text | NLI | Multiclass classification | Problematic | Avg. F1 / Accuracy |
ruCodeEval | Text, Code | Computer Code | Open question | Examination | pass@k |
ruDetox | Text | Ethics | Open question | Diagnostic | J |
ruEthics | Text | Ethics | Binary classification | Diagnostic | 5 MCC |
ruHateSpeech | Text | Ethics | Binary classification | Diagnostic | Accuracy |
ruHHH | Text | Ethics | Binary classification | Diagnostic | Accuracy |
ruHumanEval | Text, Code | Computer Code | Open question | Examination | pass@k |
ruMMLU | Text | Reasoning | Choosing an answer | Examination | Accuracy |
ruModAr | Mathematics | Maths, Logic | Open question | Problematic | EM |
ruMultiAr | Mathematics | Maths | Multiclass classification, Open question | Problematic | EM |
ruOpenBookQA | Text | World Knowledge | Choosing an answer | Problematic | Avg. F1 / Accuracy |
ruTiE | Text | Reasoning, Dialogue Context, Memory | Binary classification | Problematic | Accuracy |
ruWorldTree | Text | World Knowledge | Choosing an answer | Problematic | Avg. F1 / Accuracy |
RWSD | Text | Reasoning | Binary classification | Problematic | Accuracy |
SimpleAr | Mathematics | Maths | Open question | Problematic | EM |
USE | Text | Reasoning | Open question, Choosing an answer, Matching | Examination | Grade Norm |
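
Several metrics named in the table (Accuracy, EM, F1, pass@k) have widely used textbook definitions. The sketch below illustrates those definitions in Python; it is not the benchmark's own scoring code. The function names and the normalization step (lowercasing and whitespace collapsing) are assumptions made for illustration, the F1 shown is the token-overlap variant common in open-question QA, and `pass_at_k` uses the standard unbiased estimator `1 - C(n-c, k) / C(n, k)` for `n` generated samples of which `c` pass the tests.

```python
from collections import Counter
from math import comb


def _normalize(text: str) -> str:
    # Assumed normalization: lowercase and collapse whitespace.
    # The benchmark's actual preprocessing may differ.
    return " ".join(text.lower().split())


def accuracy(predictions, references) -> float:
    """Share of predictions that equal their reference label."""
    if len(predictions) != len(references) or not references:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(p == r for p, r in zip(predictions, references))
    return hits / len(references)


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(_normalize(prediction) == _normalize(reference))


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = _normalize(prediction).split()
    ref_tokens = _normalize(reference).split()
    # Multiset intersection counts each shared token at most as often
    # as it appears in both strings.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples with c correct ones."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset contains at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, `pass_at_k(n=10, c=3, k=1)` is the fraction of single samples that pass, and `token_f1("the red square", "red square")` rewards the partial overlap that `exact_match` would score as zero. Task-specific metrics in the table, such as J, 5 MCC, and Grade Norm, are defined by the individual tasks and are not covered by this sketch.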