Tasks

The benchmark comprises 23 tasks:

| Task name | Modality | Task type | Output format | Class | Metric |
|---|---|---|---|---|---|
| BPS | Code | Algorithms | Binary classification | Examination | Accuracy |
| CheGeKa | Text | World Knowledge | Open question | Examination | F1 / EM |
| LCS | Code | Algorithms | Multiclass classification | Examination | Accuracy |
| MaMuRaMu | Text | Reasoning | Multiclass classification | Examination | Accuracy |
| MathLogicQA | Text | Maths, Logic | Choosing an answer | Problematic | Accuracy |
| MultiQ | Text | Reasoning QA | Open question | Problematic | F1 / EM |
| PARus | Text | Common Sense | Binary classification | Problematic | Accuracy |
| RCB | Text | NLI | Multiclass classification | Problematic | Avg. F1 / Accuracy |
| ruCodeEval | Text, Code | Computer Code | Open question | Examination | pass@k |
| ruDetox | Text | Ethics | Open question | Diagnostic | J |
| ruEthics | Text | Ethics | Binary classification | Diagnostic | 5 MCC |
| ruHateSpeech | Text | Ethics | Binary classification | Diagnostic | Accuracy |
| ruHHH | Text | Ethics | Binary classification | Diagnostic | Accuracy |
| ruHumanEval | Text, Code | Computer Code | Open question | Examination | pass@k |
| ruMMLU | Text | Reasoning | Choosing an answer | Examination | Accuracy |
| ruModAr | Mathematics | Maths, Logic | Open question | Problematic | EM |
| ruMultiAr | Mathematics | Maths | Multiclass classification, Open question | Problematic | EM |
| ruOpenBookQA | Text | World Knowledge | Choosing an answer | Problematic | Avg. F1 / Accuracy |
| ruTiE | Text | Reasoning, Dialogue Context, Memory | Binary classification | Problematic | Accuracy |
| ruWorldTree | Text | World Knowledge | Choosing an answer | Problematic | Avg. F1 / Accuracy |
| RWSD | Text | Reasoning | Binary classification | Problematic | Accuracy |
| SimpleAr | Mathematics | Maths | Open question | Problematic | EM |
| USE | Text | Reasoning | Open question, Choosing an answer, Matching | Examination | Grade Norm |
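
For readers unfamiliar with the metrics named in the table, three of them (Accuracy, EM, and pass@k) can be sketched as follows. This is a minimal illustration, not the benchmark's own scoring code: the function names are hypothetical, the whitespace stripping in `exact_match` is an assumed normalization, and `pass_at_k` uses the unbiased estimator commonly applied to HumanEval-style tasks, which is assumed rather than confirmed here.

```python
from math import comb


def accuracy(predictions, references):
    """Share of predictions that equal their reference label."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not references:
        return 0.0
    hits = sum(p == r for p, r in zip(predictions, references))
    return hits / len(references)


def exact_match(prediction, reference):
    """EM for one example: 1.0 if the strings match exactly, else 0.0.

    Stripping surrounding whitespace is an assumption of this sketch.
    """
    return float(prediction.strip() == reference.strip())


def pass_at_k(n, c, k):
    """Estimate pass@k from n generated samples, c of which pass the tests.

    Uses 1 - C(n - c, k) / C(n, k): the probability that a random
    subset of k samples contains at least one passing sample.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # math.comb returns 0 when k > n - c, so the result is 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, `pass_at_k(10, 3, 1)` corresponds to 10 generated solutions of which 3 pass, scored with k = 1. A benchmark-level score would then average these per-example values over the whole task.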