Task name | Result | Metric |
---|---|---|
LCS | 0.124 | Accuracy |
RCB | 0.331 / 0.178 | Avg. F1 / Accuracy |
USE | 0.016 | Grade Norm |
RWSD | 0.481 | Accuracy |
PARus | 0.506 | Accuracy |
ruTiE | 0.519 | Accuracy |
MultiQ | 0.119 / 0.044 | F1 / EM |
CheGeKa | 0.018 / 0 | F1 / EM |
ruModAr | 0.476 | EM |
ruMultiAr | 0.176 | EM |
MathLogicQA | 0.353 | Accuracy |
ruWorldTree | 0.766 / 0.765 | Avg. F1 / Accuracy |
ruOpenBookQA | 0.675 / 0.676 | Avg. F1 / Accuracy |
Task name | Result | Metric |
---|---|---|
BPS | 0.521 | Accuracy |
ruMMLU | 0.613 | Accuracy |
SimpleAr | 0.927 | EM |
ruHumanEval | 0.005 / 0.023 / 0.037 | pass@k |
ruHHH | 0.517 | Accuracy |
ruHateSpeech | 0.551 | Accuracy |
ruDetox | | Overall average score (J) / Assessment of the preservation of meaning (SIM) / Assessment of naturalness (FL) / Style Transfer Accuracy (STA) |
ruEthics | Table results: [[-0.033, -0.041, -0.029, -0.046, -0.015], | 5 MCC |
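The three ruHumanEval values are pass@k scores, presumably for three increasing values of k. As a rough illustration of how such a score is commonly computed, the sketch below uses the widely used unbiased estimator 1 - C(n-c, k) / C(n, k), where n samples are generated per task and c of them pass the tests. Whether MERA computes its numbers in exactly this way, and which values of k it reports, is an assumption here; `pass_at_k`, `mean_pass_at_k`, and the sample counts are hypothetical names and data used only for illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one task from n samples, c of which pass."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) / comb(n, k) is the probability that a random
    # size-k subset of the samples contains no passing sample.
    # comb returns 0 when fewer than k failing samples exist.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results, k):
    """Average pass@k over tasks; results is a list of (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical data: 3 tasks, 10 samples each, with 0, 1 and 2 passing samples.
example = [(10, 0), (10, 1), (10, 2)]
for k in (1, 5, 10):
    print(f"pass@{k} = {mean_pass_at_k(example, k):.3f}")
```

Because the estimator can only grow as k grows, a triple of scores that increases from left to right, as in the ruHumanEval row, is what one would expect from this kind of computation.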
MERA
davinci-002
GPT base model from OpenAI. Details are not disclosed.
Apache 2.0 license
Code version v.1.1.0. No parameters were changed; all are used as prepared by the organizers. Details:
- OpenAI 1.10.0
- Tiktoken 0.5.2
- Context length 2049