Task | Result | Metric |
---|---|---|
LCS | 0.124 | Accuracy |
RCB | 0.331 / 0.178 | Avg. F1 / Accuracy |
USE | 0.016 | Grade Norm |
RWSD | 0.481 | Accuracy |
PARus | 0.506 | Accuracy |
ruTiE | 0.519 | Accuracy |
MultiQ | 0.119 / 0.044 | F1-score / EM |
CheGeKa | 0.018 / 0 | F1 / EM |
ruModAr | 0.476 | EM |
ruMultiAr | 0.176 | EM |
MathLogicQA | 0.353 | Accuracy |
ruWorldTree | 0.766 / 0.765 | Avg. F1 / Accuracy |
ruOpenBookQA | 0.675 / 0.676 | Avg. F1 / Accuracy |
BPS | 0.521 | Accuracy |
ruMMLU | 0.613 | Accuracy |
SimpleAr | 0.927 | EM |
ruHumanEval | 0.005 / 0.023 / 0.037 | pass@k |
ruHHH | 0.517 | Accuracy |
ruHateSpeech | 0.551 | Accuracy |
ruDetox | | Overall average score (J), meaning preservation score (SIM), naturalness score (FL), style transfer accuracy (STA) |
ruEthics | Table results: [[-0.033, -0.041, -0.029, -0.046, -0.015], | 5 MCC |
MERA
davinci-002
GPT base model from OpenAI. Details are not disclosed.
Apache 2.0 license
Code version v.1.1.0. No parameters were changed; all are used as prepared by the organizers. Details:
- OpenAI 1.10.0
- Tiktoken 0.5.2
- Context length 2049