Task | Result | Metric
---|---|---
LCS | 0.086 | Accuracy |
RCB | 0.354 / 0.248 | Avg. F1 / Accuracy |
USE | 0 | Grade Norm |
RWSD | 0.492 | Accuracy |
PARus | 0.492 | Accuracy |
ruTiE | 0.493 | Accuracy |
MultiQ | 0.052 / 0 | F1-score/EM |
CheGeKa | 0.001 / 0 | F1 / EM |
ruModAr | 0.0 | EM |
ruMultiAr | 0.0 | EM |
MathLogicQA | 0.24 | Accuracy |
ruWorldTree | 0.232 / 0.174 | Avg. F1 / Accuracy |
ruOpenBookQA | 0.265 / 0.215 | Avg. F1 / Accuracy |
Task | Result | Metric
---|---|---
BPS | 0.475 | Accuracy
ruMMLU | 0.248 | Accuracy
SimpleAr | 0.0 | EM
ruHumanEval | 0 / 0 / 0 | pass@k
ruHHH | 0.472 | Accuracy
ruHateSpeech | 0.543 | Accuracy
ruDetox | | Overall average score (J) / Meaning preservation (SIM) / Naturalness (FL) / Style transfer accuracy (STA)
ruEthics | [[0, 0, 0, 0, 0], … | 5 MCC
MERA
FRED-T5 large 820M
FRED-T5 (Full-scale Russian Enhanced Denoisers) is an encoder-decoder model based on T5 and UL2. It has 16 attention heads, a hidden-layer dimension of 1024, a fully connected (feed-forward) layer dimension of 2816, and the GELU activation function.
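As a rough restatement of those figures in Hugging Face `T5Config` terms, a minimal sketch follows; the number of layers, the vocabulary size split, and the gated-GELU feed-forward setting are assumptions not stated in this description.

```python
from transformers import T5Config

# Sketch of a FRED-T5-large-like configuration.  d_model, d_ff and num_heads
# come from the description above; num_layers and feed_forward_proj are assumed.
config = T5Config(
    vocab_size=50257 + 107,          # BBPE vocabulary plus special/prefix tokens
    d_model=1024,                    # hidden-layer dimension
    d_ff=2816,                       # fully connected (feed-forward) dimension
    num_heads=16,                    # attention heads
    num_layers=24,                   # assumed encoder depth (decoder mirrors it)
    feed_forward_proj="gated-gelu",  # GELU activation in a gated feed-forward block
)
print(config)
```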
It uses a BBPE tokenizer with a vocabulary of 50,257 tokens plus 107 special tokens, including the prefix tokens '<LM>', '<SC1>', …, '<SC6>'. Drawing inspiration from Tay et al. (2022), the FRED-T5 1.7B (XL) model was pretrained on a mixture of denoisers (MoD), a pretraining objective that combines a set of diverse objectives. The R-Denoiser is the masked language modeling span-corruption objective used in T5. The S-Denoiser follows the language modeling objective, where the input sequence is split into context and target tokens so that the targets do not rely on future information. The X-Denoiser aims to recover a large part of the input based on a combination of the span-corruption and language modeling objectives.
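For illustration, a minimal generation sketch of how the prefix tokens are used, following the public FRED-T5 model cards; the checkpoint name `ai-forever/FRED-T5-large`, the `GPT2Tokenizer` class for the BBPE vocabulary, and the generation settings are assumptions, not details from this page.

```python
import torch
from transformers import GPT2Tokenizer, T5ForConditionalGeneration

# Checkpoint name and tokenizer class are assumptions taken from the public FRED-T5 cards.
name = "ai-forever/FRED-T5-large"
tokenizer = GPT2Tokenizer.from_pretrained(name, eos_token="</s>")
model = T5ForConditionalGeneration.from_pretrained(name)

# '<LM>' requests plain language-model-style continuation (S-Denoiser-like);
# '<SC1>'..'<SC6>' instead request span-corruption-style infilling.
text = "<LM>Принцесса грустила, потому что"
input_ids = tokenizer(text, return_tensors="pt").input_ids
with torch.no_grad():
    output_ids = model.generate(input_ids, max_new_tokens=32,
                                eos_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```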
It was trained on a 300 GB Russian-language corpus.
FRED-T5 is pretrained using a linear scheduler with an initial learning rate of 1e−4 and the Adafactor optimizer (Shazeer and Stern, 2018) with β1 = 0.9, β2 = 0.99, and ϵ = 1e−8. The sequence length is set to 512/512 for inputs and targets. The FRED-T5-XL model was pretrained with a total batch size of 2048 for 35 days on 160 V100 GPUs, followed by 5 days on 80 A100 GPUs.
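As a sketch of that optimizer setup with the `Adafactor` implementation in `transformers`: only the learning rate and β1 map directly onto its arguments (the quoted β2 and ϵ have no one-to-one counterpart), the stand-in model and step counts are placeholders, and none of this is the authors' actual training code.

```python
from transformers import T5Config, T5ForConditionalGeneration
from transformers.optimization import Adafactor, get_linear_schedule_with_warmup

# Tiny stand-in model so the snippet is self-contained; a real run would load FRED-T5.
model = T5ForConditionalGeneration(
    T5Config(vocab_size=50364, d_model=64, d_ff=128, num_heads=4, num_layers=2)
)

# Only lr and beta1 from the description map directly onto transformers' Adafactor.
optimizer = Adafactor(
    model.parameters(),
    lr=1e-4,               # initial learning rate from the description
    beta1=0.9,
    relative_step=False,   # use the explicit lr instead of Adafactor's internal schedule
    scale_parameter=False,
)

# Linear decay of the learning rate; step counts are placeholders, not the real budget.
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=1_000, num_training_steps=100_000
)
```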
License: MIT
Code version v.1.1.0. All parameters were left unchanged and used as prepared by the organizers. Details:
- 1 x NVIDIA A100
- dtype auto
- PyTorch 2.1.2 + CUDA 12.1
- Transformers 4.36.2
- Context length 512
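A hedged sketch of loading the model under these settings (single A100, dtype auto, 512-token context); the checkpoint name is again an assumption, and the MERA code (v.1.1.0) itself performs the actual benchmark scoring.

```python
from transformers import GPT2Tokenizer, T5ForConditionalGeneration

# Environment per the details above: PyTorch 2.1.2 + CUDA 12.1, Transformers 4.36.2,
# one NVIDIA A100.  Checkpoint name is an assumption; the MERA harness drives the run.
name = "ai-forever/FRED-T5-large"
tokenizer = GPT2Tokenizer.from_pretrained(name, eos_token="</s>")
model = T5ForConditionalGeneration.from_pretrained(name, torch_dtype="auto").to("cuda:0")

# Inputs are truncated to the 512-token context length used in the evaluation.
batch = tokenizer("<LM>Пример запроса из задачи MERA.",
                  truncation=True, max_length=512, return_tensors="pt").to("cuda:0")
```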