Task name | Result | Metric |
---|---|---|
LCS | 0.088 | Accuracy |
RCB | 0.333 / 0.167 | Avg. F1 / Accuracy |
USE | 0 | Grade Norm |
RWSD | 0.5 | Accuracy |
PARus | 0.498 | Accuracy |
ruTiE | 0.495 | Accuracy |
MultiQ | 0.031 / 0.001 | F1-score/EM |
CheGeKa | 0.006 / 0 | F1 / EM |
ruModAr | 0.001 | EM |
ruMultiAr | 0.0 | EM |
MathLogicQA | 0.246 | Accuracy |
ruWorldTree | 0.255 / 0.13 | Avg. F1 / Accuracy |
ruOpenBookQA | 0.25 / 0.129 | Avg. F1 / Accuracy |
Task name | Result | Metric |
---|---|---|
BPS | 0.508 | Accuracy |
ruMMLU | 0.262 | Accuracy |
SimpleAr | 0.0 | EM |
ruHumanEval | 0 / 0 / 0 | pass@k |
ruHHH | 0.494 | Accuracy |
ruHateSpeech | 0.543 | Accuracy |
ruDetox | | Overall average score (J) / Assessment of the preservation of meaning (SIM) / Assessment of naturalness (FL) / Style Transfer Accuracy (STA) |
ruEthics | Table results: [[0, 0, 0, 0, 0], | 5 MCC |
FRED-T5 1.7B
FRED-T5 (Full-scale Russian Enhanced Denoisers) is an encoder-decoder model based on T5 and UL2. It uses 24 attention heads, hidden layers of dimension 1536, and the GELU activation function.
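For illustration, a minimal sketch of how these dimensions could be expressed as a Hugging Face `T5Config`; only the hidden size, head count, and GELU activation come from this card, while the layer count, per-head size, and feed-forward width below are assumptions added for completeness:

```python
from transformers import T5Config

# Sketch of an XL-sized T5/UL2-style config using the figures above.
# Only d_model, num_heads and the GELU activation are from the card;
# the remaining values are illustrative assumptions.
config = T5Config(
    d_model=1536,                    # hidden layer dimension (from the card)
    num_heads=24,                    # attention heads (from the card)
    feed_forward_proj="gated-gelu",  # GELU activation (gated variant assumed)
    d_kv=64,                         # assumed per-head dimension (1536 / 24)
    d_ff=4096,                       # assumed feed-forward width
    num_layers=24,                   # assumed encoder/decoder depth
)
print(config.d_model, config.num_heads)
```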
The tokenizer is a byte-level BPE (BBPE) tokenizer with a vocabulary of 50257 tokens plus 107 special tokens, including the prefix tokens '<LM>', '<SC1>', ..., '<SC6>'. Drawing inspiration from Tay et al. (2022), the FRED-T5 1.7B (or XL) model was pretrained on a mixture of denoisers (MoD), a pretraining objective that combines a set of diverse pretraining objectives. The R-Denoiser is the masked language modeling span-corruption objective used in T5. The S-Denoiser follows the language modeling objective, where the input sequence is split into context and target tokens so that the targets do not rely on future information. The X-Denoiser aims to recover a large part of the input by combining the span-corruption and language modeling objectives.
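As a rough usage sketch, the denoiser prefix tokens select which pretraining mode the model should imitate at inference time. The checkpoint id `ai-forever/FRED-T5-1.7B` and the generation settings below are assumptions for illustration, not part of this card:

```python
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

MODEL_ID = "ai-forever/FRED-T5-1.7B"  # assumed Hugging Face checkpoint id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = T5ForConditionalGeneration.from_pretrained(MODEL_ID)
model.eval()

# '<LM>' asks for plain left-to-right continuation (S-denoiser style);
# '<SC1>'..'<SC6>' roughly correspond to the span-corruption (R/X) modes.
prompt = "<LM>Нейронные сети применяются для"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=32)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```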
It was trained on a 300 GB Russian-language corpus.
FRED-T5 was pretrained using a linear scheduler with an initial learning rate of 1e-4 and the Adafactor optimizer (Shazeer and Stern, 2018) with β1 = 0.9, β2 = 0.99, and ε = 1e-8. The sequence length is set to 512/512 for inputs and targets. The FRED-T5-XL model was pretrained with a total batch size of 2048 for 45 days on 112 A100 GPUs.
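A minimal sketch of this optimizer and scheduler setup using the Adafactor implementation shipped with Transformers; the model here is a stand-in built from a default config, the warmup and total step counts are placeholders, and the β2/ε values quoted above do not map onto direct Adafactor arguments:

```python
from transformers import T5Config, T5ForConditionalGeneration
from transformers.optimization import Adafactor, get_linear_schedule_with_warmup

# Stand-in model so the sketch runs without downloading weights;
# in practice this would be the FRED-T5 1.7B checkpoint.
model = T5ForConditionalGeneration(T5Config())

# Adafactor with a fixed (externally controlled) learning rate of 1e-4.
optimizer = Adafactor(
    model.parameters(),
    lr=1e-4,
    beta1=0.9,              # first-moment decay, as stated above
    scale_parameter=False,  # disable internal LR scaling
    relative_step=False,    # use the explicit LR with an external scheduler
    warmup_init=False,
)

# Linear decay schedule; the step counts are illustrative assumptions.
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=1_000, num_training_steps=100_000
)
```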
License: MIT
Code version v.1.1.0. All parameters were left unchanged and used as prepared by the organizers. Details:
- 1 x NVIDIA A100
- dtype: auto
- PyTorch 2.1.2 + CUDA 12.1
- Transformers 4.36.2
- Context length: 512
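A hedged sketch of what this inference setup corresponds to when loading the model directly with Transformers; the checkpoint id is an assumption, and the MERA evaluation code (v.1.1.0) remains the authoritative setup:

```python
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

MODEL_ID = "ai-forever/FRED-T5-1.7B"  # assumed checkpoint id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = T5ForConditionalGeneration.from_pretrained(
    MODEL_ID,
    torch_dtype="auto",  # "dtype: auto" from the details above
).to("cuda" if torch.cuda.is_available() else "cpu")  # single NVIDIA A100

# Context length 512: truncate encoder inputs to at most 512 tokens.
inputs = tokenizer(
    "<LM>Пример запроса", truncation=True, max_length=512, return_tensors="pt"
).to(model.device)
```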