Task name | Result | Metric |
---|---|---|
LCS | 0.11 | Accuracy |
RCB | 0.326 / 0.296 | Avg. F1 / Accuracy |
USE | 0 | Grade Norm |
RWSD | 0.485 | Accuracy |
PARus | 0.498 | Accuracy |
ruTiE | 0.505 | Accuracy |
MultiQ | 0.01 / 0 | F1-score/EM |
CheGeKa | 0 / 0 | F1 / EM |
ruModAr | 0.0 | EM |
ruMultiAr | 0.0 | EM |
MathLogicQA | 0.254 | Accuracy |
ruWorldTree | 0.259 / 0.159 | Avg. F1 / Accuracy |
ruOpenBookQA | 0.263 / 0.158 | Avg. F1 / Accuracy |
Task name | Result | Metric |
---|---|---|
BPS | 0.402 | Accuracy |
ruMMLU | 0.24 | Accuracy |
SimpleAr | 0.0 | EM |
ruHumanEval | 0 / 0 / 0 | pass@k |
ruHHH | 0.534 | Accuracy |
ruHateSpeech | 0.46 | Accuracy |
ruDetox | | Overall average score (J) / Assessment of the preservation of meaning (SIM) / Assessment of naturalness (FL) / Style Transfer Accuracy (STA) |
ruEthics | [[0.047, 0.029, 0.02, 0.051, 0.034], … | 5 MCC |
MERA
ruT5-large (737M)
ruT5 is one of the first encoder-decoder LMs pretrained exclusively on Russian textual data. The ruT5 model is designed analogously to the T5 model.
The models are pretrained with a masked language modeling "span corruption" objective, in which consecutive spans of input tokens are masked and the model is trained to reconstruct them. The authors use a SentencePiece tokenizer with a vocabulary of about 32,000 tokens.
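A minimal sketch of what this span-corruption setup looks like with the Hugging Face `transformers` API. The checkpoint name `ai-forever/ruT5-large` and the example sentence are assumptions for illustration; the original pretraining pipeline is not reproduced here.

```python
# Sketch of the T5-style "span corruption" objective: a consecutive span of
# input tokens is replaced by a sentinel (<extra_id_0>, <extra_id_1>, ...)
# and the decoder is trained to reconstruct the masked spans.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("ai-forever/ruT5-large")  # assumed checkpoint
model = T5ForConditionalGeneration.from_pretrained("ai-forever/ruT5-large")

inputs = tokenizer("Кошка <extra_id_0> на ковре.", return_tensors="pt")
labels = tokenizer("<extra_id_0> сидит <extra_id_1>", return_tensors="pt").input_ids

outputs = model(**inputs, labels=labels)
print(outputs.loss)  # cross-entropy over the reconstructed spans
```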
300 GB of texts. The corpus includes texts from various publicly available resources representing diverse domains: Wikipedia, news, books, and the Colossal Clean Crawled Corpus.
The ruT5 model is pretrained using a linear scheduler with a learning rate of 1e−4 and the Adam optimizer with β1 = 0.9, β2 = 0.99, and ϵ = 1e−8. The sequence length is set to 512/512 for inputs and targets.
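The stated optimization recipe can be expressed with standard PyTorch and `transformers` utilities; the sketch below assumes the `ai-forever/ruT5-large` checkpoint, and the total step count is a placeholder since it is not given in this card.

```python
# Sketch of the described optimization setup: Adam (β1=0.9, β2=0.99, ε=1e-8)
# with a linear learning-rate schedule starting at 1e-4.
import torch
from transformers import T5ForConditionalGeneration, get_linear_schedule_with_warmup

model = T5ForConditionalGeneration.from_pretrained("ai-forever/ruT5-large")

optimizer = torch.optim.Adam(
    model.parameters(), lr=1e-4, betas=(0.9, 0.99), eps=1e-8
)
num_training_steps = 100_000  # placeholder: total step count is not stated in the card
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=0, num_training_steps=num_training_steps
)
# In the training loop: loss.backward(); optimizer.step(); scheduler.step()
```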
MIT
Code version v.1.1.0. All parameters were left unchanged and used as prepared by the organizers. Details:
- 1 x NVIDIA A100
- dtype: auto
- PyTorch 2.1.2 + CUDA 12.1
- Transformers 4.36.2
- Context length: 512
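For orientation, a minimal sketch of loading the model under the settings listed above (dtype "auto", context length 512, single GPU). The MERA evaluation harness wraps this; the checkpoint name and prompt are assumptions for illustration only.

```python
# Sketch of evaluation-time loading: dtype resolved automatically,
# inputs truncated to the 512-token context length.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("ai-forever/ruT5-large", model_max_length=512)
model = T5ForConditionalGeneration.from_pretrained(
    "ai-forever/ruT5-large", torch_dtype="auto"
).to("cuda")

prompt = "Вопрос: Сколько будет 2 + 2? Ответ:"  # illustrative prompt, not a MERA task
batch = tokenizer(prompt, truncation=True, max_length=512, return_tensors="pt").to("cuda")
output_ids = model.generate(**batch, max_new_tokens=16)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```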