ruT5-large (737M)

Created on 12.01.2024 11:21

Overall score: 0.19


Task name Result Metric
BPS 0.402 Accuracy
LCS 0.11 Accuracy
RCB 0.326 / 0.296 Avg. F1 / Accuracy
USE 0 Grade Norm
RWSD 0.485 Accuracy
PARus 0.498 Accuracy
ruTiE 0.505 Accuracy
MultiQ 0.01 / 0 F1 / EM
ruMMLU 0.24 Accuracy
CheGeKa 0 / 0 F1 / EM
ruModAr 0.0 Accuracy
SimpleAr 0.0 Accuracy
ruMultiAr 0.0 Accuracy
MathLogicQA 0.254 Accuracy
ruHumanEval 0 / 0 / 0 pass@k
ruWorldTree 0.259 / 0.159 Avg. F1 / Accuracy
ruOpenBookQA 0.263 / 0.158 Avg. F1 / Accuracy
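
The page does not state how the overall score of 0.19 is aggregated. The sketch below checks one plausible rule against the table: average the metrics within each task, then take the unweighted mean over the 17 tasks. The rule itself and the helper names `task_score` and `overall_score` are assumptions made for illustration; the numbers are copied from the table above.

```python
# Per-task results copied from the table above.
# Tasks that report several metrics are stored as tuples of all of them.
RESULTS = {
    "BPS": (0.402,),
    "LCS": (0.11,),
    "RCB": (0.326, 0.296),
    "USE": (0.0,),
    "RWSD": (0.485,),
    "PARus": (0.498,),
    "ruTiE": (0.505,),
    "MultiQ": (0.01, 0.0),
    "ruMMLU": (0.24,),
    "CheGeKa": (0.0, 0.0),
    "ruModAr": (0.0,),
    "SimpleAr": (0.0,),
    "ruMultiAr": (0.0,),
    "MathLogicQA": (0.254,),
    "ruHumanEval": (0.0, 0.0, 0.0),
    "ruWorldTree": (0.259, 0.159),
    "ruOpenBookQA": (0.263, 0.158),
}


def task_score(metrics):
    """Assumed rule: a task's score is the mean of its reported metrics."""
    return sum(metrics) / len(metrics)


def overall_score(results):
    """Assumed rule: the overall score is the unweighted mean over tasks."""
    return sum(task_score(m) for m in results.values()) / len(results)


print(round(overall_score(RESULTS), 2))
```

Under these assumptions the rounded mean agrees with the reported 0.19, which suggests the diagnostic datasets listed further down are indeed excluded from it.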

Evaluation on diagnostic datasets:

These results are not included in the overall score.


Task name Result Metric
ruHHH 0.534 Accuracy
  • Honest: 0.525
  • Harmless: 0.552
  • Helpful: 0.525
ruHateSpeech 0.46 Accuracy
  • Women: 0.481
  • Men: 0.343
  • LGBT: 0.353
  • Nationality: 0.405
  • Migrants: 0.714
  • Other: 0.525
ruDetox
  • Overall average score (J): 0.193
  • Assessment of the preservation of meaning (SIM): 0.4
  • Assessment of naturalness (FL): 0.671
  • Style Transfer Accuracy (STA): 0.593
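
The reported J (0.193) is higher than the product of the three reported averages (0.593 × 0.4 × 0.671 ≈ 0.159). That is expected if J is computed per sample and then averaged, which is how the joint score is usually defined for this task; the page itself does not spell out the formula, so treat this as an assumption. A minimal sketch with hypothetical per-sample scores (not data from this submission):

```python
from statistics import mean


def joint_score(sta, sim, fl):
    """Mean over samples of the per-sample product STA * SIM * FL."""
    if not (len(sta) == len(sim) == len(fl)):
        raise ValueError("all score lists must have the same length")
    return mean(a * b * c for a, b, c in zip(sta, sim, fl))


# Hypothetical per-sample scores: one perfect output and one failed output.
sta = [1.0, 0.0]
sim = [1.0, 0.0]
fl = [1.0, 0.0]

j = joint_score(sta, sim, fl)                          # per-sample product, then mean
product_of_means = mean(sta) * mean(sim) * mean(fl)    # mean first, then product

print(j, product_of_means)
```

When the three component scores are positively correlated across samples, as in this toy case, the per-sample J exceeds the product of the averages.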

ruEthics
Criterion Correct Good Ethical
Virtue 0.047 0.084 0.017
Law 0.029 0.058 -0.026
Moral 0.02 0.055 0.002
Justice 0.051 0.081 -0.006
Utilitarianism 0.034 0.055 0.028

Table results (one row per column of the table above, one value per criterion):

[[0.047, 0.029, 0.02, 0.051, 0.034],
[0.084, 0.058, 0.055, 0.081, 0.055],
[0.017, -0.026, 0.002, -0.006, 0.028]]

Metric: 5 MCC
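
MCC here is the Matthews correlation coefficient, reported once for each of the five ethical criteria. It ranges from -1 to 1, and values near 0, like those in the table, mean the model's answers are nearly uncorrelated with the annotations. The sketch below computes MCC for binary labels from the standard confusion-matrix formula; the function name and the toy labels are illustrative and are not taken from the benchmark code.

```python
import math


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0 or 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical labels: one positive example is missed.
print(round(mcc([1, 1, 0, 0], [1, 0, 0, 0]), 3))
```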

Information about the submission:

Team:

MERA

Name of the ML model:

ruT5-large (737M)

Additional links:

https://arxiv.org/abs/2309.10931

Architecture description:

ruT5 is one of the first encoder-decoder LMs pretrained only on Russian textual data. Its design follows that of the T5 model.

Description of the training:

The models are pretrained on a masked language modeling “span corruption” objective, where consecutive spans of the input tokens are masked, and the model is trained to reconstruct the masked tokens. The authors use the SentencePiece tokenizer with a vocabulary size of roughly 32k tokens.
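
The span corruption objective can be illustrated with a small sketch. Each masked span in the input is replaced by a single sentinel token, and the target lists every sentinel followed by the tokens it hides. The `<extra_id_N>` sentinel names follow the Hugging Face T5 convention and are an assumption here, as are the function name and the example sentence; real pretraining samples the spans randomly and works on subword tokens rather than words.

```python
def span_corrupt(tokens, spans):
    """Apply T5-style span corruption to a token list.

    tokens: list of string tokens.
    spans: sorted, non-overlapping (start, end) index pairs, end exclusive.
    Returns (input_tokens, target_tokens).
    """
    inputs, targets = [], []
    cursor = 0
    for sentinel_id, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{sentinel_id}>"
        inputs.extend(tokens[cursor:start])  # keep the unmasked text
        inputs.append(sentinel)              # one sentinel replaces the span
        targets.append(sentinel)             # the target names the sentinel...
        targets.extend(tokens[start:end])    # ...followed by the hidden tokens
        cursor = end
    inputs.extend(tokens[cursor:])
    targets.append(f"<extra_id_{len(spans)}>")  # closing sentinel ends the target
    return inputs, targets


tokens = "Thank you for inviting me to your party last week".split()
inputs, targets = span_corrupt(tokens, [(2, 4), (8, 9)])
print(" ".join(inputs))   # Thank you <extra_id_0> me to your party <extra_id_1> week
print(" ".join(targets))  # <extra_id_0> for inviting <extra_id_1> last <extra_id_2>
```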

Pretrain data:

300 GB of text. The corpus draws on publicly available resources that cover diverse domains: Wikipedia, news, books, and the Colossal Clean Crawled Corpus.

Training Details:

The ruT5 models are pretrained using a linear learning-rate scheduler with a learning rate of 1e−4 and the Adam optimizer with β1 = 0.9, β2 = 0.99, and ϵ = 1e−8. The sequence length is set to 512/512 for inputs and targets.
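
As a rough illustration of what a linear scheduler does to the learning rate of 1e−4, the sketch below decays it linearly to zero over training. The total number of steps and the optional warmup are not given in this description, so `total_steps` and `warmup_steps` are hypothetical parameters, and the function is not the authors' training code.

```python
def linear_lr(step, total_steps, peak_lr=1e-4, warmup_steps=0):
    """Linear schedule: optional linear warmup, then linear decay to zero."""
    if warmup_steps and step < warmup_steps:
        # Ramp up from 0 to peak_lr during warmup.
        return peak_lr * step / warmup_steps
    remaining = max(total_steps - step, 0)
    return peak_lr * remaining / max(total_steps - warmup_steps, 1)


# Hypothetical run of 1000 steps without warmup.
for step in (0, 250, 500, 1000):
    print(step, linear_lr(step, total_steps=1000))
```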

License:

MIT

Strategy, generation and parameters:

Code version v.1.1.0. No parameters were changed; all are used as prepared by the organizers. Details:
- 1 x NVIDIA A100
- dtype auto
- PyTorch 2.1.2 + CUDA 12.1
- Transformers 4.36.2
- Context length 512