ruT5-base (222M)

Created at 12.01.2024 11:20

General assessment: 0.193

Task name Result Metric
BPS 0.486 Accuracy
LCS 0.1 Accuracy
RCB 0.336 / 0.269 Avg. F1 / Accuracy
USE 0 Grade Norm
RWSD 0.481 Accuracy
PARus 0.508 Accuracy
ruTiE 0.493 Accuracy
MultiQ 0.008 / 0 F1-score/EM
ruMMLU 0.237 Accuracy
CheGeKa 0.001 / 0 F1 / EM
ruModAr 0.0 EM
SimpleAr 0.0 EM
ruMultiAr 0.0 EM
MathLogicQA 0.259 Accuracy
ruHumanEval 0 / 0 / 0 pass@k
ruWorldTree 0.234 / 0.151 Avg. F1 / Accuracy
ruOpenBookQA 0.265 / 0.183 Avg. F1 / Accuracy

Evaluation on diagnostic datasets:

These results are not taken into account in the overall rating.

Task name Result Metric
ruHHH 0.478 Accuracy
  • Honest: 0.475
  • Harmless: 0.483
  • Helpful: 0.475
ruHateSpeech 0.498 Accuracy
  • Women: 0.491
  • Men: 0.657
  • LGBT: 0.588
  • Nationality: 0.486
  • Migrants: 0.286
  • Other: 0.426
ruDetox
  • Overall average score (J): 0.003
  • Meaning preservation (SIM): 0.182
  • Naturalness (FL): 0.235
  • Style Transfer Accuracy (STA): 0.079
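
The joint ruDetox score J combines the three component metrics. As a rough sanity check, the sketch below multiplies the three reported averages. This is an assumption for illustration only: detoxification benchmarks commonly define J as the per-sample product STA · SIM · FL averaged over the test set, so the product of the averages only approximates it.

```python
# Aggregate ruDetox scores as reported for this submission.
sta, sim, fl = 0.079, 0.182, 0.235

# Assumption: J is close to the product of the three averages. A per-sample
# definition (mean of STA * SIM * FL) can differ slightly from this estimate.
j_estimate = sta * sim * fl

print(round(j_estimate, 3))  # 0.003
```

The estimate agrees with the reported J of 0.003 to three decimal places, which is consistent with the low STA score dominating the joint result.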

ruEthics
Criterion Correct Good Ethical
Virtue 0.008 -0.001 0.038
Law 0.001 -0.018 0.032
Moral 0.013 0.014 0.042
Justice 0.012 0.019 0.055
Utilitarianism -0.026 0.01 0.033

Table results:

[[0.008, 0.001 , 0.013, 0.012 , -0.026],
[-0.001, -0.018 , 0.014, 0.019 , 0.01],
[0.038, 0.032 , 0.042, 0.055 , 0.033]]

Metric: 5 MCC
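
The ruEthics values are Matthews correlation coefficients (MCC), presumably one per ethical criterion for each of the three questions. A minimal sketch of binary MCC is given below; the function name `mcc` and the toy labels are hypothetical and are not taken from the benchmark code.

```python
import math

def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0 or 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Toy example: predictions unrelated to the labels give MCC = 0.
print(mcc([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.0
```

MCC ranges from -1 to 1, with 0 meaning no correlation between predictions and labels. The values in the table above all lie within about 0.06 of zero.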

Information about the submission:

Team:

MERA

Name of the ML model:

ruT5-base (222M)

Additional links:

https://arxiv.org/abs/2309.10931

Architecture description:

ruT5 is one of the first encoder-decoder LMs pretrained only on Russian textual data. Its design follows that of the T5 model.

Description of the training:

The models are pretrained on a masked language modeling “span corruption” objective, where consecutive spans of the input tokens are masked, and the model is trained to reconstruct the masked tokens. The authors use the SentencePiece tokenizer with a vocabulary size of 32k tokens.
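
The span corruption objective can be illustrated with a small sketch. It assumes the sentinel naming of the original T5 (`<extra_id_0>`, `<extra_id_1>`, ...) and takes the spans to mask as explicit input. The helper `span_corrupt` is hypothetical; real pretraining samples spans randomly over subword tokens rather than whole words.

```python
def span_corrupt(tokens, spans):
    """Mask the given (start, length) spans in the style of T5.

    Returns (corrupted_input, target) as lists of tokens.
    Spans are assumed to be non-overlapping.
    """
    inputs, targets = [], []
    cursor = 0
    for sentinel_id, (start, length) in enumerate(sorted(spans)):
        sentinel = f"<extra_id_{sentinel_id}>"
        inputs.extend(tokens[cursor:start])            # keep text before the span
        inputs.append(sentinel)                        # one sentinel replaces the span
        targets.append(sentinel)
        targets.extend(tokens[start:start + length])   # tokens to reconstruct
        cursor = start + length
    inputs.extend(tokens[cursor:])                     # keep text after the last span
    targets.append(f"<extra_id_{len(spans)}>")         # closing sentinel
    return inputs, targets

tokens = "Thank you for inviting me to your party last week".split()
inputs, targets = span_corrupt(tokens, [(2, 2), (8, 1)])
print(" ".join(inputs))   # Thank you <extra_id_0> me to your party <extra_id_1> week
print(" ".join(targets))  # <extra_id_0> for inviting <extra_id_1> last <extra_id_2>
```

The encoder reads the corrupted input and the decoder is trained to emit the target, so each sentinel ties a masked span in the input to its reconstruction in the output.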

Pretrain data:

300GB of texts. The corpus includes texts from various publicly available resources, which represent diverse domains: Wikipedia, News, Books, Colossal Clean Crawled Corpus.

Training Details:

The ruT5 models are pretrained using a linear scheduler with a learning rate of 1e−4 and the Adam optimizer with β1 = 0.9, β2 = 0.99, and ϵ = 1e−8. The sequence length is set to 512/512 for inputs and targets.
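
The reported settings can be written down as a small sketch. It assumes that "linear scheduler" means linear decay from the peak learning rate to zero with no warmup, and the total step count in the example is hypothetical. Neither detail is stated above.

```python
# Optimizer settings as reported in the training details.
ADAM = {"lr": 1e-4, "betas": (0.9, 0.99), "eps": 1e-8}

def linear_lr(step, total_steps, peak_lr=ADAM["lr"]):
    """Learning rate at `step` under linear decay from peak_lr to 0.

    Assumption: no warmup phase; the rate stays at 0 after total_steps.
    """
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / total_steps

# Hypothetical schedule length, chosen only for illustration.
for step in (0, 250_000, 500_000, 1_000_000):
    print(step, linear_lr(step, total_steps=1_000_000))
```

In a framework such as PyTorch, the same hyperparameters would be passed to the Adam optimizer and the schedule applied as a per-step multiplier on the base learning rate.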

License:

MIT

Strategy, generation and parameters:

Code version v.1.1.0. No parameters were changed; all are used as prepared by the organizers.

Details:
  • 1 x NVIDIA A100
  • dtype auto
  • PyTorch 2.1.2 + CUDA 12.1
  • Transformers 4.36.2
  • Context length 512