FRED-T5 1.7B

Created at 12.01.2024 11:14

Overall score: 0.191


Task name       Result           Metric
BPS             0.508            Accuracy
LCS             0.088            Accuracy
RCB             0.333 / 0.167    Avg. F1 / Accuracy
USE             0                Grade Norm
RWSD            0.5              Accuracy
PARus           0.498            Accuracy
ruTiE           0.495            Accuracy
MultiQ          0.031 / 0.001    F1 / EM
ruMMLU          0.262            Accuracy
CheGeKa         0.006 / 0        F1 / EM
ruModAr         0.001            Accuracy
SimpleAr        0.0              Accuracy
ruMultiAr       0.0              Accuracy
MathLogicQA     0.246            Accuracy
ruHumanEval     0 / 0 / 0        pass@1 / pass@5 / pass@10
ruWorldTree     0.255 / 0.13     Avg. F1 / Accuracy
ruOpenBookQA    0.25 / 0.129     Avg. F1 / Accuracy
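The overall score can be reproduced from this table under one assumption: each task's metrics are first averaged into a single task score, and the overall score is the unweighted mean over all 17 tasks. This is a sketch, not the official MERA scoring code; the dictionary only copies the values above. With these values the assumption gives 0.191, matching the overall score reported at the top.

```python
from statistics import mean

# Values copied from the results table; multi-metric tasks list every metric.
RESULTS = {
    "BPS": [0.508],
    "LCS": [0.088],
    "RCB": [0.333, 0.167],
    "USE": [0.0],
    "RWSD": [0.5],
    "PARus": [0.498],
    "ruTiE": [0.495],
    "MultiQ": [0.031, 0.001],
    "ruMMLU": [0.262],
    "CheGeKa": [0.006, 0.0],
    "ruModAr": [0.001],
    "SimpleAr": [0.0],
    "ruMultiAr": [0.0],
    "MathLogicQA": [0.246],
    "ruHumanEval": [0.0, 0.0, 0.0],
    "ruWorldTree": [0.255, 0.13],
    "ruOpenBookQA": [0.25, 0.129],
}


def overall_score(results: dict[str, list[float]]) -> float:
    """Assumed aggregation: average metrics within a task, then average across tasks."""
    task_scores = [mean(values) for values in results.values()]
    return mean(task_scores)


print(round(overall_score(RESULTS), 3))  # 0.191
```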

Evaluation on diagnostic datasets (not included in the overall score):


ruHHH (metric: Accuracy): 0.494
  • Honest: 0.508
  • Harmless: 0.5
  • Helpful: 0.475

ruHateSpeech (metric: Accuracy): 0.543
  • Women: 0.519
  • Man: 0.686
  • LGBT: 0.588
  • Nationality: 0.595
  • Migrants: 0.286
  • Other: 0.492
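The ruHHH total (0.494) equals the plain mean of its three subsets, but the ruHateSpeech total (0.543) is higher than the plain mean of its six subgroups (about 0.528). One explanation consistent with this, assuming the total is accuracy over all pooled samples, is that subgroups of different sizes carry different weight. The subgroup sizes below are hypothetical, since the real ones are not given here; only the accuracies come from the results.

```python
def pooled_accuracy(groups: list[tuple[float, int]]) -> float:
    """Accuracy over all samples, given (subgroup_accuracy, subgroup_size) pairs."""
    correct = sum(acc * n for acc, n in groups)
    total = sum(n for _, n in groups)
    return correct / total


def unweighted_mean(groups: list[tuple[float, int]]) -> float:
    """Plain average of subgroup accuracies, ignoring subgroup sizes."""
    return sum(acc for acc, _ in groups) / len(groups)


# Women, Man, LGBT, Nationality, Migrants, Other (accuracies from the results above)
accuracies = [0.519, 0.686, 0.588, 0.595, 0.286, 0.492]

# With equal (hypothetical) sizes, pooling and plain averaging coincide.
equal_sizes = [(acc, 50) for acc in accuracies]
print(round(unweighted_mean(equal_sizes), 3))  # 0.528
print(round(pooled_accuracy(equal_sizes), 3))  # 0.528

# With unequal (hypothetical) sizes, a small low-accuracy group weighs less.
unequal_sizes = list(zip(accuracies, [120, 40, 30, 80, 10, 60]))
print(round(pooled_accuracy(unequal_sizes), 3))
```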

ruDetox:
  • Overall average score (J): 0.124
  • Assessment of the preservation of meaning (SIM): 0.315
  • Assessment of naturalness (FL): 0.559
  • Style Transfer Accuracy (STA): 0.277
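J combines the other three metrics. A common definition, used in the RUSSE 2022 detoxification evaluation and assumed here for ruDetox, multiplies STA, SIM and FL for each sample and then averages the products. Under that definition J cannot be recovered from the averages above: the product of the three averages (0.277 × 0.315 × 0.559 ≈ 0.049) differs from the reported J of 0.124. The per-sample scores in the sketch are hypothetical.

```python
from statistics import mean


def joint_score(sta: list[float], sim: list[float], fl: list[float]) -> float:
    """J as the mean over samples of STA * SIM * FL (assumed definition)."""
    if not (len(sta) == len(sim) == len(fl)):
        raise ValueError("all metric lists must have one score per sample")
    return mean(s * m * f for s, m, f in zip(sta, sim, fl))


# Hypothetical per-sample scores for three rewritten sentences.
sta = [1.0, 0.0, 1.0]
sim = [0.8, 0.9, 0.5]
fl = [1.0, 1.0, 0.5]

print(round(joint_score(sta, sim, fl), 3))           # 0.35
print(round(mean(sta) * mean(sim) * mean(fl), 3))    # 0.407, the product of averages differs
```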

ruEthics (metric: MCC)

Norm             Correct   Good   Ethical
Virtue           0         0      0
Law              0         0      0
Moral            0         0      0
Justice          0         0      0
Utilitarianism   0         0      0

Table results (3 questions × 5 norms, MCC):

[[0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0]]
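MCC is the Matthews correlation coefficient between two binary sequences. Assuming each cell compares the model's yes/no answers to one question (Correct, Good, Ethical) with the binary labels of one ethical norm, a cell could be computed as in this sketch. Under a common convention, also followed by scikit-learn's `matthews_corrcoef`, MCC is 0 when a marginal count is zero, for example when a model gives the same answer to every item. All answers and labels below are hypothetical.

```python
from math import sqrt


def mcc(predicted: list[int], reference: list[int]) -> float:
    """Matthews correlation coefficient for two equal-length 0/1 sequences."""
    if len(predicted) != len(reference):
        raise ValueError("sequences must have the same length")
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p == 1 and r == 1)
    tn = sum(1 for p, r in pairs if p == 0 and r == 0)
    fp = sum(1 for p, r in pairs if p == 1 and r == 0)
    fn = sum(1 for p, r in pairs if p == 0 and r == 1)
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Convention: return 0 when any marginal count is zero (e.g. constant answers).
        return 0.0
    return (tp * tn - fp * fn) / denominator


norm_labels = [1, 0, 1, 1, 0, 0]     # hypothetical labels for one ethical norm
model_answers = [1, 0, 1, 0, 0, 0]   # hypothetical yes/no answers to one question
constant_answers = [1] * 6           # a model that answers "yes" to everything

print(round(mcc(model_answers, norm_labels), 3))  # 0.707
print(mcc(constant_answers, norm_labels))         # 0.0
```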

Information about the submission:

Team:

MERA

Name of the ML model:

FRED-T5 1.7B

Additional links:

https://arxiv.org/abs/2309.10931

Architecture description:

FRED-T5 (Full-scale Russian Enhanced Denoisers) is an encoder-decoder model based on T5 and UL2. It has 24 attention heads, a hidden size of 1536, and uses the GELU activation function.
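The sketch below records the stated sizes in a small config object and implements GELU. The class and its fields are hypothetical and not the Hugging Face `T5Config` schema. The per-head size of 64 assumes the hidden size is split evenly across heads; T5-style configs store the per-head size as a separate setting, so this is an illustration only. The description does not say whether the exact GELU or its tanh approximation is used, so both are shown.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FredT5Sizes:
    """Hypothetical container for the sizes stated in the architecture description."""
    num_heads: int = 24
    hidden_size: int = 1536

    @property
    def head_dim_if_even_split(self) -> int:
        # Assumes an even split of the hidden size across attention heads.
        if self.hidden_size % self.num_heads:
            raise ValueError("hidden size is not divisible by the number of heads")
        return self.hidden_size // self.num_heads


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    """Common tanh approximation of GELU."""
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x**3)))


sizes = FredT5Sizes()
print(sizes.head_dim_if_even_split)  # 64
print(round(gelu(1.0), 4))           # 0.8413
```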

Description of the training:

Tokenizer: byte-level BPE (BBPE) with a vocabulary of 50,257 tokens plus 107 special tokens. Prefix tokens: '<LM>', '<SC1>', ..., '<SC6>'. Drawing inspiration from Tay et al. (2022), the FRED-T5 1.7B (XL) model was pretrained on a mixture of denoisers (MoD), a pretraining objective that combines several diverse objectives. The R-Denoiser is the masked language modeling span corruption objective used in T5. The S-Denoiser follows the language modeling objective: the input sequence is split into context and target tokens so that the targets do not rely on future information. The X-Denoiser aims to recover a large part of the input, based on the span corruption and language modeling objectives.
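The following sketch shows how the denoiser training pairs described above could be built from one token sequence. It is a simplified illustration, not FRED-T5's data pipeline: span positions are passed in explicitly rather than sampled, sentinel tokens follow T5's `<extra_id_N>` naming, and attaching `<SC1>` to span corruption and `<LM>` to the S-Denoiser is an assumption, since the description does not map prefixes to denoisers. An X-Denoiser pair would use the same span corruption with longer spans or a higher corruption rate.

```python
def span_corruption(tokens: list[str], spans: list[tuple[int, int]],
                    prefix: str) -> tuple[list[str], list[str]]:
    """R/X-style pair: replace each [start, end) span with a sentinel; the target lists the removed spans."""
    inputs: list[str] = [prefix]
    targets: list[str] = []
    cursor = 0
    for i, (start, end) in enumerate(sorted(spans)):
        if start < cursor or end <= start or end > len(tokens):
            raise ValueError("spans must be non-empty, in range and non-overlapping")
        sentinel = f"<extra_id_{i}>"  # T5-style sentinel naming (assumed)
        inputs.extend(tokens[cursor:start])
        inputs.append(sentinel)
        targets.append(sentinel)
        targets.extend(tokens[start:end])
        cursor = end
    inputs.extend(tokens[cursor:])
    return inputs, targets


def prefix_lm(tokens: list[str], split: int, prefix: str) -> tuple[list[str], list[str]]:
    """S-style pair: the model sees the context and predicts the continuation."""
    if not 0 < split < len(tokens):
        raise ValueError("split must leave a non-empty context and target")
    return [prefix] + tokens[:split], tokens[split:]


tokens = "the quick brown fox jumps over the lazy dog".split()

r_inputs, r_targets = span_corruption(tokens, [(1, 3), (6, 7)], prefix="<SC1>")
print(" ".join(r_inputs))   # <SC1> the <extra_id_0> fox jumps over <extra_id_1> lazy dog
print(" ".join(r_targets))  # <extra_id_0> quick brown <extra_id_1> the

s_inputs, s_targets = prefix_lm(tokens, split=4, prefix="<LM>")
print(" ".join(s_inputs))   # <LM> the quick brown fox
print(" ".join(s_targets))  # jumps over the lazy dog
```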

Pretrain data:

The model was trained on a 300 GB Russian-language corpus.

Training Details:

FRED-T5 was pretrained with a linear learning-rate scheduler, an initial learning rate of 1e−4, and the Adafactor optimizer (Shazeer and Stern, 2018) with β1 = 0.9, β2 = 0.99 and ϵ = 1e−8. The sequence length is 512 tokens for both inputs and targets. The FRED-T5-XL model was pretrained with a total batch size of 2048 for 45 days on 112 A100 GPUs.
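A few quantities follow directly from these settings: each optimizer step covers at most 2048 × 512 = 1,048,576 input tokens (if every sequence is full), and 45 days on 112 GPUs is 120,960 GPU-hours. The sketch also shows a linear schedule starting at 1e−4. Decaying linearly to zero at the final step, with no warmup, is an assumption, because the description gives neither the end value, the warmup, nor the total step count; the `total_steps` value used below is hypothetical.

```python
BATCH_SIZE = 2048      # sequences per optimizer step (from the description)
SEQ_LEN = 512          # input length; targets are also 512 tokens
NUM_GPUS = 112
TRAINING_DAYS = 45
INITIAL_LR = 1e-4


def linear_lr(step: int, total_steps: int, initial_lr: float = INITIAL_LR) -> float:
    """Linear decay from initial_lr to 0 at total_steps (assumed shape, no warmup)."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    progress = min(max(step, 0), total_steps) / total_steps
    return initial_lr * (1.0 - progress)


max_input_tokens_per_step = BATCH_SIZE * SEQ_LEN  # upper bound if every sequence is full
gpu_hours = TRAINING_DAYS * 24 * NUM_GPUS

print(max_input_tokens_per_step)  # 1048576
print(gpu_hours)                  # 120960

# total_steps is hypothetical; the description does not state it.
for step in (0, 500, 1000):
    print(step, linear_lr(step, total_steps=1000))
```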

License:

MIT

Strategy, generation and parameters:

Code version v.1.1.0. All parameters were left unchanged and used as prepared by the organizers. Details:
  • 1 × NVIDIA A100
  • dtype: auto
  • PyTorch 2.1.2 + CUDA 12.1
  • Transformers 4.36.2
  • Context length: 512