FRED-T5 large 820M

Created 12.01.2024 11:15

Overall score: 0.194

Task Result Metric
BPS 0.475 Accuracy
LCS 0.086 Accuracy
RCB 0.354 / 0.248 Avg. F1 / Accuracy
USE 0 Grade Norm
RWSD 0.492 Accuracy
PARus 0.492 Accuracy
ruTiE 0.493 Accuracy
MultiQ 0.052 / 0 F1-score/EM
ruMMLU 0.248 Accuracy
CheGeKa 0.001 / 0 F1 / EM
ruModAr 0.0 Accuracy
SimpleAr 0.0 Accuracy
ruMultiAr 0.0 Accuracy
MathLogicQA 0.24 Accuracy
ruHumanEval 0 / 0 / 0 pass@k
ruWorldTree 0.232 / 0.174 Avg. F1 / Accuracy
ruOpenBookQA 0.265 / 0.215 Avg. F1 / Accuracy
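
The page does not state how the overall score is derived from the per-task results. The sketch below is a consistency check under one assumption: each task with several metrics is first averaged into a single value, and the overall score is the plain mean over the 17 tasks. The names `results`, `task_score` and `overall_score` are illustrative only. Under that assumption the table values reproduce the reported 0.194.

```python
# Per-task results copied from the table above.
# Tasks reported with several metrics keep all of their values.
results = {
    "BPS": [0.475],
    "LCS": [0.086],
    "RCB": [0.354, 0.248],
    "USE": [0],
    "RWSD": [0.492],
    "PARus": [0.492],
    "ruTiE": [0.493],
    "MultiQ": [0.052, 0],
    "ruMMLU": [0.248],
    "CheGeKa": [0.001, 0],
    "ruModAr": [0.0],
    "SimpleAr": [0.0],
    "ruMultiAr": [0.0],
    "MathLogicQA": [0.24],
    "ruHumanEval": [0, 0, 0],
    "ruWorldTree": [0.232, 0.174],
    "ruOpenBookQA": [0.265, 0.215],
}


def task_score(values):
    """Assumed aggregation: mean of a task's metrics."""
    return sum(values) / len(values)


def overall_score(task_results):
    """Assumed aggregation: unweighted mean over tasks."""
    scores = [task_score(v) for v in task_results.values()]
    return sum(scores) / len(scores)


print(round(overall_score(results), 3))
```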

Evaluation on diagnostic datasets:

Not counted in the overall ranking

Task Result Metric
ruHHH 0.472 Accuracy
  • Honest: 0.492
  • Harmless: 0.466
  • Helpful: 0.458
ruHateSpeech 0.543 Accuracy
  • Women: 0.519
  • Men: 0.686
  • LGBT: 0.588
  • Nationality: 0.595
  • Migrants: 0.286
  • Other: 0.492
ruDetox
  • Overall average score (J): 0.003
  • Meaning preservation score (SIM): 0.098
  • Naturalness score (FL): 0.55
  • Style transfer accuracy (STA): 0.051
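
The four ruDetox values above are related: J is a joint score built from STA, SIM and FL. As a rough sanity check, the sketch below multiplies the three reported averages. This is an approximation and an assumption on my part: a joint score of this kind is normally computed as a per-sample product that is then averaged, so the product of averages need not match J exactly. The variable names are illustrative.

```python
# Component scores as reported for ruDetox above.
sta = 0.051  # style transfer accuracy
sim = 0.098  # meaning preservation
fl = 0.55    # naturalness (fluency)

# Approximation: product of the averaged components.
# A per-sample product averaged over the dataset may differ from this.
j_approx = sta * sim * fl

print(f"{j_approx:.4f}")
```

Rounded to three decimals, the product is in line with the reported J of 0.003.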

ruEthics
Criterion Correct Good Ethical
Virtue 0 0 0
Law 0 0 0
Morality 0 0 0
Justice 0 0 0
Utilitarianism 0 0 0

Table results:

[[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0]]

Metric: 5 MCC
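
For reference, MCC is the Matthews correlation coefficient. The sketch below computes it for binary labels in plain Python. It is a generic illustration, not the benchmark's own scoring code, and the function name `mcc` is hypothetical. It follows the common convention of returning 0 when the denominator is zero, which is what happens when the predictions (or the labels) are all the same class.

```python
import math


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary (0/1) labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Convention: undefined cases (e.g. constant predictions) score 0.
        return 0.0
    return (tp * tn - fp * fn) / denom


print(mcc([1, 1, 0, 0], [1, 1, 0, 0]))  # perfect agreement
print(mcc([1, 0, 1, 0], [1, 1, 1, 1]))  # constant predictions
```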

Submission information:

Team:

MERA

ML model name:

FRED-T5 large 820M

Link to the ML model:

https://huggingface.co/ai-forever/FRED-T5-large

Additional links:

https://arxiv.org/abs/2309.10931

Architecture description:

FRED-T5 (Full-scale Russian Enhanced Denoisers) is an encoder-decoder model based on T5 and UL2. It has 16 attention heads, a hidden dimension of 1024, and a fully connected layer dimension of 2816. It uses the GELU activation function.

Training description:

BBPE tokenizer with 50257 tokens plus 107 special tokens. Prefix tokens: '<LM>', '<SC1>', ..., '<SC6>'. Drawing inspiration from Tay et al. (2022), the FRED-T5 1.7B (or XL) model was pretrained on a mixture of denoisers (MoD), a pretraining objective that combines a set of diverse pretraining objectives. The R-Denoiser is the masked language modeling span corruption objective used in T5. The S-Denoiser follows the language modeling objective: the input sequence is split into context and target tokens so that the targets do not rely on future information. The X-Denoiser aims to recover a large part of the input, based on the span corruption and language modeling objectives.
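
To make the denoiser objectives above concrete, here is a minimal sketch of how input/target pairs could be built for a span corruption objective and for a language modeling split. It makes several assumptions not stated in this description: T5-style sentinel names (`<extra_id_0>`, `<extra_id_1>`, ...), the choice of which prefix token goes with which objective, and hand-picked spans instead of sampled ones. The functions `span_corrupt` and `lm_split` are illustrative, not part of any library.

```python
def span_corrupt(tokens, spans, prefix="<SC1>"):
    """Replace each (start, end) span with a sentinel in the input
    and emit the sentinel followed by the removed tokens in the target.

    `spans` must be sorted and non-overlapping. The sentinel naming and
    the `<SC1>` prefix are assumptions made for illustration.
    """
    inputs, targets = [prefix], []
    cursor = 0
    for i, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"
        inputs.extend(tokens[cursor:start])  # keep text before the span
        inputs.append(sentinel)              # mask the span in the input
        targets.append(sentinel)             # target restores the span
        targets.extend(tokens[start:end])
        cursor = end
    inputs.extend(tokens[cursor:])
    return inputs, targets


def lm_split(tokens, split, prefix="<LM>"):
    """Language modeling split: context as input, continuation as target."""
    return [prefix] + tokens[:split], tokens[split:]


tokens = "a b c d e f".split()
print(span_corrupt(tokens, [(1, 3), (4, 5)]))
print(lm_split(tokens, 4))
```

In this picture, an X-Denoiser-like setting would correspond to longer spans or a higher corruption rate, so that a large part of the input has to be recovered.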

Pretraining data:

It was trained on a Russian-language corpus (300GB).

Training details:

FRED-T5 is pretrained using a linear scheduler with an initial learning rate of 1e−4 and the Adafactor optimizer (Shazeer and Stern, 2018) with β1 = 0.9, β2 = 0.99, and ϵ = 1e−8. The sequence length is set to 512/512 for inputs and targets. The FRED-T5-XL model is pretrained with a total batch size of 2048 for 35 days on 160 V100 GPUs, followed by 5 days on 80 A100 GPUs.
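
As an illustration of the linear scheduler mentioned above, the sketch below decays the learning rate linearly from the initial value of 1e-4 to zero. The total number of steps and the absence of a warmup phase are assumptions: neither is given in this description, so `total_steps` is a hypothetical parameter and `linear_lr` is an illustrative name.

```python
def linear_lr(step, total_steps, initial_lr=1e-4):
    """Linearly decay the learning rate from `initial_lr` to 0.

    `total_steps` is hypothetical; the source does not state it.
    Steps outside [0, total_steps] are clamped.
    """
    clamped = min(max(step, 0), total_steps)
    return initial_lr * (1.0 - clamped / total_steps)


# Example with a made-up schedule length of 1000 steps.
for step in (0, 250, 500, 1000):
    print(step, linear_lr(step, total_steps=1000))
```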

License:

MIT

Strategy, generation and parameters:

Code version v.1.1.0. No parameters were changed; all are used as prepared by the organizers. Details:
  • 1 x NVIDIA A100
  • dtype auto
  • Pytorch 2.1.2 + CUDA 12.1
  • Transformers 4.36.2
  • Context length 512