ruGPT-3-small

Created 12.01.2024 14:47

Overall score: 0.191

Task Result Metric
BPS 0.367 Accuracy
LCS 0.08 Accuracy
RCB 0.333 / 0.167 Avg. F1 / Accuracy
USE 0.001 Grade Norm
RWSD 0.492 Accuracy
PARus 0.498 Accuracy
ruTiE 0.5 Accuracy
MultiQ 0.063 / 0.009 F1-score / EM
ruMMLU 0.263 Accuracy
CheGeKa 0.007 / 0 F1 / EM
ruModAr 0.001 Accuracy
SimpleAr 0.0 Accuracy
ruMultiAr 0.009 Accuracy
MathLogicQA 0.244 Accuracy
ruHumanEval 0 / 0 / 0 pass@k
ruWorldTree 0.257 / 0.254 Avg. F1 / Accuracy
ruOpenBookQA 0.258 / 0.253 Avg. F1 / Accuracy

Evaluation on diagnostic datasets:

Not included in the overall ranking

Task: ruHHH
Result: 0.478 (Accuracy)
  • Honest: 0.475
  • Harmless: 0.466
  • Helpful: 0.492

Task: ruHateSpeech
Result: 0.54 (Accuracy)
  • Women: 0.519
  • Men: 0.657
  • LGBT: 0.588
  • Nationality: 0.595
  • Migrants: 0.286
  • Other: 0.492

Task: ruDetox
  • Overall average score (J): 0.316
  • Meaning preservation score (SIM): 0.676
  • Naturalness score (FL): 0.612
  • Style transfer accuracy (STA): 0.713

Task: ruEthics
Metric: 5 MCC

                Correct  Good  Ethical
Virtue          0        0     0
Law             0        0     0
Morality        0        0     0
Justice         0        0     0
Utilitarianism  0        0     0

Raw table results:

[[0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0]]
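
Each cell in the table above is a Matthews correlation coefficient (MCC) between the model's binary answers and one ethical criterion. A minimal sketch of computing one such value with scikit-learn is below; the label vectors are made up for illustration, not MERA data:

```python
# MCC between a model's binary answers and one criterion's annotations.
# Both vectors are hypothetical; sklearn provides a standard MCC implementation.
from sklearn.metrics import matthews_corrcoef

criterion_labels = [1, 0, 1, 1, 0]  # hypothetical "virtue" annotations
model_answers = [1, 1, 0, 1, 0]     # hypothetical model answers
print(matthews_corrcoef(criterion_labels, model_answers))
```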

Submission information:

Team:

MERA

ML model name:

ruGPT-3-small

Additional links:

https://arxiv.org/abs/2309.10931

Architecture description:

ruGPT-3 is a Russian counterpart of GPT-3 (Brown et al., 2020). We use the model architecture description by Brown et al. and the GPT-2 code base (Radford et al., 2019) from the Transformers library. ruGPT-3 is pretrained on the language modeling objective. A BBPE tokenizer with a vocabulary size of 5 · 10^4 tokens was used.
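
For reference, a minimal sketch of loading the model and its BBPE tokenizer through the Transformers library. The checkpoint id is an assumption (the public Hugging Face release of ruGPT-3-small), not something stated in this submission:

```python
# Load ruGPT-3-small and its BBPE tokenizer via Transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ai-forever/rugpt3small_based_on_gpt2"  # assumed public checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

print(tokenizer.vocab_size)  # BBPE vocabulary, roughly 5 * 10**4 tokens
```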

Training description:

The model was trained with a sequence length of 1024 using the Transformers library by the SberDevices team on 80B tokens for 3 epochs. After that, the model was fine-tuned for 1 epoch with a sequence length of 2048. Total training time was around 14 days on 128 GPUs for the 1024 context and a few days on 16 GPUs for the 2048 context. The final perplexity on the test set is 13.6.
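
The reported perplexity is, by convention, the exponential of the mean per-token cross-entropy on held-out text. A sketch with PyTorch on dummy tensors (not the authors' evaluation code):

```python
# Perplexity = exp(mean per-token cross-entropy). Dummy tensors stand in
# for real model logits and next-token targets.
import torch
import torch.nn.functional as F

logits = torch.randn(1, 10, 50_000)          # (batch, seq, vocab), dummy values
targets = torch.randint(0, 50_000, (1, 10))  # dummy next-token ids
nll = F.cross_entropy(logits.flatten(0, 1), targets.flatten())
print(torch.exp(nll).item())  # the authors report 13.6 on their test set
```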

Pretraining data:

450 GB of texts. The corpus includes texts from various publicly available resources representing diverse domains: Wikipedia, news, books, the Colossal Clean Crawled Corpus, and OpenSubtitles.

Training details:

The ruGPT-3 models are pretrained with a maximum sequence length of 1024 tokens for three epochs and of 2048 tokens for one epoch. We use an initial learning rate of 1e-4 and the Adam optimizer with β1 = 0.9, β2 = 0.99, and ε = 1e-8. The total number of tokens seen during pretraining is 80B. The pretraining of ruGPT-3-large took 16 days on a cluster of 32 V100-SXM3 GPUs.
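
A sketch of the stated optimizer settings in PyTorch; the module below is a stand-in for the actual network, and any warmup or decay schedule beyond the initial learning rate is not specified in the text:

```python
# Adam with the hyperparameters quoted above.
import torch

model = torch.nn.Linear(8, 8)  # placeholder for the ruGPT-3 network
optimizer = torch.optim.Adam(
    model.parameters(),
    lr=1e-4,            # initial learning rate
    betas=(0.9, 0.99),  # beta1, beta2
    eps=1e-8,
)
```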

License:

MIT

Strategy, generation, and parameters:

Code version: v.1.1.0. All parameters were left unchanged and used as prepared by the organizers.

Details:
  • 1 x NVIDIA A100
  • dtype: auto
  • PyTorch 2.1.2 + CUDA 12.1
  • Transformers 4.36.2
  • Context length: 2048
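
A hedged sketch of inference under these settings (dtype "auto", context length 2048). This is not the organizers' evaluation harness; the checkpoint id and prompt are illustrative assumptions:

```python
# Greedy generation with the submission's dtype and context-length settings.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ai-forever/rugpt3small_based_on_gpt2"  # assumed public checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto").eval()

prompt = "Вопрос: сколько будет 2 + 2? Ответ:"  # illustrative prompt
inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=2048)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```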