ruGPT-3.5 13B

Created at 12.01.2024 11:18

Overall score: 0.208


Task name       Result                   Metric
BPS             0.492                    Accuracy
LCS             0.132                    Accuracy
RCB             0.331 / 0.194            Avg. F1 / Accuracy
USE             0.025                    Grade Norm
RWSD            0.523                    Accuracy
PARus           0.504                    Accuracy
ruTiE           0.488                    Accuracy
MultiQ          0.115 / 0.036            F1-score / EM
ruMMLU          0.246                    Accuracy
CheGeKa         0.037 / 0                F1 / EM
ruModAr         0.001                    Accuracy
SimpleAr        0.029                    Accuracy
ruMultiAr       0.025                    Accuracy
MathLogicQA     0.258                    Accuracy
ruHumanEval     0.001 / 0.003 / 0.006    pass@k
ruWorldTree     0.246 / 0.22             Avg. F1 / Accuracy
ruOpenBookQA    0.223 / 0.208            Avg. F1 / Accuracy
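
The page does not say how the overall figure of 0.208 is derived from the table. The sketch below tests one assumption: that each task's score is the mean of its reported metrics, and the overall figure is the unweighted mean over the 17 tasks. The function name `overall_score` is hypothetical. Under that assumption the table values do reproduce the reported figure, but this is not confirmed as the organizers' procedure.

```python
from statistics import mean

# Per-task results copied from the table above.
# Tasks with several metrics keep all reported values.
results = {
    "BPS": [0.492],
    "LCS": [0.132],
    "RCB": [0.331, 0.194],
    "USE": [0.025],
    "RWSD": [0.523],
    "PARus": [0.504],
    "ruTiE": [0.488],
    "MultiQ": [0.115, 0.036],
    "ruMMLU": [0.246],
    "CheGeKa": [0.037, 0.0],
    "ruModAr": [0.001],
    "SimpleAr": [0.029],
    "ruMultiAr": [0.025],
    "MathLogicQA": [0.258],
    "ruHumanEval": [0.001, 0.003, 0.006],
    "ruWorldTree": [0.246, 0.22],
    "ruOpenBookQA": [0.223, 0.208],
}


def overall_score(task_results):
    """Assumed aggregation: mean over tasks of the mean over each task's metrics."""
    return mean(mean(values) for values in task_results.values())


print(round(overall_score(results), 3))  # 0.208
```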

Evaluation on diagnostic datasets:

These results are not included in the overall score.


Task name       Result    Metric
ruHHH           0.472     Accuracy
  • Honest: 0.475
  • Harmless: 0.466
  • Helpful: 0.475
ruHateSpeech    0.543     Accuracy
  • Women: 0.537
  • Men: 0.657
  • LGBT: 0.647
  • Nationality: 0.514
  • Migrants: 0.286
  • Other: 0.508
ruDetox
  • Overall average score (J): 0.286
  • Meaning preservation (SIM): 0.562
  • Naturalness (FL): 0.704
  • Style Transfer Accuracy (STA): 0.678

ruEthics
                  Correct    Good     Ethical
Virtue            -0.036     0.045    0.034
Law               -0.023     0.035    -0.021
Moral             -0.025     0.034    0.029
Justice           -0.017     0.045    0.049
Utilitarianism    -0.016     0.04     0.067

Table results:

[[-0.036, -0.023 , -0.025, -0.017 , -0.016],
[0.045, 0.035 , 0.034, 0.045 , 0.04],
[0.034, -0.021 , 0.029, 0.049 , 0.067]]

Metric: 5 MCC
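
MCC stands for the Matthews correlation coefficient, which ranges from -1 to 1; values near 0, like those in the ruEthics table, indicate almost no correlation between predictions and labels. The sketch below shows the standard binary form of the metric computed from confusion-matrix counts. The function name `mcc` and its arguments are illustrative, and the page does not say how the benchmark itself computes the values.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: report 0 when any marginal count is zero.
        return 0.0
    return (tp * tn - fp * fn) / denom


print(mcc(5, 5, 0, 0))  # perfect agreement
print(mcc(5, 5, 5, 5))  # no better than chance
```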

Information about the submission:

Team:

MERA

Name of the ML model:

ruGPT-3.5 13B

Additional links:

https://habr.com/ru/companies/sberbank/articles/746736/

Architecture description:

ruGPT-3 is a Russian counterpart of GPT-3 (Brown et al., 2020). The model has 13B parameters. It is the largest model so far, and it was used to train the first version of GigaChat.

Description of the training:

The model was trained with the DeepSpeed and Megatron libraries on a 300B-token dataset for 3 epochs, which took around 45 days on 512 V100 GPUs. It was then fine-tuned for 1 epoch with a sequence length of 2048, which took around 20 days on 200 A100 GPUs, using the additional data described under "Pretrain data".

Pretrain data:

The model was pretrained on 300 GB of text from various domains, then additionally trained on 100 GB of code and legal documents. The training data was deduplicated: each text in the corpus was hashed with a 64-bit hash, and only texts with a unique hash were kept. Documents were also filtered by their text compression rate, measured with zlib. The most strongly and most weakly compressing deduplicated texts were discarded.
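
The two cleaning steps described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the hash function (BLAKE2b truncated to 8 bytes), the function names, and the `low` and `high` thresholds are all assumptions, since the source gives neither the hash used nor the cut-off values.

```python
import hashlib
import zlib


def hash64(text):
    """64-bit hash of a text (assumed: BLAKE2b with an 8-byte digest)."""
    digest = hashlib.blake2b(text.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def deduplicate(texts):
    """Keep the first occurrence of each text, comparing by 64-bit hash."""
    seen = set()
    unique = []
    for text in texts:
        h = hash64(text)
        if h not in seen:
            seen.add(h)
            unique.append(text)
    return unique


def compression_ratio(text):
    """Raw size in bytes divided by zlib-compressed size in bytes."""
    raw = text.encode("utf-8")
    return len(raw) / len(zlib.compress(raw))


def filter_by_compression(texts, low=1.2, high=8.0):
    """Drop texts that compress too weakly or too strongly.

    The low/high thresholds are hypothetical placeholders.
    """
    return [t for t in texts if low <= compression_ratio(t) <= high]
```

Highly repetitive text (for example one character repeated a thousand times) compresses very strongly and falls above `high`, while very short or noise-like strings barely compress and fall below `low`.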

Training Details:

After the final training stage, the model's perplexity on Russian text was around 8.8.
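
For reference, perplexity is the exponential of the average negative log-likelihood per token. The sketch below computes it from a list of per-token natural-log probabilities; the function name and the input format are illustrative, and the source does not state which evaluation text or tokenization produced the reported figure.

```python
import math


def perplexity(token_logprobs):
    """Perplexity from per-token natural-log probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    avg_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_nll)


# A model that assigns probability 0.5 to every token has perplexity 2.
print(perplexity([math.log(0.5)] * 4))
```

Under this definition, a perplexity of about 8.8 means the model is on average as uncertain as if it were choosing uniformly among roughly 8.8 tokens at each step.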

License:

MIT

Strategy, generation and parameters:

Code version v.1.1.0. No parameters were changed; all are used as prepared by the organizers. Details:

  • 1 x NVIDIA A100
  • dtype auto
  • PyTorch 2.1.2 + CUDA 12.1
  • Transformers 4.36.2
  • Context length 2048