ruGPT-3.5

RussianNLP, created 20.09.2024 11:54

Overall result: 0.213
Place in the rating: 517
Weak tasks:
RWSD: 473
PARus: 487
RCB: 512
MultiQ: 461
ruWorldTree: 555
ruOpenBookQA: 548
CheGeKa: 186
ruMMLU: 529
ruHateSpeech: 424
ruDetox: 440
ruHHH: 494
ruTiE: 466
ruHumanEval: 225
USE: 335
MathLogicQA: 523
ruMultiAr: 485
SimpleAr: 518
LCS: 127
BPS: 526
ruModAr: 511
MaMuRAMu: 554
ruCodeEval: 225

Ratings for leaderboard tasks


Task name | Result | Metric
LCS | 0.132 | Accuracy
RCB | 0.342 / 0.257 | Accuracy / F1 macro
USE | 0.082 | Grade norm
RWSD | 0.462 | Accuracy
PARus | 0.498 | Accuracy
ruTiE | 0.504 | Accuracy
MultiQ | 0.179 / 0.077 | F1 / Exact match
CheGeKa | 0.145 / 0.118 | F1 / Exact match
ruModAr | 0.001 | Exact match
MaMuRAMu | 0.226 | Accuracy
ruMultiAr | 0.029 | Exact match
ruCodeEval | 0.002 / 0.009 / 0.012 | Pass@k
MathLogicQA | 0.251 | Accuracy
ruWorldTree | 0.238 / 0.197 | Accuracy / F1 macro
ruOpenBookQA | 0.255 / 0.183 | Accuracy / F1 macro

Evaluation on open tasks:

(Per-subcategory results are given in the "Ratings by subcategory" section below.)


Task name | Result | Metric
BPS | 0.47 | Accuracy
ruMMLU | 0.262 | Accuracy
SimpleAr | 0.027 | Exact match
ruHumanEval | 0.01 / 0.029 / 0.043 | Pass@k
ruHHH | 0.472
ruHateSpeech | 0.532
ruDetox | 0.073
ruEthics
 | Correct | Good | Ethical
Virtue | -0.01 | -0.026 | 0.029
Law | -0.048 | -0.082 | -0.033
Moral | -0.033 | -0.044 | 0.002
Justice | -0.034 | -0.073 | -0.002
Utilitarianism | 0.001 | -0.03 | 0.016

Information about the submission:

MERA version
v.1.2.0
Torch version
2.3.1
Codebase version
aec92f8
CUDA version
12.1
Model weights precision
float16
Seed
1234
Batch size
1
Transformers version
4.42.3
Number and type of GPUs
5 x NVIDIA H100 80GB HBM3
Architecture
hf

Team:

RussianNLP

Name of the ML model:

ruGPT-3.5

Model size:

12.9B

Model type:

Open

Pretrain

Architecture description:

ruGPT-3.5 is a Russian counterpart of GPT-3 (Brown et al., 2020). The model has 13B parameters. It is the largest ruGPT model so far and was used to train the first version of GigaChat.

Description of the training:

The model was trained with the DeepSpeed and Megatron libraries on a 300B-token dataset for 3 epochs, which took around 45 days on 512 V100 GPUs. It was then fine-tuned for 1 epoch with a sequence length of 2048 on additional data (see the pretrain data description below), which took around 20 days on 200 A100 GPUs.

Pretrain data:

The model was pretrained on 300 GB of text from various domains and then additionally trained on 100 GB of code and legal documents. The training data was deduplicated: deduplication computes a 64-bit hash of each text in the corpus and keeps only texts with a unique hash. Documents are also filtered by their text compression rate using zlib; the most strongly and the most weakly compressing deduplicated texts are discarded.
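
As an illustration, here is a minimal sketch of the deduplication and compression-rate filtering steps described above. The hash function, thresholds, and helper names are illustrative assumptions, not the actual pipeline code.

```python
import hashlib
import zlib

def text_hash64(text: str) -> int:
    """64-bit hash of a document (first 8 bytes of MD5); illustrative choice of hash."""
    return int.from_bytes(hashlib.md5(text.encode("utf-8")).digest()[:8], "big")

def compression_rate(text: str) -> float:
    """Ratio of zlib-compressed size to raw size: low = highly repetitive, high = near-random."""
    raw = text.encode("utf-8")
    return len(zlib.compress(raw)) / max(len(raw), 1)

def dedup_and_filter(docs, low=0.3, high=0.95):
    """Keep one document per 64-bit hash, then drop the most strongly and
    most weakly compressing texts (the thresholds here are assumptions)."""
    seen, kept = set(), []
    for doc in docs:
        h = text_hash64(doc)
        if h in seen:
            continue  # duplicate under the 64-bit hash
        seen.add(h)
        if low <= compression_rate(doc) <= high:
            kept.append(doc)
    return kept
```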

License:

MIT

Inference parameters

Generation Parameters:
simplear: do_sample=false; until=["\n"]
chegeka: do_sample=false; until=["\n"]
rudetox: do_sample=false; until=["\n"]
rumultiar: do_sample=false; until=["\n"]
use: do_sample=false; until=["\n", "."]
multiq: do_sample=false; until=["\n"]
rumodar: do_sample=false; until=["\n"]
ruhumaneval: do_sample=true; until=["\nclass", "\ndef", "\n#", "\nif", "\nprint"]; temperature=0.6
rucodeeval: do_sample=true; until=["\nclass", "\ndef", "\n#", "\nif", "\nprint"]; temperature=0.6
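
For reference, a minimal sketch of how these per-task parameters could map onto a Hugging Face transformers generate() call, assuming the public ai-forever/ruGPT-3.5-13B checkpoint. The prompt and max_new_tokens values are illustrative, and stop-string handling via stop_strings requires a recent transformers release (the submission used 4.42.3); this is not the exact MERA harness code.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ai-forever/ruGPT-3.5-13B"  # public checkpoint; float16 matches the submission
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "2 + 3 ="  # illustrative SimpleAr-style prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# SimpleAr-style settings: greedy decoding, stop at the first newline.
greedy = model.generate(
    **inputs,
    do_sample=False,
    max_new_tokens=32,
    stop_strings=["\n"],   # stop-string support needs the tokenizer passed in
    tokenizer=tokenizer,
)

# ruHumanEval/ruCodeEval-style settings: sampling at temperature 0.6 with code-oriented stops.
sampled = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.6,
    max_new_tokens=256,
    stop_strings=["\nclass", "\ndef", "\n#", "\nif", "\nprint"],
    tokenizer=tokenizer,
)

print(tokenizer.decode(greedy[0], skip_special_tokens=True))
```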

Context size:
2048 tokens for all tasks (simplear, bps, lcs, chegeka, mathlogicqa, parus, rcb, rudetox, ruhatespeech, rummlu, ruworldtree, ruopenbookqa, rumultiar, use, rwsd, mamuramu, multiq, rumodar, ruethics, ruhhh, rutie, ruhumaneval, rucodeeval).
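
A small sketch of one way to keep prompts within the 2048-token window by left-truncation, using the same tokenizer as above; the harness's actual truncation strategy is not specified here, so the budget handling below is an assumption.

```python
def fit_to_context(prompt: str, tokenizer, max_context: int = 2048, max_new_tokens: int = 32) -> str:
    """Left-truncate a prompt so that prompt + generated tokens fit in the context window."""
    budget = max_context - max_new_tokens
    ids = tokenizer(prompt, add_special_tokens=False)["input_ids"]
    if len(ids) > budget:
        ids = ids[-budget:]  # keep the most recent tokens (assumed strategy)
    return tokenizer.decode(ids)
```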


Ratings by subcategory

USE, results by subcategory for ruGPT-3.5 (RussianNLP); metric: Grade norm
1: 0.033, 2: 0.033, 3: 0.067, 4: 0.133, 5: 0.067, 6: 0.133, 7: 0, 8: -, 9: 0.033, 10: 0.033, 11: 0, 12: 0, 13: 0.067, 14: 0, 15: 0.133, 16: 0.333, 17: 0, 18: 0, 19: 0.067, 20: 0.033, 21: 0.033, 22: 0, 23: 0.1, 24: 0, 25: 0, 26: 0.133, 8_0: 0.1, 8_1: 0.1, 8_2: 0.067, 8_3: 0.133, 8_4: 0.233
ruHHH, results by subcategory for ruGPT-3.5 (RussianNLP):
Honest: 0.459, Helpful: 0.475, Harmless: 0.483
ruMMLU, results by subject for ruGPT-3.5 (RussianNLP):
Anatomy: 0.244, Virology: 0.217, Astronomy: 0.25, Marketing: 0.214, Nutrition: 0.268, Sociology: 0.264, Management: 0.32, Philosophy: 0.228, Prehistory: 0.238, Human aging: 0.139, Econometrics: 0.228, Formal logic: 0.357, Global facts: 0.23, Jurisprudence: 0.25, Miscellaneous: 0.225, Moral disputes: 0.22, Business ethics: 0.25, Biology (college): 0.236, Physics (college): 0.211, Human sexuality: 0.313, Moral scenarios: 0.268, World religions: 0.187, Abstract algebra: 0.23, Medicine (college): 0.324, Machine learning: 0.188, Medical genetics: 0.32, Professional law: 0.225, PR: 0.213, Security studies: 0.298, Chemistry (college): 0.25, Computer security: 0.23, International law: 0.281, Logical fallacies: 0.215, Politics: 0.253, Clinical knowledge: 0.238, Conceptual physics: 0.286, Math (college): 0.28, Biology (high school): 0.281, Physics (high school): 0.311, Chemistry (high school): 0.276, Geography (high school): 0.288, Professional medicine: 0.445, Electrical engineering: 0.234, Elementary mathematics: 0.247, Psychology (high school): 0.305, Statistics (high school): 0.463, History (high school): 0.25, Math (high school): 0.233, Professional accounting: 0.259, Professional psychology: 0.261, Computer science (college): 0.28, World history (high school): 0.194, Macroeconomics: 0.351, Microeconomics: 0.319, Computer science (high school): 0.24, European history: 0.261, Government and politics: 0.373
ruDetox, results by subcategory for ruGPT-3.5 (RussianNLP):
SIM: 0.736, FL: 0.636, STA: 0.224
MaMuRAMu, results by subject for ruGPT-3.5 (RussianNLP):
Anatomy: 0.2, Virology: 0.267, Astronomy: 0.267, Marketing: 0.296, Nutrition: 0.237, Sociology: 0.224, Management: 0.31, Philosophy: 0.211, Pre-History: 0.25, Gerontology: 0.231, Econometrics: 0.179, Formal logic: 0.192, Global facts: 0.158, Jurisprudence: 0.202, Miscellaneous: 0.234, Moral disputes: 0.198, Business ethics: 0.262, Biology (college): 0.4, Physics (college): 0.228, Human sexuality: 0.175, Moral scenarios: 0.211, World religions: 0.254, Abstract algebra: 0.222, Medicine (college): 0.249, Machine learning: 0.2, Genetics: 0.242, Professional law: 0.205, PR: 0.158, Security: 0.175, Chemistry (college): 0.289, Computer security: 0.178, International law: 0.256, Logical fallacies: 0.143, Politics: 0.298, Clinical knowledge: 0.303, Conceptual physics: 0.179, Math (college): 0.267, Biology (high school): 0.311, Physics (high school): 0.228, Chemistry (high school): 0.292, Geography (high school): 0.216, Professional medicine: 0.254, Electrical engineering: 0.222, Elementary mathematics: 0.156, Psychology (high school): 0.207, Statistics (high school): 0.089, History (high school): 0.241, Math (high school): 0.205, Professional accounting: 0.277, Professional psychology: 0.228, Computer science (college): 0.244, World history (high school): 0.217, Macroeconomics: 0.253, Microeconomics: 0.195, Computer science (high school): 0.279, European history: 0.199, Government and politics: 0.178
ruEthics, results by subcategory for ruGPT-3.5 (RussianNLP), per label:
Correct: Virtue: -0.01, Law: -0.048, Moral: -0.033, Justice: -0.034, Utilitarianism: 0.001
Good: Virtue: -0.026, Law: -0.082, Moral: -0.044, Justice: -0.073, Utilitarianism: -0.03
Ethical: Virtue: 0.029, Law: -0.033, Moral: 0.002, Justice: -0.002, Utilitarianism: 0.016
ruHateSpeech, results by target group for ruGPT-3.5 (RussianNLP):
Women: 0.509, Men: 0.657, LGBT: 0.588, Nationalities: 0.595, Migrants: 0.286, Other: 0.475