ruGPT-3.5 13B

Team MERA · created at 12.01.2024 11:18

Overall result: 0.208
Note: the submission does not contain all the required tasks.

Ratings for leaderboard tasks


Task name      Result           Metric
LCS            0.132            Accuracy
RCB            0.331 / 0.194    Accuracy / F1 macro
USE            0.025            Grade norm
RWSD           0.523            Accuracy
PARus          0.504            Accuracy
ruTiE          0.488            Accuracy
MultiQ         0.115 / 0.036    F1 / Exact match
CheGeKa        0.037 / 0        F1 / Exact match
ruModAr        0.001            Exact match
ruMultiAr      0.025            Exact match
MathLogicQA    0.258            Accuracy
ruWorldTree    0.246 / 0.22     Accuracy / F1 macro
ruOpenBookQA   0.223 / 0.208    Accuracy / F1 macro

Evaluation on open tasks:

See the ratings by subcategory below.


Task name      Result                  Metric
BPS            0.492                   Accuracy
ruMMLU         0.246                   Accuracy
SimpleAr       0.029                   Exact match
ruHumanEval    0.001 / 0.003 / 0.006   Pass@k (k = 1, 5, 10)
ruHHH          0.472                   Accuracy
ruHateSpeech   0.543                   Accuracy
ruDetox        0.286                   Joint score (J)

ruEthics (5 MCC):

                 Correct   Good     Ethical
Virtue           -0.036    0.045    0.034
Law              -0.023    0.035    -0.021
Moral            -0.025    0.034    0.029
Justice          -0.017    0.045    0.049
Utilitarianism   -0.016    0.04     0.067

Information about the submission:

MERA version: -
Torch version: -
Codebase version: -
CUDA version: -
Model weights precision: -
Seed: -
Batch size: -
Transformers version: -
Number and type of GPUs: -
Architecture: -

Team:

MERA

Name of the ML model:

ruGPT-3.5 13B

Additional links:

https://habr.com/ru/companies/sberbank/articles/746736/

Architecture description:

ruGPT-3.5 is the Russian counterpart of GPT-3 (Brown et al., 2020). The model has 13B parameters, the largest in the ruGPT family so far, and it was used to train the first version of GigaChat.

Description of the training:

The model was trained with the DeepSpeed and Megatron libraries on a 300B-token dataset for 3 epochs, which took around 45 days on 512 V100 GPUs. It was then fine-tuned for 1 epoch with a sequence length of 2048 on additional data (see the pretraining data below), which took around 20 days on 200 A100 GPUs.
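For illustration only, a minimal DeepSpeed training setup of this shape is sketched below. The checkpoint id, batch sizes, ZeRO stage, and learning rate are placeholder assumptions, not the actual ruGPT-3.5 configuration, and the Megatron tensor parallelism used in the real run is not reproduced here.

    import deepspeed
    from transformers import AutoModelForCausalLM

    # Placeholder checkpoint id; the real run trained a 13B model from scratch.
    model = AutoModelForCausalLM.from_pretrained("ai-forever/rugpt3large_based_on_gpt2")

    ds_config = {
        # Illustrative values only.
        "train_micro_batch_size_per_gpu": 4,
        "gradient_accumulation_steps": 8,
        "fp16": {"enabled": True},
        "zero_optimization": {"stage": 1},
        "optimizer": {"type": "AdamW", "params": {"lr": 1e-4}},
    }

    # deepspeed.initialize wraps the model in an engine that handles data
    # parallelism, mixed precision, and optimizer state partitioning.
    engine, optimizer, _, _ = deepspeed.initialize(
        model=model,
        model_parameters=model.parameters(),
        config=ds_config,
    )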

Pretrain data:

The model was pretrained on 300 GB of text from various domains and then additionally trained on 100 GB of code and legal documents. The training data was deduplicated: each text in the corpus was hashed with a 64-bit hash, and only texts with a unique hash were kept. Documents were also filtered by their compression ratio under zlib; the most strongly and the most weakly compressing deduplicated texts were discarded.
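A minimal sketch of such a cleaning pipeline is shown below; the choice of hash function (truncated MD5) and the compression-ratio thresholds are assumptions for illustration, not the values used for ruGPT-3.5.

    import hashlib
    import zlib

    def text_hash64(text: str) -> int:
        # 64-bit hash: first 8 bytes of MD5 (illustrative choice).
        return int.from_bytes(hashlib.md5(text.encode("utf-8")).digest()[:8], "big")

    def compression_ratio(text: str) -> float:
        raw = text.encode("utf-8")
        return len(zlib.compress(raw)) / max(len(raw), 1)

    def dedup_and_filter(corpus, low=0.3, high=0.9):
        # Thresholds are hypothetical: texts that compress too well (repetitive)
        # or too poorly (noise) are discarded; duplicates are dropped by hash.
        seen = set()
        for text in corpus:
            h = text_hash64(text)
            if h in seen:
                continue
            seen.add(h)
            if low <= compression_ratio(text) <= high:
                yield text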

Training details:

After the final training stage, the model's perplexity on Russian text was around 8.8.
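For reference, perplexity is the exponentiated average per-token negative log-likelihood,

$$\mathrm{PPL} = \exp\left(-\frac{1}{N}\sum_{i=1}^{N}\log p_\theta(x_i \mid x_{<i})\right),$$

so a perplexity of about 8.8 corresponds to an average cross-entropy of ln 8.8 ≈ 2.17 nats per token.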

License:

MIT

Strategy, generation and parameters:

Code version v1.1.0. All parameters were left unchanged and used as prepared by the organizers. Details:
- 1 x NVIDIA A100
- dtype: auto
- PyTorch 2.1.2 + CUDA 12.1
- Transformers 4.36.2
- Context length: 2048
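A minimal way to load the model under roughly these settings is sketched below; the Hugging Face id "ai-forever/ruGPT-3.5-13B" and the prompt are assumptions on my part, not part of the submission (device_map="auto" additionally requires the accelerate package).

    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = "ai-forever/ruGPT-3.5-13B"  # assumed public checkpoint id
    tok = AutoTokenizer.from_pretrained(name)
    # torch_dtype="auto" keeps the precision stored in the checkpoint,
    # matching the "dtype: auto" setting above.
    model = AutoModelForCausalLM.from_pretrained(
        name, torch_dtype="auto", device_map="auto"
    )

    prompt = "Вопрос: какой город является столицей России? Ответ:"
    inputs = tok(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=32)
    print(tok.decode(out[0], skip_special_tokens=True))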


Ratings by subcategory

All subcategory scores below are for ruGPT-3.5 13B (team MERA).

ruHHH (Metric: Accuracy)

Honest     0.475
Helpful    0.475
Harmless   0.466
ruMMLU (Metric: Accuracy)

Subject                          Accuracy
Anatomy                          0.2
Virology                         0.25
Astronomy                        0.3
Marketing                        0.343
Nutrition                        0.238
Sociology                        0
Management                       0.267
Philosophy                       0.294
Prehistory                       0.2
Human aging                      0.4
Econometrics                     0.455
Formal logic                     0.3
Global facts                     0.1
Jurisprudence                    0.269
Miscellaneous                    0.136
Moral disputes                   0.1
Business ethics                  0.1
Biology (college)                0.259
Physics (college)                0.4
Human sexuality                  0.2
Moral scenarios                  0.2
World religions                  0.269
Abstract algebra                 0.1
Medicine (college)               0.333
Machine learning                 0.2
Medical genetics                 0.455
Professional law                 0.313
PR                               0.214
Security studies                 0.1
Chemistry (college)              0.182
Computer security                0.2
International law                0.333
Logical fallacies                0.1
Politics                         0.3
Clinical knowledge               0.182
Conceptual physics               0.2
Math (college)                   0.4
Biology (high school)            0.381
Physics (high school)            0.3
Chemistry (high school)          0.2
Geography (high school)          0.203
Professional medicine            0.4
Electrical engineering           0.3
Elementary mathematics           0.2
Psychology (high school)         0.25
Statistics (high school)         0.2
History (high school)            0.4
Math (high school)               0
Professional accounting          0.1
Professional psychology          0.1
Computer science (college)       0.318
World history (high school)      0.313
Macroeconomics                   0.176
Microeconomics                   0.333
Computer science (high school)   0.208
European history                 0.212
Government and politics          0.148
ruDetox components

SIM (content similarity)        0.562
FL (fluency)                    0.704
STA (style transfer accuracy)   0.678
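For context, the overall ruDetox score above (0.286) is the joint metric J, which in the standard ruDetox formulation averages the per-sample product of the three components:

$$J = \frac{1}{n}\sum_{i=1}^{n} \mathrm{STA}_i \cdot \mathrm{SIM}_i \cdot \mathrm{FL}_i$$

The product of the averages, 0.678 · 0.562 · 0.704 ≈ 0.268, differs slightly from 0.286 because J multiplies per sample before averaging.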
ruEthics (Metric: 5 MCC)

                 Virtue   Law      Moral    Justice   Utilitarianism
Correct          -0.036   -0.023   -0.025   -0.017    -0.016
Good             0.045    0.035    0.034    0.045     0.04
Ethical          0.034    -0.021   0.029    0.049     0.067
ruHateSpeech (Metric: Accuracy)

Target group    Result
Women           0.537
Men             0.657
LGBT            0.647
Nationalities   0.514
Migrants        0.286
Other           0.508