GigaChat-Pro

Created at 04.07.2024 10:42

Overall score: 0.537
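The overall score of 0.537 can be reproduced from the per-task results in the table that follows. This page does not describe the aggregation, so the sketch below rests on an assumption: each task's metrics are first averaged into a single task score, and the overall score is the unweighted mean of the 17 task scores, with the diagnostic datasets excluded. Under that assumption the result rounds to 0.537. The names `TASK_RESULTS`, `task_score` and `overall_score` are illustrative and do not come from the MERA codebase.

```python
# Per-task results copied from the results table on this page.
# Tasks reported with several metrics list all of them.
# This is a reconstruction under stated assumptions, not the official MERA scorer.
TASK_RESULTS: dict[str, list[float]] = {
    "BPS": [0.318],
    "LCS": [0.09],
    "RCB": [0.53, 0.449],
    "USE": [0.338],
    "RWSD": [0.585],
    "PARus": [0.884],
    "ruTiE": [0.791],
    "MultiQ": [0.369, 0.247],
    "ruMMLU": [0.816],
    "CheGeKa": [0.104, 0.0],
    "ruModAr": [0.866],
    "SimpleAr": [0.971],
    "ruMultiAr": [0.273],
    "MathLogicQA": [0.467],
    "ruHumanEval": [0.013, 0.064, 0.128],
    "ruWorldTree": [0.939, 0.939],
    "ruOpenBookQA": [0.873, 0.872],
}


def task_score(metrics: list[float]) -> float:
    """Collapse one task's metrics into a single number by plain averaging (assumption)."""
    return sum(metrics) / len(metrics)


def overall_score(results: dict[str, list[float]]) -> float:
    """Unweighted mean of the per-task scores (assumption); diagnostic tasks are not included."""
    scores = [task_score(metrics) for metrics in results.values()]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    print(f"{overall_score(TASK_RESULTS):.3f}")
```

The exact match with the published figure is the main evidence for this aggregation rule; weighting tasks differently would generally give a different number.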


Task name Result Metric
BPS 0.318 Accuracy
LCS 0.09 Accuracy
RCB 0.53 / 0.449 Avg. F1 / Accuracy
USE 0.338 Grade Norm
RWSD 0.585 Accuracy
PARus 0.884 Accuracy
ruTiE 0.791 Accuracy
MultiQ 0.369 / 0.247 F1 / EM
ruMMLU 0.816 Accuracy
CheGeKa 0.104 / 0 F1 / EM
ruModAr 0.866 EM
SimpleAr 0.971 EM
ruMultiAr 0.273 EM
MathLogicQA 0.467 Accuracy
ruHumanEval 0.013 / 0.064 / 0.128 pass@k
ruWorldTree 0.939 / 0.939 Avg. F1 / Accuracy
ruOpenBookQA 0.873 / 0.872 Avg. F1 / Accuracy
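ruHumanEval reports three pass@k values. The page states neither the k values (presumably k = 1, 5, 10) nor how many completions were sampled per problem. A minimal sketch, assuming the standard unbiased estimator from the HumanEval paper, pass@k = 1 - C(n - c, k) / C(n, k) averaged over problems, where n is the number of samples generated for a problem and c is the number that pass its unit tests. The helper names and the counts in the example are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem.

    n: samples generated, c: samples that pass the tests, k: sample budget.
    Computes 1 - C(n - c, k) / C(n, k), the probability that a random
    size-k subset of the n samples contains at least one passing sample.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Too few failing samples to fill a size-k subset: a pass is guaranteed.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(per_problem: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems given as (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in per_problem) / len(per_problem)


if __name__ == "__main__":
    # Hypothetical counts: three problems, 10 samples each.
    problems = [(10, 0), (10, 1), (10, 3)]
    for k in (1, 5, 10):
        print(k, round(mean_pass_at_k(problems, k), 3))
```

Because pass@k cannot decrease as k grows, the three ruHumanEval numbers rise from left to right.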

Evaluation on diagnostic datasets:

These results are not included in the overall score.


Task name Result Metric
ruHHH 0.764 Accuracy
  • Honest: 0.689
  • Harmless: 0.828
  • Helpful: 0.78

ruHateSpeech 0.751 Accuracy
  • Women: 0.759
  • Men: 0.8
  • LGBT: 0.647
  • Nationality: 0.649
  • Migrants: 0.429
  • Other: 0.836

ruDetox
  • Overall average score (J): 0.238
  • Assessment of the preservation of meaning (SIM): 0.59
  • Assessment of naturalness (FL): 0.76
  • Style Transfer Accuracy (STA): 0.459
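The ruDetox J score (0.238) is not the product of the three averages (0.59 × 0.76 × 0.459 ≈ 0.206). This fits J being computed per sample and then averaged, as in the RUSSE-2022 detoxification evaluation. The sketch below assumes that definition; the function names and per-sample scores are hypothetical.

```python
def joint_score(sta: list[float], sim: list[float], fl: list[float]) -> float:
    """Assumed definition of J: mean over samples of STA * SIM * FL."""
    if not (len(sta) == len(sim) == len(fl)) or not sta:
        raise ValueError("need three non-empty lists of equal length")
    return sum(a * s * f for a, s, f in zip(sta, sim, fl)) / len(sta)


def mean(xs: list[float]) -> float:
    """Plain arithmetic mean, used for the per-metric averages."""
    return sum(xs) / len(xs)


if __name__ == "__main__":
    # Hypothetical per-sample scores for three detoxified rewrites.
    sta = [1.0, 0.0, 1.0]
    sim = [0.8, 0.9, 0.6]
    fl = [1.0, 1.0, 0.5]
    print("J:", round(joint_score(sta, sim, fl), 3))
    print("product of means:", round(mean(sta) * mean(sim) * mean(fl), 3))
```

Under this definition a sample that fails style transfer (STA = 0) adds nothing to J, however well it preserves meaning. That is why J can differ noticeably from the product of the averaged metrics.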

ruEthics
Criterion Correct Good Ethical
Virtue -0.493 -0.449 -0.394
Law -0.493 -0.423 -0.392
Moral -0.492 -0.464 -0.399
Justice -0.447 -0.4 -0.345
Utilitarianism -0.422 -0.374 -0.322

Table results (rows: Correct, Good, Ethical; columns: Virtue, Law, Moral, Justice, Utilitarianism):

[[-0.493, -0.493, -0.492, -0.447, -0.422],
 [-0.449, -0.423, -0.464, -0.4, -0.374],
 [-0.394, -0.392, -0.399, -0.345, -0.322]]

Metric: MCC
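The ruEthics values are Matthews correlation coefficients (MCC). MCC ranges from -1 to 1, where 0 means no correlation, so the uniformly negative values indicate answers that tend to run against the annotations. The sketch below assumes each cell compares the model's binary (yes/no) answers to one question formulation (Correct, Good, Ethical) with the binary labels for one ethical criterion. That pairing is our reading of the table, not something the page states, and the example labels are hypothetical.

```python
from math import sqrt


def mcc(y_true: list[int], y_pred: list[int]) -> float:
    """Matthews correlation coefficient for binary labels encoded as 0/1."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention when a row or column of the confusion matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical annotations for one criterion and model answers for one formulation.
    labels = [1, 1, 0, 0, 1, 0]
    answers = [0, 1, 1, 0, 0, 1]
    print(round(mcc(labels, answers), 3))
```

Unlike accuracy, MCC is not inflated when the model answers the same way for every question: a constant predictor scores 0.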

Information about the submission:

Team:

GIGACHAT

Name of the ML model:

GigaChat-Pro

Additional links:

https://developers.sber.ru/docs/ru/gigachat/api/overview

Architecture description:

GigaChat Pro (version 1.0.26.8) is a large language model (LLM) with 30B parameters, fine-tuned on an instruction corpus, with a context length of 8192 tokens. This version has been available to users via the API since 13.07.

Description of the training:

-

Pretrain data:

-

Training Details:

-

License:

Proprietary model by Sber

Strategy, generation and parameters:

Code version v.1.1.0. No parameters were changed; everything was run as prepared by the organizers. Details:
  • 2 x NVIDIA A100 + accelerate
  • dtype float16
  • PyTorch 2.3.1 + CUDA 12.1
  • Transformers 4.42.3
  • Context length 8192