MTS AI Chat Medium

Created on 22.03.2024 11:44

Overall score: 0.536

Task name     Result                  Metric
BPS           0.23                    Accuracy
LCS           0.178                   Accuracy
RCB           0.598 / 0.603           Avg. F1 / Accuracy
USE           0.266                   Grade Norm
RWSD          0.665                   Accuracy
PARus         0.884                   Accuracy
ruTiE         0.674                   Accuracy
MultiQ        0.247 / 0.171           F1 / EM
ruMMLU        0.704                   Accuracy
CheGeKa       0.05 / 0.022            F1 / EM
ruModAr       0.949                   Accuracy
SimpleAr      0.986                   Accuracy
ruMultiAr     0.337                   Accuracy
MathLogicQA   0.589                   Accuracy
ruHumanEval   0.023 / 0.113 / 0.226   pass@k
ruWorldTree   0.872 / 0.872           Avg. F1 / Accuracy
ruOpenBookQA  0.813 / 0.813           Avg. F1 / Accuracy
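
The page does not state how the overall figure of 0.536 is derived from the table. One aggregation that reproduces it is to average the metrics within each task first and then average across the 17 tasks. The sketch below assumes that scheme; the names `TASK_SCORES` and `overall_score` are illustrative and not part of any official tooling.

```python
from statistics import mean

# Scores copied from the table above. Tasks reported with several
# metrics (e.g. F1 / EM) keep all of their values.
TASK_SCORES = {
    "BPS": [0.23],
    "LCS": [0.178],
    "RCB": [0.598, 0.603],
    "USE": [0.266],
    "RWSD": [0.665],
    "PARus": [0.884],
    "ruTiE": [0.674],
    "MultiQ": [0.247, 0.171],
    "ruMMLU": [0.704],
    "CheGeKa": [0.05, 0.022],
    "ruModAr": [0.949],
    "SimpleAr": [0.986],
    "ruMultiAr": [0.337],
    "MathLogicQA": [0.589],
    "ruHumanEval": [0.023, 0.113, 0.226],
    "ruWorldTree": [0.872, 0.872],
    "ruOpenBookQA": [0.813, 0.813],
}


def overall_score(task_scores):
    """Assumed scheme: mean of per-task means (each task weighs the same)."""
    return mean(mean(values) for values in task_scores.values())


print(round(overall_score(TASK_SCORES), 3))  # 0.536
```

Averaging within a task first keeps tasks with several metrics from counting more than tasks with one.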

Evaluation on diagnostic datasets:

These results are not included in the overall score.

Task name     Result                  Metric
ruHHH         0.781                   Accuracy
  • Honest: 0.787
  • Harmless: 0.828
  • Helpful: 0.729
ruHateSpeech  0.736                   Accuracy
  • Women: 0.722
  • Men: 0.771
  • LGBT: 0.647
  • Nationality: 0.676
  • Migrants: 0.571
  • Other: 0.82
ruDetox
  • Overall average score (J): 0.138
  • Meaning preservation (SIM): 0.717
  • Naturalness (FL): 0.562
  • Style Transfer Accuracy (STA): 0.332
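
Pairing the four ruDetox values with the four metric labels in order gives J = 0.138, SIM = 0.717, FL = 0.562 and STA = 0.332. Multiplying the three component averages gives about 0.134 rather than 0.138. That gap fits the common definition of the joint score, where STA, SIM and FL are multiplied per sample and the products are then averaged. The page does not confirm this definition, so treat the sketch below as an assumption. The per-sample values are hypothetical and only show why the two numbers can differ.

```python
from statistics import mean


def joint_score(sta, sim, fl):
    """Assumed definition of J: mean over samples of STA * SIM * FL."""
    return mean(a * s * f for a, s, f in zip(sta, sim, fl))


# Aggregates reported on this page.
STA, SIM, FL = 0.332, 0.717, 0.562
product_of_means = STA * SIM * FL  # about 0.134, not the reported J of 0.138

# Hypothetical per-sample scores (NOT from the submission).
sample_sta = [1.0, 0.0, 0.0]
sample_sim = [0.8, 0.7, 0.6]
sample_fl = [0.6, 0.5, 0.6]

per_sample_j = joint_score(sample_sta, sample_sim, sample_fl)
naive_j = mean(sample_sta) * mean(sample_sim) * mean(sample_fl)

print(round(product_of_means, 3), round(per_sample_j, 3), round(naive_j, 3))
```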

ruEthics
                 Correct   Good     Ethical
Virtue           -0.368    -0.394   -0.442
Law              -0.405    -0.385   -0.451
Moral            -0.403    -0.406   -0.47
Justice          -0.309    -0.354   -0.402
Utilitarianism   -0.335    -0.323   -0.401

Table results:

[[-0.368, -0.405, -0.403, -0.309, -0.335],
 [-0.394, -0.385, -0.406, -0.354, -0.323],
 [-0.442, -0.451, -0.47, -0.402, -0.401]]

5 MCC
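
The raw matrix under "Table results" is the transpose of the ruEthics table: each of its three rows corresponds to one column of the table, and each of its five entries to one table row, in the same order. A minimal sketch of that mapping follows. The column labels are taken as Correct, Good and Ethical, and the helper `label_matrix` is a hypothetical name.

```python
# Row labels of the raw matrix (the table's columns) and
# column labels of the raw matrix (the table's rows).
QUESTIONS = ["Correct", "Good", "Ethical"]
CRITERIA = ["Virtue", "Law", "Moral", "Justice", "Utilitarianism"]

MATRIX = [
    [-0.368, -0.405, -0.403, -0.309, -0.335],
    [-0.394, -0.385, -0.406, -0.354, -0.323],
    [-0.442, -0.451, -0.47, -0.402, -0.401],
]


def label_matrix(matrix, row_names, col_names):
    """Turn a nested list into a {(row, col): value} lookup."""
    return {
        (row, col): matrix[i][j]
        for i, row in enumerate(row_names)
        for j, col in enumerate(col_names)
    }


scores = label_matrix(MATRIX, QUESTIONS, CRITERIA)
print(scores[("Good", "Justice")])  # -0.354
```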

Information about the submission:

Team:

MTS AI

Name of the ML model:

MTS AI Chat Medium

Additional links:

-

Architecture description:

This model uses a specific architecture. Stay tuned for the paper.

Description of the training:

This model was trained with SFT (supervised fine-tuning) only.

Pretrain data:

-

Training Details:

Stay tuned for the paper

License:

Proprietary model developed by MTS AI

Strategy, generation and parameters:

Code version v.1.1.0. No parameters were changed. Inference details: torch 2.0.0 + CUDA 11.7.

Comments about inference:

We ran the model using the MERA GitHub repo without any changes, using the HF inference script.