SOLAR 10.7B Instruct

Created at 03.02.2024 13:43

Overall score: 0.469


Task name Result Metric
BPS 0.359 Accuracy
LCS 0.078 Accuracy
RCB 0.523 / 0.503 Avg. F1 / Accuracy
USE 0.04 Grade Norm
RWSD 0.654 Accuracy
PARus 0.828 Accuracy
ruTiE 0.7 Accuracy
MultiQ 0.205 / 0.097 F1-score / EM
ruMMLU 0.698 Accuracy
CheGeKa 0.206 / 0.139 F1 / EM
ruModAr 0.459 EM
SimpleAr 0.946 EM
ruMultiAr 0.2 EM
MathLogicQA 0.396 Accuracy
ruHumanEval 0.013 / 0.067 / 0.134 pass@k (see the note after this table)
ruWorldTree 0.884 / 0.884 Avg. F1 / Accuracy
ruOpenBookQA 0.825 / 0.824 Avg. F1 / Accuracy
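
Note on pass@k: assuming ruHumanEval uses the standard unbiased estimator from the HumanEval paper, pass@k is computed from n generated samples of which c pass the unit tests. A minimal sketch (not necessarily the exact MERA implementation):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples,
    drawn from n generations (c of them correct), passes the tests."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Illustrative call only; n, c and k are not taken from the submission.
print(pass_at_k(n=10, c=1, k=5))
```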

Evaluation on diagnostic datasets:

These results are not included in the overall rating.


Task name Result Metric
ruHHH 0.702 Accuracy
  • Honest: 0.623
  • Harmless: 0.759
  • Helpful: 0.729
ruHateSpeech 0.747 Accuracy
  • Women: 0.769
  • Men: 0.657
  • LGBT: 0.647
  • Nationality: 0.784
  • Migrants: 0.429
  • Other: 0.803
ruDetox
  • Overall average score (J): 0.041
  • Meaning preservation (SIM): 0.339
  • Naturalness (FL): 0.572
  • Style transfer accuracy (STA): 0.183
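
For reference, in the RUSSE Detox setup that ruDetox follows, the overall score J is usually defined as the per-sample product of the three components, averaged over the test set (a sketch of the standard definition; the exact MERA implementation may differ):

$$
J = \frac{1}{N}\sum_{i=1}^{N} \mathrm{STA}_i \cdot \mathrm{SIM}_i \cdot \mathrm{FL}_i
$$

Because each sample contributes a product of values below 1, J (0.041) ends up far below every individual component.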

ruEthics (metric: 5 MCC, a Matthews correlation coefficient for each criterion)

Criterion Correct Good Ethical
Virtue -0.349 -0.391 -0.479
Law -0.374 -0.327 -0.451
Moral -0.374 -0.385 -0.484
Justice -0.343 -0.339 -0.465
Utilitarianism -0.297 -0.320 -0.384
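
Assuming each cell above is the Matthews correlation coefficient (MCC) between the model's binary answers and the binary annotations for the corresponding criterion, a minimal illustration with scikit-learn (the values below are invented for the example, not taken from the benchmark):

```python
from sklearn.metrics import matthews_corrcoef

# Hypothetical binary labels: 1 = "acceptable", 0 = "not acceptable".
criterion_labels = [0, 1, 1, 0, 0, 1, 1, 0]   # e.g. annotations for "virtue"
model_answers    = [1, 0, 1, 1, 0, 0, 1, 0]   # model's yes/no verdicts

# MCC ranges from -1 (complete anti-correlation) to +1 (perfect agreement);
# the negative values in the table indicate anti-correlation with annotators.
print(matthews_corrcoef(criterion_labels, model_answers))
```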

Information about the submission:

Team:

Russian_NLP

Name of the ML model:

SOLAR 10.7B Instruct

Additional links:

https://arxiv.org/abs/2312.15166
https://huggingface.co/upstage/SOLAR-10.7B-v1.0

Architecture description:

SOLAR 10.7B Instruct is the instruction-tuned version of SOLAR-10.7B, a large language model (LLM) with 10.7 billion parameters that performs strongly across a range of natural language processing (NLP) tasks. According to the linked paper, the model is built via depth up-scaling (DUS): a Llama-2-style architecture scaled in depth, initialized from Mistral 7B weights, and then continually pretrained.

Description of the training:

The model was instruction-tuned using state-of-the-art methods, including supervised fine-tuning (SFT) and direct preference optimization (DPO).
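
For reference, DPO trains the policy directly on preference pairs with the objective below (standard form from the DPO paper; the submission does not state hyperparameters such as β). Here $y_w$ and $y_l$ are the chosen and rejected responses, $\pi_{\mathrm{ref}}$ is the frozen reference model, and $\sigma$ is the sigmoid:

$$
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) =
-\,\mathbb{E}_{(x,\, y_w,\, y_l)\sim \mathcal{D}}
\left[\log \sigma\!\left(
\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
- \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
\right)\right]
$$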

Pretrain data:

The base model, SOLAR 10.7B, is a large language model (LLM) with 10.7 billion parameters that performs strongly across a range of natural language processing (NLP) tasks; see the linked paper for details of its pretraining.

Training Details:

The following datasets were used:
  • c-s-ale/alpaca-gpt4-data (SFT)
  • Open-Orca/OpenOrca (SFT)
  • in-house data generated using MetaMath (SFT, DPO)
  • Intel/orca_dpo_pairs (DPO)
  • allenai/ultrafeedback_binarized_cleaned (DPO)

License:

cc-by-nc-4.0

Strategy, generation and parameters:

Code version v.1.1.0. All parameters were left unchanged and used as prepared by the organizers.
Details:
  • 1 x NVIDIA A100
  • dtype: auto
  • PyTorch 2.1.2 + CUDA 12.1
  • Transformers 4.36.2
  • Context length: 4096
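
For reproducibility, a minimal sketch of loading the model with the stack listed above (Transformers, dtype auto). The checkpoint name upstage/SOLAR-10.7B-Instruct-v1.0 is an assumption based on the linked model card, and the prompt is purely illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed instruct checkpoint; the submission links the base model
# at https://huggingface.co/upstage/SOLAR-10.7B-v1.0.
model_id = "upstage/SOLAR-10.7B-Instruct-v1.0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # corresponds to the "dtype auto" setting above
    device_map="auto",    # place the weights on the available A100
)

prompt = "### User:\nHello, how are you?\n\n### Assistant:\n"  # illustrative
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```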