Llama 2 7B

Created at 12.01.2024 11:15

Overall score: 0.327


Task name Result Metric
BPS 0.426 Accuracy
LCS 0.106 Accuracy
RCB 0.349 / 0.272 Avg. F1 / Accuracy
USE 0.014 Grade Norm
RWSD 0.504 Accuracy
PARus 0.532 Accuracy
ruTiE 0.5 Accuracy
MultiQ 0.081 / 0.011 F1 / EM
ruMMLU 0.452 Accuracy
CheGeKa 0.021 / 0 F1 / EM
ruModAr 0.367 EM
SimpleAr 0.839 EM
ruMultiAr 0.124 EM
MathLogicQA 0.277 Accuracy
ruHumanEval 0.007 / 0.034 / 0.067 pass@k (see the note after this table)
ruWorldTree 0.545 / 0.543 Avg. F1 / Accuracy
ruOpenBookQA 0.475 / 0.471 Avg. F1 / Accuracy
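
Note on pass@k: the three ruHumanEval numbers are pass@k scores for increasing k. Below is a minimal sketch of the standard unbiased pass@k estimator from the HumanEval paper (Chen et al., 2021); that ruHumanEval uses exactly this estimator, and which k values the three numbers correspond to, are assumptions here.

    # Unbiased pass@k estimator (Chen et al., 2021); whether ruHumanEval
    # uses exactly this estimator is an assumption.
    from math import comb

    def pass_at_k(n: int, c: int, k: int) -> float:
        # Probability that at least one of k drawn samples passes, given
        # n generated samples per problem of which c pass the unit tests.
        if n - c < k:
            return 1.0
        return 1.0 - comb(n - c, k) / comb(n, k)

    # Hypothetical example: 10 samples per problem, 1 of them correct.
    print(pass_at_k(n=10, c=1, k=1))  # 0.1
    print(pass_at_k(n=10, c=1, k=5))  # 0.5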

Evaluation on diagnostic datasets:

Diagnostic results are not included in the overall score.


Task name Result Metric

ruHHH 0.5 Accuracy
  • Honest: 0.475
  • Harmless: 0.5
  • Helpful: 0.525

ruHateSpeech 0.536 Accuracy
  • Women: 0.593
  • Men: 0.514
  • LGBT: 0.588
  • Nationality: 0.486
  • Migrants: 0.429
  • Other: 0.475

ruDetox
  • Overall average score (J): 0.261
  • Meaning preservation (SIM): 0.588
  • Naturalness (FL): 0.582
  • Style Transfer Accuracy (STA): 0.611
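
The joint score J for ruDetox is conventionally the per-sample product of style transfer accuracy, meaning preservation, and fluency, averaged over the dataset, which is why it generally differs from the product of the three averaged scores. A minimal sketch, assuming ruDetox follows this original Detox shared-task definition:

    # Joint detoxification score J: mean over samples of STA_i * SIM_i * FL_i.
    # Assumes ruDetox follows the original Detox shared-task definition.
    def joint_score(sta, sim, fl):
        assert len(sta) == len(sim) == len(fl)
        return sum(s * m * f for s, m, f in zip(sta, sim, fl)) / len(sta)

    # Hypothetical per-sample scores for three detoxified outputs.
    sta = [0.9, 0.3, 0.6]
    sim = [0.8, 0.5, 0.7]
    fl  = [0.7, 0.6, 0.5]
    print(round(joint_score(sta, sim, fl), 3))  # 0.268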

ruEthics
Criterion Correct Good Ethical
Virtue -0.115 -0.043 -0.114
Law -0.124 -0.019 -0.112
Moral -0.11 -0.037 -0.124
Justice -0.129 -0.058 -0.122
Utilitarianism -0.097 -0.05 -0.092


Metric: 5 MCC
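
Each ruEthics value above is a Matthews correlation coefficient (MCC) between the model's binary answers and the annotations for one ethical criterion. A minimal sketch of computing a single coefficient, with hypothetical labels (the exact answer extraction in the benchmark may differ):

    # One MCC value: correlation between hypothetical gold annotations for a
    # single criterion (e.g. "Virtue") and the model's binary answers.
    from sklearn.metrics import matthews_corrcoef

    gold_virtue   = [1, 0, 1, 1, 0, 0, 1, 0]  # hypothetical annotations
    model_answers = [1, 0, 0, 1, 1, 0, 0, 0]  # hypothetical model outputs
    print(matthews_corrcoef(gold_virtue, model_answers))  # value in [-1, 1]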

Information about the submission:

Team:

MERA

Name of the ML model:

Llama 2 7B

Additional links:

https://arxiv.org/abs/2307.09288

Architecture description:

Llama 2 is an auto-regressive language model that uses an optimized transformer architecture.
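
As an illustration of what "auto-regressive" means in practice (not code from the paper), the sketch below runs a greedy next-token decoding loop with the public Hugging Face checkpoint; the checkpoint name and prompt are assumptions.

    # Greedy auto-regressive decoding: the model predicts one token at a time,
    # conditioned on everything generated so far. Illustration only.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "meta-llama/Llama-2-7b-hf"  # assumed (gated) checkpoint name
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

    ids = tokenizer("The capital of France is", return_tensors="pt").input_ids
    with torch.no_grad():
        for _ in range(10):                        # generate 10 tokens
            logits = model(ids).logits             # [batch, seq, vocab]
            next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
            ids = torch.cat([ids, next_id], dim=-1)
    print(tokenizer.decode(ids[0], skip_special_tokens=True))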

Description of the training:

Authors used custom training libraries, Meta's Research Super Cluster, and production clusters for pretraining. Fine-tuning, annotation, and evaluation were also performed on third-party cloud compute.

Pretrain data:

Llama 2 was pretrained on 2 trillion tokens of data from publicly available sources.

Training Details:

Token counts refer to pretraining data only. All models are trained with a global batch-size of 4M tokens.
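
Taken together with the 2 trillion pretraining tokens above, a 4M-token global batch works out to roughly 2×10¹² / 4×10⁶ ≈ 500,000 optimizer steps, assuming the batch size was held constant throughout pretraining.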

License:

A custom commercial license is available at: https://ai.meta.com/resources/models-and-libraries/llama-downloads/

Strategy, generation and parameters:

Code version: v1.1.0. All parameters were left unchanged and used as provided by the organizers.

Details:
  • 1 x NVIDIA A100
  • dtype: auto
  • PyTorch 2.1.2 + CUDA 12.1
  • Transformers 4.36.2
  • Context length: 4096
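
A minimal sketch of loading the model with the listed run parameters (dtype auto, 4096-token context, a single A100); the MERA harness drives the actual evaluation, so this standalone loading code is an assumption for illustration only.

    # Loading Llama 2 7B with the listed run parameters; illustration only,
    # the evaluation itself is driven by the organizers' harness.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "meta-llama/Llama-2-7b-hf"           # assumed checkpoint name
    tokenizer = AutoTokenizer.from_pretrained(model_id, model_max_length=4096)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype="auto",                         # "dtype auto" from the details
        device_map={"": 0},                         # single GPU (1 x NVIDIA A100)
    )
    print(model.config.max_position_embeddings)     # 4096 context length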