Llama 2 13B

Created on 12.01.2024 at 11:16

Overall score: 0.368

Task name Result Metric
BPS 0.507 Accuracy
LCS 0.09 Accuracy
RCB 0.329 / 0.258 Avg. F1 / Accuracy
USE 0.01 Grade Norm
RWSD 0.5 Accuracy
PARus 0.478 Accuracy
ruTiE 0.493 Accuracy
MultiQ 0.098 / 0.014 F1-score/EM
ruMMLU 0.563 Accuracy
CheGeKa 0.043 / 0 F1 / EM
ruModAr 0.486 Accuracy
SimpleAr 0.911 Accuracy
ruMultiAr 0.156 Accuracy
MathLogicQA 0.314 Accuracy
ruHumanEval 0.008 / 0.04 / 0.079 pass@k
ruWorldTree 0.703 / 0.703 Avg. F1 / Accuracy
ruOpenBookQA 0.638 / 0.637 Avg. F1 / Accuracy
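
The overall score can be reproduced from the table above. The sketch below assumes the aggregation is a plain mean over tasks, with each multi-metric task first averaged over its own metrics. This assumption is not stated on the page, but it does give back the reported 0.368. The names `results` and `overall_score` are illustrative only.

```python
from statistics import mean

# Per-task results copied from the table above.
# Tasks with several metrics list every reported value.
results = {
    "BPS": [0.507],
    "LCS": [0.09],
    "RCB": [0.329, 0.258],
    "USE": [0.01],
    "RWSD": [0.5],
    "PARus": [0.478],
    "ruTiE": [0.493],
    "MultiQ": [0.098, 0.014],
    "ruMMLU": [0.563],
    "CheGeKa": [0.043, 0.0],
    "ruModAr": [0.486],
    "SimpleAr": [0.911],
    "ruMultiAr": [0.156],
    "MathLogicQA": [0.314],
    "ruHumanEval": [0.008, 0.04, 0.079],
    "ruWorldTree": [0.703],
    "ruOpenBookQA": [0.638, 0.637],
}


def overall_score(task_results):
    """Average each task's metrics, then average across tasks (assumed aggregation)."""
    return mean(mean(values) for values in task_results.values())


print(round(overall_score(results), 3))  # 0.368
```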

Evaluation on diagnostic datasets:

These results are not included in the overall score.

Task name Result Metric
ruHHH 0.466 Accuracy
  • Honest: 0.475
  • Harmless: 0.466
  • Helpful: 0.458
ruHateSpeech 0.581 Accuracy
  • Women: 0.556
  • Men: 0.714
  • LGBT: 0.588
  • Nationality: 0.649
  • Migrants: 0.286
  • Other: 0.541
ruDetox
  • Overall average score (J): 0.349
  • Meaning preservation (SIM): 0.72
  • Naturalness (FL): 0.612
  • Style Transfer Accuracy (STA): 0.742

ruEthics
Correct Good Ethical
Virtue -0.102 0.037 -0.128
Law -0.076 0.03 -0.14
Moral -0.132 0.013 -0.157
Justice -0.122 0.027 -0.121
Utilitarianism -0.142 0.027 -0.085

Table results:

[[-0.102, -0.076 , -0.132, -0.122 , -0.142],
[0.037, 0.03 , 0.013, 0.027 , 0.027],
[-0.128, -0.14 , -0.157, -0.121 , -0.085]]

5 MCC
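
The raw matrix above is the transpose of the ruEthics table: each row is one question and each column is one ethical criterion. A minimal sketch of the mapping follows, assuming the row order is Correct, Good, Ethical (reading the second column header as "Good") and the column order follows the table rows. The variable names are illustrative.

```python
# Raw matrix as listed under "Table results" (values are MCC).
matrix = [
    [-0.102, -0.076, -0.132, -0.122, -0.142],
    [0.037, 0.03, 0.013, 0.027, 0.027],
    [-0.128, -0.14, -0.157, -0.121, -0.085],
]

# Assumed labels: one question per matrix row, one criterion per matrix column.
questions = ["Correct", "Good", "Ethical"]
criteria = ["Virtue", "Law", "Moral", "Justice", "Utilitarianism"]

# Transpose so that each criterion maps to its three per-question scores.
by_criterion = {
    criterion: dict(zip(questions, column))
    for criterion, column in zip(criteria, zip(*matrix))
}

for criterion, scores in by_criterion.items():
    print(criterion, scores)
```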

Information about the submission:

Team:

MERA

Name of the ML model:

Llama 2 13B

Additional links:

https://arxiv.org/abs/2307.09288

Architecture description:

Llama 2 is an auto-regressive language model that uses an optimized transformer architecture. This variant has 13B parameters.

Description of the training:

The authors used custom training libraries, Meta's Research Super Cluster, and production clusters for pretraining. Fine-tuning, annotation, and evaluation were also performed on third-party cloud compute. Reported compute: 368,640 GPU hours.

Pretrain data:

Llama 2 was pretrained on 2 trillion tokens of data from publicly available sources.

Training Details:

Token counts refer to pretraining data only. All models are trained with a global batch-size of 4M tokens.
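
As a rough check of scale, the two figures above imply an approximate number of optimizer steps. The sketch below assumes "2 trillion" means 2 x 10^12 tokens and "4M" means 4 x 10^6 tokens per batch. If the batch is actually 4 x 2^20 tokens, the count would be somewhat lower.

```python
PRETRAIN_TOKENS = 2_000_000_000_000  # 2 trillion tokens of pretraining data
BATCH_TOKENS = 4_000_000             # assumption: "4M" read as 4 x 10^6 tokens

# Approximate number of training steps implied by the two figures.
steps = PRETRAIN_TOKENS // BATCH_TOKENS
print(steps)  # 500000
```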

License:

A custom commercial license is available at: https://ai.meta.com/resources/models-and-libraries/llama-downloads/

Strategy, generation and parameters:

Code version v.1.1.0. No parameters were changed; all are used as prepared by the organizers. Details:
  • 1 x NVIDIA A100
  • dtype auto
  • Pytorch 2.1.2 + CUDA 12.1
  • Transformers 4.36.2
  • Context length 4096
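
For orientation, the listed settings could be applied when loading the model directly with Transformers, roughly as sketched below. This is not the benchmark's own evaluation harness. `MODEL_ID` is an assumed Hugging Face repository name, and `RUN_CONFIG` and `load_model` are hypothetical helpers introduced only for illustration.

```python
# Settings taken from the details above.
# MODEL_ID is an assumption: the submission does not name a checkpoint repository.
MODEL_ID = "meta-llama/Llama-2-13b-hf"
RUN_CONFIG = {"dtype": "auto", "context_length": 4096}


def load_model(model_id=MODEL_ID, config=RUN_CONFIG):
    """Load tokenizer and model with the listed dtype and context length."""
    # Imported lazily because transformers is a heavy dependency.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    tokenizer.model_max_length = config["context_length"]
    # torch_dtype="auto" uses the dtype stored in the checkpoint configuration.
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=config["dtype"])
    return tokenizer, model
```

Calling `load_model()` downloads the weights and leaves the model on the CPU; moving it to a single GPU is left to the caller.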