Llama 2 70b

Created at 03.02.2024 13:55

Overall score: 0.453

Task name      Result                   Metric
BPS            0.495                    Accuracy
LCS            0.08                     Accuracy
RCB            0.466 / 0.424            Avg. F1 / Accuracy
USE            0.031                    Grade Norm
RWSD           0.5                      Accuracy
PARus          0.744                    Accuracy
ruTiE          0.453                    Accuracy
MultiQ         0.185 / 0.041            F1-score / EM
ruMMLU         0.741                    Accuracy
CheGeKa        0.076 / 0                F1 / EM
ruModAr        0.65                     Accuracy
SimpleAr       0.965                    Accuracy
ruMultiAr      0.216                    Accuracy
MathLogicQA    0.388                    Accuracy
ruHumanEval    0.02 / 0.101 / 0.201     pass@k
ruWorldTree    0.914 / 0.915            Avg. F1 / Accuracy
ruOpenBookQA   0.818 / 0.817            Avg. F1 / Accuracy
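
The page does not state how the overall score is derived from the per-task results. As a minimal sketch, assuming each task's metrics are first averaged and the task scores are then averaged with equal weight, the values in the table reproduce the reported 0.453. The aggregation rule and the names `TASK_RESULTS` and `overall_score` are assumptions made for illustration, not the organizers' code.

```python
from statistics import mean

# Per-task results copied from the table above.
# Tasks that report several metrics keep all of their values.
TASK_RESULTS = {
    "BPS": [0.495],
    "LCS": [0.08],
    "RCB": [0.466, 0.424],
    "USE": [0.031],
    "RWSD": [0.5],
    "PARus": [0.744],
    "ruTiE": [0.453],
    "MultiQ": [0.185, 0.041],
    "ruMMLU": [0.741],
    "CheGeKa": [0.076, 0.0],
    "ruModAr": [0.65],
    "SimpleAr": [0.965],
    "ruMultiAr": [0.216],
    "MathLogicQA": [0.388],
    "ruHumanEval": [0.02, 0.101, 0.201],
    "ruWorldTree": [0.914, 0.915],
    "ruOpenBookQA": [0.818, 0.817],
}


def overall_score(results):
    """Average the metrics within each task, then average across tasks.

    Assumed aggregation rule: every task carries equal weight.
    """
    return mean(mean(values) for values in results.values())


print(round(overall_score(TASK_RESULTS), 3))
```

Under this assumption the diagnostic datasets listed further down are left out, which matches the note that they do not count toward the overall score.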

Evaluation on diagnostic datasets:

These results are not included in the overall score.

ruHHH (metric: Accuracy)
Result: 0.573
  • Honest: 0.557
  • Harmless: 0.655
  • Helpful: 0.508

ruHateSpeech (metric: Accuracy)
Result: 0.585
  • Women: 0.583
  • Men: 0.571
  • LGBT: 0.706
  • Nationality: 0.595
  • Migrants: 0.429
  • Other: 0.574

ruDetox
  • Overall average score (J): 0.341
  • Assessment of the preservation of meaning (SIM): 0.716
  • Assessment of naturalness (FL): 0.633
  • Style Transfer Accuracy (STA): 0.697

ruEthics
                 Correct   Good     Ethical
Virtue           -0.113    -0.182   -0.143
Law              -0.124    -0.228   -0.171
Moral            -0.151    -0.21    -0.162
Justice          -0.065    -0.169   -0.145
Utilitarianism   -0.076    -0.153   -0.107

Table results (one row per question column of the ruEthics table, one column per ethical category):

[[-0.113, -0.124, -0.151, -0.065, -0.076],
 [-0.182, -0.228, -0.21, -0.169, -0.153],
 [-0.143, -0.171, -0.162, -0.145, -0.107]]

Metric: 5 MCC
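
The raw matrix carries no labels, so it is easy to misread. The sketch below attaches the question and category names from the ruEthics table to the matrix so that individual values can be looked up by name. The row and column order is inferred by matching the matrix against the table, and the helper name `label_matrix` is hypothetical.

```python
# Row labels: the three question columns of the ruEthics table.
QUESTIONS = ["Correct", "Good", "Ethical"]
# Column labels: the five ethical categories, in table order.
CATEGORIES = ["Virtue", "Law", "Moral", "Justice", "Utilitarianism"]

# Matrix copied from "Table results" above.
MATRIX = [
    [-0.113, -0.124, -0.151, -0.065, -0.076],
    [-0.182, -0.228, -0.21, -0.169, -0.153],
    [-0.143, -0.171, -0.162, -0.145, -0.107],
]


def label_matrix(matrix, row_names, col_names):
    """Turn a list-of-lists into a nested dict keyed by row name, then column name."""
    return {
        row: dict(zip(col_names, values))
        for row, values in zip(row_names, matrix)
    }


scores = label_matrix(MATRIX, QUESTIONS, CATEGORIES)
print(scores["Good"]["Law"])
```

With the labels attached, `scores["Good"]["Law"]` returns the same value as the Law row, second column of the table.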

Information about the submission:

Team:

NLP Team

Name of the ML model:

Llama 2 70b

Additional links:

https://arxiv.org/abs/2307.09288

Architecture description:

Llama 2 is an auto-regressive language model that uses an optimized transformer architecture. This submission uses the 70B-parameter version.

Description of the training:

The authors used custom training libraries, Meta's Research Super Cluster, and production clusters for pretraining. Fine-tuning, annotation, and evaluation were also performed on third-party cloud compute. Pretraining took 1,720,320 GPU hours.

Pretraining data:

Llama 2 was pretrained on 2 trillion tokens of data from publicly available sources. The model uses the standard transformer architecture with pre-normalization via RMSNorm, the SwiGLU activation function, and rotary positional embeddings. The primary architectural differences from Llama 1 are the increased context length and grouped-query attention (GQA).

Training Details:

Token counts refer to pretraining data only. All models are trained with a global batch-size of 4M tokens.

License:

A custom commercial license is available at: https://ai.meta.com/resources/models-and-libraries/llama-downloads/

Strategy, generation and parameters:

Code version v.1.1.0. No parameters were changed; all are used as prepared by the organizers. Details:
  • 4 x NVIDIA A100 + accelerate
  • dtype float16
  • PyTorch 2.0.1 + CUDA 11.7
  • Transformers 4.36.2
  • Context length 4096
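
The settings above can be sketched as a model-loading configuration. This is an illustration only, not the organizers' evaluation code: the checkpoint ID `meta-llama/Llama-2-70b-hf` is an assumption (the submission does not name the exact checkpoint), and `RUN_CONFIG` and `load_model` are hypothetical names. Passing `device_map="auto"` to `from_pretrained` relies on the accelerate package to spread the weights across the available GPUs.

```python
# Settings taken from the submission details above.
RUN_CONFIG = {
    "model_id": "meta-llama/Llama-2-70b-hf",  # assumption: checkpoint not named in the submission
    "dtype": "float16",
    "context_length": 4096,
    "num_gpus": 4,
}


def load_model(config=RUN_CONFIG):
    """Load the tokenizer and model in half precision, sharded across visible GPUs.

    Heavy imports are kept inside the function so the configuration can be
    inspected on a machine without PyTorch or a GPU.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(config["model_id"])
    model = AutoModelForCausalLM.from_pretrained(
        config["model_id"],
        torch_dtype=getattr(torch, config["dtype"]),  # torch.float16
        device_map="auto",  # requires accelerate; places layers on the available GPUs
    )
    return tokenizer, model
```

Calling `load_model()` downloads the weights, so it needs access to the gated checkpoint and enough GPU memory for a 70B model in float16.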