Llama 2 70b

NLP Team · Created 03.02.2024 13:55

Overall result: 0.453
Note: the submission does not contain all the required tasks.

Ratings for leaderboard tasks


Task name      Result          Metric
LCS            0.08            Accuracy
RCB            0.466 / 0.424   Accuracy / F1 macro
USE            0.031           Grade norm
RWSD           0.5             Accuracy
PARus          0.744           Accuracy
ruTiE          0.453           Accuracy
MultiQ         0.185 / 0.041   F1 / Exact match
CheGeKa        0.076 / 0       F1 / Exact match
ruModAr        0.65            Exact match
ruMultiAr      0.216           Exact match
MathLogicQA    0.388           Accuracy
ruWorldTree    0.914 / 0.915   Accuracy / F1 macro
ruOpenBookQA   0.818 / 0.817   Accuracy / F1 macro

Evaluation on open tasks:

(Per-subcategory ratings are listed at the end of this page.)


Task name      Result                 Metric
BPS            0.495                  Accuracy
ruMMLU         0.741                  Accuracy
SimpleAr       0.965                  Exact match
ruHumanEval    0.02 / 0.101 / 0.201   Pass@k
ruHHH          0.573                  Accuracy
ruHateSpeech   0.585                  —
ruDetox        0.341                  —
ruEthics

                 Correct   Good     Ethical
Virtue           -0.113    -0.182   -0.143
Law              -0.124    -0.228   -0.171
Moral            -0.151    -0.21    -0.162
Justice          -0.065    -0.169   -0.145
Utilitarianism   -0.076    -0.153   -0.107

Information about the submission:

Mera version: -
Torch version: -
Codebase version: -
CUDA version: -
Model weights precision: -
Seed: -
Batch: -
Transformers version: -
Number and type of GPUs: -
Architecture: -

Team:

NLP Team

Name of the ML model:

Llama 2 70b

Additional links:

https://arxiv.org/abs/2307.09288

Architecture description:

Llama 2 is an auto-regressive language model that uses an optimized Transformer architecture. This variant has 70B parameters.

Description of the training:

The authors used custom training libraries, Meta's Research Super Cluster, and production clusters for pretraining; fine-tuning, annotation, and evaluation were also performed on third-party cloud compute. Pretraining required 1,720,320 GPU-hours.

Pretrain data:

Llama 2 was pretrained on 2 trillion tokens of data from publicly available sources. The model uses a standard Transformer architecture with pre-normalization via RMSNorm, the SwiGLU activation function, and rotary positional embeddings (RoPE). The primary architectural differences from Llama 1 are the increased context length and grouped-query attention (GQA).
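To make the architectural terms above concrete, here is a minimal PyTorch sketch of pre-normalization (RMSNorm) and the SwiGLU feed-forward block. The class names, dimension names, and toy sizes are illustrative assumptions, not Meta's reference implementation; rotary embeddings and grouped-query attention are omitted for brevity.

```python
# Minimal sketch of RMSNorm pre-normalization and a SwiGLU feed-forward block.
# Toy sizes only; this is not the reference Llama 2 code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RMSNorm(nn.Module):
    """Scale activations by the inverse of their root mean square."""

    def __init__(self, dim: int, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)


class SwiGLU(nn.Module):
    """Gated feed-forward: silu(W1 x) * (W3 x), projected back down by W2."""

    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden_dim, bias=False)  # gate projection
        self.w3 = nn.Linear(dim, hidden_dim, bias=False)  # up projection
        self.w2 = nn.Linear(hidden_dim, dim, bias=False)  # down projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w2(F.silu(self.w1(x)) * self.w3(x))


x = torch.randn(2, 16, 512)                    # (batch, seq, dim), toy sizes
y = SwiGLU(512, 1376)(RMSNorm(512)(x))
print(y.shape)                                 # torch.Size([2, 16, 512])
```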

Training Details:

Token counts refer to pretraining data only. All models were trained with a global batch size of 4M tokens.
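As a back-of-the-envelope check (an estimate, not a figure reported in the submission): 2 trillion pretraining tokens at a 4M-token global batch implies roughly 500,000 optimizer steps.

```python
# Rough estimate only, not a number reported by the authors.
tokens_total = 2 * 10**12      # ~2 trillion pretraining tokens
tokens_per_step = 4 * 10**6    # global batch size of 4M tokens
print(tokens_total // tokens_per_step)  # 500000 optimizer steps
```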

License:

A custom commercial license is available at: https://ai.meta.com/resources/models-and-libraries/llama-downloads/

Strategy, generation and parameters:

Code version v1.1.0. All parameters were left unchanged and used as prepared by the organizers. Details:
- 4 x NVIDIA A100 + accelerate
- dtype: float16
- PyTorch 2.0.1 + CUDA 11.7
- Transformers 4.36.2
- Context length: 4096
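For reference, a comparable inference setup can be reproduced with the versions listed above. The sketch below assumes the public Hugging Face checkpoint name and is not the organizers' evaluation harness; it simply loads the model in float16 and shards it across the available GPUs via accelerate's device_map.

```python
# Sketch of a float16, multi-GPU inference setup similar to the submission details.
# Requires transformers and accelerate; checkpoint name is assumed, not taken from the submission.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-70b-hf"  # assumed public checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # dtype float16, as in the submission
    device_map="auto",          # shard across available GPUs (accelerate)
)

prompt = "Вопрос: Столица России? Ответ:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=16, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```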


Ratings by subcategory

ruHHH (metric: Accuracy)

Model, team             Honest   Helpful   Harmless
Llama 2 70b, NLP Team   0.557    0.508     0.655
ruMMLU (metric: Accuracy), Llama 2 70b, NLP Team:

Anatomy                          0.6
Virology                         0.75
Astronomy                        0.7
Marketing                        0.714
Nutrition                        0.762
Sociology                        0.9
Management                       0.867
Philosophy                       0.765
Prehistory                       0.9
Human aging                      1
Econometrics                     0.727
Formal logic                     0.6
Global facts                     0.4
Jurisprudence                    0.462
Miscellaneous                    0.545
Moral disputes                   0.7
Business ethics                  0.8
Biology (college)                0.778
Physics (college)                0.8
Human Sexuality                  1
Moral scenarios                  0.4
World religions                  0.865
Abstract algebra                 0.8
Medicine (college)               0.745
Machine learning                 0.7
Medical genetics                 0.818
Professional law                 0.625
PR                               0.714
Security studies                 1
Chemistry (school)               0.636
Computer security                0.5
International law                0.778
Logical fallacies                0.8
Politics                         0.8
Clinical knowledge               0.818
Conceptual physics               1
Math (college)                   0.7
Biology (high school)            0.81
Physics (high school)            0.4
Chemistry (high school)          0.6
Geography (high school)          0.823
Professional medicine            0.9
Electrical engineering           0.8
Elementary mathematics           0.6
Psychology (high school)         0.938
Statistics (high school)         0.6
History (high school)            1
Math (high school)               0.5
Professional accounting          0.6
Professional psychology          0.9
Computer science (college)       0.636
World history (high school)      1
Macroeconomics                   0.853
Microeconomics                   0.867
Computer science (high school)   0.5
European history                 0.485
Government and politics          0.778
ruDetox

Model, team             SIM     FL      STA
Llama 2 70b, NLP Team   0.716   0.633   0.697
ruEthics (results per label: Correct, Good, Ethical)

Llama 2 70b, NLP Team

Label     Virtue   Law      Moral    Justice   Utilitarianism
Correct   -0.113   -0.124   -0.151   -0.065    -0.076
Good      -0.182   -0.228   -0.21    -0.169    -0.153
Ethical   -0.143   -0.171   -0.162   -0.145    -0.107
ruHateSpeech

Model, team             Women   Men     LGBT    Nationalities   Migrants   Other
Llama 2 70b, NLP Team   0.583   0.571   0.706   0.595           0.429      0.574