Task name | Result | Metric |
---|---|---|
LCS | 0.098 | Accuracy |
RCB | 0.372 / 0.344 | Avg. F1 / Accuracy |
USE | 0.022 | Grade Norm |
RWSD | 0.512 | Accuracy |
PARus | 0.518 | Accuracy |
ruTiE | 0.502 | Accuracy |
MultiQ | 0.124 / 0.067 | F1-score / EM |
CheGeKa | 0.038 / 0 | F1 / EM |
ruModAr | 0.516 | EM |
ruMultiAr | 0.195 | EM |
MathLogicQA | 0.344 | Accuracy |
ruWorldTree | 0.81 / 0.811 | Avg. F1 / Accuracy |
ruOpenBookQA | 0.735 / 0.732 | Avg. F1 / Accuracy |
Task name | Result | Metric |
---|---|---|
BPS | 0.392 | Accuracy |
ruMMLU | 0.676 | Accuracy |
SimpleAr | 0.95 | EM |
ruHumanEval | 0.012 / 0.058 / 0.116 | pass@k |
ruHHH | 0.556 | Accuracy |
ruHateSpeech | 0.619 | Accuracy |
ruDetox |  | Overall average score (J) / Assessment of the preservation of meaning (SIM) / Assessment of naturalness (FL) / Style Transfer Accuracy (STA) |
ruEthics | [[-0.12, -0.091, -0.114, -0.141, -0.129], … | 5 MCC |
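The ruHumanEval row reports pass@k at three values of k. For reference, below is a minimal sketch of the standard unbiased pass@k estimator (the form introduced with HumanEval), assuming n sampled completions per task of which c pass the tests; whether MERA aggregates scores exactly this way is an assumption here, not something stated on this page.

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k completions,
    drawn from n generated samples of which c are correct, passes the tests."""
    if n - c < k:
        return 1.0
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# Example: 10 generations per task, 2 of them correct
print(round(pass_at_k(n=10, c=2, k=1), 3))  # 0.2
print(round(pass_at_k(n=10, c=2, k=5), 3))  # ~0.778
```

The per-task values are then averaged over the benchmark to produce the figures shown in the table.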
Mistral 7B
The Mistral-7B-v0.1 Large Language Model (LLM) is a pretrained generative text model with 7 billion parameters. Mistral-7B-v0.1 outperforms Llama 2 13B on all benchmarks we tested.
Mistral 7B leverages grouped-query attention (GQA) and sliding-window attention (SWA). GQA significantly accelerates inference and reduces the memory requirement during decoding, allowing for higher batch sizes and hence higher throughput, a crucial factor for real-time applications. In addition, SWA is designed to handle longer sequences more effectively at a reduced computational cost, thereby alleviating a common limitation in LLMs. These attention mechanisms collectively contribute to the enhanced performance and efficiency of Mistral 7B.
Mistral-7B-v0.1 is a transformer model with the following architecture choices:
- Grouped-Query Attention
- Sliding-Window Attention
- Byte-fallback BPE tokenizer
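As a rough illustration of the grouped-query attention idea described above, here is a minimal PyTorch sketch in which several query heads share a single key/value head; the function name, head counts, and dimensions are illustrative only, not Mistral's actual configuration.

```python
import torch
import torch.nn.functional as F

def grouped_query_attention(q, k, v):
    """Toy grouped-query attention: several query heads share one K/V head.

    q: (batch, n_q_heads, seq, head_dim)
    k, v: (batch, n_kv_heads, seq, head_dim), with n_q_heads % n_kv_heads == 0
    """
    n_q_heads, n_kv_heads = q.shape[1], k.shape[1]
    group = n_q_heads // n_kv_heads
    # Each K/V head is shared by `group` query heads, so the KV cache is
    # n_q_heads / n_kv_heads times smaller than in standard multi-head attention.
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    return F.softmax(scores, dim=-1) @ v

# Illustrative sizes only: 8 query heads sharing 2 K/V heads, head_dim 64
q = torch.randn(1, 8, 16, 64)
k = torch.randn(1, 2, 16, 64)
v = torch.randn(1, 2, 16, 64)
print(grouped_query_attention(q, k, v).shape)  # torch.Size([1, 8, 16, 64])
```

Sliding-window attention would additionally mask the score matrix so that each position attends only to the previous W tokens, bounding per-token cost for long sequences.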
Apache 2.0 license
Code version v.1.1.0. No parameters were changed; all are used as prepared by the organizers. Details:
- 1 x NVIDIA A100
- dtype auto
- PyTorch 2.1.2 + CUDA 12.1
- Transformers 4.36.2
- Context length 11500
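For context, a minimal sketch of loading the model with the setup listed above (Transformers, dtype auto, a single GPU); this is an assumption about how such a run could be configured, not the MERA evaluation code itself.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # corresponds to "dtype auto" in the run details
    device_map="auto",    # places the model on the available GPU (requires accelerate)
)

inputs = tokenizer("Столица России —", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```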