A-Vibe

Avito Created at 28.03.2025 05:57
Overall result: 0.578
Place in the rating: 44

In the top by tasks:
ruHumanEval: 8
ruModAr: 9 (one of the main tasks)
ruCodeEval: 8 (one of the main tasks)

Weak tasks:
RWSD: 144
PARus: 134
RCB: 129
ruEthics: 123
MultiQ: 65
ruWorldTree: 141
ruOpenBookQA: 135
CheGeKa: 164
ruMMLU: 142
ruHateSpeech: 45
ruDetox: 58
ruHHH: 69
ruTiE: 93
USE: 65
MathLogicQA: 90
ruMultiAr: 102
SimpleAr: 38
LCS: 398
BPS: 57
MaMuRAMu: 119

Ratings for leaderboard tasks

Task name       Result                   Metric
LCS             0.076                    Accuracy
RCB             0.55 / 0.523             Accuracy / F1 macro
USE             0.307                    Grade norm
RWSD            0.565                    Accuracy
PARus           0.876                    Accuracy
ruTiE           0.777                    Accuracy
MultiQ          0.523 / 0.387            F1 / Exact match
CheGeKa         0.163 / 0.118            F1 / Exact match
ruModAr         0.887                    Exact match
MaMuRAMu        0.739                    Accuracy
ruMultiAr       0.319                    Exact match
ruCodeEval      0.605 / 0.754 / 0.793    Pass@k
MathLogicQA     0.487                    Accuracy
ruWorldTree     0.933 / 0.933            Accuracy / F1 macro
ruOpenBookQA    0.85 / 0.849             Accuracy / F1 macro
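
The page does not say how the overall result is aggregated. As a sanity check, the sketch below assumes one plausible scheme: average the metrics within each leaderboard task, then average across tasks. The dictionary `task_scores` and the function `overall_score` are illustrative names, not part of the benchmark's code; the numbers are copied from the table above.

```python
from statistics import mean

# Scores copied from the leaderboard-task table; tasks that report
# several metrics list all of their values.
task_scores = {
    "LCS": [0.076],
    "RCB": [0.55, 0.523],
    "USE": [0.307],
    "RWSD": [0.565],
    "PARus": [0.876],
    "ruTiE": [0.777],
    "MultiQ": [0.523, 0.387],
    "CheGeKa": [0.163, 0.118],
    "ruModAr": [0.887],
    "MaMuRAMu": [0.739],
    "ruMultiAr": [0.319],
    "ruCodeEval": [0.605, 0.754, 0.793],
    "MathLogicQA": [0.487],
    "ruWorldTree": [0.933, 0.933],
    "ruOpenBookQA": [0.85, 0.849],
}


def overall_score(scores):
    """Assumed aggregation: mean of per-task means (not confirmed by the page)."""
    return mean(mean(values) for values in scores.values())


print(round(overall_score(task_scores), 3))  # 0.578
```

Under this assumption the result rounds to the reported overall value of 0.578, which suggests the open tasks listed further down do not enter the total.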

Evaluation on open tasks:


Task name       Result                   Metric
BPS             0.984                    Accuracy
ruMMLU          0.634                    Accuracy
SimpleAr        0.994                    Exact match
ruHumanEval     0.587 / 0.739 / 0.774    Pass@k
ruHHH           0.831
ruHateSpeech    0.83
ruDetox         0.311

ruEthics
                  Correct    Good     Ethical
Virtue            0.417      0.348    0.419
Law               0.407      0.326    0.403
Moral             0.431      0.366    0.44
Justice           0.387      0.299    0.362
Utilitarianism    0.339      0.318    0.361
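
ruHumanEval and ruCodeEval report three Pass@k values each. The page does not state which values of k were used, how many samples were drawn per problem, or how the benchmark computes the metric. The sketch below shows the widely used unbiased estimator, 1 - C(n - c, k) / C(n, k), for a single problem with n generated samples of which c pass the tests. The function name `pass_at_k` and the example counts are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: number of generated samples, c: samples that pass the tests,
    k: budget of attempts being scored (k <= n).
    """
    if n - c < k:
        # Every size-k subset must contain at least one passing sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical example: 10 samples for a problem, 3 of them correct.
for k in (1, 5, 10):
    print(k, round(pass_at_k(10, 3, k), 3))
```

The benchmark-level score would then be the mean of this estimate over all problems, which is why the reported values grow with k.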

Information about the submission:

MERA version: v.1.2.0
Torch version: 2.3.1
Codebase version: 30667dc322678fdec25b3d425d3dcee7bc371564
CUDA version: 12.1
Precision of the model weights: bf16
Seed: 1234
Batch size: 6
Transformers version: 4.44.2
Number and type of GPUs: 1 x NVIDIA H100 PCIe
Architecture: vllm

Team:

Avito

Name of the ML model:

A-Vibe

Model size

7.0B

Model type:

Closed

SFT

Architecture description:

Based on Qwen2.5-7b

Description of the training:

We adapted the tokenizer and fine-tuned the model for instruction following

Inference parameters

Generation Parameters:
simplear - do_sample=false;until=[" \n"];
chegeka - do_sample=false;until=[" \n"];
rudetox - do_sample=false;until=[" \n"];
rumultiar - do_sample=false;until=[" \n"];
use - do_sample=false;until=[" \n","."];
multiq - do_sample=false;until=[" \n"];
rumodar - do_sample=false;until=[" \n"];
ruhumaneval - do_sample=true;until=[" \nclass"," \ndef"," \n#"," \nif"," \nprint"];temperature=0.6;
rucodeeval - do_sample=true;until=[" \nclass"," \ndef"," \n#"," \nif"," \nprint"];temperature=0.6;
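
The `until` lists above are stop sequences: generation output is cut at the first place any of them appears. The evaluation harness's own truncation code is not shown on this page, so the following is only a minimal sketch of that behaviour; `truncate_at_stop` is a hypothetical helper name.

```python
def truncate_at_stop(text: str, until: list[str]) -> str:
    """Cut `text` at the earliest occurrence of any stop sequence."""
    cut = len(text)
    for stop in until:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


# Short-answer tasks stop at the first " \n".
print(repr(truncate_at_stop("42 \nBecause 40 + 2 = 42", [" \n"])))

# Code tasks stop when the model starts a new top-level construct.
code_stops = [" \nclass", " \ndef", " \n#", " \nif", " \nprint"]
print(repr(truncate_at_stop("    return a + b \ndef helper():", code_stops)))
```

For the code tasks, the stop list keeps only the body of the function being completed and drops any extra definitions or test prints the model appends afterwards.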

System prompt (translated from Russian):
Solve the task according to the instruction below. Do not give any explanations or clarifications for your answer. Do not write anything extra. Write only what is specified in the instruction. If the instruction asks you to solve an arithmetic example, write only the numeric answer without the solution steps or explanations. If the instruction asks you to output a letter, digit, or word, output only that. If the instruction asks you to choose one of the answer options and output the letter or digit that corresponds to it, output only that letter or digit; do not give any explanations, do not add punctuation, only 1 character in the answer. If the instruction asks you to complete the code of a Python function, write the code right away, keeping the indentation as if you were continuing the function from the instruction; do not give explanations, do not write comments, use only the arguments from the function signature in the instruction, and do not try to read data via the input function. Do not apologize, do not start a dialogue. Output only the answer and nothing else.

Description of the template:
{%- if messages[0]['role'] == 'system' -%}
    {%- set system_message = messages[0]['content'] -%}
    {%- set loop_messages = messages[1:] -%}
{%- else -%}
    {%- set system_message = None -%}
    {%- set loop_messages = messages -%}
{%- endif -%}
{%- set ns = namespace(index=0, last_user_idx=None) -%}
{%- for i in range(loop_messages|length) -%}
    {%- if loop_messages[i]['role'] == 'user' -%}
        {%- set ns.last_user_idx = i -%}
    {%- endif -%}
{%- endfor -%}
{{- bos_token -}}
{%- if system_message is not none -%}
    {{- '[INST]' + system_message + '[/INST]' -}}
{%- endif -%}
{%- for i in range(loop_messages|length) -%}
    {%- set message = loop_messages[i] -%}
    {%- if (message['role'] == 'user') != (ns.index % 2 == 0) -%}
        {{- raise_exception('After the optional system message, conversation roles must alternate user/assistant/user/assistant/...') -}}
    {%- endif -%}
    {%- if message['role'] == 'user' -%}
        {{- '[INST]' + message['content'] + '[/INST]' -}}
    {%- elif message['role'] == 'assistant' -%}
        {% generation %}{{- ' ' + message['content'] + eos_token -}}{% endgeneration %}
    {%- else -%}
        {{- raise_exception('Only user and assistant roles are supported, with the exception of an initial optional system message!') -}}
    {%- endif -%}
    {%- set ns.index = ns.index + 1 -%}
{%- endfor -%}
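
To make the resulting prompt layout easier to see, here is a plain-Python sketch that mirrors the template's logic without a Jinja engine. The function name `render_prompt` and the default `bos_token` / `eos_token` strings are placeholders chosen for illustration; the real special tokens come from the model's tokenizer and are not given on this page.

```python
def render_prompt(messages, bos_token="<s>", eos_token="</s>"):
    """Render chat messages the way the template above does (sketch).

    bos_token / eos_token defaults are placeholders, not the model's
    actual special tokens.
    """
    if messages and messages[0]["role"] == "system":
        system_message, loop_messages = messages[0]["content"], messages[1:]
    else:
        system_message, loop_messages = None, messages

    parts = [bos_token]
    if system_message is not None:
        # The system message is wrapped exactly like a user turn.
        parts.append("[INST]" + system_message + "[/INST]")

    for index, message in enumerate(loop_messages):
        # Even positions must be user turns, odd positions assistant turns.
        if (message["role"] == "user") != (index % 2 == 0):
            raise ValueError("Conversation roles must alternate user/assistant/...")
        if message["role"] == "user":
            parts.append("[INST]" + message["content"] + "[/INST]")
        elif message["role"] == "assistant":
            parts.append(" " + message["content"] + eos_token)
        else:
            raise ValueError("Only user and assistant roles are supported")
    return "".join(parts)


print(render_prompt([
    {"role": "system", "content": "Answer briefly."},
    {"role": "user", "content": "2 + 2 = ?"},
]))
```

The sketch omits the `{% generation %}` markers, which only tag the assistant span and do not change the rendered text.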

Ratings by subcategory

Metric: Grade Norm
Model, team 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 8_0 8_1 8_2 8_3 8_4
A-Vibe
Avito
0.667 0.467 0.867 0.267 0.1 0.4 0 - 0.133 0.1 0.1 0.067 0.1 0.033 0.133 0.55 0.033 0 0.033 0.033 0.033 0.7 0.367 0.033 0.167 0.542 0.2 0.367 0.633 0.467 0.667
Model, team Honest Helpful Harmless
A-Vibe
Avito
0.77 0.814 0.914
Model, team Anatomy Virology Astronomy Marketing Nutrition Sociology Management Philosophy Prehistory Human aging Econometrics Formal logic Global facts Jurisprudence Miscellaneous Moral disputes Business ethics Biology (college) Physics (college) Human Sexuality Moral scenarios World religions Abstract algebra Medicine (college) Machine learning Medical genetics Professional law PR Security studies Chemistry (school) Computer security International law Logical fallacies Politics Clinical knowledge Conceptual physics Math (college) Biology (high school) Physics (high school) Chemistry (high school) Geography (high school) Professional medicine Electrical engineering Elementary mathematics Psychology (high school) Statistics (high school) History (high school) Math (high school) Professional accounting Professional psychology Computer science (college) World history (high school) Macroeconomics Microeconomics Computer science (high school) European history Government and politics
A-Vibe
Avito
0.607 0.5 0.737 0.795 0.696 0.776 0.825 0.675 0.673 0.655 0.596 0.476 0.41 0.722 0.742 0.662 0.7 0.722 0.467 0.679 0.423 0.76 0.42 0.613 0.554 0.74 0.432 0.657 0.718 0.49 0.74 0.777 0.669 0.798 0.721 0.662 0.48 0.816 0.503 0.581 0.818 0.625 0.634 0.637 0.822 0.593 0.784 0.478 0.44 0.623 0.62 0.797 0.713 0.748 0.82 0.776 0.741
Model, team SIM FL STA
A-Vibe
Avito
0.727 0.676 0.665
Model, team Anatomy Virology Astronomy Marketing Nutrition Sociology Management Philosophy Pre-History Gerontology Econometrics Formal logic Global facts Jurisprudence Miscellaneous Moral disputes Business ethics Biology (college) Physics (college) Human sexuality Moral scenarios World religions Abstract algebra Medicine (college) Machine Learning Genetics Professional law PR Security Chemistry (college) Computer security International law Logical fallacies Politics Clinical knowledge Conceptual physics Math (college) Biology (high school) Physics (high school) Chemistry (high school) Geography (high school) Professional medicine Electrical Engineering Elementary mathematics Psychology (high school) Statistics (high school) History (high school) Math (high school) Professional Accounting Professional psychology Computer science (college) World history (high school) Macroeconomics Microeconomics Computer science (high school) Europe History Government and politics
A-Vibe
Avito
0.622 0.842 0.667 0.676 0.776 0.741 0.569 0.684 0.692 0.646 0.782 0.758 0.508 0.76 0.713 0.679 0.71 0.711 0.526 0.807 0.614 0.763 0.733 0.805 0.8 0.742 0.782 0.649 0.895 0.8 0.822 0.808 0.705 0.86 0.621 0.786 0.8 0.778 0.667 0.723 0.824 0.794 0.733 0.889 0.845 0.867 0.879 0.864 0.769 0.895 0.822 0.754 0.823 0.662 0.558 0.561 0.8
Correct
Model, team Virtue Law Moral Justice Utilitarianism
A-Vibe
Avito
0.417 0.407 0.431 0.387 0.339
Good
Model, team Virtue Law Moral Justice Utilitarianism
A-Vibe
Avito
0.348 0.326 0.366 0.299 0.318
Ethical
Model, team Virtue Law Moral Justice Utilitarianism
A-Vibe
Avito
0.419 0.403 0.44 0.362 0.361
Model, team Women Men LGBT Nationalities Migrants Other
A-Vibe
Avito
0.843 0.714 1 0.784 0.857 0.852