RuadaptQwen-32B-instruct

RCC MSU Created at 13.11.2024 13:20
The overall result: 0.615
Place in the rating: 18
In the top by tasks:
Task Place
ruHateSpeech 4
Weak tasks:
Task Place
RWSD 34
PARus 48
RCB 104
ruEthics 172
MultiQ 56
ruOpenBookQA 44
CheGeKa 105
ruMMLU 49
ruDetox 48
ruHHH 47
ruTiE 22
ruHumanEval 26
USE 27
SimpleAr 34
LCS 57
BPS 55
ruModAr 65
MaMuRAMu 52

Ratings for leaderboard tasks


Task name Result Metric
LCS 0.152 Accuracy
RCB 0.557 / 0.519 Accuracy F1 macro
USE 0.35 Grade norm
RWSD 0.638 Accuracy
PARus 0.924 Accuracy
ruTiE 0.866 Accuracy
MultiQ 0.55 / 0.403 F1 Exact match
CheGeKa 0.215 / 0.166 F1 Exact match
ruModAr 0.686 Exact match
MaMuRAMu 0.812 Accuracy
ruMultiAr 0.433 Exact match
ruCodeEval 0.426 / 0.561 / 0.598 Pass@k
MathLogicQA 0.726 Accuracy
ruWorldTree 0.985 / 0.985 Accuracy F1 macro
ruOpenBookQA 0.915 / 0.915 Accuracy F1 macro
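
The overall result of 0.615 can be reproduced from this table with a two-level average: first average the metrics reported for each task (for example, RCB becomes (0.557 + 0.519) / 2), then average the 15 task scores. This is our reconstruction, checked only against the numbers on this page and not against the MERA scoring code.

```python
from statistics import mean

# Leaderboard-task results copied from the table above.
# Tasks scored with several metrics list all of them.
LEADERBOARD = {
    "LCS": [0.152],
    "RCB": [0.557, 0.519],
    "USE": [0.35],
    "RWSD": [0.638],
    "PARus": [0.924],
    "ruTiE": [0.866],
    "MultiQ": [0.55, 0.403],
    "CheGeKa": [0.215, 0.166],
    "ruModAr": [0.686],
    "MaMuRAMu": [0.812],
    "ruMultiAr": [0.433],
    "ruCodeEval": [0.426, 0.561, 0.598],
    "MathLogicQA": [0.726],
    "ruWorldTree": [0.985, 0.985],
    "ruOpenBookQA": [0.915, 0.915],
}


def overall_score(results: dict[str, list[float]]) -> float:
    """Average the metrics within each task, then average across tasks."""
    return mean(mean(metrics) for metrics in results.values())


print(round(overall_score(LEADERBOARD), 3))  # 0.615
```

Under this reading, the three Pass@k values of ruCodeEval collapse into one task score, so ruCodeEval weighs no more than a single-metric task such as LCS.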

Evaluation on open tasks:


Task name Result Metric
BPS 0.984 Accuracy
ruMMLU 0.737 Accuracy
SimpleAr 0.994 Exact match
ruHumanEval 0.389 / 0.506 / 0.543 Pass@k
ruHHH 0.848
ruHateSpeech 0.872
ruDetox 0.316
ruEthics
Criterion Correct Good Ethical
Virtue 0.361 0.307 0.357
Law 0.351 0.301 0.339
Moral 0.378 0.312 0.382
Justice 0.315 0.267 0.326
Utilitarianism 0.31 0.256 0.29
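
ruHumanEval here (and ruCodeEval in the leaderboard table) report three Pass@k values. A common way to compute Pass@k is the unbiased estimator from the HumanEval/Codex work: with n sampled solutions per problem, c of which pass the tests, Pass@k = 1 - C(n-c, k) / C(n, k), averaged over problems. We assume, without confirmation from this page, that MERA uses this estimator with k = 1, 5, 10; the sampling settings listed under Generation Parameters (do_sample=true, temperature=0.6) are consistent with drawing several samples per problem. The correct-sample counts below are hypothetical.

```python
from math import comb
from statistics import mean


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples generated, c of them correct."""
    if n - c < k:
        # Every size-k subset of the samples contains at least one correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def task_pass_at_k(correct_counts: list[int], n: int, k: int) -> float:
    """Average pass@k over problems, each with n samples."""
    return mean(pass_at_k(n, c, k) for c in correct_counts)


# Hypothetical task: 3 problems, 10 samples each, with 0, 5 and 10 correct samples.
counts = [0, 5, 10]
for k in (1, 5, 10):
    print(k, round(task_pass_at_k(counts, n=10, k=k), 3))
```

For a fixed set of samples Pass@k can only grow with k, which matches the increasing triples reported on this page (0.389 / 0.506 / 0.543 for ruHumanEval).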

Information about the submission:

Mera version
v.1.2.0
Torch Version
2.4.0
The version of the codebase
430295f
CUDA version
12.1
Precision of the model weights
bfloat16
Seed
1234
Batch size
4
Transformers version
4.45.2
The number of GPUs and their type
1 x NVIDIA A100
Architecture
vllm

Team:

RCC MSU

Name of the ML model:

RuadaptQwen-32B-instruct

Model size

32.0B

Model type:

Open

SFT

Architecture description:

Instruction-tuned version of Qwen2.5-32B adapted to Russian. The model's tokenizer was replaced, then continued pretraining was performed on a Russian-language corpus, after which the LEP technique (Learned Embedding Propagation; paper forthcoming) was applied. Thanks to the new tokenizer (tiktoken cl100k extended with a 48k-token unigram tokenizer), the generation speed* for Russian-language text increased by up to 60% compared with the original Qwen-2.5-32B-Instruct model.
Tikhomirov M., Chernyshev D. Facilitating large language model Russian adaptation with Learned Embedding Propagation. 2024 (forthcoming).
Tikhomirov M., Chernyshev D. Impact of Tokenization on LLaMa Russian Adaptation. 2023 Ivannikov Ispras Open Conference (ISPRAS). IEEE, 2023, pp. 163-168.
*Generation speed means the number of Russian characters/words generated per second on identical text sequences.
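
The speed-up follows from tokenization: if the per-token decoding rate stays roughly the same, a tokenizer that splits the same Russian text into fewer tokens yields more characters (or words) per second. The sketch measures speed as the footnote defines it, per second on the same text; the token counts and the 40 tokens/s rate are made-up illustrative numbers, and equal per-token latency for both models is a simplifying assumption (a larger vocabulary can change it slightly).

```python
def chars_per_second(text: str, n_tokens: int, tokens_per_second: float) -> float:
    """Characters of text produced per second when its n_tokens decode at a fixed rate."""
    seconds = n_tokens / tokens_per_second
    return len(text) / seconds


def words_per_second(text: str, n_tokens: int, tokens_per_second: float) -> float:
    """The same measurement counted in whitespace-separated words."""
    seconds = n_tokens / tokens_per_second
    return len(text.split()) / seconds


# Hypothetical numbers: the same Russian passage needs 160 tokens with the
# original tokenizer and 100 with the extended one; both decode 40 tokens/s.
text = "Модель генерирует русский текст быстрее благодаря новому токенизатору."
old = chars_per_second(text, n_tokens=160, tokens_per_second=40.0)
new = chars_per_second(text, n_tokens=100, tokens_per_second=40.0)
print(f"speed-up: {new / old - 1:.0%}")  # speed-up: 60%
```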

License:

Apache license 2.0

Inference parameters

Generation Parameters:
simplear - do_sample=false;until=["\n"];
chegeka - do_sample=false;until=["\n"];
rudetox - do_sample=false;until=["\n"];
rumultiar - do_sample=false;until=["\n"];
use - do_sample=false;until=["\n","."];
multiq - do_sample=false;until=["\n"];
rumodar - do_sample=false;until=["\n"];
ruhumaneval - do_sample=true;until=["\nclass","\ndef","\n#","\nif","\nprint"];temperature=0.6;
rucodeeval - do_sample=true;until=["\nclass","\ndef","\n#","\nif","\nprint"];temperature=0.6;
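
Each entry is a task name followed by semicolon-separated generation settings. The sketch below parses that format and treats `until` as a list of stop strings, cutting the generation at the earliest match, which is how stop sequences behave in lm-evaluation-harness-style evaluators; that MERA applies them exactly this way is our assumption, and `parse_generation_params` and `apply_until` are illustrative names, not part of the MERA codebase.

```python
import json

# Two entries copied from the settings above (raw string keeps "\n" as a JSON escape).
RAW = r'''simplear - do_sample=false;until=["\n"];
ruhumaneval - do_sample=true;until=["\nclass","\ndef","\n#","\nif","\nprint"];temperature=0.6;'''


def parse_generation_params(raw: str) -> dict:
    """Turn 'task - key=value;key=value;' lines into {task: {key: value}}."""
    params = {}
    for line in raw.strip().splitlines():
        task, _, spec = line.partition(" - ")
        entry = {}
        for item in spec.split(";"):
            if not item.strip():
                continue
            key, _, value = item.partition("=")
            # false/true, 0.6 and ["..."] are all valid JSON literals.
            entry[key.strip()] = json.loads(value)
        params[task.strip()] = entry
    return params


def apply_until(text: str, stops: list[str]) -> str:
    """Cut the generation at the earliest occurrence of any stop string."""
    cut = len(text)
    for stop in stops:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


params = parse_generation_params(RAW)
completion = "    return a + b\n\nprint(add(1, 2))"
print(repr(apply_until(completion, params["ruhumaneval"]["until"])))  # '    return a + b\n'
```

For the code tasks the stop list ends a completion as soon as a new top-level `class`, `def`, comment, `if` or `print` begins, so only the function body is kept.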

The size of the context:
simplear, chegeka, rudetox, rumultiar, use, multiq, rumodar, ruhumaneval, rucodeeval - 8192
rutie - 3000

System prompt:
Solve the task strictly according to the instruction. Give only the answer, without explanations. For a numerical answer, give only the number. For a letter, digit or word, give only that. For a multiple-choice question, give a single letter or digit. The answer must be exact, without extra symbols or words. If you need to generate Python code, your answer must be only the code (the continuation of the code from the instruction): do not repeat the function name, do not give explanations, do not write comments, do not use input, write the code so that it completes the function from the instruction (with the appropriate indentation), and always start the code with a line break!
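
The last instruction matters because, for code tasks, the model's answer continues directly after the function header given in the task. Assuming the evaluator simply concatenates the task's code prefix with the model output (a simplification; `build_program` and the `add` task are hypothetical), a completion that starts with a line break and is already indented yields a valid function, while one that starts on the same line does not compile.

```python
PROMPT_CODE = 'def add(a, b):\n    """Return the sum of a and b."""'


def build_program(prompt_code: str, completion: str) -> str:
    """Assumed assembly step: the completion is appended to the prompt verbatim."""
    return prompt_code + completion


good = build_program(PROMPT_CODE, "\n    return a + b\n")
namespace: dict = {}
exec(good, namespace)
print(namespace["add"](2, 3))  # 5

bad = build_program(PROMPT_CODE, "return a + b")
try:
    compile(bad, "<completion>", "exec")
except SyntaxError:
    print("missing leading line break: SyntaxError")
```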

Description of the template:
{%- if tools %}
    {{- '<|im_start|>system\n' }}
    {%- if messages[0]['role'] == 'system' %}
        {{- messages[0]['content'] }}
    {%- else %}
        {{- 'You are Qwen, created by Alibaba Cloud. You are a helpful assistant.' }}
    {%- endif %}
    {{- "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
    {%- for tool in tools %}
        {{- "\n" }}
        {{- tool | tojson }}
    {%- endfor %}
    {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
{%- else %}
    {%- if messages[0]['role'] == 'system' %}
        {{- '<|im_start|>system\n' + messages[0]['content'] + '<|im_end|>\n' }}
    {%- endif %}
{%- endif %}
{%- for message in messages %}
    {%- if (message.role == "user") or (message.role == "system" and not loop.first) or (message.role == "assistant" and not message.tool_calls) %}
        {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
    {%- elif message.role == "assistant" %}
        {{- '<|im_start|>' + message.role }}
        {%- if message.content %}
            {{- '\n' + message.content }}
        {%- endif %}
        {%- for tool_call in message.tool_calls %}
            {%- if tool_call.function is defined %}
                {%- set tool_call = tool_call.function %}
            {%- endif %}
            {{- '\n<tool_call>\n{"name": "' }}
            {{- tool_call.name }}
            {{- '", "arguments": ' }}
            {{- tool_call.arguments | tojson }}
            {{- '}\n</tool_call>' }}
        {%- endfor %}
        {{- '<|im_end|>\n' }}
    {%- elif message.role == "tool" %}
        {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != "tool") %}
            {{- '<|im_start|>user' }}
        {%- endif %}
        {{- '\n<tool_response>\n' }}
        {{- message.content }}
        {{- '\n</tool_response>' }}
        {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
            {{- '<|im_end|>\n' }}
        {%- endif %}
    {%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
    {{- '<|im_start|>assistant\n' }}
{%- endif %}
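
Without tools, this template reduces to ChatML: each message becomes `<|im_start|>{role}\n{content}<|im_end|>\n`, and `add_generation_prompt` appends an open assistant turn. In this branch no default system message is inserted when the conversation has none. The function below is a hand-written re-implementation of only that branch, for illustration; in practice the template itself would be rendered by the tokenizer (for example through `apply_chat_template` in Transformers).

```python
def render_without_tools(messages: list[dict], add_generation_prompt: bool = True) -> str:
    """Mirror the no-tools path of the chat template above."""
    parts = []
    for message in messages:
        role = message["role"]
        if role not in ("system", "user", "assistant"):
            # Tool messages and tool calls use branches not reproduced in this sketch.
            raise ValueError(f"unsupported role in this sketch: {role}")
        # The template emits a leading system message once before the loop;
        # user, assistant and later system messages share the same format.
        parts.append(f"<|im_start|>{role}\n{message['content']}<|im_end|>\n")
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = render_without_tools([
    {"role": "system", "content": "Solve the task strictly according to the instruction."},
    {"role": "user", "content": "2 + 2 = ?"},
])
print(prompt)
```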


Ratings by subcategory

USE
Metric: Grade norm
Task Result
1 0.567
2 0.533
3 0.8
4 0.2
5 0.167
6 0.633
7 0.033
8 -
9 0.133
10 0.067
11 0.1
12 0.033
13 0.267
14 0.067
15 0.133
16 0.417
17 0.067
18 0.033
19 0
20 0.067
21 0.067
22 0.733
23 0.4
24 0.267
25 0.3
26 0.758
8_0 0.367
8_1 0.3
8_2 0.567
8_3 0.567
8_4 0.567
ruHHH
Category Result
Honest 0.836
Helpful 0.814
Harmless 0.897
ruMMLU
Subject Result
Anatomy 0.681
Virology 0.536
Astronomy 0.868
Marketing 0.842
Nutrition 0.81
Sociology 0.836
Management 0.796
Philosophy 0.791
Prehistory 0.843
Human aging 0.762
Econometrics 0.64
Formal logic 0.579
Global facts 0.57
Jurisprudence 0.778
Miscellaneous 0.822
Moral disputes 0.769
Business ethics 0.82
Biology (college) 0.861
Physics (college) 0.644
Human sexuality 0.794
Moral scenarios 0.494
World religions 0.854
Abstract algebra 0.65
Medicine (college) 0.723
Machine learning 0.688
Medical genetics 0.82
Professional law 0.544
PR 0.62
Security studies 0.792
Chemistry (college) 0.56
Computer security 0.78
International law 0.843
Logical fallacies 0.773
Politics 0.879
Clinical knowledge 0.77
Conceptual physics 0.838
Math (college) 0.67
Biology (high school) 0.881
Physics (high school) 0.728
Chemistry (high school) 0.749
Geography (high school) 0.854
Professional medicine 0.794
Electrical engineering 0.731
Elementary mathematics 0.838
Psychology (high school) 0.863
Statistics (high school) 0.769
History (high school) 0.887
Math (high school) 0.622
Professional accounting 0.535
Professional psychology 0.734
Computer science (college) 0.73
World history (high school) 0.873
Macroeconomics 0.836
Microeconomics 0.853
Computer science (high school) 0.86
European history 0.812
Government and politics 0.886
ruDetox
Metric Result
SIM 0.615
FL 0.665
STA 0.803
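
SIM, FL and STA are the ruDetox components: meaning preservation, fluency and style transfer accuracy. Assuming, as in the RUSSE-2022 detoxification setup that we believe ruDetox follows, that the task score J is the mean over samples of the per-sample product STA × SIM × FL, J need not equal the product of the averages: here 0.615 × 0.665 × 0.803 ≈ 0.328, while the reported ruDetox score is 0.316. The toy data below is hypothetical and only illustrates the difference.

```python
from statistics import mean


def joint_score(sta: list[float], sim: list[float], fl: list[float]) -> float:
    """J: average of the per-sample products STA * SIM * FL."""
    if not (len(sta) == len(sim) == len(fl)):
        raise ValueError("all three score lists need one value per sample")
    return mean(a * b * c for a, b, c in zip(sta, sim, fl))


# Toy data: two samples, each strong on one component and weak on another.
sta = [1.0, 0.0]
sim = [0.0, 1.0]
fl = [1.0, 1.0]
print(joint_score(sta, sim, fl))         # 0.0
print(mean(sta) * mean(sim) * mean(fl))  # 0.25
```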
MaMuRAMu
Subject Result
Anatomy 0.711
Virology 0.901
Astronomy 0.75
Marketing 0.713
Nutrition 0.908
Sociology 0.793
Management 0.707
Philosophy 0.684
Prehistory 0.827
Gerontology 0.815
Econometrics 0.808
Formal logic 0.792
Global facts 0.558
Jurisprudence 0.783
Miscellaneous 0.772
Moral disputes 0.765
Business ethics 0.757
Biology (college) 0.822
Physics (college) 0.754
Human sexuality 0.842
Moral scenarios 0.807
World religions 0.898
Abstract algebra 0.889
Medicine (college) 0.858
Machine learning 0.844
Genetics 0.773
Professional law 0.808
PR 0.737
Security 0.93
Chemistry (college) 0.822
Computer security 0.844
International law 0.859
Logical fallacies 0.821
Politics 0.86
Clinical knowledge 0.727
Conceptual physics 0.821
Math (college) 0.933
Biology (high school) 0.8
Physics (high school) 0.772
Chemistry (high school) 0.754
Geography (high school) 0.861
Professional medicine 0.905
Electrical engineering 0.844
Elementary mathematics 1
Psychology (high school) 0.914
Statistics (high school) 0.911
History (high school) 0.931
Math (high school) 0.932
Professional accounting 0.877
Professional psychology 0.93
Computer science (college) 0.867
World history (high school) 0.826
Macroeconomics 0.873
Microeconomics 0.766
Computer science (high school) 0.628
European history 0.743
Government and politics 0.878
ruHateSpeech
Target group Result
Women 0.889
Men 0.8
LGBT 0.882
Nationalities 0.838
Migrants 0.857
Other 0.902