Meno-Tiny-0.1

НГУ Created at 07.12.2024 18:27
The overall result: 0.365
Place in the rating: 315
Weak tasks:
RWSD: 251
PARus: 386
RCB: 462
ruEthics: 343
MultiQ: 170
ruWorldTree: 326
ruOpenBookQA: 285
CheGeKa: 421
ruMMLU: 314
ruHateSpeech: 389
ruDetox: 83
ruHHH: 386
ruTiE: 277
ruHumanEval: 334
USE: 375
MathLogicQA: 250
ruMultiAr: 268
SimpleAr: 234
LCS: 358
BPS: 377
ruModAr: 330
MaMuRAMu: 306

Ratings for leaderboard tasks


Task name Result Metric
LCS 0.084 Accuracy
RCB 0.365 / 0.278 Accuracy / F1 macro
USE 0.069 Grade norm
RWSD 0.527 Accuracy
PARus 0.574 Accuracy
ruTiE 0.603 Accuracy
MultiQ 0.399 / 0.29 F1 / Exact match
CheGeKa 0.016 / 0.005 F1 / Exact match
ruModAr 0.375 Exact match
MaMuRAMu 0.536 Accuracy
ruMultiAr 0.231 Exact match
ruCodeEval 0 / 0 / 0 Pass@k
MathLogicQA 0.388 Accuracy
ruWorldTree 0.728 / 0.728 Accuracy / F1 macro
ruOpenBookQA 0.683 / 0.683 Accuracy / F1 macro
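
The page does not say how the overall result is derived from the leaderboard tasks. The following sketch assumes one plausible rule: average the metrics within each task, then average across the 15 leaderboard tasks. The rule itself is an assumption, although it does reproduce the reported 0.365 from the values in the table above.

```python
# Leaderboard task results copied from the table above.
# Tasks that report several metrics list all of them.
LEADERBOARD_RESULTS = {
    "LCS": [0.084],
    "RCB": [0.365, 0.278],
    "USE": [0.069],
    "RWSD": [0.527],
    "PARus": [0.574],
    "ruTiE": [0.603],
    "MultiQ": [0.399, 0.29],
    "CheGeKa": [0.016, 0.005],
    "ruModAr": [0.375],
    "MaMuRAMu": [0.536],
    "ruMultiAr": [0.231],
    "ruCodeEval": [0.0, 0.0, 0.0],
    "MathLogicQA": [0.388],
    "ruWorldTree": [0.728, 0.728],
    "ruOpenBookQA": [0.683, 0.683],
}


def overall_result(results):
    """Assumed rule: mean of per-task scores, where a task score is the
    mean of that task's metrics."""
    per_task = [sum(metrics) / len(metrics) for metrics in results.values()]
    return sum(per_task) / len(per_task)


score = overall_result(LEADERBOARD_RESULTS)
print(round(score, 3))  # 0.365
```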

Evaluation on open tasks:



Task name Result Metric
BPS 0.7 Accuracy
ruMMLU 0.453 Accuracy
SimpleAr 0.941 Exact match
ruHumanEval 0.001 / 0.003 / 0.006 Pass@k
ruHHH 0.517
ruHateSpeech 0.543
ruDetox 0.264
ruEthics
Criterion Correct Good Ethical
Virtue 0.157 0.264 0.181
Law 0.182 0.247 0.145
Moral 0.196 0.268 0.164
Justice 0.137 0.212 0.161
Utilitarianism 0.106 0.203 0.147
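
The Pass@k scores for ruHumanEval and ruCodeEval are reported as three values, presumably one per value of k. The page states neither the k values nor the number of samples per problem, so the sketch below only illustrates the commonly used unbiased pass@k estimator. The function name and the sample counts in the usage lines are illustrative assumptions, not the benchmark's settings.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: number of generated samples, c: number of samples that pass the tests,
    k: sample budget. Computes 1 - C(n - c, k) / C(n, k).
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: any draw of k contains a passing one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical problem: 10 samples generated, 1 of them passes.
print(pass_at_k(10, 1, 5))   # 0.5
print(pass_at_k(10, 0, 5))   # 0.0
```

The benchmark-level score would then be the mean of this estimate over all problems, which is why a model that almost never produces a passing sample ends up with values close to zero.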

Information about the submission:

MERA version: v.1.2.0
Torch version: 2.3.1
Codebase version: 30667dc
CUDA version: 11.8
Precision of the model weights: bfloat16
Seed: 1234
Batch size: 1
Transformers version: 4.46.3
Number of GPUs and their type: 1 x NVIDIA A100 80GB PCIe
Architecture: hf

Team:

НГУ

Name of the ML model:

Meno-Tiny-0.1

Model size

1.5B

Model type:

Open

SFT

Architecture description:

Meno-Tiny is a descendant of Qwen2.5 1.5B Instruct, fine-tuned on a special Russian instruction dataset. It is a decoder-only language model with 1.5B parameters, based on the Transformer architecture with SwiGLU activation, attention QKV bias, grouped-query attention, etc. The name "Meno" reflects the model's adaptation to answering questions about a text in a RAG pipeline (in honor of the theory of knowledge as recollection from the Socratic dialogue "Meno").

Description of the training:

The model was pretrained on a large amount of data, then post-trained with both supervised fine-tuning and direct preference optimization. The final training stage is supervised fine-tuning on an instruction dataset in Russian. This dataset includes common instructions (creativity, question answering) and special instructions for ASR correction, NER, text segmentation, etc.

Pretrain data:

The model was pretrained on a large amount of data in English, Chinese, and more than 27 additional languages, including Russian. The pretraining data had a context length of 32K tokens. The model was then fine-tuned on more than 200 thousand instructions in Russian, covering common tasks and questions (including questions about documents) and special tasks such as named entity recognition, speech-to-text error correction, etc. The fine-tuning dataset was ordered by complexity, and this order was used for curriculum learning (here, complexity was directly proportional to the loss of the pretrained model).
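
The loss-based ordering described above can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: `order_by_complexity`, `toy_loss`, and the toy instructions with their loss values are hypothetical, and the easy-to-hard (ascending loss) direction is an assumption, since the description only says the dataset was ordered by complexity.

```python
from typing import Callable, List, Sequence


def order_by_complexity(examples: Sequence[str],
                        loss_fn: Callable[[str], float]) -> List[str]:
    """Sort examples by the loss the pretrained model assigns to them.

    Lower loss is treated as lower complexity (assumed easy-to-hard order).
    The original index breaks ties so the ordering is deterministic.
    """
    scored = sorted(enumerate(examples),
                    key=lambda pair: (loss_fn(pair[1]), pair[0]))
    return [example for _, example in scored]


# Hypothetical per-example losses standing in for a real forward pass
# of the pretrained model over each instruction.
TOY_LOSSES = {
    "Answer a question about a short document": 0.9,
    "Write a two-line poem": 0.4,
    "Correct errors in an ASR transcript": 1.7,
}


def toy_loss(example: str) -> float:
    return TOY_LOSSES[example]


curriculum = order_by_complexity(list(TOY_LOSSES), toy_loss)
for step, example in enumerate(curriculum, start=1):
    print(step, example)
```

In a real setup, `loss_fn` would run the model being fine-tuned (before fine-tuning) over each instruction and return its average token loss.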

License:

Apache 2.0

Inference parameters

Generation Parameters:
simplear - do_sample=false;until=["\n"];
chegeka - do_sample=false;until=["\n"];
rudetox - do_sample=false;until=["\n"];
rumultiar - do_sample=false;until=["\n"];
use - do_sample=false;until=["\n","."];
multiq - do_sample=false;until=["\n"];
rumodar - do_sample=false;until=["\n"];
ruhumaneval - do_sample=true;until=["\nclass","\ndef","\n#","\nif","\nprint"];temperature=0.6;
rucodeeval - do_sample=true;until=["\nclass","\ndef","\n#","\nif","\nprint"];temperature=0.6;
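
To make the parameter strings above easier to read, here is a small sketch that parses one of them and applies the `until` stop sequences to a generated text. Both helper functions are hypothetical and not part of the evaluation codebase. The parser assumes that values are valid JSON and that no stop sequence contains a semicolon, which holds for the strings listed here.

```python
import json


def parse_generation_params(spec: str) -> dict:
    """Parse a 'key=value;key=value;' string, decoding each value as JSON."""
    params = {}
    for item in spec.split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, raw_value = item.partition("=")
        params[key.strip()] = json.loads(raw_value)
    return params


def truncate_at_stop(text: str, until: list) -> str:
    """Cut the text at the earliest occurrence of any stop sequence."""
    cut = len(text)
    for stop in until:
        position = text.find(stop)
        if position != -1:
            cut = min(cut, position)
    return text[:cut]


# Raw strings keep the backslash escapes exactly as listed above.
use_params = parse_generation_params(r'do_sample=false;until=["\n","."];')
code_params = parse_generation_params(
    r'do_sample=true;until=["\nclass","\ndef","\n#","\nif","\nprint"];temperature=0.6;'
)

print(use_params)
print(truncate_at_stop("42. Next sentence\nnext line", use_params["until"]))  # 42
```

Greedy decoding (`do_sample=false`) is used for the short-answer tasks, while the two code tasks sample at temperature 0.6 and stop as soon as the model starts a new top-level construct.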

The size of the context:
simplear, chegeka, rudetox, rumultiar, use, multiq, rumodar, ruhumaneval, rucodeeval - 32768

System prompt:
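Ты - Менон, разработанный Иваном Бондаренко. Ты полезный ассистент. Реши задачу по инструкции ниже.
(English: "You are Meno, developed by Ivan Bondarenko. You are a helpful assistant. Solve the task according to the instruction below.")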

Description of the template:
{%- if tools %}
    {{- '<|im_start|>system\n' }}
    {%- if messages[0]['role'] == 'system' %}
        {{- messages[0]['content'] }}
    {%- else %}
        {{- 'You are Meno, created by Ivan Bondarenko. You are a helpful assistant.' }}
    {%- endif %}
    {{- "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
    {%- for tool in tools %}
        {{- "\n" }}
        {{- tool | tojson }}
    {%- endfor %}
    {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
{%- else %}
    {%- if messages[0]['role'] == 'system' %}
        {{- '<|im_start|>system\n' + messages[0]['content'] + '<|im_end|>\n' }}
    {%- else %}
        {{- '<|im_start|>system\nYou are Meno, created by Ivan Bondarenko. You are a helpful assistant.<|im_end|>\n' }}
    {%- endif %}
{%- endif %}
{%- for message in messages %}
    {%- if (message.role == "user") or (message.role == "system" and not loop.first) or (message.role == "assistant" and not message.tool_calls) %}
        {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
    {%- elif message.role == "assistant" %}
        {{- '<|im_start|>' + message.role }}
        {%- if message.content %}
            {{- '\n' + message.content }}
        {%- endif %}
        {%- for tool_call in message.tool_calls %}
            {%- if tool_call.function is defined %}
                {%- set tool_call = tool_call.function %}
            {%- endif %}
            {{- '\n<tool_call>\n{"name": "' }}
            {{- tool_call.name }}
            {{- '", "arguments": ' }}
            {{- tool_call.arguments | tojson }}
            {{- '}\n</tool_call>' }}
        {%- endfor %}
        {{- '<|im_end|>\n' }}
    {%- elif message.role == "tool" %}
        {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != "tool") %}
            {{- '<|im_start|>user' }}
        {%- endif %}
        {{- '\n<tool_response>\n' }}
        {{- message.content }}
        {{- '\n</tool_response>' }}
        {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
            {{- '<|im_end|>\n' }}
        {%- endif %}
    {%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
    {{- '<|im_start|>assistant\n' }}
{%- endif %}
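
For readers unfamiliar with Jinja, the sketch below mirrors in plain Python what the template produces in the simplest case: no tools, no tool calls, and no tool responses. The function `render_chat` is a hypothetical helper written for illustration. It is not how the evaluation code renders prompts, and it covers only the plain system, user, and assistant branches.

```python
DEFAULT_SYSTEM = (
    "You are Meno, created by Ivan Bondarenko. You are a helpful assistant."
)


def render_chat(messages, add_generation_prompt=True):
    """Build a ChatML-style prompt for plain system/user/assistant messages."""
    parts = []
    if messages and messages[0]["role"] == "system":
        # A leading system message replaces the default system text.
        parts.append(
            "<|im_start|>system\n" + messages[0]["content"] + "<|im_end|>\n"
        )
        rest = messages[1:]
    else:
        parts.append("<|im_start|>system\n" + DEFAULT_SYSTEM + "<|im_end|>\n")
        rest = messages
    for message in rest:
        parts.append(
            "<|im_start|>" + message["role"] + "\n"
            + message["content"] + "<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave the assistant turn open so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = render_chat([
    {"role": "system", "content": "Ты - Менон, разработанный Иваном Бондаренко."},
    {"role": "user", "content": "2 + 2 = ?"},
])
print(prompt)
```

With the system prompt listed above passed as the first message, the default English system text in the template is never used.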


Ratings by subcategory

Metric: Grade Norm
Task Meno-Tiny-0.1, НГУ
1 0.067
2 0.067
3 0.433
4 0.2
5 0
6 0.1
7 0
8 -
9 0
10 0
11 0
12 0
13 0
14 0
15 0.1
16 0.083
17 0
18 0.033
19 0.033
20 0
21 0
22 0.067
23 0.033
24 0
25 0.1
26 0.1
8_0 0.067
8_1 0.1
8_2 0.267
8_3 0.033
8_4 0.067
Model, team Honest Helpful Harmless
Meno-Tiny-0.1, НГУ 0.541 0.508 0.5
Subject Meno-Tiny-0.1, НГУ
Anatomy 0.378
Virology 0.404
Astronomy 0.559
Marketing 0.705
Nutrition 0.526
Sociology 0.642
Management 0.612
Philosophy 0.502
Prehistory 0.432
Human aging 0.511
Econometrics 0.237
Formal logic 0.5
Global facts 0.32
Jurisprudence 0.63
Miscellaneous 0.516
Moral disputes 0.491
Business ethics 0.56
Biology (college) 0.34
Physics (college) 0.322
Human Sexuality 0.496
Moral scenarios 0.257
World religions 0.503
Abstract algebra 0.31
Medicine (college) 0.445
Machine learning 0.268
Medical genetics 0.5
Professional law 0.347
PR 0.509
Security studies 0.547
Chemistry (school) 0.45
Computer security 0.58
International law 0.736
Logical fallacies 0.54
Politics 0.707
Clinical knowledge 0.494
Conceptual physics 0.457
Math (college) 0.38
Biology (high school) 0.497
Physics (high school) 0.318
Chemistry (high school) 0.433
Geography (high school) 0.535
Professional medicine 0.349
Electrical engineering 0.428
Elementary mathematics 0.475
Psychology (high school) 0.582
Statistics (high school) 0.44
History (high school) 0.456
Math (high school) 0.367
Professional accounting 0.376
Professional psychology 0.374
Computer science (college) 0.43
World history (high school) 0.57
Macroeconomics 0.479
Microeconomics 0.496
Computer science (high school) 0.59
European history 0.648
Government and politics 0.492
Model, team SIM FL STA
Meno-Tiny-0.1, НГУ 0.645 0.597 0.732
Subject Meno-Tiny-0.1, НГУ
Anatomy 0.444
Virology 0.554
Astronomy 0.45
Marketing 0.546
Nutrition 0.513
Sociology 0.534
Management 0.414
Philosophy 0.491
Pre-History 0.385
Gerontology 0.508
Econometrics 0.654
Formal logic 0.558
Global facts 0.367
Jurisprudence 0.55
Miscellaneous 0.45
Moral disputes 0.444
Business ethics 0.589
Biology (college) 0.467
Physics (college) 0.386
Human sexuality 0.614
Moral scenarios 0.386
World religions 0.492
Abstract algebra 0.533
Medicine (college) 0.509
Machine Learning 0.578
Genetics 0.515
Professional law 0.628
PR 0.491
Security 0.754
Chemistry (college) 0.6
Computer security 0.756
International law 0.603
Logical fallacies 0.482
Politics 0.544
Clinical knowledge 0.53
Conceptual physics 0.5
Math (college) 0.533
Biology (high school) 0.6
Physics (high school) 0.509
Chemistry (high school) 0.508
Geography (high school) 0.547
Professional medicine 0.508
Electrical Engineering 0.667
Elementary mathematics 0.667
Psychology (high school) 0.776
Statistics (high school) 0.822
History (high school) 0.534
Math (high school) 0.591
Professional Accounting 0.738
Professional psychology 0.754
Computer science (college) 0.689
World history (high school) 0.377
Macroeconomics 0.658
Microeconomics 0.468
Computer science (high school) 0.442
European history 0.368
Government and politics 0.644
Model, team Criterion Virtue Law Moral Justice Utilitarianism
Meno-Tiny-0.1, НГУ Correct 0.157 0.182 0.196 0.137 0.106
Meno-Tiny-0.1, НГУ Good 0.264 0.247 0.268 0.212 0.203
Meno-Tiny-0.1, НГУ Ethical 0.181 0.145 0.164 0.161 0.147
Model, team Women Men LGBT Nationalities Migrants Other
Meno-Tiny-0.1, НГУ 0.583 0.571 0.588 0.568 0.143 0.475