Cotype Pro 2.5

Team: MWS AI
Created: 21.10.2025 11:45
Overall result: 0.671
Place in the rating: 35

Ratings for leaderboard tasks

Task name      Result                  Metric
LCS            0.414                   Accuracy
RCB            0.584 / 0.572           Accuracy / F1 macro
USE            0.358                   Grade norm
RWSD           0.577                   Accuracy
PARus          0.912                   Accuracy
ruTiE          0.873                   Accuracy
MultiQ         0.656 / 0.494           F1 / Exact match
CheGeKa        0.201 / 0.154           F1 / Exact match
ruModAr        0.933                   Exact match
MaMuRAMu       0.829                   Accuracy
ruMultiAr      0.427                   Exact match
ruCodeEval     0.652 / 0.816 / 0.848   Pass@k
MathLogicQA    0.71                    Accuracy
ruWorldTree    0.985 / 0.985           Accuracy / F1 macro
ruOpenBookQA   0.94 / 0.94             Accuracy / F1 macro
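
The overall result of 0.671 appears consistent with a simple aggregation of the leaderboard tasks listed above. The sketch below is an assumption, not the benchmark's documented procedure: it averages the metrics within each task first, then averages across the 15 tasks. The names `TASK_SCORES` and `overall_score` are hypothetical.

```python
from statistics import mean

# Scores copied from the leaderboard task table; tasks that report
# several metrics list all of them.
TASK_SCORES = {
    "LCS": [0.414],
    "RCB": [0.584, 0.572],
    "USE": [0.358],
    "RWSD": [0.577],
    "PARus": [0.912],
    "ruTiE": [0.873],
    "MultiQ": [0.656, 0.494],
    "CheGeKa": [0.201, 0.154],
    "ruModAr": [0.933],
    "MaMuRAMu": [0.829],
    "ruMultiAr": [0.427],
    "ruCodeEval": [0.652, 0.816, 0.848],
    "MathLogicQA": [0.71],
    "ruWorldTree": [0.985, 0.985],
    "ruOpenBookQA": [0.94, 0.94],
}


def overall_score(task_scores):
    """Average metrics within each task, then average across tasks (assumed scheme)."""
    return mean(mean(values) for values in task_scores.values())


print(round(overall_score(TASK_SCORES), 3))  # 0.671
```

Under this assumed scheme the rounded value matches the reported overall result, which suggests the open tasks listed further down do not enter the total.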

Evaluation on open tasks:

Task name      Result                  Metric
BPS            0.995                   Accuracy
ruMMLU         0.905                   Accuracy
SimpleAr       0.993                   Exact match
ruHumanEval    0.647 / 0.809 / 0.841   Pass@k
ruHHH          0.843
ruHateSpeech   0.811
ruDetox        0.304

ruEthics
                 Correct   Good    Ethical
Virtue           0.271     0.266   0.353
Law              0.258     0.254   0.336
Moral            0.273     0.278   0.378
Justice          0.225     0.221   0.323
Utilitarianism   0.218     0.221   0.313
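
The ruHumanEval and ruCodeEval rows report three Pass@k values each. The page does not state which k values are used or how the metric is computed; the sketch below shows the widely used unbiased pass@k estimator as an assumption about what these numbers represent. The function name `pass_at_k` is hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: number of generated samples, c: number of samples that pass the tests,
    k: sample budget. Returns the probability that at least one of k samples
    drawn without replacement from the n generations is correct.
    """
    if n - c < k:
        # Fewer than k failing samples exist, so any draw of k contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Example: 10 generations per problem, 3 of which pass.
print(pass_at_k(10, 3, 1))
print(pass_at_k(10, 3, 5))
```

The benchmark-level score would then be the mean of this estimate over all problems, which is why the value grows with k.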

Information about the submission

MERA version: v1.2.0
Torch version: 2.7.0
Codebase version: 9e50515
CUDA version: 12.6
Precision of the model weights: bfloat16
Seed: 1234
Batch: 1
Transformers version: 4.53.1
Number and type of GPUs: 4 x NVIDIA A100-SXM4-40GB
Architecture: vllm

Team:

MWS AI

Name of the ML model:

Cotype Pro 2.5

Model size

32.5B

Model type:

Closed

SFT

Architecture description:

Cotype Pro 2.5 is a large language model for building AI assistants with advanced agentic skills and extended capabilities for integration with corporate knowledge bases. It effectively generates ideas; extracts, classifies, and summarizes information; works with computer code; calls functions; and plans task execution.

License:

Proprietary model from MWS AI

Inference parameters

Generation Parameters:
simplear - do_sample=false;until=["\n"];
chegeka - do_sample=false;until=["\n"];
rudetox - do_sample=false;until=["\n"];
rumultiar - do_sample=false;until=["\n"];
use - do_sample=false;until=["\n","."];
multiq - do_sample=false;until=["\n"];
rumodar - do_sample=false;until=["\n"];
ruhumaneval - do_sample=true;temperature=0.6;until=["\nclass","\ndef","\n#","\nif","\nprint"];
rucodeeval - do_sample=true;temperature=0.6;until=["\nclass","\ndef","\n#","\nif","\nprint"];
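
The `until` lists above act as stop sequences: generation ends at the first occurrence of any listed string. A minimal sketch of that truncation follows, assuming the stop strings are newline-based and that the stop string itself is not kept in the output. The helper `truncate_at_stop` is hypothetical and is not part of the evaluation codebase.

```python
def truncate_at_stop(text: str, stop_sequences: list[str]) -> str:
    """Cut text at the earliest occurrence of any stop sequence."""
    cut = len(text)
    for stop in stop_sequences:
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


# Short-answer tasks stop at the first newline.
print(repr(truncate_at_stop("42\nExplanation follows", ["\n"])))

# Code tasks stop when a new top-level construct begins.
CODE_STOPS = ["\nclass", "\ndef", "\n#", "\nif", "\nprint"]
completion = "\n    return a + b\n\ndef other():\n    pass"
print(repr(truncate_at_stop(completion, CODE_STOPS)))
```

With greedy decoding (`do_sample=false`) the short-answer tasks are deterministic, while the two code tasks sample at temperature 0.6 so that several distinct completions per problem are available for Pass@k.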

System prompt:
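Решай задачу строго по инструкции. Только ответ, без объяснений. Числовой ответ — только число. Буква, цифра или слово — только их. Выбор варианта ответа — одна буква или цифра. Ответ должен быть точным, без лишних символов или слов. В случае, если нужно сгенерировать код на Python — твоим ответом должен быть только код (продолжения кода из инструкции), не повторяй имя функции, не давай пояснений, не пиши комментариев, не используй input, пиши код так, чтобы он дополнял функцию из инструкции (с соответствующими отступами) и всегда начинай написание кода с переноса строки!

English translation: Solve the task strictly according to the instructions. Give only the answer, without explanations. For a numeric answer, give only the number. For a letter, digit, or word, give only that. When choosing an answer option, give a single letter or digit. The answer must be exact, with no extra characters or words. If you need to generate Python code, your answer must be only code (a continuation of the code from the instructions): do not repeat the function name, do not give explanations, do not write comments, do not use input, write the code so that it completes the function from the instructions (with the appropriate indentation), and always start the code with a line break!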

Description of the template:
{%- if tools %}
    {{- '<|im_start|>system\n' }}
    {%- if messages[0]['role'] == 'system' %}
        {{- messages[0]['content'] }}
    {%- else %}
        {{- 'Ты — ИИ-помощник. Тебе дано задание: необходимо сгенерировать подробный и развернутый ответ.' }}
    {%- endif %}
    {{- "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
    {%- for tool in tools %}
        {{- "\n" }}
        {{- tool | tojson }}
    {%- endfor %}
    {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
{%- else %}
    {%- if messages[0]['role'] == 'system' %}
        {{- '<|im_start|>system\n' + messages[0]['content'] + '<|im_end|>\n' }}
    {%- else %}
        {{- '<|im_start|>system\nТы — ИИ-помощник. Тебе дано задание: необходимо сгенерировать подробный и развернутый ответ.<|im_end|>\n' }}
    {%- endif %}
{%- endif %}
{%- for message in messages %}
    {%- if (message.role == "user") or (message.role == "system" and not loop.first) or (message.role == "assistant" and not message.tool_calls) %}
        {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
    {%- elif message.role == "assistant" %}
        {{- '<|im_start|>' + message.role }}
        {%- if message.content %}
            {{- '\n' + message.content }}
        {%- endif %}
        {%- for tool_call in message.tool_calls %}
            {%- if tool_call.function is defined %}
                {%- set tool_call = tool_call.function %}
            {%- endif %}
            {{- '\n<tool_call>\n{"name": "' }}
            {{- tool_call.name }}
            {{- '", "arguments": ' }}
            {{- tool_call.arguments | tojson }}
            {{- '}\n</tool_call>' }}
        {%- endfor %}
        {{- '<|im_end|>\n' }}
    {%- elif message.role == "tool" %}
        {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != "tool") %}
            {{- '<|im_start|>user' }}
        {%- endif %}
        {{- '\n<tool_response>\n' }}
        {{- message.content }}
        {{- '\n</tool_response>' }}
        {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
            {{- '<|im_end|>\n' }}
        {%- endif %}
    {%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
    {{- '<|im_start|>assistant\n' }}
{%- endif %}

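For conversations without tools, the template above reduces to a simple layout: one system block (the first message if it is a system message, otherwise the default system text), then each remaining message wrapped in `<|im_start|>` and `<|im_end|>`, then an optional assistant header. The sketch below mirrors only that tool-free path in plain Python, reading the newline escapes in the template as ordinary newlines. The function `render_chat` and its parameters are hypothetical and do not cover tool calls or tool responses.

```python
def render_chat(messages, default_system, add_generation_prompt=True):
    """Render plain (tool-free) messages in the <|im_start|>/<|im_end|> layout."""
    has_system = bool(messages) and messages[0]["role"] == "system"
    system_text = messages[0]["content"] if has_system else default_system
    parts = [f"<|im_start|>system\n{system_text}<|im_end|>\n"]
    for index, message in enumerate(messages):
        if index == 0 and has_system:
            continue  # the leading system message was already emitted above
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = render_chat(
    [{"role": "user", "content": "2 + 2 = ?"}],
    default_system="You are an AI assistant.",  # placeholder, not the real default text
)
print(prompt)
```

Because the evaluation supplies its own system prompt, the default system text in the template would only apply to requests that omit a system message.
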
Ratings by subcategory

Metric: Grade Norm
Model, team: Cotype Pro 2.5, MWS AI
Subtask   Result
1         0.4
2         0.533
3         0.867
4         0.3
5         0.233
6         0.633
7         0.2
8         -
9         0.233
10        0.033
11        0.067
12        0.033
13        0.267
14        0.1
15        0.033
16        0.183
17        0.067
18        0.133
19        0
20        0.1
21        0.2
22        0.6
23        0.4
24        0.267
25        0.4
26        0.75
8_0       0.367
8_1       0.367
8_2       0.633
8_3       0.7
8_4       0.633
Model, team              Honest   Helpful   Harmless
Cotype Pro 2.5, MWS AI   0.869    0.78      0.879
Model, team: Cotype Pro 2.5, MWS AI
Subject                          Result
Anatomy                          0.911
Virology                         0.747
Astronomy                        1
Marketing                        0.957
Nutrition                        0.954
Sociology                        0.95
Management                       0.883
Philosophy                       0.929
Prehistory                       0.957
Human aging                      0.915
Econometrics                     0.895
Formal logic                     0.873
Global facts                     0.86
Jurisprudence                    0.926
Miscellaneous                    0.931
Moral disputes                   0.91
Business ethics                  0.88
Biology (college)                0.965
Physics (college)                0.889
Human sexuality                  0.931
Moral scenarios                  0.823
World religions                  0.942
Abstract algebra                 0.91
Medicine (college)               0.948
Machine learning                 0.938
Medical genetics                 0.98
Professional law                 0.75
PR                               0.898
Security studies                 0.935
Chemistry (college)              0.79
Computer security                0.91
International law                0.967
Logical fallacies                0.896
Politics                         0.97
Clinical knowledge               0.947
Conceptual physics               0.94
Math (college)                   0.93
Biology (high school)            0.977
Physics (high school)            0.914
Chemistry (high school)          0.921
Geography (high school)          0.97
Professional medicine            0.963
Electrical engineering           0.848
Elementary mathematics           0.95
Psychology (high school)         0.961
Statistics (high school)         0.931
History (high school)            0.99
Math (high school)               0.911
Professional accounting          0.858
Professional psychology          0.923
Computer science (college)       0.88
World history (high school)      0.97
Macroeconomics                   0.941
Microeconomics                   0.971
Computer science (high school)   0.98
European history                 0.952
Government and politics          0.943
Model, team              SIM     FL    STA
Cotype Pro 2.5, MWS AI   0.695   0.7   0.673
Model, team: Cotype Pro 2.5, MWS AI
Subject                          Result
Anatomy                          0.622
Virology                         0.901
Astronomy                        0.783
Marketing                        0.694
Nutrition                        0.921
Sociology                        0.845
Management                       0.741
Philosophy                       0.754
Pre-History                      0.885
Gerontology                      0.785
Econometrics                     0.872
Formal logic                     0.825
Global facts                     0.575
Jurisprudence                    0.783
Miscellaneous                    0.789
Moral disputes                   0.778
Business ethics                  0.813
Biology (college)                0.8
Physics (college)                0.754
Human sexuality                  0.877
Moral scenarios                  0.895
World religions                  0.915
Abstract algebra                 0.889
Medicine (college)               0.864
Machine Learning                 0.867
Genetics                         0.924
Professional law                 0.795
PR                               0.684
Security                         0.947
Chemistry (college)              0.844
Computer security                0.844
International law                0.936
Logical fallacies                0.866
Politics                         0.93
Clinical knowledge               0.727
Conceptual physics               0.821
Math (college)                   0.911
Biology (high school)            0.889
Physics (high school)            0.754
Chemistry (high school)          0.738
Geography (high school)          0.869
Professional medicine            0.873
Electrical Engineering           0.844
Elementary mathematics           1
Psychology (high school)         0.897
Statistics (high school)         0.911
History (high school)            0.897
Math (high school)               0.955
Professional Accounting          0.877
Professional psychology          0.947
Computer science (college)       0.911
World history (high school)      0.884
Macroeconomics                   0.861
Microeconomics                   0.779
Computer science (high school)   0.628
Europe History                   0.76
Government and politics          0.878
Model, team: Cotype Pro 2.5, MWS AI
          Virtue   Law     Moral   Justice   Utilitarianism
Correct   0.271    0.258   0.273   0.225     0.218
Good      0.266    0.254   0.278   0.221     0.221
Ethical   0.353    0.336   0.378   0.323     0.313
Model, team              Women   Men     LGBT    Nationalities   Migrants   Other
Cotype Pro 2.5, MWS AI   0.87    0.686   0.765   0.757           0.857      0.82