Qwen2.5-1.5B-Instruct

MERA Created at 04.03.2026 14:13

0.178

The overall result

Ratings for leaderboard tasks

Task name	Result	Metric
YABLoCo	0.043 / 0.01	EM / pass@k
stRuCom	0.16	chrF
RealCode	0.004 / 0.955	pass@k / execution_success
UnitTests	0.088	CodeBLEU
ruCodeEval	0.006 / 0.026 / 0.043	pass@k
JavaTestGen	0.044 / 0.273	pass@k / compile@1
ruHumanEval	0.007 / 0.024 / 0.037	pass@k
RealCodeJava	0.087 / 0.973	pass@k / execution_success
CodeLinterEval	0.403 / 0.566 / 0.6	pass@k
ruCodeReviewer	0.014 / 0.123 / 0 / 0 / 0	chrF / BLEU / judge@1 / judge@5 / judge@10
CodeCorrectness	0.837	EM
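
Several rows above report pass@k, but this page does not say how the metric is estimated or which values of k the columns correspond to. As a point of reference only, a widely used unbiased estimator generates n samples per problem, counts the c samples that pass, and computes 1 - C(n-c, k) / C(n, k). The sketch below implements that formula; whether MERA computes pass@k exactly this way is an assumption, and the function name `pass_at_k` is illustrative.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: samples generated, c: samples that passed, k: sampling budget.
    Assumption: this is the common estimator, not necessarily MERA's code.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # Probability that a random size-k subset of the n samples contains
    # no passing sample. math.comb returns 0 when k > n - c, so the
    # estimate becomes 1.0 whenever a failure-only subset cannot exist.
    return 1.0 - comb(n - c, k) / comb(n, k)
```

The benchmark-level score would then be the mean of this value over all problems in the task.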

Information about the submission

Mera version

v1.0.0

Torch Version

2.9.1

The version of the codebase

6aae2a5

CUDA version

12.8

Precision of the model weights

bfloat16

Seed

1234

Batch

Transformers version

4.57.6

The number of GPUs and their type

1 x NVIDIA A100-SXM4-80GB

Architecture

vllm

Team:

MERA

Name of the ML model:

Qwen2.5-1.5B-Instruct

Link to the ML model:

https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct

Model size

1.5B

Model type:

Open

SFT

Additional links:

https://qwenlm.github.io/blog/qwen2.5/ https://arxiv.org/pdf/2412.15115

Architecture description:

Qwen 2.5 is the new generation of the Qwen model series. Compared with the previous generation, it has significantly more knowledge and greatly improved capabilities in coding and mathematics.

Description of the training:

The Qwen 2.5 pre-training process consists of several key components. First, the authors carefully curate high-quality training data through sophisticated filtering and scoring mechanisms, combined with a strategic data mixture. Second, they conduct extensive research on hyperparameter optimization to effectively train models at various scales. Finally, they incorporate specialized long-context pre-training to enhance the model’s ability to process and understand extended sequences. Pre-training is followed by SFT and multistage reinforcement learning.

Pretrain data:

Pre-training: high-quality datasets totaling 18 trillion tokens. SFT: over 1 million samples.

License:

apache-2.0

Inference parameters

Generation Parameters:
rucodeeval - do_sample=true;temperature=0.6;max_gen_toks=1024;until=["\nclass","\ndef","\n#","\nif","\nprint"];
codelintereval - do_sample=true;temperature=0.6;max_gen_toks=1024;until=["\n\n"];
rucodereviewer - temperature=0;do_sample=false;max_gen_toks=1000;until=["\n\n"];
ruhumaneval - do_sample=true;temperature=0.6;max_gen_toks=1024;until=["\nclass","\ndef","\n#","\nif","\nprint"];
strucom - do_sample=false;max_gen_toks=512;until=["\n\n"];
unittests - do_sample=false;max_gen_toks=1024;until=["\n\n"];
codecorrectness - until=["\n\n"];do_sample=false;temperature=0;
realcode - do_sample=true;max_gen_toks=4096;temperature=0.7;repetition_penalty=1.05;top_p=0.8;until=["<|endoftext|>","<|im_end|>"];
realcodejava - do_sample=true;max_gen_toks=4096;temperature=0.7;repetition_penalty=1.05;top_p=0.8;until=["<|endoftext|>","<|im_end|>"];
javatestgen - do_sample=true;max_gen_toks=4096;temperature=0.2;top_p=0.9;until=["<|endoftext|>","<|im_end|>"];
yabloco_oracle - max_gen_toks=2048;do_sample=false;until=["<|endoftext|>","<|im_end|>","\n\n\n","\\sclass\\s","\\sdef\\s","^def\\s","^class\\s","^if\\s","@","^#"];
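
The per-task settings above are `key=value` pairs separated by semicolons, and every value happens to be a valid JSON literal (`true`, a number, or a list of quoted stop strings). A minimal sketch of turning one such string into a typed dictionary follows; the helper name `parse_generation_params` is hypothetical and is not part of MERA or any evaluation harness.

```python
import json


def parse_generation_params(spec: str) -> dict:
    """Parse 'key=value;key=value;' into a dict with typed values.

    Assumption: no value contains a ';' (true for the strings on this page).
    """
    params = {}
    for item in spec.split(";"):
        item = item.strip()
        if not item:
            continue  # skip the empty piece after the trailing ';'
        key, _, raw = item.partition("=")
        try:
            # true/false, numbers and ["..."] lists are all JSON literals.
            params[key] = json.loads(raw)
        except json.JSONDecodeError:
            params[key] = raw  # keep anything else as a plain string
    return params


# The rucodeeval settings, copied from the list above.
spec = r'do_sample=true;temperature=0.6;max_gen_toks=1024;until=["\nclass","\ndef","\n#","\nif","\nprint"];'
params = parse_generation_params(spec)
```

Decoding through JSON also turns the escaped `\n` in each stop string into a real newline, which is the form a generation backend would normally compare against.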

The size of the context:
rucodeeval, codelintereval, rucodereviewer, ruhumaneval, strucom, unittests, codecorrectness, realcode, realcodejava, javatestgen, yabloco_oracle - 32768

Description of the template:
{%- if tools %}
    {{- '<|im_start|>system\n' }}
    {%- if messages[0]['role'] == 'system' %}
        {{- messages[0]['content'] }}
    {%- else %}
        {{- 'You are Qwen, created by Alibaba Cloud. You are a helpful assistant.' }}
    {%- endif %}
    {{- "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
    {%- for tool in tools %}
        {{- "\n" }}
        {{- tool | tojson }}
    {%- endfor %}
    {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
{%- else %}
    {%- if messages[0]['role'] == 'system' %}
        {{- '<|im_start|>system\n' + messages[0]['content'] + '<|im_end|>\n' }}
    {%- else %}
        {{- '<|im_start|>system\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\n' }}
    {%- endif %}
{%- endif %}
{%- for message in messages %}
    {%- if (message.role == "user") or (message.role == "system" and not loop.first) or (message.role == "assistant" and not message.tool_calls) %}
        {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
    {%- elif message.role == "assistant" %}
        {{- '<|im_start|>' + message.role }}
        {%- if message.content %}
            {{- '\n' + message.content }}
        {%- endif %}
        {%- for tool_call in message.tool_calls %}
            {%- if tool_call.function is defined %}
                {%- set tool_call = tool_call.function %}
            {%- endif %}
            {{- '\n<tool_call>\n{"name": "' }}
            {{- tool_call.name }}
            {{- '", "arguments": ' }}
            {{- tool_call.arguments | tojson }}
            {{- '}\n</tool_call>' }}
        {%- endfor %}
        {{- '<|im_end|>\n' }}
    {%- elif message.role == "tool" %}
        {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != "tool") %}
            {{- '<|im_start|>user' }}
        {%- endif %}
        {{- '\n<tool_response>\n' }}
        {{- message.content }}
        {{- '\n</tool_response>' }}
        {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
            {{- '<|im_end|>\n' }}
        {%- endif %}
    {%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
    {{- '<|im_start|>assistant\n' }}
{%- endif %}
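
To make the template's effect concrete, the sketch below re-implements only its no-tools path in plain Python: a system header (the first message if it is a system message, otherwise the default Qwen system prompt), then one `<|im_start|>role ... <|im_end|>` block per remaining message, then an optional assistant prompt. It is a simplified illustration rather than a replacement for the template: tool definitions, tool calls, and tool responses are deliberately left out, and the function name `render_chat` is hypothetical.

```python
DEFAULT_SYSTEM = "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."


def render_chat(messages, add_generation_prompt=True):
    """Render system/user/assistant messages the way the no-tools branch does.

    Assumption: no message carries tool calls and no message has role "tool".
    """
    if messages and messages[0]["role"] == "system":
        system, rest = messages[0]["content"], messages[1:]
    else:
        system, rest = DEFAULT_SYSTEM, messages
    parts = [f"<|im_start|>system\n{system}<|im_end|>\n"]
    for m in rest:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Leaves the assistant turn open so the model writes its reply next.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = render_chat([{"role": "user", "content": "Hi"}])
```

In practice the template is usually applied through the tokenizer's `apply_chat_template` method in Transformers, which evaluates the Jinja source directly.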
