Kimi-VL-A3B-Instruct

MERA Created at 28.03.2026 14:57

Ratings for leaderboard tasks

Board   Result  Attempted Score  Coverage  Place in the rating
Multi   0.037   0.122            0.303     65
Images  0.111   0.122            0.909     43
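
The two leaderboard rows are numerically consistent with Result being the product of Attempted Score and Coverage. The page does not state this formula, so the relationship checked below is an assumption inferred from the reported figures, and the helper name `implied_result` is hypothetical.

```python
# Figures copied from the leaderboard table: (result, attempted_score, coverage).
ROWS = {
    "Multi": (0.037, 0.122, 0.303),
    "Images": (0.111, 0.122, 0.909),
}


def implied_result(attempted_score: float, coverage: float) -> float:
    """Assumed relationship (not stated on the page): result = attempted * coverage."""
    return attempted_score * coverage


def is_consistent(result: float, attempted_score: float, coverage: float) -> bool:
    # The table reports three decimals, so allow half a unit in the last place.
    return abs(implied_result(attempted_score, coverage) - result) < 5e-4


if __name__ == "__main__":
    for board, (result, attempted, coverage) in ROWS.items():
        print(board, round(implied_result(attempted, coverage), 3), is_consistent(result, attempted, coverage))
```

Under this reading, the low Multi result would reflect low coverage (0.303) rather than a lower score on the attempted tasks, since both boards report the same Attempted Score.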

Tasks

Task Modality Result Metric
0.269  EM / JudgeScore
0.132  EM / JudgeScore
0.156  EM / JudgeScore
0.05  EM / JudgeScore
0.003  EM / JudgeScore
0.223  EM / JudgeScore
0.076  EM / JudgeScore
0.044  EM / JudgeScore
    culture  0 / 0.073
    business  0.002 / 0.11
    medicine  0.003 / 0.086
    social_sciences  0.007 / 0.136
    fundamental_sciences  0 / 0.066
    applied_sciences  0 / 0.102
0.098  EM / JudgeScore
    biology  0.003 / 0.239
    chemistry  0.002 / 0.175
    physics  0.003 / 0.251
    economics  0.01 / 0.195
    ru  0.008 / 0.144
    all  0 / 0.156
0.168  EM / JudgeScore
    biology  0 / 0.228
    chemistry  0 / 0.299
    physics  0.005 / 0.359
    science  0 / 0.415

Information about the submission

MERA version: v1.0.0
Torch version: 2.8.0
Codebase version: eea0c30
CUDA version: 12.8
Model weights precision: bfloat16
Seed: 1234
Batch size: 1
Transformers version: 4.57.1
GPUs (count and type): 4 x NVIDIA A100-SXM4-80GB
Architecture: vllm-vlm

Team:

MERA

Name of the ML model:

Kimi-VL-A3B-Instruct

Model size:

16.0B

Model type:

Open

SFT

Additional links:

https://arxiv.org/abs/2504.07491

Architecture description:

The model adopts an MoE language model, a native-resolution visual encoder (MoonViT), and an MLP projector.

License:

MIT License

Inference parameters

Generation Parameters:
labtabvqa - until=["\n\n"];do_sample=false;temperature=0;
realvqa - until=["\n\n"];do_sample=false;temperature=0;
ruclevr - until=["\n\n"];do_sample=false;temperature=0;
rucommonvqa - until=["\n\n"];do_sample=false;temperature=0;
ruhhh_image - until=["\n\n"];do_sample=false;temperature=0;
rumathvqa - until=["\n\n"];do_sample=false;temperature=0;
runaturalsciencevqa_biology - until=["\n\n"];do_sample=false;temperature=0;
runaturalsciencevqa_chemistry - until=["\n\n"];do_sample=false;temperature=0;
runaturalsciencevqa_earth_science - until=["\n\n"];do_sample=false;temperature=0;
runaturalsciencevqa_physics - until=["\n\n"];do_sample=false;temperature=0;
schoolsciencevqa_biology - until=["\n\n"];do_sample=false;temperature=0;
schoolsciencevqa_chemistry - until=["\n\n"];do_sample=false;temperature=0;
schoolsciencevqa_earth_science - until=["\n\n"];do_sample=false;temperature=0;
schoolsciencevqa_economics - until=["\n\n"];do_sample=false;temperature=0;
schoolsciencevqa_history_all - until=["\n\n"];do_sample=false;temperature=0;
schoolsciencevqa_history_ru - until=["\n\n"];do_sample=false;temperature=0;
schoolsciencevqa_physics - until=["\n\n"];do_sample=false;temperature=0;
unisciencevqa_applied_sciences - until=["\n\n"];do_sample=false;temperature=0;
unisciencevqa_business - until=["\n\n"];do_sample=false;temperature=0;
unisciencevqa_cultural_studies - until=["\n\n"];do_sample=false;temperature=0;
unisciencevqa_fundamental_sciences - until=["\n\n"];do_sample=false;temperature=0;
unisciencevqa_health_and_medicine - until=["\n\n"];do_sample=false;temperature=0;
unisciencevqa_social_sciences - until=["\n\n"];do_sample=false;temperature=0;
weird - until=["\n\n"];do_sample=false;temperature=0;
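
Every task uses the same three settings: sampling is off, temperature is 0, and generation stops at the first blank line (`"\n\n"`). A minimal sketch of turning one `task - key=value;...` entry into a Python dictionary is shown below. The function name `parse_generation_params` is hypothetical, and the parser assumes each value is valid JSON, which holds for the entries listed here.

```python
import json


def parse_generation_params(entry: str) -> tuple[str, dict]:
    """Split 'task - key=value;key=value;' into (task, params).

    Assumption: every value is a JSON literal (list, boolean, or number).
    """
    task, _, raw = entry.partition(" - ")
    params = {}
    for pair in raw.strip().split(";"):
        if not pair:
            continue  # skip the empty piece after the trailing ';'
        key, _, value = pair.partition("=")
        params[key.strip()] = json.loads(value)
    return task.strip(), params


if __name__ == "__main__":
    # Raw string: the backslash-n sequences are literal characters, as on the page.
    entry = r'labtabvqa - until=["\n\n"];do_sample=false;temperature=0;'
    print(parse_generation_params(entry))
```

Decoding the value with `json.loads` turns the escaped `\n\n` into two real newline characters, which is the form a stop-sequence argument would normally expect.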

Context size:
labtabvqa, realvqa, ruclevr, rucommonvqa, ruhhh_image, rumathvqa, runaturalsciencevqa_biology, runaturalsciencevqa_chemistry, runaturalsciencevqa_earth_science, runaturalsciencevqa_physics, schoolsciencevqa_biology, schoolsciencevqa_chemistry, schoolsciencevqa_earth_science, schoolsciencevqa_economics, schoolsciencevqa_history_all, schoolsciencevqa_history_ru, schoolsciencevqa_physics, unisciencevqa_applied_sciences, unisciencevqa_business, unisciencevqa_cultural_studies, unisciencevqa_fundamental_sciences, unisciencevqa_health_and_medicine, unisciencevqa_social_sciences, weird - 4096

Description of the template:
{%- for message in messages -%}
  {%- if loop.first and messages[0]['role'] != 'system' -%}
    {{'<|im_system|>system<|im_middle|>You are a helpful assistant<|im_end|>'}}
  {%- endif -%}
  {%- if message['role'] == 'system' -%}
    {{'<|im_system|>'}}
  {%- endif -%}
  {%- if message['role'] == 'user' -%}
    {{'<|im_user|>'}}
  {%- endif -%}
  {%- if message['role'] == 'assistant' -%}
    {{'<|im_assistant|>'}}
  {%- endif -%}
  {{- message['role'] -}}
  {{'<|im_middle|>'}}
  {%- if message['content'] is string -%}
    {{- message['content'] + '<|im_end|>' -}}
  {%- else -%}
    {%- for content in message['content'] -%}
      {%- if content['type'] == 'image' or 'image' in content or 'image_url' in content -%}
        {{'<|media_start|>image<|media_content|><|media_pad|><|media_end|>'}}
      {%- else -%}
        {{content['text']}}
      {%- endif -%}
    {%- endfor -%}
    {{'<|im_end|>'}}
  {%- endif -%}
{%- endfor -%}
{%- if add_generation_prompt -%}
  {{'<|im_assistant|>assistant<|im_middle|>'}}
{%- endif -%}
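
To make the template's behaviour easier to follow, here is a plain-Python sketch that mirrors its logic step by step. It is an illustrative re-implementation and not the code used in the evaluation; the function name `render_prompt` and the sample message are hypothetical, and it assumes the whitespace-control markers strip all whitespace between tags, so the pieces are simply concatenated.

```python
ROLE_TOKENS = {
    "system": "<|im_system|>",
    "user": "<|im_user|>",
    "assistant": "<|im_assistant|>",
}
DEFAULT_SYSTEM = "<|im_system|>system<|im_middle|>You are a helpful assistant<|im_end|>"
IMAGE_SLOT = "<|media_start|>image<|media_content|><|media_pad|><|media_end|>"


def render_prompt(messages: list[dict], add_generation_prompt: bool = True) -> str:
    """Mirror of the chat template above (sketch, not the official renderer)."""
    parts = []
    for index, message in enumerate(messages):
        # A default system turn is injected when the conversation has none.
        if index == 0 and message["role"] != "system":
            parts.append(DEFAULT_SYSTEM)
        # Role marker (empty for unknown roles), role name, then the separator.
        parts.append(ROLE_TOKENS.get(message["role"], ""))
        parts.append(message["role"])
        parts.append("<|im_middle|>")
        content = message["content"]
        if isinstance(content, str):
            parts.append(content + "<|im_end|>")
        else:
            for item in content:
                is_image = item.get("type") == "image" or "image" in item or "image_url" in item
                parts.append(IMAGE_SLOT if is_image else item["text"])
            parts.append("<|im_end|>")
    if add_generation_prompt:
        parts.append("<|im_assistant|>assistant<|im_middle|>")
    return "".join(parts)


if __name__ == "__main__":
    sample = [{"role": "user", "content": [{"type": "image"}, {"type": "text", "text": "What is shown?"}]}]
    print(render_prompt(sample))
```

Each image item becomes a fixed placeholder slot rather than inline pixel data, so the position of the image relative to the text is decided by the order of items in the message content.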