GLM-5.1

MERA. Created 14.04.2026, 18:09

Scores on leaderboard tasks

Task	Score	Leaderboard rank
Agriculture	0.637	3
Medicine and healthcare	0.81	6

ruTXTAgroBench

Metric: F1, Exact Match

Discipline	Score
Botany	0.762
General genetics	0.692
Fundamentals of breeding	0.73
Crop production	0.613
General crop farming	0.577
Land reclamation farming	0.593
Seed production and seed science	0.628
Forage production and grassland management	0.53
Farming systems across different agrolandscapes	0.634
Crop cultivation technologies	0.69
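The page names two metrics but shows a single score per discipline, and it does not say how the two are combined. As a hedged illustration only, the sketch below computes F1 and Exact Match for one answer, assuming each answer has already been normalized into a set of items (for example, option labels). The function names are hypothetical and this is not MERA's scoring code; its actual normalization and aggregation may differ.

```python
def f1_and_exact_match(predicted: set[str], gold: set[str]) -> tuple[float, float]:
    """Return (F1, Exact Match) for one answer, both in [0, 1].

    Assumption: answers are pre-normalized into sets of items; this is an
    illustrative definition, not the benchmark's official scorer.
    """
    if not predicted and not gold:
        return 1.0, 1.0  # both empty: treat as a perfect match
    exact_match = 1.0 if predicted == gold else 0.0
    overlap = len(predicted & gold)
    if overlap == 0:
        return 0.0, exact_match
    precision = overlap / len(predicted)  # share of predicted items that are correct
    recall = overlap / len(gold)          # share of gold items that were found
    f1 = 2 * precision * recall / (precision + recall)
    return f1, exact_match


def discipline_score(pairs: list[tuple[set[str], set[str]]]) -> tuple[float, float]:
    """Average F1 and Exact Match over (predicted, gold) pairs; needs at least one pair."""
    f1s, ems = zip(*(f1_and_exact_match(p, g) for p, g in pairs))
    return sum(f1s) / len(f1s), sum(ems) / len(ems)
```

For example, predicting {A, C} when the gold answer is {A} gives precision 0.5, recall 1.0, F1 = 2/3 and Exact Match = 0, so F1 gives partial credit where Exact Match gives none.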

ruTXTAquaBench

Metric: F1, Exact Match

Discipline	Score
Industrial aquaculture	0.602
Feeding of fish and other aquatic organisms	0.587
Mariculture. Crayfish and shrimp farming. Artificial pearl cultivation	0.559
Ichthyopathology: veterinary care, disease prevention, and optimization of fish-farming technologies	0.537

ruTXTMedQFundamental

Metric: F1, Exact Match

Discipline	Score
Anatomy	0.852
Hygiene	0.793
Histology	0.807
Biophysics	0.781
Biochemistry	0.826
Microbiology	0.841
Biology (parasitology)	0.796
Pharmacology	0.844
Faculty surgery	0.785
General surgery	0.781
General chemistry	0.733
Normal physiology	0.83
Bioorganic chemistry	0.822
Pathological anatomy	0.796
Pathophysiology	0.819
Clinical laboratory diagnostics	0.804
Propaedeutics of internal diseases	0.837

Submission information

MERA version

v1.0.0

Torch version

2.10.0

Codebase version

0ac3a14

CUDA version

12.8

Model weight precision

auto

Seed

1234

Batch size

transformers version

4.57.6

Number and type of GPUs

1 x NVIDIA A100-SXM4-80GB

Architecture

openai-chat-completions

Team:

MERA

ML model name:

GLM-5.1

ML model link:

https://openrouter.ai/z-ai/glm-5.1

Model size

754.0B

Model type:

API

Open

SFT

Additional links:

https://arxiv.org/abs/2602.15763

Architecture description:

GLM-5.1 is the next-generation flagship model for agentic engineering, with significantly stronger coding capabilities than its predecessor. It achieves state-of-the-art performance on SWE-Bench Pro and leads GLM-5 by a wide margin on NL2Repo (repo generation) and Terminal-Bench 2.0 (real-world terminal tasks).

License:

MIT

Inference parameters

Generation parameters:
agro_bench - do_sample=false;until=[];max_gen_toks=16384;reasoning={"enabled":true,"max_tokens":10000};
aqua_bench - do_sample=false;until=[];max_gen_toks=16384;reasoning={"enabled":true,"max_tokens":10000};
med_bench - do_sample=false;until=[];max_gen_toks=16384;reasoning={"enabled":true,"max_tokens":10000};
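Each task's generation settings are packed into a `<task> - key=value;key=value;...` string. A minimal Python sketch of reading such a line, assuming every value is a JSON literal (as `false`, `[]`, `16384` and the `reasoning` object appear to be) and that no value contains a `;`. `parse_generation_params` and `parse_task_line` are hypothetical helpers, not part of the MERA codebase.

```python
import json


def parse_generation_params(spec: str) -> dict:
    """Parse 'key=value;key=value;...' into a dict, decoding each value as JSON.

    Assumption: values never contain ';' (true for the strings on this page).
    """
    params = {}
    for item in spec.split(";"):
        item = item.strip()
        if not item:
            continue  # skip the empty piece after the trailing ';'
        key, _, raw = item.partition("=")  # split on the first '=' only
        try:
            params[key] = json.loads(raw)  # false -> False, [] -> [], {...} -> dict
        except json.JSONDecodeError:
            params[key] = raw  # keep non-JSON values as plain strings
    return params


def parse_task_line(line: str) -> tuple[str, dict]:
    """Split '<task> - <params>' into the task name and its parsed parameters."""
    task, _, spec = line.partition(" - ")
    return task.strip(), parse_generation_params(spec)
```

Read this way, all three tasks use greedy decoding (`do_sample=false`), an empty `until` list (presumably no extra stop sequences), at most 16384 generated tokens, and reasoning enabled with `max_tokens` set to 10000.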

Chat template:
[gMASK]<sop> {%- if tools -%} {%- macro tool_to_json(tool) -%} {%- set ns_tool = namespace(first=true) -%} {{ '{' -}} {%- for k, v in tool.items() -%} {%- if k != 'defer_loading' and k != 'strict' -%} {%- if not ns_tool.first -%}{{- ', ' -}}{%- endif -%} {%- set ns_tool.first = false -%} "{{ k }}": {{ v | tojson(ensure_ascii=False) }} {%- endif -%} {%- endfor -%} {{- '}' -}} {%- endmacro -%} <|system|> # Tools You may call one or more functions to assist with the user query. You are provided with function signatures within <tools></tools> XML tags: <tools> {% for tool in tools %} {%- if 'function' in tool -%} {%- set tool = tool['function'] -%} {%- endif -%} {% if tool.defer_loading is not defined or not tool.defer_loading %} {{ tool_to_json(tool) }} {% endif %} {% endfor %} </tools> For each function call, output the function name and arguments within the following XML format: <tool_call>{function-name}<arg_key>{arg-key-1}</arg_key><arg_value>{arg-value-1}</arg_value><arg_key>{arg-key-2}</arg_key><arg_value>{arg-value-2}</arg_value>...</tool_call>{%- endif -%} {%- macro visible_text(content) -%} {%- if content is string -%} {{- content }} {%- elif content is iterable and content is not mapping -%} {%- for item in content -%} {%- if item is mapping and item.type == 'text' -%} {{- item.text }} {%- elif item is string -%} {{- item }} {%- endif -%} {%- endfor -%} {%- else -%} {{- content }} {%- endif -%} {%- endmacro -%} {%- set ns = namespace(last_user_index=-1, thinking_indices='') -%} {%- for m in messages %} {%- if m.role == 'user' %} {%- set ns.last_user_index = loop.index0 -%} {%- elif m.role == 'assistant' %} {%- if m.reasoning_content is string %} {%- set ns.thinking_indices = ns.thinking_indices ~ ',' ~ ns.last_user_index ~ ',' -%} {%- endif %} {%- endif %} {%- endfor %} {%- set ns.has_thinking = false -%} {%- for m in messages -%} {%- if m.role == 'user' -%}<|user|>{{ visible_text(m.content) }}{% set ns.has_thinking = (',' ~ loop.index0 ~ ',') in 
ns.thinking_indices -%} {%- elif m.role == 'assistant' -%} <|assistant|> {%- set content = visible_text(m.content) %} {%- if m.reasoning_content is string %} {%- set reasoning_content = m.reasoning_content %} {%- elif '</think>' in content %} {%- set reasoning_content = content.split('</think>')[0].split('<think>')[-1] %} {%- set content = content.split('</think>')[-1] %} {%- elif loop.index0 > ns.last_user_index and not (enable_thinking is defined and not enable_thinking) %} {%- set reasoning_content = '' %} {%- elif loop.index0 < ns.last_user_index and ns.has_thinking %} {%- set reasoning_content = '' %} {%- endif %} {%- if ((clear_thinking is defined and not clear_thinking) or loop.index0 > ns.last_user_index) and reasoning_content is defined -%} {{ '<think>' + reasoning_content + '</think>'}} {%- else -%} {{ '</think>' }} {%- endif -%} {%- if content.strip() -%} {{ content.strip() }} {%- endif -%} {% if m.tool_calls %} {% for tc in m.tool_calls %} {%- if tc.function %} {%- set tc = tc.function %} {%- endif %} {{- '<tool_call>' + tc.name -}} {% set _args = tc.arguments %}{% for k, v in _args.items() %}<arg_key>{{ k }}</arg_key><arg_value>{{ v | tojson(ensure_ascii=False) if v is not string else v }}</arg_value>{% endfor %}</tool_call>{% endfor %} {% endif %} {%- elif m.role == 'tool' -%} {%- if loop.first or (messages[loop.index0 - 1].role != "tool") %} {{- '<|observation|>' -}} {%- endif %} {%- if m.content is string -%} {{- '<tool_response>' + m.content + '</tool_response>' -}} {%- else -%} {{- '<tool_response><tools>\n' -}} {% for tr in m.content %} {%- for tool in tools -%} {%- if 'function' in tool -%} {%- set tool = tool['function'] -%} {%- endif -%} {%- if tool.name == tr.name -%} {{- tool_to_json(tool) + '\n' -}} {%- endif -%} {%- endfor -%} {%- endfor -%} {{- '</tools></tool_response>' -}} {% endif -%} {%- elif m.role == 'system' -%} <|system|>{{ visible_text(m.content) }} {%- endif -%} {%- endfor -%} {%- if add_generation_prompt -%} <|assistant|>{{- 
'</think>' if (enable_thinking is defined and not enable_thinking) else '<think>' -}} {%- endif -%}
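To make the template's output concrete, here is a simplified Python re-implementation of the path it takes for text-only system and user messages with `add_generation_prompt` set and no tools or assistant history. It follows the template as printed on this page, whose whitespace may have been altered by the page formatting. `render_prompt` is a hypothetical name, and the sketch deliberately ignores tool calls, multi-part content, and earlier reasoning turns.

```python
def render_prompt(messages: list[dict], enable_thinking: bool = True) -> str:
    """Sketch of the template's output for plain-text system/user turns.

    Assumes add_generation_prompt=True, no tools, and no assistant/tool messages.
    """
    parts = ["[gMASK]<sop>"]  # fixed prefix emitted before any message
    for m in messages:
        if m["role"] == "system":
            parts.append("<|system|>" + m["content"])
        elif m["role"] == "user":
            parts.append("<|user|>" + m["content"])
        else:
            raise ValueError(f"role not covered by this sketch: {m['role']}")
    parts.append("<|assistant|>")
    # The template opens a reasoning block unless enable_thinking is explicitly false.
    parts.append("<think>" if enable_thinking else "</think>")
    return "".join(parts)
```

With `enable_thinking` left undefined the template pre-fills `<think>`, so generation starts inside the reasoning block; passing `enable_thinking=False` emits `</think>` instead, which skips it.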
