Benchmark for Multimodal LLMs

Multimodal evaluation

The first open multimodal benchmark for the Russian language, created by experts with the cultural specifics of the Russian Federation in mind and recognized by the community as a national standard

Images

Does the model see as we do?

We test how well AI understands visual context: whether it can recognize objects, interpret scenes, and match them with text in Russian. This matters for generation, search, and safety in applied uses of multimodal models

Audio

Can the model hear speech nuances?

We check speech perception, intonation, commands, and audio context in Russian. Relevant for voice assistants and models operating in noisy environments

Video

Can the model recognize and interpret temporal dynamics?

We evaluate how AI works with dynamics, actions, context, and cause-and-effect relationships in video. This is the basis for complex assistants, agent systems, and multimodal search

Multimodality

Does the model tie everything together?

Scenarios where text, images, audio, and video are intertwined. This is the pinnacle of AI: not just recognizing, but understanding in context, modeling the world, and perceiving it the way a human does

Why is this important now?

The new reality

AI is rapidly penetrating everyday life: from searching and generating content to diagnostics, education, and decision-making.

The danger of illusions

But without honest testing, we don't know what exactly the model «understands», and we may overestimate its capabilities. This is especially true in the context of the Russian language and cultural realities.

Our response

We are creating a standard to measure progress and develop AI responsibly

What we suggest

Quantitative metrics and qualitative analysis, fixed run parameters, and a unified prompting methodology for transparent and detailed evaluation
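To make this concrete, below is a minimal sketch of what fixed run parameters and a unified prompt template could look like for such an evaluation. The field names, values, and template are illustrative assumptions, not the benchmark's actual configuration.

```python
# Hypothetical illustration: fixed generation parameters and a single prompt
# template shared by every model under evaluation, so runs stay comparable.
# The names and values below are assumptions, not MERA's actual configuration.

FIXED_RUN_PARAMS = {
    "temperature": 0.0,      # deterministic decoding for reproducibility
    "max_new_tokens": 512,   # identical output budget for every model
    "num_fewshot": 0,        # zero-shot by default
    "seed": 42,              # fixed seed for any stochastic components
}

PROMPT_TEMPLATE = (
    "Вопрос: {question}\n"
    "Варианты ответа: {options}\n"
    "Ответ:"
)

def build_prompt(question: str, options: list[str]) -> str:
    """Render one task instance with the shared template."""
    return PROMPT_TEMPLATE.format(
        question=question,
        options="; ".join(options),
    )

if __name__ == "__main__":
    print(build_prompt("Что изображено на картинке?", ["кот", "собака", "птица"]))
    print(FIXED_RUN_PARAMS)
```

Fixing the decoding parameters and the prompt format in this way keeps results comparable across models and reproducible across runs.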

Independent leaderboard: comparing the best AI models on equal terms

Follow the progress of frontier models and submit your own:

  • Honest comparison of omni- and multimodal models in one place
  • Accurate identification of strengths and weaknesses: by modalities, task types, and skills
  • Useful for researchers, ML engineers, and teams selecting a model for production

Your model could be next in the top rankings

Personal account: manage submissions easily

Everything you need at your fingertips:

  • Instant registration and quick start
  • Track all your submissions and progress
  • Detailed reports on tasks and modalities — from high-level overviews to in-depth analytics

Control and transparency at every stage

Catalog of multimodal tasks: from simple exercises to real challenges

The most relevant and challenging tests:

  • Audio, images, video, and combinations of them
  • Scenarios that require real «intelligence» rather than tricks
  • Suitable for both stress testing models and fine-tuning

Test what your model is capable of in real-world conditions

Open methodology — no magic, just science

We explain exactly how everything works:

  • A transparent approach to creating tasks and selecting metrics
  • Taxonomy of cognitive and multimodal abilities
  • The ability to verify, reproduce, and improve

Trust is built on openness — it's at the core of what we do

Bringing together leaders for the future of technology

The AI Alliance is a unique organization created to unite the efforts of leading technology companies, researchers, and experts. Our mission is the accelerated development and adoption of artificial intelligence in key areas: education, science, and business.

Learn more about the Alliance

Honesty, security, and transparency

When developing the multimodal benchmark, we did everything possible to ensure that the content was licensed, protected, and not used for harmful purposes

License for multimodal content

We have developed a special license that prohibits the use of the test data for training and for commercial purposes. The multimodal content is created exclusively for evaluating models

Watermarks on media content

All images and audio are marked with visible and invisible watermarks. This protects against leaks and makes it clear to automated crawlers that this is not training data
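As a rough illustration of the idea, the sketch below embeds and reads back a short invisible mark in an image's least significant bits using NumPy and Pillow. It is a generic example of invisible watermarking under simple assumptions, not the actual scheme applied to the benchmark's media.

```python
# Simplified illustration of an invisible watermark: hide a short byte string
# in the least significant bit of an image's blue channel. This is a generic
# example of the technique, not the watermarking scheme actually used here.
import numpy as np
from PIL import Image

def embed_watermark(image_path: str, out_path: str, mark: str = "MERA") -> None:
    """Hide `mark` in the LSB of the blue channel and save losslessly."""
    img = np.array(Image.open(image_path).convert("RGB"))
    bits = np.array(
        [int(b) for byte in mark.encode("utf-8") for b in f"{byte:08b}"],
        dtype=np.uint8,
    )
    blue = img[..., 2].reshape(-1)                 # flattened copy of the blue channel
    assert bits.size <= blue.size, "image too small for the mark"
    blue[: bits.size] = (blue[: bits.size] & 0xFE) | bits
    img[..., 2] = blue.reshape(img.shape[:2])
    Image.fromarray(img).save(out_path, format="PNG")  # lossless, keeps the bits

def read_watermark(image_path: str, length: int = 4) -> str:
    """Recover `length` bytes hidden by embed_watermark."""
    img = np.array(Image.open(image_path).convert("RGB"))
    bits = img[..., 2].reshape(-1)[: length * 8] & 1
    data = bytes(
        int("".join(str(b) for b in bits[i : i + 8]), 2)
        for i in range(0, bits.size, 8)
    )
    return data.decode("utf-8", errors="replace")
```

Real watermarking schemes are far more robust to compression and editing; the point here is only the principle that a mark can be carried invisibly inside the pixel data.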

Verification tools

We offer tools for detecting data leakage and contamination. Want to check whether your AI has seen this data before? We can help you find out
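One common check of this kind is n-gram overlap between the benchmark's test texts and a candidate training corpus. The sketch below is a simplified, hypothetical version of such a check; the verification tools we provide may use different methods and thresholds.

```python
# Minimal sketch of a common contamination check: what fraction of the
# benchmark's test n-grams also appear in a candidate corpus. High overlap
# suggests the model may have seen the test data. This is a generic
# illustration, not the benchmark's actual verification tool.
from typing import Iterable, Set, Tuple

def ngrams(text: str, n: int = 8) -> Set[Tuple[str, ...]]:
    """Return the set of word n-grams of a lowercased text."""
    tokens = text.lower().split()
    return {tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1)}

def contamination_rate(
    test_texts: Iterable[str], corpus_texts: Iterable[str], n: int = 8
) -> float:
    """Share of test n-grams that also occur in the corpus (0.0 to 1.0)."""
    corpus_grams: Set[Tuple[str, ...]] = set()
    for doc in corpus_texts:
        corpus_grams |= ngrams(doc, n)
    test_grams: Set[Tuple[str, ...]] = set()
    for doc in test_texts:
        test_grams |= ngrams(doc, n)
    if not test_grams:
        return 0.0
    return len(test_grams & corpus_grams) / len(test_grams)
```

A rate close to zero suggests the test set is unseen; a high rate is a strong signal of contamination.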

Why is this important?

Benchmarks are not just «tests». They embody the community's trust, serve as the basis for scientific comparison, and act as a reference point for the entire industry. We make sure that this trust is well deserved.

24 Sep 2025

AI Alliance Launches Dynamic SWE-MERA Benchmark for Evaluating Code Models

The AI Alliance's benchmark lineup has been expanded with a new tool — the dynamic benchmark SWE-MERA, designed for comprehensive evaluation of coding models on tasks close to real development conditions. SWE-MERA was created as a result of collaboration among leading Russian AI teams: MWS AI (part of MTS Web Services), Sber, and ITMO University.

18 Jul 2025

The AI Alliance Russia launches MERA Code, the first open benchmark for evaluating code generation across tasks

The AI Alliance Russia launches MERA Code: A Unified Framework for Evaluating Code Generation Across Tasks

04 Jun 2025

The AI Alliance Russia launches MERA Industrial: A New Standard for Assessing Industry LLMs to Solve Business Problems

The AI Alliance Russia has announced the launch of a new MERA section, MERA Industrial, a unique benchmark for assessing large language models (LLMs) in various industries.