The AI Alliance Russia launches MERA Industrial: A New Standard for Assessing Industry LLMs to Solve Business Problems

The AI Alliance Russia has announced the launch of MERA Industrial, a new section of MERA and a unique benchmark for assessing large language models (LLMs) across industries. The platform already hosts benchmarks for medicine and agriculture, intended to help companies and experts select and implement the LLMs that best suit their business needs.
Three tasks are currently available on the site, two on agriculture and one on medicine:
ruTXTAgroBench: a dataset designed to measure the professional agronomy knowledge a model acquired during pre-training. It consists of 2935 original questions on agronomy, covering botany, forage production and grassland farming, agricultural land reclamation, general genetics, general agriculture, fundamentals of plant breeding, crop production, seed production and seed science, farming systems in various agricultural landscapes, and crop cultivation technologies.
ruTXTAquaBench: a dataset designed to measure the professional aquaculture knowledge a model acquired during pre-training. It consists of 1102 tasks on aquaculture, including industrial aquaculture, feeding of fish and aquatic organisms, mariculture (e.g. crayfish and shrimp breeding, pearl farming), and ichthyopathology (veterinary science, disease prevention, and optimization of fish farming technologies).
ruTXTMedQFundamental: a dataset covering 17 fundamental medical disciplines, from cell biology to clinical practice (surgery, therapy, laboratory diagnostics, pharmacology). The test includes 270 questions and 30 training tasks for each discipline, making it possible to compare the models' level of knowledge with that of a medical university graduate.
The datasets are completely original and compiled in Russian.
The MERA Industrial benchmark was created with the support of the academic community. Participants in the project include the Skolkovo Institute of Science and Technology, Kuban State Agrarian University, Almazov National Medical Research Centre, RANEPA, Nizhny Novgorod State University of Architecture and Civil Engineering, and others. Leading experts formulate the tasks carefully to ensure:
• Reliability of information based on verified sources
• Full coverage of industry taxonomy
• Variety in task difficulty and type (from academic questions to practical cases)
• Originality of wording, with no text borrowed from the Internet
MERA Industrial is not only a tool for assessing large language models, but also a platform for formulating new tasks and cases, validating tasks, and using ready-made benchmarks to select and implement LLMs in business processes.
The MERA benchmark, created with the participation of teams from Sberbank, MTS AI, Skoltech AI and the National Research University Higher School of Economics, was presented at the AI Journey international conference in 2023. The test methodology was subsequently presented at ACL, the leading academic conference on computational linguistics, which has been held since 1963 and is supported by major IT companies from around the world, including Apple, Google DeepMind, Baidu, IBM and others. Last year, the benchmark for Russian-language LLMs was expanded: new datasets, support for APIs and features of SFT models, and an updated leaderboard with a convenient system for filtering results were added.