ruTXTAgroBench is a dataset designed to measure a model's professional knowledge of agronomy acquired during pre-training.
Agronomy lies at the core of agricultural production: it studies the many aspects of crop cultivation and develops methods for protecting agriculture from adverse natural factors. Agronomy is tied to farming efficiency, environmental protection, and the sustainable use of land resources.
The dataset is created in Russian and is entirely original. The benchmark consists of 2,935 multiple-choice questions with one or several correct answers. Each question contains four to eight answer options. The questions cover various topics (disciplines), such as botany, forage production and grassland management, reclamation farming, general genetics, general agriculture, basics of breeding, crop production, seed breeding and seed science, farming systems in different agro-landscapes, and technologies for cultivating agricultural crops.
Keywords: Agriculture, Agricultural industry, Farming, Agronomy, Botany, General Agriculture, Crop Production, General Genetics, Fundamentals of Breeding, Seed Production and Seed Science, Forage Production and Meadow Management, Reclamation Agriculture, Technologies for Cultivating Agricultural Crops, Agricultural Systems in Various Agro-Landscapes
Authors: Kuban State Agrarian University
Motivation
This task is one of eight benchmarks in the agriculture set, which is intended to assess professional knowledge in the field of agronomy. It resembles the well-known MMLU benchmark in structure and purpose, and is suitable for comprehensively testing the professional quality of a language model's understanding and responses. We provide a public MMLU-style test version of AgroBench in Russian to assess model capabilities on real professional tasks.
Data description
Data fields
instruction — a string containing the instruction for the task;
inputs — a dict with the input data:
    question — a string with the task question;
    option_a — answer option A;
    option_b — answer option B;
    option_c — answer option C;
    option_d — answer option D;
    option_e — answer option E;
    option_f — answer option F;
    option_g — answer option G;
    option_h — answer option H;
outputs — a string containing the right answer for the task (one or more letters (A-H), separated by commas and written in alphabetical order);
meta — a dict with the task's meta information:
    id — an integer, the task's unique number in the dataset;
    domain — a string with the task's domain name.
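The field layout described above can be illustrated with a hypothetical record; all values here are invented for illustration and do not come from the dataset itself:

```python
# A hypothetical ruTXTAgroBench task record following the documented fields.
# Questions have 4-8 options, so trailing options may be empty strings.
task = {
    "instruction": "Subject: {domain}. Question: {question}\n\nAnswer options:\n...",
    "inputs": {
        "question": "Which of the following are cereal crops?",
        "option_a": "Wheat",
        "option_b": "Clover",
        "option_c": "Barley",
        "option_d": "Alfalfa",
        "option_e": "", "option_f": "", "option_g": "", "option_h": "",
    },
    # One or more letters A-H, comma-separated, in alphabetical order.
    "outputs": "A, C",
    "meta": {"id": 1, "domain": "Crop production"},
}

# The answer string is kept in alphabetical order, per the card.
letters = task["outputs"].split(", ")
assert letters == sorted(letters)
```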
Prompts
Ten prompts of varying complexity were prepared for the dataset.
Example:
"Subject: {domain}. Question: {question}\n\nAnswer options:\nA. {option_a}\nB. {option_b}\nC. {option_c}\nD. {option_d}\n\nOutput format requirement: only the letter or letters corresponding to the correct answers; for multiple answers, alphabetical sorting, separator ", " (e.g., "A, B, C")."
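A minimal sketch of how such a template might be filled from a task record. The `render_prompt` helper and the record layout are assumptions based on the data fields documented in this card, not part of any official evaluation code:

```python
def render_prompt(template: str, task: dict) -> str:
    """Fill a prompt template's placeholders ({domain}, {question},
    {option_a}, ...) from a task record shaped like the data fields
    described in this card (hypothetical helper)."""
    fields = {"domain": task["meta"]["domain"], **task["inputs"]}
    return template.format(**fields)


# Usage with a shortened template (illustrative values only):
template = "Subject: {domain}. Question: {question}\nA. {option_a}\nB. {option_b}"
task = {
    "meta": {"domain": "Botany"},
    "inputs": {"question": "Which organ performs photosynthesis?",
               "option_a": "Leaf", "option_b": "Root"},
}
print(render_prompt(template, task))
```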
Dataset Creation
All tasks in this set were written by leading agronomists, professionally edited, and then manually double-checked by three different experts.
Metric
Quality metrics: Exact Match and F1.
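The card does not specify the scoring code; one plausible set-based interpretation of Exact Match and F1 over the comma-separated answer letters is sketched below (function names and parsing are assumptions):

```python
def parse_answer(s: str) -> set:
    """Parse 'A, C' into the set {'A', 'C'}."""
    return {part.strip() for part in s.split(",") if part.strip()}

def exact_match(pred: str, gold: str) -> float:
    """1.0 when the predicted letter set equals the gold set, else 0.0."""
    return float(parse_answer(pred) == parse_answer(gold))

def f1(pred: str, gold: str) -> float:
    """Harmonic mean of precision and recall over answer letters."""
    p, g = parse_answer(pred), parse_answer(gold)
    if not p or not g:
        return float(p == g)
    tp = len(p & g)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(p), tp / len(g)
    return 2 * precision * recall / (precision + recall)
```

Under this interpretation, predicting only one of two correct letters scores 0 on Exact Match but partial credit on F1.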