MERA is a new, open, independent benchmark for evaluating foundation models for the Russian language.

ruTXTAquaBench is a dataset designed to assess the professional knowledge in the field of aquaculture that a model acquired during pre-training.

Aquaculture is an important branch of industrial agriculture focused on breeding aquatic organisms (fish, crustaceans, mollusks, algae). Aquaculture enterprises produce a valuable source of protein and help preserve endangered species, such as sturgeon and salmon, by releasing fry into water bodies. Developing aquaculture is strategically important for national food security, and it makes it possible to cultivate aquatic species that cannot be harvested in the wild.

The dataset was created in Russian and is entirely original. It contains 1102 multiple-choice questions. Each question has four to eight answer options, and one or more of them are correct. The topics cover several areas: industrial aquaculture, feeding of fish and other aquatic organisms, mariculture (e.g. crayfish and shrimp breeding, pearl cultivation), and ichthyopathology (veterinary science, disease prevention, and optimization of fish cultivation technologies).

Keywords: Agriculture, Agricultural Industry, Fishery, Industrial Aquaculture, Feeding of Fish and Other Aquatic Organisms, Mariculture, Crayfish and Shrimp Farming, Artificial Pearl Cultivation, Ichthyopathology.

Authors: Kuban State Agrarian University

Motivation

This task is one of eight benchmarks in the agriculture set and is intended to assess professional knowledge in the field of aquaculture. It resembles the well-known MMLU test in structure and purpose, and it is suitable for comprehensive testing of how well language models understand and answer professional questions. We provide a public, MMLU-style test version of AquaBench in Russian to assess model capabilities on real professional tasks.

Data description

Data fields

instruction — a string containing the instruction for the task;
inputs — a dict with the input data:
- question — a string with the task question;
- option_a — answer option A;
- option_b — answer option B;
- option_c — answer option C;
- option_d — answer option D;
- option_e — answer option E;
- option_f — answer option F;
- option_g — answer option G;
- option_h — answer option H;
outputs — a string containing the correct answer for the task: one or more letters (A-H), separated by commas and written in alphabetical order;
meta — a dict with task meta information:
- id — an integer, the task's unique number in the dataset;
- domain — a string with the task's domain name.
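
To illustrate the field layout above, the following minimal sketch builds one record and checks it against the described schema. The record contents are placeholders and are not taken from the dataset, and `validate_record` and `parse_answer` are hypothetical helper names. The sketch also assumes that unused option slots are empty or absent for questions with fewer than eight options, which the description does not state.

```python
import re

OPTION_KEYS = [f"option_{c}" for c in "abcdefgh"]


def parse_answer(outputs: str) -> list[str]:
    """Split an answer string such as "A,C" or "A, C" into a list of letters."""
    return [part.strip() for part in outputs.split(",") if part.strip()]


def validate_record(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record fits the schema."""
    missing = [k for k in ("instruction", "inputs", "outputs", "meta") if k not in record]
    if missing:
        return [f"missing field: {k}" for k in missing]

    problems = []
    inputs, meta = record["inputs"], record["meta"]
    if not isinstance(record["instruction"], str):
        problems.append("instruction must be a string")
    if not isinstance(inputs.get("question"), str):
        problems.append("inputs.question must be a string")

    # Assumption: unused option slots are empty strings or absent.
    present = [k for k in OPTION_KEYS if inputs.get(k)]
    if not 4 <= len(present) <= 8:
        problems.append("expected between four and eight answer options")

    letters = parse_answer(record["outputs"])
    allowed = {k[-1].upper() for k in present}
    if not letters:
        problems.append("outputs must contain at least one letter")
    if any(not re.fullmatch(r"[A-H]", x) for x in letters):
        problems.append("outputs must use letters A-H only")
    elif not set(letters) <= allowed:
        problems.append("outputs refers to an option that is not present")
    if letters != sorted(set(letters)):
        problems.append("outputs must be unique letters in alphabetical order")

    if not isinstance(meta.get("id"), int):
        problems.append("meta.id must be an integer")
    if not isinstance(meta.get("domain"), str):
        problems.append("meta.domain must be a string")
    return problems


# Placeholder record: the texts are illustrative, not real dataset content.
example = {
    "instruction": "<prompt template>",
    "inputs": {
        "question": "<question text>",
        "option_a": "<option 1>", "option_b": "<option 2>",
        "option_c": "<option 3>", "option_d": "<option 4>",
        "option_e": "", "option_f": "", "option_g": "", "option_h": "",
    },
    "outputs": "A,C",
    "meta": {"id": 1, "domain": "<domain name>"},
}

print(validate_record(example))  # prints [] for this placeholder record
```

A check of this kind can be run once over all records before evaluation, so that malformed answers are not mistaken for model errors.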

Prompts

Ten prompts of varying complexity were prepared for the dataset.

Example:

"Select the correct answer options on the topic “{domain}” for the question:\n{question}\n\nA. {option_a}\nB. {option_b}\nC. {option_c}\nD. {option_d}\n\nAnswer: letters only. Multiple answers should be listed in alphabetical order, separated by commas and spaces (“A, B, C”)."

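The example prompt can be filled in with ordinary Python string formatting. The sketch below is one possible way to do this, assuming the placeholders are taken from the `inputs` dict and from `meta["domain"]`. The function name `render_prompt` and the record contents are hypothetical.

```python
# The example template from the text, with "\n" written as Python newline escapes.
TEMPLATE = (
    "Select the correct answer options on the topic “{domain}” for the question:\n"
    "{question}\n\n"
    "A. {option_a}\nB. {option_b}\nC. {option_c}\nD. {option_d}\n\n"
    "Answer: letters only. Multiple answers should be listed in alphabetical order, "
    "separated by commas and spaces (“A, B, C”)."
)


def render_prompt(template: str, record: dict) -> str:
    """Fill the template placeholders from a record's inputs and meta fields."""
    values = dict(record["inputs"])
    values["domain"] = record["meta"]["domain"]
    # str.format ignores extra keyword arguments, so unused options are harmless.
    return template.format(**values)


# Placeholder record: the texts are illustrative, not real dataset content.
record = {
    "inputs": {
        "question": "<question text>",
        "option_a": "<option 1>", "option_b": "<option 2>",
        "option_c": "<option 3>", "option_d": "<option 4>",
    },
    "meta": {"id": 1, "domain": "<domain name>"},
}

prompt = render_prompt(TEMPLATE, record)
print(prompt)
```

Note that the example template lists only options A to D. A question with more options would need a template that also includes the remaining `option_e` to `option_h` placeholders.
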
Dataset Creation

All tasks in this set were written by leading aquaculture specialists, professionally edited, and then manually checked by three different experts.

Metric

Quality metrics: Exact Match and F1.
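
The text does not specify how the two metrics are computed for multi-answer questions. The sketch below shows one plausible reading and is not the benchmark's official scoring code: answers are normalized to sets of letters, Exact Match requires the predicted and reference sets to be identical, and F1 is computed per example from the overlap of the two sets, then averaged. The function names are hypothetical.

```python
def to_letter_set(answer: str) -> frozenset[str]:
    """Normalize "A, c" or "A,C" to a set of uppercase letters."""
    return frozenset(p.strip().upper() for p in answer.split(",") if p.strip())


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the predicted letters equal the reference letters, else 0.0."""
    return float(to_letter_set(prediction) == to_letter_set(reference))


def f1(prediction: str, reference: str) -> float:
    """Assumed set-based F1 over the selected letters of one example."""
    pred, gold = to_letter_set(prediction), to_letter_set(reference)
    true_positives = len(pred & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def evaluate(predictions: list[str], references: list[str]) -> dict:
    """Average both metrics over paired predictions and references."""
    pairs = list(zip(predictions, references))
    return {
        "exact_match": sum(exact_match(p, r) for p, r in pairs) / len(pairs),
        "f1": sum(f1(p, r) for p, r in pairs) / len(pairs),
    }


scores = evaluate(["A, C", "A, B"], ["A,C", "A,C"])
print(scores)  # {'exact_match': 0.5, 'f1': 0.75}
```

Under this reading, F1 gives partial credit when a model selects some but not all of the correct options, while Exact Match rewards only fully correct answers.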