
ruTXTMedQFundamental

Type of task: Reasoning
Output format: Choosing an answer
Metric: Accuracy, Exact Match
Domains: Medicine and Healthcare
Statistics: dev 510, test 4590

Task Description

Large language models handle routine tasks more effectively than ever before, but to give high-quality answers to highly specialized questions they need in-depth knowledge of the specific field. This test takes such a “step deeper” into medicine, aiming to bring the model’s knowledge closer to that of a general practitioner who has recently graduated from university. It covers the fundamental medical sciences: in-depth knowledge of how the human body functions at every level, from the cell (Biology, Biophysics, Biochemistry) to organ systems (Anatomy, Physiology, Pathological disciplines), as well as skills in the main areas of medicine, such as surgery, therapy, hygiene, laboratory diagnostics, and pharmacology. Fundamental sciences are the basis on which clinical specialties are built, so this body of knowledge is expected not only of every graduate of the “General Medicine” specialty but of any specialist in the medical field. Without it, a language model cannot give a detailed and accurate answer to a medical question, explain the significance of a pathology, or justify the importance of following the instructions for a medicinal product.

The test covers 17 fundamental medical sciences, each with 270 test questions and 30 thematic training (dev) questions, for 4590 test and 510 dev questions in total. Each question has four answer options, exactly one of which is correct.
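The split sizes follow directly from the per-subject counts. The short check below is only an arithmetic sketch of that relationship; the constant names are illustrative and not part of the dataset.

```python
# Sanity check of the split sizes stated on this page.
# Constant names are illustrative; only the numbers come from the task card.
NUM_SUBJECTS = 17        # fundamental medical sciences covered
TEST_PER_SUBJECT = 270   # test questions per subject
DEV_PER_SUBJECT = 30     # training (dev) questions per subject
NUM_OPTIONS = 4          # answer options per question, exactly one correct

test_total = NUM_SUBJECTS * TEST_PER_SUBJECT
dev_total = NUM_SUBJECTS * DEV_PER_SUBJECT

# Expected accuracy of uniform random guessing over four options.
random_baseline = 1 / NUM_OPTIONS

print(test_total, dev_total, random_baseline)
```

With four options per question, uniform random guessing gives an expected accuracy of 0.25, which is a useful floor when reading model scores.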

Keywords: Medicine, fundamental medicine, Anatomy, Biochemistry, Bioorganic Chemistry, Biophysics, Clinical Laboratory Diagnostics, Faculty Surgery, General Chemistry, General Surgery, Histology, Hygiene, Microbiology, Normal Physiology, Parasitology, Pathological Anatomy, Pathophysiology, Pharmacology, Propaedeutics of Internal Diseases

Authors: Almazov National Medical Research Center of the Ministry of Health of the Russian Federation

Motivation

This task is one of six benchmarks in the medicine and healthcare set; it is intended to assess professional knowledge of the fundamental medical sciences. It resembles the well-known MMLU test in structure and purpose and is suitable for comprehensively testing the professional quality of a language model's understanding and responses. We provide a public, MMLU-style medical benchmark in Russian to assess a model's capabilities on real professional tasks.

Dataset Description

Data Fields

  • instruction — a string containing the instructions for the task and information about the required format for the model's output;
  • inputs — a dictionary containing the following information:
    • text — the test question;
    • optiona — answer option A;
    • optionb — answer option B;
    • optionc — answer option C;
    • optiond — answer option D.
  • subject — the topic of the question (a generalization of a group of subdomains by meaning);
  • outputs — the correct answer, one of the following strings: "A", "B", "C", "D";
  • meta — a dictionary containing meta information:
    • id — an integer specifying the example index;
    • domain — the question's subdomain.
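The field list above can be illustrated with a small structural check. This is a minimal sketch, not an official loader: the record below is a hypothetical placeholder (not taken from the dataset), `validate_record` is an illustrative helper, and it assumes that `subject` is a top-level field and that `id` and `domain` live inside `meta`.

```python
from typing import Any

OPTION_KEYS = ("optiona", "optionb", "optionc", "optiond")
VALID_ANSWERS = {"A", "B", "C", "D"}


def validate_record(record: dict[str, Any]) -> list[str]:
    """Return a list of structural problems in one record (empty if none)."""
    problems = []
    for key in ("instruction", "inputs", "outputs", "meta"):
        if key not in record:
            problems.append(f"missing field: {key}")
    inputs = record.get("inputs", {})
    for key in ("text", *OPTION_KEYS):
        if key not in inputs:
            problems.append(f"missing inputs field: {key}")
    if record.get("outputs") not in VALID_ANSWERS:
        problems.append("outputs must be one of A, B, C, D")
    # Assumption: id is stored inside the meta dictionary.
    if not isinstance(record.get("meta", {}).get("id"), int):
        problems.append("meta.id must be an integer")
    return problems


# Hypothetical record with placeholder text, not an actual dataset entry.
example = {
    "instruction": "<instruction text with the required output format>",
    "inputs": {
        "text": "<question text>",
        "optiona": "<option A>",
        "optionb": "<option B>",
        "optionc": "<option C>",
        "optiond": "<option D>",
    },
    "subject": "<subject name>",
    "outputs": "A",
    "meta": {"id": 0, "domain": "<subdomain name>"},
}

problems = validate_record(example)
print(problems)
```

A check like this is mainly useful before evaluation, to catch records whose answer key falls outside the four allowed letters.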

Prompts

Ten prompts of varying difficulty were created for this task.

Example:

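"Ниже приведены вопросы с множественным выбором (с ответами) по теме {subset}. Напиши только букву/буквы ответа."

English translation: "Below are multiple-choice questions (with answers) on the topic of {subset}. Write only the letter/letters of the answer."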
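One way to turn the example prompt into model input is sketched below. Only the `{subset}` placeholder comes from the prompt shown above; the `build_prompt` helper, the layout of the question and options after the instruction, and the sample values are assumptions made for illustration.

```python
# The instruction string is the example prompt from this page.
INSTRUCTION = (
    "Ниже приведены вопросы с множественным выбором (с ответами) по теме "
    "{subset}. Напиши только букву/буквы ответа."
)

OPTION_KEYS = ("optiona", "optionb", "optionc", "optiond")


def build_prompt(instruction: str, subset: str, inputs: dict) -> str:
    """Fill the {subset} placeholder and append the question and options.

    Assumption: the layout of question and options is hypothetical; the task
    card does not specify how they are appended to the instruction.
    """
    lines = [instruction.format(subset=subset), "", inputs["text"]]
    for letter, key in zip("ABCD", OPTION_KEYS):
        lines.append(f"{letter}. {inputs[key]}")
    return "\n".join(lines)


# Placeholder question, not taken from the dataset.
sample_inputs = {
    "text": "<question text>",
    "optiona": "<option A>",
    "optionb": "<option B>",
    "optionc": "<option C>",
    "optiond": "<option D>",
}

prompt = build_prompt(INSTRUCTION, "Anatomy", sample_inputs)
print(prompt)
```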

Dataset Creation

All tasks in this set were written by experts (practicing physicians and medical researchers), professionally edited, and then manually verified by three different experts.

Metrics

Accuracy and Exact Match are used as the evaluation metrics.
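Both metrics can be sketched in a few lines. The page does not describe how predictions are normalized before comparison, so stripping surrounding whitespace is an assumption here, and the function names are illustrative rather than the benchmark's own scoring code.

```python
def exact_match(prediction: str, reference: str) -> bool:
    """True if the predicted string equals the reference answer.

    Assumption: surrounding whitespace is ignored; the task card does not
    specify any normalization.
    """
    return prediction.strip() == reference.strip()


def accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of examples whose prediction matches the reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not references:
        return 0.0
    correct = sum(exact_match(p, r) for p, r in zip(predictions, references))
    return correct / len(references)


# Toy predictions, not real model output: three of four answers match.
score = accuracy(["A", "B", "C", "D"], ["A", "B", "D", "D"])
print(score)
```

Because each reference answer is a single letter, accuracy and exact match coincide under this definition; they would differ only if a scorer extracted the answer letter from longer model output before comparing.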
