Task Description
Large language models handle routine tasks more effectively than ever before, but to give high-quality answers to highly specialized questions they need to engage deeply with the substance of specific fields. In this test we take such a step deeper into medicine, bringing the model's knowledge closer to that of a general practitioner who has recently graduated from university. The test covers the fundamental medical sciences: in-depth knowledge of how the human body functions at every level, from the cell (Biology, Biophysics, Biochemistry) to organ systems (Anatomy, Physiology, the pathological disciplines), as well as skills in the main areas of medicine, such as surgery, therapy, hygiene, laboratory diagnostics, and pharmacology. The fundamental sciences are the necessary basis on which the clinical specialties are built, so this body of knowledge is shared not only by every graduate of the "General Medicine" specialty but by any specialist in the medical field. Without it, a language model will not be able to give a detailed and accurate answer to a medical question, explain the significance of a pathology, or justify the importance of following the instructions for a medicinal product.
The test covers 17 fundamental medical sciences, each of which contributes 270 test questions and 30 thematic training tasks, for a total of 4,590 test questions and 510 training tasks. Each question has four answer options, only one of which is correct.
Keywords: Medicine, fundamental medicine, Anatomy, Biochemistry, Bioorganic Chemistry, Biophysics, Clinical Laboratory Diagnostics, Faculty Surgery, General Chemistry, General Surgery, Histology, Hygiene, Microbiology, Normal Physiology, Parasitology, Pathological Anatomy, Pathophysiology, Pharmacology, Propaedeutics of Internal Diseases
Authors: Almazov National Medical Research Center of the Ministry of Health of the Russian Federation
Motivation
This task is one of six benchmarks in the medicine and healthcare set, which is intended to assess professional knowledge in the field of fundamental medical sciences. It resembles the well-known MMLU test in structure and purpose and is suitable for comprehensive testing of how professionally language models understand and answer such questions. We provide a public, MMLU-style version of the medical benchmark in Russian to assess model capabilities on real professional tasks.
Dataset Description
Data Fields
- instruction — a string containing the instructions for the task and information about the required format for the model's output;
- inputs — a dictionary containing the following information:
  - text — the test question;
  - optiona — answer option A;
  - optionb — answer option B;
  - optionc — answer option C;
  - optiond — answer option D;
  - subject — the topic of the question (a generalization of a group of subdomains by meaning);
- outputs — the result: one of the string values "A", "B", "C", "D";
- meta — a dictionary containing meta information:
  - id — an integer specifying the example index;
  - domain — the question's subdomain.
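For illustration, a single record could look like the sketch below. The field names follow the description above; the question, answer options, and metadata values are invented and do not come from the dataset itself.

```python
# Hypothetical record; the medical content and metadata values are invented
# purely to illustrate the field layout described above.
example = {
    "instruction": (
        "Ниже приведены вопросы с множественным выбором (с ответами) "
        "по теме {subset}. Напиши только букву/буквы ответа."
    ),
    "inputs": {
        "text": "Какая кость образует верхнюю стенку глазницы?",
        "optiona": "Лобная кость",
        "optionb": "Скуловая кость",
        "optionc": "Височная кость",
        "optiond": "Решётчатая кость",
        "subject": "Анатомия",
    },
    "outputs": "A",
    "meta": {"id": 1, "domain": "Анатомия"},
}
```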
Prompts
10 prompts of varying difficulty were created for this task.
Example:
"Ниже приведены вопросы с множественным выбором (с ответами) по теме {subset}. Напиши только букву\/буквы ответа."
Dataset Creation
All tasks in this set were written by top experts (practicing physicians and medical researchers), professionally edited, and then manually double-checked by 3 different experts.
Metrics
Accuracy and Exact Match are used as the evaluation metrics.
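For single-label multiple-choice questions the two metrics effectively coincide once the model's output is reduced to a single letter. A minimal sketch of how they could be computed (not the benchmark's official scoring code; the function names are illustrative):

```python
def accuracy(predictions: list[str], references: list[str]) -> float:
    """Share of examples whose predicted letter matches the reference after normalization."""
    correct = sum(
        p.strip().upper() == r.strip().upper()
        for p, r in zip(predictions, references)
    )
    return correct / len(references)


def exact_match(predictions: list[str], references: list[str]) -> float:
    """Share of examples whose raw prediction string equals the reference exactly."""
    return sum(p == r for p, r in zip(predictions, references)) / len(references)


# Example: two of three predictions are correct.
print(accuracy(["A", "B", "C"], ["A", "B", "D"]))     # 2/3 = 0.666...
print(exact_match(["A", "B", "C"], ["A", "B", "D"]))  # 2/3 = 0.666...
```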