CheGeKa
Task Description
CheGeKa is a Jeopardy!-like Russian QA dataset collected from the official Russian quiz database ChGK; it belongs to the open-domain question-answering group of tasks. The dataset was created based on the corresponding dataset from the TAPE benchmark [1].
Keywords: Reasoning, World Knowledge, Logic, Question-Answering, Open-Domain QA
Authors: Ekaterina Taktasheva, Tatiana Shavrina, Alena Fenogenova, Denis Shevelev, Nadezhda Katricheva, Maria Tikhonova, Albina Akhmetgareeva, Oleg Zinkevich, Anastasiia Bashmakova, Svetlana Iordanskaia, Alena Spiridonova, Valentina Kurenshchikova, Ekaterina Artemova, Vladislav Mikhailov
Motivation
The task can be considered the most challenging in terms of reasoning, knowledge, and logic, as it implies QA pairs with a free-form response (no answer choices), while a long chain of causal relationships between facts and associations is required to arrive at the correct answer.
Dataset Description
Data Fields
- meta is a dictionary containing meta-information about the example:
  - id is the task ID;
  - author is the author of the question;
  - tour_name is the name of the game in which the question was used;
  - tour_link is a link to the game in which the question was used (None for the test set);
- instruction is an instructional prompt specified for the current task;
- inputs is a dictionary containing the following input information:
  - text is a text fragment with a question from the game “What? Where? When?”;
  - topic is a string containing the category of the question;
- outputs is a string containing the correct answer to the question.
Data Instances
Each instance in the dataset contains an instruction, a question, the topic of the question, the correct answer, and all the meta-information. Below is an example from the dataset:
{
    "instruction": "Вы участвуете в викторине “Что? Где? Когда?”. Категория вопроса: {topic}\nВнимательно прочитайте и ответьте на него только словом или фразой. Вопрос: {text}\nОтвет:",
    "inputs": {
        "text": "Веку ожерелий (вулкан).",
        "topic": "ГЕОГРАФИЧЕСКИЕ КУБРАЕЧКИ"
    },
    "outputs": "Эре|бус",
    "meta": {
        "id": 2,
        "author": "Борис Шойхет",
        "tour_name": "Карусель. Командное Jeopardy. Кишинёв - 1996.",
        "tour_link": "https://db.chgk.info/tour/karus96"
    }
}
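For illustration, below is a minimal sketch of how an example of this shape can be read and its fields accessed; the file name chegeka_example.json is hypothetical and only stands in for wherever an instance is stored.

import json

# Hypothetical file holding a single example in the format shown above.
with open("chegeka_example.json", encoding="utf-8") as f:
    example = json.load(f)

question = example["inputs"]["text"]   # the quiz question
topic = example["inputs"]["topic"]     # the question category
answer = example["outputs"]            # the gold answer string
print(topic, question, answer, sep="\n")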
Data Splits
The dataset consists of 29,376 training examples (train set) and 416 test examples (test set).
Prompts
We use 10 different prompts written in natural language for this task. An example of the prompt is given below:
"Вы участвуете в викторине “Что? Где? Когда?”. Категория вопроса: {topic}\nВнимательно прочитайте и ответьте на него только словом или фразой. Вопрос: {text}\nОтвет:"
Dataset Creation
The dataset was created using the corresponding dataset from the TAPE benchmark [1], which is, in turn, based on the original corpus of the CheGeKa game introduced in [2].
Evaluation
Metrics
The dataset is evaluated via two metrics: F1-score and Exact Match (EM).
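As a rough illustration, the sketch below computes Exact Match and token-level F1 in the common SQuAD style; the exact normalization used by the benchmark (casing, punctuation, how answers such as "Эре|бус" are treated) is an assumption here, not the official scoring code.

import re
from collections import Counter

def normalize(text: str) -> list[str]:
    # Lowercase, replace punctuation with spaces, and split on whitespace.
    return re.sub(r"[^\w\s]", " ", text.lower()).split()

def exact_match(prediction: str, gold: str) -> float:
    return float(normalize(prediction) == normalize(gold))

def token_f1(prediction: str, gold: str) -> float:
    pred_tokens, gold_tokens = normalize(prediction), normalize(gold)
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)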
Human Benchmark
The human benchmark was measured on the test set via a Yandex.Toloka project with an overlap of 3 annotators per task.
The F1-score / Exact Match results are 0.719 / 0.645, respectively.
References
[1] Taktasheva, Ekaterina, et al. "TAPE: Assessing Few-shot Russian Language Understanding." Findings of the Association for Computational Linguistics: EMNLP 2022. 2022.
[2] Mikhalkova, Elena, and Alexander A. Khlyupin. "Russian Jeopardy! Data Set for Question-Answering Systems." Proceedings of the Thirteenth Language Resources and Evaluation Conference. 2022.