Task Description
The Russian Commitment Bank is a corpus of naturally occurring discourses whose final sentence contains a clause-embedding predicate under an entailment-canceling operator (question, modal, negation, or antecedent of a conditional). It was first introduced in the Russian SuperGLUE benchmark [1].
Keywords: Reasoning, Common Sense, Causality, Textual Entailment
Authors: Shavrina Tatiana, Fenogenova Alena, Emelyanov Anton, Shevelev Denis, Artemova Ekaterina, Malykh Valentin, Mikhailov Vladislav, Tikhonova Maria, Evlampiev Andrey
The dataset makes it possible to evaluate how well models solve logical textual entailment. It is constructed to take discourse characteristics into account. In the Russian SuperGLUE benchmark, this is one of the few datasets for which there is still a significant gap between model and human scores.
Dataset Description
Data Fields
Each dataset sample represents a text situation and has the following fields:
- instruction: an instructional prompt specified for the current task;
- inputs: a dictionary containing the following input information:
    - premise: a text situation;
    - hypothesis: the text of a hypothesis for which it must be determined whether it can be inferred from the premise;
- outputs: the result, one of the following string values: 1 (the hypothesis follows from the situation), 2 (the hypothesis contradicts the situation), or 3 (the hypothesis is neutral);
- meta: meta-information about the task:
    - genre: where the text was taken from;
    - verb: the action by which the texts were selected;
    - negation: the negation flag;
    - id: the id of the example from the dataset.
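To make the record layout concrete, here is a minimal sketch of one sample as a Python dictionary, with a small structural check. The key names for the prompt and the label (`instruction`, `outputs`), the helper `validate_sample`, and every value in the record are assumptions made for illustration, not data taken from the corpus.

```python
from typing import Any, Dict, List

VALID_OUTPUTS = {"1", "2", "3"}
INPUT_KEYS = ("premise", "hypothesis")
META_KEYS = ("genre", "verb", "negation", "id")

# Hypothetical record: all values are invented, only the layout matters.
sample: Dict[str, Any] = {
    "instruction": 'Situation: "{premise}"\nHypothesis: "{hypothesis}"\nAnswer 1, 2 or 3.',
    "inputs": {
        "premise": "Did anyone tell you that the bridge was closed?",
        "hypothesis": "The bridge was closed.",
    },
    "outputs": "3",
    "meta": {"genre": "fiction", "verb": "tell", "negation": "no_negation", "id": 0},
}


def validate_sample(record: Dict[str, Any]) -> List[str]:
    """Return a list of structural problems; an empty list means none were found."""
    problems: List[str] = []
    for key in ("instruction", "inputs", "outputs", "meta"):
        if key not in record:
            problems.append(f"missing field: {key}")
    inputs = record.get("inputs", {})
    if isinstance(inputs, dict):
        problems += [f"missing inputs.{k}" for k in INPUT_KEYS if k not in inputs]
    else:
        problems.append("inputs is not a dictionary")
    meta = record.get("meta", {})
    if isinstance(meta, dict):
        problems += [f"missing meta.{k}" for k in META_KEYS if k not in meta]
    else:
        problems.append("meta is not a dictionary")
    if "outputs" in record and record["outputs"] not in VALID_OUTPUTS:
        problems.append(f"unexpected outputs value: {record['outputs']!r}")
    return problems


print(validate_sample(sample))  # []
```

The label is kept as a string because the answer is described as a string value; a check like this is only a convenience for catching malformed records before evaluation.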
We prepared 10 different prompts of varying difficulty for this task.
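An example of the prompt is given below:
"Определите отношение приведенной гипотезы к описываемой логической ситуации. Ситуация: \"{premise}\"\nГипотеза: \"{hypothesis}\"\nЕсли гипотеза следует из ситуации, выведите цифру 1, если противоречит – 2, если гипотеза не зависит от ситуации – 3. Больше ничего не добавляйте к ответу."
English translation: "Determine the relation of the given hypothesis to the described logical situation. Situation: \"{premise}\"\nHypothesis: \"{hypothesis}\"\nIf the hypothesis follows from the situation, output the digit 1; if it contradicts the situation, 2; if the hypothesis is independent of the situation, 3. Do not add anything else to the answer."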
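The prompt above contains two placeholders, `{premise}` and `{hypothesis}`. A minimal sketch of how they might be filled, and how a model reply might be mapped back to a label, is shown here. The helper names `build_prompt` and `parse_answer` and the example texts are hypothetical; the template string is the one quoted above.

```python
from typing import Optional

# Template quoted from the example prompt; it has exactly two slots.
PROMPT_TEMPLATE = (
    "Определите отношение приведенной гипотезы к описываемой логической ситуации. "
    "Ситуация: \"{premise}\"\nГипотеза: \"{hypothesis}\"\n"
    "Если гипотеза следует из ситуации, выведите цифру 1, если противоречит – 2, "
    "если гипотеза не зависит от ситуации – 3. Больше ничего не добавляйте к ответу."
)


def build_prompt(template: str, premise: str, hypothesis: str) -> str:
    """Substitute the two input fields into the {premise} and {hypothesis} slots."""
    return template.format(premise=premise, hypothesis=hypothesis)


def parse_answer(raw: str) -> Optional[str]:
    """Map a model reply to "1", "2" or "3"; return None for anything else."""
    answer = raw.strip()
    return answer if answer in {"1", "2", "3"} else None


# Invented example texts, not taken from the dataset.
prompt = build_prompt(
    PROMPT_TEMPLATE,
    premise="Он не думал, что поезд опоздает.",
    hypothesis="Поезд опоздал.",
)
print(prompt)
```

Strict parsing (accepting only a bare digit) mirrors the final instruction in the prompt, which asks the model to add nothing else to the answer; a more lenient parser is a design choice left to the evaluator.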
Dataset Creation
The number of sentences for the entire set is 2715, and the total number of tokens is 3.7 · 10^3. The dataset is an instruction-based version of the RCB task from the Russian SuperGLUE benchmark. The set was filtered from the Taiga corpus (news and literature domains) [4] using several rules, and the extracted passages were manually post-processed. Final labeling was conducted by three of the authors. The original dataset corresponds to the CommitmentBank dataset [2, 3].
Human Benchmark
The human benchmark was measured on the test set in a Yandex.Toloka project with an overlap of 3 reviewers per task.
The Accuracy and Average Macro F1 results are / 0.587, respectively.
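For reference, the two reported metrics can be sketched in plain Python. This is an illustration under assumptions: macro F1 is taken to be the unweighted mean of per-class F1 over the three labels, a class with no gold and no predicted instances scores 0, and the gold and predicted labels below are invented.

```python
from typing import Sequence

LABELS = ("1", "2", "3")


def accuracy(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Share of examples whose predicted label equals the gold label."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold: Sequence[str], pred: Sequence[str],
             labels: Sequence[str] = LABELS) -> float:
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Invented labels for four examples.
gold = ["1", "2", "3", "1"]
pred = ["1", "2", "1", "1"]
print(accuracy(gold, pred))            # 0.75
print(f"{macro_f1(gold, pred):.3f}")   # 0.600
```

In the toy example the per-class F1 values are 0.8, 1.0 and 0.0, so macro F1 falls below accuracy because the missed class 3 counts as much as the others.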
[1] Tatiana Shavrina, Alena Fenogenova, Anton Emelyanov, Denis Shevelev, Ekaterina Artemova, Valentin Malykh, Vladislav Mikhailov, Maria Tikhonova, Andrey Chertok, and Andrey Evlampiev. 2020. RussianSuperGLUE: A Russian Language Understanding Evaluation Benchmark. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 4717–4726, Online. Association for Computational Linguistics.
[2] Marie-Catherine de Marneffe, Mandy Simons, and Judith Tonhauser (2019). The CommitmentBank: Investigating projection in naturally occurring discourse. Proceedings of Sinn und Bedeutung 23.
[3] A. Wang et al. 2019. SuperGLUE: A stickier benchmark for general-purpose language understanding systems. In Advances in Neural Information Processing Systems, pages 3261-3275.
[4] Tatiana Shavrina and Olga Shapovalova. 2017. To the methodology of corpus construction for machine learning: “Taiga” syntax tree corpus and parser. In Proceedings of the “CORPORA-2017” International Conference.