SimpleAr

Task Description

Simple arithmetic is a mathematical task originally from BIG-Bench. It tests a language model's basic arithmetic capabilities by asking it to perform n-digit addition for a range of n.

Keywords: arithmetic, example task, free response, mathematics, numerical response, zero-shot

Motivation

The goal of the task is to assess a model's ability to solve simple addition problems.

Dataset Description

Data Fields

  • instruction — a string containing the task instructions and a description of the required output format;
  • inputs — a string containing the arithmetic expression to evaluate;
  • outputs — a string containing the correct answer, i.e., the sum of the two numbers;
  • meta — a dictionary containing meta information:
    • id — an integer index of the example.

Data Instances

Below is an example from the dataset:

{
    "instruction": "Выполните арифметическую операцию.\\\\n{inputs}",
    "inputs": "901 + 164 = ",
    "outputs": "1065",
    "meta": {
        "id": 679
    }
}
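
Below is a minimal sketch (not part of the official evaluation code) of how the instruction template can be combined with the inputs field to form the final model query; the field names follow the instance above:

example = {
    "instruction": "Выполните арифметическую операцию.\n{inputs}",
    "inputs": "901 + 164 = ",
    "outputs": "1065",
    "meta": {"id": 679},
}

# Fill the {inputs} placeholder in the instruction template.
query = example["instruction"].format(inputs=example["inputs"])
print(query)
# Выполните арифметическую операцию.
# 901 + 164 =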

Data Splits

The train and test sets each consist of 1000 arithmetic expressions.

Prompts

Six prompts are used for the task:

    "Вычислите математическое выражение:\n{inputs}",
    "Выполните арифметическую операцию.\n{inputs}",
    "Напишите ответ для математического выражения.\n{inputs}",
    "Сложите два числа:\n{inputs}",
    "Сложите первое и второе слагаемые: {inputs} и напишите ответ.",
    "Выполните арифметическую операцию. В качестве ответа напишите число, которое получается после ее выполнения.\n{inputs}"

Dataset Creation

Both the train and test sets contain n-digit addition problems for n ranging from 1 to 5.
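
A minimal sketch of how such examples could be generated is given below; the uniform sampling of n-digit operands and the generator function itself are assumptions of this sketch, not the official generation script:

import random

def make_example(n: int, idx: int) -> dict:
    """Generate one n-digit addition example (hypothetical generator;
    uniform sampling of n-digit operands is an assumption)."""
    low = 0 if n == 1 else 10 ** (n - 1)
    high = 10 ** n - 1
    a, b = random.randint(low, high), random.randint(low, high)
    return {
        "inputs": f"{a} + {b} = ",
        "outputs": str(a + b),
        "meta": {"id": idx},
    }

# Build a split of 1000 examples with n drawn from the range [1; 5].
split = [make_example(random.randint(1, 5), i) for i in range(1000)]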

Evaluation

Metrics

Exact Match (EM) is used for evaluation.
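
A minimal sketch of the Exact Match computation under its usual definition (a prediction counts as correct only if it matches the reference string exactly; stripping surrounding whitespace is an assumption of this sketch):

def exact_match(predictions, references):
    """Fraction of predictions that coincide with the reference string
    after stripping surrounding whitespace."""
    assert len(predictions) == len(references)
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return hits / len(references)

print(exact_match(["1065", "42"], ["1065", "43"]))  # 0.5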

Human Benchmark

The human benchmark is measured on a subset of 200 examples (sampled with the same distribution as the original data). The final human score for this task is 1.0.