SimpleAr
Task Description
Simple arithmetic (SimpleAr) is a mathematical task from BIG-Bench. It tests the basic arithmetic capabilities of language models by asking them to perform n-digit addition for a range of n.
Warning: This is a diagnostic dataset with an open test and is not used for general model evaluation on the benchmark.
Keywords: arithmetic, example task, free response, mathematics, numerical response, zero-shot
Motivation
The goal of the task is to analyze how well a model solves simple addition problems.
Dataset Description
Data Fields
instruction: a string containing the instructions for the task and the requirements for the model output format;
inputs: the arithmetic expression to evaluate;
outputs: a string containing the correct answer, i.e. the sum of the two numbers;
meta: a dictionary containing meta information:
    id: an integer indicating the index of the example.
Data Instances
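Below is an example from the dataset. The instruction is in Russian and reads "Write the answer to the mathematical expression.", followed by the placeholder for the inputs: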
{
"instruction": "Напишите ответ для математического выражения.\n{inputs}",
"inputs": "663 + 806 = ",
"outputs": "1469",
"meta": {
"id": 412
}
}
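To show how the fields fit together, here is a minimal sketch that fills the {inputs} placeholder of the instruction with the expression. The helper name build_prompt is hypothetical, and using Python's str.format is an assumption; the source does not say how the benchmark assembles the final prompt.

```python
# Hypothetical helper (not part of the benchmark code): substitutes the
# arithmetic expression into the {inputs} placeholder of the instruction.
def build_prompt(example: dict) -> str:
    return example["instruction"].format(inputs=example["inputs"])


# The data instance shown above, copied as a Python dict.
example = {
    "instruction": "Напишите ответ для математического выражения.\n{inputs}",
    "inputs": "663 + 806 = ",
    "outputs": "1469",
    "meta": {"id": 412},
}

prompt = build_prompt(example)
print(prompt)
```

Note that str.format would fail on an instruction containing other curly braces, so a real pipeline might prefer a plain string replacement of the placeholder.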
Data Splits
The train set consists of 1000 examples of arithmetic expressions. The test set consists of 1000 examples of arithmetic expressions.
Prompts
The task uses 10 prompts. Below is one example, in the original Russian:
"Реши математическую задачу на сложение чисел. Выведи ответ в формате \"number\", где number - число, которое является результатом сложения.\nОтвет:"
In English: "Solve the math problem of adding numbers. Output the answer in the format \"number\", where number is the number that results from the addition.\nAnswer:"
Dataset Creation
N-digit addition examples were generated for n in the range [1;5] for both the train and test sets.
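The generation procedure is not described in more detail, so the following is only a sketch of how such examples could be produced. It assumes that both operands have exactly n digits, that n is drawn uniformly from 1 to 5, and that a fixed seed is used; the function names make_example and make_split are hypothetical.

```python
import random


def make_example(n: int, rng: random.Random, idx: int) -> dict:
    # Assumption: both operands have exactly n digits
    # (for n = 1 the operand 0 is allowed).
    low = 10 ** (n - 1) if n > 1 else 0
    high = 10 ** n - 1
    a = rng.randint(low, high)
    b = rng.randint(low, high)
    return {
        "inputs": f"{a} + {b} = ",
        "outputs": str(a + b),
        "meta": {"id": idx},
    }


def make_split(size: int, seed: int) -> list[dict]:
    rng = random.Random(seed)
    # Assumption: n is sampled uniformly from 1..5 for every example.
    return [make_example(rng.randint(1, 5), rng, i) for i in range(size)]


train = make_split(1000, seed=0)
print(train[0])
```

Different seeds for the train and test splits would keep the two sets from being identical, although overlap between them is still possible for small n.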
Evaluation
Metrics
Exact Match (EM) is used for evaluation.
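As an illustration of the metric, here is a minimal sketch of Exact Match: the share of predictions that are identical to the reference answer. Stripping surrounding whitespace before comparison is an assumption, and the function name exact_match is hypothetical; the benchmark's own normalization may differ.

```python
def exact_match(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions equal to their reference answer."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    # Assumption: surrounding whitespace is ignored when comparing.
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return hits / len(references)


# Two of the three predictions match their references.
score = exact_match(["1469", " 12 ", "7"], ["1469", "12", "8"])
print(score)
```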
Human Benchmark
The human benchmark is measured on a subset of 200 examples (sampled with the same distribution as the original set). The final human score for this task is 1.0.