Task Description

The longest common subsequence is an algorithmic task from BIG-Bench. This problem consists of pairs of strings as input, and language models are expected to predict the length of the longest common subsequence between them correctly.

LCS is a prototypical dynamic programming problem and this task measures the model's ability to capture that approach.


Recently, large language models have started to do well on simple algorithmic tasks like few-shot arithmetic, so we want to extend this evaluation to more complicated algorithms.

Dataset Description

Data Fields

  • instruction — is a string containing instructions for the task and information about the requirements for the model output format;
  • inputs — is an example of two sequences to be compared;
  • outputs — is a string containing the correct answer, the length of the longest common subsequence;
  • meta — is a dictionary containing meta information:
    • id — is an integer indicating the index of the example.

Data Instances

Below is an example from the dataset:

    "instruction": "Даны две строки: \\\\"{inputs}\\\\"\\\\nОпределите длину их самой длинной общей подпоследовательности.",
    "outputs": "5",
    "meta": {
        "id": 186

Data Splits

The public test (train split) includes 320 examples, and the closed test (test split) set includes 500 examples.


6 prompts of varying difficulty were created for this task. Example:

"Для двух строк: \\\\"{inputs}\\\\" найдите длину наибольшей общей подпоследовательности. Пересекающиеся символы должны идти в том же порядке, но могут быть разделены другими символами.".

Dataset Creation

Sequences of length in the range [4; 32) were generated with a Python script for  the closed test set. For the open tast set the examples ware taken from the corresponding BIG-bench task.

For the open public test set we use the same seed for generation as in the Big-Bench.



The task is evaluated using Accuracy.

Human Benchmark

The human benchmark is measured on a subset of size 100 (sampled with the same original distribution). The accuracy for this task is 0.56.