StRuCom

Taxonomies
Instruction Following
Code Perception
Simulation
Documentation
Metric
chrF
Languages
Python
Go
Java
C#
JavaScript

Task description

The dataset contains structured Russian-language docstrings for functions in five programming languages (Python, Java, C#, Go, JavaScript) and comprises 500 tasks.

Key features:

  • First specialized corpus for Russian-language documentation
  • Combination of real GitHub data (for testing) and synthetic data from Qwen2.5-Coder-32B-Instruct (for training)
  • Strict filtering for completeness and compliance with documentation standards
  • All comments conform to the specified format for each language (Python - Google style, JavaScript - JSDoc, Java - Javadoc, C# - XML, Go - GoDoc); see the example after this list
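
For reference, here is a hypothetical Python function with a Russian-language Google-style docstring of the kind the dataset targets; the function and the comment text are invented for illustration and are not drawn from the corpus (the docstring reads "Looks up a user name by identifier", with Args and Returns sections):

    def find_user(users: dict, user_id: int):
        """Ищет имя пользователя по идентификатору.

        Args:
            users: Отображение идентификаторов пользователей в имена.
            user_id: Идентификатор искомого пользователя.

        Returns:
            Имя пользователя или None, если идентификатор не найден.
        """
        return users.get(user_id)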

Evaluated skills: Instruction Following, Code Perception, Simulation, Documentation

Contributors: Maria Dziuba, Valentin Malykh

Motivation

Target Models and Limitations

Designed for evaluating models that support structured documentation generation (e.g., DeepSeek-Coder, Qwen2.5-Coder).

Not suitable for:

  • Unstructured comment generation
  • Code summarization
  • Code explanation

Users and Result Interpretation

Primary users:

  • NLP developers and researchers working on automated documentation tools

The results make it possible to:

  • Assess models' ability to generate technically accurate comments compliant with documentation standards

Metrics:

  • chrF evaluates the similarity between generated and reference texts using character n-grams, taking morphology, spelling, and grammatical endings into account; this is particularly important for Russian because of its morphological complexity (see the scoring sketch after this list)
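
As illustration, a minimal scoring sketch using the sacrebleu implementation of chrF (using sacrebleu and its default settings is an assumption; the benchmark's exact scoring pipeline may differ):

    # pip install sacrebleu
    from sacrebleu.metrics import CHRF

    chrf = CHRF()  # defaults: character 6-grams, beta = 2

    reference = "Возвращает сумму двух чисел."     # gold comment
    hypothesis = "Возвращает сумму двух значений."  # model output

    # sentence_score takes the hypothesis and a list of references
    score = chrf.sentence_score(hypothesis, [reference])
    print(score.score)  # value in [0, 100]; higher means closer to the reference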

Data description

Data fields

Each dataset question includes data in the following fields (an illustrative record follows the list):

  • instruction [str] — Instruction prompt template with placeholders for question elements.
  • inputs — Input data that forms the task for the model. May include one or several modalities (video, audio, image, text).
    • function [str] — The function to generate a structured comment for.
  • outputs [str] — The correct answer to the question.
  • meta — Metadata related to the test example, not used in the question (hidden from the tested model).
    • id [int] — Identification number of the question in the dataset.
    • language [str] — The programming language in which the function is written.
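
An illustrative record (all values below are invented to show the schema and are not taken from the dataset):

    example = {
        "instruction": "Напиши русскоязычную документацию к функции.\n\n"
                       "Функция:\n\n{function}",
        "inputs": {
            # the function the model must document
            "function": "def add(a, b):\n    return a + b",
        },
        # reference structured comment (abridged here)
        "outputs": '"""Возвращает сумму двух аргументов. ..."""',
        "meta": {
            "id": 1,               # question id (invented)
            "language": "python",  # programming language of the function
        },
    }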

Prompts

For the task, 10 prompts were prepared and evenly distributed among the questions on the principle of "one prompt per question". The placeholders in curly braces in each prompt are filled in from the fields inside the inputs field of each question.

Prompt example:

"Напиши русскоязычную документацию к функции.

Функция:

{function}"

Dataset creation

Stage 1: Data Collection

  • Crawling Russian-language GitHub repositories with permissive or absent licenses; language identification via Lingua (see the sketch after this list)
  • Function extraction using function_parser and Code-Text
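
The language-identification step might look like the following sketch with the lingua-language-detector Python package; the chosen language set is an assumption, not the authors' exact configuration:

    # pip install lingua-language-detector
    from lingua import Language, LanguageDetectorBuilder

    detector = LanguageDetectorBuilder.from_languages(
        Language.RUSSIAN, Language.ENGLISH
    ).build()

    comment = "Возвращает сумму двух чисел."
    # keep only comments detected as Russian
    if detector.detect_language_of(comment) == Language.RUSSIAN:
        print("keep")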

Stage 2: Synthetic Data

  • Qwen2.5-Coder-32B-Instruct model used for synthetic data generation

Stage 3: Cleaning and Standardization

  • Strict structural filtering (requiring complete coverage of all documented code elements)
  • Style standardization of all comments
  • Length filtering (250-1000 characters); a minimal sketch follows
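
A minimal sketch of the length filter; the 250-1000 character bounds come from the list above, while applying them to the raw comment text is an assumption:

    MIN_LEN, MAX_LEN = 250, 1000

    def passes_length_filter(comment: str) -> bool:
        """True if the comment length falls within the allowed range."""
        return MIN_LEN <= len(comment) <= MAX_LEN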

Metrics

Metrics for aggregated evaluation of responses:

chrF: a metric that evaluates character n-gram matches with the reference text; well suited to Russian morphology and spelling accuracy.
