Current problems in LLM evaluation:

There are no unified methodology or standards for independent, expert comparison of SOTA models.

Previous benchmarks (such as Russian SuperGLUE and TAPE) are becoming outdated: new models operate on instruction data and work with different modalities.

Each model creator evaluates their solution under their own local conditions, metrics, scenarios, and benchmarks. As a result, reported results are hard to reproduce.

What does this project offer?

01

A unified platform for model evaluation, comparison, and reflection of their capabilities across domains, tasks, and modalities.

02

Tasks that are challenging not only for automatic systems but also for humans, with results compared against human performance.

03

Formation of a realistic view of the abilities of AI technologies.

04

An informational portal and platform for research in the field of large language models.

Details of the project

Partners:

SberDevices

SberDevices is a full-cycle company covering everything from ideas to ready-to-use devices. The company has extensive expertise in speech technologies, computer vision algorithms, biometric systems, models for generating media content, and even neural interfaces. Among its developments is the large language model GigaChat. The SberDevices teams are also among the creators of popular benchmarks such as Russian SuperGLUE, TAPE, and RuCoLA.

Sber AI

Sber AI is a division that specializes in the application of artificial intelligence at Sber, as well as its use in various non-financial areas, such as medicine, management, and law. The team's flagship development is the Kandinsky neural network.

MTS AI

MTS AI is one of the leaders in the field of artificial intelligence in Russia. MTS AI's developments are used in various sectors, from banking and telecommunications to medicine, industry, and online cinemas. Its teams work on products based on computer vision, natural language processing technologies, and generative AI. Among their projects are TenVision, Audiogram, WordPulse, and many others.

Skoltech (AI Center)

The Skolkovo Institute of Science and Technology was founded in 2011 in collaboration with the Massachusetts Institute of Technology, one of the leading international scientific and technological institutions. The mission of Skoltech AI is to create, study, and disseminate transformative artificial intelligence (AI) technologies. Its research addresses open problems in artificial intelligence, develops advanced computational algorithms and AI technologies, and creates prototypes of AI-based products.

HSE University

The National Research University "Higher School of Economics" is a research university fulfilling its mission through scientific, educational, project, expert-analytical, and socio-cultural activities based on international scientific and organizational standards. HSE actively engages in artificial intelligence research, collaborates with international laboratories and industries, and has its own AI Center. It is one of the partners and organizers of the Russian SuperGLUE project.

AIRI

AIRI is the largest autonomous non-profit organization in Russia engaged in fundamental and applied research in the field of artificial intelligence. AIRI's mission is to create universal artificial intelligence systems that solve real-world tasks. In their projects, the Institute's researchers seek breakthrough results in artificial intelligence and its applications and take part in shaping the global research agenda.