| Task name | Result | Metric |
|---|---|---|
| LCS | 0.058 | Accuracy |
| RCB | 0.505 / 0.318 | Accuracy / F1 macro |
| USE | 0.192 | Grade norm |
| RWSD | 0.338 | Accuracy |
| PARus | 0.892 | Accuracy |
| ruTiE | 0.75 | Accuracy |
| MultiQ | 0.506 / 0.351 | F1 / Exact match |
| CheGeKa | 0.298 / 0.24 | F1 / Exact match |
| ruModAr | 0.354 | Exact match |
| MaMuRAMu | 0.712 | Accuracy |
| ruMultiAr | 0.271 | Exact match |
| ruCodeEval | 0.023 / 0.035 / 0.043 | Pass@k |
| MathLogicQA | 0.379 | Accuracy |
| ruWorldTree | 0.92 / 0.92 | Accuracy / F1 macro |
| ruOpenBookQA | 0.815 / 0.814 | Accuracy / F1 macro |

| Task name | Result | Metric |
|---|---|---|
| BPS | 0.921 | Accuracy |
| ruMMLU | 0.588 | Accuracy |
| SimpleAr | 0.994 | Exact match |
| ruHumanEval | 0.042 / 0.059 / 0.061 | Pass@k |
| ruHHH | 0.697 | |
| ruHateSpeech | 0.781 | |
| ruDetox | 0.301 | |
| ruEthics | | |

| Model, team | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 8_0 | 8_1 | 8_2 | 8_3 | 8_4 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Command R, llmarena.ru | 0.433 | 0.133 | 0.733 | 0.2 | 0.1 | 0.4 | 0.067 | - | 0 | 0 | 0.033 | 0 | 0.133 | 0 | 0.1 | 0.35 | 0.033 | 0.033 | 0 | 0.033 | 0.067 | 0.633 | 0.133 | 0.1 | 0.167 | 0.242 | 0.067 | 0.3 | 0.3 | 0.333 | 0.333 |

| Model, team | Honest | Helpful | Harmless |
|---|---|---|---|
| Command R, llmarena.ru | 0.557 | 0.797 | 0.741 |

| Model, team | Anatomy | Virology | Astronomy | Marketing | Nutrition | Sociology | Management | Philosophy | Prehistory | Human aging | Econometrics | Formal logic | Global facts | Jurisprudence | Miscellaneous | Moral disputes | Business ethics | Biology (college) | Physics (college) | Human sexuality | Moral scenarios | World religions | Abstract algebra | Medicine (college) | Machine learning | Medical genetics | Professional law | PR | Security studies | Chemistry (college) | Computer security | International law | Logical fallacies | Politics | Clinical knowledge | Conceptual physics | Math (college) | Biology (high school) | Physics (high school) | Chemistry (high school) | Geography (high school) | Professional medicine | Electrical engineering | Elementary mathematics | Psychology (high school) | Statistics (high school) | History (high school) | Math (high school) | Professional accounting | Professional psychology | Computer science (college) | World history (high school) | Macroeconomics | Microeconomics | Computer science (high school) | European history | Government and politics |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Command R, llmarena.ru | 0.57 | 0.506 | 0.743 | 0.782 | 0.67 | 0.751 | 0.767 | 0.64 | 0.688 | 0.655 | 0.395 | 0.492 | 0.48 | 0.676 | 0.782 | 0.636 | 0.62 | 0.701 | 0.411 | 0.756 | 0.226 | 0.801 | 0.33 | 0.607 | 0.446 | 0.69 | 0.449 | 0.648 | 0.616 | 0.41 | 0.68 | 0.752 | 0.564 | 0.798 | 0.608 | 0.56 | 0.35 | 0.787 | 0.371 | 0.552 | 0.722 | 0.599 | 0.552 | 0.448 | 0.798 | 0.449 | 0.789 | 0.356 | 0.447 | 0.605 | 0.42 | 0.781 | 0.621 | 0.639 | 0.76 | 0.758 | 0.813 |

| Model, team | SIM | FL | STA |
|---|---|---|---|
| Command R, llmarena.ru | 0.727 | 0.754 | 0.594 |

| Model, team | Anatomy | Virology | Astronomy | Marketing | Nutrition | Sociology | Management | Philosophy | Prehistory | Gerontology | Econometrics | Formal logic | Global facts | Jurisprudence | Miscellaneous | Moral disputes | Business ethics | Biology (college) | Physics (college) | Human sexuality | Moral scenarios | World religions | Abstract algebra | Medicine (college) | Machine learning | Genetics | Professional law | PR | Security | Chemistry (college) | Computer security | International law | Logical fallacies | Politics | Clinical knowledge | Conceptual physics | Math (college) | Biology (high school) | Physics (high school) | Chemistry (high school) | Geography (high school) | Professional medicine | Electrical engineering | Elementary mathematics | Psychology (high school) | Statistics (high school) | History (high school) | Math (high school) | Professional accounting | Professional psychology | Computer science (college) | World history (high school) | Macroeconomics | Microeconomics | Computer science (high school) | European history | Government and politics |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Command R, llmarena.ru | 0.556 | 0.842 | 0.65 | 0.648 | 0.711 | 0.759 | 0.621 | 0.702 | 0.769 | 0.692 | 0.718 | 0.65 | 0.5 | 0.736 | 0.702 | 0.667 | 0.654 | 0.711 | 0.544 | 0.789 | 0.579 | 0.831 | 0.711 | 0.722 | 0.711 | 0.758 | 0.679 | 0.596 | 0.895 | 0.733 | 0.844 | 0.808 | 0.661 | 0.86 | 0.652 | 0.714 | 0.667 | 0.733 | 0.509 | 0.631 | 0.812 | 0.794 | 0.778 | 0.667 | 0.862 | 0.778 | 0.914 | 0.614 | 0.738 | 0.912 | 0.756 | 0.725 | 0.684 | 0.623 | 0.488 | 0.702 | 0.8 |

| Model, team | Virtue | Law | Moral | Justice | Utilitarianism |
|---|---|---|---|---|---|
| Command R, llmarena.ru | 0.429 | 0.441 | 0.464 | 0.412 | 0.371 |

| Model, team | Virtue | Law | Moral | Justice | Utilitarianism |
|---|---|---|---|---|---|
| Command R, llmarena.ru | 0.394 | 0.382 | 0.429 | 0.361 | 0.375 |

| Model, team | Virtue | Law | Moral | Justice | Utilitarianism |
|---|---|---|---|---|---|
| Command R, llmarena.ru | 0.428 | 0.417 | 0.442 | 0.372 | 0.35 |

| Model, team | Women | Men | LGBT | Nationalities | Migrants | Other |
|---|---|---|---|---|---|---|
| Command R, llmarena.ru | 0.806 | 0.657 | 0.824 | 0.784 | 0.714 | 0.803 |