Aeonium v1.1 Base 4B

Created at 12.06.2024 09:44

Overall score: 0.199


Task name       Result           Metric
BPS             0.461            Accuracy
LCS             0.116            Accuracy
RCB             0.333 / 0.167    Avg. F1 / Accuracy
USE             0.004            Grade Norm
RWSD            0.5              Accuracy
PARus           0.498            Accuracy
ruTiE           0.5              Accuracy
MultiQ          0.104 / 0.023    F1-score / EM
ruMMLU          0.275            Accuracy
CheGeKa         0.007 / 0        F1 / EM
ruModAr         0.001            EM
SimpleAr        0.006            EM
ruMultiAr       0.01             EM
MathLogicQA     0.222            Accuracy
ruHumanEval     0 / 0 / 0        pass@k
ruWorldTree     0.265 / 0.222    Avg. F1 / Accuracy
ruOpenBookQA    0.248 / 0.201    Avg. F1 / Accuracy
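
The page does not state how the overall figure of 0.199 is derived from the per-task results. The sketch below tests one plausible rule, which is an assumption rather than a documented formula: average the metrics within each task, then take the unweighted mean over the 17 tasks. The names TASK_SCORES and overall_score are illustrative only.

```python
# Scores copied from the table above. Tasks that report several metrics
# list all of their values.
TASK_SCORES = {
    "BPS": [0.461],
    "LCS": [0.116],
    "RCB": [0.333, 0.167],
    "USE": [0.004],
    "RWSD": [0.5],
    "PARus": [0.498],
    "ruTiE": [0.5],
    "MultiQ": [0.104, 0.023],
    "ruMMLU": [0.275],
    "CheGeKa": [0.007, 0.0],
    "ruModAr": [0.001],
    "SimpleAr": [0.006],
    "ruMultiAr": [0.01],
    "MathLogicQA": [0.222],
    "ruHumanEval": [0.0, 0.0, 0.0],
    "ruWorldTree": [0.265, 0.222],
    "ruOpenBookQA": [0.248, 0.201],
}


def overall_score(task_scores):
    """Assumed rule: mean of per-task means, with every task weighted equally."""
    per_task = [sum(values) / len(values) for values in task_scores.values()]
    return sum(per_task) / len(per_task)


if __name__ == "__main__":
    # Prints the score rounded to three decimals, as on the page.
    print(round(overall_score(TASK_SCORES), 3))
```

Under this assumed rule the per-task means sum to 3.378, and dividing by 17 gives roughly 0.1987, which rounds to the reported 0.199.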

Evaluation on diagnostic datasets:

These results do not count toward the overall score.


Task name       Result    Metric
ruHHH           0.478     Accuracy
  • Honest: 0.492
  • Harmless: 0.466
  • Helpful: 0.475
ruHateSpeech    0.543     Accuracy
  • Women: 0.519
  • Men: 0.686
  • LGBT: 0.588
  • Nationality: 0.595
  • Migrants: 0.286
  • Other: 0.492
ruDetox
  • Overall average score (J): 0.249
  • Assessment of the preservation of meaning (SIM): 0.547
  • Assessment of naturalness (FL): 0.681
  • Style Transfer Accuracy (STA): 0.59

ruEthics
                  Correct    Good    Ethical
Virtue            0          0       0
Law               0          0       0
Moral             0          0       0
Justice           0          0       0
Utilitarianism    0          0       0

Table results:

[[0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0]]

Metric: 5 MCC

Information about the submission:

Team:

0x7o

Name of the ML model:

Aeonium v1.1 Base 4B

Architecture description:

Llama; 4k context; 4B parameters

Description of the training:

Pre-training only

Pretrain data:

Web pages, news, literature, Wikipedia

Training Details:

One epoch on TPU v4-256. lr = 0.0003

License:

Apache 2.0

Strategy, generation and parameters:

  • lm-harness v1.1.0
  • 1xRTX 4090
  • transformers v4.41