
NeuroAutomata Validation Results


Benchmark validation results for the NeuroAutomata scoring engine against ProteinGym assays and additional internal validation runs.

Historical Phase-1 validation (superseded). See the methodology page for the current verified cohort.

About This Data

NeuroAutomata uses ESM-2 (650M, Meta AI) to score the functional impact of amino acid substitutions. Metric: Spearman rank correlation (ρ) between predicted scores and experimental fitness values. ρ ranges from −1 to 1; higher values indicate better rank-ordering of mutations by effect.
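This page does not include its evaluation code. The following is a minimal sketch that assumes the standard definition: Spearman's ρ is the Pearson correlation of the two rank vectors, and tied values receive their average rank. The function names are hypothetical, and the input numbers are toy values. If SciPy is available, `scipy.stats.spearmanr` computes the same statistic.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Assign 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the end of a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Sorted positions i..j correspond to ranks i+1..j+1; use their mean.
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / (vx * vy) ** 0.5


def spearman_rho(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    return pearson(average_ranks(predicted), average_ranks(measured))


# Toy example (hypothetical numbers): model scores vs. measured fitness.
scores = [-3.1, -0.4, -1.7, 0.2]
fitness = [0.05, 0.80, 0.30, 0.95]
print(spearman_rho(scores, fitness))  # 1.0: identical rank order
```

Only ranks enter the calculation, so any monotonic rescaling of the model scores leaves ρ unchanged. This makes the metric suitable for comparing model scores with fitness values that are measured on a different scale.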

Baseline: Cross-model category medians from the live ProteinGym leaderboard CSV (OATML-Markslab/ProteinGym, accessed 2026-05-08). Original benchmark design: Notin et al. 2023, NeurIPS. The CALM1 result (ρ = 0.212) falls well below ESM-2 650M's own published aggregate of 0.414, which reflects a documented limitation of the model on binding-affinity assays.
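The baselines are described as cross-model category medians, but this page does not spell out how that reduction is performed. The following is only a sketch under an assumed layout: a CSV with one row per model and one column per assay category. The column names and all numbers below are hypothetical toy values, not ProteinGym data. For each category, the sketch takes the median across every model that reports a score.

```python
import csv
import io
import statistics

# HYPOTHETICAL leaderboard layout: one row per model, one column per assay
# category. The values are made up for illustration, NOT ProteinGym numbers.
TOY_CSV = """model,Activity,Binding,Expression
model_a,0.40,0.30,0.41
model_b,0.45,0.34,0.43
model_c,0.42,0.28,0.39
"""


def category_medians(csv_text: str, categories: list[str]) -> dict[str, float]:
    """Median score across all models (rows) for each category column."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    medians = {}
    for cat in categories:
        # Skip blank cells so that models without a score in a category are ignored.
        scores = [float(r[cat]) for r in rows if r[cat].strip()]
        medians[cat] = statistics.median(scores)
    return medians


baselines = category_medians(TOY_CSV, ["Activity", "Binding", "Expression"])
print(baselines)  # {'Activity': 0.42, 'Binding': 0.3, 'Expression': 0.41}
```

A median, rather than a mean, keeps a single unusually strong or weak model from shifting the category baseline.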

Raw per-variant scoring outputs are available on request: team@axonagentic.ai

| Protein | Assay | Dataset | ρ | ESM-2 Baseline | Notes |
|---|---|---|---|---|---|
| Beta-lactamase (BLAT) | Enzymatic activity; BLAT_ECOLX_Stiffler_2015 (Stiffler et al. 2015) | 5-Protein Benchmark | 0.731 | 0.420 | |
| BRCA1 | SGE activity; BRCA1_HUMAN_Findlay_2018 (Findlay et al. 2018) | 5-Protein Benchmark | 0.515 | 0.420 | |
| UBC9 | Expression; UBC9_HUMAN_Weile_2017 (Weile et al. 2017) | 5-Protein Benchmark | 0.473 | 0.418 | |
| PTEN | Organismal fitness; PTEN_HUMAN_Mighell_2018 (Mighell et al. 2018) | 5-Protein Benchmark | 0.519 | 0.384 | |
| Calmodulin (CALM1) | Binding; CALM1_HUMAN_Weile_2017 (Weile et al. 2017) | 5-Protein Benchmark | 0.212 | 0.414 | Well below ESM-2 650M's own published aggregate; confirms protein-protein binding as a known weak category |
| Protein G B1 domain (GB1) | Fitness (single mutants); Olson et al. 2014; n = 1,045 | Pre-validation | 0.276 | | Pre-benchmark signal check. p < 10⁻¹⁹. Not included in 5-protein median. |
| CYP2C9 | Overall (pharmacogenomic); Axon Agentic CYP2C9 benchmark | CYP2C9 | 0.679 | | |
| CYP2C9 | Heme-binding domain; Axon Agentic CYP2C9 benchmark | CYP2C9 | 0.811 | | |
| CYP2C9 | SRS5 substrate recognition site; Axon Agentic CYP2C9 benchmark | CYP2C9 | 0.422 | | ~2× performance gap vs heme-binding domain |
| 5-protein benchmark (aggregate) | Median across all 5 assays; internal validation | Aggregate | 0.515 | | 24% above the published ESM-2 650M aggregate of 0.414 (ProteinGym leaderboard CSV, accessed 2026-05-08) |
| ESM-2 Benchmark Series | Curated 20-assay ProteinGym subset; internal validation | Aggregate | 0.487 | 0.414 | ~18% above the published ESM-2 650M aggregate of 0.414 |

Only one ρ value, CALM1 (0.212 vs 0.414), falls below the ESM-2 category baseline for its assay type. Baselines: cross-model category medians from the ProteinGym leaderboard CSV (OATML-Markslab/ProteinGym, accessed 2026-05-08).
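The aggregate rows and notes reduce to simple arithmetic on the per-assay values. The sketch below assumes two things: the 5-protein aggregate is the plain median of the five ρ values, and the percentage figures are ratios against the 0.414 ESM-2 650M aggregate, as the Notes column implies. On those assumptions it reproduces the table's derived numbers. The variable and function names are hypothetical, and the numbers are copied from the table.

```python
import statistics

# Per-assay (rho, baseline) pairs copied from the results table.
five_protein = {
    "BLAT": (0.731, 0.420),
    "BRCA1": (0.515, 0.420),
    "UBC9": (0.473, 0.418),
    "PTEN": (0.519, 0.384),
    "CALM1": (0.212, 0.414),
}
ESM2_650M_AGGREGATE = 0.414  # published ESM-2 650M aggregate quoted on this page


def relative_gain(value: float, reference: float) -> float:
    """Fractional difference of value over reference (0.24 means 24% above)."""
    return value / reference - 1


# Median of the five assay-level rho values (baselines are ignored here).
median_rho = statistics.median(rho for rho, _ in five_protein.values())
# Assays whose rho is lower than their own category baseline.
below_baseline = [name for name, (rho, base) in five_protein.items() if rho < base]

print(f"5-protein median rho: {median_rho:.3f}")
print(f"vs ESM-2 650M aggregate: {relative_gain(median_rho, ESM2_650M_AGGREGATE):+.0%}")
print(f"20-assay series: {relative_gain(0.487, ESM2_650M_AGGREGATE):+.0%}")
print(f"Heme-binding / SRS5 ratio: {0.811 / 0.422:.2f}")
print("Below category baseline:", below_baseline)
```

Running the sketch prints the following values, consistent with the table:

- A median of 0.515.
- Gains of +24% and +18% over the 0.414 aggregate.
- A heme-binding to SRS5 ratio of 1.92 (the "~2×" gap).
- `['CALM1']` as the only assay below its category baseline.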