NeuroAutomata Validation Results
Benchmark validation results for the NeuroAutomata scoring engine against ProteinGym assays and additional internal validation runs.
About This Data
NeuroAutomata uses ESM-2 (650M, Meta AI) to score the functional impact of amino acid substitutions. Metric: Spearman rank correlation (ρ) between predicted scores and experimental fitness values. ρ ranges from −1 to 1; higher values indicate better rank-ordering of mutations by effect.
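The metric itself can be sketched in a few lines of pure Python. This is a minimal reference implementation of Spearman's ρ, computed as the Pearson correlation of average ranks so that tied values are handled the standard way. It is not NeuroAutomata's code, and the function names `average_ranks` and `spearman_rho` are hypothetical. In practice, `scipy.stats.spearmanr` computes the same statistic.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Assign 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(predicted), average_ranks(measured)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / (vx * vy) ** 0.5


# Toy inputs (not benchmark data): one adjacent pair is swapped in rank order.
print(spearman_rho([0.1, 0.4, 0.3, 0.9], [1.0, 2.0, 3.0, 4.0]))  # 0.8
```

Because only ranks enter the calculation, ρ rewards a model for ordering mutations correctly even when its raw scores are on a different scale from the experimental fitness values.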
Baseline: cross-model category medians from the live ProteinGym leaderboard CSV (OATML-Markslab/ProteinGym, accessed 2026-05-08). Original benchmark design: Notin et al. 2023, NeurIPS. The CALM1 result (ρ = 0.212) falls well below ESM-2 650M's own published aggregate of 0.414, consistent with a documented limitation on binding-affinity assays.
Raw per-variant scoring outputs are available on request: team@axonagentic.ai
| Protein | Assay | Dataset | ρ | ESM-2 Baseline | Notes |
|---|---|---|---|---|---|
| Beta-lactamase (BLAT) | Enzymatic activity | 5-Protein Benchmark | 0.731 | 0.420 | |
| BRCA1 | SGE activity | 5-Protein Benchmark | 0.515 | 0.420 | |
| UBC9 | Expression | 5-Protein Benchmark | 0.473 | 0.418 | |
| PTEN | Organismal fitness | 5-Protein Benchmark | 0.519 | 0.384 | |
| Calmodulin (CALM1) | Binding | 5-Protein Benchmark | 0.212 | 0.414 | Well below ESM-2 650M's own published aggregate; consistent with protein-protein binding being a known weak category |
| Protein G B1 domain (GB1) | Fitness (single mutants) | Pre-validation | 0.276 | — | Pre-benchmark signal check. p < 10⁻¹⁹. Not included in 5-protein median. |
| CYP2C9 | Overall (pharmacogenomic) | CYP2C9 | 0.679 | — | |
| CYP2C9 | Heme-binding domain | CYP2C9 | 0.811 | — | |
| CYP2C9 | SRS5 substrate recognition site | CYP2C9 | 0.422 | — | ρ roughly half that of the heme-binding domain (0.422 vs 0.811) |
| 5-protein benchmark (aggregate) | Median across all 5 assays | Aggregate | 0.515 | 0.414 | 24% above the published ESM-2 650M aggregate of 0.414 (ProteinGym leaderboard CSV, accessed 2026-05-08) |
| ESM-2 Benchmark Series | Curated 20-assay ProteinGym subset | Aggregate | 0.487 | 0.414 | ~18% above the published ESM-2 650M aggregate of 0.414 |
A ρ value below the ESM-2 category baseline for its assay type indicates underperformance on that assay; in this table, only CALM1 (0.212 vs 0.414) falls below its baseline. Baselines: cross-model category medians from the ProteinGym leaderboard CSV (OATML-Markslab/ProteinGym, accessed 2026-05-08).
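The aggregate rows can be checked directly against the per-assay values in the table. The sketch below recomputes the 5-protein median, the relative gain over the published ESM-2 650M aggregate (0.414), and the list of assays below their category baseline. The per-assay ρ values for the 20-assay subset are not listed here, so only its reported 0.487 is used. The helper names are hypothetical.

```python
from statistics import median

# (protein, rho, category baseline) copied from the 5-Protein Benchmark rows.
FIVE_PROTEIN = [
    ("BLAT", 0.731, 0.420),
    ("BRCA1", 0.515, 0.420),
    ("UBC9", 0.473, 0.418),
    ("PTEN", 0.519, 0.384),
    ("CALM1", 0.212, 0.414),
]
ESM2_650M_AGGREGATE = 0.414  # published ESM-2 650M aggregate quoted in the table
SUBSET_20_RHO = 0.487        # reported ESM-2 Benchmark Series aggregate


def relative_gain(value: float, reference: float) -> float:
    """Fractional difference of value over reference (0.24 means 24% above)."""
    return value / reference - 1


def below_baseline(rows):
    """Names of assays whose rho is below their own category baseline."""
    return [name for name, rho, base in rows if rho < base]


agg = median(rho for _, rho, _ in FIVE_PROTEIN)
print(f"5-protein median rho: {agg:.3f}")  # 0.515
print(f"vs ESM-2 650M aggregate: {relative_gain(agg, ESM2_650M_AGGREGATE):+.0%}")  # +24%
print(f"20-assay subset: {relative_gain(SUBSET_20_RHO, ESM2_650M_AGGREGATE):+.0%}")  # +18%
print("below category baseline:", below_baseline(FIVE_PROTEIN))  # ['CALM1']
```

Using the median rather than the mean keeps the single weak binding assay (CALM1) from pulling the 5-protein aggregate down disproportionately.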