Research · Verification · Disclosure

Research and Disclosure at Axon Agentic


This page organizes Axon Agentic's published evidence: the validation work that tests our products against experimental data, the independent audit trail for the claims we publish, and the operational disclosures that describe who is responsible for what. It exists because scientifically literate readers — protein engineers, clinical researchers, and computational biologists evaluating new tools — should be able to check methodology, trace numbers to sources, and understand who is accountable before deciding whether to rely on this work.

Three artifact classes are organized here, in this order:

Validation is our science. It covers what the systems do and where they fail: deep mutational scanning benchmarks, known failure modes, and methods. Validation artifacts report on the systems themselves.

Verification is the audit of our claims. It answers a different question: not whether the system works, but whether the claims we publish about the system are accurate. These are independent fact-checks run by Veritas — an AI agent that operates separately from the agents that produce content — and are exposed as public artifacts rather than internal quality checks.

Disclosure covers how Axon Agentic operates: who is on the team, what AI agents can and cannot do, and where to find NeuroAutomata's privacy policy.

The sections below list the available artifacts with a description of what each one contains and when to read it.


Validation — What the systems do and where they fail

Validation at Axon Agentic means applying our systems to publicly available experimental datasets and measuring how well the predictions align with measured biology. The datasets are peer-reviewed deep mutational scanning (DMS) assays, saturation genome editing (SGE) studies, and variant abundance (VAMP-seq) measurements: experiments that exist independently of us and predate our analysis. We contribute the computational predictions; the experimental ground truth is external.

This is not a regulatory validation. Nothing here constitutes a clinical study or regulatory submission. Results are designated Research Use Only (RUO): they are research scores, not clinical diagnoses. REVEL, CADD, AlphaMissense, and PolyPhen-2 carry the same designation.

The limitation disclosures below are not footnotes. The calmodulin result is listed at the same level as the positive results because accurate failure disclosure is part of what this page is for.

NeuroAutomata: ESM-2 validation across five proteins, median Spearman rho 0.515 (internal benchmark)

ESM-2 650M is the protein language model at the core of NeuroAutomata. It was developed by Meta AI Research (Lin et al. 2023, Science) and predicts variant effects from sequence alone using masked marginal scoring, which masks a position and measures how surprising the mutant amino acid is relative to the wild-type residue there. No structure is required, and no experimental data is needed at inference time.
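A minimal sketch of masked marginal scoring for a single substitution, assuming a hypothetical callable `masked_log_probs(sequence, position)` that returns log-probabilities over amino acids at `position` with that position masked. In practice such a callable would wrap a masked protein language model such as ESM-2 650M; here an invented toy distribution stands in, and none of these names are NeuroAutomata's code. The score is log p(mutant) minus log p(wild type), so a negative value means the model finds the mutant less plausible than the wild-type residue.

```python
import math
from typing import Callable, Mapping

# Hypothetical interface: (sequence, position) -> {amino_acid: log_probability},
# computed with `position` masked. A real version would wrap a masked protein LM.
MaskedLogProbs = Callable[[str, int], Mapping[str, float]]


def masked_marginal_score(
    wild_type: str,
    position: int,
    wt_residue: str,
    mut_residue: str,
    masked_log_probs: MaskedLogProbs,
) -> float:
    """Score one substitution as log p(mut) - log p(wt) at a masked position.

    `position` is zero-based. Higher (less negative) scores mean the model
    considers the substitution more plausible.
    """
    if wild_type[position] != wt_residue:
        raise ValueError(
            f"residue at {position} is {wild_type[position]!r}, not {wt_residue!r}"
        )
    log_p = masked_log_probs(wild_type, position)
    return log_p[mut_residue] - log_p[wt_residue]


def toy_masked_log_probs(sequence: str, position: int) -> dict[str, float]:
    # Invented distribution for illustration only; it ignores its inputs.
    probs = {"A": 0.5, "G": 0.3, "P": 0.2}
    return {aa: math.log(p) for aa, p in probs.items()}


score = masked_marginal_score("MAK", 1, "A", "G", toy_masked_log_probs)
print(round(score, 3))  # log(0.3) - log(0.5) = log(0.6), about -0.511
```

The sketch handles single substitutions only, which is the case scored by substitution DMS benchmarks.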

The published ProteinGym zero-shot substitution benchmark places ESM-2 650M at a mean Spearman rho of 0.414 across 217 substitution DMS assays, ranking 45 of 97 models on the live leaderboard CSV (accessed 2026-05-08). That is the external published number.
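Spearman rho is the Pearson correlation computed on ranks, which is why it rewards getting the order of variant effects right regardless of scale. A minimal pure-Python sketch follows, with tied values receiving the average of their ranks; in practice `scipy.stats.spearmanr` computes the same quantity. The prediction and measurement values in the usage example are invented.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Advance `end` over a run of values equal to the one at `start`.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # sorted positions start..end, 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / (var_x * var_y) ** 0.5


# Invented example: predictions on a different scale, but in the same order.
predicted = [-2.1, -0.3, -1.5, 0.4]
measured = [0.10, 0.85, 0.30, 0.95]
print(round(spearman_rho(predicted, measured), 3))  # 1.0
```

A perfect rank agreement gives 1.0 even though the two scales differ, which is the property that makes rho suitable for comparing model scores against assay readouts.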

Our internal validation, run on a 5-protein subset and self-reported, produced a median Spearman rho of 0.515, approximately 24% above the published ESM-2 650M mean of 0.414 (internal benchmark; see validation details). The comparison is indicative rather than like-for-like: it sets a median over five proteins against a mean over 217 assays. The five proteins and their individual results:

Protein | Spearman rho | Experimental assay
Beta-lactamase | 0.731 | DMS (activity)
PTEN | 0.519 | VAMP-seq (abundance)
BRCA1 | 0.515 | SGE (function scores)
UBC9 | 0.473 | DMS (activity)
GB1 | 0.276 | DMS (fitness)
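The summary figures can be reproduced from the table values with Python's standard `statistics` module. This short sketch copies the five rho values and the published 0.414 ProteinGym mean, then computes the subset median, the subset mean (shown for comparison only), and the relative difference behind the "approximately 24%" figure.

```python
from statistics import mean, median

# Per-protein Spearman rho from the table (internal benchmark, self-reported).
internal_rho = {
    "Beta-lactamase": 0.731,
    "PTEN": 0.519,
    "BRCA1": 0.515,
    "UBC9": 0.473,
    "GB1": 0.276,
}

# Published ProteinGym mean for ESM-2 650M across 217 substitution DMS assays.
PROTEINGYM_MEAN_RHO = 0.414

values = list(internal_rho.values())
subset_median = median(values)  # middle of the five sorted values
subset_mean = mean(values)  # shown for comparison only
relative_gap = subset_median / PROTEINGYM_MEAN_RHO - 1

print(f"median={subset_median:.3f} mean={subset_mean:.3f} gap={relative_gap:+.1%}")
# median=0.515 mean=0.503 gap=+24.4%
```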

The BRCA1 entry is the full ProteinGym BRCA1_HUMAN_Findlay_2018 dataset. Domain-level recomputations from the same Findlay 2018 saturation genome editing data show the BRCT domain alone at rho 0.534 (internal benchmark) and the RING domain at 0.409 (internal benchmark) — these are narrower scopes of the same dataset, not a separate benchmark. Sample sizes: BRCT N=1,262; RING N=575.

For methods, protein-specific results, and the full variant tables, see the validation details page or the NeuroAutomata product page.

ESM-2 limitation: protein-protein binding correlations are weak (calmodulin rho 0.212)

Calmodulin is a calcium-binding protein whose mutational effects depend on changes in binding affinity at protein-protein interfaces rather than on structural stability or folding. ESM-2 scores evolutionary plausibility (patterns learned from sequence conservation across millions of proteins), not binding energy. On the calmodulin DMS dataset, ESM-2 produced a Spearman rho of 0.212 against experimental data (internal benchmark; see validation details).

Protein-protein binding contexts are a known weak spot for this approach. Mutations that affect an interaction interface without destabilizing the fold are outside what evolutionary sequence conservation captures well. If your protein's function is primarily binding-affinity-driven, interpret ESM-2 scores cautiously and pair with structural or experimental data.

HPA multi-agent system: 18-month journey from naive RAG to verification-first architecture

The HPA multi-agent system translates natural language queries into structured queries against the Human Protein Atlas JSON API. Over 18 months of development, the architecture evolved from a naive retrieval-augmented generation (RAG) baseline, through iterative revision, to a verification-first multi-agent pipeline. The journey post documents the architecture decisions, the cross-validation approach used to confirm query results against HPA's own API responses, and what failed along the way.

Read: Building an AI Multi-Agent System for Human Protein Atlas Data — 18-month journey
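The verification-first idea described above can be illustrated with a small sketch: before an answer is released, each field the agent reports is checked against the record returned by the source API, and anything not confirmed blocks release. This is a generic illustration, not the HPA system's implementation (the journey post documents that); the field names, verdict labels, and stub record are hypothetical, and no network call is made.

```python
from typing import Any, Mapping


def cross_validate(
    answer: Mapping[str, Any], api_record: Mapping[str, Any]
) -> dict[str, str]:
    """Label each reported field as confirmed, contradicted, or unsupported."""
    verdicts: dict[str, str] = {}
    for field, value in answer.items():
        if field not in api_record:
            verdicts[field] = "unsupported"  # the source record has no such field
        elif api_record[field] == value:
            verdicts[field] = "confirmed"
        else:
            verdicts[field] = "contradicted"
    return verdicts


def release_if_verified(
    answer: Mapping[str, Any], api_record: Mapping[str, Any]
) -> Mapping[str, Any]:
    """Return the answer only if every field is confirmed by the source record."""
    verdicts = cross_validate(answer, api_record)
    failures = {f: v for f, v in verdicts.items() if v != "confirmed"}
    if failures:
        raise ValueError(f"answer failed verification: {failures}")
    return answer


# Hypothetical stub standing in for a parsed API response.
stub_record = {"gene": "GENE_X", "tissue_specificity": "Tissue enriched"}
agent_answer = {"gene": "GENE_X", "tissue_specificity": "Low tissue specificity"}
print(cross_validate(agent_answer, stub_record))
# {'gene': 'confirmed', 'tissue_specificity': 'contradicted'}
```

Treating "unsupported" as a failure, not a pass, is the design choice that separates this pattern from plain retrieval: the answer must be backed by the source, not merely absent from contradiction.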

Validation methodology: HPA multi-agent system

The companion methodology post covers reproducibility protocols, agent role specifications, and the cross-validation procedure used to confirm query results against Human Protein Atlas API responses. It is intended for readers who want to understand the technical decisions behind the validation approach rather than just the results.

Read: Validation methodology — HPA multi-agent system


Verification — Independent audit of published claims

Validation tells you whether the system works. Verification tells you whether the claims we publish about it are accurate. These are not the same question.

Veritas is the AI agent responsible for verification at Axon Agentic. It operates separately from the agents that produce content — Amara (marketing), Astro (engineering), and Kiran (product). Veritas cannot edit content produced by other agents. Its findings cannot be overridden by other AI agents; any exception requires Jonathan Agoot's explicit written approval, recorded with a reason.

What Veritas does and does not cover is documented on the methodology page. This page links there rather than reproducing the implementation.

Machine-readable claims YAML: every published claim with its source and verification status

Every factual claim on Axon Agentic's public-facing pages is recorded in a structured YAML manifest. Each entry carries the claim text, source citation, evidence strength, and Veritas verification status. The files are served as plain text and are designed to be readable by both humans and AI language models that process structured data.

The claims manifest for this page and all published Axon Agentic content: axonagentic.ai/research/claims.yaml
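A reader auditing the manifest might load it and list every entry not yet marked verified. A minimal sketch, assuming the YAML has already been parsed into a list of dictionaries (for example with PyYAML's `yaml.safe_load`); the key names and the `"verified"` status value are hypothetical placeholders for whatever schema the published file actually uses, and the sample entries are invented.

```python
from dataclasses import dataclass

# Hypothetical key names; check them against the published manifest's schema.
REQUIRED_KEYS = ("claim", "source", "evidence_strength", "verification_status")


@dataclass(frozen=True)
class ClaimEntry:
    claim: str
    source: str
    evidence_strength: str
    verification_status: str


def parse_entries(raw_entries: list[dict]) -> list[ClaimEntry]:
    """Convert parsed YAML mappings to typed entries, rejecting incomplete ones."""
    entries = []
    for index, raw in enumerate(raw_entries):
        missing = [key for key in REQUIRED_KEYS if key not in raw]
        if missing:
            raise ValueError(f"entry {index} is missing {missing}")
        entries.append(ClaimEntry(**{key: str(raw[key]) for key in REQUIRED_KEYS}))
    return entries


def not_yet_verified(
    entries: list[ClaimEntry], ok_status: str = "verified"
) -> list[ClaimEntry]:
    """Entries whose status differs from the assumed 'verified' label."""
    return [entry for entry in entries if entry.verification_status != ok_status]


# Invented placeholder entries, not taken from the real manifest.
sample = [
    {"claim": "Example claim A", "source": "Example source A",
     "evidence_strength": "external", "verification_status": "verified"},
    {"claim": "Example claim B", "source": "Example source B",
     "evidence_strength": "internal", "verification_status": "pending"},
]
for entry in not_yet_verified(parse_entries(sample)):
    print(entry.claim, "->", entry.verification_status)
# Example claim B -> pending
```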

Veritas verification reports: per-artifact fact-checks with source citations and verdicts

Each substantive public-facing post has a paired Veritas verification report — a fact-check of the quantitative claims and source citations in that piece. Reports list each claim, the cited source, the verification result, and any corrections applied before publication.

Available reports are listed at /verification.


Disclosure — How Axon Agentic operates

Verification audits the claims made about the science. Disclosure audits the operation: who is on the team, what they can and cannot do, and what data practices cover the product.

AI Agent Staff: six AI agents and one human, with documented roles and approval requirements

Axon Agentic's team is six AI agents and one human. The human is Jonathan Agoot, who founded the company and is responsible for everything it publishes. No agent can approve its own factual claims or route around the Veritas verification step. No agent has access to customer data, personal information, or financial systems.

The AI Agent Staff page documents each agent's role, what it does and does not access, and the approval requirements for each category of output.

NeuroAutomata privacy policy: platform-specific data handling for neuroautomata.axonagentic.ai

NeuroAutomata's privacy policy documents the data practices for the protein analysis platform. This is a platform-specific policy scoped to neuroautomata.axonagentic.ai — it is not an Axon Agentic site-wide policy.

Read: NeuroAutomata privacy policy


Frequently Asked Questions

What is the difference between validation and verification at Axon Agentic?

Validation tests whether the systems produce accurate predictions against experimental data. Verification independently checks whether the claims published about those results are accurate. Most companies publish the first and skip the second.

What model and scoring method does the NeuroAutomata benchmark use?

ESM-2 650M (Meta AI Research, Lin et al. 2023, Science) with masked marginal scoring. Published ProteinGym aggregate for this model: mean Spearman rho 0.414 across 217 substitution assays, rank 45 of 97 models on the live leaderboard CSV (OATML-Markslab/ProteinGym, accessed 2026-05-08). Internal validation on a 5-protein subset produced a median of 0.515; that is a subset result, not the full benchmark.

Are Axon Agentic's results peer-reviewed?

No. The validation work is evidence-published: methodology documented, numbers traceable to source data, independent verification by Veritas, and a publicly linkable claims YAML. The experimental datasets we benchmark against (the ProteinGym DMS assays, the SGE data, and the VAMP-seq abundance data) come from peer-reviewed studies.

How do I verify a specific claim Axon Agentic has made?

Each published claim has a source citation and Veritas verification status in the claims YAML, served at axonagentic.ai/research/claims.yaml. If a claim cites a peer-reviewed paper, the DOI or PMID is listed. If it is an internal measurement, the YAML identifies it as such and links to the validation page. Disagreements can be sent to Jonathan Agoot via the contact form.
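The two traceability rules in that answer can be checked mechanically. A sketch, assuming each parsed entry is a dictionary with hypothetical `doi`, `pmid`, `internal`, and `validation_url` keys; the DOI pattern is a loose syntactic check (a `10.` prefix, a 4 to 9 digit registrant code, a slash, a suffix), not a lookup against doi.org, and the example entries are invented.

```python
import re
from typing import Any, Mapping, Optional

# Loose DOI syntax: "10.", a 4-9 digit registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")
# PubMed IDs are positive integers.
PMID_PATTERN = re.compile(r"^[1-9]\d*$")


def traceability_problem(entry: Mapping[str, Any]) -> Optional[str]:
    """Describe what is missing, or return None if the entry is traceable."""
    if entry.get("internal"):
        # Internal measurements must point to the validation page instead.
        if not entry.get("validation_url"):
            return "internal measurement without a validation-page link"
        return None
    doi = str(entry.get("doi", "")).strip()
    pmid = str(entry.get("pmid", "")).strip()
    if DOI_PATTERN.match(doi) or PMID_PATTERN.match(pmid):
        return None
    return "external claim without a well-formed DOI or PMID"


# Invented example entries using the hypothetical keys above.
examples = [
    {"doi": "10.1000/example"},
    {"internal": True},
]
for example in examples:
    print(traceability_problem(example))
```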

How do I cite NeuroAutomata or Axon Agentic's research?

Use the plain attribution strings in the "How to cite this work" section below.

What does Axon Agentic not publish, and why?

Internal pipeline architecture, environment configurations, and per-agent operational routing are not enumerated publicly. Stating what agents can and cannot do is the appropriate trust signal; enumerating how the internal system is wired is not.


How to cite this work

For NeuroAutomata:

Axon Agentic (2026). NeuroAutomata: ESM-2 protein analysis platform, on the latest production release. https://neuroautomata.axonagentic.ai

For the HPA multi-agent system:

Axon Agentic (2025). HPA Multi-Agent System: Verification-first natural language queries for Human Protein Atlas data. https://axonagentic.ai/blog/ai-natural-language-human-protein-atlas-18-month-journey

No DOI is assigned to this work at this time. No CITATION.cff is available (repositories are private).


Work with us

If you are building AI systems for scientific research and want to discuss architecture or the verification approach, get in touch.

If you want to run your protein sequence through NeuroAutomata to score variants, view the mutation landscape, or explore ESM-2 predictions, the platform is available at neuroautomata.axonagentic.ai.