Blog

Published December 20, 2025 · Updated June 25, 2026

We Ran a 10× Bigger Protein Model. It Didn't Rank Variants Any Better.

Scaling ESM-C 600M to 6B gave no zero-shot gain over ESM-2 650M on 201 ProteinGym assays: a non-significant regression (grouped Spearman rho 0.419 vs 0.431).

July 13, 2026

research · benchmarks

Use NeuroAutomata in Claude and ChatGPT with MCP

Add NeuroAutomata's protein tools to Claude or ChatGPT with one MCP connector URL. Which tools are free, which need Pro, and how the connection is secured.

July 2, 2026

product · how-to

An Independent Check Caught a Claim We'd Overstated

An independent check corrected one of our own live ESM-2 claims about disordered regions — and overturned an internal assessment too. Why we don't self-certify.

June 15, 2026

The Feature We Didn't Ship

Before building a per-position confidence view for our variant scorer, we pre-registered a pass/fail bar — witnessed, no escape clause. It failed. We didn't ship it.

June 15, 2026

Ranking TP53 VUS with ESM-2: Strong in the DNA-Binding Domain, Blind in the Disordered Regions

ESM-2 ranks germline TP53 DNA-binding-domain variants reliably (Spearman rho 0.46–0.68) but carries no usable signal in the disordered regions (0 to −0.26).

June 4, 2026

research · benchmarks

Joint coverage on INSR: where ESM-2 lands and where it inverts

Across 13,927 INSR ectodomain variants, ESM-2 ranks at ρ=0.594 on the L1 leucine-rich repeat and ρ=−0.088 at the αCT helix — same protein, same model, same assay. The contrast is structural.

May 15, 2026

research · benchmarks

Why ESM-2 catches the TPMT alleles VAMP-seq misses

Across the 5 canonical TPMT clinical alleles, ESM-2 ranks every one deleterious from sequence alone. VAMP-seq catches *2, *3B, *3C. ESM-2 covers *5 and *7.

May 12, 2026

research · benchmarks

rho = 0.772 in BRCA1 BRCT: What 5,949 Cancer Variants Reveal About ESM-2's Limits

ESM-2 vs 5,949 BRCA1 and PTEN variants — validated against SGE and VAMP-seq. Where it works, where it's blind, and why the combination matters.

April 14, 2026

research · benchmarks

Scoring 6,142 CYP2C9 Variants in 30 Seconds: ESM-2 vs Deep Mutational Scanning

ESM-2 predictions vs 6,142 CYP2C9 variants from the largest pharmacogenomic DMS dataset. What we found — including where it fails.

April 7, 2026

research · benchmarks

Why We Built NeuroAutomata

We built NeuroAutomata to make ESM-2 protein variant scoring accessible without setup. Validation results, including the one protein where it failed.

March 31, 2026

research · benchmarks

Building an AI Multi-Agent System to Enable Natural Language Queries for Human Protein Atlas (HPA) data: An 18-Month Journey

How I built a multi-agent system for natural language queries over Human Protein Atlas data, from naive RAG to an AI verification architecture

December 23, 2025

announcement · agentic-ai

Validation Methodology: HPA Multi-Agent System

Detailed validation methodology, reproducibility protocols, and AI agent architecture for the HPA natural language query system

December 23, 2025

validation · methodology