Early Access

Know which mutations will work before you spend weeks in the lab.

NeuroAutomata scores every possible amino acid substitution at every position in your protein. Paste a sequence and get a fitness landscape in seconds. Download everything. Your sequences are never saved to disk.

What Is NeuroAutomata

NeuroAutomata is a web-based protein mutation scoring tool built on ESM-2, a protein language model developed by Meta AI and trained on millions of protein sequences from across the tree of life. Think of it as a second opinion on your mutagenesis experiments, one informed by millions of sequences from the UniProt-derived UniRef database.

You paste a sequence, and NeuroAutomata predicts which mutations are likely tolerable and which ones will probably break your protein. It works on any protein from any organism — even sequences that have never been experimentally characterized — because its predictions come from evolutionary patterns across all known protein families, not from labeled training data specific to your protein.

It runs in your browser. There's nothing to install. Every result (heatmaps, charts, raw scores) is downloadable. Your sequences are processed in memory, kept for up to one hour so you can download your results, and then automatically deleted. Nothing is saved to disk. Full details in our privacy policy.

We're currently looking for a small group of early testers — protein engineers, biochemists, and researchers who work with mutagenesis — to try NeuroAutomata on their own proteins and help us make it better. If that's you, request access below.

It Already Knew About EGFP

In 1995, Roger Tsien's lab mutated serine 65 to threonine in jellyfish GFP and got a brighter protein with a single excitation peak. S65T went on to become one of the two mutations in EGFP (alongside F64L), one of the most widely used fluorescent proteins in biology. We ran wild-type GFP (UniProt P42212) through NeuroAutomata. In under 30 seconds, it scored S65T as tolerable (-0.137) while flagging S65W and S65C as harmful. It rated this historically important substitution as tolerable without any GFP-specific training data.

GFP mutation landscape heatmap — 238 positions scored across all 20 amino acid substitutions. Blue indicates tolerable mutations, red indicates likely destructive mutations.
Every position in GFP, every possible substitution. Blue is tolerable. Red is likely destructive. At a glance: most of GFP is predicted to tolerate engineering, but the red columns mark positions where almost any change is likely to hurt.
GFP per-position sensitivity chart showing mutation tolerance across all 238 residues.
Which residues tolerate change — and which ones don't.
Single mutation analysis of S65T in GFP, scored as tolerable at -0.137.
S65T: one of the two mutations that define EGFP. NeuroAutomata scores it as tolerable.
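
Labels such as S65T pack three facts into one token: the wild-type residue, the 1-based position, and the replacement. If you plan to script against downloaded scores, a small parser helps catch off-by-one errors early. The sketch below is illustrative only; the names `Mutation`, `parse_mutation`, and `matches_sequence` are hypothetical and not part of NeuroAutomata.

```python
import re
from typing import NamedTuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


class Mutation(NamedTuple):
    wild_type: str
    position: int  # 1-based, as in "S65T"
    mutant: str


_LABEL = re.compile(rf"^([{AMINO_ACIDS}])(\d+)([{AMINO_ACIDS}])$")


def parse_mutation(label: str) -> Mutation:
    """Split a label like 'S65T' into wild type, position, and mutant."""
    match = _LABEL.match(label.strip().upper())
    if match is None:
        raise ValueError(f"not a valid substitution label: {label!r}")
    wild_type, position, mutant = match.groups()
    return Mutation(wild_type, int(position), mutant)


def matches_sequence(mutation: Mutation, sequence: str) -> bool:
    """Check that the stated wild-type residue really sits at that position."""
    index = mutation.position - 1  # convert to 0-based indexing
    return 0 <= index < len(sequence) and sequence[index] == mutation.wild_type
```

Checking the label against your own sequence before interpreting a score guards against numbering mismatches, for example when a construct carries a tag that shifts every position.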

See It in Action

We loaded 9 proteins — fluorescent reporters, therapeutic targets, industrial enzymes, and a CRISPR domain — and scored all of them. Before running any mutations, NeuroAutomata shows you how your proteins relate to each other.

Protein similarity map showing 9 demo proteins — fluorescent proteins cluster together, enzymes group, therapeutic targets separate.
GFP and mCherry cluster together — both are fluorescent beta-barrels, but from completely different organisms. KRAS sits alone. Ubiquitin is an outlier. These groupings come from the protein sequences themselves, not from any labels we provided.
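
One common way to compare proteins from their numerical representations is cosine similarity between the vectors. The page does not say which measure the similarity map uses, so treat the choice below as an assumption. The three short vectors are invented for illustration and are not real model output.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical 3-dimensional "fingerprints"; real embeddings have far more dimensions.
fingerprints = {
    "protein_a": [0.9, 0.1, 0.3],
    "protein_b": [0.8, 0.2, 0.4],
    "protein_c": [-0.2, 0.9, -0.5],
}

similar = cosine_similarity(fingerprints["protein_a"], fingerprints["protein_b"])
distant = cosine_similarity(fingerprints["protein_a"], fingerprints["protein_c"])
```

In this toy data `similar` is larger than `distant`, which is the kind of relationship a similarity map turns into clusters and outliers.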

How It Works

1. Paste your sequence

Amino acid sequence or FASTA format. Up to 1,024 residues. You can also load example proteins to explore first.

2. Generate a protein fingerprint

The AI model reads your sequence and creates a numerical representation that captures what makes your protein unique — its evolutionary history, structural tendencies, and functional constraints.

3. Compare your proteins

Load multiple sequences and see which ones are related and which aren't. Proteins with similar functions cluster together — even across organisms.

4. Score every possible mutation

All 20 amino acids at every position (the wild type plus 19 substitutions), scored automatically. The heatmap shows you which mutations are tolerated (blue) and which are likely destructive (red), so you know where to focus before running a single experiment.

5. See it in 3D

Mutation tolerance scores mapped onto your protein's 3D structure. Blue residues are predicted to tolerate change. Red residues are predicted to be constrained. See exactly where your planned mutations sit in physical space.
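
If you want to prepare input before pasting it, the first and fourth steps can be sketched in a few lines: accept a bare sequence or a single FASTA record, enforce the 1,024-residue limit, and list every single substitution. This is a rough illustration under stated assumptions, not NeuroAutomata's own code. The function names are hypothetical, and rejecting non-standard residue letters is an assumption about how input is validated.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
MAX_LENGTH = 1024  # limit stated in step 1


def read_sequence(text: str) -> str:
    """Return the residues from a bare sequence or a single FASTA record."""
    lines = [line.strip() for line in text.strip().splitlines()]
    # Skip FASTA header lines (those starting with '>') and join the rest.
    residues = "".join(
        line for line in lines if line and not line.startswith(">")
    ).upper()
    if not residues:
        raise ValueError("no sequence found")
    unknown = sorted(set(residues) - set(AMINO_ACIDS))
    if unknown:
        raise ValueError(f"unsupported residue letters: {''.join(unknown)}")
    if len(residues) > MAX_LENGTH:
        raise ValueError(f"sequence has {len(residues)} residues; limit is {MAX_LENGTH}")
    return residues


def all_substitutions(sequence: str):
    """Yield every single-substitution label, 19 per position."""
    for position, wild_type in enumerate(sequence, start=1):
        for mutant in AMINO_ACIDS:
            if mutant != wild_type:
                yield f"{wild_type}{position}{mutant}"
```

For a protein of length N this yields 19 × N labels, so a 238-residue protein has 4,522 single substitutions to score.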

What It Found in GFP

The underlying model was trained on millions of protein sequences from across the tree of life. It learned which positions matter and which are flexible — without being told anything about structure, function, or experimental data. When we scored GFP:

S65T
-0.137, tolerable. The model says "this is fine." It was. S65T is one of the two mutations in EGFP, among the most widely used fluorescent proteins, and Roger Tsien's GFP work earned him a share of the 2008 Nobel Prize in Chemistry.
S65W
Predicted harmful. A bulky tryptophan at this position would disrupt the chromophore — the part of GFP that actually produces fluorescence.
C48D
+3.39, the most beneficial prediction. Free cysteines can promote aggregation and misfolding. Protein engineers have independently targeted this exact position to improve GFP stability.

Nobody told the model about the GFP chromophore. Nobody provided GFP-specific training data. It figured this out from sequence patterns alone, in 30 seconds.
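
For readers curious where a number like -0.137 could come from: a common way to score substitutions with a protein language model is a log-odds ratio, the log probability the model assigns to the mutant residue minus the log probability it assigns to the wild type at that position. The page does not state NeuroAutomata's exact formula, so this is an assumption, and the probabilities below are invented rather than taken from ESM-2.

```python
import math


def log_odds_score(position_probs, wild_type, mutant):
    """log p(mutant) - log p(wild type) at one position.

    Zero means the model finds the mutant as plausible as the wild type;
    negative means less plausible; positive means more plausible.
    """
    return math.log(position_probs[mutant]) - math.log(position_probs[wild_type])


# Invented per-residue probabilities at a single position (not model output).
toy_probs = {"S": 0.50, "T": 0.30, "A": 0.18, "W": 0.02}

mild = log_odds_score(toy_probs, wild_type="S", mutant="T")
severe = log_odds_score(toy_probs, wild_type="S", mutant="W")
```

Under this scheme a score near zero reads as tolerable, a strongly negative score reads as likely harmful, and a positive score means the model prefers the mutant over the residue that is actually there.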

Validated Against Real Experiments

We compared NeuroAutomata's predictions against published experimental data — deep mutational scanning results from ProteinGym, a reference benchmark where researchers actually made the mutations and measured the effects.

Protein                   What was measured     Correlation
Beta-lactamase (TEM-1)    Enzyme activity       0.731
PTEN                      Organismal fitness    0.519
BRCA1                     Enzyme activity       0.515
UBC9                      Expression level      0.473
GB1                       Binding activity      0.276

Median correlation: 0.515 across these five proteins. For context, the published ESM-2 650M baseline on the full ProteinGym benchmark is 0.414 (leaderboard CSV, accessed 2026-05-08). Four of these five proteins score above that average; GB1 (0.276) falls below it.
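
The summary figures can be checked directly from the five values in the table. The snippet below uses only numbers stated on this page; the variable names are ours.

```python
from statistics import median

# Correlations from the table above.
correlations = {
    "Beta-lactamase (TEM-1)": 0.731,
    "PTEN": 0.519,
    "BRCA1": 0.515,
    "UBC9": 0.473,
    "GB1": 0.276,
}
BASELINE = 0.414  # ESM-2 650M on full ProteinGym, as quoted above

median_correlation = median(correlations.values())
above_baseline = [name for name, r in correlations.items() if r > BASELINE]
below_baseline = [name for name, r in correlations.items() if r <= BASELINE]
```

Here `median_correlation` is 0.515, and `below_baseline` contains only GB1.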

Protein-protein binding predictions are weaker — for calmodulin, correlation was only 0.212. We report this openly. The in-app Learn section explains the full methodology, what works well, and where predictions fall short.
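
To run the same kind of check on your own protein, you can correlate predicted scores against measured effects. Rank correlation (Spearman) is a common choice for deep mutational scanning benchmarks because it only asks whether the ordering agrees. The page does not say which correlation was used, so treat that choice as an assumption. A dependency-free sketch:

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / math.sqrt(var_x * var_y)


def spearman(predicted, measured):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need two equal-length inputs with at least 2 values")
    return pearson(average_ranks(predicted), average_ranks(measured))
```

Pass the predicted score and the measured value for each mutation in the same order. A result near 1 means the two rankings agree closely, and a result near 0 means the prediction carries little ranking information for that assay.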

Your Sequences Stay Private

Your sequences are processed in memory, held in a temporary queue for up to one hour so you can download your results, and then automatically deleted. Nothing is saved to disk. We don't train models on your data. That's why everything is downloadable — if you don't save it yourself, nobody has it.

Structure prediction is the one exception: sequences are sent to Meta's ESMFold API for 3D structure generation. Everything else stays on our infrastructure. Full details in our privacy policy.
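
To make the one-hour retention window concrete, here is a generic in-memory store whose entries expire after a fixed time and are never written to disk. It only illustrates the idea and does not describe NeuroAutomata's actual implementation. The class name, method names, and injectable clock are all hypothetical.

```python
import time


class ExpiringResults:
    """Keep results in memory only, and drop them once they are too old."""

    def __init__(self, ttl_seconds=3600.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without waiting
        self._items = {}  # job_id -> (expires_at, payload)

    def put(self, job_id, payload):
        self._purge()
        self._items[job_id] = (self._clock() + self._ttl, payload)

    def get(self, job_id):
        """Return the payload, or None if it is unknown or has expired."""
        self._purge()
        entry = self._items.get(job_id)
        return None if entry is None else entry[1]

    def _purge(self):
        now = self._clock()
        expired = [key for key, (expires_at, _) in self._items.items() if expires_at <= now]
        for key in expired:
            del self._items[key]
```

With a design like this, a result that was not downloaded within the window is simply gone, which matches the advice above to save your own copies.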

What You Get

  • Works in your browser — nothing to install, nothing to configure
  • Results in seconds — not days, not hours
  • Works on any protein — from any organism, even ones never studied before
  • Download everything — heatmaps, charts, raw data as CSV. Your results, your files.
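
Once you have the raw scores as CSV, a few lines of Python are enough to summarize them. The column names below (`position`, `wild_type`, `mutant`, `score`) are assumptions about the export format, so adjust them to match your file. The sample rows are invented.

```python
import csv
import io
from collections import defaultdict


def mean_score_by_position(csv_text: str) -> dict:
    """Average the substitution scores at each position."""
    totals = defaultdict(lambda: [0.0, 0])  # position -> [sum, count]
    for row in csv.DictReader(io.StringIO(csv_text)):
        entry = totals[int(row["position"])]
        entry[0] += float(row["score"])
        entry[1] += 1
    return {pos: total / count for pos, (total, count) in totals.items()}


# Invented example rows in the assumed layout.
sample = """position,wild_type,mutant,score
1,M,A,-4.0
1,M,L,-2.0
2,S,T,-0.5
2,S,A,0.5
"""

means = mean_score_by_position(sample)
most_sensitive = min(means, key=means.get)  # position with the lowest mean score
```

For a real export, read the file with `open(path, newline="")` and pass its contents in. Positions with the lowest mean score are the ones the model considers least tolerant of change.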

Request Access

Submit your information below. We'll review your request and reach out if you qualify for early access.