ESM-2 — Glossary | Axon Agentic

ESM-2 (Evolutionary Scale Modeling 2) is a family of protein language models developed by Meta AI Research and published in Science (2023). The models range from 8 million to 15 billion parameters; the 650M-parameter variant is widely used as a baseline in mutation effect prediction benchmarks.

How It Works

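ESM-2 is trained with a masked language modeling objective on protein sequences from UniRef (roughly 65 million unique sequences seen during training, according to the ESM-2 paper): a fraction of the residues in each sequence is hidden, and the model learns to predict them from the surrounding context. Given a protein sequence, the trained model produces per-residue embeddings that capture evolutionary, structural, and functional patterns, without ever being shown experimental structures during training.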
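The masked language modeling objective can be illustrated with a minimal sketch. This is not the ESM-2 training code: the 15% masking rate, the `<mask>` token string, and the function names `mask_sequence` and `masked_lm_loss` are assumptions made for illustration, and the predicted distributions are supplied by hand rather than by a model.

```python
import math
import random

MASK = "<mask>"  # placeholder token name (assumption)

def mask_sequence(seq, rate=0.15, rng=None):
    """Hide a random subset of residues; return (tokens, targets).

    targets maps each masked position to the residue that was hidden there.
    """
    rng = rng or random.Random(0)
    n_masked = max(1, round(rate * len(seq)))
    positions = sorted(rng.sample(range(len(seq)), n_masked))
    tokens = list(seq)
    targets = {}
    for i in positions:
        targets[i] = tokens[i]
        tokens[i] = MASK
    return tokens, targets

def masked_lm_loss(predicted, targets):
    """Mean negative log-probability assigned to the hidden residues.

    predicted[i] is a dict {residue: probability} for masked position i,
    standing in for a model's output distribution (hypothetical input).
    """
    total = sum(math.log(predicted[i][aa]) for i, aa in targets.items())
    return -total / len(targets)

# Toy usage: a model that guesses uniformly over the 20 standard amino acids
tokens, targets = mask_sequence("MKTAYIAKQRQISFVKSHFSRQ")
uniform = {i: {aa: 1 / 20 for aa in "ACDEFGHIKLMNPQRSTVWY"} for i in targets}
loss = masked_lm_loss(uniform, targets)  # equals log(20) for a uniform guess
```

Training lowers this loss by pushing probability toward the residues that were actually hidden, which is what forces the model to pick up on patterns shared across related sequences.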

For mutation effect prediction, ESM-2 is typically used zero-shot, with no fine-tuning on assay data: the model scores the likelihood of the mutant sequence relative to the wild type, usually as a log-likelihood ratio. A higher score indicates a mutation the model considers more tolerated.
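One common way to turn this idea into a number is a per-position log-likelihood ratio between the mutant and wild-type residues. The sketch below assumes that formulation; the text above does not say which scoring variant is used. The function name `mutation_score`, the `"A2G"` mutation notation, and the toy probabilities are all illustrative, and `log_probs` stands in for what a model would output for the wild-type sequence.

```python
import math

def mutation_score(log_probs, wild_type, mutation):
    """Log-likelihood ratio of mutant vs. wild-type residue at one position.

    log_probs: list with one dict {residue: log-probability} per position
               (hypothetical model output for the wild-type sequence)
    mutation:  string such as "A2G" (wild-type residue, 1-based position,
               mutant residue)
    """
    wt, pos, mt = mutation[0], int(mutation[1:-1]), mutation[-1]
    if wild_type[pos - 1] != wt:
        raise ValueError(
            f"{mutation}: wild-type residue at {pos} is {wild_type[pos - 1]}"
        )
    site = log_probs[pos - 1]
    return site[mt] - site[wt]

# Toy three-residue protein with made-up probabilities
wild_type = "MAK"
log_probs = [
    {"M": math.log(0.9), "L": math.log(0.1)},
    {"A": math.log(0.6), "G": math.log(0.3), "P": math.log(0.1)},
    {"K": math.log(0.5), "R": math.log(0.5)},
]
score_g = mutation_score(log_probs, wild_type, "A2G")  # log(0.3 / 0.6), negative
score_p = mutation_score(log_probs, wild_type, "A2P")  # log(0.1 / 0.6), lower still
```

A score near zero means the model finds the mutant residue about as plausible as the wild type; strongly negative scores flag substitutions the model expects to be poorly tolerated.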

Benchmark Performance

On the ProteinGym substitution benchmark (217 deep mutational scanning assays), ESM-2 650M achieves a mean Spearman correlation of 0.414 across all proteins, ranking 45th of 97 models on the live ProteinGym leaderboard (accessed 2026-05-08). This is a solid zero-shot baseline given that it uses no structural or experimental data at inference time.
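For readers unfamiliar with the metric, the sketch below shows how a mean Spearman correlation could be computed from per-assay model scores and measured fitness values. It uses a plain average over assays, which is an assumption: the leaderboard's exact aggregation may differ. The assay names and numbers are made up and are not ProteinGym data.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

def mean_spearman(assays):
    """Average the per-assay Spearman correlations."""
    rhos = [spearman(pred, meas) for pred, meas in assays.values()]
    return sum(rhos) / len(rhos)

# Made-up (model score, measured fitness) pairs for two hypothetical assays
assays = {
    "assay_a": ([-2.1, -0.3, -1.0, -4.2], [0.2, 0.9, 0.5, 0.1]),  # same ordering
    "assay_b": ([-0.5, -1.5, -2.5, -3.5], [0.1, 0.4, 0.6, 0.9]),  # reversed ordering
}
overall = mean_spearman(assays)
```

Because Spearman correlation depends only on rank order, the model's scores do not need to be on the same scale as the assay measurements; only the ordering of mutations matters.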

Limitations

Performance varies significantly by protein family; some families are poorly represented in the training data.
Weaker at predicting protein-protein binding effects than single-protein stability effects.
Larger ESM-2 variants (3B, 15B) improve performance, but at substantial compute cost.