VAMP-seq

Also known as: Variant Abundance by Massively Parallel sequencing

A high-throughput assay that measures protein abundance (expression and stability) for thousands of variants simultaneously using fluorescent protein fusions and flow cytometry.

Source: Matreyek KA et al. 'Multiplex assessment of protein variant abundance by massively parallel sequencing.' Nat Genet 2018;50(6):874-882. https://doi.org/10.1038/s41588-018-0122-z

VAMP-seq (Variant Abundance by Massively Parallel sequencing) measures the effect of thousands of variants on protein abundance, a proxy for stability and folding. Developed by Kenneth Matreyek and colleagues in Douglas Fowler's laboratory at the University of Washington, it fuses each variant to a fluorescent reporter, sorts cells by fluorescence intensity, and counts the variants in each sorted population by deep sequencing.

How It Works

  1. Library construction: Every possible single amino acid substitution is cloned into a GFP-fusion expression vector
  2. Cellular expression: The library is expressed in human cells (typically HEK293T), with one variant per cell so that each cell's fluorescence reports on a single variant
  3. Flow cytometry sorting: Cells are sorted into bins by fluorescence intensity (high = abundant protein, low = unstable/misfolded)
  4. Deep sequencing: Each bin is sequenced to count variant frequencies
  5. Score calculation: Each variant's frequencies across the bins are combined into a weighted average, then rescaled so that wild-type scores about 1 and nonsense variants score about 0
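
The scoring step can be sketched in a few lines of Python. This is a minimal illustration, not the published pipeline: the four bin weights, the function names, and the toy read counts are all assumptions chosen for clarity. It converts read counts to per-bin frequencies, takes a weighted average per variant, and rescales so that wild-type is 1 and the median nonsense variant is 0.

```python
from statistics import median

# Assumed bin weights, lowest to highest fluorescence (illustrative values).
BIN_WEIGHTS = (0.25, 0.5, 0.75, 1.0)


def bin_frequencies(counts_per_bin):
    """Turn raw read counts into per-bin frequencies for every variant.

    counts_per_bin: one dict per sorted bin, mapping variant name -> read count.
    Returns {variant: [frequency in bin 0, frequency in bin 1, ...]}.
    """
    variants = set().union(*counts_per_bin)
    freqs = {v: [] for v in variants}
    for counts in counts_per_bin:
        total = sum(counts.values())
        for v in variants:
            freqs[v].append(counts.get(v, 0) / total if total else 0.0)
    return freqs


def weighted_average(freqs, weights=BIN_WEIGHTS):
    """Weighted average of bin frequencies; None if the variant was never seen."""
    total = sum(freqs)
    if total == 0:
        return None
    return sum(f * w for f, w in zip(freqs, weights)) / total


def abundance_scores(counts_per_bin, wild_type, nonsense):
    """Rescale weighted averages so wild-type = 1 and median nonsense = 0."""
    raw = {v: weighted_average(f) for v, f in bin_frequencies(counts_per_bin).items()}
    raw = {v: r for v, r in raw.items() if r is not None}
    wt = raw[wild_type]
    stop = median([raw[v] for v in nonsense if v in raw])
    return {v: (r - stop) / (wt - stop) for v, r in raw.items()}


# Toy data (made up): "stop" lands only in the dimmest bin, "WT" only in the
# brightest, and "mid" is split between the two middle bins.
bins = [{"stop": 100}, {"mid": 50}, {"mid": 50}, {"WT": 100}]
scores = abundance_scores(bins, wild_type="WT", nonsense=["stop"])
```

Normalizing by within-bin frequency rather than raw counts keeps a deeply sequenced bin from dominating the score. A real analysis would also need read-count filters and replicate handling, which this sketch omits.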

What It Measures — and What It Misses

VAMP-seq measures protein abundance, which reflects stability, folding, and expression. It does NOT directly measure:

  • Enzymatic activity (a variant can be abundant but catalytically dead)
  • Protein-protein binding
  • Subcellular localization

This is the “abundance-without-activity” blind spot: variants like TPMT*5 (L49S) and CYP2C19*6 (R132Q) retain wild-type abundance but have no enzymatic function. VAMP-seq scores them as normal; activity-based assays flag them as loss-of-function.
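
When both an abundance score and an activity score are available for the same variants, abundant-but-inactive cases can be picked out by a simple comparison. The sketch below assumes both scores are scaled so that wild-type is about 1 and a null variant is about 0. The cutoffs, the function name, and the example values are hypothetical and not taken from any published dataset.

```python
def flag_abundant_but_inactive(variants, abundance_cutoff=0.7, activity_cutoff=0.3):
    """Return variants with near-wild-type abundance but low activity.

    variants: dict mapping variant name -> (abundance_score, activity_score).
    The default cutoffs are arbitrary illustrative thresholds.
    """
    return sorted(
        name
        for name, (abundance, activity) in variants.items()
        if abundance >= abundance_cutoff and activity <= activity_cutoff
    )


# Made-up scores for three placeholder variants.
example = {
    "variant_A": (0.95, 0.05),  # abundant, catalytically dead
    "variant_B": (0.10, 0.08),  # unstable, so also inactive
    "variant_C": (1.02, 0.97),  # wild-type-like on both axes
}
flagged = flag_abundant_but_inactive(example)
```

Only the first placeholder variant is flagged here: the second is already caught by its low abundance score, and the third behaves like wild-type in both assays.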

Relationship to ESM-2

ESM-2 masked marginal scores correlate with VAMP-seq abundance data because evolutionary conservation (what ESM-2 learns) partly reflects structural stability. The correlation with activity assays can be as high or higher. For CYP2C9: ESM-2 vs. abundance rho = 0.634; ESM-2 vs. activity rho = 0.679. The activity correlation is higher because ESM-2 also captures catalytic constraint beyond just folding.
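
The rho values above are rank correlations. As a minimal sketch of how such a value is computed, the following pure-Python Spearman implementation ranks both score lists (averaging ranks for ties) and takes the Pearson correlation of the ranks. The function names and the example numbers are illustrative assumptions; they are not the CYP2C9 data.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up values: model scores (more negative = predicted more damaging)
# paired with assay scores for the same five hypothetical variants.
model_scores = [-8.1, -0.4, -5.2, -2.7, -6.6]
assay_scores = [0.05, 0.98, 0.40, 0.35, 0.61]
rho = spearman_rho(model_scores, assay_scores)
```

Because Spearman's rho depends only on rank order, it is insensitive to the different scales of model log-likelihood ratios and assay scores, which is why it is a common choice for this kind of comparison.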

Key Datasets

Protein    Variants    VAMP-seq paper
TPMT       3,685       Matreyek et al. 2018, Nat Genet
PTEN       4,112       Matreyek et al. 2018, Nat Genet
NUDT15     2,844       Suiter et al. 2020
CYP2C9     6,370       Amorosi et al. 2021