VAMP-seq
Also known as: Variant Abundance by Massively Parallel sequencing
A high-throughput assay that measures protein abundance (expression and stability) for thousands of variants simultaneously using fluorescent protein fusions and flow cytometry.
Source: Matreyek KA et al. 'Multiplex assessment of protein variant abundance by massively parallel sequencing.' Nat Genet 2018;50(6):874-882. https://doi.org/10.1038/s41588-018-0122-z
VAMP-seq (Variant Abundance by Massively Parallel sequencing) measures how thousands of protein variants affect protein abundance, a proxy for stability and folding. Developed by Kenneth Matreyek and colleagues in Douglas Fowler's laboratory at the University of Washington, it fuses each variant to a fluorescent reporter, sorts cells by fluorescence intensity, and counts the variants in each sorted population by deep sequencing.
How It Works
- Library construction: Every possible single amino acid substitution is cloned into a GFP-fusion expression vector
- Cellular expression: The library is expressed in human cells (typically HEK293T)
- Flow cytometry sorting: Cells are sorted into bins by fluorescence intensity (high = abundant protein, low = unstable/misfolded)
- Deep sequencing: Each bin is sequenced to count variant frequencies
- Score calculation: Abundance score = enrichment in high-fluorescence bins relative to wild-type
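The score calculation step can be sketched as follows. This is a minimal illustration, not the published pipeline: it assumes four sorting bins, bin weights of 0.25 to 1.0, and a min-max normalization that places a nonsense-like reference at 0 and wild-type at 1. The function names, the `WT` and `nonsense` keys, and all read counts are hypothetical.

```python
# Assumed weights for four bins, ordered from lowest to highest fluorescence.
BIN_WEIGHTS = [0.25, 0.5, 0.75, 1.0]


def bin_frequencies(counts_per_bin):
    """Turn raw read counts {variant: [c1, c2, c3, c4]} into per-bin frequencies.

    Each count is divided by the total reads in its bin, so bins sequenced
    to different depths become comparable. Assumes every bin has reads.
    """
    n_bins = len(BIN_WEIGHTS)
    totals = [sum(c[i] for c in counts_per_bin.values()) for i in range(n_bins)]
    return {
        variant: [c[i] / totals[i] for i in range(n_bins)]
        for variant, c in counts_per_bin.items()
    }


def weighted_average(freqs):
    """Weighted mean of the bin weights; higher means more cells in bright bins."""
    return sum(w * f for w, f in zip(BIN_WEIGHTS, freqs)) / sum(freqs)


def abundance_scores(counts_per_bin, wt_key="WT", null_key="nonsense"):
    """Rescale raw weighted averages so the null reference is 0 and wild-type is 1."""
    freqs = bin_frequencies(counts_per_bin)
    raw = {variant: weighted_average(f) for variant, f in freqs.items()}
    low, high = raw[null_key], raw[wt_key]
    return {variant: (r - low) / (high - low) for variant, r in raw.items()}


# Hypothetical read counts per bin (low -> high fluorescence).
counts = {
    "WT": [10, 20, 70, 300],
    "nonsense": [300, 80, 15, 5],
    "variant_A": [100, 100, 100, 100],
}
scores = abundance_scores(counts)
for variant, score in scores.items():
    print(f"{variant}: {score:.2f}")
```

Under these assumptions, a variant spread evenly across the bins lands between the two references, while a variant concentrated in the dimmest bin scores near 0.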
What It Measures — and What It Misses
VAMP-seq measures protein abundance, which reflects stability, folding, and expression. It does NOT directly measure:
- Enzymatic activity (a variant can be abundant but catalytically dead)
- Protein-protein binding
- Subcellular localization
This is the “abundance-without-activity” blind spot: variants like TPMT*5 (L49S) and CYP2C19*6 (R132Q) retain wild-type abundance but have no enzymatic function. An abundance assay scores such variants as normal; activity-based assays flag them as loss-of-function.
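When both an abundance score and an activity score are available for the same variant, abundant-but-inactive variants can be flagged by comparing the two. The sketch below assumes both scores are normalized so that wild-type is 1.0. The 0.5 cutoffs, the class labels, and the example values are hypothetical, not taken from any published dataset.

```python
from dataclasses import dataclass


@dataclass
class VariantScores:
    name: str
    abundance: float  # assumed normalized: wild-type = 1.0, null = 0.0
    activity: float   # assumed normalized the same way


def classify(v, abundance_cutoff=0.5, activity_cutoff=0.5):
    """Label a variant by combining the two assays (cutoffs are illustrative)."""
    abundant = v.abundance >= abundance_cutoff
    active = v.activity >= activity_cutoff
    if abundant and active:
        return "WT-like"
    if abundant:
        # The case an abundance-only assay cannot see.
        return "abundant but inactive"
    if active:
        return "low abundance, active"
    return "low abundance"


# Made-up scores for three hypothetical variants.
variants = [
    VariantScores("variant_1", abundance=0.98, activity=0.95),
    VariantScores("variant_2", abundance=1.02, activity=0.04),
    VariantScores("variant_3", abundance=0.10, activity=0.08),
]
labels = {v.name: classify(v) for v in variants}
for name, label in labels.items():
    print(name, "->", label)
```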
Relationship to ESM-2
ESM-2 masked marginal scores can correlate more strongly with VAMP-seq abundance data than with some activity assays, because evolutionary conservation (what ESM-2 learns) partly reflects structural stability. This does not hold for every protein. For CYP2C9: ESM-2 vs. abundance rho = 0.634; ESM-2 vs. activity rho = 0.679. Here the activity correlation is higher because ESM-2 also captures catalytic constraint beyond just folding.
Key Datasets
| Protein | Variants | VAMP-seq paper |
|---|---|---|
| TPMT | 3,685 | Matreyek et al. 2018, Nat Genet |
| PTEN | 4,112 | Matreyek et al. 2018, Nat Genet |
| NUDT15 | 2,844 | Suiter et al. 2020 |
| CYP2C9 | 6,370 | Amorosi et al. 2021 |