MaveDB

Also known as: Multiplexed Assay of Variant Effect Database

The public repository for deep mutational scanning and multiplexed variant effect datasets. Each dataset gets a persistent URN for citation and reproducibility.

Source: Esposito D et al. 'MaveDB: an open-source platform to distribute and interpret data from multiplexed assays of variant effect.' Genome Biol 2019;20:223. https://doi.org/10.1186/s13059-019-1845-6

MaveDB (Multiplexed Assay of Variant Effect Database) is the community repository for deep mutational scanning and related functional assay data. It provides persistent, citable identifiers for datasets and standardizes score formats across labs.

Structure

Datasets are organized at two main levels:

  • Experiments: each groups related score sets from the same study
  • Score sets: each holds one assay's results (e.g., CYP2C9 activity and CYP2C9 abundance are separate score sets)

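Each score set has a URN (uniform resource name) such as urn:mavedb:00000095-a-1. This is a stable identifier that won't change as the dataset is updated. The URN encodes the hierarchy: 00000095 is the experiment set, a is the experiment within it, and 1 is the score set within that experiment.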
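The URN pattern can be unpacked mechanically. The bash sketch below assumes the urn:mavedb:<8-digit experiment set>-<experiment letters>-<score-set number> layout seen in the example. parse_mavedb_urn is a hypothetical helper, not part of any MaveDB tool, and it rejects any other URN form.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a MaveDB score-set URN into its three levels.
# Assumes the urn:mavedb:<8 digits>-<lowercase letters>-<digits> pattern.
parse_mavedb_urn() {
  local urn="$1"
  local re='^urn:mavedb:([0-9]{8})-([a-z]+)-([0-9]+)$'
  if [[ "$urn" =~ $re ]]; then
    local set_id="${BASH_REMATCH[1]}"   # experiment set number
    local exp_id="${BASH_REMATCH[2]}"   # experiment letter(s)
    local score_id="${BASH_REMATCH[3]}" # score set number
    echo "experiment_set=urn:mavedb:${set_id}"
    echo "experiment=urn:mavedb:${set_id}-${exp_id}"
    echo "score_set=urn:mavedb:${set_id}-${exp_id}-${score_id}"
  else
    echo "not a score-set URN: $urn" >&2
    return 1
  fi
}
```

Running parse_mavedb_urn urn:mavedb:00000095-a-1 prints experiment_set=urn:mavedb:00000095, experiment=urn:mavedb:00000095-a, and score_set=urn:mavedb:00000095-a-1, one per line.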

API Access

MaveDB provides a REST API for programmatic access:

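# Download scores for a specific score set (returned as CSV) and save them to a file;
# -f makes curl fail on HTTP errors instead of saving the error body
curl -fsSL -o scores.csv "https://api.mavedb.org/api/v1/score-sets/urn:mavedb:00000095-a-1/scores"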
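For repeated downloads, the same request can be wrapped in a pair of small shell functions. This is a sketch that assumes the /scores endpoint above returns a CSV with a header row. The function names and the MAVEDB_API override variable are hypothetical, not part of MaveDB.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for fetching a MaveDB score set by URN.
# Assumes /api/v1/score-sets/<urn>/scores returns a headered CSV.
MAVEDB_API="${MAVEDB_API:-https://api.mavedb.org/api/v1}"

# Build the scores URL for a score-set URN.
mavedb_scores_url() {
  printf '%s/score-sets/%s/scores\n' "$MAVEDB_API" "$1"
}

# Download the scores CSV; the default file name replaces ':' with '_'.
mavedb_fetch_scores() {
  local urn="$1"
  local out="${2:-${urn//:/_}.csv}"
  curl -fsSL -o "$out" "$(mavedb_scores_url "$urn")" || return 1
  # Report data rows as the line count minus the header line.
  echo "$out: $(( $(wc -l < "$out") - 1 )) rows"
}
```

For example, mavedb_fetch_scores urn:mavedb:00000095-a-1 would save urn_mavedb_00000095-a-1.csv in the current directory.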

Why It Matters for Reproducibility

Every benchmark in the ESM-2 Benchmark Series traces to a MaveDB URN or directly to a GitHub repository. Readers can download the identical dataset used to compute our results and verify every correlation coefficient independently.
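To check a reported correlation, a reader needs the downloaded scores joined to the model's predictions for the same variants. The sketch below assumes such a joined, comma-separated file already exists; the join itself, the file layout, and the pearson_csv helper are all hypothetical. Given that file, it computes Pearson's r. It splits naively on commas, so quoted fields are not supported. If a benchmark reports Spearman's ρ instead, the same formula has to be applied to ranks (with ties averaged), which this sketch does not do.

```shell
#!/usr/bin/env bash
# Hypothetical helper: Pearson correlation between two numeric columns
# (1-based indices) of a headered CSV. Rows with an empty or "NA" value
# in either column are skipped. Prints "NA" if r is undefined.
pearson_csv() {
  local file="$1" xcol="$2" ycol="$3"
  awk -F',' -v xc="$xcol" -v yc="$ycol" '
    NR == 1 { next }  # skip the header row
    $xc == "" || $yc == "" || $xc == "NA" || $yc == "NA" { next }
    {
      x = $xc + 0; y = $yc + 0
      n++; sx += x; sy += y
      sxx += x * x; syy += y * y; sxy += x * y
    }
    END {
      if (n < 2) { print "NA"; exit }
      cov = sxy - sx * sy / n   # n times the covariance
      vx = sxx - sx * sx / n    # n times the variance of x
      vy = syy - sy * sy / n    # n times the variance of y
      if (vx <= 0 || vy <= 0) { print "NA"; exit }
      printf "%.4f\n", cov / sqrt(vx * vy)
    }' "$file"
}
```

For example, pearson_csv joined.csv 2 3 prints r to four decimal places for columns 2 and 3 of a hypothetical joined.csv.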