MassNet: A Foundational Resource for Advancing AI in Proteomics

MassNet is the largest publicly available data-dependent acquisition (DDA) proteomics dataset to date, and the first specifically optimized for AI applications. It comprises:

~30 TB of raw DDA-MS data;

1.54 billion MS/MS spectra;

558 million peptide-spectrum matches (PSMs);

Coverage across 35 species, including animals, plants, and microorganisms;

The human subset covers ~98% of annotated proteins.
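To put the spectrum and PSM counts above in proportion, here is a minimal sketch that computes the share of MS/MS spectra with a peptide-spectrum match. It uses only the two headline figures quoted in the list. The function name `identification_ratio` is hypothetical, and the calculation assumes each PSM corresponds to a distinct spectrum, which the source does not state, so treat the result as a rough estimate rather than a reported statistic.

```python
# Headline figures quoted in the list above.
TOTAL_SPECTRA = 1.54e9  # MS/MS spectra
TOTAL_PSMS = 558e6      # peptide-spectrum matches


def identification_ratio(psms: float, spectra: float) -> float:
    """Return the fraction of spectra that have a PSM.

    Assumption (not from the source): one PSM per matched spectrum.
    """
    if spectra <= 0:
        raise ValueError("spectra must be positive")
    return psms / spectra


ratio = identification_ratio(TOTAL_PSMS, TOTAL_SPECTRA)
print(f"Approximate share of spectra with a PSM: {ratio:.1%}")
```

Under that assumption, roughly a third of the spectra carry a peptide assignment; the rest are unlabeled.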

The release of MassNet marks a new chapter in AI-driven proteomics:

The first foundational training dataset in proteomics, comparable in role to the large-scale datasets used in NLP and CV;

Enables AI-driven applications in non-model organism research, biomarker discovery, and PTM identification;

Built on a standardized format and high-performance architecture to support scalable and reproducible proteomics analysis.