MassNet

MassNet: A Foundational Resource for Advancing AI in Proteomics

MassNet is the largest publicly available data-dependent acquisition (DDA) proteomics dataset to date, and the first specifically optimized for AI applications. It comprises:

~30 TB of raw DDA-MS data;

1.54 billion MS/MS spectra;

548 million peptide-spectrum matches (PSMs);

Coverage across 35 species, including animals, plants, and microorganisms.

The release of MassNet marks a new chapter in AI-driven proteomics:

The first foundational training dataset in proteomics, comparable to those in natural language processing (NLP) and computer vision (CV);

Support for AI-driven applications in non-model organism research, biomarker discovery, and post-translational modification (PTM) identification;

Built on a standardized format and high-performance architecture to support scalable and reproducible proteomics analysis.

MassNet structures raw spectra into 2D tensors, preserving key features like m/z and intensity through a unified data format.
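
To make the idea of a spectrum as a 2D tensor concrete, here is a minimal sketch in plain Python. It is not MassNet's actual format, which this description does not specify. The function name `spectrum_to_tensor`, the `max_peaks` parameter, the base-peak normalization, and the zero padding are all assumptions chosen for illustration. Each row holds one peak as an (m/z, intensity) pair, so every spectrum ends up with the same shape.

```python
from typing import List, Sequence


def spectrum_to_tensor(
    mz: Sequence[float],
    intensity: Sequence[float],
    max_peaks: int = 8,
) -> List[List[float]]:
    """Pack one MS/MS spectrum into a fixed-shape (max_peaks, 2) nested list.

    Hypothetical layout: column 0 is m/z, column 1 is relative intensity.
    """
    if len(mz) != len(intensity):
        raise ValueError("mz and intensity must have the same length")

    peaks = list(zip(mz, intensity))

    # Keep only the most intense peaks when the spectrum exceeds max_peaks.
    peaks.sort(key=lambda p: p[1], reverse=True)
    peaks = peaks[:max_peaks]

    # Restore ascending m/z order so row position is meaningful.
    peaks.sort(key=lambda p: p[0])

    # Scale intensities to [0, 1] relative to the base (most intense) peak.
    base = max((i for _, i in peaks), default=0.0)
    rows = [[float(m), float(i) / base if base > 0 else 0.0] for m, i in peaks]

    # Zero-pad short spectra so every tensor has exactly max_peaks rows.
    rows.extend([0.0, 0.0] for _ in range(max_peaks - len(rows)))
    return rows


# Example: three peaks packed into a 4 x 2 tensor.
tensor = spectrum_to_tensor(
    [175.119, 147.113, 304.161],
    [300.0, 1200.0, 600.0],
    max_peaks=4,
)
```

A fixed shape lets many spectra be stacked into one batch. A real pipeline would likely use an array library and carry extra fields such as precursor m/z and charge.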

XuanjiNovo is a MassNet-based decoding model integrating multiple core innovations.