MassNet: A Foundational Resource for Advancing AI in Proteomics

MassNet integrates 27,643 mass spectrometry files sourced from authoritative repositories such as PRIDE and iProX, comprising approximately 30 TB of raw data and more than 1.5 billion MS/MS spectra across 35 species, including animals, plants, and microorganisms. Model organisms such as mouse, rat, C. elegans, and D. melanogaster show high protein coverage, supporting cross-species functional studies. In the plant domain, high-quality data for Arabidopsis, rice, soybean, and other key species significantly expand the spectral foundation of plant proteomics. For microbes, the dataset includes core model organisms such as yeast (S. cerevisiae), E. coli, and B. subtilis, and further extends to archaea, actinomycetes, and fungi, providing broad phylogenetic representation.

Beyond its extensive taxonomic breadth, MassNet achieves high standards in PSM count, peptide diversity, and annotation completeness, establishing a robust foundation for training and deploying AI models in proteomics.

| Database | Species | PSMs | Precursors | Peptides | Proteins |
|---|---|---|---|---|---|
| Animal | H. sapiens | 282 million | 1.7 million | 840,717 | 19,960 |
| Animal | M. musculus | 212 million | 1.3 million | 681,094 | 16,939 |
| Animal | R. norvegicus | 4.3 million | 150,839 | 97,554 | 6228 |
| Animal | C. elegans | 1.5 million | 110,648 | 71,503 | 3347 |
| Animal | D. melanogaster | 958,399 | 63,945 | 46,363 | 2776 |
| Animal | O. cuniculus | 748,022 | 19,632 | 11,174 | 750 |
| Animal | X. laevis | 431,342 | 11,931 | 8407 | 1066 |
| Animal | B. taurus | 264,058 | 32,043 | 24,130 | 2578 |
| Animal | S. scrofa | 218,732 | 17,764 | 11,600 | 932 |
| Animal | E. caballus | 155,325 | 3698 | 2059 | 125 |
| Animal | Z. rerio | 101,944 | 10,713 | 8193 | 1255 |
| Animal | G. gallus | 69,835 | 5542 | 4065 | 677 |
| Animal | C. jacchus | 23,570 | 1476 | 913 | 71 |
| Animal | C. familiaris | 23,915 | 4480 | 2980 | 282 |
| Plant | A. thaliana | 6.7 million | 283,185 | 174,266 | 11,693 |
| Plant | O. sativa | 832,587 | 42,620 | 33,087 | 2842 |
| Plant | G. max | 53,483 | 3489 | 2316 | 291 |
| Plant | C. arabica | 128,473 | 1328 | 1717 | 64 |
| Plant | C. annuum | 24,310 | 689 | 460 | 52 |
| Microorganism | S. cerevisiae | 22.4 million | 365,077 | 216,503 | 7211 |
| Microorganism | E. coli | 5.8 million | 133,537 | 70,699 | 20,743 |
| Microorganism | B. subtilis | 4.0 million | 77,453 | 40,147 | 3527 |
| Microorganism | T. gondii | 843,678 | 67,028 | 50,604 | 5706 |
| Microorganism | S. aureus | 772,887 | 37,050 | 20,200 | 9499 |
| Microorganism | T. brucei | 78,028 | 24,371 | 19,083 | 2736 |
| Microorganism | B. cereus | 694,175 | 22,465 | 13,865 | 3763 |
| Microorganism | Z. mobilis | 570,739 | 15,848 | 9353 | 351 |
| Microorganism | A. baumannii | 552,926 | 14,037 | 8581 | 1792 |
| Microorganism | P. aeruginosa | 410,796 | 42,777 | 25,585 | 2975 |
| Microorganism | C. reinhardtii | 404,679 | 20,677 | 11,359 | 330 |
| Microorganism | S. epidermidis | 375,204 | 20,981 | 13,109 | 1387 |
| Microorganism | M. thermoacetica | 100,995 | 7512 | 5503 | 325 |
| Microorganism | L. pneumophila | 94,999 | 7638 | 4898 | 353 |
| Microorganism | R. palustris | 37,864 | 5806 | 3826 | 366 |
| Microorganism | S. coelicolor | 26,649 | 3282 | 2773 | 430 |
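
For readers who want to work with these per-species statistics programmatically, the following is a minimal sketch of turning entries such as "282 million PSMs" or "840,717 peptides" into integers and deriving simple ratios (PSMs per peptide, peptides per protein). The helper names `parse_count`, `parse_row`, and `summarize` are hypothetical and are not part of MassNet; only three species are copied from the statistics above for illustration. Counts reported as rounded "million" values are approximations, so any ratio derived from them is approximate as well.

```python
import re

# A few entries copied from the statistics above, kept exactly as written.
ROWS = {
    "H. sapiens": "282 million PSMs; 1.7 million precursors; 840,717 peptides; 19,960 proteins",
    "A. thaliana": "6.7 million PSMs; 283,185 precursors; 174,266 peptides; 11,693 proteins",
    "S. cerevisiae": "22.4 million PSMs; 365,077 precursors; 216,503 peptides; 7211 proteins",
}

# Matches "<number> [million] <label>", e.g. "1.7 million precursors".
_ENTRY = re.compile(r"([\d.,]+)\s*(million)?\s+(PSMs|precursors|peptides|proteins)")


def parse_count(number: str, scale: str) -> int:
    """Convert '840,717' or ('1.7', 'million') into an integer count."""
    value = float(number.replace(",", ""))
    if scale == "million":
        value *= 1_000_000
    return round(value)


def parse_row(text: str) -> dict:
    """Parse one semicolon-separated statistics string into {label: count}."""
    return {label: parse_count(num, scale) for num, scale, label in _ENTRY.findall(text)}


def summarize(rows: dict) -> dict:
    """Add two derived ratios per species (approximate for rounded counts)."""
    summary = {}
    for species, text in rows.items():
        counts = parse_row(text)
        summary[species] = {
            **counts,
            "psms_per_peptide": counts["PSMs"] / counts["peptides"],
            "peptides_per_protein": counts["peptides"] / counts["proteins"],
        }
    return summary


if __name__ == "__main__":
    for species, stats in summarize(ROWS).items():
        print(f"{species}: {stats['peptides_per_protein']:.1f} peptides per protein")
```

Ratios like these are only a rough way to compare depth of coverage between species; they say nothing about how the underlying identifications were filtered.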