DDA-BERT is an end-to-end rescoring tool for data-dependent acquisition (DDA) proteomics. Built on a Transformer-based deep learning architecture, it refines initially identified peptide–spectrum matches (PSMs) to improve identification accuracy and sensitivity. The model is trained on a large-scale dataset comprising 12,285 DDA-MS files and approximately 271 million high-confidence PSMs, enabling effective learning of complex relationships between peptide sequences and tandem mass spectra.
DDA-BERT demonstrates robust and consistent performance across a wide range of biological systems, including animal, plant, and microbial proteomes, and maintains strong performance on HLA immunopeptidomics datasets. The software supports two modes. The first is a fully integrated end-to-end workflow that covers database searching, data preprocessing and cleaning, PSM rescoring, false discovery rate (FDR) control, and protein inference. The second is a modular rescoring mode, in which DDA-BERT rescores PSMs from a single search engine or from a combination of search engines and needs only the corresponding search results as input.
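FDR control on rescored PSMs is commonly done with the target-decoy approach. PSMs are ranked by score, and the FDR at each score threshold is estimated as the number of decoy hits divided by the number of target hits above it. Each PSM's q-value is then the smallest FDR at which it would still be accepted. The Python sketch below illustrates that generic procedure. It is not DDA-BERT's own implementation, and the `PSM` class and both function names are hypothetical. Score ties and per-spectrum deduplication are ignored for brevity.

```python
from dataclasses import dataclass


@dataclass
class PSM:
    # Hypothetical minimal PSM record, for illustration only.
    spectrum_id: str
    peptide: str
    score: float  # higher means more confident (e.g. a rescoring model's output)
    is_decoy: bool


def assign_q_values(psms):
    """Return (psm, q_value) pairs sorted by descending score (generic target-decoy)."""
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    fdrs = []
    targets = decoys = 0
    for p in ranked:
        if p.is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR if the score threshold were placed at this PSM.
        fdrs.append(decoys / max(targets, 1))
    # q-value: minimum estimated FDR over this threshold and all looser ones.
    q_values = [0.0] * len(fdrs)
    running_min = float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        q_values[i] = running_min
    return list(zip(ranked, q_values))


def accept_at_fdr(psms, threshold=0.01):
    """Keep target PSMs whose q-value is at or below the threshold."""
    return [p for p, q in assign_q_values(psms) if q <= threshold and not p.is_decoy]
```

For example, `accept_at_fdr(rescored_psms, 0.01)` would return the target PSMs accepted at 1% FDR. Here `rescored_psms` is any list of `PSM` objects that carries decoy labels.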
By default, DDA-BERT provides built-in support for FragPipe (fp), Sage, and AlphaPept (ap). For non-commercial research purposes, users may adapt the provided scripts to accept outputs from other database search engines. Because DDA-BERT learns directly from peptide sequences and spectra within an end-to-end framework, it does not require search engine–specific feature engineering and can be readily integrated into diverse proteomics workflows.
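Combining results from several search engines could start by mapping each engine's output onto a shared PSM schema and then taking the union of candidate PSMs. The sketch below assumes tab-separated result files. The engine names and column headers in `COLUMN_MAPS` are hypothetical placeholders. They are not the actual output formats of FragPipe, Sage, or AlphaPept, and they are not DDA-BERT's expected input.

```python
import csv
import io
from collections import defaultdict

# HYPOTHETICAL column mappings: replace with the real headers of each engine's output.
COLUMN_MAPS = {
    "engine_a": {"file": "RawFile", "scan": "ScanNum", "peptide": "Sequence", "charge": "Z"},
    "engine_b": {"file": "filename", "scan": "scan", "peptide": "peptide", "charge": "charge"},
}


def read_psms(text, engine, delimiter="\t"):
    """Yield PSMs from one engine's delimited output, renamed to a shared schema."""
    cols = COLUMN_MAPS[engine]
    for row in csv.DictReader(io.StringIO(text), delimiter=delimiter):
        yield {
            "file": row[cols["file"]],
            "scan": int(row[cols["scan"]]),
            "peptide": row[cols["peptide"]],
            "charge": int(row[cols["charge"]]),
        }


def merge_engines(results):
    """Union PSMs across engines.

    results: dict mapping engine name -> result-file text.
    Returns a dict mapping (file, scan, peptide, charge) -> set of engines reporting it.
    """
    merged = defaultdict(set)
    for engine, text in results.items():
        for psm in read_psms(text, engine):
            key = (psm["file"], psm["scan"], psm["peptide"], psm["charge"])
            merged[key].add(engine)
    return dict(merged)
```

In practice, peptide strings would also need normalizing before merging, because engines encode modifications differently. The set of engines attached to each key records which tools reported that PSM.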
DDA-BERT is actively maintained and continuously updated. For questions, feedback, or licensing inquiries, please contact: ajun@westlake.edu.cn; guotiannan@westlake.edu.cn.
Core features
End-to-end deep learning
Transformer-based model trained on approximately 271 million PSMs from 11 species; it requires no feature engineering and integrates readily into diverse proteomics workflows.
Robust and versatile
Outperforms multiple state-of-the-art rescoring tools across diverse species and remains effective on trace-level samples and HLA immunopeptidomics data.
Flexible and extensible
Supports both end-to-end workflows and modular PSM rescoring from single or multiple search engines.
Technical specifications
Training sample size
~271 million PSMs
Model architecture
Transformer-based end-to-end deep learning model
Supported MS data formats
Compatible with Bruker timsTOF (.d), Thermo (.raw), and Sciex (.wiff) data formats.
Application scenarios
Applicable to diverse sample types, trace-level proteomics, HLA immunopeptidomics, and multi-species proteome datasets.