DDA-BERT

DDA-BERT is an end-to-end rescoring tool for data-dependent acquisition (DDA) proteomics. Built on a Transformer-based deep learning architecture, it refines initially identified peptide–spectrum matches (PSMs) to improve identification accuracy and sensitivity. The model is trained on a large-scale dataset comprising 12,285 DDA-MS files and approximately 271 million high-confidence PSMs, enabling effective learning of complex relationships between peptide sequences and tandem mass spectra.

DDA-BERT performs robustly and consistently across a wide range of biological systems, including animal, plant, and microbial proteomes, and maintains strong performance on HLA immunopeptidomics datasets. The software supports two modes: a fully integrated, end-to-end workflow (database searching, data preprocessing and cleaning, PSM rescoring, FDR control, and protein inference) and a modular rescoring mode. In the modular mode, DDA-BERT can rescore PSMs from a single search engine or from a combination of several, and requires only the corresponding search results as input.
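The FDR control step mentioned above is commonly done with a target-decoy approach. The sketch below is a generic illustration of that idea, not DDA-BERT's actual implementation, which is not described here. The function name `target_decoy_qvalues` and the `(score, is_decoy)` input layout are assumptions made for this example.

```python
from typing import List, Tuple


def target_decoy_qvalues(psms: List[Tuple[float, bool]]) -> List[float]:
    """Estimate q-values for PSMs given as (score, is_decoy) pairs.

    Assumes a higher score means a better match. Returns one q-value per
    PSM, in the same order as the input.
    """
    # Rank PSMs from best to worst score.
    order = sorted(range(len(psms)), key=lambda i: psms[i][0], reverse=True)

    # FDR estimate at each rank: decoys seen so far / targets seen so far.
    fdrs = []
    targets = decoys = 0
    for i in order:
        if psms[i][1]:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: the smallest FDR at this rank or at any lower-scoring rank,
    # which makes the values monotone along the ranking.
    qvalues = [0.0] * len(psms)
    running_min = float("inf")
    for rank in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdrs[rank])
        qvalues[order[rank]] = running_min
    return qvalues


# Toy input: rescored PSMs with decoy flags (made-up values).
example = [(0.9, False), (0.8, False), (0.7, True), (0.6, False), (0.5, True)]
qvals = target_decoy_qvalues(example)
accepted = [p for p, q in zip(example, qvals) if q <= 0.01 and not p[1]]
```

With this toy input, only the two top-scoring target PSMs pass a 1% threshold. Real pipelines often add refinements such as a +1 correction on the decoy count or separate peptide-level and protein-level filtering.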

By default, DDA-BERT provides built-in support for FragPipe (fp), Sage, and AlphaPept (ap). Users may also adapt the provided scripts to accommodate outputs from additional database search engines for non-commercial research purposes. By leveraging an end-to-end learning framework, DDA-BERT eliminates the need for search engine–specific feature engineering and can be readily integrated into diverse proteomics workflows.
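To illustrate what combining results from multiple search engines can look like before rescoring, here is a minimal sketch that groups candidate PSMs by spectrum and peptide. It does not use DDA-BERT's scripts or file formats. The record layout, the `merge_psms` helper, and the `sage` label are hypothetical; only the `fp` and `ap` abbreviations come from the text above.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record: (engine, spectrum_id, peptide, engine_score).
Psm = Tuple[str, str, str, float]


def merge_psms(psms: Iterable[Psm]) -> Dict[Tuple[str, str], Dict[str, float]]:
    """Group PSMs from several engines by (spectrum_id, peptide).

    Each key maps to a dict of engine label -> that engine's own score,
    so a PSM reported by several engines appears only once.
    """
    merged: Dict[Tuple[str, str], Dict[str, float]] = defaultdict(dict)
    for engine, spectrum_id, peptide, score in psms:
        merged[(spectrum_id, peptide)][engine] = score
    return dict(merged)


# Made-up records from three engines.
records = [
    ("fp", "scan1", "PEPTIDEK", 12.3),
    ("sage", "scan1", "PEPTIDEK", 0.91),
    ("ap", "scan2", "ELVISK", 0.5),
]
candidates = merge_psms(records)
```

Here `candidates` holds two unique PSMs, one of them reported by both `fp` and `sage`. Because rescoring does not rely on engine-specific features, deduplicating this way keeps each spectrum-peptide pair as a single candidate regardless of how many engines proposed it.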

DDA-BERT is actively maintained and continuously updated. For questions, feedback, or licensing inquiries, please contact: ajun@westlake.edu.cn; guotiannan@westlake.edu.cn.

Core Features

End-to-end deep learning

Transformer-based model trained on approximately 271 million PSMs from 11 species. It requires no feature engineering and integrates readily into diverse proteomics workflows.

Robust and versatile

Outperforms multiple state-of-the-art rescoring tools across diverse species and remains effective on trace-level samples and HLA immunopeptidomics data.

Flexible and extensible

Supports both end-to-end workflows and modular PSM rescoring from single or multiple search engines.

Technical specifications

Training sample size

~271 million PSMs

Model architecture

Transformer-based end-to-end deep learning model

Supported MS data formats

Compatible with Bruker timsTOF (.d), Thermo (.raw), and Sciex (.wiff) data formats.

Application scenarios

Applicable to diverse sample types, trace-level proteomics, HLA immunopeptidomics, and multi-species proteome datasets.
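As a small convenience when preparing input, the supported MS data formats listed above can be recognized by file extension. This is only a sketch: `detect_vendor` and `VENDOR_BY_SUFFIX` are hypothetical names and not part of DDA-BERT.

```python
from pathlib import Path

# Extension -> vendor, taken from the supported formats listed above.
VENDOR_BY_SUFFIX = {
    ".d": "Bruker timsTOF",
    ".raw": "Thermo",
    ".wiff": "Sciex",
}


def detect_vendor(path: str) -> str:
    """Return the vendor for a raw MS data path based on its extension.

    Bruker .d inputs are directories rather than single files, but the
    suffix check works the same way for both.
    """
    suffix = Path(path).suffix.lower()
    if suffix not in VENDOR_BY_SUFFIX:
        raise ValueError(f"Unsupported MS data format: {path!r}")
    return VENDOR_BY_SUFFIX[suffix]


vendor = detect_vendor("sample_01.RAW")
```

The lookup is case-insensitive, so `sample_01.RAW` and `sample_01.raw` are treated alike. Any other extension raises a `ValueError`.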

Start using DDA-BERT

Download DDA-BERT and try the new generation of intelligent proteomics analysis tools.

Download the latest version