The AIPC (Artificial Intelligence Proteomics Competition) series focuses on leveraging AI to uncover the ‘dark matter’ of mass spectrometry (MS)-based proteomics.
1. Competition task
Peptide-spectrum match (PSM) rescoring optimization: Participants will develop AI models to rescore PSMs and improve the ranking of candidate peptide sequences using machine learning techniques and existing protein databases.
2. Participants
Students, researchers, bioinformaticians, computational biologists, AI practitioners, and anyone worldwide interested in applying AI to proteomics.
3. Sponsors
4. MSDT Dataset
Each row in the MSDT files represents one PSM.
We plan to launch two competition tracks: the Enthusiast Track and the Professional Track, each paired with a training dataset of a different scale.
The Enthusiast Track will include approximately 37 million PSMs (16 million from Bruker, 16 million from Thermo, 5 million from SCIEX), while the Professional Track will consist of around 300 million PSMs from Thermo.
We will use two different test datasets, one for each track.
5. Baseline Model
We will provide a PSM-rescoring baseline to show participants how to work with the MSDT files and train a base model on GPUs.
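To make this concrete, here is a minimal sketch of what such a rescoring baseline might look like, assuming the MSDT files can be read as tab-separated tables; the file name and column names (`search_score`, `mass_error_ppm`, `charge`, `is_decoy`) are illustrative placeholders, not the actual MSDT schema.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical columns: a search-engine score, precursor mass error,
# charge state, and an 'is_decoy' flag (1 = decoy PSM, 0 = target PSM).
psms = pd.read_csv("msdt_train.tsv", sep="\t")
features = psms[["search_score", "mass_error_ppm", "charge"]]
labels = 1 - psms["is_decoy"]  # treat target PSMs as the positive class

model = LogisticRegression(max_iter=1000)
model.fit(features, labels)

# The predicted target probability becomes the new (rescored) PSM score.
psms["rescore"] = model.predict_proba(features)[:, 1]
```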
6. Evaluation Metrics
- Identification Quantity: In a two-species scenario, the number of unique peptides correctly identified while controlling the false discovery rate (FDR) at 1% (see the sketch after this list).
- Evaluation Time: The model must finish processing a single file within 3 hours; otherwise, the submission will not be evaluated.
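For reference, below is a minimal sketch of how a 1% FDR peptide count can be computed under the standard target-decoy approach; the official evaluation may differ (e.g., using the second species as an entrapment set), and the column names (`rescore`, `is_decoy`, `peptide`) are assumptions.

```python
import pandas as pd

def unique_peptides_at_fdr(psms: pd.DataFrame, fdr: float = 0.01) -> int:
    # Rank PSMs from best to worst by the (rescored) score.
    ranked = psms.sort_values("rescore", ascending=False)
    decoys = ranked["is_decoy"].cumsum()
    targets = (1 - ranked["is_decoy"]).cumsum()
    # Decoy-based FDR estimate at each threshold, then monotone q-values.
    est_fdr = decoys / targets.clip(lower=1)
    qvals = est_fdr.iloc[::-1].cummin().iloc[::-1]
    accepted = ranked[qvals <= fdr]
    # Count distinct peptide sequences among accepted target PSMs.
    return accepted.loc[accepted["is_decoy"] == 0, "peptide"].nunique()
```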
7. Competition Rules
7.1 Team Formation:
Each team must have a minimum of one and a maximum of five members.
Code sharing between teams is strictly prohibited—violators will be disqualified.
7.2 Submission Rules:
Each team can submit up to three times per day, with a total submission limit of 100 within 100 days.
Invalid submissions will not count toward the total submission limit.
7.3 Ranking Rules:
A/B Leaderboard System:
- The test dataset is split into A leaderboard (40%) and B leaderboard (60%).
- The A leaderboard updates in real time and displays rankings.
- The B leaderboard (used to determine the final awards) will be revealed three days after the competition ends.
Final Ranking Criteria (illustrated in the sketch after this list):
- Score > Submission Count > Submission Time
- If two teams have the same Score, the team with fewer submissions ranks higher.
- If both Score and submission count are identical, the team that submitted earlier ranks higher.
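In other words, the tie-breaking order amounts to a single composite sort key; the field names in this sketch are hypothetical.

```python
# Higher score first, then fewer submissions, then earlier submission time.
teams = [
    {"team": "alpha", "score": 0.912, "submissions": 41, "submitted_at": "2025-09-30T09:00"},
    {"team": "beta",  "score": 0.912, "submissions": 38, "submitted_at": "2025-10-01T14:30"},
]
ranking = sorted(teams, key=lambda t: (-t["score"], t["submissions"], t["submitted_at"]))
# 'beta' ranks above 'alpha': equal score, but fewer submissions.
```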
Final Submission Selection:
- Each team’s leader can designate two final submissions for the B leaderboard ranking.
- If no selection is made, the highest-ranked A leaderboard submission will be used by default.
8. Anticipated Competition Duration
Jul – Oct 2025 (100 days)
9. Awards & Prizes
First Prize: $5,000 (each track)
Second Prize: $1,500 (each track)
Third Prize: $500 (each track)
10. Additional Benefits
Computing Credit Rewards: Every registered participant will receive a $15 Bohrium® computing credit.
Best Notebook Award: Participants must submit complete code using the Bohrium Notebook platform. We encourage contestants to publish relevant content in Notebook format on the Case Plaza with the tag AI4SCUP-[AIPC]. The top three notebooks with the most likes will each receive a $150 computing credit.
Internship Opportunities: Outstanding participants may be recommended for internship opportunities at relevant institutions, gaining access to top-tier research resources and networking with leading interdisciplinary experts.
Institute Tours: Invited tours of participating institutes.
11. Q&A
Q: What is PSM rescoring and why is it important?
A: PSM (Peptide-Spectrum Match) rescoring is the process of re-evaluating and improving the confidence of initial matches between tandem mass spectra and peptide sequences. In bottom-up proteomics, search engines generate candidate PSMs, but their scoring methods can be limited or biased. Rescoring leverages machine learning models to incorporate additional features and improve the discrimination between true and false matches. This leads to higher identification accuracy and deeper proteome coverage.
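As an illustration of the idea (not the required approach), here is a minimal sketch of a Percolator-style semi-supervised rescoring loop, in which decoy PSMs serve as negatives and target PSMs accepted at a strict FDR serve as positives for the next round; all inputs are assumed to be NumPy arrays with one entry per PSM.

```python
import numpy as np
from sklearn.svm import LinearSVC

def percolator_style(features: np.ndarray, is_decoy: np.ndarray,
                     init_score: np.ndarray, iters: int = 3) -> np.ndarray:
    """Sketch only: iteratively relearn a scorer from confident targets vs. decoys."""
    score = init_score.astype(float).copy()
    for _ in range(iters):
        order = np.argsort(-score)                # best score first
        dec = np.cumsum(is_decoy[order])
        tgt = np.cumsum(1 - is_decoy[order])
        est_fdr = dec / np.maximum(tgt, 1)        # decoy-based FDR estimate
        # Positives: target PSMs accepted at a strict 1% FDR cutoff.
        pos = np.zeros(len(score), dtype=bool)
        pos[order] = (est_fdr <= 0.01) & (is_decoy[order] == 0)
        # Train on confident targets plus all decoys
        # (sketch assumes both classes are non-empty at each iteration).
        train = pos | (is_decoy == 1)
        clf = LinearSVC().fit(features[train], pos[train].astype(int))
        score = clf.decision_function(features)   # rescore every PSM
    return score
```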
Q: Can I participate if I don’t have access to substantial computing resources?
A: Yes. Participants with limited computing resources can apply for support after submitting a few evaluation results. Based on performance, we will provide computing resources to promising participants.
Q: Can I use pre-trained models?
A: No. To ensure fairness and emphasize model-development skills, participants are required to train their models from scratch rather than starting from publicly available pre-trained weights. The code of winning teams will be reviewed to verify compliance.
Q: Is it allowed to use external datasets for training?
A: We encourage participants to explore and use external datasets to enhance their models, especially if they can bring new perspectives to PSM rescoring. However, to ensure transparency and reproducibility, all external data sources must be clearly documented and reported in your final submission.
Q: Can I design my own features for rescoring?
A: Absolutely! In fact, feature engineering is a key part of this competition. You are encouraged to extract new features from the PSMs.
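For example, a few classic hand-crafted features might look like the sketch below; the input columns (`peptide`, `observed_mz`, `theoretical_mz`) are hypothetical stand-ins for whatever the MSDT files actually provide.

```python
import pandas as pd

def add_features(psms: pd.DataFrame) -> pd.DataFrame:
    out = psms.copy()
    # Peptide length: short and very long peptides behave differently.
    out["peptide_length"] = out["peptide"].str.len()
    # Absolute precursor mass error in ppm, a classic rescoring feature.
    out["abs_mass_error_ppm"] = (
        (out["observed_mz"] - out["theoretical_mz"]).abs()
        / out["theoretical_mz"] * 1e6
    )
    # Fraction of basic residues (K, R, H), a rough proxy for charge behaviour.
    out["basic_fraction"] = out["peptide"].str.count("[KRH]") / out["peptide_length"]
    return out
```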
Q: When does the competition start?
A: The competition is currently under preparation and registration has not yet opened. Once the competition is officially launched, a registration link will be posted on this website. Please stay tuned!