Multidisciplinary team with diverse backgrounds
Welcome to Guomics
01.1 Sample preparation. Our technical improvements are mainly based on our original pressure cycling technology (PCT)-based sample preparation method, which processes ~1 mg of fresh-frozen tissue (Guo, et al. Nat Med. 2015). To meet the technical needs of large-scale clinical research, we further improved and simplified the protocol to enable effective analysis of ~0.1 mg of fresh-frozen and formalin-fixed paraffin-embedded (FFPE) tissue (Gao, et al. J Prot Res. 2020; Zhu, et al. Mol Oncol. 2019. Cover paper; Gao, et al. J Prot Res. 2020; Cai, et al. Nat Protoc. 2022). More recently, we have coupled tissue expansion technology with diaPASEF analysis to achieve effective proteomic analysis of even smaller samples, down to ~0.7 nL of FFPE tissue (~100 cells) (Li, et al. In revision).
01.2 MS analysis. Data-independent acquisition (DIA) MS enables reproducible, high-throughput proteomic analysis of minute sample amounts, as we have reviewed (Zhang, et al. Proteomics. 2020. Top 10 accessed paper in the journal). To further increase the throughput of MS analysis, we optimized LC-MS workflows, including microflow LC for SWATH/DIA-MS analysis (Sun, et al. J Prot Res. 2020) and ScanningSWATH (Gao, et al. J Prot Res. 2022). We also developed PulseDIA, a multi-injection gas-phase fractionation (GPF) strategy that increases the proteomic depth of DIA-MS (Cai, et al. J Prot Res. 2021). In addition, we optimized the TMTpro 16plex workflow in our lab (Zhou, et al. in press).
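The idea behind multi-injection gas-phase fractionation can be illustrated with a small window-scheduling sketch. It assumes a hypothetical layout of equal-width isolation windows interleaved across injections; the function name `gpf_windows` and all parameters are illustrative only and do not reproduce the actual PulseDIA window design.

```python
from typing import List, Tuple

Window = Tuple[float, float]  # (lower m/z, upper m/z)


def gpf_windows(mz_start: float, mz_end: float,
                n_injections: int, windows_per_injection: int) -> List[List[Window]]:
    """Hypothetical GPF scheme: split a precursor m/z range into narrow windows
    and distribute them over several injections.

    The range is cut into n_injections * windows_per_injection equal windows,
    and window j is assigned to injection j % n_injections. Each injection keeps
    the same number of windows per cycle, but each window is n_injections times
    narrower than in a single-injection scheme covering the same range.
    """
    if mz_end <= mz_start:
        raise ValueError("mz_end must be greater than mz_start")
    if n_injections < 1 or windows_per_injection < 1:
        raise ValueError("counts must be positive")
    total = n_injections * windows_per_injection
    width = (mz_end - mz_start) / total
    schedule: List[List[Window]] = [[] for _ in range(n_injections)]
    for j in range(total):
        lo = mz_start + j * width
        # Round to avoid floating-point noise in the printed window edges.
        schedule[j % n_injections].append((round(lo, 4), round(lo + width, 4)))
    return schedule


if __name__ == "__main__":
    for i, windows in enumerate(gpf_windows(400, 1200, 2, 4)):
        print(f"injection {i}: {windows}")
```

Interleaving (rather than giving each injection one contiguous block of the range) is just one possible design choice; it keeps every injection's cycle spanning the full precursor range.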
01.3 MS raw data analysis. The computational pipelines for building spectral libraries have evolved rapidly in recent years. We developed an open-source computational pipeline for DIA library building, together with a comprehensive human spectral library covering over 14,000 protein groups for DIA and MRM/PRM analysis on Orbitrap instruments (Zhu, et al. GPB. 2020. Highly accessed paper). Our team also developed a computational method to refine a pan-human spectral library for tissue-specific DIA analysis (Ge, et al. J Prot Res. 2021).
01.4 DIA tensor. We have developed a universal tensor format for DIA-MS data (DIA tensor, DIAT) with reduced file size. DIAT can be easily visualized and fed directly into a deep neural network to predict phenotypes (Zhang, et al. JASMS. 2020). We have further developed computational pipelines for converting DIAT back to mzXML, so that DIAT files processed by AI-empowered signal processing algorithms can then be subjected to conventional peptide and protein identification (Zhang, et al. in preparation). We have also developed new de-noising technologies.
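To illustrate what a tensor representation of DIA data might look like, the following minimal sketch bins centroided peaks into a dense array. It assumes a hypothetical layout of (isolation window, retention-time bin, m/z bin) and a hypothetical input of `(window_index, rt, mz, intensity)` tuples; the actual DIAT specification may use different axes, resolution, and compression.

```python
import numpy as np


def peaks_to_tensor(peaks, n_windows, rt_edges, mz_edges):
    """Bin DIA peaks into a dense (window, RT bin, m/z bin) intensity tensor.

    Hypothetical layout for illustration only; not the published DIAT format.
    peaks: iterable of (window_index, rt, mz, intensity).
    rt_edges, mz_edges: monotonically increasing bin edges.
    """
    tensor = np.zeros((n_windows, len(rt_edges) - 1, len(mz_edges) - 1),
                      dtype=np.float32)  # float32 roughly halves storage vs float64
    arr = np.asarray(peaks, dtype=float).reshape(-1, 4)
    for w in range(n_windows):
        sel = arr[arr[:, 0] == w]
        if sel.size == 0:
            continue  # no peaks recorded for this isolation window
        # Sum peak intensities falling into each (RT, m/z) cell.
        hist, _, _ = np.histogram2d(sel[:, 1], sel[:, 2],
                                    bins=[rt_edges, mz_edges],
                                    weights=sel[:, 3])
        tensor[w] = hist
    return tensor


if __name__ == "__main__":
    demo = [(0, 1.0, 500.2, 10.0), (0, 1.2, 500.4, 5.0), (1, 3.5, 700.0, 2.0)]
    t = peaks_to_tensor(demo, 2, np.array([0.0, 2.0, 4.0]),
                        np.array([400.0, 600.0, 800.0]))
    print(t.shape)
```

A fixed-shape array of this kind is what makes direct input into a convolutional or other deep network straightforward; the coarse bins here are chosen only to keep the example readable.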
01.5 Bioinformatics analysis. Data quality control, effective data analysis, and visualization are major bottlenecks in analyzing large proteomic data sets. We developed a web server, ProteomeExpert, for effective experimental design and analysis of large-scale proteomics data sets (Zhu, et al. Bioinformatics. 2021), and a web server for batch effect evaluation and correction (Zhu, et al. J Prot Res. 2020).
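As a concrete example of what batch effect correction involves, the sketch below applies per-batch median centering to log2-transformed protein intensities. This is one common, simple approach chosen for illustration; it is an assumption that it resembles any method offered by the web server, and the function name `median_center_by_batch` is hypothetical.

```python
import numpy as np


def median_center_by_batch(log_matrix, batches):
    """Per-batch median centering of a proteins x samples log2 matrix.

    For each protein, the batch median is subtracted and the overall median
    (across all samples) is added back, so every batch ends up with the same
    per-protein median. NaN marks missing values and is ignored in medians.
    """
    x = np.asarray(log_matrix, dtype=float)
    labels = np.asarray(batches)
    if x.shape[1] != labels.shape[0]:
        raise ValueError("one batch label is required per sample (column)")
    corrected = x.copy()
    overall = np.nanmedian(x, axis=1, keepdims=True)  # per-protein median, all samples
    for b in np.unique(labels):
        cols = labels == b
        batch_median = np.nanmedian(x[:, cols], axis=1, keepdims=True)
        corrected[:, cols] = x[:, cols] - batch_median + overall
    return corrected


if __name__ == "__main__":
    data = np.array([[10.0, 10.0, 12.0, 12.0],
                     [5.0, 6.0, 7.0, 8.0]])
    print(median_center_by_batch(data, ["A", "A", "B", "B"]))
```

Median centering removes only additive per-batch shifts; more elaborate methods (for example, empirical Bayes approaches) also model batch-specific variance.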
Altogether, these technological advances, as summarized in an invited review (Xiao, et al. Adv Drug Deliv Rev. 2021) and an invited clinical proteomics snapshot paper (Zhu, et al. Cell. 2021), enable robust, high-throughput and deep proteomic analysis of minute amounts of clinical specimens.
02.1 The clinical needs. Thyroid nodules are detected in 50-60% of the general population, but only a minority of them (10-20%) are malignant and/or clinically relevant. Precise diagnosis of thyroid nodules therefore remains a pressing global clinical need. In addition, there is an urgent need to identify novel therapeutic targets for certain types of treatment-refractory thyroid cancer.
02.2 Development of protein-based classifiers for thyroid nodules. Although several nucleic acid-based mutation tests and other diagnostic tests have been developed, their limited specificity still leads to unnecessary surgeries. To develop a protein-based classifier for thyroid nodules, we first built a comprehensive spectral library covering five major types of thyroid tissue (Sun, et al. Mol Oncol. 2022. Cover paper). We then applied it to analyze over 1,700 thyroid biopsy proteomes from 578 individuals in a multi-center retrospective cohort, and developed a neural network model based on 19 protein biomarkers. The classifier was externally validated in independent cohorts (Sun, et al. Cell Disc. 2022). This study shows that integrating high-throughput proteomics and machine learning in multi-center retrospective and prospective clinical cohorts facilitates precise disease diagnosis that is otherwise difficult to achieve with other methods.
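The general idea of a protein-panel classifier can be sketched with a single-layer model. The example below is a minimal logistic-regression stand-in trained by gradient descent, not the published neural network; the function names, toy data, and hyperparameters are hypothetical, and the 19-protein panel itself is not reproduced.

```python
import numpy as np


def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit a logistic-regression classifier on a samples x proteins matrix.

    Features are standardized with training-set statistics, which are stored
    in the returned model so that new samples are scaled identically.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    mu = X.mean(axis=0)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0  # guard against constant features
    Z = (X - mu) / sd
    w = np.zeros(Z.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(Z @ w + b)))  # predicted probability of class 1
        w -= lr * (Z.T @ (p - y)) / len(y)       # gradient of mean log-loss w.r.t. w
        b -= lr * float(np.mean(p - y))          # gradient w.r.t. bias
    return mu, sd, w, b


def predict_proba(model, X):
    """Return class-1 probabilities for new samples using a trained model."""
    mu, sd, w, b = model
    Z = (np.asarray(X, dtype=float) - mu) / sd
    return 1.0 / (1.0 + np.exp(-(Z @ w + b)))


if __name__ == "__main__":
    # Toy data: two hypothetical protein abundances per sample; 1 = malignant.
    X_train = [[1.0, 0.0], [2.0, 0.0], [1.5, 1.0], [5.0, 0.0], [6.0, 1.0], [5.5, 0.0]]
    y_train = [0, 0, 0, 1, 1, 1]
    model = train_logistic(X_train, y_train)
    print(predict_proba(model, [[1.2, 0.0], [5.8, 1.0]]))
```

In a real study, the decisive step is evaluating such a model on independent external cohorts rather than on the training samples.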
Since 2020, we have considered it our top priority and societal responsibility to contribute our proteomic expertise and resources to deepen the understanding of host responses to SARS-CoV-2 and thereby rapidly derive clinically relevant results. Our central research objective was to understand the molecular mechanisms underlying diverse host responses to SARS-CoV-2 and its vaccines. Our research proposed novel tests for COVID-19 diagnosis and prognosis, and nominated therapeutic targets.
03.1 Host responses in the COVID-19 blood and urine. We first performed proteomic and metabolomic profiling of sera from COVID-19 patients and control subjects, and established a protein classifier for identifying severe cases. We also identified molecular changes in the sera of COVID-19 patients compared to other groups which implicated macrophage dysregulation, platelet degranulation, complement system pathways, and massive metabolic suppression (Shen, et al. Cell. 2020). We further showed that the severity of COVID-19 could also be assessed by a protein classifier from urine (Bi, et al. Cell Rep. 2022). More cytokines and many other disease-related proteins can be detected in urine than in serum.
RT-PCR is the primary diagnostic method for COVID-19 and is also used to monitor the disease course. This approach, however, suffers from false negatives due to RNA instability and poses a high risk to medical practitioners. We thus investigated the potential of using serum proteomics to predict viral nucleic acid positivity during COVID-19, and showed that a serum protein-based machine learning model could monitor COVID-19 progression, thus complementing swab RT-PCR tests (Zhang, et al. J Prot Res. 2021).
03.2 Host responses in autopsies and potential therapeutic drug targets. We also reported a proteomic analysis of 144 autopsy samples from seven organs of 19 COVID-19 patients (Nie, et al. Cell. 2021). From this data resource we identified a potential therapeutic target, cathepsin L1, and observed, among other findings, a reduction in testicular Leydig cells.
03.3 Predicting long COVID. Through multiomics analysis of a COVID-19 cohort followed for two years, we found that COVID-19 patients unexpectedly developed new symptoms at the two-year follow-up visits. Our findings provide useful evidence for predicting and preventing these new-onset consequences (Wang, et al. In revision).
03.4 Host responses after vaccination. We performed proteomic analysis of both serum and peripheral blood mononuclear cells from vaccinated individuals with heterogeneous serological responses, and developed a protein classifier to predict vaccination effectiveness (Wang, et al. in revision). These data can inform individualized vaccination and booster planning.
03.5 Host responses to Omicron. Our proteomic analysis of blood samples of Omicron patients and other COVID-19 patients with mild symptoms, flu patients, and healthy controls showed that vaccinated patients infected with the Omicron variant exhibited weaker inflammatory responses than unvaccinated flu controls (Bao, et al. Cell Disc. 2022).
03.6 Proteomic database for COVID-19. We have compiled all our COVID-19 proteomic data sets, together with other data sets from the literature, into an online web server (Zhang, et al. submitted).
Most life science studies are focused on a short list of well-studied proteins, resulting in most of the human proteome remaining understudied. This disparity leads to a phenomenon known as the “streetlight effect” or the “rich-get-richer syndrome”, in which the well-studied proteins are more extensively investigated to the neglect of understudied proteins.
Many understudied proteins are not efficiently solubilized, digested, eluted, or ionized for LC-MS analysis. Some are present only in specific tissue or cell types. In addition, huge numbers of proteoforms have not yet been analyzed. We have accumulated over 60,000 DIA maps from various human specimens, and plan to develop big-data-driven, AI-empowered algorithms to identify novel proteins and proteoforms in these data and to establish their associations with biological and disease phenotypes. Genetic manipulation, such as CRISPR, and mechanistic studies in cell lines and animal models will be employed to study the functions of these proteins. We will also explore their druggability. We will collaborate with partners from multiple international initiatives, including Proteomics-Driven Precision Medicine (PDPM), the HUPO Grand Challenge project, and the Understudied Protein Initiative, among others, to investigate the expression and functions of understudied proteins.