Bioorthogonal Chemistry-enabled Spatial-Temporal Proteomics
Abstract:Employing small molecules or other chemical means to modulate the function of an intracellular protein of interest, particularly in a gain-of-function fashion, remains highly desirable but challenging. In this talk, I will introduce a “genetically encoded chemical decaging” strategy that relies on our recently developed bioorthogonal cleavage reactions to control protein activation in living systems. These reactions exhibit high efficiency and low toxicity for decaging chemically “masked” lysine or tyrosine residues on intracellular proteins, allowing spatially and temporally resolved proteomic studies in living systems. Most recently, with the assistance of computer-based design and screening, we further expanded our method from “precise decaging” of enzyme active sites to “proximal decaging” of enzyme pockets. This new method, termed “Computationally Aided and Genetically Encoded Proximal Decaging” (CAGE-prox), showed general applicability for switching on the activity of a broad range of proteins under living conditions. I will end by showcasing exciting applications of our CAGE-prox technique: i) constructing orthogonal and mutually exclusive kinase signaling cascades; ii) temporal caspase activation for time-resolved profiling of proteolytic events upon apoptosis; and iii) on-demand activation of bacterial effectors as potential protein prodrugs for cancer therapy.
Keywords:proteomics, bioorthogonal reaction, spatial and temporal control, living systems
Profiling Intracellular Protein-Protein and Protein-Chemical Interactions at Scale with Cellular Biophysics Proteomics
Abstract:The interaction of proteins with chemicals and other proteins underlies all cellular activities, and many bioactive compounds modulate the molecular functions of proteins through direct physical interactions. Mapping these interactions will reveal the internal wiring of cells, providing insight into how these interactions are dysregulated in diseases and perturbed by chemicals. However, there are currently few time- and cost-effective techniques for global profiling of intracellular protein-protein and protein-chemical interactions. The recently developed Cellular Thermal Shift Assay (CETSA) verifies the intracellular binding of experimental drugs to their intended protein targets, and has been integrated with quantitative MS (termed Thermal Proteome Profiling, TPP) into a modification-free drug target deconvolution technique. My laboratory had the privilege of contributing to the extension of this technique to metabolite-binding and membrane proteins, and has recently adapted it to simultaneously assess the intracellular assembly state of thousands of protein complexes, based on an unorthodox concept termed Thermal Proximity Co-aggregation (TPCA). Both MS-based CETSA and TPCA require neither tedious modification of chemicals nor cell engineering, which greatly increases their throughput potential. I will describe recent experimental and computational advancements made in my laboratory to these techniques for profiling intracellular protein-protein interaction and protein-chemical interaction networks at scale toward “inter”Omics.
Keywords:cellular thermal shift assay, thermal proximity co-aggregation, thermal proteome profiling, protein-protein interaction, protein-chemical interaction
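The abstract above does not spell out how co-aggregation is scored. As a minimal sketch, assuming that subunits of an assembled complex show similar solubility (melting) curves across the temperature series, curve similarity can be summarized by the mean pairwise Euclidean distance between curves. The protein names, curve values, and function names are hypothetical and serve only as an illustration.

```python
from itertools import combinations
from math import sqrt


def curve_distance(curve_a, curve_b):
    """Euclidean distance between two melting curves sampled at the same temperatures."""
    if len(curve_a) != len(curve_b):
        raise ValueError("curves must be sampled at the same temperatures")
    return sqrt(sum((a - b) ** 2 for a, b in zip(curve_a, curve_b)))


def mean_pairwise_distance(curves, members):
    """Average curve distance over all pairs of the listed proteins.

    A small value is read here as a co-aggregation-like signature (an assumption).
    """
    pairs = list(combinations(members, 2))
    if not pairs:
        raise ValueError("need at least two proteins")
    return sum(curve_distance(curves[p], curves[q]) for p, q in pairs) / len(pairs)


# Hypothetical soluble fractions at four increasing temperatures.
curves = {
    "subunit_A": [1.0, 0.9, 0.5, 0.1],
    "subunit_B": [1.0, 0.9, 0.5, 0.1],
    "unrelated_C": [1.0, 0.5, 0.1, 0.0],
}

complex_score = mean_pairwise_distance(curves, ["subunit_A", "subunit_B"])
mixed_score = mean_pairwise_distance(curves, ["subunit_A", "subunit_B", "unrelated_C"])
print(complex_score, mixed_score)
```

In practice such a score would be compared against the distribution obtained from random protein sets of the same size; that step is omitted here.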
Chemical Labeling-Assisted Glycoproteomics
Abstract:Protein glycosylation, attachment of glycans to specific amino acids via glycosidic bonds, is the most ubiquitous and complex posttranslational modification. Based on the glycosidic bond and glycan structure, the major types of protein glycosylation include N-linked glycosylation, mucin-type O-linked glycosylation, and O-GlcNAcylation. Comprehensive analysis of protein glycosylation is a prerequisite for understanding the biological function of protein glycosylation, but remains challenging. To address this challenge, we take advantage of chemical labeling of glycans with clickable unnatural sugars and develop an effective platform for comprehensive analysis of intact N- and O-glycopeptides in one sample from cell lysates and tissues. The chemical labeling-assisted glycoproteomics strategy is applied to generate large-scale datasets of protein glycosylation in various tissues of mice, demonstrating its potential in facilitating our understanding of glycobiology.
Keywords:metabolic glycan labeling, glycomics, glycoproteomics, click chemistry, intact glycopeptide
Discovering Molecular Targets of the Natural Product Tanshinone with Quantitative Proteomics
Abstract:Ischemia-reperfusion injury is an important cause of cell and tissue damage in clinical practice, especially during the treatment of myocardial infarction. Natural products from the plant Salvia miltiorrhiza Bge have been used clinically and show benefit in cardiovascular diseases. Our lab uses quantitative proteomics methods to elucidate the molecular mechanisms of ischemia-reperfusion injury and to discover potential drug targets of Salvia miltiorrhiza Bge. We also developed a biotin-maleimide probe for electrophilic cysteine profiling, which can quantify more than 18,000 cysteine sites in one experiment. With this method, we discovered several potential targets of the natural product tanshinone.
Keywords:reperfusion injury, phosphorylation, chemical proteome, tanshinone
Systematic Investigation of Key Regulatory Elements for Lysine β-hydroxybutyrylation
Abstract:Short-chain fatty acids and their corresponding acyl-CoAs sit at the crossroads of multiple metabolic pathways and play important roles in diverse cellular processes. A noteworthy example is the newly identified protein posttranslational modification (PTM), lysine β-hydroxybutyrylation (Kbhb), which is derived from β-hydroxybutyrate, one of the ketone bodies. We have demonstrated that histone Kbhb directly stimulates transcription, and established novel functions for β-hydroxybutyrate in regulating gene expression. However, key elements regulating this physiologically relevant pathway remain unknown, hindering characterization of the mechanisms by which this modification exerts its biological functions. Here we systematically investigate the key regulatory enzymes and substrates of Kbhb, which will illuminate the landscape of the Kbhb pathway and lay a solid foundation for future studies of this pathway in cellular physiology and human diseases.
Keywords:posttranslational modifications, lysine β-hydroxybutyrylation, regulatory enzymes, substrates
Proteomic Subtyping as a Postoperative Recurrence Risk Assessment Analysis in Hepatocellular Carcinoma
Abstract:Proteomics is increasingly important in predicting clinical outcomes. Quantitative proteomic data have revealed heterogeneity in early-stage hepatocellular carcinoma (HCC) and have been used to stratify a cohort into three subtypes (i.e., S-I, S-II and S-III) with different clinical outcomes, but larger cohorts and comprehensive proteomic analysis are needed to provide definitive answers. We established a proteomic pipeline integrating data-independent acquisition (DIA) on four mass spectrometers (MS) in parallel with spectral library-based database searching. The pipeline was robust not only for snap-frozen samples but also for formalin-fixed, paraffin-embedded (FFPE) samples. Our analyses included 1,024 patients with primary HCC from three independent cohorts. To further verify the proteomic subtyping in a real-world setting, we identified the subtypes of all HCC patients recruited in this study using the SRPS algorithm. Cox regression analysis showed that the proteomic subtypes were significantly associated with overall survival and disease-free survival, independent of clinical and pathological features. In summary, the proteomic subtypes apply not only to early-stage HCC patients with HBV infection but also to HCC patients with BCLC stage B and to HCC patients without HBV infection or cirrhosis, showing the universality of proteomic subtypes in HCC patients.
Keywords:proteomic stratification, data-independent acquisition, recurrence risk, multi-centric and well characterized cohorts
Genetic variation-driven PTM aberrations: from molecular mechanisms to targeted cancer therapies
Abstract:Post-translational modifications (PTMs) are critical for regulating cellular processes, and their aberrations are heavily implicated in cancer. A massive number of PTM sites have been identified through experimental identification and high-throughput proteomics techniques; however, their enzyme-specific regulation remains largely unknown. Recently, we developed the Deep-PLA software for HAT/HDAC-specific acetylation prediction based on deep learning, and employed protein–protein interaction and co-sublocalization information to filter out false-positive predictions. Through large-scale prediction based on TCGA cancer omics data, we observed that mutations occurred more frequently in the regions around acetylation sites, and that acetylation-related mutations (ARMs) had higher variant allele fraction values than non-ARMs, suggesting that these mutations might be more functional in cancer. Furthermore, ARM proteins were significantly enriched in cancer genes and druggable proteins, and clinical survival analysis demonstrated that patients with at least one ARM had a significantly worse prognosis in cancers such as head-neck squamous cell carcinoma. Besides substrates and sites, we also studied the enzymes of PTMs, for example HER2, the kinase targeted by trastuzumab in HER2-positive gastric cancer. By monitoring patients with ctDNA, we observed that mutations in PIK3CA/R1/C3 or ERBB2/4 could indicate trastuzumab resistance. Additionally, mutations in NF1 contributed to trastuzumab resistance, which was further confirmed through in vitro and in vivo studies, while combined HER2 and MEK/ERK blockade overcame trastuzumab resistance. Taken together, PTM systems, including substrates, sites and enzymes, are critical in cancer, and further studies should be devoted to this area.
Keywords:acetylation, phosphorylation, mutation, cancer
Continual learning in cryoEM particle picking
Before biological images from cryo-electron microscopy (cryoEM) can be processed, the sample of interest must first be located, so methods such as particle picking are needed to find the proteins or other objects of interest. However, because radiation damage limits the electron dose, the signal-to-noise ratio of cryoEM micrographs is very low, and particle picking is therefore often challenging and labor-intensive. Although template matching has been widely used to accelerate particle picking, it requires manual intervention and the provision of templates; consequently, it relies heavily on the user's experience and is poorly suited to automated processing.
Deep learning methods show great potential owing to their ability in object identification. Although deep learning methods no longer require users to provide templates directly, they still need enough training data to ensure correct and accurate recognition. To achieve fully automated, even intelligent, particle picking, we have introduced a continual learning method based on the deep learning framework. The significance of continual learning is that deep neural networks can be continuously trained and enhanced: over successive applications, the network keeps learning new features, accumulates this knowledge, and becomes more and more powerful.
Quantitative Proteomics for Ubiquitination Detection and Functional Research
Abstract:Ubiquitin chains, as carriers of biological information, perform specific biological functions and constitute the "Ubiquitin Code" system. The lack of systematic and efficient screening strategies for substrates modified by atypical ubiquitin chains limits the identification of these substrates and the study of their functional mechanisms. In addition, the ubiquitination signal network is, in turn, trimmed by deubiquitinating enzymes (DUBs). Defining the specificity of DUBs for ubiquitin chains is challenging but worthwhile, as it helps us understand in depth the precise regulatory mechanisms of UPS enzymes and substrates. Technically, it remains a challenge to directly display the specificity of a given DUB for its corresponding ubiquitin linkages. To improve the detection sensitivity and coverage for ubiquitin chains and modified sites, we developed a tandem hybrid UBD (ThUBD) to enrich ubiquitinated proteins and constructed a trypsin and LysargiNase tandem digestion strategy to improve the identification of modified sites. Building on these high-performance technologies, we used budding yeast to establish a high-throughput profiling and validation strategy, based on quantitative proteomics, for substrates modified by atypical ubiquitin chains. High-throughput screening of substrates modified by the K11 atypical ubiquitin chain was conducted to reveal its molecular function in the transcriptional activation of Met4, providing a theoretical basis for the discovery of new functions of the K11 atypical ubiquitin chain. We also employed SILAC quantitative proteomics to systematically evaluate the specificity of DUBs for all seven types of ubiquitin chains. Based on this specificity, we demonstrated the precise regulation and functions of ubiquitin modification on substrates through DUBs. The signal “DUBs – Ub chains – substrate – function” becomes the basis of the precise regulatory mechanism of ubiquitin networks.
Keywords:quantitative proteomics, ubiquitin, atypical ubiquitin chains, deubiquitinase
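The two proteases named in the abstract above have mirror-image cleavage rules: trypsin cleaves on the C-terminal side of lysine (K) and arginine (R), whereas LysargiNase cleaves on the N-terminal side of the same residues. The sketch below illustrates only these rules on a made-up sequence. It ignores missed cleavages and the proline exception for trypsin, and it is not the authors' digestion workflow; the function names are hypothetical.

```python
def trypsin_digest(sequence):
    """Cleave after every K or R (simplified rule, no proline exception)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        if residue in "KR":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def lysarginase_digest(sequence):
    """Cleave before every K or R, so K/R ends up at the peptide N-terminus."""
    if not sequence:
        return []
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        if residue in "KR" and i > start:
            peptides.append(sequence[start:i])
            start = i
    peptides.append(sequence[start:])
    return peptides


example = "MAGKTLRSE"  # made-up sequence for illustration
print(trypsin_digest(example))      # ['MAGK', 'TLR', 'SE']
print(lysarginase_digest(example))  # ['MAG', 'KTL', 'RSE']
```

Because the basic residue sits at opposite ends of the resulting peptides, the two digests produce different peptides covering the same region, which is one reason they can complement each other in site identification.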
DIA proteomics with in silico spectral libraries by deep learning and DIA glycoproteomics
Data-independent acquisition (DIA) is an emerging technology for quantitative proteomic analysis of large cohorts of samples. However, sample-specific spectral libraries built from data-dependent acquisition (DDA) experiments are required prior to DIA analysis, which is time-consuming and limits identification and quantification by DIA to the peptides identified by DDA. Recently, we developed DeepDIA, a deep learning-based approach to generate in silico spectral libraries for DIA analysis [1]. We demonstrate that the quality of in silico libraries predicted by instrument-specific models using DeepDIA is comparable to that of experimental libraries, and that they outperform libraries generated by global models. With peptide detectability prediction, in silico libraries can be built directly from protein sequence databases. We further illustrate that DeepDIA can overcome the limitation that DDA imposes on peptide/protein detection, and can enhance DIA analysis of human serum samples compared with the state-of-the-art protocol using a DDA library. Following the emergence of the timsTOF Pro and FAIMS, we further extended the DeepDIA toolbox to ion mobility prediction. DeepDIA now also supports data from timsTOF Pro and FAIMS Orbitrap instruments.
On a separate topic, we recently developed GproDIA, a framework for DIA glycoproteomics with comprehensive statistical control by a 2-dimensional false discovery rate approach and a glycoform inference algorithm, enabling accurate identification of intact glycopeptides using wide isolation windows [2]. We benchmark our method for N-glycopeptide profiling on DIA data of yeast and human serum samples, demonstrating that DIA with GlycoSWATH outperforms data-dependent acquisition (DDA)-based methods for glycoproteomics in terms of capacity and data completeness of identification, as well as accuracy and precision of quantification. We expect that this work can provide a powerful tool for glycoproteomic studies.
[1] Yi Yang, Xiaohui Liu, Chengpin Shen, Yu Lin, Pengyuan Yang, Liang Qiao, Nature Communications, 2020, 11, 146
[2] Yi Yang, Weiqian Cao, Guoquan Yan, Siyuan Kong, Mengxi Wu, Pengyuan Yang, Liang Qiao, bioRxiv, 2021, doi: https://doi.org/10.1101/2021.03.20.436117
Keywords:data independent acquisition, proteomics, glycoproteomics, deep learning
An RNA tagging approach for system-wide RNA-binding proteome profiling and dynamics investigation upon transcription inhibition
Abstract:RNA-protein interactions play key roles in epigenetic, transcriptional and posttranscriptional regulation. To reveal the regulatory mechanisms of these interactions, global investigation of RNA-binding proteins (RBPs) and monitoring of their changes under various physiological conditions are needed. Herein, we developed a psoralen probe (PP)-based method for RNA tagging and ribonucleoprotein complex (RNP) enrichment. Isolation of both coding and noncoding RNAs and mapping of 2986 RBPs, including 782 previously unknown candidate RBPs, from HeLa cells were achieved by PP enrichment, RNA sequencing and mass spectrometry analysis. The dynamics study of RNPs by PP enrichment after inhibition of RNA synthesis provides the first large-scale distribution profile of RBPs bound to RNAs with different decay rates. Furthermore, the remarkably greater decreases in the abundance of RBPs obtained by PP enrichment than by global proteome profiling suggest that PP enrichment after transcription inhibition offers a valuable way to evaluate candidate RBPs on a large scale.
Keywords:RNA-binding proteins, psoralen probe, large-scale, enrichment, mass spectrometry
Cancer Serum Atlas combined with pan-targeted mass spectrometry supports proteomics-based multi-cancer diagnosis
Abstract:Early cancer detection could give cancer patients a better chance of long-term survival. The emerging multi-cancer diagnosis approach has the potential to address this large unmet need in a more inclusive and cost-effective way. Yet such an approach requires high specificity, high sensitivity, and highly accurate tissue of origin (TOO) identification. In this study, we developed a proteomics-based approach for multi-cancer diagnosis. First, we conducted systematic data mining of potentially secreted, cancer-associated proteins from published clinical proteomics datasets of seven common cancer types with high morbidity and mortality. Over two thousand proteins were selected as candidate cancer biomarkers that could be detectable in blood. Unique peptides of each protein were synthesized, and high-quality MS/MS and PRM spectra were acquired. All resulting data are presented in a database named “Cancer Serum Atlas” (www.cancerserumatlas.com). We then developed a pan-targeted MS strategy that can precisely quantify up to 800 proteins in one run, and applied it to quantify 485 detectable cancer biomarkers in sera from 293 individuals who were healthy or had one of 4 different types of cancer. To further improve the specificity of multi-cancer diagnosis, we introduced a previously developed PPC-VDE algorithm, which generates a large number of cancer-specific features by quantifying protein-protein co-regulation. Taken together, the Cancer Serum Atlas combined with the pan-targeted MS approach was highly effective in multi-cancer diagnosis and can be widely applied to other blood-based cancer studies.
Keywords:multi-cancer diagnosis, cancer serum atlas, pan-targeted mass spectrometry, protein-protein co-regulation
Bridging Mass Spectrometry with GPCR Biology: Discovery of Potential Therapeutic Targets
Abstract:Transmembrane proteins play vital roles in mediating synaptic transmission, plasticity and homeostasis in the brain. However, these proteins, especially the G protein-coupled receptors (GPCRs), are under-represented in most large-scale proteomic surveys. Here, we present a new proteomic approach aided by deep learning-based spectral library prediction for comprehensive profiling of transmembrane protein families in multiple mouse brain regions. Our multiregional proteome profiling highlights the considerable discrepancy between mRNA and protein distribution, especially for region-enriched GPCRs, and predicts an endogenous GPCR interaction network in the brain. Furthermore, our new approach reveals the transmembrane proteome remodeling landscape in the brain of a mouse depression model, which led to the identification of two novel GPCR regulators of depressive-like behaviors. Our study provides an enabling technology and rich data resource to expand the understanding of transmembrane proteome organization and dynamics in the brain as well as accelerate the discovery of potential therapeutic targets for depression treatment.
Keywords:transmembrane proteins, GPCRs, spectral library prediction, brain proteomics, regulators of depression
Toward Automated Identification of Glycan Branching Patterns Using Multistage Mass Spectrometry with Intelligent Precursor Selection
Abstract:Glycans play important roles in a variety of biological processes. Their activities are closely related to the fine details of their structures. Unlike the simple linear chains of proteins, branching is a unique feature of glycan structures, making their identification extremely challenging. Multistage mass spectrometry (MSn) has become the primary method for glycan structural identification. The major difficulty for MSn is the selection of fragment ions as precursors for the next stage of scanning. Widely used strategies are either manual selection by experienced experts, which requires considerable expertise and time, or simple selection of the most intense peaks, in which case the resulting product-ion spectrum may not be structurally informative and may therefore fail to support an assignment. We here report a glycan ‘intelligent precursor selection’ strategy (GIPS) to guide MSn experiments. Our approach consists of two key elements: an empirical model to calculate each candidate glycan’s ‘probability’, and a statistical model to calculate each fragment ion’s ‘distinguishing power’, in order to select the most structurally informative peak as the precursor for next-stage scanning. Using 13 glycan standards, including 3 pairs with isomeric sequences, and 8 variously fucosylated oligosaccharides on linear or branched hexasaccharide backbones obtained from a human milk oligosaccharide fraction by HPLC, we demonstrate its successful application to branching pattern analysis with improved efficiency and sensitivity, as well as its potential for automated operation.
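The abstract does not give the formulas behind ‘probability’ and ‘distinguishing power’. As a rough illustration of the general idea of choosing the most informative peak rather than the most intense one, the sketch below uses a stand-in score: for each peak, the candidates are grouped by the product-ion spectrum they would be predicted to give, and the peak whose grouping has the highest entropy under the current candidate probabilities is selected. The scoring rule, candidate names, probabilities, and groupings are all assumptions and not the GIPS models.

```python
from math import log2


def partition_entropy(candidate_probs, groups):
    """Entropy (bits) of the split that fragmenting a peak would induce on the candidates.

    Each group holds candidates that are assumed to be indistinguishable
    after fragmenting this peak.
    """
    total = sum(candidate_probs.values())
    entropy = 0.0
    for group in groups:
        mass = sum(candidate_probs[name] for name in group) / total
        if mass > 0:
            entropy -= mass * log2(mass)
    return entropy


def pick_precursor(candidate_probs, peak_partitions):
    """Return the peak whose fragmentation is expected to separate candidates best."""
    return max(
        peak_partitions,
        key=lambda peak: partition_entropy(candidate_probs, peak_partitions[peak]),
    )


# Hypothetical candidate structures with current probabilities.
candidate_probs = {"linear": 0.5, "branched_a": 0.25, "branched_b": 0.25}

# Hypothetical groupings: which candidates each peak could tell apart.
peak_partitions = {
    "peak_1": [["linear", "branched_a", "branched_b"]],      # separates nothing
    "peak_2": [["linear"], ["branched_a", "branched_b"]],    # separates linear from branched
    "peak_3": [["linear"], ["branched_a"], ["branched_b"]],  # separates all three
}

best_peak = pick_precursor(candidate_probs, peak_partitions)
print(best_peak)
```

A practical scheme would also have to weigh peak intensity, since a highly informative but very weak ion may not yield a usable next-stage spectrum.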
A personal guide for distilling protein structure and dynamics information from cross-linking MS data
Abstract:Cross-linking mass spectrometry (XLMS) has been increasingly employed for the structural characterization of proteins and protein complexes. Photo- or chemical cross-linking connects two adjacent residues within a relatively short distance, and the cross-linked peptides can be identified by mass spectrometry with high confidence. However, there are three caveats associated with XLMS-based structural biology and structural proteomics:
- Proteins and protein complexes are usually dynamic. Therefore, the cross-linking reaction can capture and manifest alternative protein conformations, and the observed XLMS data should be interpreted with an ensemble of structures.
- Cross-linking implicitly involves two consecutive reactions. Thus, the dynamic timescale of the cross-linker relative to the dynamic timescale of the protein can affect the observed XLMS data. This reaction kinetics issue can matter especially for intrinsically disordered proteins.
- XLMS data manifest inter-residue distances, which have mostly been represented as straight-line distances. Since the cross-linker cannot penetrate the protein, a new type of distance restraint has been developed to recapitulate the inter-residue solvent-accessible distance.
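The first and third caveats can be illustrated with a minimal sketch that checks a cross-link against an ensemble of conformers using straight-line distances. The coordinates, residue numbers, and the 30 Å cutoff are hypothetical values chosen for illustration. Because a path along the protein surface can never be shorter than the straight line between the same two points, a link that fails this check would also fail a solvent-accessible distance check with the same cutoff, while a link that passes it may still fail the stricter one.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def link_satisfied(conformer, link, max_distance):
    """True if the straight-line distance between the two linked residues is within the cutoff."""
    residue_a, residue_b = link
    return dist(conformer[residue_a], conformer[residue_b]) <= max_distance


def satisfied_by_ensemble(ensemble, link, max_distance):
    """True if at least one conformer in the ensemble satisfies the cross-link."""
    return any(link_satisfied(conformer, link, max_distance) for conformer in ensemble)


# Hypothetical coordinates (in angstroms) for two residues in two conformers.
open_state = {10: (0.0, 0.0, 0.0), 50: (40.0, 0.0, 0.0)}    # residues 40 apart
closed_state = {10: (0.0, 0.0, 0.0), 50: (12.0, 16.0, 0.0)}  # residues 20 apart

link = (10, 50)
cutoff = 30.0  # assumed upper bound for illustration only

print(link_satisfied(open_state, link, cutoff))                         # False
print(link_satisfied(closed_state, link, cutoff))                       # True
print(satisfied_by_ensemble([open_state, closed_state], link, cutoff))  # True
```

A cross-link that violates a single static structure, as in the open state here, can therefore still be consistent with the data once alternative conformations are considered.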