Fast and Comprehensive Analysis of High Resolution MS/MS Data
Abstract: Tandem mass spectrometry (MS/MS) based proteomics has become a focus of biological research in this century. However, the identification rate of MS/MS data is still very low: even for high-resolution, high-mass-accuracy datasets from one of the top-level mass spectrometry laboratories, the rate is only ~50%. We use pFind Studio to completely analyze an MS/MS dataset composed of 20631 high-resolution HCD spectra generated by an LTQ Orbitrap Velos mass spectrometer. First, pParse is used to detect co-eluted precursors and calibrate the precursor masses. Then a novel open search strategy is used to efficiently discover highly abundant modification types and amino acid mutations. Finally, these modifications are specified in the restricted database search, and all peptide-spectrum matches are scored using a uniform score function based on a semi-supervised learning model. With this comprehensive analysis using pFind Studio, nearly 80% of MS/MS scans are identified with FDR less than 1% at the peptide level, whereas only 50–60% of spectra can be identified by pFind or other search engines using the traditional strategy. Furthermore, the less-confident one-hit wonders decreased by ~20%, which shows that the proteins identified by the comprehensive analysis approach are more credible.
Key words: Proteomics, MS/MS, Identification Rate, Open Search
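The open search step described above looks for mass shifts that recur across many spectra. The sketch below illustrates that general idea only; it is not pFind Studio's algorithm, which the abstract does not describe. The function name, the bin width and the input format (pairs of observed precursor mass and theoretical unmodified peptide mass, in Da) are assumptions made for illustration.

```python
from collections import Counter


def abundant_mass_shifts(psms, bin_width=0.01, min_count=2):
    """Histogram precursor mass differences to find recurring shifts.

    psms: iterable of (observed_precursor_mass, theoretical_peptide_mass) in Da.
    Hypothetical helper: a generic delta-mass histogram, not pFind's method.
    Returns (shift, count) pairs, most frequent first.
    """
    counts = Counter()
    for observed, theoretical in psms:
        delta = observed - theoretical
        # Assign each mass difference to the nearest bin of width bin_width.
        counts[round(delta / bin_width)] += 1
    shifts = [
        (round(bin_index * bin_width, 4), n)
        for bin_index, n in counts.items()
        if n >= min_count
    ]
    # A frequently observed shift is a candidate modification or mutation.
    return sorted(shifts, key=lambda item: -item[1])
```

Shifts that pass the count threshold could then be specified as variable modifications in a restricted search, which is the role the abstract gives to the final search step.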
Towards Building an Ideal Proteomics Platform
Abstract: Mass spectrometry (MS) based quantification is a powerful tool in biological research, offering high sensitivity and throughput in differential protein determination across different pathological and physiological conditions. Previously, we developed the Fast-seq approach for efficient proteome profiling and the catTFRE strategy for screening low-abundance transcription factors. These tools allow us to dissect a proteome with sufficient depth and breadth.
After the proteome identification issues were solved, precise quantification became a major challenge for proteomics platforms. Extracted peptide ion chromatogram (XIC, MS1 mode) and multiple reaction monitoring (MRM, MS2 mode) are two approaches employed in quantitative proteomics. The mass spectrometry signal response of different tryptic peptides from the same protein, and similarly of different fragment ions from the same peptide, can vary up to 100-fold in intensity. The key to successful MS1- and MS2-type quantification is the selection of "best peptide/transition responders" that have decent MS response intensity and a wide dynamic range, as this determines the accuracy and sensitivity of the assay. However, although scientists in the proteomics and MS fields have recognized this critical issue for some time, the difficulty of screening for best responders and the lack of a related database have led people to stay silent and accept the iBAQ algorithm, an obvious compromise approach to protein quantification. Although applying iBAQ in proteomic quantification relieves the awkwardness of having no proper MS quantification method, in the long term it harms the progress of proteomics, as it has an inevitable rate of false quantification and cannot be applied to accurate quantification of single proteins or pathways, especially in clinical research.
We recently generated original data by employing the Fast-seq and catTFRE approaches to build SCRIPT-MAP, an experimental database of linear MS response curves of global peptide-transition ions in the mammalian proteome. The intensity and dynamic range of over 800,000 transitions from 80,000 peptides (representing almost 9,000 gene products) in a serial dilution experiment set were evaluated and are presented in SCRIPT-MAP. Low-abundance sub-proteomes, such as transcription factors and co-regulators, were surveyed using an affinity-based MS strategy. With this experimental dataset, we can select and design targeted "best peptide/transition responders" for MS1- or MS2-based quantification by searching SCRIPT-MAP with either a pathway or a gene/protein list. We hope the combination of Fast-seq, catTFRE and SCRIPT-MAP will provide a reliable identification/quantification solution and eventually help us better understand "driver pathways" in clinical cancer research.
Key words: Proteome Quantification, Best Responder, Cancer
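One way to picture the selection of best responders from serial dilution data is to fit a straight line to each transition's response and keep the transitions that respond linearly and strongly. The sketch below assumes exactly that; the scoring rule, the thresholds and the function names are hypothetical and are not taken from SCRIPT-MAP.

```python
def linear_fit(x, y):
    """Least-squares slope and R^2 for paired lists x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    return slope, r_squared


def best_responders(curves, amounts, min_r2=0.98, top_n=3):
    """Rank transitions by response slope among those with a linear response.

    curves: dict mapping a transition name to intensities, one per dilution.
    amounts: loaded amounts for the dilution series, in the same order.
    Assumed rule: keep positive slopes with R^2 >= min_r2, steepest first.
    """
    scored = []
    for name, intensities in curves.items():
        slope, r_squared = linear_fit(amounts, intensities)
        if slope > 0 and r_squared >= min_r2:
            scored.append((name, slope))
    scored.sort(key=lambda item: -item[1])
    return [name for name, _ in scored[:top_n]]
```

A real selection would presumably also weigh dynamic range and interference, which this sketch leaves out.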
Informatics from a Biologist's Point of View
Abstract: The talk will present ideas, strategies and their significance from a biologist's point of view: how biological information flows in the body, how important information can be stored, and how to use and compare information generated in biochemical and proteomic experiments. It will touch on biomarkers, medical records, peptide identification, and the extraction of biological information from MS data, as well as the identification of protein interactions.
Key words: biomarker, mass spectrometry data analysis, protein interaction
Top-down Proteomics in Health and Disease: Challenges and Opportunities
Abstract: Proteomics is essential for deciphering how molecules interact as a system and for understanding the functions of cellular systems in human disease; however, the unique characteristics of the human proteome, which include a high dynamic range of protein expression and extreme complexity due to a plethora of post-translational modifications (PTMs) and sequence variations, make such analyses challenging. An emerging “top-down” mass spectrometry (MS)-based proteomics approach, which provides a “bird's eye” view of all proteoforms, has unique advantages for the assessment of PTMs and sequence variations. We have shown that top-down MS has unique advantages for unraveling the molecular complexity, quantifying multiple modified protein forms, complete mapping of modifications with full sequence coverage, discovering unexpected modifications, and identifying and quantifying positional isomers and determining the order of multiple modifications. Recently, we have also demonstrated the potential of top-down proteomics for unraveling of disease mechanisms and discovery of new biomarkers. Nevertheless, the top-down approach still faces significant challenges in terms of protein solubility, separation, and the detection of large intact proteins, as well as the under-developed data analysis tools. Consequently, new technological developments are urgently needed to advance the field of top-down proteomics. In this talk I will present our recent developments in top-down disease proteomics platform and its application to understand and diagnose heart failure using animal models, clinical human heart samples, and recombinant protein strategies. Moreover, I will outline the challenges and opportunities facing top-down proteomics strategies aimed at understanding and diagnosing human diseases.
Key words: top-down proteomics, electron capture dissociation, high resolution mass spectrometry, heart failure
Characterization of Large Intact Protein Complexes by Native Mass Spectrometry
Abstract: Conventional top-down proteomics analysis allows the entire sequence of intact protein chains to be observed, providing information on integrity of the whole sequence and on co-occupancies of PTMs. However, in the traditional top-down method, intact proteins are routinely measured under denaturing conditions, destroying noncovalent protein assemblies and substrate bound complexes. Fully active protein assemblies can be studied under native conditions, providing rich information on stoichiometry of complexes and on binding strength of components and/or substrates. Native mass spectrometry (ionization of protein complexes in their native states) experiments are challenging due to the limited surface area of protein complexes for protonation at physiological pH. Ions of large protein complexes have high m/z values and can only be detected by specifically designed FTICR, TOF, or Orbitrap instruments. Using the native MS method, we have analyzed several classes of protein complexes, including a membrane embedded complex and a DNA/protein complex. Association constants for an inhibitor binding to a large protein assembly can be obtained by directly measuring the relative abundances of substrate/protein complex assemblies. Charge state deconvolution software was developed to analyze low signal-to-noise ratio and overlapping native MS spectra.
Key words: protein complexes, native mass spectrometry, protein ligand binding
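The two calculations mentioned above, converting a charge state peak to a neutral mass and estimating an association constant from relative abundances, can be sketched as follows. This is a minimal illustration under stated assumptions, not the deconvolution software described in the abstract: it assumes a single binding site, equal ionization and detection efficiency for the free and bound protein, and known total concentrations. The function names are hypothetical.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton


def neutral_mass(mz, charge):
    """Neutral mass of a positive ion carrying `charge` protons."""
    return charge * (mz - PROTON_MASS)


def association_constant(i_free, i_bound, protein_total, ligand_total):
    """Estimate Ka (1/M) for P + L <-> PL from peak intensities.

    Assumes the intensity ratio equals the concentration ratio [PL]/[P].
    protein_total and ligand_total are total molar concentrations.
    """
    ratio = i_bound / i_free
    # Fraction of protein that is bound is ratio / (1 + ratio).
    bound_conc = protein_total * ratio / (1 + ratio)
    ligand_free = ligand_total - bound_conc
    if ligand_free <= 0:
        raise ValueError("bound complex exceeds the total ligand supplied")
    # Ka = [PL] / ([P][L]) = ratio / [L]free
    return ratio / ligand_free
```

For overlapping or noisy spectra the intensities would first have to be summed over all charge states of each species, which is the part a deconvolution tool handles.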
Comparison of Ultradefinition (UD) MSE and WiSIM-DIA Techniques in the Study of Middle-down Proteomics
Abstract: Unlike biased data-dependent acquisition (DDA), the data-independent acquisition (DIA) method acquires fragments from all precursors, regardless of their intensity or other properties. Ultradefinition (UD) MSE is a DIA mass spectrometry method that employs ion mobility drift time-specific collision-energy profiles to enhance precursor fragmentation efficiency on the Waters Synapt G2-Si. The WiSIM-DIA method, which runs on the Thermo Fusion instrument, uses precursor ions from SIM scans collected with wide isolation windows and ion trap CID MS/MS spectra collected in parallel with the SIM acquisition. We applied both methods in our middle-down proteomics study to compare their sensitivity and the reproducibility of their qualitative and quantitative results. HEK whole-cell lysate samples digested with trypsin for only 0.5 hr and with AspN for 4 hrs, respectively, were used for the comparison. Both samples were run at different loading amounts, in triplicate for each loading amount. UDMSE data were processed by ProteinLynx Global SERVER (PLGS) without any filtering, and WiSIM-DIA data were converted to mgf format; both kinds of data were then analyzed with Scaffold software using the same criteria.
Key words: ultradefinition (UD) MSE, WiSIM-DIA, data-dependent acquisition, middle-down proteomics
Functional Glycomics: unveiling the role of protein glycosylation
Abstract: Glycosylation, a major post-translational modification of proteins, is involved in many important biological processes such as protein folding and quality control, cell adhesion and migration, viral entry and pathogenesis, and immune response and regulation. More than 50% of mammalian proteins and about 70% of clinical protein drugs are glycoproteins. It has been extensively demonstrated that glycan substructures may significantly affect the functions of glycoproteins. However, the heterogeneity of glycan structures in natural glycoproteins hampers functional studies of well-defined structure-function relationships. We have developed a chemoenzymatic approach to construct homogeneous glycoforms of proteins for understanding the role of glycosylation in protein function. This method consists of native N-glycan library construction and enzymatic glycosylation remodeling of functional glycoproteins, including therapeutic antibody drugs and drug receptors. Subsequent functional studies indicated that the carbohydrate moieties significantly affect the related molecular recognition and protein properties. These results suggest a general approach to unveiling the role of protein glycosylation in biological events.
Key words: protein glycosylation, chemoenzymatic transglycosylation, homogeneous glycoprotein, functional glycomics
Developing Mass Spectrometry-based Molecular Imaging and Proteomics Strategies for the Studies of Neurological Diseases
Abstract: Advances in mass spectrometry (MS) have made MS-based proteomics a promising tool for protein profiling and biomarker discovery for diagnosis in various types of biological samples, including tissues and biofluids. This presentation will focus on our recent progress in the development and application of MS tools for biomarker discovery in several neurological diseases, such as autism and Alzheimer’s disease (AD). The unique challenges associated with the study of nervous systems make it beneficial to focus on neural tissues or proximity fluids. Using a combination of matrix-assisted laser desorption/ionization (MALDI) mass spectrometric imaging (MSI) and top-down proteomics, we highlight the utility of MALDI-MSI as a discovery tool for potential biomarkers. To address remaining technical challenges, we have been exploring new strategies for the production of multiply charged ions in MALDI. Furthermore, the endogenous peptidome of cerebrospinal fluid samples was also examined for the first time to reveal potential AD biomarkers. Finally, novel dimethylated leucine (DiLeu) isobaric tagging reagents were developed and evaluated for their utility in multiplexed quantitative analysis of putative biomarkers. Both relative and absolute quantitation strategies will be presented.
Key words: mass spectrometric imaging, quantitative proteomics, isobaric tagging, DiLeu, peptidomics, neuroproteomics, biomarker discovery
De Novo Glycan Sequencing by Tandem Mass Spectrometry
Abstract: The multilateral functions of glycans in living systems derive from their structural complexity and diversity. Tandem mass spectrometry (MS/MS) has recently emerged as an indispensable tool for structural glycomics. In particular, detailed glycan structural information can be generated by tandem MS analysis employing electron activated dissociation (ExD) methods on high-performance Fourier-transform ion cyclotron resonance mass spectrometers. However, bioinformatics tools for interpretation of complex, high-resolution glycan tandem mass spectral data are currently lacking. In this presentation, we will discuss our current strategy in developing an integrated analytical and bioinformatics pipeline for comprehensive structural analysis of glycans using tandem mass spectrometry.
Key words: electron activated dissociation, Fourier-transform ion cyclotron resonance mass spectrometry, glycan sequencing
Deciphering glycoform of site-specific glycosylation based on high-throughput HCD/CID-MS/MS and MS3
Abstract: High-throughput analysis of site-specific glycosylation remains one of the toughest challenges in proteomics. We have developed a comprehensive workflow for deciphering the glycoforms of site-specific glycosylation based on high-throughput HCD/CID-MS/MS and MS3. A database-free algorithm, an updated version of the GRIP algorithm, was specifically designed for glycopeptide fragment searching: we first analyze the peptide backbone signal in glycopeptide HCD-MS/MS, and then systematically screen glycopeptide fragmentation in CID-MS/MS accordingly. With the introduction of the latest Orbitrap Fusion instrument, peptide backbone sequencing can be performed from either CID- or HCD-MS/MS, which significantly improves the sequence fragmentation coverage of glycopeptides. A highly efficient pipeline for glycopeptide MS interpretation was also designed. Last but not least, an automated, integrated search engine for glycopeptides, named pGlyco, is under development. At the moment, we can decipher most glycosylation sites in a sample of 10 glycoproteins. On average, more than 10 different glycoforms were identified for each site. Fragments of both the peptide backbone and the glycan were collected for each glycopeptide, with strict analysis of the false positive rate. Analysis of complex samples will be performed soon. We believe that our approach could be a powerful tool for site-specific glycosylation studies.
Key words: site-specific glycosylation, mass spectrometry, glycoproteomics, bioinformatics
Integrated proteomic and metabolomic analyses of intracellular Salmonella Typhimurium reveal severe iron starvation during infection of epithelial cells
Abstract: As a model bacterial pathogen, Salmonella can gain access to non-phagocytic cells, where it proliferates in a unique membrane-bound compartment. It has long been thought that various nutritional and environmental differences within host cells force bacterial pathogens to resculpt their proteome. Unfortunately, direct measurement of the in vivo bacterial proteome during infection has been technically challenging, though proteomic analyses of in vitro grown bacteria have been practiced for decades. To reveal pathogens’ adaptation mechanisms while residing in their intracellular niche, we describe here a strategy that enables the first comprehensive proteomic survey of Salmonella isolated from infected epithelial cells at distinct stages. Among ~2700 detected bacterial proteins, the abundance levels of >100 proteins were significantly altered at the onset of Salmonella intracellular replication. The most striking difference is the massive induction of bacterial iron-acquisition systems, suggesting severe iron limitation in the host. Consistent with this notion, metabolomic analysis of infected mammalian cells also reveals substantially elevated levels of iron-chelating small molecules (known as siderophores). To the best of our knowledge, we are the first to provide comprehensive and compelling evidence, directly at both the protein and metabolite levels, that S. Typhimurium encounters severe iron starvation during infection. More importantly, the integrated proteomic and metabolomic strategy described in this study can in principle be readily applied to investigate any intracellular bacterial pathogen, thereby allowing us to gain important insight into their infection biology within the host.
Key words: Salmonella proteomics, bacterial infection, iron starvation, siderophores
De novo protein sequencing by combining top-down and bottom-up tandem mass spectra
Abstract: There are two approaches for de novo protein sequencing: Edman degradation and mass spectrometry (MS). Existing MS-based methods characterize a novel protein by assembling tandem mass spectra of overlapping peptides generated from multiple proteolytic digestions of the protein. Because each tandem mass spectrum covers only a short peptide of the target protein, the key to high coverage protein sequencing is to find spectral pairs from overlapping peptides in order to assemble tandem mass spectra to long ones. However, overlapping regions of peptides may be too short to be confidently identified, making it challenging to achieve high coverage of protein sequences. In the past five years, high-resolution mass spectrometers have become accessible to many laboratories. These mass spectrometers are capable of analyzing molecules of large mass and boost the development of top-down MS. Top-down tandem mass spectra cover whole proteins, avoiding the spectral assembly problem of bottom-up tandem mass spectra. But top-down tandem mass spectra, even combined, rarely provide full ion fragmentation coverage of a protein. We propose an algorithm, TBNovo, for de novo protein sequencing by combining top-down and bottom-up MS. In TBNovo, a top-down tandem mass spectrum is utilized as a scaffold and bottom-up tandem mass spectra are aligned to the scaffold to increase sequence coverage. Experiments on real data sets showed that TBNovo achieved high sequence coverage and high sequence accuracy in de novo protein sequencing.
Key words: top-down mass spectrometry, de novo protein sequencing
Advancements in Tandem-MS based proteome quantification
Abstract: With the development of proteome research, quantitative proteomics is becoming a very active area of proteomics research. Its development requires the advancement of a number of mass spectrometry (MS)-based quantitative techniques. Tandem-MS based quantification, including MS2- and MS3-based quantification, has proven to be a popular quantitative proteomics tool and has been rapidly adopted to study a wide range of biological questions in the past few years. Tandem-MS based quantification usually employs chemical labels to generate isobaric peptides for quantification. Because the labeled peptides have isobaric masses and are pooled, precursor ion intensity increases and the complexity of the MS trace decreases compared with MS-based quantification methods. The quantitative ratio can then be determined upon peptide fragmentation in the MS/MS spectrum. According to the type of ion used for quantification, tandem-MS based quantification can be grouped into 1) reporter ion quantification and 2) peptide fragment ion quantification. Reporter ion quantification affords flexibility and multiplexing capacity, while peptide fragment ion quantification provides better accuracy and precision. Correspondingly, data analysis algorithms have been developed to decipher the quantitative information. Recent advancements in tandem-MS based quantification approaches are reviewed, and their advantages and disadvantages are compared. Trends in tandem-MS based quantification are also discussed.
Key words: quantitative proteome, tandem mass spectrometry
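Reporter ion quantification, the first of the two groups named above, amounts to reading the intensity near each reporter m/z in an MS/MS spectrum and forming ratios between channels. The sketch below shows that step only. The function names, the tolerance and the input format are assumptions, and the caller supplies the reporter m/z values for whatever labeling reagent is in use.

```python
def reporter_intensities(peaks, channels, tol=0.003):
    """Sum the peak intensity within +/- tol (m/z units) of each reporter ion.

    peaks: list of (mz, intensity) tuples from one MS/MS spectrum.
    channels: dict mapping a channel name to its reporter ion m/z.
    """
    return {
        name: sum(intensity for mz, intensity in peaks if abs(mz - target) <= tol)
        for name, target in channels.items()
    }


def channel_ratios(intensities, reference):
    """Express every channel relative to the chosen reference channel."""
    ref_value = intensities[reference]
    if ref_value <= 0:
        raise ValueError("reference channel has no signal")
    return {name: value / ref_value for name, value in intensities.items()}
```

Peptide fragment ion quantification follows the same pattern, except that the channel m/z values depend on the peptide sequence, so they have to be computed for each identified peptide rather than fixed in advance.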
Peptide Terminal Labeling Assisted Proteome Identification and Quantification
Abstract: Isobaric labeling based relative quantification is widely applied in proteomic studies because of the high accuracy that results from using peptide-specific paired fragment ions. However, the simultaneous fragmentation of differently labeled peptides increases the complexity of the MS2 spectrum, which is disadvantageous for the identification of peptides and proteins. In fact, the paired fragment ions appearing in the MS2 spectrum can be used to correctly identify the peptide fragment ions and reduce mismatches. Based on this principle, we developed a Paired Ion based Scoring Algorithm (PISA) to increase the number of identified and quantified proteins. In comparison with other database search based software, more PSMs, peptides and proteins (between 30% and 177% more) can be identified by the PISA algorithm. Moreover, the coverage of quantification reaches 100%, since each identified PSM contains several pairs of fragment ions. Furthermore, the concept of isobaric labeling assisted proteome identification is extended to de novo sequencing based proteome identification. Owing to the use of paired ions, high identification accuracy is obtained with an easily implemented de novo sequencing algorithm. Finally, isobaric labeling assisted proteome identification and quantification are applied to the analysis of differentially expressed proteins in human hepatocellular cancer cell lines with high and low metastatic rates. In addition to the differentially expressed proteins, isobaric labeling assisted de novo sequencing also identified several mutated peptides in the hepatocellular cancer cell lines with a high metastatic rate.
Key words: proteome identification, isobaric labeling, paired ions, de novo sequencing
Lysine glutarylation is a protein posttranslational modification regulated by SIRT5
Abstract: We report the identification and characterization of a five-carbon protein posttranslational modification (PTM) called lysine glutarylation (Kglu). This protein modification was detected by immunoblot and mass spectrometry (MS), and then comprehensively validated by chemical and biochemical methods. We demonstrated that the previously annotated deacetylase, sirtuin 5 (SIRT5), is a lysine deglutarylase. Proteome-wide analysis identified 683 Kglu sites in 191 proteins and showed that Kglu is highly enriched on metabolic enzymes and mitochondrial proteins. We validated carbamoyl phosphate synthase 1 (CPS1), the rate-limiting enzyme in urea cycle, as a glutarylated protein and demonstrated that CPS1 is targeted by SIRT5 for deglutarylation. We further showed that glutarylation suppresses CPS1 enzymatic activity in cell lines, mice, and a model of glutaric acidemia type I disease, the last of which has elevated glutaric acid and glutaryl-CoA. This study expands the landscape of lysine acyl modifications and increases our understanding of the deacylase SIRT5.
Key words: protein posttranslational modification (PTM), lysine glutarylation, SIRT5, CPS1
Characterization of glycan microheterogeneity in glycoproteins using liquid chromatography coupled tandem mass spectrometry
Abstract: Glycosylation is the most prevalent posttranslational modification of proteins in mammalian cells, and aberrant glycosylation has been recognized as a hallmark of many mammalian diseases, such as infectious diseases and cancer. In recent years, liquid chromatography coupled tandem mass spectrometry (LC-MS/MS), in what is known as the glycoproteomic approach, has become the dominant technology for studying alterations of glycan structures on cell surface glycoproteins owing to its high throughput and sensitivity. However, its systematic application to the analysis of clinical samples (e.g., blood or tissue samples from cancer patients) has been hindered by the lack of a suite of software tools for automated analysis of massive MS data, analogous to the commonly used tools developed in the proteomics community. In this talk, I will introduce the bioinformatics challenges in glycoproteomics, with a focus on the identification and quantification of intact glycopeptides using LC-MS/MS and different fragmentation methods, including collision-induced dissociation (CID), high-energy collisional dissociation (HCD) and electron-transfer dissociation (ETD). Specifically, I will present GlycoFragwork, a software package developed in my group to address these challenges, and its application to glycopeptide identification in complex glycoproteomes (e.g., serum samples from cancer patients) using conventional proteomics protocols without glycopeptide enrichment. I will also discuss a novel statistical model that can characterize changes in the relative abundance of different glycoforms, i.e., glycopeptides with different glycans attached to the same glycosylation site in a glycoprotein, across multiple complex samples. We applied these methods to an esophageal cancer study based on blood serum samples in an attempt to detect potential biomarkers of site-specific N-glycosylation.
We found that a few glycoproteins, including Hemopexin and Vitronectin, showed significantly different abundances at site-specific levels between cancer and control samples, indicating that our method is ready to be used for the discovery of biomarkers of site-specific glycosylation.
Key words: site-specific glycosylation, glycan microheterogeneity, liquid chromatography coupled tandem mass spectrometry, bioinformatics
Iterative Methods in Large Field Electron Microscope Tomography
Abstract: Electron tomography (ET) is a powerful technology allowing the three-dimensional imaging of cellular ultrastructure. These structures are reconstructed from a set of micrographs taken at different sample orientations, the final volume being the solution of a general inverse problem. Two different approaches are used in this context: iterative methods and filtered backprojection. Iterative methods are known to provide high-resolution three-dimensional (3D) reconstructions for ET under noisy and incomplete data conditions. However, all previous implementations have been restricted to the straight-line optics assumed for X-rays, and biological samples may warp as a result of being exposed to an electron beam. Compensation for curvilinear trajectories, nonlinear electron optics, and sample warping constitutes a major advance in large field electron tomography and has made possible resolution down to the molecular level in reconstructions of whole cells. At present these advances are limited to filtered backprojection. As the next step in this development, we have modified the ASART method in conjunction with a 3D model. By employing alignment based on general curvilinear trajectories, we have been able to show that further improvements can be achieved with iterative methods.
Key words: Electron tomography, Three-dimensional reconstruction, Iterative methods, Nonlinear projection model, ASART
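To make the term "iterative methods" concrete, the sketch below implements the classical algebraic reconstruction technique (Kaczmarz iteration) for a small linear system in which each equation is one straight-line projection ray. It is a textbook baseline under the straight-line assumption, not the modified ASART method or the curvilinear projection model of the abstract; the function name and parameters are hypothetical.

```python
def kaczmarz(rows, measurements, n_unknowns, sweeps=200, relax=1.0):
    """Solve A x = b approximately by cyclic row projections (classical ART).

    rows: list of lists, each holding the weights of one projection ray.
    measurements: the measured ray sums, one per row.
    relax: relaxation factor, with 0 < relax < 2 for convergence.
    """
    x = [0.0] * n_unknowns
    for _ in range(sweeps):
        for weights, measured in zip(rows, measurements):
            norm = sum(w * w for w in weights)
            if norm == 0:
                continue  # this ray crosses no voxel
            predicted = sum(w * v for w, v in zip(weights, x))
            step = relax * (measured - predicted) / norm
            # Move the estimate toward the hyperplane defined by this ray.
            x = [v + step * w for v, w in zip(x, weights)]
    return x
```

In a nonlinear projection model the rays would follow curved trajectories, so the weights in each row would come from tracing those trajectories through the volume instead of from straight lines.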
Systematically Ranking the Tightness of Membrane Association for Peripheral Membrane Proteins
Abstract: Large-scale quantitative evaluation of the tightness of membrane association for non-transmembrane proteins is important for identifying true peripheral membrane proteins with functional significance. Herein, we simultaneously ranked more than 1,000 proteins of the photosynthetic model organism Synechocystis sp. PCC6803 by their relative tightness of membrane association using a proteomic approach. Using multiple precisely ranked and experimentally verified peripheral subunits of photosynthetic protein complexes as landmarks, we found that proteins involved in two-component signal transduction systems and transporters are overall tightly associated with the membranes, whereas the associations of ribosomal proteins are much weaker. Moreover, we found that hypothetical proteins containing the same domains generally have similar tightness. This work provides a global view of the structural organization of the membrane proteome with respect to divergent functions, and lays the foundation for future investigation of dynamic membrane proteome reorganization in response to different environmental or internal stimuli.
Key words: membrane, peripheral membrane proteins, proteomics, Synechocystis, tightness of membrane association
Preliminary Application of Mass Spectrometry on the Study of Conotoxins
Abstract: Conotoxins are peptide neurotoxins produced by cone snails for predation and defense. Their specific and potent activities have attracted much research interest, which has led to the successful development of the analgesic Ziconotide. Yet the structural diversity and high disulfide bond content of conotoxins are two major challenges for their study, in addition to the fact that the physiological targets of most conotoxins are unknown. Recently, by combining cDNA library and LC-MS/MS analysis, we performed a venomic study of Conus flavidus and revealed various diversifications of conotoxins, including alternative post-translational modifications and cleavages. The alphaD-conotoxin GeXXA, which contains 10 disulfide bonds, serves as an example of high disulfide bond content. We demonstrated that GeXXA antagonizes the nicotinic acetylcholine receptor (nAChR) through a novel mechanism by binding at a novel site. During the study of its structure and functional mechanism, classical peptide chemistry analysis had to be applied. Therefore, in this talk, by presenting our recent work, I will introduce the particular difficulties of conotoxin research, in the hope of encouraging further application of mass spectrometry in this field.
Key words: conotoxin, disulfide bond, mass spectrometry
The Influence of temperature on the virulence of Shigella flexneri
Abstract: Shigella flexneri, which is closely related to Escherichia coli, is the most common cause of the endemic form of shigellosis. The expression of its virulence genes is induced under growth conditions similar to those found at the site of invasion. For example, a temperature of 37 °C is a particularly important environmental signal. Bacteria grown at 30 °C were phenotypically avirulent and non-invasive. To upgrade the mechanism of the viulence of Shigella flexneri, the protein expression profiles of cells grown at 30 and 37 °C were thoroughly analyzed using multiple overlapping narrow pH range (between pH 4.0 and 11.0) two-dimensional gel electrophoresis. A total of 723 spots representing 574 protein entries were identified by MALDI-TOF/TOF MS, including the majority of known key virulence factors. A comparison between the two proteome maps showed that most of the virulence related proteins were up-regulated at 37 °C. A further significant finding was that the expression of the protein ArgT was dramatically up-regulated at 30 °C. The results of semiquantitative RT-PCR analysis showed that expression of argT was not regulated at the transcriptional level. Therefore, we carried out a series of experiments to uncover the mechanism regulating ArgT levels and found that the differential expression of ArgT was due to its degradation by a periplasmic protease, HtrA, whose activity, but not its synthesis, was affected by temperature. The cleavage site in ArgT was between position 160 (Val) and position 161 (Ala). In contrast, the ArgT from the nonpathogenic E. coli did not show this differential expression as in S. flexneri, which suggested that argT might be a potential anti-virulence gene. Competitive invasion assays in HeLa cells and in BALB/c mice with argT mutants were performed, and the results indicated that the over-expression of ArgTY225D would attenuate the virulence of S. flexneri. 
A comparative proteomic analysis was subsequently performed to investigate the effects of ArgT in S. flexneri at the molecular level. We show that HtrA is differentially expressed among different derivative strains. In addition, interactions with other proteins are vitally important for the majority of proteins to carry out their biological functions in living cells. Therefore, the abundance of the protein complexes in Shigella flexneri was compared following growth at 37 or 30 °C, and the abundance of three protein complexes (PyrB-PyrI, GlmS, and MglB) related to the synthesis of lipopolysaccharides (LPS) appeared to be temperature-dependent. Many studies have shown that LPS is essential to the virulence of S. flexneri. Here, we report the influence of temperature on the amount of LPS. These results may provide useful insights for understanding the physiology and pathogenesis of S. flexneri.
Key words: Shigella flexneri, proteome, protein complexes, relative abundance, ArgT
Network-based analysis of mouse testicular phosphoproteome
Abstract: Recent progress in phosphoproteomics has identified more than 400,000 phosphorylation sites, which are fundamental for understanding the regulatory mechanisms of phosphorylation. However, how to efficiently retrieve useful information from this flood of data is still a great challenge. Here we present an integrative study of phosphorylation events in mouse testis. Large-scale phosphoproteome profiling in the adult mouse testis identified 17,829 phosphorylation sites in 3,955 phosphoproteins. Although only approximately half of the phosphorylation sites enriched by IMAC were also captured by TiO2, the phosphoprotein datasets identified by both methods were significantly enriched in the functional annotation of spermatogenesis. Thus, the phosphoproteome profiled in this study is a highly useful snapshot of the phosphorylation events in spermatogenesis. To further understand phosphoregulation in the testis, the site-specific kinase-substrate relations (ssKSRs) were computationally predicted for reconstructing kinase-substrate phosphorylation networks (KSPNs). Network-based analyses demonstrated that a number of protein kinases such as MAPKs, CDK2 and CDC2 with statistically more ssKSRs might have significantly higher activities and play an essential role in spermatogenesis, and the predictions were consistent with previous studies on the regulatory role of these kinases. In particular, the analyses proposed that the activities of POLO-like kinases (PLKs) might be dramatically higher, and this prediction was experimentally validated by detecting the phosphorylation level of pT210, an indicator of PLK1 activation, in testis and other tissues. Further experiments showed that the inhibition of PLKs could decrease cell proliferation by inducing G2/M cell cycle arrest. Taken together, this systematic study provides a global landscape of phosphoregulation in the testis, and should prove to be of value in future studies of spermatogenesis.
Key words: phosphorylation, phosphoproteome, spermatogenesis, PLK, testis
Glycoproteomics analysis and data processing
Abstract: The detailed characterization of site-specific glycosylation requires the identification of glycan composition and specific attachment sites on proteins, which in turn requires the identification of intact glycopeptides by mass spectrometry. We present an analytical and computational strategy for the high-throughput characterization of intact N-glycopeptides derived from complex proteome samples. N-glycopeptides were identified using the spectra acquired for intact glycopeptides as well as deglycosylated peptides. The strong correlation between the retention times of intact glycopeptides and their deglycosylated forms effectively filtered out random matches. The fully automated software platform integrates all of the above processes involved in the identification of the intact N-glycopeptides. This platform was applied to the detailed characterization of site-specific glycosylation in HEK 293T cells, which led to the identification of 2249 unique intact N-glycopeptides. These results form by far the largest dataset of glycosylation from mammalian samples and provide a unique resource for studying the influence of glycosylation on protein function. Enrichment of glycopeptides by hydrazide chemistry (HC) is a popular method for glycoproteomics analysis. However, possible side reactions of peptide backbones during the glycan oxidation have not been comprehensively studied. Here, we developed a proteomics approach to locate such side reactions and successfully identified several types of side reactions that could seriously compromise the performance of glycoproteomics analysis.
Key words: protein glycosylation, intact glycopeptide, glycoproteomics
Site-specific glycoform assignment through glycoproteomics approaches
Abstract: The factors affecting the heterogeneity of glycosylation and the role of different glycoforms in harmonizing the function of glycoproteins are important questions for glycobiology. To explore these questions, a large amount of site-specific glycosylation information is needed. The development of high-throughput glycoproteomics approaches has helped to unravel the distribution of N-glycosylation sites on a large scale. However, the site-specific characterization of N-glycoforms at the -omic scale is still a great challenge. Here, we present a strategy for N-glycoproteome research based on inhibitors of N-glycan biosynthesis, which can discriminate the glycosites carrying complex-type glycans from those carrying high-mannose and hybrid types. Two inhibitors, swainsonine and 1-deoxymannojirimycin, were used to interfere with the N-glycan biosynthesis process, and hybrid and/or high-mannose type N-glycans accumulated as a result of the inhibition. The lectin ConA was applied to enrich the glycopeptides before and after the inhibition. By comparing the glycosites identified before and after inhibition through label-free quantitation, glycosites carrying complex-type N-glycans could be picked out and identified. In total, 2498 unique N-glycosites from 898 proteins were identified from HepG2 cells and the N-glycosylation type on each site was analyzed; 803 sites were found to bear complex-type glycans. Many CD molecules bearing different types of glycoforms were revealed, which may dictate the different functions of these CD molecules. The site-specific glycosylation information was used to analyze the structural characteristics, location and function of different sites and glycoproteins.
Key words: glycoproteomics, swainsonine, 1-deoxymannojirimycin, glycoform, lectin, mass spectrometry
Exploring the Origins of Proteins by ATP Selection in a Random Peptide Library Consisting of Reduced Amino Acids
Abstract: Following the successful prebiotic synthesis of amino acids, the next challenge in the study of the origins of life is to elucidate the mechanisms underlying the origins of proteins. By synthesizing a large number of theoretical findings on protein evolution, we proposed a ligand-selection model for protein origins, which states that the most ancient proteins (which used the c.37 fold) originated from ATP selection in a pool of random peptides. Ten years ago, Szostak and co-workers performed an mRNA display-based ATP-selection experiment and obtained four families of ATP-binding proteins from a large collection of random sequences. Although the 3D structure of one protein resembles the c.37 fold, its sequence is totally different from those of c.37 proteins. Because in the primordial world some pre-biosynthetic amino acids were more available than others, a random peptide library with a reduced amino acid alphabet may be more suitable for exploring the origins of proteins. Using the cDNA display technique, we constructed a random peptide library consisting of 15 kinds of amino acids, and then used ATP to perform six rounds of in vitro selection. The most prevalent sequence was determined by next-generation sequencing. Intriguingly, this sequence most closely resembles the c.37.1 fold superfamily and has ATP-binding and ATP-hydrolysis activities, which provides direct evidence to support the scenario that the primordial proteins originated through ATP selection from a random peptide pool.
Key words: origins of proteins, cDNA display, in vitro selection, next-generation sequencing
Mass Spectrometric Quantification of Interactions between Proteins and Drugs
Abstract: Studies of interactions between proteins and drugs are important for understanding drug ADME (Absorption, Distribution, Metabolism and Excretion) processes. Conventional crystallographic techniques are able to show structures of protein-drug complexes in the solid state, but the three-dimensional structures of proteins in aqueous solution and their contribution to binding with different ligands remain largely unknown. We describe here a mass spectrometry-based technique that can quantitatively determine the intrinsic binding affinity, the variation in affinity induced by exogenous chemicals and the topological dependence of proteins.
Key words: Mass Spectrometric Quantification, Protein-Drug Interaction
Technology development for membrane protein analysis and its applications
Abstract: Membrane proteins play an important role in a variety of cellular functions, ranging from signal transduction and subcellular compartmentalization to membrane trafficking and protein secretion, in addition to their function in providing and maintaining the structural integrity of membranes. Not surprisingly, membrane-associated proteins account for nearly 60% of pharmaceutical drug targets. Efficient analysis of membrane proteins and correctly linking their structure and function will help to improve the success rate of drug design. Unfortunately, membrane proteins are often underrepresented in proteomic experiments due to their low abundance and their hydrophobicity. Membrane protein analysis therefore represents a major technological challenge in proteomics and analytical chemistry, and technologies dedicated to the study of the membrane proteome are clearly needed. We recently developed a set of technologies, based on strong cation exchange material and pH gradient elution, that facilitate membrane protein analysis. This new approach will be a promising tool for the discovery of membrane protein drug targets and for biomarker discovery in clinical samples.
Key words: membrane protein, strong cation exchange, pH gradient
University of Michigan
Preliminary Lecture 1: Combined transcriptome and proteome analysis: methods and applications
Time: 10:00 am - 12:00 pm, Tuesday, April 22, 2014
Location: 1-312, FIT Building, Tsinghua University
Host: Dr. Ting Chen
Abstract:
There is an increasing interest in combined analysis of transcriptome and proteome data taking advantage of new technologies such as next-generation transcriptome sequencing (RNA-Seq) and highly sensitive mass spectrometry instrumentation. In this presentation, we will start by discussing the computational approaches for label-free protein quantification and integration of protein and transcript abundance data, including the difficulty of linking protein and transcript level information in the presence of transcript/protein isoforms. We will then present the results of a joint analysis of transcriptome and proteome data using two human prostate cancer cell lines, VCaP and RWPE. Our analysis will focus on the relationship between transcripts and proteins independently in each of the cell lines, as well as on the differences in this relationship (concordance or discordance) across the cell lines as a way to gain a better understanding of changes at the pathway level related to the VCaP prostate cancer model. We will demonstrate how RNA-Seq data can be used for improved protein identification in shotgun proteomics, and will discuss emerging proteogenomics applications focusing on the validation of predicted gene models using mass spectrometry data.
Preliminary Lecture 2: Computational tools for untargeted proteomics using data independent acquisition mass spectrometry
Time: 10:30 am - 12:00 pm, Wednesday, April 23, 2014
Location: Room 446, Institute of Computing Technology, Chinese Academy of Sciences
Host: Dr. Si-Min He
Abstract:
Improvements in the scanning rates and the accuracy of mass measurement achieved in the latest generation of mass spectrometers (MS) have enabled several practical implementations of the so-called data independent acquisition (DIA) strategy. Recent examples of DIA approaches include SWATH-MS, which uses a fast scanning Q-TOF instrument to systematically fragment the entire useful mass range in increments of a few tens of Daltons. We start by reviewing the DIA and data dependent acquisition (DDA) MS strategies, and their application in proteomics research. We will then focus on the computational strategies for the analysis of DIA data. At present, DIA data is analyzed in a targeted fashion that requires pre-existing spectral libraries (generated using conventional DDA data). We demonstrate that SWATH-MS can also be employed for identification of peptides in an untargeted workflow that does not rely on spectral libraries and prior DDA data acquisition. We describe an open source software tool that extracts signal features from SWATH MS1 and MS2 data and assembles them into pseudo-MS/MS spectra that are fully compatible with conventional database search engines and error rate estimation approaches. We show that the method efficiently and reproducibly identifies peptides and proteins in SWATH-MS data from samples of low and high complexity. We discuss the advantages of the untargeted strategy, but also its drawbacks observed in the case of low abundance peptides in samples with a large dynamic range of protein abundances. We also discuss the differences between our untargeted approach and the existing targeted data extraction approaches, and propose a “semi-targeted” strategy for more efficient analysis of DIA data. The computational method and the software are not restricted to SWATH-MS data, and will be demonstrated using DIA data generated on a Q Exactive Plus and using Waters MSE data.
Preliminary Lecture 3: Reconstruction of protein interaction networks using affinity purification mass spectrometry technology
Time: 2:00 pm - 3:30 pm, Wednesday, April 23, 2014
Location: Room 446, Institute of Computing Technology, Chinese Academy of Sciences
Host: Dr. Si-Min He
Abstract:
Affinity purification followed by mass spectrometry (AP-MS) has become a commonly used method for the identification of protein-protein interactions and protein complexes. We will start with a review of the most commonly used experimental AP-MS workflows, with an emphasis on data analysis challenges typically encountered in such studies. We will review computational and informatics strategies for detecting specific protein interaction partners in AP-MS experiments, and will contrast computational methods developed for genome-wide interactome mapping studies with those applicable to more frequently generated small to intermediate-scale datasets. We will discuss the current state of computational tools such as SAINT and CRAPome (www.crapome.org) that were developed in our lab. We will also discuss related issues such as combining multiple biological or technical replicates, dealing with data generated using different tagging strategies, and integration of AP-MS data with structure-based protein interaction predictions. Finally, we will discuss the use of label-free quantification in clustering of AP-MS protein interaction data for improved reconstruction of protein complexes, as well as for detection of quantitative changes in the composition of protein complexes as a function of the cell state.
About Alexey Nesvizhskii
Dr. Alexey Nesvizhskii is a tenured Associate Professor in the Departments of Computational Medicine & Bioinformatics and Pathology at the University of Michigan, Ann Arbor. He received his M.S. degree (with honors) from St. Petersburg State Technical University, Russia in 1995 and Ph.D. degree in Physics from the University of Washington in 2001.
Dr. Nesvizhskii's research laboratory (www.nesvilab.org) works in the areas of bioinformatics, proteomics, and systems biology. He has published more than 100 manuscripts in international scientific journals, including first or senior author publications in such leading journals as Science, Nature Methods, Molecular Systems Biology, and Nature Communications. His work has been cited more than 11,600 times, with an h-index of 41 (Google Scholar; April 2014). In 2007, he was named a "Rising Young Investigator" by Genome Technology magazine (USA).
Dr. Nesvizhskii serves as Senior Editor in the area of bioinformatics and biostatistics for the international journals Proteomics and Proteomics-Clinical Applications, as Section Editor in the area of proteomics for BMC Bioinformatics, and on the Editorial Board of Molecular and Cellular Proteomics.