Blood Protein - Methods summary
Summary
The Blood Proteins section contains information on the proteins present in blood. Externally generated and in-house data are integrated to explore human plasma protein profiles in healthy individuals. Plasma protein levels are presented, based on both antibody-based immunoassays and mass spectrometry-based proteomics.
Key publications
Uhlén M et al. (2019) “The human secretome” Sci Signal
What can you learn from the Blood Protein section?
Learn about
Data overview
How has the data been generated?
The plasma proteome levels of healthy individuals were measured using proximity extension assay (PEA). The healthy individuals were followed longitudinally for two years, and plasma proteome levels were measured every three months. Data generated by PEA were normalized within and between plates, transformed using a predetermined correction factor, and reported in the arbitrary unit Normalized Protein eXpression (NPX). To analyze the longitudinal healthy datasets while accounting for both within-subject and inter-individual variability, linear mixed-effects models are applied. The models include random intercepts for individual subjects and fixed effects for age and sex. Proteins with more than 80% of samples below the limit of detection are excluded.
The Blood Atlas also contains information on proteins detected by mass spectrometry-based proteomics, based on publicly available data in the Peptide Atlas. The mass spectrometry-based data were filtered to include only the minimal, non-redundant list of proteins derived from the set of identified peptides and to exclude entries labelled as contaminants. In addition, the concentration of actively secreted proteins was annotated using publicly available literature.
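The limit-of-detection exclusion rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used for the atlas: the function names, the dictionary layout (protein name mapped to a list of NPX values) and the per-protein LOD lookup are all assumptions made for the example.

```python
from typing import Dict, List


def fraction_below_lod(npx_values: List[float], lod: float) -> float:
    """Return the fraction of samples whose NPX value is below the LOD."""
    if not npx_values:
        raise ValueError("no samples supplied")
    below = sum(1 for value in npx_values if value < lod)
    return below / len(npx_values)


def filter_proteins(
    npx: Dict[str, List[float]],
    lod: Dict[str, float],
    max_fraction_below: float = 0.8,  # threshold stated in the text
) -> Dict[str, List[float]]:
    """Keep proteins unless more than max_fraction_below of samples are below LOD.

    The data layout (protein -> NPX values, protein -> LOD) is hypothetical.
    """
    kept = {}
    for protein, values in npx.items():
        if fraction_below_lod(values, lod[protein]) <= max_fraction_below:
            kept[protein] = values
    return kept


# Toy data with invented protein names and NPX values.
example_npx = {
    "PROT_A": [3.1, 2.8, 0.2, 3.5, 2.9],          # 1 of 5 samples below LOD
    "PROT_B": [0.1, 0.2, 0.1, 0.3, 0.2],          # 5 of 5 samples below LOD
}
example_lod = {"PROT_A": 1.0, "PROT_B": 1.0}
kept_proteins = filter_proteins(example_npx, example_lod)
```

In this toy input, PROT_A is retained and PROT_B is dropped, because every PROT_B sample falls below its assumed LOD.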
What is presented in the section?
In the gene summary page of the Blood Proteins section, the protein levels from the PEA-based assays are shown. The longitudinal variation in the expression of a protein in plasma from healthy individuals is displayed in two line plots separated according to gender.
In addition, the gene summary page includes the blood concentrations from mass spectrometry studies when available in the Peptide Atlas. The predicted concentration in blood (plasma or serum) is shown.
Furthermore, the blood concentration for the proteins annotated to be secreted to blood is shown (when available). The reported concentrations in blood are shown for some representative studies with reference to the literature.
Human Disease Blood Atlas - Method Summary
Summary
A comprehensive characterization of the blood proteome profiles in patients with various diseases can contribute to a better understanding of disease etiology, resulting in earlier diagnosis, risk stratification and better monitoring of disease progression. Connecting the dynamics of the plasma proteome to functionality across conditions could provide a window into disease biology and mechanisms and broaden the horizon for new treatments. Precision Medicine thus aims to allow for individualized diagnosis, treatment and monitoring of patients, including the use of molecular tools such as genomics, proteomics and metabolomics. Technologies such as Proximity Extension Assay and Targeted Mass Spectrometry are well suited to this purpose. In the first version of the Human Disease Blood Atlas, a pan-cancer study covering 12 major cancer types was reported. In the current version, protein profiles for 59 diseases are presented.
Key publications
Álvez MB et al. (2023) "Next generation pan-cancer blood proteome profiling using proximity extension assay" Nat Commun 14, 4308.
Kotol D et al. (2023) "Absolute quantification of pan-cancer plasma proteomes reveals unique signature in multiple myeloma" Cancers 15(19), 4764.
What can you learn from the Disease Blood Atlas?
Learn about
Data overview
How was the Proximity Extension Assay data generated?
Next Generation Blood Profiling was performed by combining antibody-based proximity extension assay with next generation sequencing (Wik L et al. (2021)). This method enables multiplexed exploration of protein concentrations in blood from patients with different diseases. Plasma profiles of 1165 proteins from more than 6000 patients, representing 59 diseases in total (Figure 1), were measured in minute amounts of blood plasma collected at the time of diagnosis and before treatment. The diseases in this study belong to different classes, including cardiovascular, metabolic, cancer, psychiatric, autoimmune, infectious, and pediatric diseases.
Figure 1. Overview of pan-disease blood proteome profiling study.
Differential abundance analyses
To investigate disease-specific proteome profiles, differential abundance analyses were conducted with the following comparisons:
The models were generated using the limma R package (Ritchie ME et al. (2015)), with the following model covariates:
This approach ensures that our analyses account for relevant biological variables while addressing specific issues related to data correlations and sample characteristics. Additionally, control samples were matched to the number of cases based on sex and age to ensure a balanced comparison and reduce potential biases in the analysis. The up- and down-regulated proteins in each disease are summarized in the volcano plots displayed in the sections for the different diseases, which highlight the most significantly differentially expressed proteins. The results for all diseased patients for each protein target are presented on the individual gene pages.
Machine learning analyses
Additionally, a machine learning approach was applied to investigate the plasma proteome in diseases with a sufficient sample size. Regularized logistic regression (lasso) classification models were developed in three settings:
For all comparisons, the data were split into 70% training and 30% testing sets. Models were trained on 100 splits to account for variability in data partitioning. In the Human Disease Blood Atlas, we report the top-ranking features for classification of each disease across all settings, including the average importance score and standard deviation, calculated from the absolute model estimates and normalized to a 1-100 scale.
How was the Targeted Proteomics data generated?
Targeted proteomics is a bottom-up approach where proteases, most commonly trypsin, are used to digest proteins into peptides that can be measured by liquid chromatography-tandem mass spectrometry (LC-MS/MS). This strategy is an excellent tool for performing measurements with high reproducibility and precision, making it appropriate for quantifying proteins in cells, tissues, and blood. Targeted proteomics, as opposed to the widely used data-dependent acquisition (DDA), also known as shotgun proteomics, works with a defined collection of peptides and builds on prior knowledge about the analytes.
Generally, peptide quantification can be either relative or absolute. Relative quantification describes the amount of an analyte in proportion to another measurement of the same analyte across several biological samples or across two groups, as in case-control studies. Absolute concentrations, on the other hand, can be obtained by spiking samples with known amounts of heavy-labelled standards during the sample preparation workflow. Using isotope-labeled peptides or protein standards can also considerably increase consistency and precision, and it can be done at a large scale.
A quantitative strategy based on heavy isotope-labeled PrESTs was originally developed as a collaborative effort between Professor Matthias Mann and Professor Mathias Uhlén (Zeiler M et al. (2012)). They introduced the multiplex PrEST-SILAC quantitative approach.
This quantitative workflow was based on shotgun proteomics and had the benefit of being relatively simple to execute and straightforward to work with. Stable isotope-labeled (SIS) PrESTs, combined with a mass spectrometry readout, can be used in almost any MS setup and analysis mode, including both targeted (SRM, MRM, PRM, DIA) and untargeted (DDA) modes of operation. The standards are added to the sample at the initial stage of the proteomics workflow and can therefore account for potential digestion biases, as they generate the same proteotypic peptides (Figure 3) and mimic the exact amino acid repertoire of the endogenous protein. Digestion bias, which arises when protein standards are not cleaved together with the endogenous proteins in the sample, is a common source of error that affects almost every LC-MS/MS sample preparation workflow and can be very hard to control for.
Figure 3. The standard's N-terminal sequence enables affinity purification and measurement. The C-terminal portion contains 50–150 human amino acids. Each standard contains numerous tryptic peptides that can be used to measure an unknown sample's target protein.
Each SIS-PrEST standard is fully labeled with 13C and 15N enriched arginine and lysine, and the protein sequence used for quantification spans a shorter amino acid sequence (50–150 aa) representative of the target protein of interest (Figure 4). In the Disease Atlas, 273 SIS-PrESTs were spiked in known concentrations directly into undepleted human blood plasma from 1,469 cancer patients. The spiked amounts were tuned to be as close to a 1:1 ratio with the endogenous proteins as possible. This increases the analytical precision of the one-point calibration-based quantification of the endogenous proteins. The quantitative peptides were selected using the lowest coefficient of variation and highest frequency of detection as selection criteria, and the single best-performing peptide per protein was used.
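The one-point calibration and peptide selection described above can be illustrated with a short Python sketch. It rests on stated assumptions rather than the published workflow: the function names and input layout are hypothetical, the endogenous concentration is taken as the light-to-heavy signal ratio multiplied by the known spiked concentration, and the two selection criteria are combined by ranking on detection frequency first and coefficient of variation second, which is only one possible way to combine them.

```python
from statistics import mean, stdev
from typing import Dict, List, Optional


def absolute_concentration(
    light_signal: float, heavy_signal: float, spiked_concentration: float
) -> float:
    """One-point calibration: (endogenous / standard signal) * spiked amount."""
    if heavy_signal <= 0:
        raise ValueError("heavy standard signal must be positive")
    return light_signal / heavy_signal * spiked_concentration


def coefficient_of_variation(values: List[float]) -> float:
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


def pick_quantitative_peptide(ratios: Dict[str, List[Optional[float]]]) -> str:
    """Pick one peptide per protein from replicate light/heavy ratios.

    None marks a replicate in which the peptide was not detected.
    Assumed ranking: highest detection frequency, then lowest CV.
    """
    def score(peptide: str):
        observed = [r for r in ratios[peptide] if r is not None]
        detection_frequency = len(observed) / len(ratios[peptide])
        cv = coefficient_of_variation(observed) if len(observed) > 1 else float("inf")
        return (-detection_frequency, cv)

    return min(ratios, key=score)


# Toy values: signals and the spiked concentration are invented for illustration.
concentration = absolute_concentration(
    light_signal=2.0, heavy_signal=4.0, spiked_concentration=10.0
)
best = pick_quantitative_peptide(
    {
        "PEPTIDE_1": [1.0, 1.1, 0.9, 1.0],   # detected in all replicates
        "PEPTIDE_2": [1.0, 2.0, 0.5, None],  # one missed detection
    }
)
```

The calculated concentration carries the same unit as the spiked concentration. Keeping the light-to-heavy ratio close to 1:1, as the text notes, keeps the measurement near the single calibration point.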
Figure 4. Targeted Proteomics workflow using SIS-PrESTs. Production of Standards: PrESTs from the Human Protein Atlas are labeled in high-throughput with heavy Arginine (Arg10) and Lysine (Lys8) amino acid residues. Each PrEST fragment can be individually quantified by the common Q-Tag sequence (also used for purification). Assay Generation: Heavy peptides originating from the PrEST sequence are used to establish targeted assays. The quantitative range is defined, and the protein level in healthy plasma is determined in a pool of healthy volunteers. Targeted Proteomics: SIS-PrESTs are spiked directly into non-depleted human plasma collected from cancer patients and act as internal standards throughout the workflow. Quantitative Mass Spectrometry: Endogenous peptides from each patient are measured together with the spiked internal standard. The known amount of spiked standard is used to calculate the absolute concentration of each protein analyte.
What is presented in the section?
The protein levels for all cancer patients for each protein target, together with information on whether the target is upregulated in any of the diseases and/or is included in any disease prediction model, are presented on the individual gene summary pages in the Human Protein Atlas.