Cluster and dataset comparison

Inclusion Criteria for Data in the Single Cell Type Resource
The scRNA-seq datasets were retrieved from published studies based on healthy human tissues. We performed a meta-analysis of the scRNA-seq literature and searched single cell databases, including the Single Cell Expression Atlas (https://www.ebi.ac.uk/gxa/sc/home), the Human Cell Atlas (https://www.humancellatlas.org), the Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo/), the Tabula Sapiens (https://tabula-sapiens-portal.ds.czbiohub.org/), the Allen Brain Atlas (https://portal.brain-map.org/) and the European Genome-phenome Archive (https://www.ebi.ac.uk/ega/). To avoid technical bias and to ensure that the single cell datasets best represent the corresponding tissues, we applied the following criteria for data selection: (1) single cell RNA sequencing was performed on a single cell suspension from tissues without pre-enrichment of cell types; (2) the dataset included >3,000 cells and 20 million read counts; (3) pseudo-bulk gene expression profiles were highly correlated with bulk RNA-seq profiles. In total, datasets from 30 tissue types and human blood were included. The samples, their references, and cluster details are listed here.

Tabula Sapiens
The Tabula Sapiens project (Tabula Sapiens Consortium* et al. (2022)) includes nearly 500,000 cells from 24 different tissues and organs. The data is publicly available (https://tabula-sapiens.sf.czbiohub.org/) and included in the CZ CellxGene tool, where separate cell types can be explored across tissues, here exemplified by epithelial cells across the different tissue samples. Currently, 6 of the tissues represented in the aggregated HPA single cell type data, which is used for cell type classification, are imported from the Tabula Sapiens (lung, prostate, salivary gland, thymus, tongue and vascular).
Additionally, another 13 tissues are now included in the HPA Single Cell Type resource for comparison and validation of cell cluster expression profiles. For the 6 tissues already represented by Tabula Sapiens, the original clustering is added to the gene detail pages for easy comparison of the cluster expression overview.

Tabula Sapiens clustering with HPA clustering of the same data
For the 6 tissue types (lung, prostate, salivary gland, thymus, tongue and vascular) represented by Tabula Sapiens data in the HPA aggregated cell type expression profile, the addition of Tabula Sapiens' own clustering details enables comparison and verification of the robustness of the HPA pipeline.

Tabula Sapiens data but with the HPA pipeline and clustering methods.
Tabula Sapiens data with Tabula Sapiens clustering details.

Lung
DNAI2 is a protein enriched in ciliated cells and highly expressed in the cilia cluster of the lung sample. PDGFRA is a protein enriched in fibroblasts.

Prostate
KLK3 is a protein enriched in prostatic glandular cells. CNN1 is a protein enriched in smooth muscle cells.

Salivary gland
MACC1 is a protein enriched in mucus glandular cells of the salivary gland. LPO is a protein enriched in serous glandular cells of the salivary gland.

Thymus
THEMIS is a protein enriched in T-cells.

Tongue
KRT5 is a protein enriched in basal keratinocytes.

Vascular
SELE is a protein enriched in endothelial cells and specifically detected in the endothelial cell clusters.

Tabula Sapiens comparison with non-Tabula Sapiens
For the tissues represented by non-Tabula Sapiens data, the addition of Tabula Sapiens clustering data provides a second dataset for comparison and validation of results. Here, we show examples of the expression overview in each of the tissues that are represented by a non-Tabula Sapiens dataset and compare the cell type expression profile with the Tabula Sapiens results. The comparison for these tissues is available for each protein-coding gene at the gene detail page.
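
As a rough illustration of what comparing cell type expression profiles between two datasets can look like, the following minimal Python sketch averages the expression of one gene per cell type label in each dataset and then lists the shared and dataset-specific labels. This is not the HPA or Tabula Sapiens pipeline: the function names, the cell type labels and all expression values are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from statistics import mean


def mean_expression_by_label(cells):
    """Average one gene's expression per cell type label.

    `cells` is an iterable of (label, expression) pairs for a single gene.
    """
    grouped = defaultdict(list)
    for label, value in cells:
        grouped[label].append(value)
    return {label: mean(values) for label, values in grouped.items()}


def compare_profiles(profile_a, profile_b):
    """Pair up labels present in both profiles and list the rest."""
    labels_a, labels_b = set(profile_a), set(profile_b)
    shared = sorted(labels_a & labels_b)
    return {
        "shared": {lab: (profile_a[lab], profile_b[lab]) for lab in shared},
        "only_a": sorted(labels_a - labels_b),
        "only_b": sorted(labels_b - labels_a),
    }


def top_label(profile):
    """Return the label with the highest mean expression."""
    return max(profile, key=profile.get)


# Toy, invented values for one gene in two hypothetical datasets.
cells_a = [
    ("rod photoreceptor cells", 120.0),
    ("rod photoreceptor cells", 80.0),
    ("bipolar cells", 2.0),
    ("bipolar cells", 0.0),
]
cells_b = [
    ("rod photoreceptor cells", 90.0),
    ("rod photoreceptor cells", 110.0),
    ("bipolar cells", 1.0),
    ("fibroblasts", 0.0),
]

profile_a = mean_expression_by_label(cells_a)
profile_b = mean_expression_by_label(cells_b)
comparison = compare_profiles(profile_a, profile_b)

print("Top cell type, dataset A:", top_label(profile_a))
print("Top cell type, dataset B:", top_label(profile_b))
print("Labels only in dataset B:", comparison["only_b"])
```

In this toy case both datasets place the highest expression in the same cell type, while the second dataset contains an extra label with no counterpart in the first. A real comparison would also need to reconcile differing cell type nomenclature and normalization between datasets, which this sketch does not attempt.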

Adipose tissue
In the HPA cell type aggregated data, cell data representing the adipose tissue is based on data from Hildreth AD et al. (2021).

Bone marrow
In the HPA cell type aggregated data, cell data representing the bone marrow is based on data from He S et al. (2020).

Eye
In the HPA cell type aggregated data, cell data representing the eye is based on data from Menon M et al. (2019). The two datasets differ considerably: the HPA integrated data covers only the retina, while the Tabula Sapiens data covers the whole eye as a tissue and therefore includes cell clusters other than retinal cells. RHO is a protein enriched in the rod photoreceptor cells of the retina.

Heart muscle
In the HPA cell type aggregated data, cell data representing heart muscle is based on data from MacParland SA et al. (2018).

Kidney
In the HPA cell type aggregated data, cell data representing the kidney is based on data from Liao J et al. (2020).

Liver
In the HPA cell type aggregated data, cell data representing the liver is based on data from MacParland SA et al. (2018).

Lymph node
In the HPA cell type aggregated data, cell data representing lymph node is based on data from He S et al. (2020).

Pancreas
In the HPA cell type aggregated data, cell data representing the pancreas is based on data from Qadir MMF et al. (2020).

Skeletal muscle
In the HPA cell type aggregated data, cell data representing skeletal muscle is based on data from De Micheli AJ et al. (2020).

Skin
In the HPA cell type aggregated data, cell data representing the skin is based on data from Solé-Boldo L et al. (2020).

Small intestine
In the HPA cell type aggregated data, cell data representing the small intestine is based on data from Wang Y et al. (2020).

Spleen
In the HPA cell type aggregated data, cell data representing the spleen is based on data from He S et al. (2020).