Cluster and dataset comparison

Inclusion Criteria for Data in the Single Cell Type Resource

The scRNA-seq datasets were retrieved from published studies of healthy human tissues. We performed a meta-analysis of the scRNA-seq literature and searched single cell databases, including the Single Cell Expression Atlas (https://www.ebi.ac.uk/gxa/sc/home), the Human Cell Atlas (https://www.humancellatlas.org), the Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo/), the Tabula Sapiens (https://tabula-sapiens-portal.ds.czbiohub.org/), the Allen Brain Atlas (https://portal.brain-map.org/) and the European Genome-phenome Archive (https://www.ebi.ac.uk/ega/). To avoid technical bias and to ensure that the single cell datasets best represent the corresponding tissues, we applied the following selection criteria: (1) single cell RNA sequencing was performed on single cell suspensions from tissues without pre-enrichment of cell types; (2) datasets included >3,000 cells and 20 million read counts; (3) pseudo-bulk gene expression profiles were highly correlated with bulk RNA-seq profiles. In total, datasets from 30 tissue types and human blood were included. The samples, their references, and cluster details are listed here.

Tabula Sapiens

The Tabula Sapiens project (Tabula Sapiens Consortium* et al. (2022)) includes nearly 500,000 cells from 24 different tissues and organs. The data is publicly available (https://tabula-sapiens.sf.czbiohub.org/) and is included in the CZ CellxGene tool, where separate cell types can be explored across tissues, here exemplified by epithelial cells across the different tissue samples.

Currently, 6 tissues represented in the aggregated HPA single cell type data, which is used for cell type classification, are imported from the Tabula Sapiens (lung, prostate, salivary gland, thymus, tongue and vascular). Another 13 tissues are now included in the HPA Single Cell Type resource for comparison and validation of cell cluster expression profiles. For the 6 tissues already represented by Tabula Sapiens, the original Tabula Sapiens clustering is added to the gene detail pages so that the cluster expression overviews can easily be compared.

Tabula Sapiens clustering compared with HPA clustering of the same data

For the 6 tissue types (lung, prostate, salivary gland, thymus, tongue and vascular) represented by Tabula Sapiens data in the HPA aggregated cell type expression profile, adding Tabula Sapiens' own clustering makes it possible to compare the two clusterings and verify the robustness of the HPA pipeline.
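Agreement between two clusterings of the same cells can also be quantified. How, or whether, the HPA scores this agreement is not stated here; the sketch below assumes the adjusted Rand index (ARI), a standard chance-corrected measure of partition agreement, computed from one cluster label per cell in each clustering. The function name and inputs are hypothetical.

```python
from collections import Counter
from collections.abc import Hashable, Sequence
from math import comb


def adjusted_rand_index(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """ARI between two clusterings of the same cells (1.0 = identical partitions)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same cells")
    n = len(labels_a)
    if n < 2:
        raise ValueError("need at least two cells")
    pairs = Counter(zip(labels_a, labels_b))   # contingency table entries
    a_sizes = Counter(labels_a)                # cluster sizes in clustering A
    b_sizes = Counter(labels_b)                # cluster sizes in clustering B
    sum_ij = sum(comb(c, 2) for c in pairs.values())
    sum_a = sum(comb(c, 2) for c in a_sizes.values())
    sum_b = sum(comb(c, 2) for c in b_sizes.values())
    expected = sum_a * sum_b / comb(n, 2)      # pair agreement expected by chance
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both clusterings put every cell in one cluster
    return (sum_ij - expected) / (max_index - expected)
```

Because the ARI depends only on which cells are grouped together, numbered HPA clusters can be compared directly with named Tabula Sapiens annotations: a value of 1.0 means identical partitions, and values near 0 mean agreement no better than chance.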

Tabula Sapiens data processed with the HPA pipeline and clustering methods.

Tabula Sapiens data with Tabula Sapiens clustering details.

Lung

DNAI2 is a protein enriched in ciliated cells and is highly expressed in the ciliated cell cluster of the lung sample.


DNAI2 - Lung (HPA clustering)

DNAI2 - Lung (Tabula Sapiens clustering)

PDGFRA is a protein enriched in fibroblasts.


PDGFRA - Lung (HPA clustering)

PDGFRA - Lung (Tabula Sapiens clustering)

Prostate

KLK3 is a protein enriched in prostatic glandular cells.


KLK3 - Prostate (HPA clustering)

KLK3 - Prostate (Tabula Sapiens clustering)

CNN1 is a protein enriched in smooth muscle cells.


CNN1 - Prostate (HPA clustering)

CNN1 - Prostate (Tabula Sapiens clustering)

Salivary gland

MACC1 is a protein enriched in mucus glandular cells of the salivary gland.


MACC1 - Salivary gland (HPA clustering)

MACC1 - Salivary gland (Tabula Sapiens clustering)

LPO is a protein enriched in serous glandular cells of the salivary gland.


LPO - Salivary gland (HPA clustering)

LPO - Salivary gland (Tabula Sapiens clustering)

Thymus

THEMIS is a protein enriched in T-cells.


THEMIS - Thymus (HPA clustering)

THEMIS - Thymus (Tabula Sapiens clustering)

Tongue

KRT5 is a protein enriched in basal keratinocytes.


KRT5 - Tongue (HPA clustering)

KRT5 - Tongue (Tabula Sapiens clustering)

Vascular

SELE is a protein enriched in endothelial cells and is specifically detected in the endothelial cell clusters.


SELE - Vascular (HPA clustering)

SELE - Vascular (Tabula Sapiens clustering)

Tabula Sapiens comparison with non-Tabula Sapiens

For the tissues represented by non-Tabula Sapiens data, adding the Tabula Sapiens clustering data provides a separate dataset for comparison and validation of results. Here, we show examples of the expression overview in each tissue represented by a non-Tabula Sapiens dataset and compare its cell type expression profile with the Tabula Sapiens results. This comparison is available for each protein-coding gene on the gene detail page.
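The per-gene comparison described above can be sketched as a concordance check: does each dataset point to the same enriched cell type? This is a hypothetical illustration, not the HPA's implementation. It assumes each dataset is summarized as cluster → (cell type annotation, mean expression of one gene), that cell type names have already been harmonized between the two datasets, and it borrows a four-fold margin over the next-highest cell type as the working definition of "enriched"; the names, inputs, and margin are all assumptions.

```python
from typing import Optional

# Hypothetical input: cluster id -> (harmonized cell type, mean expression of one gene).
Profile = dict[str, tuple[str, float]]


def by_cell_type(profile: Profile) -> dict[str, float]:
    """Collapse clusters to cell types, keeping each type's highest cluster value."""
    best: dict[str, float] = {}
    for cell_type, value in profile.values():
        best[cell_type] = max(best.get(cell_type, value), value)
    return best


def enriched_cell_type(profile: Profile, fold: float = 4.0) -> Optional[str]:
    """Cell type expressed at >= `fold` times every other type, or None if there is none."""
    ranked = sorted(by_cell_type(profile).items(), key=lambda kv: kv[1], reverse=True)
    if not ranked or ranked[0][1] <= 0:
        return None
    if len(ranked) == 1:
        return ranked[0][0]
    (top, top_value), (_, runner_up) = ranked[0], ranked[1]
    return top if top_value >= fold * runner_up else None


def concordant(hpa: Profile, tabula_sapiens: Profile, fold: float = 4.0) -> bool:
    """True when both datasets point to the same enriched cell type for the gene."""
    a = enriched_cell_type(hpa, fold)
    return a is not None and a == enriched_cell_type(tabula_sapiens, fold)
```

Taking the maximum cluster value per cell type means that a cell type split into several clusters is judged by its strongest cluster; averaging weighted by cluster size would be an equally defensible choice.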

Adipose tissue

In the HPA cell type aggregated data, cell data representing the adipose tissue is based on data from Hildreth AD et al. (2021).
LIPE is a protein enriched in adipocytes, confirmed by immunohistochemical staining in the Tissue Atlas.


LIPE - Adipose tissue

The HPA clustering of Hildreth AD et al. (2021) adipose tissue single cell data.

LIPE - Adipose tissue

Tabula Sapiens expression and clustering data.

Bone marrow

In the HPA cell type aggregated data, cell data representing the bone marrow is based on data from He S et al. (2020).
HBD is an erythroid-enriched protein, detected in the erythroid cluster of both datasets.


HBD - Bone marrow

The HPA clustering of He S et al. (2020) bone marrow single cell data.

HBD - Bone marrow

Tabula Sapiens expression and clustering data.

Eye

In the HPA cell type aggregated data, cell data representing the eye is based on data from Menon M et al. (2019). These datasets differ considerably: the HPA integrated data covers only the retina, while the Tabula Sapiens includes the whole eye as a tissue and therefore contains cell clusters beyond the retinal cells. RHO is a protein enriched in the rod photoreceptor cells of the retina.


RHO - Eye

The HPA clustering of Menon M et al. (2019) eye single cell data.

RHO - Eye

Tabula Sapiens expression and clustering data.

Heart muscle

In the HPA cell type aggregated data, cell data representing heart muscle is based on data from MacParland SA et al. (2018).
MB is a protein enriched in the cardiomyocytes of the heart.


MB - Heart muscle

The HPA clustering of MacParland SA et al. (2018) heart muscle single cell data.

MB - Heart muscle

Tabula Sapiens expression and clustering data.

Kidney

In the HPA cell type aggregated data, cell data representing the kidney is based on data from Liao J et al. (2020).
SLC12A1 is a protein enriched in distal tubular cells and collecting ducts, confirmed by immunohistochemical staining in the Tissue Atlas.


SLC12A1 - Kidney

The HPA clustering of Liao J et al. (2020) kidney single cell data.

SLC12A1 - Kidney

Tabula Sapiens expression and clustering data.

Liver

In the HPA cell type aggregated data, cell data representing the liver is based on data from MacParland SA et al. (2018).
HAO1 is a protein enriched in hepatocytes of the liver, with consistent specificity for the hepatocyte clusters independent of the dataset. HNF1B is a cholangiocyte-specific protein, and selective nuclear staining of bile ducts in the liver is verified by immunohistochemistry in the Tissue Atlas.


HAO1 - Liver

The HPA clustering of MacParland SA et al. (2018) liver single cell data.

HAO1 - Liver

Tabula Sapiens expression and clustering data.


HNF1B - Liver (HPA clustering)

HNF1B - Liver (Tabula Sapiens clustering)

Lymph node

In the HPA cell type aggregated data, cell data representing the lymph node is based on data from He S et al. (2020).
MS4A1 is a B-cell specific protein. The Tabula Sapiens dataset, which includes more cells, offers better resolution of which types of B-cells show expression.


MS4A1 - Lymph node

The HPA clustering of He S et al. (2020) lymph node single cell data.

MS4A1 - Lymph node

Tabula Sapiens expression and clustering data.

Pancreas

In the HPA cell type aggregated data, cell data representing the pancreas is based on data from Qadir MMF et al. (2020).
CPA1 is a protein enriched in pancreatic exocrine glandular cells.


CPA1 - Pancreas

The HPA clustering of Qadir MMF et al. (2020) pancreas single cell data.

CPA1 - Pancreas

Tabula Sapiens expression and clustering data.

Skeletal muscle

In the HPA cell type aggregated data, cell data representing skeletal muscle is based on data from De Micheli AJ et al. (2020).
MYH2 is a protein enriched in skeletal myocytes.


MYH2 - Skeletal muscle

The HPA clustering of De Micheli AJ et al. (2020) skeletal muscle single cell data.

MYH2 - Skeletal muscle

Tabula Sapiens expression and clustering data.

Skin

In the HPA cell type aggregated data, cell data representing the skin is based on data from Solé-Boldo L et al. (2020).
KRT10 is a protein enriched in suprabasal keratinocytes.


KRT10 - Skin

The HPA clustering of Solé-Boldo L et al. (2020) skin single cell data.

KRT10 - Skin

Tabula Sapiens expression and clustering data.

Small intestine

In the HPA cell type aggregated data, cell data representing the small intestine is based on data from Wang Y et al. (2020).
ALPI is a protein with elevated expression in proximal enterocytes.


ALPI - Small intestine

The HPA clustering of Wang Y et al. (2020) small intestine single cell data.

ALPI - Small intestine

Tabula Sapiens expression and clustering data.

Spleen

In the HPA cell type aggregated data, cell data representing the spleen is based on data from He S et al. (2020).
IGHA2 is a protein enriched in plasma cells.


IGHA2 - Spleen

The HPA clustering of He S et al. (2020) spleen single cell data.

IGHA2 - Spleen

Tabula Sapiens expression and clustering data.