In [2]:
%matplotlib inline

Analyze Slide-seqV2 data¶

This tutorial shows how to apply Squidpy for the analysis of Slide-seqV2 data.

The data used here was obtained from {cite}Stickels2020-rf. We provide a pre-processed subset of the data, in {class}anndata.AnnData format. We would like to thank @tudaga for providing cell-type level annotation. For details on how it was pre-processed, please refer to the original paper.

Note: this HTML page is generated with nbconvert. It has not been modified after. This page is meant to highlight the inaccessibility of certain features, and therefore e.g. alt text has not been added afterwards.

Import packages & data¶

To run the notebook locally, create a conda environment as conda env create -f environment.yml using this environment.yml <https://github.com/scverse/squidpy_notebooks/blob/main/environment.yml>_.

In [3]:
import squidpy as sq

print(f"squidpy=={sq.__version__}")

# load the pre-processed dataset
adata = sq.datasets.slideseqv2()
adata
squidpy==1.2.3
100%|██████████| 251M/251M [00:36<00:00, 7.15MB/s] 
Out[3]:
AnnData object with n_obs × n_vars = 41786 × 4000
    obs: 'barcode', 'x', 'y', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes', 'total_counts_MT', 'log1p_total_counts_MT', 'pct_counts_MT', 'n_counts', 'leiden', 'cluster'
    var: 'MT', 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts', 'n_cells', 'highly_variable', 'highly_variable_rank', 'means', 'variances', 'variances_norm'
    uns: 'cluster_colors', 'hvg', 'leiden', 'leiden_colors', 'neighbors', 'pca', 'spatial_neighbors', 'umap'
    obsm: 'X_pca', 'X_umap', 'deconvolution_results', 'spatial'
    varm: 'PCs'
    obsp: 'connectivities', 'distances', 'spatial_connectivities', 'spatial_distances'

First, let's visualize cluster annotation in spatial context with {func}squidpy.pl.spatial_scatter.

In [4]:
sq.pl.spatial_scatter(adata, color="cluster", size=1, shape=None)
WARNING: Please specify a valid `library_id` or set it permanently in `adata.uns['spatial']`
No description has been provided for this image

Neighborhood enrichment analysis¶

Similar to other spatial data, we can investigate spatial organization of clusters in a quantitative way, by computing a neighborhood enrichment score. You can compute such score with the following function: {func}squidpy.gr.nhood_enrichment. In short, it's an enrichment score on spatial proximity of clusters: if spots belonging to two different clusters are often close to each other, then they will have a high score and can be defined as being enriched. On the other hand, if they are far apart, the score will be low and they can be defined as depleted. This score is based on a permutation-based test, and you can set the number of permutations with the n_perms argument (default is 1000).

Since the function works on a connectivity matrix, we need to compute that as well. This can be done with {func}squidpy.gr.spatial_neighbors. Please see {doc}../examples/graph/compute_spatial_neighbors and {doc}../examples/graph/compute_nhood_enrichment for more details of how these functions works.

Finally, we'll directly visualize the results with {func}squidpy.pl.nhood_enrichment. We'll add a dendrogram to the heatmap computed with linkage method ward.

In [5]:
sq.gr.spatial_neighbors(adata, coord_type="generic")
sq.gr.nhood_enrichment(adata, cluster_key="cluster")
sq.pl.nhood_enrichment(
    adata, cluster_key="cluster", method="single", cmap="inferno", vmin=-50, vmax=100
)
100%|██████████| 1000/1000 [00:09<00:00, 110.64/s]
No description has been provided for this image

Interestingly, there seems to be an enrichment between the Endothelial_Tip, the Ependymal cells. Another putative enrichment is between the Oligodendrocytes and Polydendrocytes cells. We can visualize the spatial organization of such clusters. For this, we'll use {func}squidpy.pl.spatial_scatter again.

In [6]:
sq.pl.spatial_scatter(
    adata,
    shape=None,
    color="cluster",
    groups=["Endothelial_Tip", "Ependymal", "Oligodendrocytes", "Polydendrocytes"],
    size=3,
)
WARNING: Please specify a valid `library_id` or set it permanently in `adata.uns['spatial']`
No description has been provided for this image

Ripley's statistics¶

In addition to the neighbor enrichment score, we can further investigate spatial organization of cell types in tissue by means of the Ripley's statistics. Ripley's statistics allow analyst to evaluate whether a discrete annotation (e.g. cell-type) appears to be clustered, dispersed or randomly distributed on the area of interest. In Squidpy, we implement three closely related Ripley's statistics, that can be easily computed with {func}squidpy.gr.ripley. Here, we'll showcase the Ripley's L statistic, which is a variance-stabilized version of the Ripley's K statistics. We'll visualize the results with {func}squidpy.pl.ripley. Check {doc}../examples/graph/compute_ripley for more details.

In [7]:
mode = "L"
sq.gr.ripley(adata, cluster_key="cluster", mode=mode, max_dist=500)
sq.pl.ripley(adata, cluster_key="cluster", mode=mode)
No description has been provided for this image

The plot highlight how some cell-types have a more clustered pattern, like Astrocytes and CA11_CA2_CA3_Subiculum cells, whereas other have a more dispersed pattern, like Mural cells. To confirm such interpretation, we can selectively visualize again their spatial organization.

In [8]:
sq.pl.spatial_scatter(
    adata,
    color="cluster",
    groups=["Mural", "CA1_CA2_CA3_Subiculum"],
    size=3,
    shape=None,
)
WARNING: Please specify a valid `library_id` or set it permanently in `adata.uns['spatial']`
No description has been provided for this image

Ligand-receptor interaction analysis¶

The analysis showed above has provided us with quantitative information on cellular organization and communication at the tissue level. We might be interested in getting a list of potential candidates that might be driving such cellular communication. This naturally translates in doing a ligand-receptor interaction analysis. In Squidpy, we provide a fast re-implementation the popular method CellPhoneDB {cite}cellphonedb (code <https://github.com/Teichlab/cellphonedb>_ ) and extended its database of annotated ligand-receptor interaction pairs with the popular database Omnipath {cite}omnipath. You can run the analysis for all clusters pairs, and all genes (in seconds, without leaving this notebook), with {func}squidpy.gr.ligrec.

Let's perform the analysis and visualize the result for three clusters of interest: Polydendrocytes and Oligodendrocytes. For the visualization, we will filter out annotations with low-expressed genes (with the means_range argument) and decreasing the threshold for the adjusted p-value (with the alpha argument) Check {doc}../examples/graph/compute_ligrec for more details.

In [9]:
sq.gr.ligrec(
    adata,
    n_perms=100,
    cluster_key="cluster",
    clusters=["Polydendrocytes", "Oligodendrocytes"],
)
sq.pl.ligrec(
    adata,
    cluster_key="cluster",
    source_groups="Oligodendrocytes",
    target_groups=["Polydendrocytes"],
    pvalue_threshold=0.05,
    swap_axes=True,
)
100%|██████████| 100/100 [00:07<00:00, 13.48permutation/s]
No description has been provided for this image

The dotplot visualization provides an interesting set of candidate interactions that could be involved in the tissue organization of the cell types of interest. It should be noted that this method is a pure re-implementation of the original permutation-based test, and therefore retains all its caveats and should be interpreted accordingly.

Spatially variable genes with spatial autocorrelation statistics¶

Lastly, with Squidpy we can investigate spatial variability of gene expression. {func}squidpy.gr.spatial_autocorr conveniently wraps two spatial autocorrelation statistics: Moran's I and Geary's C**. They provide a score on the degree of spatial variability of gene expression. The statistic as well as the p-value are computed for each gene, and FDR correction is performed. For the purpose of this tutorial, let's compute the *Moran's I score. See {doc}../examples/graph/compute_moran for more details.

In [10]:
sq.gr.spatial_autocorr(adata, mode="moran")
adata.uns["moranI"].head(10)
Out[10]:
I pval_norm var_norm pval_norm_fdr_bh
Ttr 0.703289 0.0 0.000008 0.0
Plp1 0.531680 0.0 0.000008 0.0
Mbp 0.495970 0.0 0.000008 0.0
Hpca 0.490302 0.0 0.000008 0.0
Enpp2 0.455090 0.0 0.000008 0.0
1500015O10Rik 0.453225 0.0 0.000008 0.0
Pcp4 0.428500 0.0 0.000008 0.0
Sst 0.398053 0.0 0.000008 0.0
Ptgds 0.385718 0.0 0.000008 0.0
Nrgn 0.368533 0.0 0.000008 0.0

The results are stored in adata.uns["moranI"] and we can visualize selected genes with {func}squidpy.pl.spatial_scatter.

In [11]:
sq.pl.spatial_scatter(
    adata,
    shape=None,
    color=["Ttr", "Plp1", "Mbp", "Hpca", "Enpp2"],
    size=0.1,
)
WARNING: Please specify a valid `library_id` or set it permanently in `adata.uns['spatial']`
No description has been provided for this image