Introduction
Understanding the spatial distribution and interplay of cell states in tissue is critical for elucidating tissue formation and function. Single-cell and spatial omics present a promising approach to addressing this need. Traditional methods typically include the identification of highly variable genes, dimensionality reduction, clustering, and the annotation of cells or functions based on gene over-expression. Nevertheless, these qualitative approaches are inadequate for accurately mapping the distributions of spatial features. To address this, integrating biomedical knowledge such as Gene Ontology, KEGG, Reactome, transcription factors, and cell-type marker genes directly allows for the evaluation of cell states from gene expression data, creating quantitative functional pathway profiles at the single captured location.
After quantifying cell functions, analyzing their spatial
distribution and co-distribution with other features can provide deeper
insights into related biological questions. We focus on three aspects:
the spatial variability of cell functions, regions where these functions
cluster, and their co-distribution patterns with other features.
Although existing tools such as SPARK-X (Zhu, Sun, and Zhou 2021),
nnSVG (Weber et al. 2023), SpatialDE (Svensson, Teichmann, and
Stegle 2018), SpaGFT (Chang et al. 2024), Seurat (Hao et al. 2023),
and Squidpy (Palla et al. 2022) facilitate the exploration of
spatially variable genes, they are primarily designed for gene-level
analysis and lack the capability to investigate the spatial
co-distribution of features. Additionally, many of these tools,
including SpatialDE (Svensson, Teichmann, and Stegle 2018),
SPARK (Sun, Zhu, and Zhou 2020), MERINGUE (Miller et al. 2021),
and nnSVG (Weber et al. 2023), face challenges in handling
large-scale spatial transcriptome data due to high memory
consumption and low computational efficiency.
To fill the gaps, we developed SVP to accurately predict
cell states, explore their spatial distribution, and assess their
spatial relationship with other features. SVP is built on the
SingleCellExperiment class, so it interoperates with the existing
Bioconductor computing ecosystem.
Overview of SVP
The evaluation of functional status at individual locations captured
by SVP is achieved using Multiple Correspondence Analysis
(MCA) for dimensionality reduction. This process employs a
standardized gene expression matrix to project both cells and genes into
a unified MCA space. It has been established that this
method allows distances to be calculated not only among cells and among
genes but also between cells and genes, thereby facilitating the
assessment of their associations (Cortal et al. 2021). Proximity in
this space indicates a stronger relationship.
These calculated distances are crucial for constructing a weighted
k-nearest neighbors (KNN) network, linking each cell or gene to its most
relevant counterparts. To discern features with varying levels of
proximity, distances are first normalized to a 0-1 scale, with closer
distances approaching 1 and farther distances approaching 0. This
normalization is followed by division by the total distance among the
nearest features, thereby assigning greater connection weights to closer
features. Subsequently, databases of known biological knowledge, such as
transcription factor target gene sets, Reactome functional gene sets,
and cell-type marker gene sets, serve as initial seeds. Random walks on
the constructed weighted KNN network yield preliminary functional state
activity scores for each location. To mitigate potential biases
introduced by dimensionality reduction, a hypergeometric distribution
test enhances the enrichment analysis of top-ranking genes extracted
directly from the expression matrix at each location. These analyses
provide weights for functional activities, culminating in the derivation
of functional activity scores at the single captured location (Fig.
1A).
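The network-propagation step described above can be sketched in base R on a toy graph. This is only an illustration under stated assumptions: the random coordinates stand in for MCA coordinates, the weighting `1 - d / max(d)` is one possible reading of the 0-1 normalization, and the helper `rwr()` with `restart = 0.7` is hypothetical rather than SVP's internal implementation.

```r
# Toy sketch of random walk with restart (RWR) on a weighted KNN graph.
# Names, the weighting formula, and restart = 0.7 are illustrative assumptions.
set.seed(1)
n <- 6
coords <- matrix(rnorm(n * 2), ncol = 2)  # stand-in for MCA coordinates
d <- as.matrix(dist(coords))              # pairwise Euclidean distances
k <- 3

W <- matrix(0, n, n)
for (i in seq_len(n)) {
  nn <- order(d[i, ])[2:(k + 1)]          # k nearest neighbours, skipping self
  s <- 1 - d[i, nn] / max(d)              # closer neighbours score nearer to 1
  W[i, nn] <- s / sum(s)                  # each row of weights sums to 1
}

rwr <- function(W, seed, restart = 0.7, tol = 1e-10, max_iter = 1000) {
  p0 <- seed / sum(seed)                  # start from the seed nodes
  p <- p0
  for (iter in seq_len(max_iter)) {
    # spread probability along weighted edges, then jump back to the seeds
    p_new <- (1 - restart) * as.vector(t(W) %*% p) + restart * p0
    if (sum(abs(p_new - p)) < tol) break
    p <- p_new
  }
  p_new
}

seed <- c(1, 1, 0, 0, 0, 0)               # nodes 1 and 2 play the gene-set seeds
score <- rwr(W, seed)
round(score, 3)
```

The resulting vector is a probability distribution over nodes: seed nodes keep at least `restart / 2` each, and the remainder spreads to nodes that are well connected to the seeds.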
To identify spatially variable cell functions, we first established a cell neighbor weight matrix based on spot locations using the Delaunay triangulation (default) or KNN algorithm. This weight matrix, alongside global autocorrelation analyses such as Moran’s I (default), Geary’s C, or Getis-Ord’s G, facilitated the identification of spatially variable cell functions or gene characteristics. Additionally, we utilized the same cell neighbor weight matrix and local spatial autocorrelation algorithms (Local Getis-Ord or Local Moran) to delineate the local spatial distribution of these variable features (Fig. 1B). To examine spatial co-distribution among cell functions, we designed a bivariate spatial global and local autocorrelation algorithm employing the Lee index. This approach enables the assessment of correlation between different cell characteristics in their spatial distribution (Fig. 1C).
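As a rough illustration of the neighbor weight matrix mentioned above, the following base R sketch builds a row-standardised KNN weight matrix from spot coordinates. The helper `knn_weights()` and the 5 x 5 grid are hypothetical and only meant to show the idea; SVP's own construction (Delaunay triangulation by default) may differ in detail.

```r
# Illustrative only: a row-standardised KNN weight matrix from spot coordinates.
knn_weights <- function(coords, k = 4) {
  d <- as.matrix(dist(coords))            # pairwise Euclidean distances
  n <- nrow(d)
  W <- matrix(0, n, n)
  for (i in seq_len(n)) {
    nn <- order(d[i, ])[2:(k + 1)]        # skip the spot itself (distance 0)
    W[i, nn] <- 1 / k                     # equal weights that sum to 1 per row
  }
  W
}

coords <- as.matrix(expand.grid(x = 1:5, y = 1:5))  # toy 5 x 5 grid of spots
W <- knn_weights(coords, k = 4)
which(W[13, ] > 0)                        # neighbours of the central spot
```

Note that a KNN weight matrix is generally not symmetric: spot A can be among the nearest neighbors of spot B without the reverse being true.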
Overview of SVP.
Install
Use the following commands to install SVP:
#Once Bioconductor 3.21 is released, you can install it as follows.
if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("SVP")
#Or you can install it via github
if (!requireNamespace("remotes", quietly=TRUE))
    install.packages("remotes")
remotes::install_github("YuLab-SMU/SVP")
Quantification of cell states using SVP
SVP can integrate gene or protein lists from existing biomedical
databases like cell-type markers (PanglaoDB, CellMarkers), gene ontology
(GO), molecular features (MSigDB), and pathways (KEGG, Reactome,
Wikipathways), transcription factors, and disease ontology, to assess
cell states. Cell-type markers can also be extracted from
reference single-cell data. In this vignette, we use the spatial
transcriptome and single-cell transcriptome of the mouse olfactory bulb
from two published studies (Ståhl et al. 2016; Tepe et al. 2018) for
demonstration. Here, we first normalize the data with
logNormCounts of scuttle and then perform the
MCA dimensionality reduction with runMCA of
SVP.
# load the packages required in the vignette
library(SpatialExperiment)
library(SingleCellExperiment)
library(scuttle)
library(scran)
library(SVP)
library(ggsc)
library(ggplot2)
library(magrittr)
library(STexampleData)
# load the spatial transcriptome, it is a SpatialExperiment object
# to build the object, you can refer to the SpatialExperiment package
mob_spe <- ST_mouseOB()  
#> see ?STexampleData and browseVignettes('STexampleData') for documentation
#> loading from cache
# the genes that did not express in any captured location were removed.
# we use logNormCounts of scuttle to normalize the spatial transcriptome
mob_spe <- logNormCounts(mob_spe)
# Now the normalized counts are added to the assays of mob_spe as `logcounts`
# they can be extracted via logcounts(mob_spe) or assay(mob_spe, 'logcounts')
# Next, we use `runMCA` of `SVP` to perform the MCA dimensionality reduction
mob_spe <- runMCA(mob_spe, assay.type = 'logcounts')
#> Computing Fuzzy Matrix
#> Computing SVD
#> Computing Coordinates
# The MCA result is added to reducedDims; it can be extracted with reducedDim(mob_spe, 'MCA')
# for more information, refer to the SingleCellExperiment package
mob_spe
#> class: SpatialExperiment 
#> dim: 15928 262 
#> metadata(0):
#> assays(2): counts logcounts
#> rownames(15928): 0610007N19Rik 0610007P14Rik ... Zzef1 Zzz3
#> rowData names(1): gene_name
#> colnames(262): ACAACTATGGGTTGGCGG ACACAGATCCTGTTCTGA ...
#>   TTTCAACCCGAGGAAGTC TTTCTAACTCATAAGGAT
#> colData names(4): barcode_id sample_id layer sizeFactor
#> reducedDimNames(1): MCA
#> mainExpName: NULL
#> altExpNames(0):
#> spatialCoords names(2) : x y
#> imgData names(1): sample_id
Quantification of BIOCARTA pathway or other function
SVP accepts SingleCellExperiment or
SpatialExperiment object as input. The gene set should be a
named list object, and the elements of the list must be included in the
row names of the SingleCellExperiment or
SpatialExperiment object. To quantify the
Biocarta pathway or other function for each captured
location, the command runSGSA is used.
# Note: gset.idx.list accepts a named gene list, a gson object,
# a local gmt file, or the URL of a gmt file.
mob_spe <- runSGSA(mob_spe, 
             gset.idx.list = "https://data.broadinstitute.org/gsea-msigdb/msigdb/release/2024.1.Mm/m2.cp.biocarta.v2024.1.Mm.symbols.gmt", # The target gene set gmt file
             assay.type = 'logcounts', # The name of the assay holding the gene expression profile
             gsvaExp.name = 'Biocarta' # The name of result to save to gsvaExps slot
           )
#> Building the nearest neighbor graph with the distance between features and
#> cells ...
#> elapsed time is 2.781000 seconds
#> Building the seed matrix using the gene set and the nearest neighbor graph for
#> random walk with restart ...
#> elapsed time is 0.040000 seconds
#> Calculating the affinity score using random walk with restart ...
#> elapsed time is 1.167000 seconds
#> Tidying the result of random walk with restart ...
#> elapsed time is 0.003000 seconds
#> ORA analysis ...
#> elapsed time is 0.009000 seconds
# The result is added to gsvaExps and an SVPExperiment object is returned. The result
# can be extracted with gsvaExp; see help(gsvaExp) for more information.
mob_spe
#> class: SVPExperiment 
#> dim: 15928 262 
#> metadata(0):
#> assays(2): counts logcounts
#> rownames(15928): 0610007N19Rik 0610007P14Rik ... Zzef1 Zzz3
#> rowData names(1): gene_name
#> colnames(262): ACAACTATGGGTTGGCGG ACACAGATCCTGTTCTGA ...
#>   TTTCAACCCGAGGAAGTC TTTCTAACTCATAAGGAT
#> colData names(4): barcode_id sample_id layer sizeFactor
#> reducedDimNames(1): MCA
#> mainExpName: NULL
#> altExpNames(0):
#> spatialCoords names(2) : x y
#> imgData names(0):
#> gsvaExps names(1) : Biocarta
gsvaExpNames(mob_spe)
#> [1] "Biocarta"
# The result is also a SingleCellExperiment or SpatialExperiment.
gsvaExp(mob_spe, 'Biocarta')
#> class: SpatialExperiment 
#> dim: 252 262 
#> metadata(0):
#> assays(1): affi.score
#> rownames(252): BIOCARTA_41BB_PATHWAY BIOCARTA_ACE2_PATHWAY ...
#>   BIOCARTA_WNT_LRP6_PATHWAY BIOCARTA_WNT_PATHWAY
#> rowData names(4): exp.gene.num gset.gene.num gene.occurrence.rate
#>   geneSets
#> colnames(262): ACAACTATGGGTTGGCGG ACACAGATCCTGTTCTGA ...
#>   TTTCAACCCGAGGAAGTC TTTCTAACTCATAAGGAT
#> colData names(4): barcode_id layer sizeFactor sample_id
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
#> spatialCoords names(2) : x y
#> imgData names(0):
# identify the spatially variable Biocarta pathway.
Biocarta.svdf <- gsvaExp(mob_spe, "Biocarta") |>
 runDetectSVG(
   assay.type = 1,
   action = 'only'
 )
#> Identifying the spatially variable gene sets (pathway) based on moransi
#> elapsed time is 0.024000 seconds
Biocarta.svdf |> head()
#>                                obs expect.moransi sd.moransi Z.moransi
#> BIOCARTA_41BB_PATHWAY   0.01971953   -0.003831418 0.03450185 0.6825996
#> BIOCARTA_ACE2_PATHWAY   0.12560773   -0.003831418 0.03245350 3.9884491
#> BIOCARTA_ACH_PATHWAY    0.03431282   -0.003831418 0.03188915 1.1961511
#> BIOCARTA_ACTINY_PATHWAY 0.11817919   -0.003831418 0.03491822 3.4941817
#> BIOCARTA_AGPCR_PATHWAY  0.01336534   -0.003831418 0.03505594 0.4905517
#> BIOCARTA_AGR_PATHWAY    0.02421540   -0.003831418 0.03550837 0.7898651
#>                               pvalue         padj rank
#> BIOCARTA_41BB_PATHWAY   2.474299e-01 0.3379186014  175
#> BIOCARTA_ACE2_PATHWAY   3.325333e-05 0.0002563724   31
#> BIOCARTA_ACH_PATHWAY    1.158188e-01 0.1821098346  152
#> BIOCARTA_ACTINY_PATHWAY 2.377586e-04 0.0012247615   46
#> BIOCARTA_AGPCR_PATHWAY  3.118718e-01 0.4007384690  186
#> BIOCARTA_AGR_PATHWAY    2.148033e-01 0.3002221081  171
# We can obtain the significant spatially variable pathways, 
# sorted by Moran's I.
# For more details, see the Spatial statistical analysis section
Biocarta.svdf |> dplyr::filter(padj<=0.01) |> dplyr::arrange(rank) |> head()
#>                              obs expect.moransi sd.moransi Z.moransi
#> BIOCARTA_CACAM_PATHWAY 0.3752909   -0.003831418 0.03565151 10.634117
#> BIOCARTA_P53_PATHWAY   0.2604483   -0.003831418 0.03258122  8.111412
#> BIOCARTA_GSK3_PATHWAY  0.2328804   -0.003831418 0.03561091  6.647170
#> BIOCARTA_TFF_PATHWAY   0.2280877   -0.003831418 0.03566444  6.502811
#> BIOCARTA_PRION_PATHWAY 0.2219682   -0.003831418 0.03517042  6.420155
#> BIOCARTA_FOSB_PATHWAY  0.2169072   -0.003831418 0.03589631  6.149340
#>                              pvalue         padj rank
#> BIOCARTA_CACAM_PATHWAY 1.033891e-26 2.471000e-24    1
#> BIOCARTA_P53_PATHWAY   2.501742e-16 2.989582e-14    2
#> BIOCARTA_GSK3_PATHWAY  1.493913e-11 1.190151e-09    3
#> BIOCARTA_TFF_PATHWAY   3.941649e-11 2.355135e-09    4
#> BIOCARTA_PRION_PATHWAY 6.806791e-11 3.253646e-09    5
#> BIOCARTA_FOSB_PATHWAY  3.890305e-10 1.375127e-08    6
ids.sv.biocarta <- Biocarta.svdf |> dplyr::filter(padj<=0.01) |> dplyr::arrange(rank) |> rownames()
# We use sc_spatial of ggsc to visualize the distribution of some spatially variable functions
gsvaExp(mob_spe, 'Biocarta') %>% 
  sc_spatial(
    features = ids.sv.biocarta[seq(12)],
    mapping = aes(x = x, y = y),
    geom = geom_bgpoint,
    pointsize = 3.5,
    ncol = 4,
    common.legend = FALSE # The output is a patchwork 
  ) & 
  scale_color_viridis_c(option='B', guide='none')
#> Scale for colour is already present.
#> Adding another scale for colour, which will replace the existing scale.
The heatmap of spatially variable biocarta pathway
Quantification of cell-type
To quantify the cell-type for each captured location, the gene set
should consist of cell-type marker gene sets. These can be
pre-established marker gene sets, such as the markers from
CellMarkers or PanglaoDB. High-quality cell
marker genes from past reports or your own compilations are acceptable.
In addition, they can be extracted from reference single-cell
data. The following sections detail the specific operations of each
approach.
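Whichever source the markers come from, the gene set is passed as a named list whose elements should appear in the row names of the object. A minimal sketch of checking this before quantification is shown here; the marker names and the `universe` vector are hypothetical placeholders (in practice `universe` would be something like `rownames(mob_spe)`).

```r
# Hypothetical marker lists and gene universe, for illustration only.
markers <- list(
  TypeA = c("Gene1", "Gene2", "Gene9"),   # Gene9 is absent from the data
  TypeB = c("Gene3", "Gene4")
)
universe <- paste0("Gene", 1:5)           # stand-in for rownames(mob_spe)

# Keep only the markers that are present among the row names.
markers_kept <- lapply(markers, intersect, universe)

# Number of usable markers per cell-type after filtering.
lengths(markers_kept)
```

Filtering first makes it easy to spot a cell-type whose marker set has become too small to be informative.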
Quantification of cell-type using the pre-established marker gene sets
We can use the pre-established marker gene sets to quantify the cell-type for each captured location. Here, we use markers derived from the differentially expressed genes of the single-cell transcriptome in Tepe et al. (2018).
# The marker gene sets
# We can view the marker gene number of each cell-type.
data(mob_marker_genes)
lapply(mob_marker_genes, length) |> unlist()
#>     GC    PGC   M/TC   OSNs EPL-IN 
#>     14     13     13     34     12
mob_spe <- runSGSA(
                mob_spe, 
                gset.idx.list = mob_marker_genes,
                gsvaExp.name = 'CellTypeFree', 
                assay.type = 'logcounts'
              )
#> Building the nearest neighbor graph with the distance between features and
#> cells ...
#> elapsed time is 2.750000 seconds
#> Building the seed matrix using the gene set and the nearest neighbor graph for
#> random walk with restart ...
#> elapsed time is 0.006000 seconds
#> Calculating the affinity score using random walk with restart ...
#> elapsed time is 0.070000 seconds
#> Tidying the result of random walk with restart ...
#> elapsed time is 0.003000 seconds
#> ORA analysis ...
#> elapsed time is 0.003000 seconds
# Then we can visualize the cell-type activity via sc_spatial of SVP with pie plot
gsvaExp(mob_spe, "CellTypeFree") |> 
   sc_spatial(
     features = sort(names(mob_marker_genes)),
     mapping = aes(x = x, y = y),
     plot.pie = TRUE,
     color = NA,
     pie.radius.scale = 1.1
   ) +
   scale_fill_brewer(palette='Set2')
#> Scale for fill is already present.
#> Adding another scale for fill, which will replace the existing scale.
The pie plot of cell-type activity
We can also display each cell-type activity as a heatmap according to the spatial coordinates.
# calculate the proportion of each cell-type at each captured location
assay(gsvaExp(mob_spe, 2)) |> as.matrix() |> 
  prop.table(2) |> 
  Matrix::Matrix(sparse=TRUE) -> 
  assay(gsvaExp(mob_spe,2), "prop")
mob_spe |> gsvaExp(2) |> 
   sc_spatial(
     features = sort(names(mob_marker_genes)),
     mapping = aes(x = x,  y = y),
     geom = geom_bgpoint,
     pointsize = 2,
     ncol = 5,
     slot = 2
   ) + 
   scale_color_viridis_c()
#> Scale for colour is already present.
#> Adding another scale for colour, which will replace the existing scale.
The heatmap of each cell-type activity
Quantification of cell-type using the reference single-cell data
This approach requires obtaining the cell-type marker genes from the
reference single-cell data. First, the reference single cell data should
be normalized. Here, the command of logNormCounts of
scuttle is used. Next, the MCA dimensionality
reduction is performed by runMCA of SVP.
Finally, the marker gene sets of cell-type are extracted via
runDetectMarker of SVP.
# load the reference single cell transcriptome
# it is a SingleCellExperiment
data(mob_sce) 
mob_sce <- logNormCounts(mob_sce)
# You can also first obtain the highly variable genes, then
# perform the MCA reduction.
# set.seed(123)
# stats <- modelGeneVar(mob_sce)
# hvgs <- getTopHVGs(stats)
# perform the MCA reduction.
mob_sce <- mob_sce |> runMCA(assay.type = 'logcounts')
#> Computing Fuzzy Matrix
#> Computing SVD
#> Computing Coordinates
# obtain the marker gene sets from reference single-cell transcriptome based on MCA space
refmakergene <- runDetectMarker(mob_sce, group.by='cellType', 
  ntop=200, present.prop.in.group=.2, present.prop.in.sample=.5
  )
refmakergene |> lapply(length) |> unlist()
#>     GC    PGC   M/TC   OSNs EPL-IN 
#>     15     14      8     36     11
Then, the cell-type gene signatures can also be quantified
via runSGSA with these marker gene sets.
# use the marker genes from the reference single-cell transcriptome
mob_spe <- runSGSA(
  mob_spe,
  gset.idx.list = refmakergene,
  gsvaExp.name = 'CellTypeRef'
)
#> Building the nearest neighbor graph with the distance between features and
#> cells ...
#> elapsed time is 2.730000 seconds
#> Building the seed matrix using the gene set and the nearest neighbor graph for
#> random walk with restart ...
#> elapsed time is 0.005000 seconds
#> Calculating the affinity score using random walk with restart ...
#> elapsed time is 0.066000 seconds
#> Tidying the result of random walk with restart ...
#> elapsed time is 0.003000 seconds
#> ORA analysis ...
#> elapsed time is 0.003000 seconds
mob_spe |> 
  gsvaExp('CellTypeRef') |> 
  sc_spatial(
    features = sort(names(refmakergene)), 
    mapping = aes(x = x, y = y),
    plot.pie = TRUE, 
    color = NA, 
    pie.radius.scale = 1.1
  ) +
  scale_fill_brewer(palette = 'Set2')
#> Scale for fill is already present.
#> Adding another scale for fill, which will replace the existing scale.
The pie plot of cell-type activity with the marker gene sets from reference single-cell transcriptome
We find that the result based on marker gene sets from the reference single-cell transcriptome is similar to the result based on marker gene sets from previous reports.
We can also obtain the major cell-type for each captured location
using pred.cell.signature.
# the result will be added to the colData of gsvaExp(mob_spe, "CellTypeFree")
mob_spe <- mob_spe |> pred.cell.signature(gsvaexp='CellTypeFree', pred.col.name='pred')
#> The `gsvaexp` = CellTypeFree was specified, the specified gsva Experiment will
#> be used to predict the cell signature.
# Then we can use sc_spatial to visualize it.
mob_spe |>
  gsvaExp("CellTypeFree") |>
  sc_spatial(
    mapping = aes(x =  x, y = y, color = pred),
    geom = geom_bgpoint,
    pointsize = 4
  ) +
  scale_color_brewer(palette = 'Set2')
The major cell type for each spot
Spatial statistical analysis
SVP implements several spatial statistical algorithms to explore the spatial distribution of cell function or other features from three aspects: whether cell function states have spatial variability, where spatially variable cell functions cluster, and the co-distribution patterns of spatially variable cell functions with other features. The following sections provide a detailed description of each aspect.
Univariate spatial statistical analysis
This section addresses whether a feature is spatially variable and
identifies its high-cluster areas. In detail, to identify spatially
variable features, SVP implements three spatial
autocorrelation algorithms using Rcpp and
RcppParallel: global Moran’s I, Geary’s C
and Getis-Ord’s G. To identify the spatial high-cluster areas,
SVP implements two local univariate spatial statistical
algorithms using Rcpp and RcppParallel: local
Getis-Ord’s G and local Moran’s I.
Identification of spatially variable features
Here, we identify the spatially variable cell-types using
runDetectSVG, which implements the global Moran’s
I, Geary’s C and Getis-Ord’s G
algorithms.
# First approach:
# we can first obtain the target gsvaExp with gsvaExp names via 
# gsvaExp() function, for example, CellTypeFree
mob_spe |> gsvaExp("CellTypeFree") 
#> class: SpatialExperiment 
#> dim: 5 262 
#> metadata(0):
#> assays(2): affi.score prop
#> rownames(5): GC PGC M/TC OSNs EPL-IN
#> rowData names(4): exp.gene.num gset.gene.num gene.occurrence.rate
#>   geneSets
#> colnames(262): ACAACTATGGGTTGGCGG ACACAGATCCTGTTCTGA ...
#>   TTTCAACCCGAGGAAGTC TTTCTAACTCATAAGGAT
#> colData names(5): barcode_id layer sizeFactor sample_id pred
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
#> spatialCoords names(2) : x y
#> imgData names(0):
# Then we use runDetectSVG with method = 'moransi' to identify the spatially variable 
# cell-types with global Moran's I.
celltype_free.I <- mob_spe |> gsvaExp("CellTypeFree") |>
  runDetectSVG(
    assay.type = "prop", # the default is 'logcounts', so it must be set explicitly
    method = 'moransi',
    action = 'only'
  ) 
#> Identifying the spatially variable gene sets (pathway) based on moransi
#> elapsed time is 0.025000 seconds
  
celltype_free.I |> dplyr::arrange(rank) |> head()
#>               obs expect.moransi sd.moransi Z.moransi       pvalue         padj
#> GC     0.55280022   -0.003831418 0.03599200 15.465427 2.968895e-54 1.484448e-53
#> OSNs   0.34486191   -0.003831418 0.03586205  9.723183 1.200770e-22 3.001924e-22
#> M/TC   0.29854602   -0.003831418 0.03595804  8.409174 2.064542e-17 3.440904e-17
#> PGC    0.28790731   -0.003831418 0.03580376  8.148272 1.845814e-16 2.307267e-16
#> EPL-IN 0.07101694   -0.003831418 0.03555468  2.105162 1.763860e-02 1.763860e-02
#>        rank
#> GC        1
#> OSNs      2
#> M/TC      3
#> PGC       4
#> EPL-IN    5
# Second approach:
# the result is added to the original object by specifying the 
# gsvaexp argument in runDetectSVG
mob_spe <- mob_spe |> 
  runDetectSVG(
    gsvaexp = 'CellTypeFree',  
    method = 'moran',
    gsvaexp.assay.type = 'prop'
    )
#> The `gsvaexp` was specified, the specified `gsvaExp` will be used to detect
#> 'svg'.
#> Identifying the spatially variable gene sets (pathway) based on moransi
#> elapsed time is 0.011000 seconds
#> The result is added to the input object, which can be extracted using
#> `svDf()` with type='sv.moransi' for `SingleCellExperiment` or
#> `SpatialExperiment`.
#> If input object is `SVPExperiment`, and `gsvaexp` is specified
#> the result can be extracted by `gsvaExp()` (return a `SingleCellExperiment`
#> or `SpatialExperiment`),then also using `svDf()` to extract.
gsvaExp(mob_spe, 'CellTypeFree') |> svDf("sv.moransi") |> dplyr::arrange(rank) |> dplyr::filter(padj<=0.01)
#>            obs expect.moransi sd.moransi Z.moransi       pvalue         padj
#> GC   0.5528002   -0.003831418 0.03599200 15.465427 2.968895e-54 1.484448e-53
#> OSNs 0.3448619   -0.003831418 0.03586205  9.723183 1.200770e-22 3.001924e-22
#> M/TC 0.2985460   -0.003831418 0.03595804  8.409174 2.064542e-17 3.440904e-17
#> PGC  0.2879073   -0.003831418 0.03580376  8.148272 1.845814e-16 2.307267e-16
#>      rank
#> GC      1
#> OSNs    2
#> M/TC    3
#> PGC     4
Here, obs is the observed Moran’s I and
expect.moransi is the expected Moran’s I
(E(I)). Moran’s I quantifies the correlation
between the value of a single feature at a spatial point or region and
its values at neighboring points or regions. Global Moran’s I usually
ranges from -1 to 1. A global Moran’s I significantly above
E(I) indicates a distinct spatial pattern or spatial
clustering, which occurs when neighboring captured locations tend to
have similar feature values. A global Moran’s I around
E(I) suggests random spatial expression, that is, the absence of a
spatial pattern. A global Moran’s I significantly below
E(I) implies a chessboard-like pattern or dispersion.
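To make this interpretation concrete, here is a small base R sketch that computes global Moran’s I on a toy 6 x 6 grid for a clustered pattern and a chessboard pattern. The helpers `grid_weights()` and `morans_i()` are hypothetical and use rook adjacency with row-standardised weights, which is an assumption made for illustration and not the weight matrix built by SVP.

```r
# Row-standardised rook-adjacency weights on an nr x nc grid (illustrative).
grid_weights <- function(nr, nc) {
  xy <- expand.grid(r = seq_len(nr), c = seq_len(nc))
  d <- as.matrix(dist(xy, method = "manhattan"))
  A <- (d == 1) * 1                       # spots sharing an edge are neighbours
  A / rowSums(A)
}

# Global Moran's I: (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2
morans_i <- function(x, W) {
  z <- x - mean(x)
  (length(x) / sum(W)) * sum(W * outer(z, z)) / sum(z^2)
}

W <- grid_weights(6, 6)
xy <- expand.grid(r = 1:6, c = 1:6)
clustered <- as.numeric(xy$c <= 3)            # left half high, right half low
checker   <- as.numeric((xy$r + xy$c) %% 2)   # chessboard pattern

morans_i(clustered, W)   # well above E(I): spatial clustering
morans_i(checker, W)     # close to -1: dispersion
-1 / (36 - 1)            # E(I) under no spatial pattern for 36 spots
```

The expectation under no spatial pattern is E(I) = -1/(n - 1). For the 262 spots used in this vignette, -1/261 is approximately -0.003831418, which agrees with the expect.moransi column printed above.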
# using Geary's C 
mob_spe <- mob_spe |>
  runDetectSVG(
    gsvaexp = 'CellTypeFree',
    gsvaexp.assay.type = 'prop',
    method = 'geary'
  )
#> The `gsvaexp` was specified, the specified `gsvaExp` will be used to detect
#> 'svg'.
#> Identifying the spatially variable gene sets (pathway) based on gearysc
#> elapsed time is 0.011000 seconds
#> The result is added to the input object, which can be extracted using
#> `svDf()` with type='sv.gearysc' for `SingleCellExperiment` or
#> `SpatialExperiment`.
#> If input object is `SVPExperiment`, and `gsvaexp` is specified
#> the result can be extracted by `gsvaExp()` (return a `SingleCellExperiment`
#> or `SpatialExperiment`),then also using `svDf()` to extract.
gsvaExp(mob_spe, "CellTypeFree") |> svDf('sv.gearysc') |> dplyr::arrange(rank) |> dplyr::filter(padj<=0.01)
#>              obs expect.gearysc sd.gearysc Z.gearysc       pvalue         padj
#> GC     0.4469388              1 0.03641714 15.186841 2.161288e-52 1.080644e-51
#> OSNs   0.6498013              1 0.03725246  9.400686 2.710424e-21 6.776061e-21
#> M/TC   0.7016456              1 0.03663755  8.143405 1.921574e-16 3.202624e-16
#> PGC    0.7040939              1 0.03762023  7.865611 1.836503e-15 2.295628e-15
#> EPL-IN 0.9026806              1 0.03914621  2.486050 6.458500e-03 6.458500e-03
#>        rank
#> GC        1
#> OSNs      2
#> M/TC      3
#> PGC       4
#> EPL-IN    5
Here, obs is Geary’s C. Global Geary’s
C is another popular index of global spatial autocorrelation,
but it focuses more on differences between neighboring values. Global
Geary’s C usually ranges from 0 to 2. A value
significantly below E[C]=1 indicates a distinct spatial
pattern or spatial clustering, a value significantly above
E[C]=1 suggests a chessboard-like pattern or dispersion, and a
value around E[C]=1 implies the absence of a spatial pattern.
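The same reading can be checked with a short base R sketch of global Geary’s C on a toy one-dimensional transect of 20 spots. The helper `gearys_c()` and the chain-shaped neighborhood are assumptions made purely for illustration.

```r
# Chain of 20 spots: each spot neighbours the spots directly before and after.
n <- 20
A <- (abs(outer(1:n, 1:n, "-")) == 1) * 1
W <- A / rowSums(A)                       # row-standardised weights

# Global Geary's C:
# ((n - 1) / (2 * S0)) * sum_ij w_ij (x_i - x_j)^2 / sum_i (x_i - mean(x))^2
gearys_c <- function(x, W) {
  n <- length(x)
  sq_diff <- outer(x, x, "-")^2           # squared differences of all pairs
  ((n - 1) / (2 * sum(W))) * sum(W * sq_diff) / sum((x - mean(x))^2)
}

clustered   <- rep(c(0, 1), each = 10)    # one low block, one high block
alternating <- rep(c(0, 1), times = 10)   # neighbours always differ

gearys_c(clustered, W)     # far below 1: spatial clustering
gearys_c(alternating, W)   # close to 2: dispersion
```

Because C is driven by squared differences between neighbors, it moves in the opposite direction to Moran’s I: similar neighbors push C toward 0, dissimilar neighbors push it toward 2.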
# using Getis-ord's G
mob_spe <- mob_spe |>
  runDetectSVG(
    gsvaexp = 'CellTypeFree',
    gsvaexp.assay.type = 'prop',
    method = 'getis'
  )
#> The `gsvaexp` was specified, the specified `gsvaExp` will be used to detect
#> 'svg'.
#> Identifying the spatially variable gene sets (pathway) based on getisord
#> elapsed time is 0.010000 seconds
#> The result is added to the input object, which can be extracted using
#> `svDf()` with type='sv.getisord' for `SingleCellExperiment` or
#> `SpatialExperiment`.
#> If input object is `SVPExperiment`, and `gsvaexp` is specified
#> the result can be extracted by `gsvaExp()` (return a `SingleCellExperiment`
#> or `SpatialExperiment`),then also using `svDf()` to extract.
gsvaExp(mob_spe, "CellTypeFree") |> svDf('sv.getisord') |> dplyr::arrange(rank) |> dplyr::filter(padj<=0.01) 
#>              obs    expect.G         sd.G       Z.G       pvalue         padj
#> GC   0.006569309 0.003831418 0.0001825856 14.995107 3.951750e-51 1.975875e-50
#> OSNs 0.004891059 0.003831418 0.0001180610  8.975375 1.411936e-19 3.529840e-19
#> M/TC 0.005308030 0.003831418 0.0001833906  8.051731 4.081551e-16 6.802585e-16
#> PGC  0.005382313 0.003831418 0.0002026713  7.652268 9.873269e-15 1.234159e-14
#>      rank
#> GC      1
#> OSNs    2
#> M/TC    3
#> PGC     4
Here, obs is the Getis-Ord’s G of each cell-type.
Global Getis-Ord’s G measures global spatial autocorrelation by
evaluating the clustering of high or low values, specifically
identifying hot spots or cold spots. Global Getis-Ord’s
G values have no fixed range, as they depend on the specific
dataset and spatial weight matrix. A global Getis-Ord’s G
significantly above E(G) indicates a distinct spatial
pattern or spatial clustering, which occurs when neighboring captured
locations tend to have similar feature values. A global Getis-Ord’s
G around E(G) suggests random spatial expression,
that is, the absence of a spatial pattern. A global Getis-Ord’s G
significantly below E(G) implies a chessboard-like pattern or
dispersion.
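A minimal base R sketch of the global G statistic on a toy transect is given here to show how the observed value compares with E(G). The helper `getis_g()` and the chain-shaped neighborhood are illustrative assumptions, not SVP internals.

```r
# Chain of 20 spots with row-standardised neighbour weights (illustrative).
n <- 20
A <- (abs(outer(1:n, 1:n, "-")) == 1) * 1
W <- A / rowSums(A)

# Global Getis-Ord G: sum_{i != j} w_ij x_i x_j / sum_{i != j} x_i x_j
getis_g <- function(x, W) {
  xx <- outer(x, x)
  diag(xx) <- 0                           # drop the i == j terms
  diag(W) <- 0
  sum(W * xx) / sum(xx)
}

clustered   <- rep(c(10, 1), each = 10)   # high values sit next to each other
alternating <- rep(c(10, 1), times = 10)  # high values are never adjacent

e_g <- sum(W) / (n * (n - 1))             # E(G) = S0 / (n (n - 1))
c(clustered = getis_g(clustered, W),
  expected = e_g,
  alternating = getis_g(alternating, W))
```

With row-standardised weights, S0 equals n, so E(G) reduces to 1/(n - 1). For 262 spots this is about 0.003831418, matching the expect.G column in the output above.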
Thus, GC, OSNs, M/TC, and
PGC show significantly stronger spatial clustering
than EPL-IN.
Identification of local spatial aggregation areas
Next, we can identify the spatial high-cluster areas for the spatially
variable features via runLISA, which provides two local
spatial statistics algorithms: Local Getis-Ord’s G and Local
Moran’s I.
mob_spe_cellfree <- mob_spe |> gsvaExp("CellTypeFree")
mob_spe_cellfree <- mob_spe_cellfree |> 
    runPCA(assay.type='prop') |> runTSNE(dimred='PCA')
#> Warning in check_numbers(k = k, nu = nu, nv = nv, limit = min(dim(x)) - : more
#> singular values/vectors requested than available
#> Warning in (function (A, nv = 5, nu = nv, maxit = 1000, work = nv + 7, reorth =
#> TRUE, : You're computing too large a percentage of total singular values, use a
#> standard svd instead.
# Here, we use the `TSNE` space to identify the location clustering regions
celltypefree_lisares <- mob_spe_cellfree |> 
   runLISA(
     features = c("GC", "OSNs", "M/TC", "PGC"), 
     assay.type = 'prop', 
     reduction.used = 'TSNE', 
     weight.method = 'knn', 
     k = 8
   ) 
celltypefree_lisares$GC |> head()
#>                              Gi        E.Gi       Var.Gi      Z.Gi
#> ACAACTATGGGTTGGCGG 0.0004597541 0.003831418 2.238588e-06 -2.253498
#> ACACAGATCCTGTTCTGA 0.0011211572 0.003831418 2.241053e-06 -1.810443
#> ACATCACCTGCGCGCTCT 0.0102338506 0.003831418 2.270093e-06  4.249357
#> ACATTTAAGGCGCATGAT 0.0105317532 0.003831418 2.269617e-06  4.447544
#> ACCACTGTAATCTCCCAT 0.0005077296 0.003831418 2.243052e-06 -2.219221
#> ACCAGAGCCGTTGAGCAA 0.0017501801 0.003831418 2.246703e-06 -1.388509
#>                    Pr (z != E(Gi)) cluster.no.test cluster.test
#> ACAACTATGGGTTGGCGG    2.422777e-02             Low          Low
#> ACACAGATCCTGTTCTGA    7.022704e-02             Low       NoSign
#> ACATCACCTGCGCGCTCT    2.143851e-05            High         High
#> ACATTTAAGGCGCATGAT    8.685760e-06            High         High
#> ACCACTGTAATCTCCCAT    2.647168e-02             Low          Low
#> ACCAGAGCCGTTGAGCAA    1.649820e-01             Low       NoSign
Local Getis-Ord’s G (Gi) measures the aggregation of a feature value at a specific spot and its neighbors, determining whether locations are significantly clustered with high values (hot spots) or low values (cold spots). Gi has no fixed range: a larger Gi indicates stronger clustering of high expression of the feature at the captured location and its neighboring spots, while a smaller Gi indicates stronger clustering of low expression.
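The local statistic can be sketched in base R as well. The example below computes Gi (the variant that excludes the spot itself) and its expectation on a toy transect, without any significance test. The helper `local_gi()`, the data, and the simple High/Low labelling are assumptions for illustration and do not reproduce the runLISA implementation.

```r
# Chain of 20 spots with row-standardised neighbour weights (illustrative).
n <- 20
A <- (abs(outer(1:n, 1:n, "-")) == 1) * 1
W <- A / rowSums(A)

x <- c(rep(1, 14), rep(10, 6))            # a hot region at the end of the chain

# Local Gi for spot i: sum_{j != i} w_ij x_j / sum_{j != i} x_j
local_gi <- function(x, W) {
  diag(W) <- 0                            # exclude the spot itself
  as.vector(W %*% x) / (sum(x) - x)
}

gi   <- local_gi(x, W)
e_gi <- rowSums(W) / (n - 1)              # E(Gi) = sum_j w_ij / (n - 1)

# Label each spot by comparing Gi with its expectation (no test applied).
data.frame(x = x, Gi = round(gi, 4), E.Gi = round(e_gi, 4),
           label = ifelse(gi > e_gi, "High", "Low"))
```

Spots inside the hot region obtain Gi values well above E(Gi), while spots surrounded by low values fall below it. Deciding whether a difference is significant additionally requires the variance of Gi, which is omitted here.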
gsvaExp(mob_spe, 'CellTypeFree') |> 
  plot_lisa_feature(
    lisa.res = celltypefree_lisares, 
    assay.type = 2, 
    pointsize = 2,
    gap_line_width = .18,
    hlpointsize= 1.8
  )
The result of local Getis-Ord for cell-type
The highlighted area signifies that the feature is significantly clustered in this region.
Bivariate spatial statistical analysis
Next, to explore bivariate spatial co-distribution,
SVP implements global and local bivariate spatial
statistics. The global statistic tests whether two variables are
co-clustered across the whole space, while the local statistic
identifies the areas where the two variables co-cluster.
Global bivariate spatial analysis
Here, we use runGLOBALBV of SVP to explore
the spatial co-distribution between the cell-types.
gbv.res <- mob_spe |> 
  gsvaExp('CellTypeFree') |> 
  runGLOBALBV(
    features1 = c("GC", "OSNs", "M/TC", "PGC"),
    assay.type = 2, 
    permutation = NULL, # if permutation is NULL, the Mantel test is used to calculate the p-value
    add.pvalue = TRUE,
    alternative = 'greater' # one-sided test 
  )
  
# gbv.res is a list containing the Lee and pvalue matrices.
gbv.res |> as_tbl_df(diag=FALSE, flag.clust = TRUE) |> 
  dplyr::arrange(desc(Lee))
#> # A tibble: 6 × 4
#>   x     y         Lee   pvalue
#>   <fct> <fct>   <dbl>    <dbl>
#> 1 OSNs  PGC    0.159  1.29e-13
#> 2 M/TC  PGC    0.0325 1.69e- 3
#> 3 M/TC  OSNs  -0.102  9.54e- 1
#> 4 GC    M/TC  -0.200  1.00e+ 0
#> 5 GC    OSNs  -0.229  1   e+ 0
#> 6 GC    PGC   -0.306  1   e+ 0
The Lee value typically ranges from -1 to 1. When the
index is closer to 1, it indicates a strong positive spatial association
between the two variables, meaning the high-value areas of one variable
tend to overlap with the high-value areas of the other, and low-value
areas similarly overlap, showing strong spatial consistency. Conversely,
if the index is near -1, it suggests a strong negative spatial
association, where the high-value areas of one variable tend to overlap
with the low-value areas of the other.
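The behavior described above can be illustrated with a minimal base-R sketch of Lee's L on toy data, assuming row-standardized k-nearest-neighbor weights. The helper `lee_L` and the variables `coords`, `x`, `y_pos`, and `y_neg` are hypothetical names for illustration only. This is not the code used by runGLOBALBV and it computes no p-value.

```r
# Toy spatial layout (hypothetical) with row-standardized kNN weights
set.seed(1)
n <- 60
coords <- cbind(runif(n), runif(n))
k <- 8
d <- as.matrix(dist(coords))
diag(d) <- Inf                       # a spot is not its own neighbor
W <- matrix(0, n, n)
for (i in seq_len(n)) {
  W[i, order(d[i, ])[seq_len(k)]] <- 1 / k
}

# Lee's L: scaled cross-product of the spatial lags of two centered variables
lee_L <- function(x, y, W) {
  xc <- x - mean(x)
  yc <- y - mean(y)
  lx <- as.vector(W %*% xc)          # spatial lag of centered x
  ly <- as.vector(W %*% yc)          # spatial lag of centered y
  (length(x) / sum(rowSums(W)^2)) * sum(lx * ly) / sqrt(sum(xc^2) * sum(yc^2))
}

# x increases from left to right; y_pos follows the same gradient, y_neg the opposite
x     <-  coords[, 1] + rnorm(n, sd = 0.1)
y_pos <-  coords[, 1] + rnorm(n, sd = 0.1)
y_neg <- -coords[, 1] + rnorm(n, sd = 0.1)

lee_L(x, y_pos, W)   # positive: high-value areas overlap
lee_L(x, y_neg, W)   # negative: high-value areas of x meet low-value areas of y
```

Because both variables are smoothed over their neighborhoods before being multiplied, the statistic rewards pairs that are co-located at the scale of the weight matrix rather than only spot by spot.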
We developed the function plot_heatmap_globalbv to
visualize the results of the global bivariate spatial analysis.
# We can also display the result of global univariate spatial analysis 
moran.res <- gsvaExp(mob_spe, 'CellTypeFree') |> svDf("sv.moransi")
# The result of local univariate spatial analysis can also be added
lisa.f1.res <- gsvaExp(mob_spe, "CellTypeFree", withColData = TRUE) |> 
    cal_lisa_f1(lisa.res = celltypefree_lisares, group.by = 'layer', rm.group.nm=NA)
plot_heatmap_globalbv(gbv.res, moran.t=moran.res, lisa.t=lisa.f1.res)
#> Warning: Removed 1 row containing missing values or values outside the scale range
#> (`geom_col()`).
#> Warning: Removed 1 row containing missing values or values outside the scale range
#> (`geom_text()`).
The results of clustering analysis of spatial distribution between cell types
We can also explore spatial co-clustering between features from
different gsvaExp results. For example, we explore the spatial
co-distribution between the CellTypeFree and
Biocarta features of the mob_spe object.
# If the number of features is excessive, we recommend selecting 
# those with spatial variability.
celltypeid <- gsvaExp(mob_spe, "CellTypeFree") |> svDf("sv.moransi") |> 
   dplyr::filter(padj<=0.01) |> rownames()
biocartaid <- gsvaExp(mob_spe, "Biocarta") |> 
   runDetectSVG(assay.type = 1, verbose=FALSE) |> 
   svDf() |> 
   dplyr::filter(padj <= 0.01) |>
   rownames()
#> elapsed time is 0.013000 seconds
runGLOBALBV(
  mob_spe, 
  gsvaexp = c("CellTypeFree", "Biocarta"), 
  gsvaexp.features = c(celltypeid, biocartaid[seq(20)]), 
  gsvaexp.assay.type = c(2, 1),
  permutation = NULL, 
  add.pvalue = TRUE
) -> res.gbv.celltype.biocarta
#> The `gsvaexp` is specified, the specified `gsvaExp` will be
#> used to perform the analysis.
plot_heatmap_globalbv(res.gbv.celltype.biocarta)
The heatmap of spatial correlation between cell type and Biocarta states
We can also explore the co-location between spatially variable genes and cell types.
svgid <- mob_spe |> runDetectSVG(assay.type = 'logcounts') |> svDf() |>
         dplyr::filter(padj <= 0.01) |> dplyr::arrange(rank) |> rownames()
#> Identifying the spatially variable gene sets (pathway) based on moransi
#> elapsed time is 8.209000 seconds
#> The result is added to the input object, which can be extracted using
#> `svDf()` with type='sv.moransi' for `SingleCellExperiment` or
#> `SpatialExperiment`.
#> If input object is `SVPExperiment`, and `gsvaexp` is specified
#> the result can be extracted by `gsvaExp()` (return a `SingleCellExperiment`
#> or `SpatialExperiment`),then also using `svDf()` to extract.
res.gbv.genes.celltype <- runGLOBALBV(mob_spe,
  features1 = svgid[seq(40)],
  gsvaexp = 'CellTypeFree',
  gsvaexp.features = celltypeid,
  gsvaexp.assay.type = 2,
  permutation = NULL,
  add.pvalue = TRUE
)
#> The `gsvaexp` is specified, the specified `gsvaExp` will be
#> used to perform the analysis.
plot_heatmap_globalbv(res.gbv.genes.celltype)
The heatmap of spatial correlation between cell type and spatially variable genes
Local bivariate spatial analysis
The global bivariate spatial analysis shows that some genes (such as
Psd, Nrgn, Kcnh3,
Gpsm1, Pcp4) are significantly
spatially co-aggregated with GC. Next, we use runLOCALBV
of SVP to identify the co-aggregation areas.
lbv.res <- mob_spe |> 
  runLOCALBV(features1=c("Psd", "Nrgn", "Kcnh3", "Gpsm1", "Pcp4"),
    assay.type = 'logcounts',
    gsvaexp = 'CellTypeFree',
    gsvaexp.features=c('GC'),
    gsvaexp.assay.type = 'prop',
    weight.method='knn',
    k = 8
  ) 
#> The `gsvaexp` is specified, the specified `gsvaExp` will be
#> used to perform the analysis with `features` displayed from `rownames(data)`.
# The result can be added to gsvaExp, then visualized by ggsc.
LISAsce(mob_spe, lisa.res=lbv.res, gsvaexp.name='LBV.res') |> 
  gsvaExp("LBV.res") %>% 
  plot_lisa_feature(lisa.res = lbv.res, assay.type = 1)
The heatmap of local bivariate spatial analysis
The highlighted area denotes the region where the two variables significantly co-aggregate in space.
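As an illustration of how a local bivariate statistic can locate such regions, here is a minimal base-R sketch of a local form of Lee's L on toy data, assuming row-standardized k-nearest-neighbor weights. Whether this matches the default method of runLOCALBV is an assumption not verified here. All names (`coords`, `hot`, `x`, `y`, `Li`) are hypothetical, and no significance test is performed.

```r
# Toy layout (hypothetical): two features that are both elevated in one corner
set.seed(1)
n <- 100
coords <- cbind(runif(n), runif(n))
hot <- coords[, 1] > 0.5 & coords[, 2] > 0.5
x <- rnorm(n) + 3 * hot
y <- rnorm(n) + 3 * hot

# Row-standardized kNN weights
k <- 8
d <- as.matrix(dist(coords))
diag(d) <- Inf                       # a spot is not its own neighbor
W <- matrix(0, n, n)
for (i in seq_len(n)) {
  W[i, order(d[i, ])[seq_len(k)]] <- 1 / k
}

# Local Lee's L_i: product of the two spatial lags at spot i, scaled by n
xc <- x - mean(x)
yc <- y - mean(y)
lx <- as.vector(W %*% xc)
ly <- as.vector(W %*% yc)
denom <- sqrt(sum(xc^2) * sum(yc^2))
Li <- n * lx * ly / denom

# With rows of W summing to 1, the global L is the mean of the local values
L_global <- sum(lx * ly) / denom

# Spots with the largest local values are candidate co-aggregation areas
head(order(Li, decreasing = TRUE))
```

The local values decompose the global statistic spot by spot, so mapping them shows where in the tissue a global association comes from.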
Session information
Here is the output of sessionInfo() on the system on
which this document was compiled:
#> R version 4.5.0 Patched (2025-04-21 r88169)
#> Platform: aarch64-apple-darwin20
#> Running under: macOS Ventura 13.7.1
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRblas.0.dylib 
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1
#> 
#> locale:
#> [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> time zone: America/New_York
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] scater_1.37.0               STexampleData_1.17.0       
#>  [3] ExperimentHub_2.99.0        AnnotationHub_3.99.0       
#>  [5] BiocFileCache_2.99.0        dbplyr_2.5.0               
#>  [7] scatterpie_0.2.4            magrittr_2.0.3             
#>  [9] ggplot2_3.5.2               SVP_1.1.0                  
#> [11] scran_1.37.0                scuttle_1.19.0             
#> [13] ggsc_1.7.0                  SpatialExperiment_1.19.0   
#> [15] SingleCellExperiment_1.31.0 SummarizedExperiment_1.39.0
#> [17] Biobase_2.69.0              GenomicRanges_1.61.0       
#> [19] GenomeInfoDb_1.45.0         IRanges_2.43.0             
#> [21] S4Vectors_0.47.0            BiocGenerics_0.55.0        
#> [23] generics_0.1.3              MatrixGenerics_1.21.0      
#> [25] matrixStats_1.5.0          
#> 
#> loaded via a namespace (and not attached):
#>   [1] fs_1.6.6                  spatstat.sparse_3.1-0    
#>   [3] httr_1.4.7                RColorBrewer_1.1-3       
#>   [5] tools_4.5.0               sctransform_0.4.1        
#>   [7] utf8_1.2.4                R6_2.6.1                 
#>   [9] lazyeval_0.2.2            uwot_0.2.3               
#>  [11] withr_3.0.2               sp_2.2-0                 
#>  [13] gridExtra_2.3             progressr_0.15.1         
#>  [15] cli_3.6.5                 spatstat.explore_3.4-2   
#>  [17] fastDummies_1.7.5         labeling_0.4.3           
#>  [19] sass_0.4.10               Seurat_5.3.0             
#>  [21] spatstat.data_3.1-6       ggridges_0.5.6           
#>  [23] pbapply_1.7-2             yulab.utils_0.2.0        
#>  [25] dichromat_2.0-0.1         parallelly_1.43.0        
#>  [27] limma_3.65.0              RSQLite_2.3.9            
#>  [29] gridGraphics_0.5-1        ica_1.0-3                
#>  [31] spatstat.random_3.3-3     dplyr_1.1.4              
#>  [33] Matrix_1.7-3              ggbeeswarm_0.7.2         
#>  [35] abind_1.4-8               lifecycle_1.0.4          
#>  [37] yaml_2.3.10               edgeR_4.7.0              
#>  [39] SparseArray_1.9.0         Rtsne_0.17               
#>  [41] grid_4.5.0                blob_1.2.4               
#>  [43] promises_1.3.2            dqrng_0.4.1              
#>  [45] crayon_1.5.3              miniUI_0.1.2             
#>  [47] lattice_0.22-7            beachmat_2.25.0          
#>  [49] cowplot_1.1.3             KEGGREST_1.49.0          
#>  [51] magick_2.8.6              pillar_1.10.2            
#>  [53] knitr_1.50                metapod_1.17.0           
#>  [55] rjson_0.2.23              future.apply_1.11.3      
#>  [57] codetools_0.2-20          fastmatch_1.1-6          
#>  [59] glue_1.8.0                ggfun_0.1.8              
#>  [61] spatstat.univar_3.1-2     data.table_1.17.0        
#>  [63] vctrs_0.6.5               png_0.1-8                
#>  [65] treeio_1.33.0             spam_2.11-1              
#>  [67] gtable_0.3.6              cachem_1.1.0             
#>  [69] xfun_0.52                 S4Arrays_1.9.0           
#>  [71] mime_0.13                 pracma_2.4.4             
#>  [73] survival_3.8-3            statmod_1.5.0            
#>  [75] bluster_1.19.0            fitdistrplus_1.2-2       
#>  [77] ROCR_1.0-11               nlme_3.1-168             
#>  [79] ggtree_3.17.0             bit64_4.6.0-1            
#>  [81] filelock_1.0.3            RcppAnnoy_0.0.22         
#>  [83] bslib_0.9.0               irlba_2.3.5.1            
#>  [85] vipor_0.4.7               KernSmooth_2.23-26       
#>  [87] colorspace_2.1-1          DBI_1.2.3                
#>  [89] tidyselect_1.2.1          bit_4.6.0                
#>  [91] compiler_4.5.0            curl_6.2.2               
#>  [93] httr2_1.1.2               BiocNeighbors_2.3.0      
#>  [95] DelayedArray_0.35.1       plotly_4.10.4            
#>  [97] scales_1.4.0              lmtest_0.9-40            
#>  [99] rappdirs_0.3.3            stringr_1.5.1            
#> [101] digest_0.6.37             goftest_1.2-3            
#> [103] spatstat.utils_3.1-3      rmarkdown_2.29           
#> [105] XVector_0.49.0            htmltools_0.5.8.1        
#> [107] pkgconfig_2.0.3           sparseMatrixStats_1.21.0 
#> [109] fastmap_1.2.0             rlang_1.1.6              
#> [111] htmlwidgets_1.6.4         UCSC.utils_1.5.0         
#> [113] shiny_1.10.0              prettydoc_0.4.1          
#> [115] DelayedMatrixStats_1.31.0 farver_2.1.2             
#> [117] jquerylib_0.1.4           zoo_1.8-14               
#> [119] jsonlite_2.0.0            BiocParallel_1.43.0      
#> [121] BiocSingular_1.25.0       GenomeInfoDbData_1.2.14  
#> [123] ggplotify_0.1.2           dotCall64_1.2            
#> [125] patchwork_1.3.0           Rcpp_1.0.14              
#> [127] viridis_0.6.5             ape_5.8-1                
#> [129] reticulate_1.42.0         stringi_1.8.7            
#> [131] ggstar_1.0.4              MASS_7.3-65              
#> [133] plyr_1.8.9                parallel_4.5.0           
#> [135] listenv_0.9.1             ggrepel_0.9.6            
#> [137] deldir_2.0-4              Biostrings_2.77.0        
#> [139] splines_4.5.0             tensor_1.5               
#> [141] locfit_1.5-9.12           igraph_2.1.4             
#> [143] spatstat.geom_3.3-6       RcppHNSW_0.6.0           
#> [145] reshape2_1.4.4            ScaledMatrix_1.17.0      
#> [147] BiocVersion_3.22.0        evaluate_1.0.3           
#> [149] SeuratObject_5.1.0        RcppParallel_5.1.10      
#> [151] BiocManager_1.30.25       tweenr_2.0.3             
#> [153] httpuv_1.6.16             RANN_2.6.2               
#> [155] tidyr_1.3.1               purrr_1.0.4              
#> [157] polyclip_1.10-7           future_1.40.0            
#> [159] scattermore_1.2           ggforce_0.4.2            
#> [161] rsvd_1.0.5                xtable_1.8-4             
#> [163] RSpectra_0.16-2           tidytree_0.4.6           
#> [165] tidydr_0.0.5              later_1.4.2              
#> [167] viridisLite_0.4.2         tibble_3.2.1             
#> [169] aplot_0.2.5               beeswarm_0.4.0           
#> [171] AnnotationDbi_1.71.0      memoise_2.0.1            
#> [173] cluster_2.1.8.1           globals_0.17.0