SpiecEasi 1.99.3
Sparse InversE Covariance estimation for Ecological Association and Statistical Inference
This package is intended for anyone who wants to infer graphical models from compositional data, and was designed primarily for microbiome relative abundance data (generated from 16S amplicon sequence data). It also includes a generator for overdispersed, zero-inflated, multivariate, correlated count data. Please see the paper published in PLoS Computational Biology.
One small point on notation: we refer to the method as “SPIEC-EASI” and reserve “SpiecEasi” for this package.
To install SpiecEasi from Bioconductor:
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("SpiecEasi")
For development versions, you can install from GitHub:
if (!require("remotes", quietly = TRUE))
    install.packages("remotes")
remotes::install_github("zdk123/SpiecEasi")
This package includes several vignettes covering different aspects of SpiecEasi:
Let's simulate some multivariate data under a zero-inflated negative binomial model, based on (high depth/count) round 1 of the American Gut Project, with a sparse network. The basic steps are:
Obviously, for real data, skip 1-4.
Session info:
sessionInfo()
# R Under development (unstable) (2025-10-21 r88958)
# Platform: x86_64-apple-darwin20
# Running under: macOS Ventura 13.7.8
#
# Matrix products: default
# BLAS: /Library/Frameworks/R.framework/Versions/4.6-x86_64/Resources/lib/libRblas.0.dylib
# LAPACK: /Library/Frameworks/R.framework/Versions/4.6-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.1
#
# locale:
# [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#
# time zone: America/New_York
# tzcode source: internal
#
# attached base packages:
# [1] stats graphics grDevices utils datasets methods base
#
# other attached packages:
# [1] BiocStyle_2.39.0
#
# loaded via a namespace (and not attached):
# [1] digest_0.6.38 R6_2.6.1 bookdown_0.45
# [4] fastmap_1.2.0 xfun_0.54 cachem_1.1.0
# [7] knitr_1.50 htmltools_0.5.8.1 rmarkdown_2.30
# [10] lifecycle_1.0.4 cli_3.6.5 sass_0.4.10
# [13] jquerylib_0.1.4 compiler_4.6.0 tools_4.6.0
# [16] evaluate_1.0.5 bslib_0.9.0 yaml_2.3.10
# [19] BiocManager_1.30.27 jsonlite_2.0.0 rlang_1.1.6
Setup:
library(SpiecEasi)
data(amgut1.filt)
depths <- rowSums(amgut1.filt)
amgut1.filt.n <- t(apply(amgut1.filt, 1, norm_to_total))
amgut1.filt.cs <- round(amgut1.filt.n * min(depths))
d <- ncol(amgut1.filt.cs)
n <- nrow(amgut1.filt.cs)
e <- d
Synthesize the data:
set.seed(10010)
graph <- SpiecEasi::make_graph('cluster', d, e)
Prec <- graph2prec(graph)
Cor <- cov2cor(prec2cov(Prec))
X <- synth_comm_from_counts(amgut1.filt.cs, mar=2, distr='zinegbin', Sigma=Cor, n=n)
The main SPIEC-EASI pipeline consists of data transformation, sparse inverse covariance estimation, and model selection:
se <- spiec.easi(X, method='mb', lambda.min.ratio=1e-2, nlambda=15)
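The data transformation step is a centered log-ratio (CLR) transform of each sample. The following base R sketch shows the idea on a toy count table. The helper name `clr_sketch` and the pseudocount of 1 are assumptions made for illustration and are not taken from the package internals.

```r
# Hypothetical helper: centered log-ratio transform of one sample
# (a vector of counts). The pseudocount is an assumption to avoid log(0).
clr_sketch <- function(counts, pseudocount = 1) {
  logx <- log(counts + pseudocount)
  logx - mean(logx)   # subtract the mean log, i.e. the log geometric mean
}

# Toy count table: 3 samples (rows) by 4 taxa (columns)
toy <- matrix(c(10, 0, 5, 85,
                20, 3, 0, 77,
                 1, 1, 1, 97), nrow = 3, byrow = TRUE)

# apply() over rows returns taxa in rows, so transpose back to samples x taxa
toy.clr <- t(apply(toy, 1, clr_sketch))

# Each transformed sample sums to (numerically) zero
rowSums(toy.clr)
```

The zero-sum constraint on every row is what distinguishes CLR-transformed data from raw relative abundances.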
Examine ROC over lambda path and PR over the stars index for the selected graph:
huge::huge.roc(se$est$path, graph, verbose=FALSE)
# True Postive Rate: from 0.8818898 to 0.8818898
# False Positive Rate: from 0.004191008 to 0.004191008
# Area under Curve: 0
# Maximum F1 Score: 0.9351559
stars.pr(getOptMerge(se), graph, verbose=FALSE)
# True Postive Rate: from 0 to 0.02362205
# False Positive Rate: from 0 to 0
# Area under Curve: 0.511811
# Maximum F1 Score: 0.04615385
# stars selected final network under: getRefit(se)
The above example does not cover all possible options and parameters. For example, other generative network models are available, the lambda.min.ratio (the scaling factor that determines the minimum sparsity/lambda parameter) shown here might not be right for your dataset, and it’s possible that you’ll want more repetitions (number of subsamples) for StARS.
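To make the role of lambda.min.ratio concrete, here is a small base R sketch of how a penalty path could be built from it. The value of `lambda.max` is made up, and the log-spaced grid is an assumption for illustration; the package may construct its path differently.

```r
# Hypothetical largest penalty, at which the graph would be empty
lambda.max       <- 0.8
lambda.min.ratio <- 1e-2   # same ratio as in the example above
nlambda          <- 15

# The smallest penalty is a fixed fraction of the largest one
lambda.min <- lambda.min.ratio * lambda.max

# Assumed log-spaced grid from the sparsest to the densest end of the path
lambda.path <- exp(seq(log(lambda.max), log(lambda.min), length.out = nlambda))
range(lambda.path)
```

A smaller ratio extends the path toward denser graphs, which matters when the selected network sits at the dense end of the path.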
Now let’s apply SpiecEasi directly to the American Gut data. Remember that normalization is performed internally by the spiec.easi function. Also, we should use a larger number of StARS repetitions for real data. Arguments to the inner StARS selection function can be passed as a list via the pulsar.params parameter. If you have more than one processor available, you can also supply a number to ncores. Finally, let’s compare results from the MB and glasso methods as well as SparCC (correlation).
Note: On Windows systems, mc.cores > 1 is not supported by default. For Windows users, we recommend using ncores=1 for serial processing or the snow cluster type for parallel processing. See the pulsar-parallel vignette for detailed Windows-specific guidance.
se.mb.amgut <- spiec.easi(amgut1.filt, method='mb', lambda.min.ratio=1e-2,
nlambda=10, pulsar.params=list(rep.num=20))
se.gl.amgut <- spiec.easi(amgut1.filt, method='glasso', lambda.min.ratio=1e-2,
nlambda=10, pulsar.params=list(rep.num=20))
sparcc.amgut <- sparcc(amgut1.filt)
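As a minimal sketch of the ncores option mentioned above, the parameter list can be built so that it falls back to serial processing on Windows. The specific values (50 repetitions, all but one core) are illustrative choices, not recommendations from the package authors.

```r
# Use one core on Windows; elsewhere leave one core free.
# detectCores() can return NA, so na.rm = TRUE guards against that.
ncores <- if (.Platform$OS.type == "windows") {
  1L
} else {
  max(1L, parallel::detectCores() - 1L, na.rm = TRUE)
}

# Illustrative StARS settings: more subsamples plus the core count
pargs <- list(rep.num = 50, ncores = ncores)

# The list would then be passed on, for example:
# se.mb.amgut <- spiec.easi(amgut1.filt, method='mb', lambda.min.ratio=1e-2,
#                           nlambda=10, pulsar.params=pargs)
```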
Create igraph objects for visualization:
## Define arbitrary threshold for SparCC correlation matrix for the graph
sparcc.graph <- abs(sparcc.amgut$Cor) >= 0.3
diag(sparcc.graph) <- 0
library(Matrix)
#
# Attaching package: 'Matrix'
# The following objects are masked from 'package:SpiecEasi':
#
# tril, triu
sparcc.graph <- Matrix(sparcc.graph, sparse=TRUE)
## Create igraph objects
ig.mb <- adj2igraph(getRefit(se.mb.amgut))
ig.gl <- adj2igraph(getRefit(se.gl.amgut))
ig.sparcc <- adj2igraph(sparcc.graph)
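Because the 0.3 cutoff for SparCC is arbitrary, it can help to see how strongly the edge count depends on it. The sketch below uses a simulated matrix as a stand-in for `sparcc.amgut$Cor`; the toy data and the threshold grid are assumptions for illustration only.

```r
set.seed(1)
# Toy symmetric correlation matrix standing in for sparcc.amgut$Cor
toy.cor <- cor(matrix(rnorm(50 * 8), nrow = 50))

thresholds <- c(0.1, 0.2, 0.3, 0.4)
n.edges <- sapply(thresholds, function(th) {
  adj <- abs(toy.cor) >= th
  diag(adj) <- FALSE   # no self-loops
  sum(adj) / 2         # each undirected edge is counted twice
})

data.frame(threshold = thresholds, edges = n.edges)
```

Running the same loop on the real correlation matrix gives a quick sense of how sensitive the SparCC graph is to the chosen cutoff.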
Visualize using igraph plotting:
library(igraph)
library(Matrix)
## set size of vertex proportional to clr-mean
vsize <- rowMeans(clr(amgut1.filt, 1))+6
am.coord <- layout_with_fr(ig.mb)
par(mfrow=c(1,3))
plot(ig.mb, layout=am.coord, vertex.size=vsize, vertex.label=NA, main="MB")
plot(ig.gl, layout=am.coord, vertex.size=vsize, vertex.label=NA, main="glasso")
plot(ig.sparcc, layout=am.coord, vertex.size=vsize, vertex.label=NA, main="sparcc")
We can evaluate the edge weights of the networks using the terms from the underlying model. SparCC correlations can be used directly, while SpiecEasi networks need to be massaged a bit. Note that since SPIEC-EASI is based on penalized estimators, the edge weights are not directly comparable to SparCC (or Pearson/Spearman correlation coefficients):
# Create edge lists for weight comparison
secor <- cov2cor(getOptCov(se.gl.amgut))
sebeta <- symBeta(getOptBeta(se.mb.amgut), mode='maxabs')
elist.gl <- summary(triu(secor*getRefit(se.gl.amgut), k=1))
elist.mb <- summary(sebeta)
elist.sparcc <- summary(sparcc.graph*sparcc.amgut$Cor)
# Plot edge weight distributions
hist(elist.sparcc[,3], main='', xlab='edge weights')
hist(elist.mb[,3], add=TRUE, col='forestgreen')
hist(elist.gl[,3], add=TRUE, col='red')
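The MB method yields an asymmetric coefficient matrix, since each pair of nodes appears in two separate regressions. The 'maxabs' mode used above symmetrizes it. The following base R sketch shows that idea on a made-up 3 x 3 matrix; it illustrates the rule and is not the package's implementation.

```r
# Toy asymmetric coefficient matrix: B[i, j] and B[j, i] come from
# different regressions and generally disagree.
B <- matrix(c( 0.0, 0.5, 0.0,
              -0.2, 0.0, 0.0,
               0.0, 0.3, 0.0), nrow = 3, byrow = TRUE)
Bt <- t(B)

# For each pair, keep whichever of B[i, j] and B[j, i] is larger
# in absolute value
B.sym <- ifelse(abs(B) >= abs(Bt), B, Bt)

isSymmetric(B.sym)
```

Here the pair (1, 2) keeps 0.5 rather than -0.2, and the pair (2, 3) keeps 0.3 rather than 0.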
Let's look at the degree statistics from the networks inferred by each method:
dd.gl <- degree_distribution(ig.gl)
dd.mb <- degree_distribution(ig.mb)
dd.sparcc <- degree_distribution(ig.sparcc)
plot(0:(length(dd.sparcc)-1), dd.sparcc, ylim=c(0,.35), type='b',
ylab="Frequency", xlab="Degree", main="Degree Distributions")
points(0:(length(dd.gl)-1), dd.gl, col="red" , type='b')
points(0:(length(dd.mb)-1), dd.mb, col="forestgreen", type='b')
legend("topright", c("MB", "glasso", "sparcc"),
col=c("forestgreen", "red", "black"), pch=1, lty=1)
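For readers who want to see what a degree distribution is computed from, here is a base R sketch on a toy adjacency matrix. The five-node graph is made up for illustration.

```r
# Toy undirected graph on 5 nodes; node 5 is isolated
adj <- matrix(0, 5, 5)
adj[1, 2] <- adj[1, 3] <- adj[1, 4] <- adj[2, 3] <- 1
adj <- adj + t(adj)

# Degree of each node is the number of nonzero entries in its row
deg <- rowSums(adj)

# Relative frequency of degrees 0, 1, ..., max(deg)
dd <- tabulate(deg + 1, nbins = max(deg) + 1) / length(deg)
names(dd) <- 0:max(deg)
dd
```

The first entry corresponds to degree 0, which is why the plots above use `0:(length(dd) - 1)` on the x-axis.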
For more advanced usage, please refer to the other vignettes: