SpiecEasi now uses the pulsar package as the backend for model selection. With the default parameter settings, this runs the same StARS procedure as previous versions.
As in the previous version of SpiecEasi, we can supply the
ncores argument to the pulsar.params list to break up the
subsampled computations into parallel tasks.
Note: On Windows systems,
mc.cores > 1 is not supported by default. For Windows
users, we recommend using the snow cluster type or running
in serial mode.
In this example, we set the random seed so that results can be compared consistently across experiments:
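## Load the package and the example dataset used throughout ##
library(SpiecEasi)
data(amgut1.filt)
## Default settings ##
pargs1 <- list(rep.num=50, seed=10010)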
t1 <- system.time(
se1 <- spiec.easi(amgut1.filt, method='mb', lambda.min.ratio=1e-3, nlambda=30,
sel.criterion='stars', pulsar.select=TRUE, pulsar.params=pargs1)
)
## Platform-specific or environment-specific parallel processing ##
if (.Platform$OS.type == "windows" || Sys.getenv("CI") == "true" || nzchar(Sys.getenv("GITHUB_ACTIONS"))) {
  # On Windows or in a CI environment, run in serial mode
  pargs2 <- list(rep.num=50, seed=10010, ncores=1) # Serial for Windows/CI
  cat("Running on Windows or CI - using serial processing\n")
} else {
  # On Unix-like systems, use multicore
  n_cores <- min(2, parallel::detectCores())
  pargs2 <- list(rep.num=50, seed=10010, ncores=n_cores)
  cat("Running on Unix-like system - using parallel processing\n")
}
t2 <- system.time(
se2 <- spiec.easi(amgut1.filt, method='mb', lambda.min.ratio=1e-3, nlambda=30,
sel.criterion='stars', pulsar.select=TRUE, pulsar.params=pargs2)
)
We can further speed up StARS using the bounded-StARS (‘bstars’) method. The B-StARS approach computes network stability across the whole lambda path, but only for the first 2 subsamples. This is used to build an initial estimate of the summary statistic, which in turn gives us a lower/upper bound on the optimal lambda. The remaining subsamples are used to compute the stability over the restricted path. Since denser networks are more computationally expensive to estimate, this can significantly reduce computation time for datasets with many variables.
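To make the "summary statistic" above concrete, here is a minimal base-R sketch of a StARS-style instability score over a lambda path. It is a toy illustration, not pulsar's implementation: `toy_graph` (a thresholded correlation matrix), the simulated data, the 80% subsample ratio, and the threshold value are all assumptions made for the example. B-StARS would compute this statistic from only two subsamples first in order to bracket the optimal lambda; the bounds themselves are not shown here.

```r
# Toy sketch of a StARS-style summary statistic (base R only; not pulsar's code).
set.seed(10010)
n <- 60; p <- 8
X <- matrix(rnorm(n * p), n, p)
X[, 2] <- X[, 1] + rnorm(n, sd = 0.3)  # plant one strong association

# Hypothetical stand-in for a graph estimator: an edge wherever |cor| > lambda
toy_graph <- function(x, lambda) {
  adj <- abs(cor(x)) > lambda
  diag(adj) <- FALSE
  adj
}

lambdas <- seq(0.9, 0.1, length.out = 10)  # ordered from sparse to dense
rep_num <- 20                              # number of subsamples
b       <- floor(0.8 * n)                  # subsample size (assumed ratio)
subsamples <- replicate(rep_num, sample(n, b), simplify = FALSE)

instability <- vapply(lambdas, function(lam) {
  graphs <- lapply(subsamples, function(idx) toy_graph(X[idx, , drop = FALSE], lam))
  freq   <- Reduce(`+`, graphs) / rep_num  # per-edge selection frequency
  theta  <- freq[upper.tri(freq)]
  mean(2 * theta * (1 - theta))            # average edge instability
}, numeric(1))

# Monotonize along the path, then pick the densest graph under the threshold
thresh     <- 0.05                         # illustrative value
monotone   <- cummax(instability)
ok         <- which(monotone <= thresh)
opt_lambda <- if (length(ok) > 0) lambdas[max(ok)] else NA_real_
```

Each lambda requires re-estimating the graph on every subsample, which is why restricting the path before running the remaining subsamples can save so much time.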
t3 <- system.time(
se3 <- spiec.easi(amgut1.filt, method='mb', lambda.min.ratio=1e-3, nlambda=30,
sel.criterion='bstars', pulsar.select=TRUE, pulsar.params=pargs1)
)
t4 <- system.time(
se4 <- spiec.easi(amgut1.filt, method='mb', lambda.min.ratio=1e-3, nlambda=30,
sel.criterion='bstars', pulsar.select=TRUE, pulsar.params=pargs2)
)
We can see that in addition to the computational savings, the refit networks are identical:
## serial vs parallel
identical(getRefit(se1), getRefit(se2))
# [1] TRUE
t1[3] > t2[3]
# elapsed
# TRUE
## stars vs bstars
identical(getRefit(se1), getRefit(se3))
# [1] TRUE
t1[3] > t3[3]
# elapsed
# TRUE
identical(getRefit(se2), getRefit(se4))
# [1] TRUE
t2[3] > t4[3]
# elapsed
# TRUE
For Windows users, there are several options for parallel processing:
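# For Windows users who want parallel processing
library(parallel)
# type = "SOCK" requires the snow package to be installed
cl <- makeCluster(4, type = "SOCK")
# Note: this snippet does not pass `cl` to spiec.easi(), so the call
# below may still run serially
pargs.windows <- list(rep.num=50, seed=10010)
se.windows <- spiec.easi(amgut1.filt, method='mb', lambda.min.ratio=1e-3, nlambda=30,
sel.criterion='stars', pulsar.select=TRUE, pulsar.params=pargs.windows)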
stopCluster(cl)
Pulsar gives us the option of running stability selection in batch mode, using the batchtools package. This will be useful to anyone with access to an HPC or distributed computing system. Each subsample will be independently executed using a system-specific cluster function.
This requires an external config file which instructs the
batchtools registry how to construct the cluster function that will
execute the individual jobs. batch.pulsar has some built-in
config files that are useful for testing purposes (serial mode,
“parallel”, “snow”, etc.), but it is advisable to create your own config
file and pass in the absolute path. See the batchtools
docs for instructions on how to construct config and template
files (e.g. to interact with a queueing system such as TORQUE or
SGE).
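As a rough sketch of what a custom config file might look like, the snippet below writes a one-line batchtools config to a temporary location and builds the matching parameter list. It assumes that a batchtools config file is an R script that assigns `cluster.functions`; the file name, the choice of `makeClusterFunctionsSocket`, and `ncpus = 2` are illustrative assumptions, so check the batchtools documentation for the cluster function that matches your scheduler.

```r
# Write a hypothetical batchtools config file (file name is illustrative)
conf_path <- file.path(tempdir(), "batchtools.conf.R")
writeLines(c(
  "# Assumed setup: run jobs on two local socket workers",
  "cluster.functions <- batchtools::makeClusterFunctionsSocket(ncpus = 2)"
), conf_path)

# Pass the absolute path to the config file via pulsar.params
bargs_custom <- list(rep.num = 50, seed = 10010,
                     conffile = normalizePath(conf_path))
```

The resulting `bargs_custom` list would then be supplied as `pulsar.params` together with `pulsar.select='batch'`.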
## bargs <- list(rep.num=50, seed=10010, conffile="path/to/conf.txt")
bargs <- list(rep.num=50, seed=10010, conffile="parallel")
## See where the config file is stored:
pulsar::findConfFile('parallel')
## uncomment line below to turn off batchtools reporting
# options(batchtools.verbose=FALSE)
se5 <- spiec.easi(amgut1.filt, method='mb', lambda.min.ratio=1e-3, nlambda=30,
sel.criterion='stars', pulsar.select='batch', pulsar.params=bargs)
Let’s compare the performance of different approaches:
# Compare timing
cat("Serial StARS:", t1[3], "seconds\n")
# Serial StARS: 360.816 seconds
cat("Platform-specific StARS:", t2[3], "seconds\n")
# Platform-specific StARS: 244.183 seconds
cat("Serial B-StARS:", t3[3], "seconds\n")
# Serial B-StARS: 79.431 seconds
cat("Platform-specific B-StARS:", t4[3], "seconds\n")
# Platform-specific B-StARS: 61.065 seconds
# Speedup factors (only meaningful on Unix-like systems)
if (.Platform$OS.type != "windows") {
cat("Parallel speedup (StARS):", t1[3]/t2[3], "\n")
cat("B-StARS speedup (serial):", t1[3]/t3[3], "\n")
cat("B-StARS speedup (parallel):", t2[3]/t4[3], "\n")
}
# Parallel speedup (StARS): 1.477646
# B-StARS speedup (serial): 4.542509
# B-StARS speedup (parallel): 3.998739
rep.num: Number of subsamples for stability selection (default: 50)
ncores: Number of cores for parallel processing (use 1 on Windows)
sel.criterion: Selection criterion (‘stars’ or ‘bstars’)
pulsar.select: Whether to use pulsar for model selection
pulsar.params: List of parameters passed to pulsar
Session info:
sessionInfo()
# R version 4.5.2 (2025-10-31)
# Platform: x86_64-pc-linux-gnu
# Running under: Ubuntu 24.04.3 LTS
#
# Matrix products: default
# BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
# LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#
# locale:
# [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
# [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
# [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
# [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
# [9] LC_ADDRESS=C LC_TELEPHONE=C
# [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#
# time zone: Etc/UTC
# tzcode source: system (glibc)
#
# attached base packages:
# [1] stats graphics grDevices utils datasets methods base
#
# other attached packages:
# [1] phyloseq_1.55.0 igraph_2.2.1 Matrix_1.7-4 SpiecEasi_1.99.3
# [5] BiocStyle_2.39.0
#
# loaded via a namespace (and not attached):
# [1] gtable_0.3.6 shape_1.4.6.1 xfun_0.56
# [4] bslib_0.9.0 ggplot2_4.0.1 rhdf5_2.55.12
# [7] Biobase_2.71.0 lattice_0.22-7 vctrs_0.7.0
# [10] rhdf5filters_1.23.3 tools_4.5.2 generics_0.1.4
# [13] biomformat_1.39.0 stats4_4.5.2 parallel_4.5.2
# [16] cluster_2.1.8.1 pkgconfig_2.0.3 huge_1.3.5
# [19] data.table_1.18.0 RColorBrewer_1.1-3 S7_0.2.1
# [22] S4Vectors_0.49.0 lifecycle_1.0.5 compiler_4.5.2
# [25] farver_2.1.2 stringr_1.6.0 Biostrings_2.79.4
# [28] Seqinfo_1.1.0 codetools_0.2-20 permute_0.9-8
# [31] htmltools_0.5.9 sys_3.4.3 buildtools_1.0.0
# [34] sass_0.4.10 yaml_2.3.12 glmnet_4.1-10
# [37] crayon_1.5.3 jquerylib_0.1.4 MASS_7.3-65
# [40] cachem_1.1.0 vegan_2.7-2 iterators_1.0.14
# [43] foreach_1.5.2 nlme_3.1-168 digest_0.6.39
# [46] stringi_1.8.7 reshape2_1.4.5 labeling_0.4.3
# [49] maketools_1.3.2 splines_4.5.2 ade4_1.7-23
# [52] fastmap_1.2.0 grid_4.5.2 cli_3.6.5
# [55] magrittr_2.0.4 survival_3.8-6 ape_5.8-1
# [58] withr_3.0.2 scales_1.4.0 rmarkdown_2.30
# [61] XVector_0.51.0 multtest_2.67.0 pulsar_0.3.11
# [64] VGAM_1.1-14 evaluate_1.0.5 knitr_1.51
# [67] IRanges_2.45.0 mgcv_1.9-4 rlang_1.1.7
# [70] Rcpp_1.1.1 glue_1.8.0 BiocManager_1.30.27
# [73] BiocGenerics_0.57.0 jsonlite_2.0.0 R6_2.6.1
# [76] Rhdf5lib_1.33.0 plyr_1.8.9