hammers 0.99.9
hammers is a utilities suite for scRNA-seq data analysis. It provides simple
tools to address tasks such as retrieving aggregate gene statistics,
finding and removing rare genes, performing representation analysis,
computing the center of mass for the expression of a gene of interest in
low-dimensional space, and calculating silhouette and cluster-normalized
silhouette.
To install hammers, run the following commands in an R session:
if (!require("BiocManager", quietly=TRUE))
install.packages("BiocManager")
BiocManager::install("hammers")
This tutorial uses an scRNA-seq human pancreas dataset. After loading the
required packages, download the dataset using the BaronPancreasData function
from scRNAseq. The dataset will be stored as a SingleCellExperiment object.
library(hammers)
library(scLang)
library(scRNAseq)
library(scater)
sceObj <- BaronPancreasData('human')
We will first normalize and
log-transform the data using the logNormCounts function from scuttle
(loaded automatically with scater).
sceObj <- logNormCounts(sceObj)
We will also need PCA and UMAP dimensions. These can be computed using the
runPCA and runUMAP functions from scater:
sceObj <- runPCA(sceObj)
sceObj <- runUMAP(sceObj)
The genePresence function extracts the genes that appear in at
least minCutoff and at most maxCutoff cells. It returns a data frame
listing the number of cells in which each gene that meets these criteria is
expressed:
df <- genePresence(sceObj, rownames(sceObj)[seq(100)], 300, 2000)
dim(df)
#> [1] 40 2
head(df)
#> Gene nCells
#> AATF AATF 1755
#> A1CF A1CF 1408
#> AACS AACS 1385
#> AAGAB AAGAB 1328
#> ABHD3 ABHD3 1220
#> AAAS AAAS 1216
The geneCellSets function lists all the cells in which selected genes are
expressed:
cellSets <- geneCellSets(sceObj, rownames(sceObj)[seq(100)])
head(cellSets[[1]])
#> [1] "human1_lib1.final_cell_0024" "human1_lib1.final_cell_0052"
#> [3] "human1_lib1.final_cell_0054" "human1_lib1.final_cell_0188"
#> [5] "human1_lib1.final_cell_0242" "human1_lib1.final_cell_0294"
findRareGenes is a wrapper around genePresence that can extract the
genes that appear in fewer than nCells cells:
df <- findRareGenes(sceObj, nCells=10)
#> Warning in asMethod(object): sparse->dense coercion: allocating vector of size
#> 1.3 GiB
dim(df)
#> [1] 5008 2
head(df)
#> Gene nCells
#> ADRA1D ADRA1D 9
#> ALAS2 ALAS2 9
#> ALPK2 ALPK2 9
#> ASPA ASPA 9
#> BDNF BDNF 9
#> BRINP2 BRINP2 9
The removeRareGenes function calls findRareGenes and subsequently removes
the identified rare genes from the object:
dim(sceObj)
#> [1] 20125 8569
sceObj <- removeRareGenes(sceObj, nCells=10)
dim(sceObj)
#> [1] 15117 8569
The repAnalysis function shows which pairs selected from two categorical
columns of a single-cell object are:
doOverrep to FALSE.df <- repAnalysis(sceObj, 'donor', 'label')
head(df)
#> donor label sharedCount donorCount labelCount
#> 29 GSM2230759 acinar 843 3605 958
#> 4 GSM2230757 beta 872 1937 2525
#> 17 GSM2230758 alpha 676 1724 2326
#> 7 GSM2230757 endothelial 130 1937 252
#> 48 GSM2230760 ductal 280 1303 1077
#> 12 GSM2230757 quiescent_stellate 92 1937 173
#> expectedCount ratio pval pvalAdj
#> 29 403.03303 2.091640 1.274205e-216 7.135547e-215
#> 4 570.76963 1.527762 3.187997e-62 1.753398e-60
#> 17 467.96872 1.444541 9.375316e-35 5.062671e-33
#> 7 56.96394 2.282146 2.054204e-24 1.088728e-22
#> 48 163.76835 1.709732 3.329598e-23 1.731391e-21
#> 12 39.10620 2.352568 9.889812e-19 5.043804e-17
df <- repAnalysis(sceObj, 'donor', 'label', doOverrep=FALSE)
head(df)
#> donor label sharedCount donorCount labelCount expectedCount ratio
#> 15 GSM2230758 acinar 3 1724 958 192.7403 0.01556498
#> 3 GSM2230757 alpha 236 1937 2326 525.7862 0.44885164
#> 43 GSM2230760 acinar 2 1303 958 145.6732 0.01372936
#> 32 GSM2230759 beta 787 3605 2525 1062.2739 0.74086354
#> 6 GSM2230757 ductal 120 1937 1077 243.4530 0.49290822
#> 37 GSM2230759 gamma 36 3605 255 107.2791 0.33557314
#> pval pvalAdj
#> 15 5.183525e-94 2.902774e-92
#> 3 1.948937e-71 1.071915e-69
#> 43 1.349905e-69 7.289488e-68
#> 32 8.059698e-41 4.271640e-39
#> 6 6.325634e-25 3.289330e-23
#> 37 1.107353e-22 5.647500e-21
The output of repAnalysis can be visualized using the pvalRiverPlot
function, which creates an alluvial plot for a data frame
having a p-value column. Thicker connecting bands represent lower p-values:
pvalRiverPlot(df, title=NULL)
The two columns can also be visualized using the distributionPlot function:
distributionPlot(sceObj, NULL, 'donor', 'label')
The function can also plot relative percentages instead of raw counts:
distributionPlot(sceObj, NULL, 'donor', 'label', type='percs')
Given a list of genes, geneCenters can compute the center of mass of each
gene based on a dimensionality reduction of choice. By default, the reduction
used by geneCenters is UMAP:
genes <- c('PRSS1', 'TTR', 'GJD2', 'MS4A8')
geneCenters(sceObj, genes)
#> UMAP1 UMAP2
#> PRSS1 -8.582282 5.340146
#> TTR 4.072024 4.118626
#> GJD2 5.733269 -5.434168
#> MS4A8 5.296443 4.101940
The centers of mass can be visualized with genesDimPlot, a function which
uses geneCenters internally:
genesDimPlot(sceObj, genes, groupBy='label')
Note: The related colsDimPlot function can plot the center of mass of
numeric columns of the metadata of a Seurat object. colsDimPlot uses
colCenters internally, which works similarly to geneCenters. Both
genesDimPlot and colsDimPlot are wrappers around pointsDimPlot, which
adds labeled points on a plot generated with scLang::dimPlot.
The computeSilhouette computes the silhouette for an input column in the
metadata or coldata of the single-cell expression object, returning the object
with an appended silhouette column:
sceObj <- computeSilhouette(sceObj, 'label', 'labelSil')
head(scCol(sceObj, 'labelSil'))
#> [1] 0.8646607 0.8496181 0.8589549 0.8629453 0.8499229 0.8356816
The normalizeSilhouette function returns a data frame whose number of rows
equals the number of cells and the number of columns equals the number of
identities in the single-cell expression object. It requires silhouette
information for the required column to be provided in the metadata/coldata of
the object. For each column in the output data frame, cells outside of the
corresponding identity get a score of 0. All the other cells get scores in
(0, 1] by adjusted min-max normalization:
df <- normalizeSilhouette(sceObj, 'label', 'labelSil')
head(df)
#> acinar beta delta activated_stellate ductal
#> human1_lib1.final_cell_0001 0.9573028 0 0 0 0
#> human1_lib1.final_cell_0002 0.9423583 0 0 0 0
#> human1_lib1.final_cell_0003 0.9516343 0 0 0 0
#> human1_lib1.final_cell_0004 0.9555986 0 0 0 0
#> human1_lib1.final_cell_0005 0.9426610 0 0 0 0
#> human1_lib1.final_cell_0006 0.9285126 0 0 0 0
#> alpha epsilon gamma endothelial quiescent_stellate
#> human1_lib1.final_cell_0001 0 0 0 0 0
#> human1_lib1.final_cell_0002 0 0 0 0 0
#> human1_lib1.final_cell_0003 0 0 0 0 0
#> human1_lib1.final_cell_0004 0 0 0 0 0
#> human1_lib1.final_cell_0005 0 0 0 0 0
#> human1_lib1.final_cell_0006 0 0 0 0 0
#> macrophage schwann mast t_cell
#> human1_lib1.final_cell_0001 0 0 0 0
#> human1_lib1.final_cell_0002 0 0 0 0
#> human1_lib1.final_cell_0003 0 0 0 0
#> human1_lib1.final_cell_0004 0 0 0 0
#> human1_lib1.final_cell_0005 0 0 0 0
#> human1_lib1.final_cell_0006 0 0 0 0
The addNormSilhouette function adds non-zero normalized silhouette
information to the single-cell expression object as a metadata/coldata column.
For visualization purposes, we will use the featurePlot function from
scLang:
sceObj <- addNormSilhouette(sceObj, df, 'labelNormSil')
featurePlot(sceObj, 'labelNormSil')
sessionInfo()
#> R version 4.6.0 alpha (2026-04-05 r89794)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.4 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.23-bioc/R/lib/libRblas.so
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0 LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: America/New_York
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] scater_1.39.4 ggplot2_4.0.2
#> [3] scuttle_1.21.5 scRNAseq_2.25.0
#> [5] SingleCellExperiment_1.33.2 SummarizedExperiment_1.41.1
#> [7] Biobase_2.71.0 GenomicRanges_1.63.2
#> [9] Seqinfo_1.1.0 IRanges_2.45.0
#> [11] S4Vectors_0.49.1 BiocGenerics_0.57.0
#> [13] generics_0.1.4 MatrixGenerics_1.23.0
#> [15] matrixStats_1.5.0 scLang_0.99.3
#> [17] hammers_0.99.9 BiocStyle_2.39.0
#>
#> loaded via a namespace (and not attached):
#> [1] RcppAnnoy_0.0.23 prismatic_1.1.2 BiocIO_1.21.0
#> [4] bitops_1.0-9 filelock_1.0.3 tibble_3.3.1
#> [7] polyclip_1.10-7 XML_3.99-0.23 lifecycle_1.0.5
#> [10] httr2_1.2.2 globals_0.19.1 lattice_0.22-9
#> [13] ensembldb_2.35.0 MASS_7.3-65 alabaster.base_1.11.4
#> [16] magrittr_2.0.5 sass_0.4.10 rmarkdown_2.31
#> [19] jquerylib_0.1.4 yaml_2.3.12 otel_0.2.0
#> [22] text2vec_0.6.6 spam_2.11-3 sp_2.2-1
#> [25] DBI_1.3.0 RColorBrewer_1.1-3 abind_1.4-8
#> [28] purrr_1.2.1 AnnotationFilter_1.35.0 ggraph_2.2.2
#> [31] RCurl_1.98-1.18 tweenr_2.0.3 rappdirs_0.3.4
#> [34] float_0.3-3 ggrepel_0.9.8 irlba_2.3.7
#> [37] alabaster.sce_1.11.0 listenv_0.10.1 RSpectra_0.16-2
#> [40] parallelly_1.46.1 codetools_0.2-20 DelayedArray_0.37.1
#> [43] ggforce_0.5.0 tidyselect_1.2.1 UCSC.utils_1.7.1
#> [46] farver_2.1.2 ScaledMatrix_1.19.0 viridis_0.6.5
#> [49] BiocFileCache_3.1.0 GenomicAlignments_1.47.0 jsonlite_2.0.0
#> [52] rsparse_0.5.3 BiocNeighbors_2.5.4 tidygraph_1.3.1
#> [55] progressr_0.19.0 ggalluvial_0.12.6 tools_4.6.0
#> [58] ggnewscale_0.5.2 Rcpp_1.1.1 glue_1.8.0
#> [61] gridExtra_2.3 SparseArray_1.11.13 xfun_0.57
#> [64] GenomeInfoDb_1.47.2 dplyr_1.2.1 HDF5Array_1.39.1
#> [67] gypsum_1.7.0 withr_3.0.2 BiocManager_1.30.27
#> [70] fastmap_1.2.0 rhdf5filters_1.23.3 rsvd_1.0.5
#> [73] digest_0.6.39 R6_2.6.1 dichromat_2.0-0.1
#> [76] RSQLite_2.4.6 cigarillo_1.1.0 h5mread_1.3.3
#> [79] RhpcBLASctl_0.23-42 tidyr_1.3.2 data.table_1.18.2.1
#> [82] rtracklayer_1.71.3 graphlayouts_1.2.3 httr_1.4.8
#> [85] S4Arrays_1.11.1 uwot_0.2.4 pkgconfig_2.0.3
#> [88] gtable_0.3.6 blob_1.3.0 S7_0.2.1
#> [91] XVector_0.51.0 htmltools_0.5.9 dotCall64_1.2
#> [94] bookdown_0.46 ProtGenerics_1.43.0 SeuratObject_5.3.0
#> [97] scales_1.4.0 alabaster.matrix_1.11.0 png_0.1-9
#> [100] lgr_0.5.2 knitr_1.51 reshape2_1.4.5
#> [103] rjson_0.2.23 curl_7.0.0 cachem_1.1.0
#> [106] rhdf5_2.55.16 liver_1.28 stringr_1.6.0
#> [109] BiocVersion_3.23.1 vipor_0.4.7 parallel_4.6.0
#> [112] AnnotationDbi_1.73.1 mlapi_0.1.1 restfulr_0.0.16
#> [115] pillar_1.11.1 grid_4.6.0 alabaster.schemas_1.11.0
#> [118] vctrs_0.7.2 BiocSingular_1.27.1 beachmat_2.27.4
#> [121] dbplyr_2.5.2 cluster_2.1.8.2 beeswarm_0.4.0
#> [124] paletteer_1.7.0 evaluate_1.0.5 magick_2.9.1
#> [127] tinytex_0.59 henna_0.7.5 GenomicFeatures_1.63.2
#> [130] cli_3.6.5 compiler_4.6.0 Rsamtools_2.27.1
#> [133] rlang_1.2.0 crayon_1.5.3 abdiv_0.2.0
#> [136] future.apply_1.20.2 labeling_0.4.3 ggeasy_0.1.6
#> [139] rematch2_2.1.2 ggbeeswarm_0.7.3 plyr_1.8.9
#> [142] stringi_1.8.7 alabaster.se_1.11.0 viridisLite_0.4.3
#> [145] BiocParallel_1.45.0 Biostrings_2.79.5 lazyeval_0.2.3
#> [148] Matrix_1.7-5 ExperimentHub_3.1.0 bit64_4.6.0-1
#> [151] future_1.70.0 Rhdf5lib_1.33.6 KEGGREST_1.51.1
#> [154] alabaster.ranges_1.11.0 AnnotationHub_4.1.0 igraph_2.2.3
#> [157] memoise_2.0.1 bslib_0.10.0 bit_4.6.0