Contents

1 Introduction

hammers is a utilities suite for scRNA-seq data analysis. It provides simple tools to address tasks such as retrieving aggregate gene statistics, finding and removing rare genes, performing representation analysis, computing the center of mass for the expression of a gene of interest in low-dimensional space, and calculating silhouette and cluster-normalized silhouette.

2 Installation

To install hammers, run the following commands in an R session:

if (!require("BiocManager", quietly=TRUE))
    install.packages("BiocManager")

BiocManager::install("hammers")

3 Prerequisites

In addition to hammers, you need to install scater and scRNAseq for this tutorial.

4 Loading and preparing data

This tutorial uses an scRNA-seq human pancreas dataset. After loading the required packages, download the dataset using the BaronPancreasData function from scRNAseq. The dataset will be stored as a SingleCellExperiment object.

library(hammers)
library(scLang)
library(scRNAseq)
library(scater)

sceObj <- BaronPancreasData('human')

We will first normalize and log-transform the data using the logNormCounts function from scuttle (loaded automatically with scater).

sceObj <- logNormCounts(sceObj)

We will also need PCA and UMAP dimensions. These can be computed using the runPCA and runUMAP functions from scater:

sceObj <- runPCA(sceObj)
sceObj <- runUMAP(sceObj)

5 Extracting gene information

The genePresence function extracts the genes that appear in at least minCutoff and at most maxCutoff cells. It returns a data frame listing the number of cells in which each gene that meets these criteria is expressed:

df <- genePresence(sceObj, rownames(sceObj)[seq(100)], 300, 2000)
dim(df)
#> [1] 40  2
head(df)
#>        Gene nCells
#> AATF   AATF   1755
#> A1CF   A1CF   1408
#> AACS   AACS   1385
#> AAGAB AAGAB   1328
#> ABHD3 ABHD3   1220
#> AAAS   AAAS   1216

The geneCellSets function lists all the cells in which selected genes are expressed:

cellSets <- geneCellSets(sceObj, rownames(sceObj)[seq(100)])
head(cellSets[[1]])
#> [1] "human1_lib1.final_cell_0024" "human1_lib1.final_cell_0052"
#> [3] "human1_lib1.final_cell_0054" "human1_lib1.final_cell_0188"
#> [5] "human1_lib1.final_cell_0242" "human1_lib1.final_cell_0294"

findRareGenes is a wrapper around genePresence that can extract the genes that appear in fewer than nCells cells:

df <- findRareGenes(sceObj, nCells=10)
#> Warning in asMethod(object): sparse->dense coercion: allocating vector of size
#> 1.3 GiB
dim(df)
#> [1] 5008    2
head(df)
#>          Gene nCells
#> ADRA1D ADRA1D      9
#> ALAS2   ALAS2      9
#> ALPK2   ALPK2      9
#> ASPA     ASPA      9
#> BDNF     BDNF      9
#> BRINP2 BRINP2      9

The removeRareGenes function calls findRareGenes and subsequently removes the identified rare genes from the object:

dim(sceObj)
#> [1] 20125  8569
sceObj <- removeRareGenes(sceObj, nCells=10)
dim(sceObj)
#> [1] 15117  8569

6 Representation analysis

The repAnalysis function shows which pairs selected from two categorical columns of a single-cell object are:

df <- repAnalysis(sceObj, 'donor', 'label')
head(df)
#>         donor              label sharedCount donorCount labelCount
#> 29 GSM2230759             acinar         843       3605        958
#> 4  GSM2230757               beta         872       1937       2525
#> 17 GSM2230758              alpha         676       1724       2326
#> 7  GSM2230757        endothelial         130       1937        252
#> 48 GSM2230760             ductal         280       1303       1077
#> 12 GSM2230757 quiescent_stellate          92       1937        173
#>    expectedCount    ratio          pval       pvalAdj
#> 29     403.03303 2.091640 1.274205e-216 7.135547e-215
#> 4      570.76963 1.527762  3.187997e-62  1.753398e-60
#> 17     467.96872 1.444541  9.375316e-35  5.062671e-33
#> 7       56.96394 2.282146  2.054204e-24  1.088728e-22
#> 48     163.76835 1.709732  3.329598e-23  1.731391e-21
#> 12      39.10620 2.352568  9.889812e-19  5.043804e-17
df <- repAnalysis(sceObj, 'donor', 'label', doOverrep=FALSE)
head(df)
#>         donor  label sharedCount donorCount labelCount expectedCount      ratio
#> 15 GSM2230758 acinar           3       1724        958      192.7403 0.01556498
#> 3  GSM2230757  alpha         236       1937       2326      525.7862 0.44885164
#> 43 GSM2230760 acinar           2       1303        958      145.6732 0.01372936
#> 32 GSM2230759   beta         787       3605       2525     1062.2739 0.74086354
#> 6  GSM2230757 ductal         120       1937       1077      243.4530 0.49290822
#> 37 GSM2230759  gamma          36       3605        255      107.2791 0.33557314
#>            pval      pvalAdj
#> 15 5.183525e-94 2.902774e-92
#> 3  1.948937e-71 1.071915e-69
#> 43 1.349905e-69 7.289488e-68
#> 32 8.059698e-41 4.271640e-39
#> 6  6.325634e-25 3.289330e-23
#> 37 1.107353e-22 5.647500e-21

The output of repAnalysis can be visualized using the pvalRiverPlot function, which creates an alluvial plot for a data frame having a p-value column. Thicker connecting bands represent lower p-values:

pvalRiverPlot(df, title=NULL)

The two columns can also be visualized using the distributionPlot function:

distributionPlot(sceObj, NULL, 'donor', 'label')

The function can also plot relative percentages instead of raw counts:

distributionPlot(sceObj, NULL, 'donor', 'label', type='percs')

7 Computing centers of mass of gene expression

Given a list of genes, geneCenters can compute the center of mass of each gene based on a dimensionality reduction of choice. By default, the reduction used by geneCenters is UMAP:

genes <- c('PRSS1', 'TTR', 'GJD2', 'MS4A8')
geneCenters(sceObj, genes)
#>           UMAP1     UMAP2
#> PRSS1 -8.582282  5.340146
#> TTR    4.072024  4.118626
#> GJD2   5.733269 -5.434168
#> MS4A8  5.296443  4.101940

The centers of mass can be visualized with genesDimPlot, a function which uses geneCenters internally:

genesDimPlot(sceObj, genes, groupBy='label')

Note: The related colsDimPlot function can plot the center of mass of numeric columns of the metadata of a Seurat object. colsDimPlot uses colCenters internally, which works similarly to geneCenters. Both genesDimPlot and colsDimPlot are wrappers around pointsDimPlot, which adds labeled points on a plot generated with scLang::dimPlot.

8 Computing silhouette and normalized silhouette

The computeSilhouette computes the silhouette for an input column in the metadata or coldata of the single-cell expression object, returning the object with an appended silhouette column:

sceObj <- computeSilhouette(sceObj, 'label', 'labelSil')
head(scCol(sceObj, 'labelSil'))
#> [1] 0.8646607 0.8496181 0.8589549 0.8629453 0.8499229 0.8356816

The normalizeSilhouette function returns a data frame whose number of rows equals the number of cells and the number of columns equals the number of identities in the single-cell expression object. It requires silhouette information for the required column to be provided in the metadata/coldata of the object. For each column in the output data frame, cells outside of the corresponding identity get a score of 0. All the other cells get scores in (0, 1] by adjusted min-max normalization:

df <- normalizeSilhouette(sceObj, 'label', 'labelSil')
head(df)
#>                                acinar beta delta activated_stellate ductal
#> human1_lib1.final_cell_0001 0.9573028    0     0                  0      0
#> human1_lib1.final_cell_0002 0.9423583    0     0                  0      0
#> human1_lib1.final_cell_0003 0.9516343    0     0                  0      0
#> human1_lib1.final_cell_0004 0.9555986    0     0                  0      0
#> human1_lib1.final_cell_0005 0.9426610    0     0                  0      0
#> human1_lib1.final_cell_0006 0.9285126    0     0                  0      0
#>                             alpha epsilon gamma endothelial quiescent_stellate
#> human1_lib1.final_cell_0001     0       0     0           0                  0
#> human1_lib1.final_cell_0002     0       0     0           0                  0
#> human1_lib1.final_cell_0003     0       0     0           0                  0
#> human1_lib1.final_cell_0004     0       0     0           0                  0
#> human1_lib1.final_cell_0005     0       0     0           0                  0
#> human1_lib1.final_cell_0006     0       0     0           0                  0
#>                             macrophage schwann mast t_cell
#> human1_lib1.final_cell_0001          0       0    0      0
#> human1_lib1.final_cell_0002          0       0    0      0
#> human1_lib1.final_cell_0003          0       0    0      0
#> human1_lib1.final_cell_0004          0       0    0      0
#> human1_lib1.final_cell_0005          0       0    0      0
#> human1_lib1.final_cell_0006          0       0    0      0

The addNormSilhouette function adds non-zero normalized silhouette information to the single-cell expression object as a metadata/coldata column. For visualization purposes, we will use the featurePlot function from scLang:

sceObj <- addNormSilhouette(sceObj, df, 'labelNormSil')
featurePlot(sceObj, 'labelNormSil')

Session information

sessionInfo()
#> R version 4.6.0 alpha (2026-04-05 r89794)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.4 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.23-bioc/R/lib/libRblas.so 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: America/New_York
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] scater_1.39.4               ggplot2_4.0.2              
#>  [3] scuttle_1.21.5              scRNAseq_2.25.0            
#>  [5] SingleCellExperiment_1.33.2 SummarizedExperiment_1.41.1
#>  [7] Biobase_2.71.0              GenomicRanges_1.63.2       
#>  [9] Seqinfo_1.1.0               IRanges_2.45.0             
#> [11] S4Vectors_0.49.1            BiocGenerics_0.57.0        
#> [13] generics_0.1.4              MatrixGenerics_1.23.0      
#> [15] matrixStats_1.5.0           scLang_0.99.3              
#> [17] hammers_0.99.9              BiocStyle_2.39.0           
#> 
#> loaded via a namespace (and not attached):
#>   [1] RcppAnnoy_0.0.23         prismatic_1.1.2          BiocIO_1.21.0           
#>   [4] bitops_1.0-9             filelock_1.0.3           tibble_3.3.1            
#>   [7] polyclip_1.10-7          XML_3.99-0.23            lifecycle_1.0.5         
#>  [10] httr2_1.2.2              globals_0.19.1           lattice_0.22-9          
#>  [13] ensembldb_2.35.0         MASS_7.3-65              alabaster.base_1.11.4   
#>  [16] magrittr_2.0.5           sass_0.4.10              rmarkdown_2.31          
#>  [19] jquerylib_0.1.4          yaml_2.3.12              otel_0.2.0              
#>  [22] text2vec_0.6.6           spam_2.11-3              sp_2.2-1                
#>  [25] DBI_1.3.0                RColorBrewer_1.1-3       abind_1.4-8             
#>  [28] purrr_1.2.1              AnnotationFilter_1.35.0  ggraph_2.2.2            
#>  [31] RCurl_1.98-1.18          tweenr_2.0.3             rappdirs_0.3.4          
#>  [34] float_0.3-3              ggrepel_0.9.8            irlba_2.3.7             
#>  [37] alabaster.sce_1.11.0     listenv_0.10.1           RSpectra_0.16-2         
#>  [40] parallelly_1.46.1        codetools_0.2-20         DelayedArray_0.37.1     
#>  [43] ggforce_0.5.0            tidyselect_1.2.1         UCSC.utils_1.7.1        
#>  [46] farver_2.1.2             ScaledMatrix_1.19.0      viridis_0.6.5           
#>  [49] BiocFileCache_3.1.0      GenomicAlignments_1.47.0 jsonlite_2.0.0          
#>  [52] rsparse_0.5.3            BiocNeighbors_2.5.4      tidygraph_1.3.1         
#>  [55] progressr_0.19.0         ggalluvial_0.12.6        tools_4.6.0             
#>  [58] ggnewscale_0.5.2         Rcpp_1.1.1               glue_1.8.0              
#>  [61] gridExtra_2.3            SparseArray_1.11.13      xfun_0.57               
#>  [64] GenomeInfoDb_1.47.2      dplyr_1.2.1              HDF5Array_1.39.1        
#>  [67] gypsum_1.7.0             withr_3.0.2              BiocManager_1.30.27     
#>  [70] fastmap_1.2.0            rhdf5filters_1.23.3      rsvd_1.0.5              
#>  [73] digest_0.6.39            R6_2.6.1                 dichromat_2.0-0.1       
#>  [76] RSQLite_2.4.6            cigarillo_1.1.0          h5mread_1.3.3           
#>  [79] RhpcBLASctl_0.23-42      tidyr_1.3.2              data.table_1.18.2.1     
#>  [82] rtracklayer_1.71.3       graphlayouts_1.2.3       httr_1.4.8              
#>  [85] S4Arrays_1.11.1          uwot_0.2.4               pkgconfig_2.0.3         
#>  [88] gtable_0.3.6             blob_1.3.0               S7_0.2.1                
#>  [91] XVector_0.51.0           htmltools_0.5.9          dotCall64_1.2           
#>  [94] bookdown_0.46            ProtGenerics_1.43.0      SeuratObject_5.3.0      
#>  [97] scales_1.4.0             alabaster.matrix_1.11.0  png_0.1-9               
#> [100] lgr_0.5.2                knitr_1.51               reshape2_1.4.5          
#> [103] rjson_0.2.23             curl_7.0.0               cachem_1.1.0            
#> [106] rhdf5_2.55.16            liver_1.28               stringr_1.6.0           
#> [109] BiocVersion_3.23.1       vipor_0.4.7              parallel_4.6.0          
#> [112] AnnotationDbi_1.73.1     mlapi_0.1.1              restfulr_0.0.16         
#> [115] pillar_1.11.1            grid_4.6.0               alabaster.schemas_1.11.0
#> [118] vctrs_0.7.2              BiocSingular_1.27.1      beachmat_2.27.4         
#> [121] dbplyr_2.5.2             cluster_2.1.8.2          beeswarm_0.4.0          
#> [124] paletteer_1.7.0          evaluate_1.0.5           magick_2.9.1            
#> [127] tinytex_0.59             henna_0.7.5              GenomicFeatures_1.63.2  
#> [130] cli_3.6.5                compiler_4.6.0           Rsamtools_2.27.1        
#> [133] rlang_1.2.0              crayon_1.5.3             abdiv_0.2.0             
#> [136] future.apply_1.20.2      labeling_0.4.3           ggeasy_0.1.6            
#> [139] rematch2_2.1.2           ggbeeswarm_0.7.3         plyr_1.8.9              
#> [142] stringi_1.8.7            alabaster.se_1.11.0      viridisLite_0.4.3       
#> [145] BiocParallel_1.45.0      Biostrings_2.79.5        lazyeval_0.2.3          
#> [148] Matrix_1.7-5             ExperimentHub_3.1.0      bit64_4.6.0-1           
#> [151] future_1.70.0            Rhdf5lib_1.33.6          KEGGREST_1.51.1         
#> [154] alabaster.ranges_1.11.0  AnnotationHub_4.1.0      igraph_2.2.3            
#> [157] memoise_2.0.1            bslib_0.10.0             bit_4.6.0