Contents

1 Introduction

scLang is a suite for package development for scRNA-seq analysis. It offers functions that can operate on both Seurat and SingleCellExperiment objects. These functions are primarily aimed to help developers build tools compatible with both types of input.

2 Installation

To install scLang, run the following commands in an R session:

if (!require("BiocManager", quietly=TRUE))
    install.packages("BiocManager")

BiocManager::install("scLang")

3 Prerequisites

In addition to scLang, you need to install scater and scRNAseq for this tutorial.

4 Loading and preparing data

This tutorial uses an scRNA-seq human pancreas dataset. After loading the required packages, download the dataset using the BaronPancreasData function from scRNAseq. The dataset will be stored as a SingleCellExperiment object.

library(scLang)
library(scRNAseq)
library(scater)
library(Seurat)

sceObj <- BaronPancreasData('human')

Next, we will normalize and log-transform the data using the logNormCounts function from scuttle (loaded automatically with scater):

sceObj <- logNormCounts(sceObj)

We will also need PCA and UMAP dimensions. These can be computed using the runPCA and runUMAP functions from scater:

sceObj <- runPCA(sceObj)
sceObj <- runUMAP(sceObj)

Now we will convert the dataset to a Seurat object:

seuratObj <- as.Seurat(sceObj)

5 Extracting the expression matrix

The scExpMat function extracts the expression matrix from a Seurat or SingleCellExperiment object:

mat1 <- scExpMat(sceObj)
dim(mat1)
#> [1] 20125  8569
mat2 <- scExpMat(seuratObj)
identical(mat1, mat2)
#> [1] TRUE

Note: By default, this function extracts normalized and log-transformed data, looking for the data assay for a Seurat object, and for the logcounts assay for a SingleCellExperiment object. This behavior can be changed using the dataType parameter.

scExpMat can also take a matrix as an argument. This option is useful when building functions allowing users to use either a single-cell expression matrix or an object of a dedicated class (Seurat, SingleCellExperiment) as input:

mat2 <- scExpMat(mat1)
identical(mat1, mat2)
#> [1] TRUE

By default, scExpMatconverts the expression matrix to a dense matrix. If this behavior is not desired, conversion can be skipped by setting densify to FALSE:

is(mat1)[1]
#> [1] "matrix"
mat2 <- scExpMat(sceObj, densify=FALSE)
is(mat2)[2]
#> [1] "CsparseMatrix"

scExpMatcan also extract the expression data only for selected genes:

mat2 <- scExpMat(sceObj, genes=rownames(sceObj)[seq(30, 80)])
dim(mat2)
#> [1]   51 8569

6 Extracting and altering metadata/coldata columns

The scCol function extracts a column from the metadata of the Seurat object or the coldata of the SingleCellExperiment object:

col1 <- scCol(seuratObj, 'label')
col2 <- scCol(sceObj, 'label')
identical(col1, col2)
#> [1] TRUE
head(col1)
#> [1] "acinar" "acinar" "acinar" "acinar" "acinar" "acinar"

It can also be used to insert a new column. Here, we just make a modified copy of the label column for the Seurat object:

scCol(seuratObj, 'labelCopy') <- paste0(scCol(seuratObj, 'label'), '_copy')
head(seuratObj[['labelCopy']])
#>                               labelCopy
#> human1_lib1.final_cell_0001 acinar_copy
#> human1_lib1.final_cell_0002 acinar_copy
#> human1_lib1.final_cell_0003 acinar_copy
#> human1_lib1.final_cell_0004 acinar_copy
#> human1_lib1.final_cell_0005 acinar_copy
#> human1_lib1.final_cell_0006 acinar_copy

7 Extracting and altering the metadata/coldata or its column names

The metadataDF function extracts the metadata/coldata data frame from a Seurat or SingleCellExpression object:

df1 <- metadataDF(seuratObj)
df2 <- metadataDF(sceObj)
identical(df1, df2)
#> [1] FALSE
head(df1)[, c(1, 2)]
#>                             orig.ident nCount_originalexp
#> human1_lib1.final_cell_0001     human1              22412
#> human1_lib1.final_cell_0002     human1              27953
#> human1_lib1.final_cell_0003     human1              16895
#> human1_lib1.final_cell_0004     human1              19300
#> human1_lib1.final_cell_0005     human1              15067
#> human1_lib1.final_cell_0006     human1              15747

The metadataNames function extracts the column names of the metadata/coldata data frame from a Seurat or SingleCellExpression object:

colNames1 <- metadataNames(seuratObj)
colNames2 <- metadataNames(sceObj)
identical(colNames1, colNames2)
#> [1] FALSE
head(colNames1)
#> [1] "orig.ident"           "nCount_originalexp"   "nFeature_originalexp"
#> [4] "donor"                "label"                "sizeFactor"

8 Creating frequency tables

The scColCounts and scColPairCounts functions are wrappers around dplyr::count and enable counting frequencies of elements from one or two categorical columns in a Seurat or SingleCellExpression object:

freq1 <- scColCounts(sceObj, 'donor')
freq2 <- scColCounts(seuratObj, 'donor')
identical(freq1, freq2)
#> [1] TRUE
head(freq1)
#> GSM2230757 GSM2230758 GSM2230759 GSM2230760 
#>       1937       1724       3605       1303
freq1 <- scColPairCounts(sceObj, 'donor', 'label')
freq2 <- scColPairCounts(seuratObj, 'donor', 'label')
identical(freq1, freq2)
#> [1] TRUE
head(freq1)
#>        donor              label   n
#> 1 GSM2230757             acinar 110
#> 2 GSM2230757 activated_stellate  51
#> 3 GSM2230757              alpha 236
#> 4 GSM2230757               beta 872
#> 5 GSM2230757              delta 214
#> 6 GSM2230757             ductal 120

9 Visualization

scLang includes three visualization functions that adapt Seurat visualization tools, extending their usage to SingleCellExpression objects in addition to Seurat objects.

The dimPlot function mimics the essential behavior of the DimPlot function from Seurat:

dimPlot(sceObj, groupBy='label')
#> Found more than one class "package_version" in cache; using the first, from namespace 'SeuratObject'
#> Also defined by 'alabaster.base'
#> Found more than one class "package_version" in cache; using the first, from namespace 'SeuratObject'
#> Also defined by 'alabaster.base'
#> Found more than one class "package_version" in cache; using the first, from namespace 'SeuratObject'
#> Also defined by 'alabaster.base'
#> Found more than one class "package_version" in cache; using the first, from namespace 'SeuratObject'
#> Also defined by 'alabaster.base'

The featurePlot functions mimics the FeaturePlot function from Seurat (though using a different color scheme):

featurePlot(sceObj, 'SOX4')
#> Found more than one class "package_version" in cache; using the first, from namespace 'SeuratObject'
#> Also defined by 'alabaster.base'
#> Found more than one class "package_version" in cache; using the first, from namespace 'SeuratObject'
#> Also defined by 'alabaster.base'
#> Found more than one class "package_version" in cache; using the first, from namespace 'SeuratObject'
#> Also defined by 'alabaster.base'
#> Found more than one class "package_version" in cache; using the first, from namespace 'SeuratObject'
#> Also defined by 'alabaster.base'

The violinPlot functions mimics the ViolinPlot function from Seurat:

violinPlot(sceObj, 'SOX4', groupBy='label')

Session information

sessionInfo()
#> R version 4.6.0 alpha (2026-04-05 r89794)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.4 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.23-bioc/R/lib/libRblas.so 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: America/New_York
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] Seurat_5.4.0                SeuratObject_5.3.0         
#>  [3] sp_2.2-1                    scater_1.39.4              
#>  [5] ggplot2_4.0.2               scuttle_1.21.5             
#>  [7] scRNAseq_2.25.0             SingleCellExperiment_1.33.2
#>  [9] SummarizedExperiment_1.41.1 Biobase_2.71.0             
#> [11] GenomicRanges_1.63.2        Seqinfo_1.1.0              
#> [13] IRanges_2.45.0              S4Vectors_0.49.1           
#> [15] BiocGenerics_0.57.0         generics_0.1.4             
#> [17] MatrixGenerics_1.23.0       matrixStats_1.5.0          
#> [19] scLang_0.99.3               BiocStyle_2.39.0           
#> 
#> loaded via a namespace (and not attached):
#>   [1] spatstat.sparse_3.1-0    ProtGenerics_1.43.0      bitops_1.0-9            
#>   [4] httr_1.4.8               RColorBrewer_1.1-3       sctransform_0.4.3       
#>   [7] tools_4.6.0              alabaster.base_1.11.4    R6_2.6.1                
#>  [10] HDF5Array_1.39.1         uwot_0.2.4               lazyeval_0.2.3          
#>  [13] rhdf5filters_1.23.3      withr_3.0.2              gridExtra_2.3           
#>  [16] progressr_0.19.0         cli_3.6.5                spatstat.explore_3.8-0  
#>  [19] fastDummies_1.7.5        labeling_0.4.3           prismatic_1.1.2         
#>  [22] alabaster.se_1.11.0      sass_0.4.10              spatstat.data_3.1-9     
#>  [25] S7_0.2.1                 ggridges_0.5.7           pbapply_1.7-4           
#>  [28] Rsamtools_2.27.1         dichromat_2.0-0.1        parallelly_1.46.1       
#>  [31] RSQLite_2.4.6            BiocIO_1.21.0            spatstat.random_3.4-5   
#>  [34] ica_1.0-3                dplyr_1.2.1              Matrix_1.7-5            
#>  [37] ggbeeswarm_0.7.3         abind_1.4-8              lifecycle_1.0.5         
#>  [40] yaml_2.3.12              rhdf5_2.55.16            SparseArray_1.11.13     
#>  [43] BiocFileCache_3.1.0      Rtsne_0.17               paletteer_1.7.0         
#>  [46] grid_4.6.0               blob_1.3.0               promises_1.5.0          
#>  [49] ExperimentHub_3.1.0      crayon_1.5.3             miniUI_0.1.2            
#>  [52] lattice_0.22-9           beachmat_2.27.4          cowplot_1.2.0           
#>  [55] GenomicFeatures_1.63.2   cigarillo_1.1.0          KEGGREST_1.51.1         
#>  [58] magick_2.9.1             pillar_1.11.1            knitr_1.51              
#>  [61] abdiv_0.2.0              rjson_0.2.23             future.apply_1.20.2     
#>  [64] codetools_0.2-20         glue_1.8.0               spatstat.univar_3.1-7   
#>  [67] data.table_1.18.2.1      vctrs_0.7.2              png_0.1-9               
#>  [70] gypsum_1.7.0             spam_2.11-3              gtable_0.3.6            
#>  [73] rematch2_2.1.2           cachem_1.1.0             xfun_0.57               
#>  [76] S4Arrays_1.11.1          mime_0.13                tidygraph_1.3.1         
#>  [79] survival_3.8-6           tinytex_0.59             fitdistrplus_1.2-6      
#>  [82] ROCR_1.0-12              liver_1.28               nlme_3.1-169            
#>  [85] bit64_4.6.0-1            alabaster.ranges_1.11.0  filelock_1.0.3          
#>  [88] RcppAnnoy_0.0.23         GenomeInfoDb_1.47.2      bslib_0.10.0            
#>  [91] irlba_2.3.7              vipor_0.4.7              KernSmooth_2.23-26      
#>  [94] otel_0.2.0               DBI_1.3.0                tidyselect_1.2.1        
#>  [97] bit_4.6.0                compiler_4.6.0           curl_7.0.0              
#> [100] httr2_1.2.2              BiocNeighbors_2.5.4      h5mread_1.3.3           
#> [103] DelayedArray_0.37.1      plotly_4.12.0            bookdown_0.46           
#> [106] rtracklayer_1.71.3       scales_1.4.0             lmtest_0.9-40           
#> [109] ggeasy_0.1.6             rappdirs_0.3.4           goftest_1.2-3           
#> [112] stringr_1.6.0            digest_0.6.39            spatstat.utils_3.2-2    
#> [115] alabaster.matrix_1.11.0  rmarkdown_2.31           XVector_0.51.0          
#> [118] htmltools_0.5.9          pkgconfig_2.0.3          dbplyr_2.5.2            
#> [121] fastmap_1.2.0            ensembldb_2.35.0         rlang_1.2.0             
#> [124] htmlwidgets_1.6.4        UCSC.utils_1.7.1         shiny_1.13.0            
#> [127] farver_2.1.2             jquerylib_0.1.4          zoo_1.8-15              
#> [130] jsonlite_2.0.0           BiocParallel_1.45.0      BiocSingular_1.27.1     
#> [133] RCurl_1.98-1.18          magrittr_2.0.5           dotCall64_1.2           
#> [136] patchwork_1.3.2          Rhdf5lib_1.33.6          Rcpp_1.1.1              
#> [139] ggnewscale_0.5.2         viridis_0.6.5            reticulate_1.45.0       
#> [142] stringi_1.8.7            alabaster.schemas_1.11.0 ggalluvial_0.12.6       
#> [145] ggraph_2.2.2             MASS_7.3-65              AnnotationHub_4.1.0     
#> [148] plyr_1.8.9               parallel_4.6.0           listenv_0.10.1          
#> [151] ggrepel_0.9.8            deldir_2.0-4             Biostrings_2.79.5       
#> [154] graphlayouts_1.2.3       splines_4.6.0            tensor_1.5.1            
#> [157] igraph_2.2.3             spatstat.geom_3.7-3      RcppHNSW_0.6.0          
#> [160] reshape2_1.4.5           ScaledMatrix_1.19.0      BiocVersion_3.23.1      
#> [163] XML_3.99-0.23            evaluate_1.0.5           henna_0.7.5             
#> [166] BiocManager_1.30.27      tweenr_2.0.3             httpuv_1.6.17           
#> [169] RANN_2.6.2               tidyr_1.3.2              purrr_1.2.1             
#> [172] polyclip_1.10-7          future_1.70.0            scattermore_1.2         
#> [175] alabaster.sce_1.11.0     ggforce_0.5.0            rsvd_1.0.5              
#> [178] xtable_1.8-8             restfulr_0.0.16          AnnotationFilter_1.35.0 
#> [181] RSpectra_0.16-2          later_1.4.8              viridisLite_0.4.3       
#> [184] tibble_3.3.1             memoise_2.0.1            beeswarm_0.4.0          
#> [187] AnnotationDbi_1.73.1     GenomicAlignments_1.47.0 cluster_2.1.8.2         
#> [190] globals_0.19.1