The purpose of this vignette is to show users how to retrofit their
methylSig < 0.99.0 code to work with the refactor in
version 0.99.0 and later.
In versions < 0.99.0 of methylSig, the
methylSigReadData() function read Bismark coverage files,
Bismark genome-wide CpG reports, or MethylDackel bedGraphs.
Additionally, users could destrand the data, filter by coverage, and
filter SNPs.
In versions >= 0.99.0 of methylSig, the user should
read data with bsseq::read.bismark() and then apply
functions that were once bundled within
methylSigReadData().
files = c(
system.file('extdata', 'bis_cov1.cov', package='methylSig'),
system.file('extdata', 'bis_cov2.cov', package='methylSig')
)
bsseq_stranded = bsseq::read.bismark(
files = files,
colData = data.frame(row.names = c('test1','test2')),
rmZeroCov = FALSE,
strandCollapse = FALSE
)After reading data, filter by coverage. Note, we are changing our dataset to something we can use with the downstream functions.
# Load data for use in the rest of the vignette
data(BS.cancer.ex, package = 'bsseqData')
bs = BS.cancer.ex[1:10000]
bs = filter_loci_by_coverage(bs, min_count = 5, max_count = 500)
#> Warning in value[[3L]](cond): expected no additional right arguments for
#> 'DelayedNaryIsoOp', falling back to an unknown matrix
#> Warning in value[[3L]](cond): expected no additional right arguments for
#> 'DelayedNaryIsoOp', falling back to an unknown matrixIf the locations of C-to-T and G-to-A SNPs are known, or some other set of location should be removed:
In versions < 0.99.0 of methylSig, the
methylSigTile() function combined aggregating CpG data over
pre-defined tiles and genomic windows.
In versions >= 0.99.0 of methylSig, tiling is
separated into two functions, tile_by_regions() and
tile_by_windows(). Users should chooose one or the
other.
windowed_bs = tile_by_windows(bs = bs, win_size = 10000)
#> Warning in value[[3L]](cond): expected no additional right arguments for
#> 'DelayedNaryIsoOp', falling back to an unknown matrix
#> Warning in value[[3L]](cond): expected no additional right arguments for
#> 'DelayedNaryIsoOp', falling back to an unknown matrix# Collapsed promoters on chr21 and chr22
data(promoters_gr, package = 'methylSig')
promoters_bs = tile_by_regions(bs = bs, gr = promoters_gr)
#> Warning in value[[3L]](cond): expected no additional right arguments for
#> 'DelayedNaryIsoOp', falling back to an unknown matrix
#> Warning in value[[3L]](cond): expected no additional right arguments for
#> 'DelayedNaryIsoOp', falling back to an unknown matrixIn versions < 0.99.0 of methylSig, the
methylSigCalc function had a min.per.group
parameter to determine how many samples per group had to have coverage
in order to be tested.
In versions >= 0.99.0 of methylSig, the
min.per.group functionality is performed by a separate
function filter_loci_by_group_coverage(). Also note the
change in form to define dispersion calculations, and the use of local
information.
# Look a the phenotype data for bs
bsseq::pData(bs)
#> DataFrame with 6 rows and 2 columns
#> Type Pair
#> <character> <character>
#> C1 cancer pair1
#> C2 cancer pair2
#> C3 cancer pair3
#> N1 normal pair1
#> N2 normal pair2
#> N3 normal pair3
# Require at least two samples from cancer and two samples from normal
bs = filter_loci_by_group_coverage(
bs = bs,
group_column = 'Type',
c('cancer' = 2, 'normal' = 2))After removing loci with insufficient information, we can now use the
diff_methylsig() test.
In versions < 0.99.0 of methylSig, the
methylSigDSS function also had a min.per.group
parameter to determine how many samples per group had to have coverage.
Users also couldn’t specify which methylation groups to recover. The
form of design, formula, and
contrast, remain the same in versions >= 0.99.0.
In versions >= 0.99.0 of methylSig, the single
methylSigDSS() function is replaced by a fit function
diff_dss_fit() and a test functiotn
diff_dss_test(). As with diff_methylsig(),
users should ensure enough samples have sufficient coverage with the
filter_loci_by_group_coverage() function. The
design and formula are unchanged in their
forms.
If a continuous covariate is to be tested,
filter_loci_by_group_coverage() should be skipped, as there
are no groups. In prior versions of methylSigDSS(), this
was not possible, and the group constraints were incorrectly applied
prior to testing on a continuous covariate.
# IF NOT DONE PREVIOUSLY
# Require at least two samples from cancer and two samples from normal
bs = filter_loci_by_group_coverage(
bs = bs,
group_column = 'Type',
c('cancer' = 2, 'normal' = 2))# Test the simplest model with an intercept and Type
diff_fit_simple = diff_dss_fit(
bs = bs,
design = bsseq::pData(bs),
formula = as.formula('~ Type'))
#> Fitting DML model for CpG site:The contrast parameter is also changed in its form. Note
the, additional parameters to specify how to recover group methylation.
methylation_group_column and
methylation_groups should be specified for group versus
group comparisons. For continuous covariates,
methylation_group_column is sufficient, and the samples
will be grouped into top/bottom 25 percentile based on the continuous
covariate column name given in
methylation_group_column.
# Test the simplest model for cancer vs normal
# Note, 2 rows corresponds to 2 columns in diff_fit_simple$X
simple_contrast = matrix(c(0,1), ncol = 1)
diff_simple_gr = diff_dss_test(
bs = bs,
diff_fit = diff_fit_simple,
contrast = simple_contrast,
methylation_group_column = 'Type',
methylation_groups = c('case' = 'cancer', 'control' = 'normal'))sessionInfo()
#> R version 4.5.1 (2025-06-13)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.3 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] methylSig_1.23.0 BiocStyle_2.37.1
#>
#> loaded via a namespace (and not attached):
#> [1] SummarizedExperiment_1.39.2 rjson_0.2.23
#> [3] xfun_0.53 bslib_0.9.0
#> [5] rhdf5_2.53.6 Biobase_2.69.1
#> [7] lattice_0.22-7 bitops_1.0-9
#> [9] rhdf5filters_1.21.4 tools_4.5.1
#> [11] generics_0.1.4 curl_7.0.0
#> [13] stats4_4.5.1 parallel_4.5.1
#> [15] R.oo_1.27.1 Matrix_1.7-4
#> [17] BSgenome_1.79.0 data.table_1.17.8
#> [19] RColorBrewer_1.1-3 cigarillo_1.1.0
#> [21] S4Vectors_0.47.6 sparseMatrixStats_1.21.0
#> [23] lifecycle_1.0.4 compiler_4.5.1
#> [25] farver_2.1.2 Rsamtools_2.25.3
#> [27] Biostrings_2.77.2 statmod_1.5.1
#> [29] Seqinfo_0.99.4 codetools_0.2-20
#> [31] permute_0.9-8 htmltools_0.5.8.1
#> [33] sys_3.4.3 buildtools_1.0.0
#> [35] sass_0.4.10 RCurl_1.98-1.17
#> [37] yaml_2.3.10 crayon_1.5.3
#> [39] jquerylib_0.1.4 R.utils_2.13.0
#> [41] BiocParallel_1.43.4 DelayedArray_0.37.0
#> [43] cachem_1.1.0 limma_3.65.7
#> [45] abind_1.4-8 gtools_3.9.5
#> [47] locfit_1.5-9.12 digest_0.6.37
#> [49] restfulr_0.0.16 maketools_1.3.2
#> [51] splines_4.5.1 fastmap_1.2.0
#> [53] grid_4.5.1 cli_3.6.5
#> [55] SparseArray_1.9.2 DSS_2.59.0
#> [57] S4Arrays_1.9.3 XML_3.99-0.19
#> [59] h5mread_1.1.1 DelayedMatrixStats_1.31.0
#> [61] scales_1.4.0 httr_1.4.7
#> [63] rmarkdown_2.30 XVector_0.49.3
#> [65] matrixStats_1.5.0 R.methodsS3_1.8.2
#> [67] beachmat_2.25.5 HDF5Array_1.37.0
#> [69] evaluate_1.0.5 knitr_1.50
#> [71] BiocIO_1.19.0 GenomicRanges_1.63.0
#> [73] IRanges_2.43.8 rtracklayer_1.69.1
#> [75] rlang_1.1.6 Rcpp_1.1.0
#> [77] glue_1.8.0 BiocManager_1.30.26
#> [79] BiocGenerics_0.55.4 bsseq_1.45.4
#> [81] jsonlite_2.0.0 R6_2.6.1
#> [83] Rhdf5lib_1.31.1 GenomicAlignments_1.47.0
#> [85] MatrixGenerics_1.21.0