Contents

1 Introduction

PostChicago is a toolbox for integrating, annotating, visualising and quantifying Capture-(Hi)-C (from now on Capture-C) data derived from the output of the CHiCAGO pipeline (Chicago package [1]).
The dataset used in this vignette and supplied with the package is from a retinoic-acid (RA) mediated embryonic stem cell (ESC) differentiation (Mahara et al., 2024 [2], Capture-C Set1).

library(PostChicago)

2 Workflow

3 Setup

3.1 Before you run PostChicago:

PostChicago relies on pre-processed files. This can be done using tools for Hi-C processing up to the point of making filtered-paired BAM files, such as HiCUP [3], followed by creating Chicago input (or .chinput) files using the CHiCAGO tools script bam2chicago.sh.

Before you use PostChicago, you need the following default directories:

  • dataPath: Contains chicago input (.chinput) files generated from paired BAM files by CHiCAGO (bam2chicago.sh). Only required in combination with runChicagoForPostChicago()
  • chicagoOutputDir: This is where the output of the CHiCAGO pipeline will be stored. These chicagoData objects are required for the PostChicago pipeline
  • designDir: Contains the 5 design files required by CHiCAGO and described in the CHiCAGO workflow
  • intDir: Directory containing the intervals of interest, such as ChIP-Seq peaks
  • outputDir: This is where the output of the PostChicago pipeline will be stored

This is the typical directory structure:

extdata <- system.file("extdata", package = "PostChicago")
dir(extdata)
#> [1] "dataPath"    "designDir"   "intervals"   "postchicago" "results"

path <- file.path(tempdir(), "my_package_output")
dir.create(path, showWarnings = FALSE)

#Before we run PostChicago, we define the default directories first:
dataPath <- file.path(extdata, "dataPath")

designDir <- file.path(extdata, "designDir")
intDir <- file.path(extdata, "intervals")

outputDir <- file.path(path, "postchicago")
dir.create(outputDir)

chicagoOutputDir <- file.path(extdata, "results")

## checking which .chinput files we have stored:
dir(dataPath)
#> [1] "RA_0h_rep1.chinput"  "RA_0h_rep2.chinput"  "RA_72h_rep1.chinput"
#> [4] "RA_72h_rep2.chinput"

## Check if you have already stored the chicagoData tables:
dir(chicagoOutputDir)
#>  [1] "cd_RA_0h.rds"               "cd_RA_0h_rep1.rds"         
#>  [3] "cd_RA_0h_rep2.rds"          "cd_RA_72h.rds"             
#>  [5] "cd_RA_72h_rep1.rds"         "cd_RA_72h_rep2.rds"        
#>  [7] "fnreadsPerBait"             "gDistrTotIntsPerBait"      
#>  [9] "hSigIntsPerBait"            "iDistReadsPerSigIntPerBait"

3.1.1 Run CHiCAGO

PostChicago contains the function runChicagoForPostChicago() to call on CHiCAGO so that the data is created and named to seamlessly fit into our pipeline. However, chicagoData objects can also be created and saved using the standard CHiCAGO pipeline.

  • Chicago Settings:

    We supply embryonic stem cell (ESC) data before (RA_0h) and after (RA_72h) a 72h of retinoic acid (RA) treatment for diffrerentiation to neuronal precursor cells.  For our ESC data, we need to modify the standard Chicago settings slightly. The settings depend on your cell type and restriction enzyme and should corerspond to the settings you used for the design files (See CHiCAGO documentation for more). Note that minFragLen and maxFragLen recommendations vary for each enzyme and refer to CHiCAGO documentation for more.
weightsPath <- file.path(
  system.file("extdata", package = "Chicago"),
  "weights"
)
weightSettings <- file.path(weightsPath, "mESC-2reps.settings")
weightSettings <- read.delim(weightSettings, header = FALSE)

mySettings <- Chicago::defaultSettings()
mySettings[grep("weight", names(mySettings))] <- weightSettings[, 2]
mySettings$minFragLen <- 100
  • Data frame
    A typical Capture-C experiment consists of several samples and at least two replicates per sample. For full functionality of PostChicago, we advise to run the Chicago pipeline on samples as well as on the individual replicates. We define the sample information in the small dataframe df. This table should contain at least two columns: filename contains the file names, type contains the sample name. All lines with the same sample name will be treated as replicates.

    Below an example of how we usually construct df:
fls <- list.files(dataPath)
fls
#> [1] "RA_0h_rep1.chinput"  "RA_0h_rep2.chinput"  "RA_72h_rep1.chinput"
#> [4] "RA_72h_rep2.chinput"
    
df <- data.frame(
    filename = fls,
    RA = stringr::str_remove(fls, "_rep.*"),
    rep = stringr::str_remove(
        stringr::str_remove(fls, "^.*h_"), "\\.chinput$"
    )
)

df$type <- df$RA

head(df)
#>              filename     RA  rep   type
#> 1  RA_0h_rep1.chinput  RA_0h rep1  RA_0h
#> 2  RA_0h_rep2.chinput  RA_0h rep2  RA_0h
#> 3 RA_72h_rep1.chinput RA_72h rep1 RA_72h
#> 4 RA_72h_rep2.chinput RA_72h rep2 RA_72h

Now you can run Chicago!

Note that for PostChicago we require more than the standard output of the Chicago pipeline. Instead, we save the entire chicagoData objects, containing reads and scores for all pairwise interactions, similar to bedgraph files. We will require this information for plotting and quantification of interaction strengths for all samples. In contrast, standard Chicago output (.ibed files) only saves this information for significant interactions.
The output will be saved in the chicagoOutputDir. The conventional naming of the chicagoData objects is cd_<mysamplename>. They can later be loaded manually using load(<myChicagoDataObject>) and accessed via cd.

runChicagoForPostChicago(
    df = df, mySettings = mySettings, outputDir = chicagoOutputDir,
    DataPath = dataPath, DesignDir = designDir
)

Running it now on individual replicates. This is recommended for the downstream analysis.

df$type <- as.character(paste(df$RA, df$rep, sep = "_"))

head(df)
#>              filename     RA  rep        type
#> 1  RA_0h_rep1.chinput  RA_0h rep1  RA_0h_rep1
#> 2  RA_0h_rep2.chinput  RA_0h rep2  RA_0h_rep2
#> 3 RA_72h_rep1.chinput RA_72h rep1 RA_72h_rep1
#> 4 RA_72h_rep2.chinput RA_72h rep2 RA_72h_rep2

runChicagoForPostChicago(
    df = df, mySettings = mySettings, 
    outputDir = chicagoOutputDir,
    DataPath = dataPath, DesignDir = designDir
)

3.1.2 Read in the datasets

Regardless of whether the chicagoData objects have been created via runChicagoForPostChicago() or via CHiCAGO (or you are using this infrastructure for entirely different data), they will now be loaded into a list.

For this, PostChicago requires two additional objects:

  • baited_genes is a GenomicRanges object that annotates view points and is created from the .baitmap (one of the CHiCAGO design files stored in the designDir), which allows to easily iterate through the data and add meaningful names to the plots. This object has two mandatory columns: 1.) re_id containing the restriction fragment IDs that correspond to the view points (baits) and .rmap; 2.) genename: name of the bait
  • rmapgr is a GenomicRanges object annotating all restriction fragments and is created from the .rmap (one of CHiCAGO design files stored in the designDir). The only mandatory column is id, containing the ID of the restriction fragment.

To generate baited_genes and rmapgr objects from design files, we can use the helper functions baitmap2baited_genes() and rmap2rmapgr():

baited_genes <- baitmap2baited_genes(designDir, save = FALSE)
rmapgr <- rmap2rmapgr(designDir, save = FALSE)

Now we are ready to read in the chicagoData tables created by runChicagoForPostChicago() into a list using the function loadCdList(). These tables contain replicate normalized reads in the column N, replicate read counts under N.1, N.2,... and Chicago scores in score. We provide precomputed chicagoData objects as examples with the package.

dir(chicagoOutputDir)
#>  [1] "cd_RA_0h.rds"               "cd_RA_0h_rep1.rds"         
#>  [3] "cd_RA_0h_rep2.rds"          "cd_RA_72h.rds"             
#>  [5] "cd_RA_72h_rep1.rds"         "cd_RA_72h_rep2.rds"        
#>  [7] "fnreadsPerBait"             "gDistrTotIntsPerBait"      
#>  [9] "hSigIntsPerBait"            "iDistReadsPerSigIntPerBait"
cdlist <- loadCdList(
    resultsDir = chicagoOutputDir,
    baited_genes = baited_genes
)
names(cdlist)
#> [1] "RA_0h"       "RA_0h_rep1"  "RA_0h_rep2"  "RA_72h"      "RA_72h_rep1"
#> [6] "RA_72h_rep2"

For downstream analysis, we extract the list L containing all integrated data and LL containing data from individual replicates:

pooledSamples <- cdlist[-grep("rep", names(cdlist))]
replicateSamples <- cdlist[grep("rep", names(cdlist))]
rm(cdlist)

These steps can also easily be done manually without these helper functions, as long as the input files are organised as such.

4 Basic statistics

To assess the quality of the data and individual replicates, PostChicago provides the function plotSigIntsStats(). In addition to the summary statistics, such as read counts per sample, this function can plot various distributions across baits using the argument plotDistribution. We will omit this step here and set plotDistribution=FALSE, since our test dataset only contains one gene. plotSigIntsStats() is important to determine inconsistencies that cannot be explained by biology, for instance a sample with a much larger total read number, but a much smaller number of significant interactions. We also find that samples with a high background have a shift in read counts within significant interactions towards a higher read count.

plotSigIntsStats(pooledSamples, plotDistribution = FALSE, 
    plotExamples = TRUE, col = palette.colors(3)[2:3])
Basic statistics of Capture-C data.

Figure 1: Basic statistics of Capture-C data

5 Visualisation

The main function for data visualisation is plotInteractions().
This function uses the chicagoData objects saved by runChicagoForPostchicago() and stored in L (samples) or LL (replicates) to generate normalized line plots over the genomic location around a view point (bait).

5.1 Lineplots

plotInteractions() iterates through the list of chicagoData tables in L or LL and plots a running mean read coverage across k adjacent fragments surrounding the view point as specified by id. The reads are normalized to the total number of reads mapped to baits per 100,000 reads and further to the total number of baits in each capture experiment, which enables a comparison of interactions between different captures. In the plots, the bait position is indicated by the grey triangle. In addition to k, we can specify the distance plotted around a bait using the zoom argument, with default being +/- 200kb. By default, the plot title contains the baitID, zoom and k, however we can add a more descriptive name, such as the genename of the view point, using the argument name
The default function creates line plots with bait +/- 200kb:

col <- palette.colors(3)[2:3]
name <- "Hoxb3"
id <- baited_genes[baited_genes$genename == name]$re_id
k <- 15
ylim <- c(0, 200)
plotInteractions(pooledSamples, id, k,
    ylim = ylim, show.legend = TRUE, name = name,
    rmapgr = rmapgr, col = col
)
Capture-C signal +/- 200kb around the Hoxb3 TSS.

Figure 2: Capture-C signal +/- 200kb around the Hoxb3 TSS

The plot above shows a clear difference between the samples with RA_72h strongly losing the interaction on the left. To assess reproducibility, we can now make the plots for individual replicates, stored in the object LL. For this, we can keep the sample colours, but redefine the line types using the lty option. We can also change the plotted area via zoom. For a larger zoom, k should be adjusted to allow for smooth display of the data:

colreps <- rep(col, each = 2)
lty <- rep(1:2, 2)
zoom <- 500000
k <- 41
plotInteractions(replicateSamples, id, k,
    zoom = zoom, ylim = ylim, 
    show.legend = TRUE, name = name,
    rmapgr = rmapgr, col = colreps, lty = lty
)
Replicate Capture-C +/- 500kb around the Hoxb3 TSS.

Figure 3: Replicate Capture-C +/- 500kb around the Hoxb3 TSS

In addition to changing the zoom, which plots the data symmetrically around the bait, we can specify the exact genomic region using xlim, which overrides any zoom settings. To make sure default zoom (+/-0.2Mb) information is omitted from the plot title, we set zoom to NA. Additionally, the legend can be plotted outside of the plot using the show.legend.outside option:

par(mfrow = c(1, 2))
xlim <- c(96150000, 96350000)
zoom <- as.numeric(NA)
k <- 31
plotInteractions(pooledSamples, id,
    zoom = zoom, k = k, ylim = ylim, 
    show.legend = FALSE, name = name,
    rmapgr = rmapgr, col = col, 
    show.legend.outside = TRUE, xlim = xlim
)
Capture-C signal within a set interval.

Figure 4: Capture-C signal within a set interval

5.1.1 Annotation of Line plots with intervals:

Annotation with intervals, such as ChIP-Seq peaks, is generally done from a GenomicRangesList object. This can be done manually. For ease of use, PostChicago offers the function loadGrl() that automatically constructs the list from either .rds or .bed files that are supplied in the intDir. Let’s load our intervals into a GenomicRangesList object grl:

dir(intDir)
#> [1] "K27ac_peaks.bed"  "K27me3_peaks.bed"
grl <- loadGrl(intDir = intDir)

Intervals in the grl can then be added to the line plots using the intervals argument:

colintervals <- palette.colors(5)[4:5]

zoom <- 1000000
k <- 51

par(mfrow = c(1, 2))
plotInteractions(pooledSamples, id, k,
    zoom = zoom, ylim = ylim, show.legend = FALSE,
    show.legend.intervals = FALSE, name = name,
    intervals = grl, colintervals = colintervals,
    rmapgr = rmapgr, col = col, show.legend.outside = TRUE
)
Capture-C signal and ChIP-Seq intervals around Hoxb3. H3K27ac/H3K27me3 peaks are shown in blue/green

Figure 5: Capture-C signal and ChIP-Seq intervals around Hoxb3
H3K27ac/H3K27me3 peaks are shown in blue/green

5.2 Creating Genome Browser tracks:

Lineplots represent static snap shots. For a more flexible data browsing, PostChicago provides the function makeBedGraphs(). Per default, this function creates two types of files for an id (= bait): i) bedgraphs of normalized interactions for each sample using the same normalization as for the Lineplots; ii) a bed file containing the bait. Unless specified otherwise, using the argument outputDir, all files are stored in the default directory called bedgraphs.

## define the bait of interest:
bait <- unique(pooledSamples[[1]]$baitID)[1] 

## create bedgraphs for the selected view point:
makeBedGraphs(bait, pooledSamples, rmapgr,
    baited_genes = baited_genes,
    outputDir = paste0(outputDir, "/bedgraphs")
)

6 Data integration

6.1 Create interactions table

To integrate data from multiple samples, we can create an interactions table containing all information about any significant interaction in one experiment using the makeIntsTable() function. We first extract all significant interactions across samples, annotate them with raw and pooled weighted average read counts. The reads are then normalized (downsampled) based on the total number of reads mapped to baits across samples (library size). While providing replicate data is optional, for completeness of the analysis we recommend to include data from both pooled samples and replicates, i.e. both L and LL. The function automatically creates pairwise interaction plots and saves them. Each interaction receives a unique intID, which is baitID;otherEndID. Since the creation of this table is quite resource-intensive, it is processed in chunks using ngroups. For efficient computing time, each group should contain ~50-150 baits. In this experiment, we are processing just one bait, therefore we will process them as ngroups = 1.
Once saved, the interaction table can be called like any other table in the .txt format.

ints <- makeIntsTable(pooledSamples, baited_genes,
    repscores = TRUE, LL = replicateSamples,
    ngroups = 1, outfolder = outputDir
)

Next, we annotate our interactions table with the intervals in grl using the annotateInts() function. The default requires intervals to have a direct overlap with the otherEnds. This can be changed to being within x bp using the maxgp argument, that works similarly to maxgap in findOverlaps().

ints <- annotateInts(ints = ints, grl = grl)

As quality control of our data, we can use makeQCplots(), which plots sample correlations of raw and downsampled read counts and Chicago scores. Two types of heatmaps are made, displaying either overall spearman and pearson correlations or clustering individual interactions.

makeQCplots(ints, outputDir, fontsize=8)
QC heatmaps

Figure 6: QC heatmaps

6.2 Visualize significant interactions using Lineplots

Significant interactions within the interactions table can be supplied to the plotInteractions() function using the argument d and visualized as vertical bars:

name <- "Hoxb3"
id <- baited_genes[baited_genes$genename == name]$re_id
k <- 31
zoom <- 500000
ylim <- c(0, 100)

par(mfrow = c(1, 2))
plotInteractions(pooledSamples, id, k, zoom,
    ylim = ylim, show.legend = FALSE,
    show.legend.intervals = FALSE, name = name,
    intervals = grl, colintervals = colintervals,
    rmapgr = rmapgr, col = col,
    show.legend.outside = TRUE, d = ints
)
Visualization of significant interactions within lineplots. Significant interactions around Hoxb3 are shaded.

Figure 7: Visualization of significant interactions within lineplots
Significant interactions around Hoxb3 are shaded.

The function can visualize up to 8 different interaction types using different colors, provided they are annotated in the column called clusters_refined. Below an example of how to annotate the interactions table and how to plot these interactions.

## select interactions detected only at 0h, these are 'lost'
lost <- ints[ints$RA_0h_score >= 5 & ints$RA_72h_score < 3, ]$intID

## select interactions detected only at 72h, these are 'gained'
gained <- ints[ints$RA_0h_score < 3 & ints$RA_72h_score >= 5, ]$intID

## annotate the interaction table with interaction types:
ints$clusters_refined <- "both"
ints[ints$intID %in% lost, ]$clusters_refined <- "RA_0h"
ints[ints$intID %in% gained, ]$clusters_refined <- "RA_72h"

par(mfrow = c(1, 2))
plotInteractions(pooledSamples, id, k, zoom,
    ylim = ylim, show.legend = FALSE,
    show.legend.intervals = FALSE,
    name = name, intervals = grl,
    colintervals = colintervals,
    rmapgr = rmapgr, col = col,
    show.legend.outside = TRUE, d = ints
)
Significant interactions colored by interaction type.

Figure 8: Significant interactions colored by interaction type

7 Quantification

7.1 Quantification within intervals of interest

So far, we have only compared interactions at the restriction fragment level. However, we are often more interested in interactions with specific intervals, such as ChIP-Seq peaks or gene promoters, which can overlap more than one restriction fragment. To achieve this, we first need to convert our gene-to-fragment interactions into gene-to-peak interactions.

7.1.1 Making oneGeneOnePeak tables

The first step is to create oneGeneOnePeak tables using the function makeOneGeneOnePeak(). Interactions with large intervals, such as ChIP-Seq peaks, can have multiple fragment-level interactions. In contrast, interval interactions within oneGeneOnePeak tables assign only one interaction per specific interval per gene. These interaction tables are automatically saved in the directory defined in the argument folder.

quantfolder <- paste0(outputDir, "/oneGeneOnePeak")
ogoplist <- makeOneGeneOnePeak(
    grl = grl, rmapgr = rmapgr, 
    ints = ints, folder = quantfolder
)

##look at the oneGeneOnePeak table:
ogoplist[[1]][1:2,]
#>                                                   intID  baitID
#> 3958676;3958519 3958676;3958519,3958520,3958521,3958522 3958676
#> 3958676;3958526 3958676;3958526,3958527,3958528,3958529 3958676
#>                                      otherEndID seqnames_bait start_bait
#> 3958676;3958519 3958519,3958520,3958521,3958522         chr11   96343553
#> 3958676;3958526 3958526,3958527,3958528,3958529         chr11   96343553
#>                 end_bait seqnames_otherEnd start_otherEnd end_otherEnd
#> 3958676;3958519 96345280             chr11       96283263     96285534
#> 3958676;3958526 96345280             chr11       96286496     96289482
#>                 interval_id_K27ac_peaks seqnames_K27ac_peaks start_K27ac_peaks
#> 3958676;3958519                    2740                chr11          96284201
#> 3958676;3958526                    2741                chr11          96286770
#>                 end_K27ac_peaks summit_re_id
#> 3958676;3958519        96285181      3958521
#> 3958676;3958526        96287453      3958527

##The tables are also saved in the quantfolder:
list.files(path = quantfolder, pattern = "oneGeneOnePeak")
#> [1] "oneGeneOnePeak_K27ac_peaks_interacting.txt" 
#> [2] "oneGeneOnePeak_K27me3_peaks_interacting.txt"

Examples:

We can visualize oneGeneOnePeak interactions with plotInteractions() by supplying oneGeneOnePeak tables instead of the original interactions table via the argument d:

fls_ogop <- list.files(path = quantfolder, pattern = "oneGeneOnePeak")

## Load oneGeneOnePeak file for H3K27me3 peaks:
ogop_K27me3 <- read.delim(paste(quantfolder, 
                                fls_ogop[grep("K27me3", fls_ogop)],sep = "/"), 
                          stringsAsFactors = FALSE)
ogop_K27me3$clusters_refined <- "K27me3_interaction"

## Load oneGeneOnePeak file for H3K27ac peaks:
ogop_K27ac <- read.delim(
    paste(quantfolder, fls_ogop[grep("K27ac", fls_ogop)],sep = "/"), 
    stringsAsFactors = FALSE
)
ogop_K27ac$clusters_refined <- "K27ac_interaction"

par(mfrow = c(1, 2))

## K27ac-overlapping interactions
plotInteractions(pooledSamples, id, k, zoom = zoom,
    ylim = ylim, show.legend = TRUE,
    name = paste("K27ac interactions \n", sep = ","),
    intervals = grl, colintervals = colintervals,
    rmapgr = rmapgr, col = col, d = ogop_K27ac,
    colints = colintervals[1]
)

## K27me3-overlapping interactions
plotInteractions(pooledSamples, id, k, zoom = zoom,
  ylim = ylim, show.legend = TRUE,
  name = paste("K27me3 interactions \n", sep = ","),
  intervals = grl, rmapgr = rmapgr, col = col,
  colintervals = colintervals, d = ogop_K27me3,
  colints = colintervals[2]
)
Visualization of oneGeneOnePeak interactions. Interactions with H3K27me3 and H3K27ac are shaded.

Figure 9: Visualization of oneGeneOnePeak interactions
Interactions with H3K27me3 and H3K27ac are shaded.

7.1.2 Matrices surrounding intervals of interest

Now that we have our oneGeneOnePeak files, we will quantify reads and scores under Hoxb3 promoter-interacting H3K27ac peaks.  To do this, we will first create interaction matrices centered at restriction fragment overlapping the center of interacting ranges using the function getMatrix(). The distance from center in restriction fragments is given via the argument zoom. The default function creates Lm, a list of matrices based on the supplied list of chicagoData objects. This list contains matrices with ranges overlaps across all samples and distance-matched control matrices. The matrices are flipped in a way that the view point is always to the right of the interaction to standardize orientation visually, since Capture-C signal tends to increase towards the view point. Lm is saved within the resfolder and can be loaded into an object called Lm.

As an example, we will now create such matrices for H3K27ac peaks.

name <- names(grl)[grep("K27ac", names(grl))]
name
#> [1] "K27ac_peaks"

## Zoom defines the +/- distance (in restriction fragments) around the intervals
## for which the matrices will be created
zoom <- 100

## Folder in which the matrices will be stored:
resfolder <- paste0(outputDir, "/matrices/")
if (!file.exists(resfolder)) {
    dir.create(resfolder)
}

## un-normalized matrix with reads
m_reads <- getMatrix(pooledSamples,
    zoom = zoom, d = ogop_K27ac, resfolder = resfolder, type = "reads",
    norm = FALSE, name = name, rmapgr = rmapgr
)

## matrix normalized to the center of interactions
## in sample 1 (as defined per normsam) with reads
m_reads_norm <- getMatrix(pooledSamples,
    zoom = zoom, d = ogop_K27ac, resfolder = resfolder, type = "reads",
    norm = TRUE, name = name, rmapgr = rmapgr, normsam = 1
)

## matrix normalized to the center of interactions
## in sample 2 (as defined per normsam) with reads
m_reads_norm2 <- getMatrix(pooledSamples,
    zoom = zoom, d = ogop_K27ac, resfolder = resfolder, type = "reads",
    norm = TRUE, name = name, rmapgr = rmapgr, normsam = 2
)

## un-normalized matrix with scores
m_scores <- getMatrix(pooledSamples,
    zoom = zoom, d = ogop_K27ac, resfolder = resfolder, type = "scores",
    norm = FALSE, name = name, rmapgr = rmapgr
)

## matrix normalized to the center of interactions
## in sample 1 (as defined per normsam) with scores
m_scores_norm <- getMatrix(pooledSamples,
    zoom = zoom, d = ogop_K27ac, resfolder = resfolder, type = "scores",
    norm = TRUE, name = name, rmapgr = rmapgr, normsam = 1
)

## matrix normalized to the center of interactions
## in sample 2 (as defined per normsam) with scores
m_scores_norm2 <- getMatrix(pooledSamples,
    zoom = zoom, d = ogop_K27ac, resfolder = resfolder, type = "scores",
    norm = TRUE, name = name, rmapgr = rmapgr, normsam = 2
)

7.1.3 Plotting aggregate profiles of interactions

Now we can use the lists of matrices to plot aggregate profiles of different interaction types using the plotAggregatePeaks() function. We are plotting aggregate read counts over all interactions. Then we will plot normalized read counts to one of the samples, which makes individual interactions more comparable.

## Loading precomputed matrices:
name <- names(grl)[grep("K27ac", names(grl))]
name
#> [1] "K27ac_peaks"

## Plots:

colaggr <- rep(col, 2)
lty <- rep(c(1, 3), each = 2)

par(mfrow = c(2, 3))

plotAggregatePeaks(m_reads,
    plotwhat = "median", k = 3, xlim = c(-50, 50),
    mainprefix = "K27ac\n", ylab = "reads",
    col = colaggr, lty = lty
)

plotAggregatePeaks(m_reads_norm,
    plotwhat = "median", k = 3, xlim = c(-50, 50),
    mainprefix = "K27ac norm to 0h\n", ylab = "reads_norm",
    col = colaggr, lty = lty
)

plotAggregatePeaks(m_reads_norm2,
    plotwhat = "median", k = 3, xlim = c(-50, 50),
    mainprefix = "K27ac norm to 72h\n", ylab = "reads_norm",
    col = colaggr, lty = lty
)

plotAggregatePeaks(m_scores,
    plotwhat = "median", k = 3, xlim = c(-50, 50),
    mainprefix = "K27ac scores\n", ylab = "scores",
    col = colaggr, lty = lty
)

plotAggregatePeaks(m_scores_norm,
    plotwhat = "median", k = 3, xlim = c(-50, 50),
    mainprefix = "K27ac scores norm to 0h\n",
    ylab = "scores_norm",
    col = colaggr, lty = lty
)

plotAggregatePeaks(m_scores_norm2,
    plotwhat = "median", k = 3, xlim = c(-50, 50),
    mainprefix = "K27ac scores norm to 72h\n",
    ylab = "scores_norm",
    col = colaggr, lty = lty
)
Aggregate Capture-C profiles. Dotted lines: Distance-matched background.

Figure 10: Aggregate Capture-C profiles
Dotted lines: Distance-matched background.

The above plots show that Hoxb3 interactions with H3K27ac increase upon differentiation, with subtle differences depending on the sample to which the data are normalized. For sanity check, we recommend to test out normalization to different samples. Also, to avoid inflation of fold changes, a general recommendation is to use the positive control sample as reference.

Let’s compare Hoxb3 interactions with H3K27ac peaks with its H3K27me3 peak interactions:

name <- names(grl)[grep("K27me3", names(grl))]
name
#> [1] "K27me3_peaks"

## matrix normalized to the center of interactions
## in sample 2 (as defined per normsam) with reads
m_reads_norm2_K27me3 <- getMatrix(pooledSamples,
    zoom = zoom, d = ogop_K27me3, resfolder = resfolder,
    type = "reads", norm = TRUE,
    name = name, rmapgr = rmapgr, normsam = 2
)

# Plots:

colaggr <- rep(col, 2)
lty <- rep(c(1, 3), each = 2)

par(mfrow = c(1, 2))
plotAggregatePeaks(m_reads_norm2,
    plotwhat = "median", k = 3,
    xlim = c(-50, 50),
    mainprefix = "K27ac norm to 72h\n",
    ylab = "reads_norm", col = colaggr, lty = lty
)

plotAggregatePeaks(m_reads_norm2_K27me3,
    plotwhat = "median", k = 3,
    xlim = c(-50, 50),
    mainprefix = "K27me3 norm to 72h\n",
    ylab = "reads_norm", col = colaggr, lty = lty
)
Aggregate Capture-C profiles for different intervals.

Figure 11: Aggregate Capture-C profiles for different intervals

This comparison shows that while the Hoxb3 promoter gains interactions with H3K27ac peaks, it loses interactions with H3K27me3 peaks, consistent with the line plots.
getMatrix() can use two types of read normalization: downsampling (default) and scaling. In both cases reads are normalized to the read coverage, but in case of scaling, the reads are scaled by the total number of promoters in the Capture experiment upon downsampling to a default library size of 100,000. This is meant to enable comparison between different capture experiments and works in the same way as read normalization in plotInteractions(). On the other hand, in case of downsampling, the reads are simply downsampled to the smallest library in L.

Below is a comparison of the two read normalization methods:

## matrix with scaling reads:
m_reads_scaling <- getMatrix(pooledSamples, zoom, ogop_K27ac,
    resfolder = resfolder, type = "reads", norm = FALSE,
    name = name, rmapgr = rmapgr, readnorm = "scaling"
)

## plots:
par(mfrow = c(1, 2))
plotAggregatePeaks(m_reads,
    plotwhat = "median", k = 3,
    xlim = c(-50, 50),
    mainprefix = "K27ac downsampled\n",
    ylab = "reads", col = colaggr, lty = lty
)

plotAggregatePeaks(m_reads_scaling,
    plotwhat = "median", k = 3,
    xlim = c(-50, 50),
    mainprefix = "K27ac scaled\n",
    ylab = "reads", col = colaggr, lty = lty
)
Aggregate Capture-C profiles for different normalizations. Downsampling normalizes read counts within one experiment, whereas scaling  
also normalizes to the number of capture view points.

Figure 12: Aggregate Capture-C profiles for different normalizations
Downsampling normalizes read counts within one experiment, whereas scaling
also normalizes to the number of capture view points.

7.1.4 Quantify reads in epigenetic peak interactions directly

fls_ogop <- list.files(path = quantfolder, pattern = "oneGeneOnePeak")

## Load oneGeneOnePeak file for H3K27me3 peaks:
ogop_K27me3 <- read.delim(paste(quantfolder, fls_ogop[grep("K27me3", fls_ogop)],
                sep = "/"), stringsAsFactors = FALSE)
ogop_K27me3$clusters_refined <- "K27me3 interaction"

## Quantify
annotatedTable <- quantifyWithinPeaks(
    data = ogop_K27me3, 
    cdlist = pooledSamples, 
    cdlist.reps = replicateSamples, 
    rmapgr = rmapgr, 
    minsize = 100, 
    maxsize = 40000,
    file = paste0(outputDir, '/aggregated5_annotated.txt')
)

##Look at the oneGeneOnePeak table before...
names(ogop_K27me3)
#>  [1] "intID"                    "baitID"                  
#>  [3] "otherEndID"               "seqnames_bait"           
#>  [5] "start_bait"               "end_bait"                
#>  [7] "seqnames_otherEnd"        "start_otherEnd"          
#>  [9] "end_otherEnd"             "interval_id_K27me3_peaks"
#> [11] "seqnames_K27me3_peaks"    "start_K27me3_peaks"      
#> [13] "end_K27me3_peaks"         "summit_re_id"            
#> [15] "clusters_refined"

##and after the annotation
names(annotatedTable)
#>  [1] "intID"                          "baitID"                        
#>  [3] "otherEndID"                     "seqnames_bait"                 
#>  [5] "start_bait"                     "end_bait"                      
#>  [7] "seqnames_otherEnd"              "start_otherEnd"                
#>  [9] "end_otherEnd"                   "interval_id_K27me3_peaks"      
#> [11] "seqnames_K27me3_peaks"          "start_K27me3_peaks"            
#> [13] "end_K27me3_peaks"               "summit_re_id"                  
#> [15] "clusters_refined"               "RA_0h_N"                       
#> [17] "RA_72h_N"                       "RA_0h_score"                   
#> [19] "RA_72h_score"                   "RA_0h_rep1_N"                  
#> [21] "RA_0h_rep2_N"                   "RA_72h_rep1_N"                 
#> [23] "RA_72h_rep2_N"                  "RA_0h_rep1_score"              
#> [25] "RA_0h_rep2_score"               "RA_72h_rep1_score"             
#> [27] "RA_72h_rep2_score"              "RA_0h_N_downsampled"           
#> [29] "RA_72h_N_downsampled"           "RA_0h_rep1_N_downsampled"      
#> [31] "RA_0h_rep2_N_downsampled"       "RA_72h_rep1_N_downsampled"     
#> [33] "RA_72h_rep2_N_downsampled"      "totalValidFrags"               
#> [35] "RA_0h_N_downsampled_mean"       "RA_72h_N_downsampled_mean"     
#> [37] "RA_0h_rep1_N_downsampled_mean"  "RA_0h_rep2_N_downsampled_mean" 
#> [39] "RA_72h_rep1_N_downsampled_mean" "RA_72h_rep2_N_downsampled_mean"
#> [41] "RA_0h_score_mean"               "RA_72h_score_mean"             
#> [43] "RA_0h_rep1_score_mean"          "RA_0h_rep2_score_mean"         
#> [45] "RA_72h_rep1_score_mean"         "RA_72h_rep2_score_mean"

Let’s now plot our quantification as boxplots using the function boxplotsCapC() for pooled…

##plot samples:
boxplotsCapC(annotatedTable, col=col, show_notch = FALSE)
Boxplots for quantitative comparison of Capture-C signal. Two types of read or score quantification is provided that give slightly different results:  
(i) Total signal within a oneGeneOnePeak interaction and  
(ii) Total signal normalized to the number of restriction fragments within which the reads  
were quantified (right).

Figure 13: Boxplots for quantitative comparison of Capture-C signal
Two types of read or score quantification is provided that give slightly different results:
(i) Total signal within a oneGeneOnePeak interaction and
(ii) Total signal normalized to the number of restriction fragments within which the reads
were quantified (right).

…and for the replicate data:

##plot replicate data:
boxplotsCapC(annotatedTable, col=rep(col,each=2), 
    reps=TRUE, show_notch = FALSE
)
Boxplots for replicate comparison.

Figure 14: Boxplots for replicate comparison

7.2 Data integration to interaction peaks

Visualizing interactions, for instance in line plots, quickly reveals that one strong interaction peak can actually be represented by multiple disjointed interactions. These interactions can be merged using the aggregatePeaks() function of PostChicago. This function takes a table of interactions and merges all otherEnds interacting with one bait that are within a distance of dis restriction fragments of each other.
If we would like to define sample-specific interaction peaks, the supplied table should contain only these sample-specific interactions.

Below is a comparison of fragment-level interactions with interactions aggregated over 5 restriction sites:

## Define names of significant interactions shown in the plot
ints$clusters_refined <- "interaction"

## Distance in restriction fragments over which to aggregate:
dis <- 5

## Let's aggregate all interactions within 5 fragments distance of each other:
aggregated5 <- aggregatePeaks(ints, dis, names(pooledSamples),
    fileprefix = "ints", outdir = outputDir
)
#> dis 5 , aggregating peaks...

## Let's aggregate all interactions within 5 fragments distance of each other
## that are significant at 0h:
aggregated5_0h <- aggregatePeaks(ints[ints$RA_0h_score >= 5, ], dis,
    names(pooledSamples),
    fileprefix = "ints_0h", outdir = outputDir
)
#> dis 5 , aggregating peaks...

## Let's aggregate all interactions within 5 fragments distance of each other
## that are significant at 72h:
aggregated5_72h <- aggregatePeaks(ints[ints$RA_72h_score >= 5, ], dis,
  names(pooledSamples),
  fileprefix = "ints_72h",
  outdir = outputDir
)
#> dis 5 , aggregating peaks...

nrow(ints)
#> [1] 82
nrow(aggregated5)
#> [1] 35
nrow(aggregated5_0h)
#> [1] 16
nrow(aggregated5_72h)
#> [1] 23

## Example plots:
name <- "Hoxb3"
id <- baited_genes[baited_genes$genename == name]$re_id
k <- 11
ylim <- c(0, 200)
xlim <- c(96150000, 96350000)
zoom <- as.numeric(NA)
colints <- 'yellow'

par(mfrow = c(2, 2))

plotInteractions(pooledSamples, id, k, zoom,
    ylim = ylim, show.legend = TRUE,
    name = paste("Fragment-level ints \n", name, sep = ""),
    rmapgr = rmapgr, col = col, d = ints, 
    xlim = xlim, colints = colints
)

plotInteractions(pooledSamples, id, k, zoom,
    ylim = ylim, show.legend = TRUE,
    name = paste("Aggregated ints, dis = ", dis, " frags \n", 
                 name, sep = ""),
    rmapgr = rmapgr, col = col, d = aggregated5, 
    xlim = xlim, colints = colints
)

plotInteractions(pooledSamples, id, k, zoom,
    ylim = ylim, show.legend = TRUE,
    name = paste("Aggregated RA_0h ints, dis = ", dis, " frags \n", 
                 name, sep = ""),
    rmapgr = rmapgr, col = col, d = aggregated5_0h, 
    xlim = xlim, colints = colints
)

plotInteractions(pooledSamples, id, k, zoom,
    ylim = ylim, show.legend = TRUE,
    name = paste("Aggregated RA_72h ints, dis = ", dis, " frags \n", 
                 name, sep = ""),
    rmapgr = rmapgr, col = col, d = aggregated5_72h, 
    xlim = xlim, colints = colints
)
Lineplots with different types of interactions. Interactions are shaded, either for fragment-to-fragment interactions (top left),  
aggregated within 5 restriction fragments over all (top right), RA_0h-specific (bottom-left)  
and RA_72h-specific (bottom right) significant interactions.

Figure 15: Lineplots with different types of interactions
Interactions are shaded, either for fragment-to-fragment interactions (top left),
aggregated within 5 restriction fragments over all (top right), RA_0h-specific (bottom-left)
and RA_72h-specific (bottom right) significant interactions.

This visualization nicely confirms that, as suspected from the interaction profiles, the majority of interactions in this region are specific for RA_0h.

Apart from fragment-level aggregation, PostChicago provides the function aggregatePeaks_regions(), which aggregates significant interactions based on the distance in bp. Below an example on how peak aggregation works over a 10kb distance.


## Distance in bp over which to aggregate:
disbp <- 10000

## Let's aggregate all interactions within 5 fragments distance of each other:
aggregated10kb <- aggregatePeaks_regions(ints, disbp, names(L),
    fileprefix = "ints",
    outdir = outputDir
)
#> distance of 10000 , aggregating peaks...
nrow(aggregated10kb)
#> [1] 11

## plot:
par(mfrow = c(1, 2))
plotInteractions(pooledSamples, id, k, zoom,
    ylim = ylim, show.legend = TRUE,
    name = paste("Aggregated ints, dis = ", dis, 
                 " frags \n", name, sep = ""),
    rmapgr = rmapgr, col = col, d = aggregated5, 
    xlim = xlim, colints = colints
)
plotInteractions(pooledSamples, id, k, zoom,
    ylim = ylim, show.legend = TRUE,
    name = paste("Aggregated ints, dis = ", disbp / 1000, 
                 "kb \n", name, sep = ""),
    rmapgr = rmapgr, col = col, d = aggregated10kb, 
    xlim = xlim, colints = colints
)
Aggregation by distance between significant interactions. Interactions are either aggregated over 5 resitriction fragments (left) or over 10kb

Figure 16: Aggregation by distance between significant interactions
Interactions are either aggregated over 5 resitriction fragments (left) or over 10kb

These examples reveal that while an aggregation over 10kb nicely highlights the two interaction peaks at the lost interaction on the left, the interactions more proximal to the view point, however, lose their granularity. As with peak calling for ChIP-Seq, we recommend testing out different aggregation types and distances for each new Capture-C experiment.
Interaction peaks can now be used to create new grl objects for oneGeneOnePeak-level analysis analogous to the analysis for epigenetic ranges.

7.3 Direct quantification within interaction peaks

Interaction peaks summarize multiple significant interactions within a peak. Accurate quantification of all signal under the interaction peaks includes quantification of reads and scores within restriction fragments that are covered, but not necessarily contain significant interactions and are therefore not part of our original interaction table.
Such quantification can be done using the oneGeneOnePeak-level analysis which requires creating a matrix following signal quantification from the matrix, as discussed above. Alternatively, direct quantification is provided by the function quantifyWithinPeaks(). This function quantifies the total sum of raw and downsampled reads and scores within the whole interaction peak and further normalizes these total counts for the total number of valid restriction fragments within the interaction peak. The function can also be used with oneGeneOnePeak tables.
We specify the data which to annotate, the ChicagoData lists used for the annotation. In addition, the function normalizes for the total number of valid restriction fragments, which are specified during the preparation of ChicagoData objects by the parameters minFragLen and maxFragLen. If such valid fragments were specified during the preparation of of ChicagoData objects, they should be also specified for quantifyWithinPeaks(). Unvalid restriction fragments are then omitted from the normalization. Here, we use minsize = 100 and maxsize = 40000 to specify that we used minFragLen = 100 and the default maxFragLen for the generation of ChicagoData objects.

annotatedTable <- quantifyWithinPeaks(
    data = aggregated5, 
    cdlist = pooledSamples, 
    cdlist.reps = replicateSamples, 
    rmapgr = rmapgr, 
    minsize = 100, 
    maxsize = 40000,
    file = paste0(outputDir, '/aggregated5_annotated.txt')
)

8 Session Info

sessionInfo()
#> R version 4.6.0 RC (2026-04-17 r89917)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.4 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.24-bioc/R/lib/libRblas.so 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: America/New_York
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] PostChicago_0.99.2 BiocStyle_2.41.0  
#> 
#> loaded via a namespace (and not attached):
#>   [1] DBI_1.3.0                   bitops_1.0-9               
#>   [3] gridExtra_2.3               httr2_1.2.2                
#>   [5] rlang_1.2.0                 magrittr_2.0.5             
#>   [7] otel_0.2.0                  matrixStats_1.5.0          
#>   [9] compiler_4.6.0              RSQLite_3.53.2             
#>  [11] systemfonts_1.3.2           vctrs_0.7.3                
#>  [13] stringr_1.6.0               pkgconfig_2.0.3            
#>  [15] crayon_1.5.3                fastmap_1.2.0              
#>  [17] magick_2.9.1                backports_1.5.1            
#>  [19] dbplyr_2.6.0                XVector_0.53.0             
#>  [21] labeling_0.4.3              Rsamtools_2.29.0           
#>  [23] rmarkdown_2.31              ragg_1.5.2                 
#>  [25] tinytex_0.60                purrr_1.2.2                
#>  [27] bit_4.6.0                   xfun_0.58                  
#>  [29] cachem_1.1.0                cigarillo_1.3.0            
#>  [31] jsonlite_2.0.0              blob_1.3.0                 
#>  [33] DelayedArray_0.39.3         BiocParallel_1.47.0        
#>  [35] parallel_4.6.0              cluster_2.1.8.2            
#>  [37] R6_2.6.1                    bslib_0.11.0               
#>  [39] stringi_1.8.7               RColorBrewer_1.1-3         
#>  [41] rtracklayer_1.73.0          rpart_4.1.27               
#>  [43] GenomicRanges_1.65.0        jquerylib_0.1.4            
#>  [45] Rcpp_1.1.1-1.1              Seqinfo_1.3.0              
#>  [47] bookdown_0.47               SummarizedExperiment_1.43.0
#>  [49] knitr_1.51                  base64enc_0.1-6            
#>  [51] IRanges_2.47.2              BiocBaseUtils_1.15.1       
#>  [53] Matrix_1.7-5                nnet_7.3-20                
#>  [55] tidyselect_1.2.1            rstudioapi_0.19.0          
#>  [57] dichromat_2.0-0.1           abind_1.4-8                
#>  [59] yaml_2.3.12                 codetools_0.2-20           
#>  [61] curl_7.1.0                  lattice_0.22-9             
#>  [63] tibble_3.3.1                withr_3.0.2                
#>  [65] Biobase_2.73.1              S7_0.2.2                   
#>  [67] evaluate_1.0.5              foreign_0.8-91             
#>  [69] BiocFileCache_3.3.0         Biostrings_2.81.3          
#>  [71] pillar_1.11.1               BiocManager_1.30.27        
#>  [73] filelock_1.0.3              MatrixGenerics_1.25.0      
#>  [75] Chicago_1.41.0              checkmate_2.3.4            
#>  [77] stats4_4.6.0                generics_0.1.4             
#>  [79] RCurl_1.98-1.19             S4Vectors_0.51.3           
#>  [81] ggplot2_4.0.3               scales_1.4.0               
#>  [83] glue_1.8.1                  pheatmap_1.0.13            
#>  [85] Hmisc_5.2-5                 tools_4.6.0                
#>  [87] BiocIO_1.23.3               data.table_1.18.4          
#>  [89] GenomicAlignments_1.49.0    XML_3.99-0.23              
#>  [91] grid_4.6.0                  Delaporte_8.4.3            
#>  [93] tidyr_1.3.2                 colorspace_2.1-2           
#>  [95] patchwork_1.3.2             htmlTable_2.5.0            
#>  [97] restfulr_0.0.17             Formula_1.2-5              
#>  [99] cli_3.6.6                   rappdirs_0.3.4             
#> [101] textshaping_1.0.5           S4Arrays_1.13.0            
#> [103] dplyr_1.2.1                 gtable_0.3.6               
#> [105] sass_0.4.10                 digest_0.6.39              
#> [107] BiocGenerics_0.59.7         SparseArray_1.13.2         
#> [109] rjson_0.2.23                htmlwidgets_1.6.4          
#> [111] farver_2.1.2                memoise_2.0.1              
#> [113] htmltools_0.5.9             lifecycle_1.0.5            
#> [115] httr_1.4.8                  MASS_7.3-65                
#> [117] bit64_4.8.2

9 References

Appendix

  1. Cairns, J., Freire-Pritchett, P., Wingett, S.W., Varnai, C., Dimond, A., Plagnol, V., Zerbino, D., Schoenfelder, S., Javierre, B.M., Osborne, C., et al. (2016). CHiCAGO: robust detection of DNA looping interactions in Capture Hi-C data. Genome Biol 17, 127.Link

  2. Mahara, S., Prussing, S., Smialkovska, V., Krall, S., Holliman, S., Blum, B., Dachtler, V., Borgers, H., Sollier, E., Plass, C., and Feldmann, A. (2024). Transient promoter interactions modulate developmental gene activation. Mol Cell 84, 4486-4502 e4487. Link

  3. Wingett, S., Ewels, P., Furlan-Magaril, M., Nagano, T., Schoenfelder, S., Fraser, P., and Andrews, S. (2015). HiCUP: pipeline for mapping and processing Hi-C data. F1000Res 4, 1310. Link