This vignette demonstrates the core functionality of the
HiCaptuRe package using example datasets bundled with the
package. We will walk through typical tasks such as loading interaction
data, annotating interactions, and exporting results in various
formats.
To begin, we first locate the example data files provided in
HiCaptuRe:
ibed1_file <- system.file("extdata", "ibed1_example.zip", package = "HiCaptuRe")
ibed2_file <- system.file("extdata", "ibed2_example.zip", package = "HiCaptuRe")
peakmatrix_file <- system.file("extdata", "peakmatrix_example.zip", package = "HiCaptuRe")
annotation_file <- system.file("extdata", "annotation_example.txt", package = "HiCaptuRe")
These files will be used throughout the vignette to showcase the full
HiCaptuRe workflow.
Next, we load the package:
library(HiCaptuRe)
load_interactions()
The first step in any HiCaptuRe workflow is importing
your interaction file. This is done using the
load_interactions() function.
This function performs multiple tasks:
Automatically detects the format of the input file
(ibed, seqMonk, washU,
washUold, bedpe, or
peakmatrix)
Removes technical artifacts such as duplicated interactions
Normalizes the data into a consistent and accessible structure: a
HiCaptuRe object
Specifically, load_interactions() ensures that:
Each interaction appears only once (even if present as A–B and B–A in the file)
For duplicate interactions with differing CHiCAGO scores, the highest score is retained
Structural consistency is enforced across input formats (e.g., missing annotations or read counts are filled in with placeholders)
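The de-duplication rule described above can be pictured with a few lines of base R. This is only a sketch on a hypothetical toy table (the column names ID_1, ID_2, and CS mirror the HiCaptuRe metadata, but the values are made up), not the package's actual implementation:

```r
# Toy interaction table: rows 1 and 2 are the same contact written as A-B and B-A
raw <- data.frame(
  ID_1 = c(10L, 20L, 10L, 30L),
  ID_2 = c(20L, 10L, 30L, 40L),
  CS   = c(5.2, 7.1, 6.0, 8.3)
)

# Build an order-independent key: smaller fragment ID first
raw$key <- paste(pmin(raw$ID_1, raw$ID_2), pmax(raw$ID_1, raw$ID_2), sep = "_")

# Sort by decreasing score, then keep the first (highest-scoring) row per key
raw <- raw[order(-raw$CS), ]
dedup <- raw[!duplicated(raw$key), ]
dedup
```

Sorting before calling duplicated() is what guarantees that the retained copy is the one with the highest score.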
ibed1 <- load_interactions(file = ibed1_file, genome = "BSgenome.Hsapiens.NCBI.GRCh38")
The function automatically detects the format (in this case, ibed) and loads the data into a structured HiCaptuRe object.
ibed1
## HiCaptuRe object with 4352 interactions and 9 metadata columns:
## seqnames1 ranges1 seqnames2 ranges2 |
## <Rle> <IRanges> <Rle> <IRanges> |
## [1] 19 290159-302184 --- 19 343893-369651 |
## [2] 19 290159-302184 --- 19 370987-379828 |
## [3] 19 290159-302184 --- 19 402130-410516 |
## [4] 19 343893-369651 --- 19 530387-539467 |
## [5] 19 506618-515156 --- 19 530387-539467 |
## ... ... ... ... ... ... .
## [4348] 19 58462925-58468938 --- 19 58477045-58497925 |
## [4349] 19 58462925-58468938 --- 19 58517548-58521749 |
## [4350] 19 58462925-58468938 --- 19 58563728-58576169 |
## [4351] 19 58517548-58521749 --- 19 58576170-58581023 |
## [4352] 19 58517548-58521749 --- 19 58581053-58583740 |
## bait_1 ID_1 bait_2 ID_2
## <character> <integer> <character> <integer>
## [1] ENST00000327790,ENST.. 759694 ENST00000264819,ENST.. 759702
## [2] ENST00000327790,ENST.. 759694 ENST00000530711,ENST.. 759704
## [3] ENST00000327790,ENST.. 759694 ENST00000332235 759707
## [4] ENST00000264819,ENST.. 759702 ENST00000215574,ENST.. 759719
## [5] ENST00000359315,ENST.. 759715 ENST00000215574,ENST.. 759719
## ... ... ... ... ...
## [4348] ENST00000535298,ENST.. 771164 ENST00000516525,ENST.. 771167
## [4349] ENST00000535298,ENST.. 771164 ENST00000354590,ENST.. 771171
## [4350] ENST00000535298,ENST.. 771164 ENST00000600004,ENST.. 771177
## [4351] ENST00000354590,ENST.. 771171 . 771178
## [4352] ENST00000354590,ENST.. 771171 . 771180
## reads CS counts int distance
## <integer> <numeric> <integer> <character> <numeric>
## [1] 21 6.07 1 B_B 60600
## [2] 15 7.00 1 B_B 79236
## [3] 10 5.60 1 B_B 110151
## [4] 5 7.83 1 B_B 178155
## [5] 18 11.40 1 B_B 24040
## ... ... ... ... ... ...
## [4348] 121 8.12 1 B_B 21553
## [4349] 39 5.76 1 B_B 53717
## [4350] 40 9.23 1 B_B 104017
## [4351] 116 6.54 1 B_OE 58948
## [4352] 131 8.42 1 B_OE 62748
## -------
## regions: 2073 ranges and 4 metadata columns
## seqinfo: 24 sequences from GRCh38 genome
##
## [Slots in HiCaptuRe object]:
## - @parameters(2) : digest, load
## - @ByBaits(0) : NULL
## - @ByRegions(0) : NULL
HiCaptuRe object?
The HiCaptuRe object extends the standard
GenomicInteractions object by including additional metadata
and slots relevant to Capture Hi-C experiments.
Each interaction includes:
bait_1, bait_2: annotations for each
anchor. If not captured, a “.” placeholder is used.
ID_1, ID_2: restriction fragment IDs
derived from the reference genome digest (via
digest_genome()).
reads: number of reads supporting the
interaction.
CS: CHiCAGO score associated with the
interaction.
counts: count of times the interaction appears (will
always be 1 post-cleaning).
int: interaction class — "B_B" for
bait–bait or "B_OE" for bait–other end.
distance: distance (in bp) between the midpoints of
the two interacting fragments.
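As a quick check of the distance definition, the values printed for the first five interactions can be reproduced by hand from the fragment coordinates. This is a base R sketch; flooring the half-base-pair differences is an assumption that happens to match the displayed output, not a statement about the package's internals:

```r
# Coordinates of the first five interactions, copied from the printed object
anchor1 <- data.frame(start = c(290159, 290159, 290159, 343893, 506618),
                      end   = c(302184, 302184, 302184, 369651, 515156))
anchor2 <- data.frame(start = c(343893, 370987, 402130, 530387, 530387),
                      end   = c(369651, 379828, 410516, 539467, 539467))

# Midpoint of each fragment
mid1 <- (anchor1$start + anchor1$end) / 2
mid2 <- (anchor2$start + anchor2$end) / 2

# Distance between midpoints, floored to whole base pairs (assumed rounding)
distance <- floor(abs(mid2 - mid1))
distance
```

The result agrees with the distance column shown above (60600, 79236, 110151, 178155, 24040).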
Note: When loading formats that lack read count or
annotation information (e.g., washU or bedpe),
load_interactions() automatically fills:
"non-annotated" in the bait fields
0 in the reads column
Beyond interaction data, the HiCaptuRe object contains
additional internal components stored in S4 slots. These include both
inherited slots from the GenomicInteractions class and new
ones added specifically by HiCaptuRe.
We can inspect the available slots using
slotNames():
## [1] "parameters" "ByBaits" "ByRegions" "anchor1"
## [5] "anchor2" "regions" "NAMES" "elementMetadata"
## [9] "metadata"
The slots anchor1, anchor2, and others like
regions and elementMetadata come from the
GenomicInteractions class. HiCaptuRe
introduces three new slots:
parameters: stores metadata for
reproducibility
ByBaits and ByRegions: used to store
interaction summaries generated by other functions
The parameters slot is automatically updated each time
you run a major HiCaptuRe function. This allows full
traceability of how the object was built, including the genome used,
enzyme, digestion settings, and file origins.
We can inspect this slot with getParameters():
## $digest
## Genome
## "BSgenome.Hsapiens.NCBI.GRCh38"
## Genome_Package
## "BSgenome.Hsapiens.NCBI.GRCh38"
## Restriction_Enzyme
## "HindIII"
## Motif
## "AAGCTT"
## Cut_Position
## "1"
## Selected_Chromosomes
## "1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,X,Y"
## PAR_mask
## "TRUE"
## PAR_file
## "/tmp/RtmpgxJTAX/Rinst238b5e588862/HiCaptuRe/extdata/PAR_Homo_sapiens_coordinates.txt"
##
## $load
## file
## "/tmp/RtmpgxJTAX/Rinst238b5e588862/HiCaptuRe/extdata/ibed1_example.zip"
## format
## "ibed"
This tells us that two major operations have been logged:
digest: shows the genome, restriction enzyme, and
motif used when the genome was processed via
digest_genome()
load: tracks the interaction file path and the
format detected during load_interactions()
This tracking system supports transparency and reproducibility throughout the analysis pipeline.
digest_genome()
The function digest_genome() performs a virtual
digestion of a reference genome using a restriction enzyme motif. It
generates a data frame of restriction fragments, each identified by a
unique fragment_ID, which defines the resolution for
subsequent interaction mapping.
This function is used both explicitly and internally:
You can call it directly to explore the digestion or prepare custom fragments.
It is called internally by load_interactions() to
ensure that all loaded interaction files share a consistent genomic
fragment map.
digest_genome() supports both manual specification and
automatic lookup of enzyme details:
If you provide only RE_name (e.g., “HindIII”), the
function will automatically fill in the known motif and
cut position
Supported enzymes include:
| Enzyme | Motif | Cut_Position |
|---|---|---|
| HindIII | A^AGCTT | 1 |
| EcoRI | G^AATTC | 1 |
| BamHI | G^GATCC | 1 |
| MboI | ^GATC | 0 |
| DpnII | ^GATC | 0 |
You can also manually override the motif and cut position if needed
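To make the Motif and Cut_Position columns concrete, here is a toy in-silico digestion written in base R. It is a sketch under stated assumptions: toy_seq is a made-up sequence, and the code only illustrates the idea of cutting at a fixed offset inside each motif match. It is not how digest_genome() is implemented:

```r
# Hypothetical 20-bp sequence containing two HindIII sites (AAGCTT)
toy_seq <- "GGAAGCTTCCCCAAGCTTGG"
motif <- "AAGCTT"
cut_position <- 1  # cut after the first base of the motif (A^AGCTT)

# 1-based start position of every motif occurrence
# (gregexpr() returns -1 when the motif is absent; this toy sequence has matches)
hits <- as.integer(gregexpr(motif, toy_seq, fixed = TRUE)[[1]])

# Last base of each fragment that ends at a cut site
cut_ends <- hits + cut_position - 1

# Consecutive fragments covering the whole sequence
fragments <- data.frame(
  start = c(1, cut_ends + 1),
  end   = c(cut_ends, nchar(toy_seq))
)
fragments$fragment_ID <- seq_len(nrow(fragments))
fragments
```

With cut_position = 0 (as for MboI or DpnII in the table), the same code would place the cut immediately before the motif.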
Key arguments that customize the digestion:
genome: Genome identifier (e.g.,
"GRCh38"). Must match a BSgenome
package.
select_chr: Vector of chromosomes to digest (e.g.,
1:22, "X", "Y"). This helps skip
unplaced contigs or alternative scaffolds.
PAR_mask: Logical. If TRUE, masks
pseudoautosomal regions (PARs) from the Y chromosome to match X,
preventing artificial duplicates.
PAR_file: Optional file with PAR coordinates
(columns: seqnames, start, end).
For "GRCh38", this file is included in the package and used
automatically.
Note: For human genomes, PAR masking can avoid
differences between UCSC and Ensembl versions of chromosome Y. If
PAR_mask = TRUE, masked regions in Y are replaced with
"N" to prevent motif matches.
The first time you digest a genome, it may take a few seconds to
compute all fragments. Internally, HiCaptuRe caches this
result when used via load_interactions(), making repeated
use much faster.
digest <- digest_genome(genome = "BSgenome.Hsapiens.NCBI.GRCh38", RE_name = "HindIII")
head(digest$digest)
## seqnames start end fragment_ID
## <char> <num> <num> <int>
## 1: 1 1 16007 1
## 2: 1 16008 24571 2
## 3: 1 24572 27981 3
## 4: 1 27982 30429 4
## 5: 1 30430 32153 5
## 6: 1 32154 32774 6
This returns a list with:
digest: a data frame with seqnames,
start, end, and
fragment_ID
parameters: metadata about the digestion process
(enzyme, motif, PAR settings, etc.)
seqinfo: reference genome sequence metadata
annotate_interactions()
The annotate_interactions() function allows you to
assign biological annotations to bait fragments in your interaction
data. This step is especially important when working with interaction
files that lack annotation, such as those in washU or bedpe formats.
In Capture Hi-C experiments, the capture library defines a set of target regions (e.g., gene promoters, enhancers, or structural variants) that were enriched during sequencing. The annotation file provided to this function should represent that design — one line per targeted restriction fragment.
Annotation refers to linking each restriction fragment to a meaningful identifier, such as:
Ensembl gene or transcript ID
Gene symbol
Enhancer or regulatory region ID
Custom feature names (e.g., from a BED or GTF file)
ibed1_annotated <- annotate_interactions(
  interactions = ibed1,
  annotation = annotation_file
)
ibed1_annotated
## HiCaptuRe object with 4352 interactions and 9 metadata columns:
## seqnames1 ranges1 seqnames2 ranges2 |
## <Rle> <IRanges> <Rle> <IRanges> |
## [1] 19 290159-302184 --- 19 343893-369651 |
## [2] 19 290159-302184 --- 19 370987-379828 |
## [3] 19 290159-302184 --- 19 402130-410516 |
## [4] 19 343893-369651 --- 19 530387-539467 |
## [5] 19 506618-515156 --- 19 530387-539467 |
## ... ... ... ... ... ... .
## [4348] 19 58462925-58468938 --- 19 58477045-58497925 |
## [4349] 19 58462925-58468938 --- 19 58517548-58521749 |
## [4350] 19 58462925-58468938 --- 19 58563728-58576169 |
## [4351] 19 58517548-58521749 --- 19 58576170-58581023 |
## [4352] 19 58517548-58521749 --- 19 58581053-58583740 |
## bait_1 ID_1 bait_2 ID_2 reads
## <character> <integer> <character> <integer> <integer>
## [1] PLPP2 759694 MIER2 759702 21
## [2] PLPP2 759694 THEG 759704 15
## [3] PLPP2 759694 C2CD4C 759707 10
## [4] MIER2 759702 CDC34 759719 5
## [5] TPGS1,MADCAM1-AS1 759715 CDC34 759719 18
## ... ... ... ... ... ...
## [4348] ZNF324 771164 RNU6-1337P,RN7SL693P 771167 121
## [4349] ZNF324 771164 ZBTB45 771171 39
## [4350] ZNF324 771164 MZF1,CENPBD1P1,ENSG0.. 771177 40
## [4351] ZBTB45 771171 . 771178 116
## [4352] ZBTB45 771171 . 771180 131
## CS counts int distance
## <numeric> <integer> <character> <numeric>
## [1] 6.07 1 B_B 60600
## [2] 7.00 1 B_B 79236
## [3] 5.60 1 B_B 110151
## [4] 7.83 1 B_B 178155
## [5] 11.40 1 B_B 24040
## ... ... ... ... ...
## [4348] 8.12 1 B_B 21553
## [4349] 5.76 1 B_B 53717
## [4350] 9.23 1 B_B 104017
## [4351] 6.54 1 B_OE 58948
## [4352] 8.42 1 B_OE 62748
## -------
## regions: 2073 ranges and 4 metadata columns
## seqinfo: 24 sequences from GRCh38 genome
##
## [Slots in HiCaptuRe object]:
## - @parameters(3) : digest, load, annotate
## - @ByBaits(0) : NULL
## - @ByRegions(0) : NULL
This call updates the fields bait_1 and
bait_2 with new annotations for each bait fragment based on
overlap with your capture library.
For example, the original bait_1 column contained only
Ensembl transcript IDs; now it includes gene names.
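Conceptually, annotation by overlap amounts to asking, for every restriction fragment, which entries of the capture design it overlaps. The base R sketch below illustrates this on hypothetical toy tables (the coordinates, the GENE_A/B/C names, and the column layout are all assumptions made for illustration; the real annotation file format may differ):

```r
# Toy restriction fragments
fragments <- data.frame(
  chr = c("19", "19", "19"),
  start = c(100, 500, 900),
  end = c(499, 899, 1300),
  fragment_ID = 1:3
)

# Toy capture design: one row per annotated feature
annotation <- data.frame(
  chr = c("19", "19", "19"),
  start = c(150, 300, 950),
  end = c(200, 350, 1000),
  name = c("GENE_A", "GENE_B", "GENE_C")
)

# Collapse the names of all overlapping features; use "." when nothing overlaps
annotate_fragment <- function(chr, start, end) {
  hit <- annotation$chr == chr & annotation$start <= end & annotation$end >= start
  if (any(hit)) paste(annotation$name[hit], collapse = ",") else "."
}

fragments$bait <- mapply(annotate_fragment,
                         fragments$chr, fragments$start, fragments$end,
                         USE.NAMES = FALSE)
fragments
```

Fragments overlapping several features get a comma-separated label, which matches the style of the bait_1 and bait_2 values printed above.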
As with all major HiCaptuRe functions, annotation
settings are tracked in the object’s parameters slot:
## annotation
## "/tmp/RtmpgxJTAX/Rinst238b5e588862/HiCaptuRe/extdata/annotation_example.txt"
After annotating interactions, it is often useful to focus on a
subset of the data based on a biologically meaningful list of features.
HiCaptuRe supports two main ways to filter
interactions:
By bait name using interactionsByBaits()
By genomic region using
interactionsByRegions()
Both functions return a new HiCaptuRe object containing
only the selected interactions, and each one updates its own
corresponding summary slot.
interactionsByBaits()
The interactionsByBaits() function filters your
interaction dataset to retain only those interactions where at least one
anchor corresponds to a bait of interest.
This is especially useful when you want to focus your analysis on specific genes or regulatory elements (e.g., from an RNA-seq differential expression result or a curated gene list).
baits_of_interest <- c("DAZAP1", "PLIN3", "FPR3", "TP53")
ibed_byBaits <- interactionsByBaits(
  interactions = ibed1_annotated,
  baits = baits_of_interest
)
ibed_byBaits
## HiCaptuRe object with 22 interactions and 9 metadata columns:
## seqnames1 ranges1 seqnames2 ranges2 |
## <Rle> <IRanges> <Rle> <IRanges> |
## [1] 19 1426730-1442089 --- 19 1442969-1471442 |
## [2] 19 1426730-1442089 --- 19 1508633-1520533 |
## [3] 19 1426730-1442089 --- 19 1587363-1604954 |
## [4] 19 1426730-1442089 --- 19 1847649-1858038 |
## [5] 19 4861596-4868984 --- 19 4913058-4922645 |
## ... ... ... ... ... ... .
## [18] 19 51789893-51802002 --- 19 53672810-53676875 |
## [19] 19 51789893-51802002 --- 19 53676876-53690296 |
## [20] 19 51789893-51802002 --- 19 56802424-56804634 |
## [21] 19 51789893-51802002 --- 19 56826647-56832535 |
## [22] 19 51789893-51802002 --- 19 56889548-56900508 |
## bait_1 ID_1 bait_2 ID_2
## <character> <integer> <character> <integer>
## [1] DAZAP1,RPS15,ENSG000.. 759786 APC2,ENSG00000267317 759788
## [2] DAZAP1,RPS15,ENSG000.. 759786 ADAMTSL5 759793
## [3] DAZAP1,RPS15,ENSG000.. 759786 MBD3,UQCR11 759804
## [4] DAZAP1,RPS15,ENSG000.. 759786 REXO1,ENSG00000267125 759846
## [5] PLIN3 760214 . 760219
## ... ... ... ... ...
## [18] FPR3 769831 . 770286
## [19] FPR3 769831 MIR515-2,MIR515-1,MI.. 770287
## [20] FPR3 769831 . 770835
## [21] FPR3 769831 . 770844
## [22] FPR3 769831 . 770863
## reads CS counts int distance
## <integer> <numeric> <integer> <character> <numeric>
## [1] 138 11.38 1 B_B 22796
## [2] 36 6.92 1 B_B 80173
## [3] 4 5.01 1 B_B 161749
## [4] 8 5.36 1 B_B 418434
## [5] 99 7.92 1 B_OE 52561
## ... ... ... ... ... ...
## [18] 12 8.56 1 B_OE 1878895
## [19] 8 10.83 1 B_B 1887638
## [20] 7 5.03 1 B_OE 5007581
## [21] 8 6.88 1 B_OE 5033643
## [22] 10 10.61 1 B_OE 5099080
## -------
## regions: 2073 ranges and 4 metadata columns
## seqinfo: 24 sequences from GRCh38 genome
##
## [Slots in HiCaptuRe object]:
## - @parameters(4) : digest, load, annotate, ByBaits_1
## - @ByBaits(1) : [[1]] 4 baits
## - @ByRegions(0) : NULL
In this case, the filtered object contains only the 22 interactions involving the selected baits.
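The selection logic can be pictured with a small base R sketch. It assumes that a bait matches when its name appears as one of the comma-separated entries in bait_1 or bait_2; that matching rule and the toy table are assumptions for illustration, not the package's code:

```r
# Toy annotated interactions (bait labels may hold several comma-separated names)
interactions <- data.frame(
  bait_1 = c("DAZAP1,RPS15", "PLIN3", "MIER2", "FPR3"),
  bait_2 = c("APC2", ".", "CDC34", "."),
  stringsAsFactors = FALSE
)
baits_of_interest <- c("DAZAP1", "PLIN3", "FPR3", "TP53")

# TRUE when any comma-separated name in the label is a bait of interest
has_bait <- function(label, baits) {
  vapply(strsplit(label, ",", fixed = TRUE),
         function(nms) any(nms %in% baits), logical(1))
}

# Keep interactions where at least one anchor carries a requested bait
keep <- has_bait(interactions$bait_1, baits_of_interest) |
  has_bait(interactions$bait_2, baits_of_interest)
filtered <- interactions[keep, ]

# Which requested baits were actually found in the filtered data?
found_names <- unlist(strsplit(c(filtered$bait_1, filtered$bait_2), ",", fixed = TRUE))
found <- baits_of_interest %in% found_names
setNames(found, baits_of_interest)
```

In this toy example TP53 is requested but never found, which parallels the empty TP53 row in the bait summary.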
When printing the resulting object, you’ll notice in the output that
the ByBaits slot has been updated.
getByBaits()
To view the bait-centric summary added by this function:
## [[1]]
## # A tibble: 4 × 7
## fragmentID bait N_int NOE interactingID interactingAnnotation
## <int> <chr> <dbl> <dbl> <chr> <chr>
## 1 759786 DAZAP1 4 0 759788,759793,759804,7598… APC2,ENSG00000267317…
## 2 760214 PLIN3 5 3 760219,760228,760231,7602… .,KDM4B,SAFB2,SAFB
## 3 769831 FPR3 13 8 768661,768665,770276,7702… .,DPRX,RN7SL317P,RNU…
## 4 NA TP53 0 0 <NA> <NA>
## # ℹ 1 more variable: interactingDistance <chr>
This summary includes:
The bait name and the ID of the fragment where it is located
The number of interactions it participates in (N_int)
The number of distinct other ends it interacts with (NOE)
The IDs, annotations, and distances of the interacting fragments
If a requested bait is not present in the data (TP53 in this example), its row is filled with missing values.
Each time you call interactionsByBaits(), a new entry is
added to the ByBaits slot, so you can keep track of
multiple filtering events.
As with the previous functions, the parameters slot has also
been updated.
interactionsByRegions()
The interactionsByRegions() function filters the
interaction dataset to retain interactions in which at least one anchor
overlaps a given region of interest.
This is ideal for integrating orthogonal omics data such as ChIP-seq peaks, CUT&RUN binding sites, ATAC-seq regions, or structural variant calls.
regions <- GenomicRanges::GRanges(
  seqnames = 19,
  ranges = IRanges::IRanges(start = c(500000, 1000000), end = c(510000, 1100000))
)
ibed_byRegions <- interactionsByRegions(
  interactions = ibed1_annotated,
  regions = regions
)
ibed_byRegions
## HiCaptuRe object with 10 interactions and 17 metadata columns:
## seqnames1 ranges1 seqnames2 ranges2 |
## <Rle> <IRanges> <Rle> <IRanges> |
## [1] 19 506618-515156 --- 19 530387-539467 |
## [2] 19 1017629-1022815 --- 19 1065924-1076134 |
## [3] 19 1017629-1022815 --- 19 1232135-1263473 |
## [4] 19 1017629-1022815 --- 19 1650906-1661305 |
## [5] 19 1022816-1036370 --- 19 1065924-1076134 |
## [6] 19 1065924-1076134 --- 19 897225-905872 |
## [7] 19 1065924-1076134 --- 19 906012-907931 |
## [8] 19 1065924-1076134 --- 19 1232135-1263473 |
## [9] 19 1065924-1076134 --- 19 1263474-1266236 |
## [10] 19 1065924-1076134 --- 19 1410179-1413602 |
## bait_1 ID_1 bait_2 ID_2 reads
## <character> <integer> <character> <integer> <integer>
## [1] TPGS1,MADCAM1-AS1 759715 CDC34 759719 18
## [2] TMEM259,RNU6-2 759752 ARHGAP45 759755 69
## [3] TMEM259,RNU6-2 759752 CIRBP,ATP5F1D,CBARP,.. 759767 9
## [4] TMEM259,RNU6-2 759752 TCF3 759813 6
## [5] CNN2 759753 ARHGAP45 759755 69
## [6] ARHGAP45 759755 . 759745 11
## [7] ARHGAP45 759755 . 759747 13
## [8] ARHGAP45 759755 CIRBP,ATP5F1D,CBARP,.. 759767 22
## [9] ARHGAP45 759755 . 759768 16
## [10] ARHGAP45 759755 . 759783 8
## CS counts int distance region_1 Nregion_1
## <numeric> <integer> <character> <numeric> <logical> <numeric>
## [1] 11.40 1 B_B 24040 TRUE 1
## [2] 16.86 1 B_B 50807 TRUE 1
## [3] 6.02 1 B_B 227582 TRUE 1
## [4] 5.05 1 B_B 635883 TRUE 1
## [5] 7.52 1 B_B 41436 TRUE 1
## [6] 5.08 1 B_OE 169480 TRUE 1
## [7] 6.37 1 B_OE 164057 TRUE 1
## [8] 7.58 1 B_B 176775 TRUE 1
## [9] 9.73 1 B_OE 193826 TRUE 1
## [10] 6.03 1 B_OE 340861 TRUE 1
## regionID_1 regionCov_1 region_2 Nregion_2 regionID_2 regionCov_2
## <character> <numeric> <logical> <numeric> <character> <numeric>
## [1] 1 3383 FALSE 0 <NA> 0
## [2] 2 5187 TRUE 1 2 10211
## [3] 2 5187 FALSE 0 <NA> 0
## [4] 2 5187 FALSE 0 <NA> 0
## [5] 2 13555 TRUE 1 2 10211
## [6] 2 10211 FALSE 0 <NA> 0
## [7] 2 10211 FALSE 0 <NA> 0
## [8] 2 10211 FALSE 0 <NA> 0
## [9] 2 10211 FALSE 0 <NA> 0
## [10] 2 10211 FALSE 0 <NA> 0
## -------
## regions: 2073 ranges and 4 metadata columns
## seqinfo: 24 sequences from GRCh38 genome
##
## [Slots in HiCaptuRe object]:
## - @parameters(4) : digest, load, annotate, ByRegions_1
## - @ByBaits(0) : NULL
## - @ByRegions(1) : [[1]] 2 regions
After filtering, the resulting HiCaptuRe object includes
8 new metadata columns, 4 for each anchor:
region_1/2 Logical: Does this anchor overlap any region?
Nregion_1/2 Integer: Number of overlapping regions
regionID_1/2 Character: IDs of the overlapping regions
regionCov_1/2 Numeric: Total base pair overlap between anchor and region(s)
getByRegions()
To view the region-centric summary added by this function:
## [[1]]
## GRanges object with 2 ranges and 6 metadata columns:
## seqnames ranges strand | regionID N_int Nfragment
## <Rle> <IRanges> <Rle> | <integer> <integer> <integer>
## [1] 19 500000-510000 * | 1 1 1
## [2] 19 1000000-1100000 * | 2 9 3
## NfragmentOE fragmentID fragmentAnnot
## <integer> <character> <character>
## [1] 0 759715 TPGS1,MADCAM1-AS1
## [2] 0 759752,759753,759755 TMEM259,RNU6-2,CNN2,..
## -------
## seqinfo: 1 sequence from an unspecified genome; no seqlengths
The ByRegions slot provides a region-centric summary,
including:
Region ID
Number of interactions involving fragments that overlap the region
Number of fragments in data overlapping the region
Number of other end fragments in data overlapping the region
IDs and annotation of the overlapping fragments
As with ByBaits, multiple calls to
interactionsByRegions() are logged as separate elements,
preserving the analysis history. The parameters slot is
also updated.
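The regionCov values printed in the filtered object can be verified by hand: they are the number of base pairs shared by an anchor and the region it overlaps. The following base R sketch recomputes four of them from the coordinates shown earlier (overlap_bp is a hypothetical helper written for this illustration, not a package function):

```r
# Base pairs shared by two closed intervals (0 when they do not overlap)
overlap_bp <- function(frag_start, frag_end, reg_start, reg_end) {
  pmax(0, pmin(frag_end, reg_end) - pmax(frag_start, reg_start) + 1)
}

# Four anchors taken from the printed output
anchors <- data.frame(
  start = c(506618, 1017629, 1022816, 1065924),
  end   = c(515156, 1022815, 1036370, 1076134)
)

# The region each anchor overlaps (region 1 for the first anchor, region 2 otherwise)
region_start <- c(500000, 1000000, 1000000, 1000000)
region_end   <- c(510000, 1100000, 1100000, 1100000)

anchors$regionCov <- overlap_bp(anchors$start, anchors$end, region_start, region_end)
anchors
```

The first anchor only partially overlaps region 1 (3383 bp), while the other three lie entirely inside region 2, so their coverage equals the fragment width.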
intersect_interactions()
The intersect_interactions() function allows you to
compare and classify interactions across multiple HiCaptuRe
datasets, identifying shared and unique interactions. This is analogous
to a classic Venn diagram or UpSet plot operation for genomic
interactions.
This function is useful when comparing biological replicates, different cell types, or experimental conditions to identify reproducible or condition-specific contacts.
To run this function, you must provide a named list of at least two
HiCaptuRe objects. Each dataset should ideally be annotated
using the same genome and bait reference for consistency.
ibed2 <- load_interactions(file = ibed2_file, genome = "BSgenome.Hsapiens.NCBI.GRCh38")
ibed2_annotated <- annotate_interactions(interactions = ibed2, annotation = annotation_file)
interactions_list <- list(A = ibed1_annotated, B = ibed2_annotated)
output <- intersect_interactions(interactions_list = interactions_list)
The function returns a list with three elements:
A named list of HiCaptuRe objects representing each
intersection class:
Unique interactions in each dataset
Shared interactions across datasets
## $A
## HiCaptuRe object with 2 interactions and 9 metadata columns:
## seqnames1 ranges1 seqnames2 ranges2 | bait_1
## <Rle> <IRanges> <Rle> <IRanges> | <character>
## [1] 19 290159-302184 --- 19 343893-369651 | PLPP2
## [2] 19 290159-302184 --- 19 370987-379828 | PLPP2
## ID_1 bait_2 ID_2 reads CS_A counts int
## <integer> <character> <integer> <integer> <numeric> <integer> <character>
## [1] 759694 MIER2 759702 21 6.07 1 B_B
## [2] 759694 THEG 759704 15 7.00 1 B_B
## distance
## <numeric>
## [1] 60600
## [2] 79236
## -------
## regions: 2073 ranges and 4 metadata columns
## seqinfo: 24 sequences from GRCh38 genome
##
## [Slots in HiCaptuRe object]:
## - @parameters(3) : digest, load, annotate
## - @ByBaits(0) : NULL
## - @ByRegions(0) : NULL
##
## $B
## HiCaptuRe object with 2 interactions and 9 metadata columns:
## seqnames1 ranges1 seqnames2 ranges2 | bait_1
## <Rle> <IRanges> <Rle> <IRanges> | <character>
## [1] 19 370987-379828 --- 19 450586-456228 | THEG
## [2] 19 1065924-1076134 --- 19 1086678-1112128 | ARHGAP45
## ID_1 bait_2 ID_2 reads CS_B counts
## <integer> <character> <integer> <integer> <numeric> <integer>
## [1] 759704 . 759711 27 5.50 1
## [2] 759755 SBNO2,POLR2E,GPX4 759758 69 7.79 1
## int distance
## <character> <numeric>
## [1] B_OE 77999
## [2] B_B 28374
## -------
## regions: 2016 ranges and 4 metadata columns
## seqinfo: 24 sequences from GRCh38 genome
##
## [Slots in HiCaptuRe object]:
## - @parameters(3) : digest, load, annotate
## - @ByBaits(0) : NULL
## - @ByRegions(0) : NULL
##
## $`A:B`
## HiCaptuRe object with 2 interactions and 10 metadata columns:
## seqnames1 ranges1 seqnames2 ranges2 | bait_1
## <Rle> <IRanges> <Rle> <IRanges> | <character>
## [1] 19 290159-302184 --- 19 402130-410516 | PLPP2
## [2] 19 506618-515156 --- 19 530387-539467 | TPGS1,MADCAM1-AS1
## ID_1 bait_2 ID_2 reads CS_A CS_B counts
## <integer> <character> <integer> <integer> <numeric> <numeric> <integer>
## [1] 759694 C2CD4C 759707 15 5.6 7.39 1
## [2] 759715 CDC34 759719 32 11.4 11.47 1
## int distance
## <character> <numeric>
## [1] B_B 110151
## [2] B_B 24040
## -------
## regions: 2016 ranges and 4 metadata columns
## seqinfo: 24 sequences from GRCh38 genome
##
## [Slots in HiCaptuRe object]:
## - @parameters(3) : digest, load, annotate
## - @ByBaits(0) : NULL
## - @ByRegions(0) : NULL
For shared interactions (present in more than one dataset), the result is returned in a peakmatrix-like format, with separate columns containing CHiCAGO scores for each sample.
An UpSet plot showing the distribution of intersection sets across samples:
This plot is ideal for comparing many datasets simultaneously, and clearly visualizes the number of interactions in each intersection class.
A Venn diagram visualization of the intersections:
Note: The Venn diagram is only generated when the number of datasets is less than 8 to maintain visual clarity.
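The classification into unique and shared sets can be sketched in base R by keying each interaction on its pair of fragment IDs. The two toy tables below contain only the six interactions displayed in the output above; treating the ID pair as the matching key is an assumption made for this illustration:

```r
# Interactions of dataset A and dataset B (fragment IDs and CHiCAGO scores)
a <- data.frame(ID_1 = c(759694L, 759694L, 759694L, 759715L),
                ID_2 = c(759702L, 759704L, 759707L, 759719L),
                CS   = c(6.07, 7.00, 5.60, 11.40))
b <- data.frame(ID_1 = c(759704L, 759755L, 759694L, 759715L),
                ID_2 = c(759711L, 759758L, 759707L, 759719L),
                CS   = c(5.50, 7.79, 7.39, 11.47))

# One key per interaction
key <- function(x) paste(x$ID_1, x$ID_2, sep = "_")

# Interactions found in only one dataset
only_a <- a[!key(a) %in% key(b), ]
only_b <- b[!key(b) %in% key(a), ]

# Shared interactions, with one score column per dataset (peakmatrix-like)
shared <- merge(a, b, by = c("ID_1", "ID_2"), suffixes = c("_A", "_B"))
shared
```

The merged table carries CS_A and CS_B side by side, which is the same layout as the A:B element shown above.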
distance_summary
The distance_summary() function provides a quantitative
overview of interaction distances, stratified into defined distance
intervals. This is particularly useful when comparing distance
profiles between different samples or conditions, such as to
identify global shifts toward short- or long-range interactions.
dist_sum <- distance_summary(
  interactions = ibed1_annotated,
  breaks = seq(0, 10^6, 10^5),
  sample = "ibed1"
)
dist_sum
## # A tibble: 33 × 6
## int total_per_int sample HiCaptuRe breaks value
## <chr> <int> <chr> <int> <fct> <int>
## 1 Total NA ibed1 4352 (0,1e+05] 1064
## 2 B_B 1708 ibed1 4352 (0,1e+05] 330
## 3 B_OE 2644 ibed1 4352 (0,1e+05] 734
## 4 Total NA ibed1 4352 (1e+05,2e+05] 1114
## 5 B_B 1708 ibed1 4352 (1e+05,2e+05] 372
## 6 B_OE 2644 ibed1 4352 (1e+05,2e+05] 742
## 7 Total NA ibed1 4352 (2e+05,3e+05] 749
## 8 B_B 1708 ibed1 4352 (2e+05,3e+05] 270
## 9 B_OE 2644 ibed1 4352 (2e+05,3e+05] 479
## 10 Total NA ibed1 4352 (3e+05,4e+05] 391
## # ℹ 23 more rows
In this example, interaction distances are grouped into bins from 0 to 1 Mb in 100 kb increments.
The function returns a tibble where each row represents a specific combination of:
int: Type of interaction — either “B_B” (bait–bait),
“B_OE” (bait–other end), or “Total” (combined).
total_per_int: Total number of interactions of each
type across all distance bins.
sample: Sample name, as specified in the sample
argument.
HiCaptuRe: Total number of interactions in the input
HiCaptuRe object.
breaks: Distance bin label (e.g., (0,1e+05],
(1e+05,2e+05], etc.).
value: Number of interactions of the given type
(int) within that distance bin.
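The binning itself can be reproduced with base R's cut() and table(). The sketch below uses seven distances copied from the objects printed earlier; it illustrates the idea only and is not the function's implementation:

```r
# A handful of interactions: type and distance in bp
toy <- data.frame(
  int = c("B_B", "B_B", "B_B", "B_B", "B_B", "B_OE", "B_OE"),
  distance = c(60600, 79236, 110151, 178155, 24040, 58948, 62748)
)

# Assign each distance to a 100 kb bin between 0 and 1 Mb
toy$breaks <- cut(toy$distance, breaks = seq(0, 10^6, 10^5))

# Count interactions per type and bin
counts <- table(toy$int, toy$breaks)
counts[, 1:2]

# Proportion of each interaction type falling into each bin
prop_by_type <- counts / rowSums(counts)
round(prop_by_type[, 1:2], 2)
```

Because cut() builds right-closed intervals by default, the first bin is labelled (0,1e+05], the same label that appears in the breaks column.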
plot_distance_summary()
The plot_distance_summary() function generates bar plots
from the output of distance_summary(), allowing you to
explore how interactions are distributed across genomic distances.
You can visualize interaction counts in three different ways, depending on the normalization strategy:
"absolute": Plots the raw number of interactions per distance bin, without normalization.
plots <- plot_distance_summary(distances = dist_sum, type_of_value = "absolute")
plots$int_dist
plots$total_dist
"by_int_type": Normalizes values within each interaction type. This shows the proportion of B_B or B_OE interactions that fall into each distance bin.
plots <- plot_distance_summary(distances = dist_sum, type_of_value = "by_int_type")
plots$int_dist_norm_int
"by_total": Normalizes values by the total number of interactions in the full dataset. This helps compare global interaction profiles across samples.
plots <- plot_distance_summary(distances = dist_sum, type_of_value = "by_total")
plots$int_dist_norm_total
peakmatrix2list()
The peakmatrix2list() function is an auxiliary
utility designed specifically for working with interaction data
stored in peakmatrix format. This format is often used
in multi-sample Capture Hi-C experiments, such as liCHi-C, where
interactions from all samples are consolidated into a single table with
per-sample CHiCAGO scores.
This function splits a peakmatrix-formatted HiCaptuRe
object into individual interaction sets, one per
sample, based on a user-defined CHiCAGO score threshold. The result is a
named list of HiCaptuRe
objects, each containing only the interactions that
pass the cutoff in that specific sample.
Use peakmatrix2list() only when:
Your interaction data was loaded using a peakmatrix file
You need to work with per-sample interaction sets
You want to perform downstream filtering or exporting for each sample independently
## Warning in process_function(data): reads column set to 0 because peakmatrix
## format does not contain this info
## [1] "cellA" "cellB"
Each element in the output list corresponds to one sample, and
contains a filtered HiCaptuRe object with only those
interactions that passed the CHiCAGO score cutoff in that sample.
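The splitting step can be pictured with a short base R sketch on a hypothetical peakmatrix-like table. The column names CS_cellA and CS_cellB, the cutoff of 5, and the use of >= are assumptions made for illustration and may not match the package's behavior exactly:

```r
# Toy peakmatrix: one row per interaction, one score column per sample
peakmatrix <- data.frame(
  ID_1 = c(1L, 1L, 2L),
  ID_2 = c(5L, 9L, 7L),
  CS_cellA = c(6.2, 3.1, 8.0),
  CS_cellB = c(2.4, 5.5, 9.1)
)
cutoff <- 5

# Score columns, one per sample
score_cols <- grep("^CS_", names(peakmatrix), value = TRUE)

# For each sample, keep the interactions whose score reaches the cutoff
per_sample <- lapply(score_cols, function(col) {
  keep <- peakmatrix[[col]] >= cutoff
  data.frame(peakmatrix[keep, c("ID_1", "ID_2")], CS = peakmatrix[keep, col])
})
names(per_sample) <- sub("^CS_", "", score_cols)
per_sample
```

An interaction that passes the cutoff in several samples (the third row here) appears in each of the corresponding list elements.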
export_interactions()
The export_interactions() function allows you to save a
processed HiCaptuRe object to disk in a variety of formats
for downstream analysis, visualization, or sharing.
This function is typically used at the end of a workflow, after annotation, filtering, or formatting steps have been applied.
Supported Output Formats
The exported file can be written in the following formats:
ibed (default): Standard interaction format used
throughout HiCaptuRe
peakmatrix: Multi-sample interaction matrix (only
valid for peakmatrix input)
washU: Format for WashU Epigenome Browser (newer
version)
washUold: Legacy WashU format
cytoscape: Edge list suitable for Cytoscape network
visualization
bedpe: Standard BEDPE format compatible with many
genomic tools
export_interactions(
  interactions = ibed1_annotated,
  file = "/path/to/folder/ibed_annotated.ibed",
  type = "ibed"
)
Notes and Behavior
If the HiCaptuRe object originates from a
peakmatrix, it can be exported as:
A single peakmatrix file using
format = "peakmatrix"
Multiple files (one per sample) if exporting in any non-peakmatrix format
The function will automatically name the output files based on sample names and append the appropriate extension.
You can choose whether to overwrite existing files using the
over.write = TRUE argument.
Optional metadata export: Set
parameters = TRUE to write a .parameters file
alongside your exported interaction file. This records all processing
steps (e.g., digestion, loading, annotation, filtering), supporting
reproducibility.
## R version 4.5.2 (2025-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.3 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] BSgenome.Hsapiens.NCBI.GRCh38_1.3.1000
## [2] BSgenome_1.79.1
## [3] rtracklayer_1.71.3
## [4] BiocIO_1.21.0
## [5] Biostrings_2.79.4
## [6] XVector_0.51.0
## [7] GenomicRanges_1.63.1
## [8] Seqinfo_1.1.0
## [9] IRanges_2.45.0
## [10] S4Vectors_0.49.0
## [11] BiocGenerics_0.57.0
## [12] generics_0.1.4
## [13] HiCaptuRe_1.1.0
## [14] kableExtra_1.4.0
## [15] knitr_1.51
## [16] BiocStyle_2.39.0
##
## loaded via a namespace (and not attached):
## [1] RColorBrewer_1.1-3 sys_3.4.3
## [3] rstudioapi_0.17.1 jsonlite_2.0.0
## [5] magrittr_2.0.4 GenomicFeatures_1.63.1
## [7] farver_2.1.2 rmarkdown_2.30
## [9] vctrs_0.6.5 memoise_2.0.1
## [11] Rsamtools_2.27.0 RCurl_1.98-1.17
## [13] base64enc_0.1-3 rstatix_0.7.3
## [15] htmltools_0.5.9 S4Arrays_1.11.1
## [17] progress_1.2.3 curl_7.0.0
## [19] broom_1.0.11 SparseArray_1.11.10
## [21] Formula_1.2-5 sass_0.4.10
## [23] KernSmooth_2.23-26 bslib_0.9.0
## [25] htmlwidgets_1.6.4 plyr_1.8.9
## [27] Gviz_1.53.1 httr2_1.2.2
## [29] cachem_1.1.0 buildtools_1.0.0
## [31] GenomicAlignments_1.47.0 igraph_2.2.1
## [33] lifecycle_1.0.5 pkgconfig_2.0.3
## [35] Matrix_1.7-4 R6_2.6.1
## [37] fastmap_1.2.0 MatrixGenerics_1.23.0
## [39] digest_0.6.39 colorspace_2.1-2
## [41] AnnotationDbi_1.73.0 textshaping_1.0.4
## [43] Hmisc_5.2-5 RSQLite_2.4.5
## [45] ggpubr_0.6.2 labeling_0.4.3
## [47] filelock_1.0.3 httr_1.4.7
## [49] abind_1.4-8 compiler_4.5.2
## [51] withr_3.0.2 bit64_4.6.0-1
## [53] htmlTable_2.4.3 S7_0.2.1
## [55] backports_1.5.0 BiocParallel_1.45.0
## [57] carData_3.0-5 DBI_1.2.3
## [59] UpSetR_1.4.0 gplots_3.3.0
## [61] ggsignif_0.6.4 biomaRt_2.67.0
## [63] rappdirs_0.3.3 DelayedArray_0.37.0
## [65] rjson_0.2.23 caTools_1.18.3
## [67] gtools_3.9.5 tools_4.5.2
## [69] foreign_0.8-90 otel_0.2.0
## [71] nnet_7.3-20 glue_1.8.0
## [73] InteractionSet_1.39.0 restfulr_0.0.16
## [75] grid_4.5.2 checkmate_2.3.3
## [77] cluster_2.1.8.1 gtable_0.3.6
## [79] tidyr_1.3.2 ensembldb_2.35.0
## [81] ggVennDiagram_1.5.4 data.table_1.18.0
## [83] hms_1.1.4 utf8_1.2.6
## [85] car_3.1-3 xml2_1.5.1
## [87] pillar_1.11.1 stringr_1.6.0
## [89] dplyr_1.1.4 BiocFileCache_3.1.0
## [91] lattice_0.22-7 deldir_2.0-4
## [93] bit_4.6.0 biovizBase_1.57.1
## [95] tidyselect_1.2.1 maketools_1.3.2
## [97] gridExtra_2.3 ProtGenerics_1.43.0
## [99] SummarizedExperiment_1.41.0 svglite_2.2.2
## [101] xfun_0.55 Biobase_2.71.0
## [103] matrixStats_1.5.0 stringi_1.8.7
## [105] UCSC.utils_1.7.1 lazyeval_0.2.2
## [107] yaml_2.3.12 evaluate_1.0.5
## [109] codetools_0.2-20 cigarillo_1.1.0
## [111] interp_1.1-6 tibble_3.3.0
## [113] BiocManager_1.30.27 cli_3.6.5
## [115] rpart_4.1.24 systemfonts_1.3.1
## [117] jquerylib_0.1.4 GenomicInteractions_1.43.1
## [119] Rcpp_1.1.0.8.2 dichromat_2.0-0.1
## [121] GenomeInfoDb_1.47.2 dbplyr_2.5.1
## [123] png_0.1-8 XML_3.99-0.20
## [125] parallel_4.5.2 ggplot2_4.0.1
## [127] blob_1.2.4 prettyunits_1.2.0
## [129] latticeExtra_0.6-31 jpeg_0.1-11
## [131] AnnotationFilter_1.35.0 bitops_1.0-9
## [133] viridisLite_0.4.2 VariantAnnotation_1.57.1
## [135] scales_1.4.0 purrr_1.2.1
## [137] crayon_1.5.3 rlang_1.1.7
## [139] KEGGREST_1.51.1