% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/combineRegions.R
\name{combineRegions}
\alias{combineRegions}
\title{Combine overlapping genomic regions from different samples to create
a single set of consensus genomic regions}
\usage{
combineRegions(
  data,
  foundInSamples = 2,
  combinedCenter = "nearest",
  removeFlankOverlaps = TRUE,
  annotateWithInputNames = FALSE,
  combinedSampleName = NULL,
  outputFormat = "GenomicRanges",
  showMessages = TRUE
)
}
\arguments{
\item{data}{PeakCombiner data frame structure with required columns
named \code{chrom}, \code{start}, \code{end}, \code{name},
\code{score}, \code{strand}, \code{center}, \code{sample_name}. Additional
columns will be dropped}

\item{foundInSamples}{Only include genomic regions that are found
in at least \code{foundInSamples} \strong{number}
of samples. If \code{foundInSamples} is a fraction
between 0 and 1, then only include genomic
regions that are found in at least
\code{foundInSamples} \strong{fraction} of samples.
Default value is 2.}

\item{combinedCenter}{Defines how the column 'center' will be
populated for each genomic region in the output
data. Allowed options are:
\itemize{
\item \code{middle} - the mathematical center of the new region
\item \code{strongest} - the 'center' of the input region that has
the highest 'score' of all overlapping input regions
\item \code{nearest} - the 'center' of the input region that is closest
to the mean of the 'center's of all overlapping input regions (default)
}}

\item{removeFlankOverlaps}{TRUE (default) / FALSE. If TRUE, the combined
regions are checked for an overlap with an input
summit. Regions without such an overlap are
considered as false positive regions caused by an
artificial overlap of neighboring regions due to
the expansion step. If FALSE, this step will be
skipped.}

\item{annotateWithInputNames}{TRUE / FALSE (default). If TRUE, a new
column named 'input_names' is created
in the output data that is populated for
each combined genomic region with the
'name's of all contributing input regions.
If the column 'input_names' already
exists, it will be overwritten.}

\item{combinedSampleName}{Optionally defines how the column 'sample_name'
is populated for the output data.
If not used, then the default is to simply
concatenate all input
sample_names into a single comma-separated
string}

\item{outputFormat}{Character value to define format of output object.
Accepted values are "GenomicRanges" (default), "tibble"
or "data.frame".}

\item{showMessages}{Logical value of TRUE (default) or FALSE. Defines if
info messages are displayed or not.}
}
\value{
An object in the format set by \code{outputFormat} (GenomicRanges by default)
with the columns \code{chrom}, \code{start}, \code{end}, \code{name}, \code{score},
\code{strand}, \code{center}, \code{sample_name}, and optionally \code{input_names}.
The definitions of these columns are
described in full in the Details below. The output can be used as input for the
functions \link{centerExpandRegions} and \link{filterRegions}.
}
\description{
\link{combineRegions} is the main function of this package and
combines overlapping genomic regions from different samples to create
a single set of consensus genomic regions.

The accepted input is the PeakCombiner data frame that is created by the
function \link{prepareInputRegions} and that has optionally
already been centered and expanded and / or filtered using
\link{centerExpandRegions} and \link{filterRegions},
respectively.
Please see \link{prepareInputRegions} for more details.
}
\details{
\link{combineRegions} creates a set of consensus genomic regions by
combining overlapping genomic regions from different samples.
The general steps within this function are:
\itemize{
\item Identify overlapping genomic regions from the input samples
\item Retain overlapping genomic regions that are found in at least
\code{foundInSamples} samples. In this way, you can remove rare or
sample-specific regions
\item Note that overlapping genomic regions must contain at least one 'center'
from their input sample regions to be considered a valid genomic region.
\item As you can use the output data from this step again (e.g., to
center and expand the new set of consensus regions), we must define
the 'center', 'score', 'sample_name', and 'name' values for the new
genomic regions. We do this as follows:
\itemize{
\item 'center' is defined by the \code{combinedCenter} parameter, which has three
options:
\itemize{
\item \code{middle} - the mathematical center of the new region
\item \code{strongest} - the 'center' of the input region that has
the highest 'score' of all overlapping input regions
\item \code{nearest} - the 'center' of the input region that is closest
to the mean of the 'center's of all overlapping input regions (default)
}
\item 'score' is the score of the genomic region from the sample whose
'center' was used, or the mean of the 'score's if \code{middle} was selected
for the \code{combinedCenter} parameter
\item 'sample_name' can be user defined (\code{combinedSampleName}) or is a
concatenated string of all input 'sample_names' (default).
\item 'name' is created by combining 'sample_name' and row number to create a
unique identifier for each newly created genomic region.
}
}
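
As a minimal sketch of how the three \code{combinedCenter} options differ, the
base R snippet below uses hypothetical toy values and assumes that a combined
region spans from the smallest 'start' to the largest 'end' of its overlapping
input regions. It is for illustration only and is not taken from the package's
implementation, which may differ in details such as rounding.

\preformatted{
# Hypothetical toy data: three overlapping input regions on one chromosome
regions <- data.frame(
  start  = c(100, 150, 180),
  end    = c(300, 350, 400),
  center = c(200, 240, 310),
  score  = c(10, 50, 30)
)

# Assumed bounds of the combined region
combined_start <- min(regions$start)
combined_end   <- max(regions$end)

# "middle": arithmetic midpoint of the combined region
center_middle <- round((combined_start + combined_end) / 2)

# "strongest": center of the input region with the highest score
center_strongest <- regions$center[which.max(regions$score)]

# "nearest": input center closest to the mean of all input centers
mean_center    <- mean(regions$center)
center_nearest <- regions$center[which.min(abs(regions$center - mean_center))]

# Score as described above: the mean score for "middle", otherwise the
# score of the region whose center was used
score_middle    <- mean(regions$score)
score_strongest <- max(regions$score)
}

With these toy values, \code{center_middle} is 250, while
\code{center_strongest} and \code{center_nearest} are both 240.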

Note that the output columns \code{sample_name}, \code{name} and \code{score}
will be updated.
}
\examples{
# Load in and prepare an accepted tibble
utils::data(syn_data_bed)

data_prepared <- prepareInputRegions(
  data = syn_data_bed,
  outputFormat = "tibble",
  showMessages = FALSE
)

# Let's combine the input data by defining all potential options
combineRegions(
  data = data_prepared,
  foundInSamples = 2,
  combinedCenter = "nearest",
  annotateWithInputNames = TRUE,
  combinedSampleName = "consensus",
  outputFormat = "tibble",
  showMessages = TRUE
)
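
# A further sketch using the same prepared example data: foundInSamples is
# given as a fraction between 0 and 1, and the center is taken from the
# highest-scoring input region. The parameter values are illustrative only
# and are not a recommendation.
combineRegions(
  data = data_prepared,
  foundInSamples = 0.5,
  combinedCenter = "strongest",
  removeFlankOverlaps = TRUE,
  outputFormat = "tibble",
  showMessages = FALSE
)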

}
