% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/COCOA.R
\name{aggregateSignal}
\alias{aggregateSignal}
\title{Score a region set using feature contribution scores}
\usage{
aggregateSignal(
  signal,
  signalCoord,
  regionSet,
  signalCol = c("PC1", "PC2"),
  signalCoordType = "default",
  scoringMetric = "default",
  verbose = FALSE,
  absVal = TRUE,
  rsOL = NULL,
  pOlap = NULL,
  returnCovInfo = TRUE,
  .checkInput = TRUE
)
}
\arguments{
\item{signal}{Matrix of feature contribution scores (the contribution of 
each epigenetic feature to each target variable). One named column for each 
target variable.
One row for each original epigenetic feature (should be same order 
as original data/signalCoord). For example (an unsupervised case), if PCA was
done on epigenetic data and the
goal was to find region sets associated with the principal components, you 
could use the rotation matrix returned by prcomp, i.e. 
prcomp(epigenetic data)$rotation, as the
feature contribution scores/`signal` parameter.}

\item{signalCoord}{A GRanges object or data frame with coordinates 
for the genomic signal/original epigenetic data. 
Coordinates should be in the 
same order as the original data and the feature contribution scores 
(each item/row in signalCoord
corresponds to a row in signal). If a data.frame, 
must have chr and start columns (optionally can have end column, 
depending on the epigenetic data type).}

\item{regionSet}{A genomic ranges (GRanges) object with regions corresponding
to the same biological annotation.
Must be from the same reference genome as the coordinates for the actual data/samples (signalCoord).}

\item{signalCol}{A character vector with the names of the sample variables
of interest/target variables (e.g. PCs or sample phenotypes).}

\item{signalCoordType}{Character. Can be "default", "singleBase", or 
"multiBase". This describes whether the coordinates for `signal` 
(`signalCoord`) are each a single base (e.g. as for DNA methylation)
or a region/multiple bases (e.g. as for chromatin accessibility). 
Different scoring
options are available for each type of data. If "default" is given,
the type of coordinates will be detected automatically. For "default", if each
coordinate start value equals the coordinate end value 
(all(start(signalCoord) == end(signalCoord))), "singleBase"
will be used. Otherwise, "multiBase" will be used.}

\item{scoringMetric}{A character object with the scoring metric.
There are different methods available for 
signalCoordType="singleBase" vs  signalCoordType="multiBase".
For "singleBase", the available methods are "regionMean", 
"regionMedian", "simpleMean", and "simpleMedian". 
The default method is "regionMean".
For "multiBase", the methods are "proportionWeightedMean", 
"simpleMean", and "simpleMedian". The default is "proportionWeightedMean".
"regionMean" is a weighted
average of the signal, weighted by region (absolute value of signal 
if absVal=TRUE). First the signal is
averaged within each regionSet region, 
then the region averages are averaged. With the
"regionMean" method, be cautious when interpreting scores for
region sets with a low number of regions that overlap signalCoord. The
"regionMedian" method is the same as "regionMean" but the median is taken
at each step instead of the mean.
The "simpleMean"
method is just the unweighted average of all (absolute) signal values that
overlap the given region set. For multiBase data, this includes
signal regions that overlap a regionSet region at all (1 base
overlap or more) and the signal for each overlapping region is
given the same weight for the average regardless of how much it overlaps.
The "simpleMedian" method is the same as "simpleMean" but takes the median 
instead of the mean. 
"proportionWeightedMean" is a weighted average of all signalCoord 
regions that overlap with regionSet regions. For each signalCoord region
that overlaps with a regionSet region, we calculate what proportion
of the regionSet region is covered. Then this proportion is used to
weight the signal value when calculating the mean. 
The denominator of the mean
is the sum of all the proportion overlaps.}

\item{verbose}{A "logical" object. Whether progress 
of the function should be shown. One
bar indicates the region set is completed.}

\item{absVal}{Logical. If TRUE, take the absolute value of values in
signal. Choose TRUE if you think there may be some 
genomic loci in a region set that will increase and others
will decrease (if there may be anticorrelation between
regions in a region set). Choose FALSE if you expect regions in a 
given region set to all change in the same direction (all be positively
correlated with each other).}

\item{rsOL}{A "SortedByQueryHits" object 
(output of the findOverlaps function). Should have the overlap
information between signalCoord and a single region set (regionSet).
The region set must be the "subject" in findOverlaps 
and signalCoord must be the "query", e.g. findOverlaps(subject=regionSet,
query=signalCoord).
Providing this information can greatly improve permutation speed since the 
overlaps will not have to be recalculated for each permutation. 
When using this parameter, signalCoord, 
signal, and the region set must be in the same order as they were
when rsOL was created. Otherwise, the wrong genomic loci will be referenced
(e.g. if epigenetic features were filtered out of signal after rsOL
was created).}

\item{pOlap}{Numeric vector. Only used if rsOL is given and scoringMetric
is "proportionWeightedMean". This vector should contain the proportion of 
each regionSet region that is overlapped by a signalCoord region. The 
order of pOlap should be the same as the overlaps in rsOL.}

\item{returnCovInfo}{Logical. If TRUE, the following coverage and 
region set info will
be calculated and included in function output: regionSetCoverage, 
signalCoverage, totalRegionNumber, and meanRegionSize. For the
proportionWeightedMean scoring method, 
sumProportionOverlap will also be calculated.}

\item{.checkInput}{A "logical" object. For programmatic use only.
Whether inputs to the function should be checked for 
correctness/appropriateness. This parameter may be used by some COCOA
functions to prevent unnecessary checks of objects 
after arguments have already been checked once.}
}
\value{
A data.frame with one row and the following columns. 
First, there is one column for each item of signalCol, with names given
by signalCol. These columns hold the region set's score for each signalCol.
The other columns are: signalCoverage (formerly cytosine_coverage), the
number of epigenetic features that overlapped at all with regionSet;
regionSetCoverage, the number of regions from regionSet
that overlapped any of the epigenetic features; 
totalRegionNumber, the number of regions in regionSet; 
and meanRegionSize, the average
size in base pairs of regions in regionSet (this average is based on
all regions in regionSet, not just the ones that overlap).
For "multiBase" data, if the "proportionWeightedMean" scoring metric 
is used, the output will also have a "sumProportionOverlap" column.
With this scoring method, the proportion overlap between each signalCoord
region and overlapping regionSet region is calculated. This column is
the sum of all those proportion overlaps and is another way to quantify
coverage of regionSet in addition to regionSetCoverage.
}
\description{
First, this function identifies which epigenetic features 
overlap the region set. 
Then the region set is scored using the feature contribution scores 
(`signal` input) 
according to the `scoringMetric` parameter.
}
\examples{
data("brcaATACCoord1")
data("brcaATACData1")
data("esr1_chr1")
featureContributionScores <- prcomp(t(brcaATACData1))$rotation
rsScores <- aggregateSignal(signal=featureContributionScores, 
                                 signalCoord=brcaATACCoord1, 
                                 regionSet=esr1_chr1, 
                                 signalCol=c("PC1", "PC2"), 
                                 scoringMetric="default")
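
# A minimal base-R sketch of the arithmetic behind three of the scoring
# metrics described for the scoringMetric argument. This is an illustration
# under stated assumptions, not COCOA's internal implementation: toySignal,
# toyRegion, and toyPOlap are hypothetical toy values, not package data.
# (In practice "regionMean" applies to singleBase data and
# "proportionWeightedMean" to multiBase data; one toy vector is reused here
# only to compare the calculations.)
toySignal <- c(0.4, -0.2, 0.6, 0.1)  # one score per overlapping feature
toyRegion <- c(1, 1, 1, 2)           # regionSet region each feature overlaps
toyPOlap <- c(1, 0.5, 0.25, 0.25)    # proportion of the region that is covered
absSignal <- abs(toySignal)          # corresponds to absVal=TRUE
# "simpleMean": unweighted mean of all overlapping signal values
toySimpleMean <- mean(absSignal)
# "regionMean": average within each region, then average the region means
toyRegionMean <- mean(tapply(absSignal, toyRegion, mean))
# "proportionWeightedMean": weight each value by its proportion overlap;
# the denominator is the sum of the proportion overlaps
toyPWMean <- sum(absSignal * toyPOlap) / sum(toyPOlap)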
}
