% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/workflow.R
\name{estimateBsWidth}
\alias{estimateBsWidth}
\title{Estimate the appropriate binding site width together with the optimal
gene-wise filter level}
\usage{
estimateBsWidth(
  object,
  bsResolution = c("medium", "fine", "coarse"),
  geneResolution = c("medium", "coarse", "fine", "finest"),
  est.maxBsWidth = 13,
  est.minimumStepGain = 0.02,
  est.maxSites = Inf,
  est.subsetChromosome = "chr1",
  est.minWidth = 2,
  est.offset = 1,
  sensitive = FALSE,
  sensitive.size = 5,
  sensitive.minWidth = 2,
  anno.annoDB = NULL,
  anno.genes = NULL,
  bsResolution.steps = NULL,
  geneResolution.steps = NULL,
  quiet = TRUE,
  veryQuiet = FALSE,
  reportScoresPerBindingSite = FALSE,
  ...
)
}
\arguments{
\item{object}{a \code{\link{BSFDataSet}} object with stored crosslink sites.
This means that all ranges must have a width of 1.}

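\item{bsResolution}{character; level of resolution at which different binding
site widths are tested, one of \code{"medium"}, \code{"fine"} or
\code{"coarse"}}

\item{geneResolution}{character; level of resolution at which gene-wise
filtering steps are tested, one of \code{"medium"}, \code{"coarse"},
\code{"fine"} or \code{"finest"}}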

\item{est.maxBsWidth}{numeric; the largest binding site width that should be
considered in the testing}

\item{est.minimumStepGain}{numeric; the minimum additional gain in score (in
percent) that the next binding site width must achieve to be selected as the
best option}

\item{est.maxSites}{numeric; maximum number of PureCLIP sites used for the
estimation}

\item{est.subsetChromosome}{character; the chromosome on which the
estimation is performed}

\item{est.minWidth}{numeric; the minimum size of regions that are subjected
to the iterative merging routine after the initial region concatenation}

\item{est.offset}{numeric; constant added to the flanking count in the
signal-to-flank ratio calculation to avoid division by zero}

\item{sensitive}{logical; whether to enable sensitive pre-filtering before
binding site merging}

\item{sensitive.size}{numeric; the size (in nucleotides) of the merged
sensitive region}

\item{sensitive.minWidth}{numeric; the minimum size (in nucleotides) of the
merged sensitive region}

\item{anno.annoDB}{an object of class \code{OrganismDbi} that contains
the gene annotation (experimental).}

\item{anno.genes}{an object of class \code{\link{GenomicRanges}} that represents
the gene ranges directly}

\item{bsResolution.steps}{numeric vector; user-defined binding site widths
to be tested directly. Overrides \code{bsResolution}}

\item{geneResolution.steps}{numeric vector; user-defined gene-wise filter
thresholds to be tested directly. Overrides \code{geneResolution}}

\item{quiet}{logical; whether to suppress messages}

\item{veryQuiet}{logical; whether to suppress all messages}

\item{reportScoresPerBindingSite}{logical; whether to report the ratio score
for each binding site separately. Warning: this is intended for debugging and
testing only, and downstream functions may not work correctly.}

\item{...}{additional arguments passed to \code{\link{pureClipGeneWiseFilter}}}
}
\value{
an object of class \code{\link{BSFDataSet}} with binding sites, with the
\code{bsSize} and \code{geneFilter} entries of the \code{params} slot filled
}
\description{
This function tests different binding site widths in combination with
different gene-wise filtering steps. For each combination, the
signal-to-flank ratio is calculated. The mean over all gene-wise filters at
each binding site width is used to select the optimal width, which in turn
serves as the anchor to select the optimal gene-wise filter.
}
\details{
Parameter estimation is done on a subset of all crosslink sites
(\code{est.subsetChromosome}).

Gene-level filters can be tested at varying levels of accuracy
(\code{geneResolution}), ranging from \code{finest} to \code{coarse}, which
represent 1\% and 20\% steps, respectively.

Binding site computation at each step can be done at three different levels
of accuracy (\code{bsResolution}). Option \code{fine} is equivalent to a
normal run of the \code{\link{makeBindingSites}} function. Option
\code{medium} performs a shorter version of the binding site computation,
skipping some of the refinement steps. Option \code{coarse} approximates
binding sites by merged crosslink regions, centered on the crosslink site
with the highest score.
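
As an illustration only, the \code{coarse} approximation can be sketched in
base R for crosslink positions on a single chromosome and strand. The helper
\code{coarse_sites()} and its \code{gap} argument are hypothetical and not
part of the package. The sketch assumes that crosslinks no more than
\code{gap} nucleotides apart are merged and that the requested width is odd.
The package itself operates on \code{GRanges} objects, and its exact merging
rules may differ.

\preformatted{
# Hypothetical sketch, not the package implementation.
# pos: crosslink positions on one chromosome/strand; score: their scores
coarse_sites <- function(pos, score, width, gap = 1) {
  o <- order(pos)
  pos <- pos[o]
  score <- score[o]
  # start a new region whenever the distance to the previous site exceeds gap
  region <- cumsum(c(TRUE, diff(pos) > gap))
  # center each merged region on its highest-scoring crosslink (first on ties)
  centers <- as.vector(tapply(seq_along(pos), region,
                              function(i) pos[i][which.max(score[i])]))
  # extend the center symmetrically (assumes an odd width)
  half <- floor((width - 1) / 2)
  data.frame(start = centers - half, end = centers + half)
}

# returns start = 9, 18, 38 and end = 13, 22, 42
coarse_sites(pos = c(10, 11, 12, 20, 21, 40),
             score = c(1, 5, 2, 3, 3, 1), width = 5)
}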

For each binding site in each set defined by the chosen resolutions, a
signal-to-flank score ratio is calculated, and the mean of this score is
returned per set. Next, a mean of means is computed over all gene-wise
filters, which results in a single score for each tested binding site width.
The width that yields the highest score is selected as optimal. In addition,
the \code{est.minimumStepGain} option controls the minimum additional gain in
score that a tested width must achieve to be selected as the best option.
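
The scoring and selection steps described above can be sketched in base R.
This is a simplified illustration, not the package code. The helpers
\code{flank_ratio()} and \code{select_width()} are hypothetical, the ratio is
assumed to use raw signal and flank counts, \code{est.minimumStepGain} is
interpreted as a relative gain (0.02 meaning 2 percent) over the currently
selected width, and the score matrix contains made-up numbers.

\preformatted{
# Hypothetical sketch, not the package implementation.
# Signal-to-flank ratio per binding site; offset avoids division by zero
flank_ratio <- function(bsSignal, flankSignal, offset = 1) {
  bsSignal / (flankSignal + offset)
}
mean(flank_ratio(bsSignal = c(10, 4), flankSignal = c(0, 3)))  # returns 5.5

# Made-up mean ratios: rows = tested widths, columns = gene-wise filters
scores <- matrix(c(1.00, 1.10, 0.90,
                   1.20, 1.30, 1.10,
                   1.25, 1.32, 1.12,
                   1.26, 1.33, 1.13),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(c("3", "5", "7", "9"), c("f1", "f2", "f3")))

# Mean of means: one score per tested width
meanScores <- rowMeans(scores)

# Move to the next larger width only while it improves on the currently
# selected width by at least minimumStepGain (relative gain)
select_width <- function(meanScores, minimumStepGain = 0.02) {
  best <- 1
  for (i in seq_along(meanScores)[-1]) {
    gain <- meanScores[[i]] / meanScores[[best]] - 1
    if (gain < minimumStepGain) break
    best <- i
  }
  as.numeric(names(meanScores)[best])
}

select_width(meanScores, minimumStepGain = 0.02)  # returns 7
}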

To increase the sensitivity of the binding site estimation, a sensitive mode
(\code{sensitive = TRUE}) is available. In this mode, crosslink sites undergo
a pre-filtering and merging step to exclude potential artificial peaks (for
example, caused by experimental or mapping biases). If sensitive mode is
activated, the \code{est.minWidth} option should be set to 1.

The optimal gene-wise filter (\code{geneFilter}) is selected as the first
filter level whose score passes the merged mean (the mean over all gene-wise
filters) at the selected optimal binding site width.
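
Under one reading of this rule, the gene-wise filter selection can be
sketched as follows. The helper \code{select_gene_filter()} and the scores
are hypothetical. Filter levels are assumed to be ordered as tested, ties are
assumed to count as passing, and the merged mean is taken as the mean over
all filter levels at the optimal width. Because the maximum of a set of
values is never below their mean, at least one filter level always
qualifies under these assumptions.

\preformatted{
# Hypothetical sketch, not the package implementation.
# Made-up mean ratios at the optimal width, one per gene-wise filter level
# (names = filter level in percent, in the order they were tested)
scoresAtWidth <- setNames(c(1.10, 1.20, 1.28, 1.30), c(5, 10, 15, 20))

select_gene_filter <- function(scoresAtWidth) {
  mergedMean <- mean(scoresAtWidth)
  # first filter level whose score reaches the merged mean (ties included)
  first <- which(scoresAtWidth >= mergedMean)[1]
  as.numeric(names(scoresAtWidth)[first])
}

select_gene_filter(scoresAtWidth)  # returns 15
}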

The function is part of the standard workflow performed by \code{\link{BSFind}}.
}
\examples{
# load clip data
files <- system.file("extdata", package="BindingSiteFinder")
load(list.files(files, pattern = ".rda$", full.names = TRUE))
load(list.files(files, pattern = ".rds$", full.names = TRUE)[1])
load(list.files(files, pattern = ".rds$", full.names = TRUE)[2])
estimateBsWidth(bds, anno.genes = gns, est.maxBsWidth = 19,
  geneResolution = "coarse", bsResolution = "coarse",
  est.subsetChromosome = "chr22")

}
\seealso{
\code{\link{BSFind}},
\code{\link{estimateBsWidthPlot}}
}
