% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/amp_pool.R
\name{amp_pool}
\alias{amp_pool}
\title{Create amplicon pool}
\usage{
amp_pool(data, amp)
}
\arguments{
\item{data}{A dataframe containing the location of each mutation.}

\item{amp}{The amplicon length, in base pairs.}
}
\value{
A dataframe containing the genomic coordinates of all potential amplicons
}
\description{
Creates a dataframe containing the coordinates of all potential
amplicons for hotspot testing.
}
\details{
This algorithm searches the mutational dataset (input) for mutational
hotspot regions on each chromosome:

1.	Starting at the mutation with the lowest chromosomal position
(primary mutation), using a modified rank and recovery system,
the algorithm searches for the closest neighboring mutation.

2.	If the neighboring mutation is less than one amplicon length away
from the primary mutation, the neighboring mutation is included
within the hotspot region.
   a.	This rank and recovery system is repeated, integrating mutations
   into the hotspot region until the next neighboring mutation lies
   one amplicon length or more from the primary mutation.
   b.	Once a neighboring mutation is one amplicon length or more
   from the primary mutation, incorporation into the hotspot region
   halts.

3.	Hotspots that span less than one amplicon length, from the lowest to
the highest mutation position, are covered by a single amplicon, which is
added to the amplicon pool with a unique ID.
   a.	The center of each such amplicon is defined by the weighted
   distribution of mutations.

4.	For all hotspots larger than one amplicon, the algorithm examines
5 potential amplicons at each covered mutation in the hotspot:
   a.	one amplicon directly upstream of the primary mutation
   b.	one amplicon directly downstream of the primary mutation
   c.	one amplicon with the mutation at the end of the read, extending
   (amplicon length - 1) base pairs upstream
   d.	one amplicon with the mutation at the beginning of the read, extending
   (amplicon length - 1) base pairs downstream
   e.	one amplicon with the mutation directly in the center.

5.	All amplicons generated for each hotspot region of interest are assigned a
unique ID and added to the amplicon pool.

The mutation dataset should include two columns containing the chromosome and
genomic position; these columns should be named "chr" and "pos", respectively.
Optionally, the gene name for each mutation may be included in a
column named "gene".
}
\examples{

data("mutation_data")
amp_pool(mutation_data, 100)
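
# Illustrative sketch only (one reading of steps 1-2 in Details, not the
# internal code of the package): group sorted positions on one chromosome so
# that every mutation in a region lies less than amp_len bp from the primary
# (lowest) mutation of that region. toy_pos and amp_len are made-up values.
toy_pos <- sort(c(100, 130, 180, 450, 460, 900))
amp_len <- 100
region <- integer(length(toy_pos))
primary <- toy_pos[1]
current <- 1L
for (i in seq_along(toy_pos)) {
  if (toy_pos[i] - primary >= amp_len) {
    # neighbor is one amplicon length or more away: close the region and
    # treat this mutation as the primary mutation of the next region
    current <- current + 1L
    primary <- toy_pos[i]
  }
  region[i] <- current
}
split(toy_pos, region)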

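# Illustrative sketch only (an interpretation of step 4 in Details, not the
# internal code of the package): the five candidate windows around a single
# mutation. mut_pos and win_len are made-up values, and the boundary
# conventions (inclusive ends, placement of the center) are assumptions.
mut_pos <- 1000
win_len <- 100
half <- floor(win_len / 2)
candidates <- data.frame(
  type = c("upstream", "downstream", "mutation_at_end",
           "mutation_at_start", "centered"),
  start = c(mut_pos - win_len, mut_pos + 1, mut_pos - (win_len - 1),
            mut_pos, mut_pos - half),
  end = c(mut_pos - 1, mut_pos + win_len, mut_pos,
          mut_pos + (win_len - 1), mut_pos - half + win_len - 1)
)
# under these assumptions every candidate spans exactly win_len base pairs
candidates$end - candidates$start + 1
candidates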

}
