% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/emptyDropsCellRanger.R
\name{emptyDropsCellRanger}
\alias{emptyDropsCellRanger}
\alias{emptyDropsCellRanger,ANY-method}
\alias{emptyDropsCellRanger,SummarizedExperiment-method}
\title{CellRanger's emptyDrops variant}
\usage{
emptyDropsCellRanger(m, ...)

\S4method{emptyDropsCellRanger}{ANY}(
  m,
  n.expected.cells = 3000,
  max.percentile = 0.99,
  max.min.ratio = 10,
  umi.min = 500,
  umi.min.frac.median = 0.01,
  cand.max.n = 20000,
  ind.min = 45000,
  ind.max = 90000,
  round = TRUE,
  niters = 10000,
  BPPARAM = SerialParam()
)

\S4method{emptyDropsCellRanger}{SummarizedExperiment}(m, ..., assay.type = "counts")
}
\arguments{
\item{m}{A numeric matrix-like object containing counts, where columns represent barcoded droplets and rows represent features.
The matrix should only contain barcodes for an individual sample, prior to any filtering for barcodes.
Alternatively, a \linkS4class{SummarizedExperiment} containing such an object.}

\item{...}{Further arguments to pass to individual methods.
Specifically, for the SummarizedExperiment method, further arguments to pass to the ANY method.}

\item{n.expected.cells}{An integer scalar specifying the number of expected cells in a sample. 
Corresponds to the \code{nExpectedCells} argument in \pkg{STARsolo}.}

\item{max.percentile}{A numeric scalar between 0 and 1 used to define the maximum UMI count in the simple filtering algorithm. 
Corresponds to the \code{maxPercentile} argument in \pkg{STARsolo}.}

\item{max.min.ratio}{An integer scalar specifying the ratio of the maximum to the minimum UMI count in the simple filtering algorithm. 
Corresponds to the \code{maxMinRatio} argument in \pkg{STARsolo}.}

\item{umi.min}{An integer scalar specifying the minimum UMI count for inclusion of a barcode in the cell candidate pool. 
Corresponds to the \code{umiMin} argument in \pkg{STARsolo}.}

\item{umi.min.frac.median}{A numeric scalar between 0 and 1 used to define the minimum UMI count for inclusion of a barcode in the cell candidate pool.
Specifically, the minimum is defined as \code{umi.min.frac.median} times the median UMI count of the real cells assigned by the simple filtering algorithm. 
Corresponds to the \code{umiMinFracMedian} argument in \pkg{STARsolo}.}

\item{cand.max.n}{An integer scalar specifying the maximum number of barcodes that can be included in the cell candidate pool. 
In effect, this applies a minimum threshold that is defined as the \code{cand.max.n}-th largest UMI count among all cells that are \emph{not} selected by the simple filtering algorithm. 
Corresponds to the \code{candMaxN} argument in \pkg{STARsolo}.}

\item{ind.min}{An integer scalar specifying the lowest UMI count ranking for inclusion of a barcode in the ambient profile. 
Corresponds to the \code{indMin} argument in \pkg{STARsolo}.}

\item{ind.max}{An integer scalar specifying the highest UMI count ranking for inclusion of a barcode in the ambient profile. 
Corresponds to the \code{indMax} argument in \pkg{STARsolo}.}

\item{round}{A logical scalar indicating whether to check for non-integer values in \code{m} and, if present, round them for ambient profile estimation (see \code{?\link{ambientProfileEmpty}}) and the multinomial simulations.}

\item{niters}{An integer scalar specifying the number of iterations to use for the Monte Carlo p-value calculations.}

\item{BPPARAM}{A \linkS4class{BiocParallelParam} object indicating whether parallelization should be used.}

\item{assay.type}{String or integer specifying the assay of interest.}
}
\value{
A \linkS4class{DataFrame} with the same fields as that returned by \code{\link{emptyDrops}}.
}
\description{
An approximate implementation of the \code{--soloCellFilter EmptyDrops_CR} filtering approach in \pkg{STARsolo},
which itself was reverse-engineered from the behavior of CellRanger 3.
}
\details{
\code{emptyDropsCellRanger} splits each sample's barcodes into three subsets.
\enumerate{
\item The first subset contains barcodes that are selected by the \dQuote{simple filtering algorithm}, which are regarded as high quality cells without any further filtering.
The minimum threshold \eqn{T} for this subset is defined by taking the \code{max.percentile} percentile of the total UMI counts of the top \code{n.expected.cells} barcodes,
and then dividing by \code{max.min.ratio} to obtain a minimum UMI count.
(This is closely related to the algorithm used by \code{\link{defaultDrops}}.)
All barcodes identified in this manner will have an FDR of zero.
\item The second subset contains the ambient pool and is defined as all barcodes with rankings between \code{ind.min} and \code{ind.max}. 
The barcodes that fall in this category will be used to compute the ambient profile.
None of these barcodes are considered to be potential cells.
\item The third subset contains the pool of barcodes that are potential cells, i.e., cell candidates.
This is defined as all barcodes with total counts below \eqn{T} and higher than all of the thresholds defined by \code{umi.min}, \code{umi.min.frac.median} and \code{cand.max.n}.
Only the barcodes within this subset will be tested for significant deviations from the ambient profile, i.e., their FDR is not \code{NaN}.
}
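
As a rough illustration of the three subsets described above, the partitioning can be sketched in base R.
This is a minimal sketch and not the code used by \code{emptyDropsCellRanger}:
the helper \code{partitionBarcodes} is hypothetical, and the rank used for the percentile, the handling of ties and the strictness of each comparison are all assumptions.
\preformatted{# Hypothetical helper, not part of DropletUtils.
partitionBarcodes <- function(totals, n.expected.cells = 3000,
    max.percentile = 0.99, max.min.ratio = 10, umi.min = 500,
    umi.min.frac.median = 0.01, cand.max.n = 20000,
    ind.min = 45000, ind.max = 90000)
{
    sorted <- sort(totals, decreasing = TRUE)

    # Simple filtering: take the total at the (assumed) percentile rank
    # among the top n.expected.cells barcodes, then divide by max.min.ratio.
    top.rank <- round(n.expected.cells * (1 - max.percentile))
    top.rank <- min(length(sorted), max(1, top.rank))
    threshold <- max(round(sorted[top.rank] / max.min.ratio), 1)
    is.simple <- totals >= threshold

    # Ambient pool: barcodes ranked in (ind.min, ind.max] by total count.
    ranks <- rank(-totals, ties.method = "first")
    is.ambient <- ranks > ind.min & ranks <= ind.max

    # Candidates: below the threshold but above all three lower bounds.
    rest <- sort(totals[!is.simple], decreasing = TRUE)
    bound.n <- if (length(rest) > cand.max.n) rest[cand.max.n] else 0
    bound.median <- umi.min.frac.median * median(totals[is.simple])
    lower <- max(umi.min, bound.median, bound.n)
    is.candidate <- !is.simple & !is.ambient & totals > lower

    list(threshold = threshold, simple = is.simple,
        ambient = is.ambient, candidate = is.candidate)
}

# Toy totals with deliberately small settings.
totals <- c(rep(1000, 10), rep(60, 5), rep(20, 5), rep(1, 80))
parts <- partitionBarcodes(totals, n.expected.cells = 10,
    max.percentile = 0.9, umi.min = 5, cand.max.n = 8,
    ind.min = 20, ind.max = 100)
parts$threshold
sapply(parts[-1], sum)
}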

At the time of writing, the arguments in \pkg{STARsolo} have a one-to-one correspondence with the arguments in \code{emptyDropsCellRanger}. 
All parameter defaults match those used in \pkg{STARsolo} 2.7.9a.

The main differences between \code{emptyDropsCellRanger} and \code{emptyDrops} are:
\itemize{
\item \code{emptyDropsCellRanger} does not use the knee point to identify \dQuote{presumed real} cells,
instead relying on a threshold based on the expected number of cells.
\item \code{emptyDropsCellRanger} takes barcodes whose total count ranks within a certain range - by default, \eqn{(45,000, 90,000]} - to compute the ambient profile.
In contrast, \code{emptyDrops} only defines the upper bound using \code{lower} or \code{by.rank}.
\item \code{emptyDropsCellRanger} defines a cell candidate pool according to three parameters, \code{umi.min}, \code{umi.min.frac.median} and \code{cand.max.n}.
In \code{emptyDrops}, this is only defined by \code{lower}.
}
}
\examples{
# Mocking up some data:
set.seed(0)
my.counts <- DropletUtils:::simCounts(nempty=100000, nlarge=2000, nsmall=1000)

# Identify likely cell-containing droplets.
out <- emptyDropsCellRanger(my.counts)
out

is.cell <- out$FDR <= 0.01
sum(is.cell, na.rm=TRUE)
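
# A rough breakdown of the barcodes by FDR status (a sketch, relying only on
# the FDR field): barcodes kept by simple filtering have an FDR of zero,
# while barcodes that were never tested have a missing FDR.
fdr.status <- ifelse(is.na(out$FDR), "missing",
    ifelse(out$FDR == 0, "zero", "positive"))
table(fdr.status)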

# Subsetting the matrix to the cell-containing droplets.
# (using 'which()' to handle NAs smoothly).
cell.counts <- my.counts[,which(is.cell),drop=FALSE]
dim(cell.counts)
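
# For comparison, the original emptyDrops() applied to the same counts with
# its default settings (a sketch; agreement will depend on the data).
ref <- emptyDrops(my.counts)
table(CellRanger=out$FDR <= 0.01, Original=ref$FDR <= 0.01, useNA="ifany")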

}
\references{
Kaminow B, Yunusov D, Dobin A (2021).
STARsolo: accurate, fast and versatile mapping/quantification of single-cell and single-nucleus RNA-seq data.
\url{https://www.biorxiv.org/content/10.1101/2021.05.05.442755v1}
}
\seealso{
\code{\link{emptyDrops}}, for the original implementation.
}
\author{
Dongze He, Rob Patro
}
