% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/barcodeRanks.R
\name{barcodeRanks}
\alias{barcodeRanks}
\alias{barcodeRanks,ANY-method}
\alias{barcodeRanks,SummarizedExperiment-method}
\title{Calculate barcode ranks}
\usage{
barcodeRanks(m, ...)

\S4method{barcodeRanks}{ANY}(
  m,
  lower = 100,
  exclude.from = 50,
  window = 1,
  gradient.threshold = -1,
  fit.bounds = NULL,
  df = 20,
  ...,
  BPPARAM = SerialParam()
)

\S4method{barcodeRanks}{SummarizedExperiment}(m, ..., assay.type = "counts")
}
\arguments{
\item{m}{A numeric matrix-like object containing UMI counts, where columns represent barcoded droplets and rows represent genes.
Alternatively, a \linkS4class{SummarizedExperiment} containing such a matrix.}

\item{...}{For the generic, further arguments to pass to individual methods.

For the SummarizedExperiment method, further arguments to pass to the ANY method.}

\item{lower}{A numeric scalar specifying the lower bound on the total UMI count, 
at or below which all barcodes are assumed to correspond to empty droplets and excluded from knee/inflection point identification.}

\item{exclude.from}{An integer scalar specifying the number of highest-ranking barcodes to exclude from knee/inflection point identification.}

\item{window}{Numeric scalar specifying the length of the window (in log10 units) for knee/inflection point identification.
Larger values improve stability of estimates at the cost of sensitivity to changes in the curve.}

\item{gradient.threshold}{Numeric scalar specifying the maximum threshold on the gradient for identifying potential elbow points.
Lower values increase the stringency of elbow point identification.}

\item{fit.bounds, df}{Deprecated and ignored.}

\item{BPPARAM}{A \linkS4class{BiocParallelParam} object specifying how parallelization should be performed.}

\item{assay.type}{Integer or string specifying the assay containing the count matrix.}
}
\value{
A \linkS4class{DataFrame} where each row corresponds to a column of \code{m}, and containing the following fields:
\describe{
\item{\code{rank}:}{Numeric, the rank of each barcode (averaged across ties).}
\item{\code{total}:}{Numeric, the total counts for each barcode.}
}

The metadata of the output contains \code{knee}, a numeric scalar containing the total count at the knee point;
and \code{inflection}, a numeric scalar containing the total count at the inflection point.
}
\description{
Compute barcode rank statistics and identify the knee and inflection points on the total count curve.
}
\details{
Analyses of droplet-based scRNA-seq data often show a plot of the log-total count against the log-rank of each barcode, where the top-ranked barcodes (i.e., those with the smallest rank values) have the largest totals.
This is equivalent to a transposed empirical cumulative distribution plot with log-transformed axes, which focuses on the barcodes with the largest counts.
To create this plot, the \code{barcodeRanks} function will compute these ranks for all barcodes in \code{m}.
Barcodes with the same total count receive the same average rank to avoid problems with discrete runs of the same total.

The function will also identify the inflection and knee points on the curve for downstream use.
Both of these points correspond to a sharp transition between two components of the total count distribution, 
presumably reflecting the difference between empty droplets with little RNA and cell-containing droplets with much more RNA.
Only points with total counts above \code{lower} will be considered for knee/inflection point identification.
Similarly, the \code{exclude.from} highest-ranking barcodes will be ignored to avoid instability at the start of the curve.

The actual identification of the knee/inflection points is based on a simple curve-tracing algorithm.
We trace a window of fixed length \code{window} through the curve, and for each window, we consider the straight line connecting its ends: 
\itemize{
\item To find the knee, we filter for windows where the midpoint of the window lies above the end-connecting line.
Of these, we select the window with the shortest end-connecting line, i.e., the strongest curvature.
The midpoint of that window is defined as the knee.
\item To find the inflection, we pick the window with the lowest (i.e., most negative) gradient of the end-connecting line.
The midpoint of that window is defined as the inflection.
\item In cases with multiple knee/inflection points, we aim to report the earlier values, i.e., those with higher log-totals.
This is achieved by ignoring all windows after the first one that contains an \dQuote{elbow} point in the curve.
A window contains an elbow if its midpoint lies below the end-connecting line and the gradient is less than \code{gradient.threshold}.
}
}
\examples{
# Mocking up some data: 
set.seed(2000)
my.counts <- DropletUtils:::simCounts()

# Computing barcode rank statistics:
br.out <- barcodeRanks(my.counts)
names(br.out)
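
# A minimal base-R sketch of how tied totals share an average rank.
# This is an illustration only, not necessarily the internal implementation;
# 'toy.totals' and 'toy.ranks' are hypothetical names made up for this example.
toy.totals <- c(500, 20, 20, 3, 1000)
# Negate so that the largest total receives rank 1; ties are averaged.
toy.ranks <- rank(-toy.totals, ties.method="average")
toy.ranks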

# Making a plot.
plot(br.out$rank, br.out$total, log="xy", xlab="Rank", ylab="Total")
abline(h=metadata(br.out)$knee, col="dodgerblue", lty=2)
abline(h=metadata(br.out)$inflection, col="forestgreen", lty=2)
legend("bottomleft", lty=2, col=c("dodgerblue", "forestgreen"), 
    legend=c("knee", "inflection"))
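
# A rough base-R sketch of the window-tracing idea from the Details section,
# restricted to the inflection point. It assumes a synthetic curve on a
# regular log10-rank grid and is not the function's actual implementation;
# all 'toy.*' names are hypothetical and exist only for this illustration.
toy.x <- seq(0, 4, length.out=81)                        # log10-rank, step 0.05
toy.y <- 4 - 0.2 * toy.x - 2 * plogis(10 * (toy.x - 2))  # log10-total, one sharp drop
toy.k <- 20                                              # window of 1 log10 unit
toy.left <- seq_len(length(toy.x) - toy.k)               # left end of each window
toy.right <- toy.left + toy.k                            # right end of each window
toy.mid <- toy.left + toy.k / 2                          # midpoint of each window
# Gradient of the straight line connecting the two ends of each window.
toy.gradient <- (toy.y[toy.right] - toy.y[toy.left]) /
    (toy.x[toy.right] - toy.x[toy.left])
# The inflection is the midpoint of the window with the most negative gradient.
toy.best <- which.min(toy.gradient)
toy.inflection <- 10^toy.y[toy.mid[toy.best]]
toy.inflection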

}
\seealso{
\code{\link{emptyDrops}}, where this function is used.
}
\author{
Aaron Lun
}
