% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/sparse_matrix.R
\name{make_sparse_matrix}
\alias{make_sparse_matrix}
\title{Convert the Output of \code{kallisto bus} into Gene by Cell Matrix}
\usage{
make_sparse_matrix(
  bus_path,
  tr2g,
  est_ncells,
  est_ngenes,
  whitelist = NULL,
  gene_count = TRUE,
  TCC = TRUE,
  single_gene = TRUE,
  verbose = TRUE,
  progress_unit = 5e+06
)
}
\arguments{
\item{bus_path}{Path to the sorted text \code{bus} output file.}

\item{tr2g}{A data frame with columns \code{gene} and \code{transcript}, in
the same order as in the transcriptome index for \code{kallisto}. This
argument can be omitted, and is ignored if supplied, when only the TCC
matrix, not the gene count matrix, is made.}

\item{est_ncells}{Estimated number of cells. Providing this argument speeds
up computation by minimizing memory reallocation as vectors grow.}

\item{est_ngenes}{Estimated number of genes or equivalence classes.}

\item{whitelist}{A character vector with valid cell barcodes. This argument
is optional and defaults to \code{NULL}. When it is \code{NULL}, all cell
barcodes present that have at least one UMI assignable to genes or ECs are
included in the sparse matrix, whether or not they are known to be valid.
Barcodes with only UMIs that are not assignable to genes or ECs are still
excluded.}

\item{gene_count}{Logical, whether the gene count matrix should be returned.}

\item{TCC}{Logical, whether the TCC matrix should be returned.}

\item{single_gene}{Logical, whether to use single gene mode. In single gene
mode, only UMIs that can be uniquely mapped to one gene are kept. Without
single gene mode, UMIs mapped to multiple genes will be evenly distributed to
those genes.}

\item{verbose}{Logical, whether to display progress.}

\item{progress_unit}{Number of iterations between progress updates when
reading in the \code{kallisto bus} file.}
}
\value{
If both the gene count and TCC matrices are requested, this function
returns a list with two matrices, each with genes/equivalence classes in the
rows and barcodes in the columns. If only one of the two matrices is
requested, it returns a single \code{dgCMatrix} with genes/equivalence classes
in the rows and barcodes in the columns. These matrices are unfiltered. Please
filter out the empty droplets before downstream analysis.
}
\description{
This function takes the output file of \code{kallisto bus} after it has been
sorted and converted into text with \code{bustools}. See vignettes on the
\href{https://bustools.github.io/BUS_notebooks_R/}{website of this package} for a
tutorial. The \code{bustools} output has 4 columns: barcode, UMI, equivalence
class, and counts. This function converts that file into a sparse matrix that
can be used in downstream analyses.
}
\details{
This function can generate both the gene count matrix and the transcript
compatibility count (TCC) matrix. The TCC matrix has barcodes in the columns
and equivalence classes in the rows. See
\href{https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0970-8}{Ntranos et al. 2016}
for more information about the TCC matrix.

For 10x data sets, you can find a barcode whitelist file that comes with the
CellRanger installation. You don't need to run CellRanger to get it. An
example path to the whitelist file is
\code{cellranger-2.1.0/cellranger-cs/2.1.0/lib/python/cellranger/barcodes/737K-august-2016.txt}
for v2 chemistry.
}
\examples{
# Load toy example for testing
toy_path <- system.file("testdata", package = "BUSpaRse")
load(paste(toy_path, "toy_example.RData", sep = "/"))
out_fn <- paste0(toy_path, "/output.sorted.txt")
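# Optional sanity check before building the matrix (a sketch, not part of
# the package API). The description above says the sorted text file has 4
# columns: barcode, UMI, equivalence class, and counts. This assumes the toy
# file is whitespace-delimited with no header row; the column names assigned
# below are illustrative only.
bus_preview <- read.table(out_fn, header = FALSE, stringsAsFactors = FALSE)
names(bus_preview) <- c("barcode", "UMI", "EC", "counts")
head(bus_preview)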
# With whitelist
m <- make_sparse_matrix(out_fn, tr2g_toy, 10, 3, whitelist = whitelist,
  gene_count = TRUE, TCC = FALSE, single_gene = TRUE,
  verbose = FALSE)
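# The calls below sketch the other return shapes described in the Value
# section, assuming the same toy data as above.
# TCC matrix only: tr2g is left out, since the argument documentation says it
# can be missing in this case, so the remaining arguments are passed by name.
tcc <- make_sparse_matrix(out_fn, est_ncells = 10, est_ngenes = 3,
  whitelist = whitelist, gene_count = FALSE, TCC = TRUE,
  verbose = FALSE)
# Both matrices: per the Value section, the result is a list of two matrices.
both <- make_sparse_matrix(out_fn, tr2g_toy, 10, 3, whitelist = whitelist,
  gene_count = TRUE, TCC = TRUE, single_gene = TRUE,
  verbose = FALSE)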
}
\seealso{
\code{\link{EC2gene}}
}
