% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/tximeta.R
\name{tximeta}
\alias{tximeta}
\title{Import transcript quantification with metadata}
\usage{
tximeta(
  coldata,
  type = NULL,
  txOut = TRUE,
  skipMeta = FALSE,
  skipSeqinfo = FALSE,
  useHub = TRUE,
  markDuplicateTxps = FALSE,
  cleanDuplicateTxps = FALSE,
  customMetaInfo = NULL,
  skipFtp = FALSE,
  ...
)
}
\arguments{
\item{coldata}{a data.frame with at least two columns (other columns will propagate to the output object):
\itemize{
\item \code{files} - character, paths of quantification files
\item \code{names} - character, sample names
}
If \code{coldata} is a vector, it is assumed to be the paths of quantification files,
and unique sample names are created.}

\item{type}{what quantifier was used, see \code{\link[tximport:tximport]{tximport::tximport()}}}

\item{txOut}{whether to output transcript-level data.
\code{tximeta} is designed to have transcript-level output
with salmon, so default is \code{TRUE},
and it's recommended to use \code{summarizeToGene}
following \code{tximeta} for gene-level summarization.
For an alevin file, \code{tximeta} will import the
gene-level counts, ignoring this argument (alevin
produces only gene-level quantification).}

\item{skipMeta}{whether to skip metadata generation
(e.g. to avoid errors if not connected to internet).
This calls \code{tximport} directly and so either
\code{txOut=TRUE} or \code{tx2gene} should be specified.}

\item{skipSeqinfo}{whether to skip the addition of Seqinfo,
which requires an internet connection to download the
relevant chromosome information table from UCSC}

\item{useHub}{whether to first attempt to download a TxDb/EnsDb
object from AnnotationHub, rather than creating one from a
GTF file downloaded via FTP (default is \code{TRUE}). If \code{FALSE},
\code{tximeta} will download and parse the GTF.}

\item{markDuplicateTxps}{whether to mark the status
(\code{hasDuplicate}) and names of duplicate transcripts
(\code{duplicates}) in the rowData of the SummarizedExperiment output.
Subsequent summarization to the gene level will keep track
of the number of duplicate transcript sets per gene (\code{numDupSets}).}

\item{cleanDuplicateTxps}{whether to try to clean
duplicate transcripts (exact sequence duplicates) by replacing
the transcript names that do not appear in the GTF
with those that do appear in the GTF}

\item{customMetaInfo}{the path to a custom metadata
information JSON file, relative to the directories of the
quantification files in \code{files} of \code{coldata}.
For example, \code{customMetaInfo="meta_info.json"}
would indicate that each directory containing a quantification
file in \code{files} also contains a custom metadata information
JSON file. These files should contain the SHA-256 hash of the
reference transcripts with the \code{index_seq_hash} tag
(see details in vignette).}

\item{skipFtp}{whether to avoid \verb{ftp://} connections, e.g. when
behind a firewall that blocks them (default is \code{FALSE})}

\item{...}{arguments passed to \code{tximport}}
}
\value{
a SummarizedExperiment with metadata on the \code{rowRanges}.
If the hashed digest in the salmon or Sailfish index does not match
any known transcriptome or any locally saved \code{linkedTxome},
\code{tximeta} will return a non-ranged SummarizedExperiment.
}
\description{
\code{tximeta} leverages the digest of the reference transcripts that were indexed
in order to identify metadata from the output of quantification tools.
A computed digest (a hash value) can be used to uniquely identify the collection
of reference sequences, and associate the dataset with other useful metadata.
After identification, tximeta uses a number of core Bioconductor packages (GenomicFeatures,
ensembldb, AnnotationHub, Seqinfo, BiocFileCache) to automatically
populate metadata for the user.
}
\details{
Most of the code in tximeta works to add metadata and transcript ranges
when the quantification was performed with salmon or related tools. However,
tximeta can also be used with the other quantification types supported
by \code{\link[tximport:tximport]{tximport::tximport()}}, in which case it returns a non-ranged
SummarizedExperiment. This behavior can also be triggered with \code{skipMeta=TRUE}.
For other quantification tools, see also the \code{customMetaInfo} argument.

tximeta performs a lookup of the digest (or hash value) of the index
stored in an auxiliary information directory of the quantification tool's output
against a database of known transcriptomes, which is stored within the tximeta
package (\code{extdata/hashtable.csv}) and is continually updated to match Ensembl
and GENCODE releases, with updates pushed to the current Bioconductor release branch.
In addition, tximeta performs a lookup of the digest against a
locally stored table of linkedTxome references, see \code{\link[=makeLinkedTxome]{makeLinkedTxome()}}.
If tximeta detects a match in either source, it will automatically populate
the transcript locations, the transcriptome release,
the genome with correct chromosome lengths, and connect the SE object to locally
cached derived metadata. tximeta also facilitates automatic summarization of
transcript-level quantifications to the gene-level via \code{summarizeToGene}, without the need to manually build the correct \code{tx2gene} table for the reference used for indexing.

On the first run, tximeta will ask where the \code{\link[BiocFileCache:BiocFileCache-class]{BiocFileCache::BiocFileCache()}}
location for this package (\emph{tximeta}) should be kept, using either a default location or a temporary
directory. At any point, the user can specify a location using
\code{\link[=setTximetaBFC]{setTximetaBFC()}}, and this choice will be saved for future sessions.
Multiple users can point to the same BiocFileCache, such that
transcript databases (TxDb or EnsDb) associated with certain salmon indices
and linkedTxomes can be accessed by different users without additional
effort or time spent downloading and building the relevant TxDb / EnsDb.
Note that, if the TxDb or EnsDb is present in AnnotationHub, tximeta will
use this object instead of downloading and building a TxDb/EnsDb from GTF
(to disable this set \code{useHub=FALSE}).

To allow multiple users to read and write to the
same location, the BiocFileCache directory should be set to
have group write permissions (g+w).
}
\examples{

# point to a salmon quantification file:
dir <- system.file("extdata/salmon_dm", package="tximportData")
files <- file.path(dir, "SRR1197474", "quant.sf") 
coldata <- data.frame(files, names="SRR1197474", condition="A", stringsAsFactors=FALSE)
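
# a minimal sketch of the skipMeta path: tximeta calls tximport directly
# and returns a non-ranged SummarizedExperiment, without a digest lookup
# and without an internet connection; type="salmon" is given explicitly
# here as an assumption, to avoid relying on any type detection
seNoMeta <- tximeta(coldata, type="salmon", skipMeta=TRUE)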

# normally we would just run the following which would download the appropriate metadata
# se <- tximeta(coldata)

# for this example, we instead point to a local path where the GTF can be found
# by making a linkedTxome:
indexDir <- file.path(dir, "Dm.BDGP6.22.98_salmon-0.14.1")
dmFTP <- "ftp://ftp.ensembl.org/pub/release-98/fasta/drosophila_melanogaster/"
fastaFTP <- paste0(
  dmFTP,
  c("cdna/Drosophila_melanogaster.BDGP6.22.cdna.all.fa.gz",
    "ncrna/Drosophila_melanogaster.BDGP6.22.ncrna.fa.gz")
)
gtfPath <- file.path(dir, "Drosophila_melanogaster.BDGP6.22.98.gtf.gz")
makeLinkedTxome(indexDir=indexDir, source="LocalEnsembl", organism="Drosophila melanogaster",
                release="98", genome="BDGP6.22", fasta=fastaFTP, gtf=gtfPath, write=FALSE)
se <- tximeta(coldata)
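
# summarize transcript-level quantifications to the gene level; this relies on
# the transcript database (TxDb or EnsDb) that tximeta derived from the
# linkedTxome's GTF, so no tx2gene table has to be built by hand
gse <- summarizeToGene(se)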

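# a minimal sketch of the customMetaInfo layout for a non-salmon quantifier,
# using a hypothetical output directory: the JSON file sits next to the
# quantification file and holds the SHA-256 digest of the reference
# transcripts under the index_seq_hash tag (the value below is a
# placeholder, not a real digest)
customDir <- file.path(tempdir(), "custom_quant", "sample1")
dir.create(customDir, recursive=TRUE, showWarnings=FALSE)
jsonlite::write_json(list(index_seq_hash="<sha256-of-reference-transcripts>"),
                     file.path(customDir, "meta_info.json"), auto_unbox=TRUE)
# with quantification files in such directories, one would then call e.g.
# se <- tximeta(coldata, type=..., customMetaInfo="meta_info.json")
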
# to clear the entire linkedTxome table
# (don't run unless you want to clear this table!)
# bfcloc <- getTximetaBFC()
# bfc <- BiocFileCache(bfcloc)
# bfcremove(bfc, bfcquery(bfc, "linkedTxomeTbl")$rid)
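
# a sketch of preparing a shared cache directory with group write
# permission (g+w); a temporary directory stands in for a real shared path
sharedBfc <- file.path(tempdir(), "shared_bfc")
dir.create(sharedBfc, showWarnings=FALSE)
Sys.chmod(sharedBfc, mode="775", use_umask=FALSE)
# each user would then point tximeta to it (not run here, because this
# saves the choice for future sessions):
# setTximetaBFC(sharedBfc)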

}
