% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/RNAimport.R
\name{RNAimport}
\alias{RNAimport}
\title{Import and organise sRNAseq & mRNAseq data sets}
\usage{
RNAimport(
  input = c("sRNA", "mRNA"),
  directory,
  samples,
  analysisType = "mobile",
  annotation,
  idattr = "Name",
  FPKM = FALSE,
  featuretype = "mRNA"
)
}
\arguments{
\item{input}{string; type of dataset, either "sRNA" for sRNAseq data or
"mRNA" for mRNAseq data.}

\item{directory}{path; directory containing the sample folders produced during
pre-processing (for sRNA data, by \code{ShortStack}; see
\code{\link[=mapRNA]{mapRNA()}}).}

\item{samples}{character; vector of sample names matching the names of the
sample folders within the \code{directory} path.}

\item{analysisType}{character; either "core" or "mobile", selecting the sRNA
analysis workflow. The "core" analysis imports counts for all reads (unique
and multi-mapped), while the "mobile" analysis imports only uniquely aligned
read counts. Only for sRNA data; default is "mobile".}

\item{annotation}{path; path to the genome annotation (GFF) file used for
pre-processing. Only for mRNA data.}

\item{idattr}{character; GFF attribute to be used as the feature ID, containing
mRNA names. Several GFF lines with the same feature ID are considered parts of
the same feature. The feature ID is used to identify the counts in the output
table. Default is "Name". Only for mRNA data.}

\item{FPKM}{logical; calculate the FPKM for each sample. Default is FALSE.}

\item{featuretype}{character; GFF feature type (third column) used to define
the features. Default is "mRNA". Only for mRNA data.}
}
\value{
\strong{For sRNAseq:}
A dataframe where rows represent sRNA clusters and columns represent
replicate information extracted from the ShortStack output. Replicate
information includes DicerCall, Count, MajorRNA sequence and RPM. Columns from
different replicates are distinguished by the replicate name, which is
appended as a suffix to each column name. For example, for a sample called
"Sample1", the columns will include DicerCall_Sample1, Count_Sample1,
MajorRNA_Sample1 and RPM_Sample1.

The breakdown of each column:
\itemize{
\item \code{Locus} : sRNA cluster locus
\item \code{chr} : Chromosome
\item \code{start} : start coordinate of cluster
\item \code{end} : end coordinate of cluster
\item \code{Cluster} : name of cluster
\item \code{DicerCall_} : the size in nucleotides of the most abundant sRNA in the cluster
\item \code{Count_} : number of sRNA-seq reads that overlap the locus (uniquely aligned reads only when \code{analysisType = "mobile"})
\item \code{MajorRNA_} : RNA sequence of the most abundant sRNA in the cluster
\item \code{RPM_} : reads per million
\item \code{FPKM_} : Fragments Per Kilobase of transcript per Million mapped reads (only if \code{FPKM = TRUE})
}
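For reference, reads per million is conventionally computed as shown below,
where \eqn{c_i} is the read count of cluster \eqn{i} in one sample and
\eqn{N} is the total number of counted reads in that sample. This page does
not state which total the function uses as \eqn{N}, so treat that choice as
an assumption:
\deqn{\mathrm{RPM}_i = \frac{c_i}{N} \times 10^6}{RPM_i = c_i / N * 10^6}
For example, a cluster with 50 reads in a sample with 2,000,000 counted reads
has an RPM of \eqn{50 / 2000000 \times 10^6 = 25}{50 / 2000000 * 10^6 = 25}.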

\strong{For mRNAseq:}
A dataframe where rows represent genes and columns represent replicate
information extracted from the HTSeq results. Replicate information includes
Count and FPKM. For example, for a sample called "Sample1", the columns will
include Count_Sample1 and FPKM_Sample1.

The breakdown of each column:
\itemize{
\item \code{mRNA} : Name of mRNA
\item \code{Locus} : genomic locus of the mRNA
\item \code{chr} : Chromosome
\item \code{start} : start coordinate
\item \code{end} : end coordinate
\item \code{width} : width of the region in nucleotides
\item \code{Count_} : number of uniquely aligned mRNA-seq reads that overlap the locus
\item \code{FPKM_} : Fragments Per Kilobase of transcript per Million mapped reads
}
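For reference, FPKM is conventionally computed as shown below, where
\eqn{c_i} is the count for gene \eqn{i}, \eqn{L_i} its length in nucleotides
and \eqn{N} the total number of mapped reads (or fragments) in the sample.
Which length and which total the function uses (for example, whether
\eqn{L_i} is taken from the \code{width} column) is not stated here and is
an assumption:
\deqn{\mathrm{FPKM}_i = \frac{c_i \times 10^9}{L_i \times N}}{FPKM_i = c_i * 10^9 / (L_i * N)}
For example, 500 fragments on a 2,000 nt gene in a sample with 10,000,000
mapped fragments gives
\eqn{500 \times 10^9 / (2000 \times 10^7) = 25}{500 * 10^9 / (2000 * 10^7) = 25} FPKM.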
}
\description{
Load and organise either sRNAseq or mRNAseq pre-processing results into a
single dataframe containing all specified experimental replicates, where rows
represent either sRNA clusters (i.e. sRNA-producing loci) or genes,
respectively. Designed for data pre-processed with the mobileRNA method (see
\code{\link[=mapRNA]{mapRNA()}}).
}
\details{
The \code{RNAimport()} function requires the user to supply a directory path
(\code{directory}) and a character vector of sample names (\code{samples}).
The path must point to the pre-processing output.

Following the \code{mobileRNA} method, the path should point to the
\verb{2_alignment_results} folder for sRNA analysis, and to the
\verb{2_raw_counts} folder for mRNA analysis. Both folders are generated by
the \code{\link[=mapRNA]{mapRNA()}} function. The character vector should
contain strings that exactly match the names of the sample replicate folders
in that directory.

Together, these arguments allow the function to locate and extract the data
stored in the "Result.txt" file of each sample.
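
As an illustration only, the base R sketch below shows how the
\code{directory} and \code{samples} arguments are assumed to combine into one
result-file path per sample, following the folder layout described above. The
example path and sample names are hypothetical, and this is not the package's
own implementation:
\preformatted{
# Illustrative sketch only (not mobileRNA's implementation); the layout
# directory/sample/Result.txt is assumed from the description above
directory <- "./analysis/sRNA_mapping_results"  # hypothetical path
samples <- c("heterograft_1", "heterograft_2", "selfgraft_1")

# One expected result file per sample folder
paths <- file.path(directory, samples, "Result.txt")

# Report any sample whose result file cannot be found before importing
absent <- paths[!file.exists(paths)]
if (length(absent) > 0) {
  warning("Missing result files: ", paste(absent, collapse = ", "))
}
}
A check like this can catch a mismatch between \code{samples} and the folder
names before \code{RNAimport()} is called.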
}
\examples{
\dontrun{
# import sRNAseq data
df_sRNA <- RNAimport(input = "sRNA",
                     directory = "./analysis/sRNA_mapping_results",
                     samples = c("heterograft_1", "heterograft_2",
                                 "heterograft_3", "selfgraft_1",
                                 "selfgraft_2", "selfgraft_3"))


# Example output of this function is provided in the data object sRNA_data
data("sRNA_data")
head(sRNA_data)


# import mRNAseq data
df_mRNA <- RNAimport(input = "mRNA",
                     directory = "./analysis/mRNA_mapping_results",
                     samples = c("heterograft_1", "heterograft_2",
                                 "heterograft_3", "selfgraft_1",
                                 "selfgraft_2", "selfgraft_3"),
                     annotation = "./merged_annotation.gff3")


}

}
\references{
ShortStack \url{https://github.com/MikeAxtell/ShortStack},
HISAT2 \url{https://anaconda.org/bioconda/hisat2},
HTSeq \url{https://htseq.readthedocs.io/en/master/install.html},
SAMtools \url{https://anaconda.org/bioconda/samtools}
}
