% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/find_ORFs.R
\name{findORFsFasta}
\alias{findORFsFasta}
\title{Finds Open Reading Frames in fasta files.}
\usage{
findORFsFasta(
  filePath,
  startCodon = startDefinition(1),
  stopCodon = stopDefinition(1),
  longestORF = TRUE,
  minimumLength = 0,
  is.circular = FALSE
)
}
\arguments{
\item{filePath}{(character) Path to the fasta file. Can be either uppercase or
lowercase. Alternatively, an already loaded R object of class
"BSgenome" or "DNAStringSet" with named sequences.}

\item{startCodon}{(character vector) Possible START codons to search for.
See \code{\link{startDefinition}} for a helper function. Note that matching
is case sensitive, so "atg" would give 0 hits for a sequence with only
capital "ATG" ORFs.}

\item{stopCodon}{(character vector) Possible STOP codons to search for.
See \code{\link{stopDefinition}} for a helper function. Note that matching
is case sensitive, so "tga" would give 0 hits for a sequence with only
capital "TGA" ORFs.}

\item{longestORF}{(logical) Default TRUE. Keep only the longest ORF per
unique stop codon, i.e. per (seqname, strand, stopcodon) combination. Note:
this is not the longest ORF per transcript! You can also use the function
\code{\link{longestORFs}} after creating the ORFs for the same result.}

\item{minimumLength}{(integer) Default is 0, which corresponds to
START + STOP = 6 bp. Minimum length of the ORF in codons, not counting the
3 bp each for the START and STOP codons. For example, minimumLength = 8
gives ORFs of at least START + 8*3 (bp) + STOP = 30 bases. Use this
parameter to restrict the search.}

\item{is.circular}{(logical) Default FALSE. Whether the genome in filePath
is circular. Prokaryotic genomes are usually circular. Be careful if you
want to extract sequences: seqlengths must be set, otherwise the position of
the last base before the sequence wraps around is unknown.}
}
\value{
(GRanges) object of ORFs mapped from fasta file. Positions are
relative to the fasta file.
}
\description{
Should be used for prokaryote genomes or transcript sequences as fasta.
Makes no sense for eukaryote whole genomes, since those contain splicing
(use \code{\link{findMapORFs}} for spliced ranges).
Searches through each fasta header and reports all ORFs found for BOTH
sense (+) and antisense strand (-) in all frames.
Each fasta header is treated separately, and the name of the sequence will
be used as seqname in the returned GRanges object. This supports circular
genomes.
}
\details{
Remember that if you have a fasta file of transcripts (transcript
coordinates), you should delete all negative stranded ORFs afterwards with
\code{orfs <- orfs[strandBool(orfs)]}, since negative strand ORFs make no
sense in that case.
Seqnames are created from the header using the format \code{>name info}, so
the name must come directly after the "greater than" sign, with a space
between name and info.
Also make sure your fasta file is valid (no hidden spaces etc.),
as this might break the coordinate system!
}
\examples{
# location of the example fasta file
example_genome <- system.file("extdata/references/danio_rerio", "genome_dummy.fasta",
 package = "ORFik")
orfs <- findORFsFasta(example_genome)
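# Hedged sketch (an illustration, not part of the original example):
# restrict the search with minimumLength. Following the argument description,
# minimumLength = 8 should keep only ORFs of at least
# START + 8*3 + STOP = 30 bases. The dummy genome may contain few such ORFs.
orfs_min8 <- findORFsFasta(example_genome, minimumLength = 8)
# Check the documented lower bound on the reported ORF widths
all(width(orfs_min8) >= 30)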
# To store ORF sequences (you need indexed genome .fai file):
fa <- FaFile(example_genome)
names(orfs) <- paste0("ORF_", seq.int(length(orfs)), "_", seqnames(orfs))
orf_seqs <- getSeq(fa, orfs)
# Your sequences (fa) need to have isCircular(fa) == TRUE for this to work
# on circular wrapping ranges!
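# Hedged sketch: search the same file as if it were a circular genome.
# The dummy genome is not known to be circular, so this only illustrates
# the is.circular argument; reported ORFs may then wrap around the end
# of a sequence.
orfs_circ <- findORFsFasta(example_genome, is.circular = TRUE)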

# writeXStringSet(DNAStringSet(orf_seqs), "orfs.fasta")
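
# Hedged sketch: search an already loaded DNAStringSet with named sequences
# instead of a file path. The sequence and its name (tx1) are made up for
# illustration and stand in for a transcript.
tx_seqs <- Biostrings::DNAStringSet(c(tx1 = "ATGAAATGAAGTAAATCAAAACAT"))
tx_orfs <- findORFsFasta(tx_seqs)
# For transcript sequences, keep only the positive strand ORFs (see Details)
tx_orfs <- tx_orfs[strandBool(tx_orfs)]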
}
\seealso{
Other findORFs: 
\code{\link{findMapORFs}()},
\code{\link{findORFs}()},
\code{\link{findUORFs}()},
\code{\link{startDefinition}()},
\code{\link{stopDefinition}()}
}
\concept{findORFs}
