% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/encodeSeqShape.R
\name{encodeSeqShape}
\alias{encodeSeqShape}
\title{Encode k-mer DNA sequence and n-th order DNA Shape features}
\usage{
encodeSeqShape(fastaFileName, shapeMatrix, featureNames, normalize)
}
\arguments{
\item{fastaFileName}{A character string giving the name of the input
FASTA file, including the full path if the file is located outside the
current working directory.}

\item{shapeMatrix}{A matrix containing DNAshape prediction results.}

\item{featureNames}{A vector containing a combination of user-defined
sequence and shape parameters. The parameters can be any combination of
"k-mer", "n-shape", "n-MGW", "n-ProT", "n-Roll", "n-HelT", where k and n
are integers.}

\item{normalize}{A logical indicating whether to perform
normalization. Defaults to TRUE.}
}
\value{
featureVector A matrix containing encoded features. Sequence
features are represented as binary numbers, while shape features are
represented as real numbers.
}
\description{
DNAshapeR can be used to generate feature vectors for a user-defined model.
These models can be based on DNA sequence (1-mer, 2-mer, 3-mer) or DNA
shape (MGW, Roll, ProT, HelT) features or any combination thereof. Sequence
is encoded as four binary features (i.e., 0001 for adenine, 0010 for
cytosine, 0100 for guanine, and 1000 for thymine, for encoding of 1-mers)
at each nucleotide position (Zhou et al., 2015). Encoding of 2-mers and
3-mers (16 and 64 binary features at each position, respectively) is also
supported. Shape features include first-order and second-order (or
higher-order) values for the four structural parameters MGW, Roll, ProT and
HelT. Second-order shape features are product terms of the values of the
same shape category at adjacent positions. The function can generate any
subset of these features, e.g. a single shape category or only first-order
shape features, as well as any desired combination of shape and sequence
features. Feature encoding returns a feature matrix for a dataset of
multiple sequences, in which each sequence generates one concatenated
feature vector. The output of this function can be used directly as input
to any statistical machine learning method.
}
\examples{
fn <- system.file("extdata", "CGRsample_short.fa", package = "DNAshapeR")
pred <- getShape(fn)
featureNames <- c("1-shape")
featureVector <- encodeSeqShape(fn, pred, featureNames)
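
# A further sketch, assuming the same sample file and shape predictions as
# above: combine 1-mer sequence features with first-order shape features
# and pass the normalize argument explicitly. The names combinedNames and
# combinedVector are illustrative only.
combinedNames <- c("1-mer", "1-shape")
combinedVector <- encodeSeqShape(fn, pred, combinedNames, normalize = TRUE)
# Inspect the size of the resulting feature matrix
dim(combinedVector)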
}
\author{
Tsu-Pei Chiu
}
\keyword{core}
