% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/regenrichClasses.R
\name{RegenrichSet}
\alias{RegenrichSet}
\title{RegenrichSet object creator}
\usage{
RegenrichSet(
  expr,
  colData,
  rowData = NULL,
  method = c("Wald_DESeq2", "LRT_DESeq2", "limma", "LRT_LM"),
  minMeanExpr = NULL,
  design,
  reduced,
  contrast,
  coef = NULL,
  name,
  fitType = c("parametric", "local", "mean"),
  sfType = c("ratio", "poscounts", "iterate"),
  betaPrior,
  minReplicatesForReplace = 7,
  useT = FALSE,
  minmu = 0.5,
  parallel = FALSE,
  BPPARAM = bpparam(),
  altHypothesis = c("greaterAbs", "lessAbs", "greater", "less"),
  listValues = c(1, -1),
  cooksCutoff,
  independentFiltering = TRUE,
  alpha = 0.1,
  filter,
  theta,
  filterFun,
  addMLE = FALSE,
  blind = FALSE,
  ndups = 1,
  spacing = 1,
  block = NULL,
  correlation,
  weights = NULL,
  proportion = 0.01,
  stdev.coef.lim = c(0.1, 4),
  trend = FALSE,
  robust = FALSE,
  winsor.tail.p = c(0.05, 0.1),
  reg = TFs$TF_name,
  networkConstruction = c("COEN", "GRN", "new"),
  topNetPercent = 5,
  directed = FALSE,
  rowSample = FALSE,
  softPower = NULL,
  networkType = "unsigned",
  TOMDenom = "min",
  RsquaredCut = 0.85,
  edgeThreshold = NULL,
  K = "sqrt",
  nbTrees = 1000,
  importanceMeasure = "IncNodePurity",
  trace = FALSE,
  minR = 0.3,
  enrichTest = c("FET", "GSEA"),
  namedScoresCutoffs = 0.05,
  minSize = 5,
  maxSize = 5000,
  pvalueCutoff = 0.05,
  qvalueCutoff = 0.2,
  regAltName = NULL,
  universe = NULL,
  nperm = 10000
)
}
\arguments{
\item{expr}{matrix or data.frame, the expression profile of a set of
genes or proteins. If \code{method = 'Wald_DESeq2'} or
\code{'LRT_DESeq2'}, only a non-negative integer matrix (e.g., read
counts from RNA sequencing) is accepted.}

\item{colData}{data frame, sample phenotype data. 
The rows of colData must correspond to the columns of expr.}

\item{rowData}{NULL or data frame, information about each row (gene).
Default is NULL, which generates a DataFrame with three columns:
"gene", "p", and "logFC".}

\item{method}{either 'Wald_DESeq2', 'LRT_DESeq2', 'limma', or 'LRT_LM',
the method for the differential expression analysis.
\itemize{
\item When method = 'Wald_DESeq2', the Wald test in the DESeq2 package is used;
\item When method = 'LRT_DESeq2', the likelihood ratio test (LRT) in the
DESeq2 package is used;
\item When method = 'limma', the least-squares (`ls`) fitting method and the
empirical Bayes method in the limma package are used to calculate moderated
t-statistics and differential p-values;
\item When method = 'LRT_LM', a likelihood ratio test is performed for each
row of `expr` to compare two linear models specified by the `design` and
`reduced` arguments. In this case, fold changes are not calculated
but are set to 0.
}}

\item{minMeanExpr}{numeric, the cutoff on average gene expression for
pre-filtering. Rows of `expr` with average expression < minMeanExpr are
removed, so the higher `minMeanExpr` is, the fewer genes are tested.}

\item{design}{either a model formula or a model matrix. For method =
'LRT_DESeq2' or 'LRT_LM', the design is the full model formula/matrix.
For method = 'limma', if design is a formula, the model matrix is
constructed using model.matrix(design, colData), so each variable in the
design formula must be a column of `colData`.}

\item{reduced}{a reduced formula/matrix to compare against. This argument
is used only when method = 'LRT_DESeq2' or 'LRT_LM'. If `design` is a
model matrix, `reduced` must also be a model matrix.}

\item{contrast}{This argument is used only when method = 'LRT_DESeq2',
'Wald_DESeq2', or 'limma'. \cr
When method = 'LRT_DESeq2' or 'Wald_DESeq2', it specifies which comparison
to extract from the `DESeqDataSet` object to build a results table
(when method = 'LRT_DESeq2', this does not affect the value of `stat`,
`pvalue`, or `padj`). \cr
It can be in one of the following three formats:
\itemize{
\item a character vector with exactly three elements: the name of a factor
in the design formula, the name of the numerator level for the fold change,
and the name of the denominator level for the fold change;
\item a list of one or two character vectors: the first element specifies
the names of the fold changes for the numerator, and the optional second
element specifies the names of the fold changes for the denominator. These
names should be elements of \code{getResultsNames(design, colData)};
\item a numeric contrast vector with one element for each element of
\code{getResultsNames(design, colData)}.
}

When method = 'limma', it can be in one of the following two formats:
\itemize{
\item a numeric matrix with rows corresponding to the coefficients in the
design matrix and columns containing contrasts;
\item a numeric vector, if there is only one contrast, with one element per
coefficient in the design matrix. This is similar to the third format of
contrast when method = 'LRT_DESeq2' or 'Wald_DESeq2'.
}}

\item{coef}{column number(s) or column name(s) specifying which
coefficient(s) or contrast(s) of the linear model are of interest. This
argument is used only when method = 'limma'. Default is NULL.}

\item{name}{the name of the individual effect (coefficient) for building a
results table. This argument is used only when method = 'LRT_DESeq2' or
'Wald_DESeq2'. Use this argument rather than `contrast` for continuous
variables, individual effects, or individual interaction terms. The value
must be an element of \code{getResultsNames(design, colData)}.}

\item{fitType}{either 'parametric', 'local', or 'mean', the type of fitting
of dispersions to the mean intensity. This argument is used only when
method = 'Wald_DESeq2' or 'LRT_DESeq2'. See \code{\link{DESeq}} from the
DESeq2 package for more details. Default is 'parametric'.}

\item{sfType}{either 'ratio', 'poscounts', or 'iterate', the type of size
factor estimation. This argument is used only when method = 'Wald_DESeq2'
or 'LRT_DESeq2'. See \code{\link{DESeq}} from the DESeq2 package for more
details. Default is 'ratio'.}

\item{betaPrior}{This argument is used only when method = 'Wald_DESeq2' or
'LRT_DESeq2'. See \code{\link{DESeq}} from the DESeq2 package for more
details.}

\item{minReplicatesForReplace}{This argument is used only when method =
'Wald_DESeq2' or 'LRT_DESeq2'. See \code{\link{DESeq}} from the DESeq2
package for more details. Default is 7.}

\item{useT}{This argument is used only when method = 'Wald_DESeq2' or
'LRT_DESeq2'. See \code{\link{DESeq}} from the DESeq2 package for more
details. Default is FALSE.}

\item{minmu}{This argument is used only when method = 'Wald_DESeq2' or
'LRT_DESeq2'. See \code{\link{DESeq}} from the DESeq2 package for more
details. Default is 0.5.}

\item{parallel}{logical, whether to run the differential expression
analysis in parallel (only for method = "Wald_DESeq2" or "LRT_DESeq2").
Default is FALSE.}

\item{BPPARAM}{parameters for parallel computing (default is
\code{bpparam()}).}

\item{altHypothesis}{one of 'greaterAbs', 'lessAbs', 'greater', or 'less',
specifying the alternative hypothesis. This argument is used only when
method = 'Wald_DESeq2' or 'LRT_DESeq2'. See \code{\link{results}} from the
DESeq2 package for more details. Default is 'greaterAbs'.}

\item{listValues}{This argument is used only when method = 'Wald_DESeq2' or
'LRT_DESeq2'. See \code{\link{results}} from the DESeq2 package for more
details. Default is c(1, -1).}

\item{cooksCutoff}{threshold on Cook's distance, such that if one or more
samples for a row have a higher distance, the p-value for the row is set
to NA. This argument is used only when method = 'Wald_DESeq2' or
'LRT_DESeq2'. See \code{\link{results}} from the DESeq2 package for more
details.}

\item{independentFiltering}{logical, whether independent filtering should be
applied automatically. This argument is used only when method =
'Wald_DESeq2' or 'LRT_DESeq2'. See \code{\link{results}} from the DESeq2
package for more details. Default is TRUE.}

\item{alpha}{the significance cutoff used for optimizing the independent
filtering. This argument is used only when method = 'Wald_DESeq2' or
'LRT_DESeq2'. See \code{\link{results}} from the DESeq2 package for more
details. Default is 0.1.}

\item{filter}{the vector of filter statistics over which the independent
filtering is optimized. By default the mean of normalized counts is used.
This argument is used only when method = 'Wald_DESeq2' or 'LRT_DESeq2'.
See \code{\link{results}} from the DESeq2 package for more details.}

\item{theta}{the quantiles at which to assess the number of rejections from
independent filtering. This argument is used only when method =
'Wald_DESeq2' or 'LRT_DESeq2'. See \code{\link{results}} from the DESeq2
package for more details.}

\item{filterFun}{an optional custom function for performing independent
filtering and p-value adjustment. This argument is used only when method =
'Wald_DESeq2' or 'LRT_DESeq2'. See \code{\link{results}} from the DESeq2
package for more details.}

\item{addMLE}{if betaPrior = TRUE was used, whether the 'unshrunken' maximum
likelihood estimates (MLE) of log2 fold change should be added as a column
to the results table. This argument is used only when method =
'Wald_DESeq2' or 'LRT_DESeq2'. See \code{\link{results}} from the DESeq2
package for more details. Default is FALSE.}

\item{blind}{logical, whether to blind the transformation to the
experimental design. This argument is used only when method =
'Wald_DESeq2' or 'LRT_DESeq2'. See \code{\link{vst}} from the DESeq2
package for more details. Default is FALSE, which differs from the default
(TRUE) of the vst function.}

\item{ndups}{positive integer giving the number of times each distinct
probe is
printed on each array. This argument is used only when method = 'limma'.
See \code{\link{lmFit}} from limma package for more details. Default is 1.}

\item{spacing}{positive integer giving the spacing between duplicate
occurrences of
the same probe, spacing=1 for consecutive rows. This argument is used only
when method = 'limma'. See \code{\link{lmFit}} from limma package for
more details. Default is 1.}

\item{block}{vector or factor specifying a blocking variable on the arrays.
Has length equal to the number of arrays. Must be NULL if ndups > 2.
This argument is used only when method = 'limma'. See \code{\link{lmFit}}
from limma package for more details. Default is NULL.}

\item{correlation}{the inter-duplicate or inter-technical replicate
correlation.
The correlation value should be estimated using the
\code{\link{duplicateCorrelation}}
function. This argument is used only when method = 'limma'.
See \code{\link{lmFit}}
from limma package for more details.}

\item{weights}{non-negative precision weights. Can be a numeric matrix of
individual weights of the same size as the expression matrix, a numeric
vector of array weights with length equal to the number of columns of the
expression matrix, or a numeric vector of gene weights with length equal to
the number of rows of the expression matrix. This argument is used only
when method = 'limma' or 'LRT_LM'. See \code{\link{lmFit}} from the limma
package for more details. Default is NULL.}

\item{proportion}{numeric value between 0 and 1, assumed proportion of
genes which
are differentially expressed. This argument is used only when method =
'limma'.
See \code{\link{eBayes}} from limma package for more details. Default is
0.01.}

\item{stdev.coef.lim}{numeric vector of length 2, assumed lower and
upper limits
for the standard deviation of log2-fold-changes for differentially
expressed
genes. This argument is used only when method = 'limma'.
See \code{\link{eBayes}}
from limma package for more details. Default is c(0.1, 4).}

\item{trend}{logical, should an intensity-trend be allowed for the prior
variance?
This argument is used only when method = 'limma'. See \code{\link{eBayes}}
from limma package for more details. Default is FALSE, meaning that the
prior
variance is constant.}

\item{robust}{logical, should the estimation of df.prior and var.prior be
robustified against outlier sample variances? This argument is used only
when method = 'limma'. See \code{\link{eBayes}}
from limma package for more details. Default is FALSE.}

\item{winsor.tail.p}{numeric vector of length 1 or 2, giving the left and
right tail proportions of the sample variances to Winsorize. Used only when
method = 'limma' and robust = TRUE. See \code{\link{eBayes}} from the limma
package for more details. Default is c(0.05, 0.1).}

\item{reg}{a vector of regulator names (IDs). By default, these are
transcription (co-)factors defined by three sources, namely RegNet,
TRRUST, and Marbach2016. The type of regulator names or IDs (for example
ENSEMBL gene ID, Entrez gene ID, or gene symbol/name) must be the same as
the type of names or IDs in the regulator-target network.}

\item{networkConstruction}{the method to construct this network.
Possible values are:\cr
'COEN' (default), coexpression network;\cr
'GRN', gene regulatory network by random forest;\cr
'new', meaning a network provided by the user, rather than inferred
from the expression data.\cr}

\item{topNetPercent}{numeric, the percentage of top edges in the full
network to retain. Default is 5, meaning the top 5\% of edges. This value
must be between 0 and 100.}

\item{directed}{logical, whether the network is directed. Default is
FALSE.}

\item{rowSample}{logical, if TRUE, each row represents a sample.
Otherwise, each column represents a sample. Default is FALSE.}

\item{softPower}{numeric, a soft power to achieve scale free topology.
If not provided, the parameter will be picked automatically by
\code{\link{plotSoftPower}} function.}

\item{networkType}{network type. Allowed values are (unique abbreviations
of)
'unsigned' (default), 'signed', 'signed hybrid'.
See \code{\link{adjacency}}.}

\item{TOMDenom}{a character string specifying the TOM variant to be used.
Recognized values are 'min' giving the standard TOM described in Zhang
and Horvath (2005), and 'mean' in which the min function in the
denominator is replaced by mean. The 'mean' may produce better results
but at this time should be considered experimental.}

\item{RsquaredCut}{desired minimum scale free topology fitting index R^2.
Default is 0.85.}

\item{edgeThreshold}{numeric, the threshold for removing low-weight
edges. Default is NULL, which means no edges are removed.}

\item{K}{integer or character, the number of features in each tree:
either an integer, `sqrt`, or `all`. `sqrt` denotes the square root of the
number of regulators in `reg`, and `all` denotes the number of regulators
in `reg`. Default is `sqrt`.}

\item{nbTrees}{integer. The number of trees. Default is 1000.}

\item{importanceMeasure}{character, either `\%IncMSE` or `IncNodePurity`,
corresponding to type = 1 and 2 in the \code{\link{importance}} function,
respectively. Default is `IncNodePurity` (decrease in node impurity),
which is faster than `\%IncMSE` (decrease in accuracy).}

\item{trace}{logical, whether to show progress. Default is FALSE.}

\item{minR}{numeric, the minimum correlation coefficient of the
prediction, used to control model accuracy. Default is 0.3.}

\item{enrichTest}{character, specifying the enrichment analysis method,
which
is either `FET` (Fisher's exact test) or `GSEA` (gene set enrichment
analysis).}

\item{namedScoresCutoffs}{numeric, the significance cutoff for the
differential
analysis p value. Default is 0.05.}

\item{minSize}{The minimum number (default 5) of target genes.}

\item{maxSize}{The maximum number (default 5000) of target genes.}

\item{pvalueCutoff}{numeric, the significance cutoff for adjusted
enrichment p value.
This is used for obtaining the `topResult` slot in the final `Enrich`
object. Default is 0.05.}

\item{qvalueCutoff}{numeric, the significance cutoff of enrichment
q-value.
Default is 0.2.}

\item{regAltName}{alternative name for regulator. Default is NULL.}

\item{universe}{a character vector of background target genes.}

\item{nperm}{integer, the number of permutations. The minimum possible
nominal p-value is about 1/nperm. Default is 10000.}
}
\value{
an object of RegenrichSet class.
}
\description{
This function creates a `RegenrichSet` object.
There are four types of parameters in this function.\cr
First, parameters to provide raw data and sample information:\cr
`expr` and `colData`.\cr\cr
Second, parameters to perform differential expression analysis:\cr
`method`, `minMeanExpr`, `design`, `reduced`, `contrast`,
`coef`, `name`, `fitType`, `sfType`, `betaPrior`, `minReplicatesForReplace`,
`useT`, `minmu`, `parallel`, `BPPARAM` (also for network inference),
`altHypothesis`, `listValues`, `cooksCutoff`, `independentFiltering`,
`alpha`, `filter`, `theta`, `filterFun`, `addMLE`, `blind`, `ndups`,
`spacing`, `block`, `correlation`, `weights`, `proportion`,
`stdev.coef.lim`, `trend`, `robust`, and `winsor.tail.p`.\cr\cr
Third, parameters to perform regulator-target network inference:\cr
`reg`, `networkConstruction`, `topNetPercent`, `directed`, `rowSample`,
`softPower`, `networkType`, `TOMDenom`, `RsquaredCut`, `edgeThreshold`,
`K`, `nbTrees`, `importanceMeasure`, `trace`,
`BPPARAM` (also for differential expression analysis), and `minR`.\cr\cr
Fourth, parameters to perform enrichment analysis:\cr
`enrichTest`, `namedScoresCutoffs`, `minSize`, `maxSize`, `pvalueCutoff`,
`qvalueCutoff`, `regAltName`, `universe`, and `nperm`.\cr\cr
}
\examples{
# library(RegEnrich)
data("Lyme_GSE63085")
data("TFs")

data = log2(Lyme_GSE63085$FPKM + 1)
colData = Lyme_GSE63085$sampleInfo

# Take the first 2000 rows as an example
data1 = data[seq(2000), ]

design = model.matrix(~0 + patientID + week, data = colData)
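
# The limma `contrast` used below is a numeric vector with one element per
# column of `design`; its single 1 selects the last column, which is a
# `week` term (the last non-reference level if `week` is a factor).
# A self-contained toy illustration, using a hypothetical sample table
# (not the Lyme data), of how such design columns are named and how the
# matching contrast vector lines up with them:
toyColData = data.frame(patientID = factor(c("p1", "p1", "p2", "p2")),
                        week = factor(c("0", "3", "0", "3")))
toyDesign = model.matrix(~0 + patientID + week, data = toyColData)
colnames(toyDesign)
# Same construction as the contrast below: zeros, then 1 for 'week3'
toyContrast = c(rep(0, ncol(toyDesign) - 1), 1)
names(toyContrast) = colnames(toyDesign)
toyContrast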

# Initializing a 'RegenrichSet' object
object = RegenrichSet(expr = data1,
                      colData = colData,
                      method = 'limma', minMeanExpr = 0,
                      design = design,
                      contrast = c(rep(0, ncol(design) - 1), 1),
                      networkConstruction = 'COEN',
                      enrichTest = 'FET')
object
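
# A sketch, based on the `design` and `reduced` descriptions above, of
# initializing an object for the 'LRT_LM' method with model formulas
# instead of a model matrix. It assumes that `patientID` and `week` are
# columns of `colData`; the likelihood ratio test then compares the full
# model (~ patientID + week) against the reduced model (~ patientID).
# Fold changes are not calculated for 'LRT_LM' (they are set to 0).
object2 = RegenrichSet(expr = data1,
                       colData = colData,
                       method = 'LRT_LM', minMeanExpr = 0,
                       design = ~ patientID + week,
                       reduced = ~ patientID,
                       networkConstruction = 'COEN',
                       enrichTest = 'FET')
object2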
}
