% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/mint.block.plsda.R
\name{mint.block.plsda}
\alias{mint.block.plsda}
\title{NP-integration with Discriminant Analysis}
\usage{
mint.block.plsda(
  X,
  Y,
  indY,
  study,
  ncomp = 2,
  design,
  scale = TRUE,
  tol = 1e-06,
  max.iter = 100,
  near.zero.var = FALSE,
  all.outputs = TRUE
)
}
\arguments{
\item{X}{A named list of data sets (called 'blocks') measured on the same samples.
Data in the list should be arranged in samples x variables, with samples
order matching in all data sets.}

\item{Y}{A factor or a class vector indicating the discrete outcome of each
sample.}

\item{indY}{To be supplied if Y is missing, indicates the position of the
matrix / vector response in the list \code{X}}

\item{study}{Factor, indicating the membership of each sample to each of the
studies being combined}

\item{ncomp}{the number of components to include in the model. Default to 2.
Applies to all blocks.}

\item{design}{numeric matrix of size (number of blocks in X) x (number of
blocks in X) with values between 0 and 1. Each value indicates the strenght
of the relationship to be modelled between two blocks; a value of 0
indicates no relationship, 1 is the maximum value. Alternatively, one of
c('null', 'full') indicating a disconnected or fully connected design,
respecively, or a numeric between 0 and 1 which will designate all
off-diagonal elements of a fully connected design (see examples in
\code{block.splsda}). If \code{Y} is provided instead of \code{indY}, the
\code{design} matrix is changed to include relationships to \code{Y}.}

\item{scale}{Logical. If scale = TRUE, each block is standardized to zero
means and unit variances (default: TRUE)}

\item{tol}{Positive numeric used as convergence criteria/tolerance during the
iterative process. Default to \code{1e-06}.}

\item{max.iter}{Integer, the maximum number of iterations. Default to  100.}

\item{near.zero.var}{Logical, see the internal \code{\link{nearZeroVar}}
function (should be set to TRUE in particular for data with many zero
values). Setting this argument to FALSE (when appropriate) will speed up the
computations. Default value is FALSE.}

\item{all.outputs}{Logical. Computation can be faster when some specific
(and non-essential) outputs are not calculated. Default = \code{TRUE}.}
}
\value{
\code{mint.block.plsda} returns an object of class
\code{"mint.plsda", "block.plsda"}, a list that contains the following
components:

\item{X}{the centered and standardized original predictor matrix.}
\item{Y}{the centered and standardized original response vector or matrix.}
\item{ncomp}{the number of components included in the model for each block.}
\item{mode}{the algorithm used to fit the model.} \item{mat.c}{matrix of
coefficients from the regression of X / residual matrices X on the
X-variates, to be used internally by \code{predict}.} \item{variates}{list
containing the \eqn{X} and \eqn{Y} variates.} \item{loadings}{list
containing the estimated loadings for the variates.} \item{names}{list
containing the names to be used for individuals and variables.}
\item{nzv}{list containing the zero- or near-zero predictors information.}
\item{tol}{the tolerance used in the iterative algorithm, used for
subsequent S3 methods} \item{max.iter}{the maximum number of iterations,
used for subsequent S3 methods} \item{iter}{Number of iterations of the
algorithm for each component}
Note that the argument 'scheme' has now been hardcoded to 'horst' and 'init' to 'svd.single'.
}
\description{
Function to integrate data sets measured on the same samples (N-integration)
and to combine multiple independent studies measured on the same variables
or predictors (P-integration) using variants of multi-group and generalised
PLS-DA for supervised classification.
}
\details{
The function fits multi-group generalised PLS models with a specified number
of \code{ncomp} components. A factor indicating the discrete outcome needs
to be provided, either by \code{Y} or by its position \code{indY} in the
list of blocks \code{X}.

\code{X} can contain missing values. Missing values are handled by being
disregarded during the cross product computations in the algorithm
\code{block.pls} without having to delete rows with missing data.
Alternatively, missing data can be imputed prior using the
\code{\link{impute.nipals}} function.

The type of algorithm to use is specified with the \code{mode} argument.
Four PLS algorithms are available: PLS regression \code{("regression")}, PLS
canonical analysis \code{("canonical")}, redundancy analysis
\code{("invariant")} and the classical PLS algorithm \code{("classic")} (see
References and more details in \code{?pls}).
}
\examples{

data(breast.TCGA)

# for the purpose of this example, we consider the training set as study1 and
# the test set as another independent study2.
study = c(rep("study1",150), rep("study2",70))

mrna = rbind(breast.TCGA$data.train$mrna, breast.TCGA$data.test$mrna)
mirna = rbind(breast.TCGA$data.train$mirna, breast.TCGA$data.test$mirna)
data = list(mrna = mrna, mirna = mirna)

Y = c(breast.TCGA$data.train$subtype, breast.TCGA$data.test$subtype)

res = mint.block.plsda(data,Y,study=study, ncomp=2)

res
}
\references{
On multi-group PLS:

Rohart F, Eslami A, Matigian, N, Bougeard S, Lê Cao K-A (2017). MINT: A
multivariate integrative approach to identify a reproducible biomarker
signature across multiple experiments and platforms. BMC Bioinformatics
18:128.

Eslami, A., Qannari, E. M., Kohler, A., and Bougeard, S. (2014). Algorithms
for multi-group PLS. J. Chemometrics, 28(3), 192-201.

On multiple integration with PLSDA:

Singh A., Gautier B., Shannon C., Vacher M., Rohart F., Tebbutt S. and Lê
Cao K.A. (2016). DIABLO: multi omics integration for biomarker discovery.
BioRxiv available here:
\url{http://biorxiv.org/content/early/2016/08/03/067611} Tenenhaus A.,
Philippe C., Guillemot V, Lê Cao K.A., Grill J, Frouin V. Variable selection
for generalized canonical correlation analysis. \emph{Biostatistics}. kxu001

Gunther O., Shin H., Ng R. T. , McMaster W. R., McManus B. M. , Keown P. A.
, Tebbutt S.J. , Lê Cao K-A. , (2014) Novel multivariate methods for
integration of genomics and proteomics data: Applications in a kidney
transplant rejection study, OMICS: A journal of integrative biology, 18(11),
682-95.

mixOmics article:

Rohart F, Gautier B, Singh A, Lê Cao K-A. mixOmics: an R package for 'omics
feature selection and multiple data integration. PLoS Comput Biol 13(11):
e1005752
}
\seealso{
\code{\link{spls}}, \code{\link{summary}}, \code{\link{plotIndiv}},
\code{\link{plotVar}}, \code{\link{predict}}, \code{\link{perf}},
\code{\link{mint.block.spls}}, \code{\link{mint.block.plsda}},
\code{\link{mint.block.splsda}} and http://www.mixOmics.org/mixMINT for more
details.
}
\author{
Florian Rohart, Benoit Gautier, Kim-Anh Lê Cao, Al J Abadi
}
\keyword{multivariate}
\keyword{regression}
