% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/patientRisk.R
\name{patientRisk}
\alias{patientRisk}
\title{patientRisk}
\usage{
patientRisk(
  seData,
  selectedGenes,
  time,
  status,
  group.vector,
  method = NULL,
  nboot = 50,
  cut_time = 10
)
}
\arguments{
\item{seData}{SummarizedExperiment object with the normalized expression 
data and the phenotypic data in colData. The colData must contain 
the sample names in the first column and two columns with time and status.}

\item{selectedGenes}{Vector containing the genes to be used. Expected to be 
in the same format as the rows of the assay(seData). Usually this vector is 
the result of running prefilterSAM().}

\item{time}{SummarizedExperiment colData column name containing the survival 
time in years for each sample in numeric format.}

\item{status}{SummarizedExperiment colData column name containing the status 
(0 = censored, 1 = not censored) for each sample.}

\item{group.vector}{A numeric vector specifying predefined risk groups for 
the patients. This is optional.}

\item{method}{A character string specifying the method for defining risk 
groups. The default method is \code{"class.probs"}. Possible options are:
\itemize{
 \item{\code{"min.pval"}: Defines risk groups based on the minimum p-value.}
 \item{\code{"med.pval"}: Defines risk groups based on the median p-value.}
 \item{\code{"class.probs"}: Defines risk groups based on the classification 
 probabilities from the model.}
}}

\item{nboot}{An integer specifying the number of bootstrap iterations for 
risk score calculation. Default is 50.}

\item{cut_time}{A numeric value specifying the cutoff time (in years) for 
survival analysis. All events beyond this time are treated as censored 
(default = 10 years).}
}
\value{
A list containing the following elements:
\itemize{
 \item{\code{cv_risk_score}: Risk score prediction for the training set 
 using a double nested cross-validation strategy.}
 \item{\code{cv_normalized_risk}: Normalized risk score in the 
 interval (0,100).}
 \item{\code{table_genes_selected}: Data frame with the following columns: 
 the names of the genes selected by the Cox regression, the beta 
 coefficients of the optimal multivariate Cox regression fitted to the 
 training set, the hazard ratio (HR) for each gene, and the p-value of the 
 univariate log-rank test. Genes are listed in descending order of HR.}
 \item{\code{table_genes_selected_extended}: Table with the same format as 
 \code{table_genes_selected}. A search for local minima within a 5\% range 
 of the selected minimum is performed. The goal is to expand the list of 
 significant genes to improve biological interpretability, since the lasso 
 penalty drastically reduces the number of significant genes.}
 \item{\code{model.optimalLambda}: The fitted model for the optimal 
 regularization parameter.}
 \item{\code{groups}: Vector classifying the patients into two risk 
 groups, high (2) or low (1).}
 \item{\code{riskThresholds}: Thresholds used to stratify the test 
 patients into three groups according to the predicted risk score: low, 
 intermediate and high risk.}
 \item{\code{range.risk}: Range of the unscaled risk score in the 
 training set.}
 \item{\code{list.models}: List of models tested for different values of the
  regularization parameter.}
 \item{\code{evaluation.models}: Data frame that provides several metrics 
 for each model evaluated. The lambda column gives the regularization 
 parameter of the fitted multivariate Cox regression, number_features 
 gives the number of genes selected by the model, c.index and se.c.index 
 give the concordance index and its standard deviation for the risk 
 prediction, and p_value_c.index and logrank_p_value give the p-values 
 for the concordance index and the log-rank statistic, respectively. 
 Models are listed in ascending order of the log-rank p-value, and the best 
 one is marked with two asterisks.}
 \item{\code{betasplot}: Dataset used to create the plot of genes ranked 
 according to the regression coefficients in the multivariate Cox model.}
 \item{\code{plot_values}: A list containing Kaplan-Meier fit results, 
 logrank p-value, and hazard ratio.}
 \item{\code{membership_prob}: If method \code{"class.probs"} is selected, a 
 table with two columns is returned. The first column is the probability of 
 membership in the low risk group and the second is the probability of 
 membership in the high risk group.}
 }
}
\description{
This function selects a subset of good risk markers and 
estimates a multivariate risk score based on the UNICOX algorithm. The 
patients are stratified into two or more prognostic groups based on the 
risk score. The Cox regression is trained using a ten-fold double nested 
cross-validation strategy to avoid overfitting.
}
\details{
A multivariate Cox regression is trained to select a subset of 
genes significantly associated with the risk and to estimate a risk score 
based on these risk markers. 
The algorithm is based on UNICOX, a regularized multivariate Cox 
regression model (see Tibshirani et al., 2009 for more details). 
In this predictor, the variables are penalized individually using an 
\eqn{L_1} norm term, which allows us to keep more relevant genes correlated 
with risk than the Lasso does. The Lasso model selects only one 
representative gene randomly from each set of correlated genes. The optimal 
value of the lambda parameter and the risk score are estimated using a 
double nested cross-validation strategy. Finally, the risk score allows us 
to stratify the whole set of patients according to their risk. 
Three algorithms are implemented to estimate the optimal threshold that 
classifies the patients into risk groups. "min.pval" determines the optimal 
threshold by minimizing the log-rank p-value, that is, by maximizing the 
separability between the K-M curves for the high and low risk groups 
(see Martinez-Romero et al., 2018). When several local minima arise, this 
threshold may be sample dependent and unstable. To avoid this 
problem, "med.pval" estimates the optimal threshold as the median of the 
lower 10th percentile logrank p-values. The lower 10th percentile selects 
the smallest values from the p-value distribution corresponding to 
intermediate risk patients that are on the boundary between both groups. 
This interval is more robust than a single minimum and provides good 
experimental results for a large variety of problems tested. The median 
threshold in this interval may change from one iteration to another because 
the distribution of p-values for patients with intermediate risk may change 
due to sample variations. Finally, "class.probs" implements a bootstrap 
strategy for the patients corresponding to the lower 10th percentile 
p-values and estimates a robust threshold to stratify the patients. 
It also estimates a membership probability for each risk group.
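
As a rough, self-contained illustration of the "min.pval" and "med.pval" 
ideas described above, the following sketch scans candidate thresholds on a 
simulated risk score with \code{survival::survdiff}. The simulated data, the 
variable names and the grid of candidate thresholds are assumptions made for 
this sketch. It is not the code used internally by \code{patientRisk}, which 
may differ in its details.
\preformatted{
library(survival)

set.seed(1)
n <- 200
sim_risk <- rnorm(n)                              # hypothetical risk score
sim_time <- rexp(n, rate = 0.1 * exp(sim_risk))   # higher risk, shorter survival
sim_status <- rbinom(n, 1, 0.7)                   # 1 = event, 0 = censored

# Candidate thresholds: inner quantiles, so that both groups stay non-trivial
cuts <- quantile(sim_risk, probs = seq(0.2, 0.8, by = 0.01))

# Log-rank p-value of the high/low split induced by each candidate threshold
pvals <- vapply(cuts, function(thr) {
  grp <- sim_risk > thr
  fit <- survdiff(Surv(sim_time, sim_status) ~ grp)
  pchisq(fit$chisq, df = 1, lower.tail = FALSE)
}, numeric(1))

# "min.pval"-style choice: the single threshold with the smallest p-value
thr_min <- cuts[which.min(pvals)]

# "med.pval"-style choice: median of the thresholds whose p-values fall in
# the lower 10th percentile of the p-value distribution
thr_med <- median(cuts[pvals <= quantile(pvals, 0.10)])
}
On real data the two thresholds can differ noticeably when the p-value curve 
has several local minima, which is the situation the median-based choice is 
meant to stabilize.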
}
\examples{
data(seBRCA)

# genePheno ---
data(ex_genePheno)

# Survival times should be provided in YEARS
time <- 'time'
status <- 'status'
geneList <- names(ex_genePheno$genes)

set.seed(5)
ex_patientRisk <- patientRisk(seBRCA, geneList, time, status, 
                              method = "class.probs", 
                              nboot = 10)

# NOTE: For consistent results with the vignettes and example data, use 
# default parameters (e.g., nboot = 50).

# Generate the plots again
# plotLogRank(ex_patientRisk)
# plotSigmoid(ex_patientRisk)
# plotLambda(ex_patientRisk)
# plotBetas(ex_patientRisk)
# plotKM(ex_patientRisk)
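
# Illustrative sketch only (not asuri's internal code): one common way to
# apply a cutoff such as cut_time is administrative censoring, where events
# observed after the cutoff are recoded as censored. The toy_* vectors are
# made-up values, and capping the follow-up time at the cutoff is an
# assumption of this sketch.
toy_time <- c(2.5, 8.0, 12.3, 15.1)   # survival times in years
toy_status <- c(1, 0, 1, 1)           # 1 = event, 0 = censored
toy_cut <- 10                         # cutoff in years
toy_status_cut <- ifelse(toy_time > toy_cut, 0, toy_status)
toy_time_cut <- pmin(toy_time, toy_cut)
data.frame(toy_time, toy_status, toy_time_cut, toy_status_cut)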


}
\references{
\itemize{
  \item{\insertRef{martinezromero2018}{asuri}} 
  \item{\insertRef{BuenoFortes2023}{asuri}}
}
}
