% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/detectAnomaly.R, R/plot.detectAnomalyObject.R
\name{detectAnomaly}
\alias{detectAnomaly}
\alias{plot.detectAnomalyObject}
\title{PCA Anomaly Scores via Isolation Forests with Visualization}
\usage{
detectAnomaly(
  reference_data,
  query_data = NULL,
  ref_cell_type_col,
  query_cell_type_col = NULL,
  cell_types = NULL,
  pc_subset = 1:5,
  n_tree = 500,
  anomaly_treshold = 0.6,
  assay_name = "logcounts",
  ...
)

\method{plot}{detectAnomalyObject}(
  x,
  cell_type = NULL,
  pc_subset = NULL,
  data_type = c("query", "reference"),
  ...
)
}
\arguments{
\item{reference_data}{A \code{\linkS4class{SingleCellExperiment}} object containing a numeric expression matrix for the reference cells.}

\item{query_data}{An optional \code{\linkS4class{SingleCellExperiment}} object containing a numeric expression matrix for the query cells.
If NULL, the isolation forest anomaly scores are computed for the reference data. Default is NULL.}

\item{ref_cell_type_col}{A character string specifying the column name in the reference dataset containing cell type annotations.}

\item{query_cell_type_col}{A character string specifying the column name in the query dataset containing cell type annotations. Default is NULL.}

\item{cell_types}{A character vector specifying the cell types to include in the analysis. If NULL, all cell types are included.}

\item{pc_subset}{A numeric vector specifying the indices of the PCs to use. In \code{detectAnomaly}, the default is 1:5.
In the plot method, these are the PCs included in the plots; if NULL, all PCs in \code{reference_mat_subset} are included.}

\item{n_tree}{An integer specifying the number of trees for the isolation forest. Default is 500.}

\item{anomaly_treshold}{A numeric value specifying the threshold for identifying anomalies. Default is 0.6.}

\item{assay_name}{Name of the assay on which to perform computations. Default is "logcounts".}

\item{...}{Additional arguments.}

\item{x}{A list object containing the anomaly detection results from the \code{detectAnomaly} function.
Each element of the list should correspond to a cell type and contain \code{reference_mat_subset}, \code{query_mat_subset},
\code{var_explained}, and \code{anomaly}.}

\item{cell_type}{A character string specifying the cell type for which the plots should be generated. This should
be a name present in \code{x}. If NULL, the "Combined" cell type will be plotted. Default is NULL.}

\item{data_type}{A character string specifying whether to plot the "query" data or the "reference" data. Default is "query".}
}
\value{
A list containing the following components for each cell type and the combined data:
\item{anomaly_scores}{Anomaly scores for each cell in the query data, or in the reference data if no query data is provided.}
\item{anomaly}{Logical vector indicating whether each cell is classified as an anomaly.}
\item{reference_mat_subset}{PCA projections of the reference data.}
\item{query_mat_subset}{PCA projections of the query data (if provided).}
\item{var_explained}{Proportion of variance explained by the retained principal components.}

The S3 plot method returns a \code{ggplot} object representing the PCA plots with anomalies highlighted.
}
\description{
This function detects anomalies in single-cell data by projecting the data onto a PCA space and scoring the projected
cells with an isolation forest algorithm.

The S3 plot method generates faceted scatter plots for specified principal component (PC) combinations
within an anomaly detection object. It shows the relationship between the specified
PCs and highlights the anomalies detected by the isolation forest algorithm.
}
\details{
This function projects the query data onto the PCA space of the reference data. An isolation forest is then built on the
reference data to identify anomalies in the query data based on their PCA projections. If no query dataset is provided,
the anomaly scores are computed on the reference data itself. Anomaly scores for the data with all cell types combined are also
provided as part of the output.

The S3 plot method extracts the specified PCs from the given anomaly detection object and generates
scatter plots for each pair of PCs. It uses \code{ggplot2} to create a faceted plot where each facet represents
a pair of PCs. Anomalies are highlighted in red, while normal points are shown in black.
}
\examples{
# Load data
data("reference_data")
data("query_data")

# Store PCA anomaly data
anomaly_output <- detectAnomaly(reference_data = reference_data,
                                query_data = query_data,
                                ref_cell_type_col = "expert_annotation",
                                query_cell_type_col = "SingleR_annotation",
                                pc_subset = 1:5,
                                n_tree = 500,
                                anomaly_treshold = 0.6)
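
# Inspect the stored results for one cell type. This is a sketch that
# assumes "CD4" is one of the names of the returned list (the same cell
# type used in the plot call in this example); the element names follow
# the Value section of this help page.
cd4_output <- anomaly_output[["CD4"]]
head(cd4_output$anomaly_scores)
table(cd4_output$anomaly)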

# Plot the output for a cell type
plot(anomaly_output,
     cell_type = "CD4",
     pc_subset = 1:5,
     data_type = "query")
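
# Reference-only run (a sketch): with query_data left at its default of
# NULL, the anomaly scores are computed on the reference data itself.
anomaly_output_ref <- detectAnomaly(reference_data = reference_data,
                                    ref_cell_type_col = "expert_annotation",
                                    pc_subset = 1:5,
                                    n_tree = 500,
                                    anomaly_treshold = 0.6)

# Plot the reference cells; cell_type = NULL (the default) selects the
# "Combined" cell type
plot(anomaly_output_ref,
     pc_subset = 1:5,
     data_type = "reference")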

}
\references{
\itemize{
  \item Liu, F. T., Ting, K. M., & Zhou, Z. H. (2008). Isolation forest. In 2008 Eighth IEEE International Conference on Data Mining (pp. 413-422). IEEE.
  \item \href{https://cran.r-project.org/web/packages/isotree/isotree.pdf}{isotree: Isolation-Based Outlier Detection}
}
}
\seealso{
\code{\link{plot.detectAnomalyObject}}

\code{\link{detectAnomaly}}
}
\author{
Anthony Christidis, \email{anthony-alexander_christidis@hms.harvard.edu}
}
\keyword{internal}
