% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/regulonPruning.R
\name{pruneRegulon}
\alias{pruneRegulon}
\title{Prune regulons to retain true transcription factor - regulatory element - target gene relationships}
\usage{
pruneRegulon(
  regulon,
  expMatrix = NULL,
  peakMatrix = NULL,
  exp_assay = "logcounts",
  peak_assay = "PeakMatrix",
  test = c("chi.sq", "binom"),
  clusters = NULL,
  exp_cutoff = 1,
  peak_cutoff = 0,
  regulon_cutoff = 0.05,
  p_adj = TRUE,
  prune_value = "pval",
  aggregateCells = FALSE,
  useDim = "IterativeLSI_ATAC",
  cellNum = 10,
  BPPARAM = BiocParallel::SerialParam(progressbar = TRUE)
)
}
\arguments{
\item{regulon}{A data frame describing gene regulatory relationships, with a \code{tf} column
of transcription factors, an \code{idxATAC} column giving the index of the regulatory element in
\code{peakMatrix}, and a \code{target} column of target genes}

\item{expMatrix}{A SingleCellExperiment object or matrix containing gene expression with
genes in the rows and cells in the columns}

\item{peakMatrix}{A SingleCellExperiment object or matrix containing peak accessibility with
peaks in the rows and cells in the columns}

\item{exp_assay}{String indicating the name of the assay in expMatrix for gene expression}

\item{peak_assay}{String indicating the name of the assay in peakMatrix for chromatin accessibility}

\item{test}{String indicating which test of independence to perform, either \code{"chi.sq"} or \code{"binom"}}

\item{clusters}{A vector corresponding to the cluster labels of the cells if
cluster-specific joint probabilities are also required. If left \code{NULL}, joint probabilities
are calculated for all cells}

\item{exp_cutoff}{A scalar indicating the minimum gene expression above which
a gene is considered active. Default value is 1. Applied to both transcription
factors and target genes.}

\item{peak_cutoff}{A scalar indicating the minimum peak accessibility above which a peak is
considered open. Default value is 0.}

\item{regulon_cutoff}{A scalar indicating the maximum p-value (or adjusted p-value, depending on
\code{prune_value}) for a tf-idxATAC-target trio to be retained in the pruned regulon. Default value is 0.05.}

\item{p_adj}{A logical indicating whether p-value adjustment should be performed}

\item{prune_value}{String indicating whether to filter regulon based on \code{pval} or \code{padj}.}

\item{aggregateCells}{A logical indicating whether to aggregate cells into groups whose size is determined by
\code{cellNum}. Use this option to overcome data sparsity if needed.}

\item{useDim}{String indicating the name of the dimensionality reduction matrix in expMatrix used for cell aggregation}

\item{cellNum}{An integer specifying the number of cells per cluster for cell aggregation. Default is 10.}

\item{BPPARAM}{A BiocParallelParam object specifying whether calculation should be parallelized.
Default is \code{BiocParallel::SerialParam(progressbar = TRUE)}, which runs serially.}
}
\value{
A DataFrame of pruned regulons with p-values from the tests of independence, computed
either for all cells or for individual clusters, the test statistics (z-scores for the binomial test or
chi-square statistics for the chi-square test), and adjusted p-values.
}
\description{
Prune regulons to retain true transcription factor - regulatory element - target gene relationships
}
\details{
The function prunes the network by performing tests of independence on the observed number of cells
jointly expressing transcription factor (TF), regulatory element (RE) and target gene (TG) vs
the expected number of cells if TF/RE and TG are independently expressed.

In other words, if no regulatory relationship exists, the expected probability of cells expressing all
three elements is P(TF, RE) * P(TG), that is, the product of (1) proportion of cells both expressing transcription factor
and having accessible corresponding regulatory element, and (2) proportion of cells expressing
target gene. The expected number of cells expressing all three elements is therefore n*P(TF, RE)*P(TG),
where n is the total number of cells. However, if a TF-RE-TG relationship exists,
we expect the observed number of cells jointly having all three elements (TF, RE, TG) to deviate from
the expected number of cells predicted from an independent relationship.

If the user provides cluster assignment, the tests of independence are performed on a per-cluster basis
in addition to providing all cells statistics. This enables pruning by cluster, and thus yields cluster-specific
gene regulatory relationships.

We implement two tests, the binomial test and the chi-square test.

In the binomial test, the expected probability is P(TF, RE) * P(TG), the number of trials is the number of cells,
and the number of observed successes is the number of cells jointly expressing all three elements.

In the chi-square test, the expected probability of having all three elements active is also P(TF, RE) * P(TG), and
the probability otherwise is 1 - P(TF, RE) * P(TG). The observed cell count for the active category is n_triple, the
number of cells jointly expressing all three elements, and the cell count for the inactive category is n - n_triple.
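
As a worked illustration with made-up numbers (not taken from any dataset), suppose n = 1000 cells,
P(TF, RE) = 0.2 and P(TG) = 0.3. Under independence, the expected number of cells with all three elements active is
\deqn{n \cdot P(TF, RE) \cdot P(TG) = 1000 \times 0.2 \times 0.3 = 60}{n * P(TF, RE) * P(TG) = 1000 * 0.2 * 0.3 = 60}
If n_triple = 90 such cells are observed, the chi-square statistic over the active and inactive categories is
\deqn{\chi^2 = \frac{(90 - 60)^2}{60} + \frac{(910 - 940)^2}{940} \approx 15.96}{chi^2 = (90 - 60)^2 / 60 + (910 - 940)^2 / 940 ~ 15.96}
Assuming a two-category goodness-of-fit test with one degree of freedom, this exceeds the critical value of 3.84
at the 0.05 level, so the trio would be retained under these assumptions. The statistic computed by the function
may differ in detail from this sketch.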
}
\examples{
# create a mock SingleCellExperiment object for the gene expression matrix
set.seed(1000)
gene_sce <- scuttle::mockSCE()
gene_sce <- scuttle::logNormCounts(gene_sce)
rownames(gene_sce) <- paste0('Gene_',1:2000)

# create a mock SingleCellExperiment object for peak matrix
peak_gr <- GenomicRanges::GRanges(seqnames = 'chr1',
    ranges = IRanges::IRanges(start = seq(from = 1, to = 10000, by = 100), width = 100))
# random accessibility counts, one row per peak and one column per cell
peak_counts <- matrix(sample(x = 0:4, size = ncol(gene_sce)*length(peak_gr), replace = TRUE),
    nrow = length(peak_gr), ncol = ncol(gene_sce))
peak_sce <- SingleCellExperiment::SingleCellExperiment(list(counts = peak_counts),
    colData = SummarizedExperiment::colData(gene_sce))
rownames(peak_sce) <- paste0('Peak_',1:100)

# create a mock regulon
regulon <- data.frame(tf = c(rep('Gene_1',10), rep('Gene_2',10)),
                     idxATAC = sample(1:100, 20),
                     target = c(paste0('Gene_', sample(3:2000,10)),
                                paste0('Gene_',sample(3:2000,10))))

# prune regulon
pruned.regulon <- pruneRegulon(expMatrix = gene_sce,
exp_assay = 'logcounts', peakMatrix = peak_sce, peak_assay = 'counts',
regulon = regulon, clusters = gene_sce$Treatment, regulon_cutoff = 0.5)

# prune regulon with cell aggregation, using PCA coordinates to group cells
gene_sce <- scater::runPCA(gene_sce)
pruned.regulon <- pruneRegulon(expMatrix = gene_sce, exp_assay = 'logcounts',
    peakMatrix = peak_sce, peak_assay = 'counts', regulon = regulon,
    clusters = gene_sce$Treatment, regulon_cutoff = 0.5,
    aggregateCells = TRUE, cellNum = 3, useDim = 'PCA')
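
# Illustrative sketch only: the two tests described in the details section,
# reproduced with base R on hypothetical counts. The variable names and numbers
# below are assumptions made for illustration, and the internal implementation
# of pruneRegulon may differ.
n_cells <- 1000   # hypothetical total number of cells
n_tf_re <- 200    # hypothetical cells expressing the TF with the RE accessible
n_tg <- 300       # hypothetical cells expressing the target gene
n_triple <- 90    # hypothetical cells with TF, RE and TG all active

# expected probability of the triple under independence: P(TF, RE) * P(TG)
p_expected <- (n_tf_re / n_cells) * (n_tg / n_cells)

# binomial test: n_cells trials, n_triple successes, success probability p_expected
binom_res <- stats::binom.test(x = n_triple, n = n_cells, p = p_expected)
binom_res$p.value

# chi-square goodness-of-fit test on the active and inactive categories
chisq_res <- stats::chisq.test(x = c(n_triple, n_cells - n_triple),
    p = c(p_expected, 1 - p_expected))
chisq_res$statistic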

}
\author{
Xiaosai Yao, Tomasz Wlodarczyk
}
