% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/filter_mut.R
\name{filter_mut}
\alias{filter_mut}
\title{Filter your mutation data}
\usage{
filter_mut(
  mutation_data,
  vaf_cutoff = 1,
  snv_in_germ_mnv = FALSE,
  rm_abnormal_vaf = FALSE,
  custom_filter_col = NULL,
  custom_filter_val = NULL,
  custom_filter_rm = FALSE,
  regions = NULL,
  regions_filter,
  allow_half_overlap = FALSE,
  rg_sep = "\\t",
  is_0_based_rg = TRUE,
  rm_filtered_mut_from_depth = FALSE,
  return_filtered_rows = FALSE
)
}
\arguments{
\item{mutation_data}{Your mutation data. This can be a data frame or a
GRanges object.}

\item{vaf_cutoff}{Filter out ostensibly germline variants using a cutoff for
variant allele fraction (VAF). Any variant with a \code{vaf} larger than
the cutoff will be filtered. The default is 1 (no filtering). It is
recommended to use a value of 0.01 (i.e. 1\%) as a conservative approach
to retain only somatic variants.}

\item{snv_in_germ_mnv}{A logical value. If TRUE, filter out SNVs that
overlap with germline MNVs within the same sample if they show the same
variation at the same position. MNVs are considered germline if their
VAF > \code{vaf_cutoff}. Default is FALSE.
Example: given a germline MNV at positions 101-103, CAG > TGG, an SNV at
position 101 C>T will be filtered out, but an SNV at position 101 C>A will
not. This helps identify sequencing artifacts generated by N-calls in MNVs.}

\item{rm_abnormal_vaf}{A logical value. If TRUE, rows in
\code{mutation_data} with a variant allele fraction (VAF) between 0.05 and
0.45 or between 0.55 and 0.95 will be removed. We expect variants to have a
VAF of approximately 0, 0.5, or 1, reflecting rare somatic mutations, heterozygous germline
mutations, and homozygous germline mutations, respectively. Default is
FALSE.}

\item{custom_filter_col}{The name of the column in \code{mutation_data} to
apply a custom filter to. This column will be checked for specific values,
as defined by \code{custom_filter_val}. If any row in this column contains
one of the specified values, that row will either be flagged in the
\code{filter_mut} column or, if specified by \code{custom_filter_rm},
removed from \code{mutation_data}.}

\item{custom_filter_val}{A set of values used to filter rows in
\code{mutation_data} based on \code{custom_filter_col}. If a row in
\code{custom_filter_col} matches any value in \code{custom_filter_val},
it will either be set to TRUE in the \code{filter_mut} column or removed,
depending on \code{custom_filter_rm}.}

\item{custom_filter_rm}{A logical value. If TRUE, rows whose
\code{custom_filter_col} value matches any value in
\code{custom_filter_val} will be removed from \code{mutation_data}. If
FALSE, \code{filter_mut} will be set to TRUE for those rows.}

\item{regions}{Remove rows that are within/outside of specified regions.
\code{regions} can be either a file path, a data frame, or a GRanges object
containing the genomic ranges by which to filter. Files will be read
using \code{rg_sep} as the delimiter. Users can also choose from TwinStrand's
built-in Mutagenesis Panels by inputting "TSpanel_human", "TSpanel_mouse", or
"TSpanel_rat". Required columns for the regions file are "contig", "start",
and "end". In a GRanges object, the required columns are "seqnames",
"start", and "end".}

\item{regions_filter}{Specifies how the provided \code{regions} should be
applied to \code{mutation_data}. Acceptable values are "remove_within" or
"keep_within". If set to "remove_within", records that fall within the
specified regions will be removed from mutation_data. If set to
"keep_within", only records within the specified regions will be kept in
mutation_data, and all other records will be removed.}

\item{allow_half_overlap}{A logical value. If TRUE, records that start or
end in your \code{regions}, but extend outside of them in either direction,
will be included in the filter. If FALSE, only records that start and end
within the \code{regions} will be included in the filter. Default is FALSE.}

\item{rg_sep}{The delimiter used when importing a \code{regions} file. The
default is tab-delimited ("\\t").}

\item{is_0_based_rg}{A logical value. Indicates whether the position
coordinates in \code{regions} are 0-based (TRUE) or 1-based (FALSE).
If TRUE, positions will be converted to 1-based (start + 1).
Need not be supplied for TSpanels. Default is TRUE.}

\item{rm_filtered_mut_from_depth}{A logical value. If TRUE, the function
will subtract the \code{alt_depth} of records that were flagged by the
\code{filter_mut} column from their \code{total_depth}. This will treat
flagged variants as No-calls. This will not apply to variants flagged as
germline by the \code{vaf_cutoff}. However, if the germline variant
has additional filters applied, then the subtraction will still occur.
If FALSE, the \code{alt_depth} will be retained in the
\code{total_depth} for all variants. Default is FALSE.}

\item{return_filtered_rows}{A logical value. If TRUE, the function will
return both the filtered mutation data and the records that were
removed/flagged in a separate data frame. The two data frames will be
returned inside a list, with names \code{mutation_data} and
\code{filtered_rows}. Default is FALSE.}
}
\value{
A data frame or a list of two data frames, depending on the
value of \code{return_filtered_rows}. If \code{return_filtered_rows} is
FALSE (default), a data frame of the same structure as \code{mutation_data}
is returned, with an additional column, \code{filter_mut}, indicating
whether each record has been flagged for filtering (TRUE) or not (FALSE).
If \code{return_filtered_rows} is TRUE, a list containing two data frames
is returned. The first data frame, named \code{mutation_data}, is the
filtered mutation data as described above. The second data frame,
named \code{filtered_rows}, contains all records that were either
removed from \code{mutation_data} or flagged with \code{filter_mut == TRUE}.
}
\description{
This function creates a \code{filter_mut} column that will be read by the
\code{calculate_mf} function and other downstream functions. Variants with
\code{filter_mut == TRUE} will be excluded from group mutation
counts. This function may also remove records upon user specification.
Running this function again on the same data will not override the previous
filters. To reset previous filters, set the \code{filter_mut} column values to
FALSE.
}
\examples{
# Mutation data is just for example purposes. It does not reflect real data.
mutation_data <- readRDS(system.file("extdata", "Example_files",
                                     "simple_mutation_data.rds",
                                     package = "MutSeqR"))
  # In this example, we will apply the following filters:
  # 1) Filter out putative germline variants using a VAF cutoff of 0.01
  # 2) Flag snv variants that overlap with germline mnv variants and
  # 3) Subtract the alt_depth of these variants from their total_depth
  #    (treat them as No-calls).
  filter_example <- filter_mut(
    mutation_data = mutation_data,
    vaf_cutoff = 0.01,
    snv_in_germ_mnv = TRUE,
    rm_filtered_mut_from_depth = TRUE,
    return_filtered_rows = FALSE
  )
 # Flagging germline mutations...
 # Found 15 germline mutations.
 # Flagging SNVs overlapping with germline MNVs...
 # Found 1 SNVs overlapping with germline MNVs.
 # Removing filtered mutations from the total_depth...
 # Filtering complete.
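
  # A further sketch (not run): restrict the data to a custom set of regions
  # and return the flagged/removed records alongside the filtered data.
  # The regions below are hypothetical values chosen for illustration only;
  # the contig names and coordinates are assumptions and may not match the
  # example data.
  \dontrun{
  example_regions <- data.frame(
    contig = c("chr1", "chr2"),
    start = c(0, 1000), # 0-based starts, hence is_0_based_rg = TRUE below
    end = c(500, 1500)
  )
  filter_example_2 <- filter_mut(
    mutation_data = mutation_data,
    vaf_cutoff = 0.01,
    regions = example_regions,
    regions_filter = "keep_within",
    is_0_based_rg = TRUE,
    return_filtered_rows = TRUE
  )
  # With return_filtered_rows = TRUE, the result is a list of two data
  # frames, named mutation_data and filtered_rows.
  kept <- filter_example_2$mutation_data
  flagged_or_removed <- filter_example_2$filtered_rows
  }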
}
