`getAMR` returns a `GRanges` object with all the aberrantly methylated regions (AMRs) for all samples in a data set.
getAMR(
  data.ranges,
  data.samples = NULL,
  ramr.method = "IQR",
  iqr.cutoff = 5,
  pval.cutoff = 0.05,
  qval.cutoff = NULL,
  merge.window = 300,
  min.cpgs = 7,
  min.width = 1,
  exclude.range = NULL,
  cores = max(1, parallel::detectCores() - 1),
  verbose = TRUE,
  ...
)
`data.ranges` -- A `GRanges` object with genomic locations and corresponding beta values included as metadata.
`data.samples` -- A character vector with sample names (a subset of metadata column names). If `NULL` (the default), all samples (metadata columns) are included in the analysis.
`ramr.method` -- A character scalar. When `ramr.method` is "IQR" (the default), filtering is based on the interquartile range, with `iqr.cutoff` as the threshold. When it is "beta" or "wbeta", filtering is based on fitting non-weighted (`EnvStats::ebeta`) or weighted (`ExtDist::eBeta`) beta distributions, respectively, with `pval.cutoff` or `qval.cutoff` (if not `NULL`) as the threshold. For "wbeta", weights correlate directly with bin contents (number of values per bin) and inversely with the distance from the median value, thus narrowing the estimated distribution and emphasizing outliers.
`iqr.cutoff` -- A single integer >= 1. Methylation beta values differing from the median value by more than `iqr.cutoff` interquartile ranges are considered to be significant (the default: 5).
`pval.cutoff` -- A numeric scalar (the default: 5e-2). If `qval.cutoff` is `NULL`, it is calculated by Bonferroni correction of `pval.cutoff` by the length of the `data.samples` object.
`qval.cutoff` -- A numeric scalar. Used as a threshold for filtering based on fitting non-weighted or weighted beta distributions: all p-values lower than `qval.cutoff` are considered to be significant. If `NULL` (the default), it is calculated from `pval.cutoff`.
`merge.window` -- A positive integer. All significant `data.ranges` genomic locations (those that survived the filtering stage) within this distance of each other are merged to create AMRs (the default: 300).
`min.cpgs` -- A single integer >= 1. All AMRs containing fewer than `min.cpgs` significant genomic locations are filtered out (the default: 7).
`min.width` -- A single integer >= 1 (the default: 1). Only AMRs with a width of at least `min.width` are returned.
`exclude.range` -- A numeric vector of length two. If not `NULL` (the default is `NULL`), all `data.ranges` genomic locations whose median methylation beta value lies within the `exclude.range` interval are filtered out.
`cores` -- A single integer >= 1. Number of processes for parallel computation (the default: all cores but one). Results of parallel processing are fully reproducible when the same seed is used (thanks to doRNG).
`verbose` -- A logical scalar: whether to report progress and timings (the default: TRUE).
`...` -- Further arguments to be passed to the `EnvStats::ebeta` or `ExtDist::eBeta` functions.
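The relationship between `pval.cutoff` and `qval.cutoff` described for these arguments can be sketched in a few lines of base R. This is only an illustration under the assumption that the Bonferroni correction is a plain division by the number of samples; `effective_qval_cutoff` and `n.samples` are hypothetical names and not part of the `ramr` API.

```r
# Hypothetical helper (not part of ramr): derive the threshold that would be
# applied to p-values when a beta-distribution method is used.
effective_qval_cutoff <- function(pval.cutoff = 0.05, qval.cutoff = NULL,
                                  n.samples) {
  # An explicitly supplied qval.cutoff takes precedence.
  if (!is.null(qval.cutoff)) return(qval.cutoff)
  # Assumed Bonferroni correction: divide by the number of samples.
  pval.cutoff / n.samples
}

# Derived from pval.cutoff for a data set of 100 samples
print(effective_qval_cutoff(pval.cutoff = 0.05, n.samples = 100))
# Explicit value is returned unchanged
print(effective_qval_cutoff(qval.cutoff = 1e-3, n.samples = 100))
```

Under this assumption, larger data sets lead to a stricter derived threshold unless `qval.cutoff` is set explicitly.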
The output is a `GRanges` object that contains all the aberrantly methylated regions (AMRs) for all `data.samples` samples in the `data.ranges` object. The following metadata columns may be present:
`revmap` -- integer list of significant CpGs (`data.ranges` genomic locations) that are included in this AMR
`ncpg` -- number of significant CpGs within this AMR
`sample` -- identifier of the sample to which the corresponding AMR belongs
`dbeta` -- average deviation of beta values for significant CpGs from their corresponding median values
`pval` -- geometric mean of p-values for significant CpGs
`xiqr` -- average IQR-normalized deviation of beta values for significant CpGs from their corresponding median values
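To make the definitions of `dbeta`, `pval` and `xiqr` concrete, the following base R sketch computes them for one hypothetical AMR with three significant CpGs. All input values are made up for illustration, and the code is not taken from the `ramr` sources; in particular, it computes both `pval` and `xiqr`, whereas the columns actually returned may differ.

```r
# Toy values (assumed) for the significant CpGs of one AMR in one sample.
beta.sample <- c(0.90, 0.85, 0.95)  # beta values observed in the sample
beta.median <- c(0.40, 0.45, 0.40)  # per-CpG medians across all samples
beta.iqr    <- c(0.05, 0.04, 0.05)  # per-CpG interquartile ranges
pvals       <- c(1e-6, 1e-8, 1e-7)  # per-CpG p-values from a beta fit

deviation <- beta.sample - beta.median

dbeta <- mean(deviation)             # average deviation from the median
xiqr  <- mean(deviation / beta.iqr)  # average IQR-normalized deviation
pval  <- exp(mean(log(pvals)))       # geometric mean of the p-values

print(c(dbeta = dbeta, xiqr = xiqr, pval = pval))
```

The geometric mean is computed on the log scale, which avoids numerical underflow when many very small p-values are multiplied together.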
Within the provided data set, `getAMR` compares the methylation beta values of each sample with those of the other samples to identify rare long-range methylation aberrations. For `ramr.method=="IQR"`: for every genomic location (CpG) in `data.ranges`, the IQR-normalized deviation from the median value is calculated, and all CpGs whose normalized deviation is not smaller than `iqr.cutoff` are retained. For `ramr.method=="*beta"` (that is, "beta" or "wbeta"): the parameters of the beta distribution are estimated by means of `EnvStats::ebeta` or `ExtDist::eBeta`, respectively, and then used to calculate p-values; all CpGs with p-values not greater than `qval.cutoff` are retained. A further filtering step then excludes all CpGs whose median beta value lies within `exclude.range`. Next, the retained (significant) CpGs are merged within a window of `merge.window`, and a final filter is applied to the resulting AMR genomic ranges (by `min.cpgs` and `min.width`).
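The IQR-based workflow described above can be approximated in base R as follows. This is a simplified sketch and not the `ramr` implementation: it uses a plain matrix and data frames instead of `GRanges`, handles a single chromosome with sorted positions, skips the `exclude.range` and `min.width` filters, and assumes that two significant CpGs are merged when the gap between them does not exceed `merge.window`. The toy data and the `find_amrs` helper are hypothetical.

```r
set.seed(1)
# Toy data (assumed): 12 sorted CpG positions on one chromosome, 20 samples.
pos  <- c(100, 150, 220, 300, 380, 2000, 2100, 5000, 5050, 5120, 5200, 9000)
beta <- matrix(rnorm(length(pos) * 20, mean = 0.5, sd = 0.02),
               nrow = length(pos),
               dimnames = list(NULL, paste0("s", 1:20)))
# Plant an aberration in sample s3 across the first five CpGs.
beta[1:5, "s3"] <- 0.95

iqr.cutoff   <- 5
merge.window <- 300
min.cpgs     <- 3

# Step 1: IQR-normalized deviation from the per-CpG median.
med  <- apply(beta, 1, median)
iqr  <- apply(beta, 1, IQR)
xiqr <- (beta - med) / iqr  # vectors recycle down columns, i.e. per CpG

find_amrs <- function(sample) {
  # Step 2: keep CpGs whose absolute deviation reaches the cutoff.
  idx <- which(abs(xiqr[, sample]) >= iqr.cutoff)
  if (length(idx) == 0) return(NULL)
  # Step 3: start a new region whenever the gap to the previous
  # significant CpG exceeds merge.window.
  region <- cumsum(c(TRUE, diff(pos[idx]) > merge.window))
  amrs <- do.call(rbind, lapply(split(idx, region), function(i) {
    data.frame(sample = sample, start = min(pos[i]), end = max(pos[i]),
               ncpg = length(i), xiqr = mean(xiqr[i, sample]))
  }))
  # Step 4: drop regions with too few significant CpGs.
  amrs[amrs$ncpg >= min.cpgs, , drop = FALSE]
}

amrs <- do.call(rbind, lapply(colnames(beta), find_amrs))
print(amrs)
```

Because the median and IQR are computed across all samples at each CpG, a single aberrant sample barely shifts them, which is what lets a rare aberration stand out against the rest of the cohort.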
`plotAMR` for plotting AMRs, `getUniverse` for info on enrichment analysis, `simulateAMR` and `simulateData` for the generation of simulated test data sets, and `ramr` vignettes for the description of usage and sample data.
data(ramr)
getAMR(ramr.data, ramr.samples, ramr.method="beta",
min.cpgs=5, merge.window=1000, qval.cutoff=1e-3, cores=2)
#> Identifying AMRs
#> [7.610s]
#> GRanges object with 22 ranges and 5 metadata columns:
#> seqnames ranges strand | revmap ncpg
#> <Rle> <IRanges> <Rle> | <list> <integer>
#> [1] chr1 2443577-2453006 * | 2722,2723,2724,... 30
#> [2] chr1 1589891-1590941 * | 1459,1460,1461,... 10
#> [3] chr1 1589891-1590941 * | 1459,1460,1461,... 10
#> [4] chr1 1589891-1590941 * | 1459,1460,1461,... 10
#> [5] chr1 874697-877876 * | 165,166,167,... 13
#> ... ... ... ... . ... ...
#> [18] chr1 1709203-1715039 * | 1595,1596,1597,... 20
#> [19] chr1 1709203-1715039 * | 1595,1596,1597,... 20
#> [20] chr1 566172-569687 * | 17,18,19,... 15
#> [21] chr1 1138931-1146903 * | 726,727,728,... 27
#> [22] chr1 1160713-1165393 * | 789,790,791,... 15
#> sample dbeta pval
#> <character> <numeric> <numeric>
#> [1] sample25 -0.484617 1.26772e-19
#> [2] sample33 0.489562 1.27388e-07
#> [3] sample34 0.503785 5.93348e-08
#> [4] sample35 0.512633 3.30858e-08
#> [5] sample44 0.451475 1.30458e-07
#> ... ... ... ...
#> [18] sample79 0.455621 2.80743e-23
#> [19] sample80 0.453449 3.24593e-23
#> [20] sample95 0.498337 4.60616e-171
#> [21] sample98 0.380622 3.95738e-232
#> [22] sample98 -0.514668 4.27297e-18
#> -------
#> seqinfo: 24 sequences from hg19 genome; no seqlengths