`getAMR` returns a `GRanges` object with all the aberrantly methylated regions (AMRs) for all samples in a data set.

getAMR(
  data.ranges,
  data.samples = NULL,
  ramr.method = "IQR",
  iqr.cutoff = 5,
  pval.cutoff = 0.05,
  qval.cutoff = NULL,
  merge.window = 300,
  min.cpgs = 7,
  min.width = 1,
  exclude.range = NULL,
  cores = max(1, parallel::detectCores() - 1),
  verbose = TRUE,
  ...
)

Arguments

data.ranges

A `GRanges` object with genomic locations and corresponding beta values included as metadata.

data.samples

A character vector with sample names (a subset of metadata column names). If `NULL` (the default), then all samples (metadata columns) are included in the analysis.

ramr.method

A character scalar. When `ramr.method` is "IQR" (the default), filtering is based on the interquartile range, with `iqr.cutoff` as the threshold. When it is "beta" or "wbeta", filtering is based on fitting non-weighted (`EnvStats::ebeta`) or weighted (`ExtDist::eBeta`) beta distributions, respectively, with `pval.cutoff` or `qval.cutoff` (if not `NULL`) as the threshold. For "wbeta", weights correlate directly with bin contents (the number of values per bin) and inversely with the distance from the median value, thus narrowing the estimated distribution and emphasizing outliers.
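
The "beta" route can be illustrated with a rough, base-R-only sketch. It is not the package's implementation: the shape parameters are estimated here by the method of moments rather than by `EnvStats::ebeta`, the tail probability used as a p-value is an assumption, and the beta values and the cutoff are made up for illustration.

```r
# Hypothetical beta values of one CpG across 11 samples; the last one is an outlier
x <- c((15:24) / 100, 0.90)

# Method-of-moments estimates of the beta distribution parameters
# (a stand-in for EnvStats::ebeta, which is not called here)
m <- mean(x)
v <- var(x)
k <- m * (1 - m) / v - 1
shape1 <- m * k
shape2 <- (1 - m) * k

# Assumed p-value: probability mass in the tail nearer to each observation
lower <- pbeta(x, shape1, shape2)
pval <- pmin(lower, 1 - lower)

# Hypothetical threshold playing the role of qval.cutoff
qval.cutoff <- 0.01
which(pval <= qval.cutoff)
```

The outlying sample receives the smallest p-value because it sits far in the upper tail of the fitted distribution.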

iqr.cutoff

A single integer >= 1. Methylation beta values differing from the median value by more than `iqr.cutoff` interquartile ranges are considered to be significant (the default: 5).
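
As a minimal base-R sketch of what an IQR-normalised deviation looks like (the matrix, its dimension names and the planted aberration are hypothetical, and the `>=` comparison is an assumption about how the threshold is applied):

```r
# Hypothetical beta values: 5 CpGs (rows) x 20 samples (columns)
betas <- matrix(rep((10:29) / 100, each = 5), nrow = 5,
                dimnames = list(paste0("cpg", 1:5), paste0("sample", 1:20)))
betas["cpg3", "sample7"] <- 0.95  # planted aberration

iqr.cutoff <- 5

# Per-CpG median and interquartile range across samples
med <- apply(betas, 1, median)
iqr <- apply(betas, 1, IQR)

# IQR-normalised deviation; the per-CpG vectors recycle down the rows
xiqr <- (betas - med) / iqr

# Assumed rule: flag values at least iqr.cutoff IQRs away from the median
significant <- abs(xiqr) >= iqr.cutoff
which(significant, arr.ind = TRUE)
```

Only the planted value at `cpg3`/`sample7` is flagged; every other value lies within about one IQR of its CpG's median.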

pval.cutoff

A numeric scalar (the default: 5e-2). Bonferroni correction of `pval.cutoff` by the length of the `data.samples` object is used to calculate `qval.cutoff` if the latter is `NULL`.
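
Assuming the correction is the usual Bonferroni division by the number of tests, the derived threshold could be computed as follows (the sample names are hypothetical):

```r
pval.cutoff <- 0.05
data.samples <- paste0("sample", 1:100)  # hypothetical sample names

# Bonferroni-style correction by the number of samples
qval.cutoff <- pval.cutoff / length(data.samples)
qval.cutoff
```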

qval.cutoff

A numeric scalar. Used as a threshold for filtering based on fitting non-weighted or weighted beta distributions: all p-values lower than `qval.cutoff` are considered to be significant. If `NULL` (the default), it is calculated from `pval.cutoff`.

merge.window

A positive integer. All significant `data.ranges` genomic locations (those that survived the filtering stage) within this distance of each other are merged to create AMRs (the default: 300).

min.cpgs

A single integer >= 1. All AMRs containing fewer than `min.cpgs` significant genomic locations are filtered out (the default: 7).

min.width

A single integer >= 1 (the default: 1). Only AMRs with a width of at least `min.width` are returned.
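
The interplay of `merge.window`, `min.cpgs` and `min.width` can be sketched in base R on plain coordinates. This is an illustration only: the positions and cutoffs are hypothetical, "within this distance" is assumed to mean a gap of at most `merge.window`, and the package itself operates on `GRanges` objects rather than numeric vectors.

```r
# Hypothetical sorted positions of significant CpGs on one chromosome
pos <- c(100, 250, 500, 1200, 1350, 3000)

merge.window <- 300
min.cpgs <- 3
min.width <- 1

# Start a new region whenever the gap to the previous CpG exceeds merge.window
region <- cumsum(c(TRUE, diff(pos) > merge.window))

regions <- data.frame(
  start = as.vector(tapply(pos, region, min)),
  end   = as.vector(tapply(pos, region, max)),
  ncpg  = as.vector(table(region))
)
regions$width <- regions$end - regions$start + 1

# Final filtering by number of CpGs and by region width
kept <- regions[regions$ncpg >= min.cpgs & regions$width >= min.width, ]
kept
```

With these toy values the six positions form three regions, and only the first one (three CpGs spanning 100 to 500) passes the `min.cpgs` filter.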

exclude.range

A numeric vector of length two. If not `NULL` (the default), all `data.ranges` genomic locations with their median methylation beta value within the `exclude.range` interval are filtered out.
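
A small sketch of this filter on a hypothetical matrix of beta values, assuming the interval is closed (whether the boundaries themselves are included is not stated here):

```r
# Hypothetical beta values: 3 CpGs (rows) x 3 samples (columns)
betas <- rbind(cpg1 = c(0.10, 0.12, 0.15),
               cpg2 = c(0.45, 0.50, 0.90),
               cpg3 = c(0.80, 0.85, 0.88))

exclude.range <- c(0.25, 0.75)

# Median beta value of every CpG across samples
med <- apply(betas, 1, median)

# Keep only CpGs whose median lies outside exclude.range
keep <- med < exclude.range[1] | med > exclude.range[2]
betas[keep, , drop = FALSE]
```

Here `cpg2` is dropped because its median (0.50) falls inside the interval, even though one of its samples is strongly hypermethylated.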

cores

A single integer >= 1. Number of processes for parallel computation (the default: all but one core). Results of parallel processing are fully reproducible when the same seed is used (thanks to doRNG).

verbose

A logical scalar indicating whether to report progress and timings (the default: `TRUE`).

...

Further arguments to be passed to `EnvStats::ebeta` or `ExtDist::eBeta` functions.

Value

The output is a `GRanges` object that contains all the aberrantly methylated regions (AMRs) for all `data.samples` samples in `data.ranges` object. The following metadata columns may be present:

  • `revmap` -- integer list of significant CpGs (`data.ranges` genomic locations) that are included in this AMR

  • `ncpg` -- number of significant CpGs within this AMR

  • `sample` -- identifier of the sample to which the corresponding AMR belongs

  • `dbeta` -- average deviation of beta values for significant CpGs from their corresponding median values

  • `pval` -- geometric mean of p-values for significant CpGs

  • `xiqr` -- average IQR-normalised deviation of beta values for significant CpGs from their corresponding median values
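
To make the `dbeta` and `pval` summaries concrete, here is a hedged base-R sketch for a single AMR. All values and variable names are hypothetical, and this is not the package's internal code.

```r
# Hypothetical values for the three significant CpGs of one AMR in one sample
beta.sample <- c(0.82, 0.78, 0.90)  # beta values in the sample
beta.median <- c(0.30, 0.35, 0.40)  # per-CpG medians across all samples
pvals <- c(1e-4, 1e-6, 1e-8)        # per-CpG p-values

# Average deviation from the median (the role of the dbeta column)
dbeta.amr <- mean(beta.sample - beta.median)

# Geometric mean of the p-values, computed on the log scale for stability
pval.amr <- exp(mean(log(pvals)))

c(dbeta = dbeta.amr, pval = pval.amr)
```

A positive `dbeta` indicates hypermethylation relative to the median, a negative one hypomethylation.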

Details

In the provided data set, `getAMR` compares the methylation beta values of each sample with those of the other samples to identify rare long-range methylation aberrations. For `ramr.method=="IQR"`: for every genomic location (CpG) in `data.ranges`, the IQR-normalized deviation from the median value is calculated, and all CpGs with a normalized deviation not smaller than `iqr.cutoff` are retained. For `ramr.method=="*beta"` (that is, "beta" or "wbeta"): the parameters of the beta distribution are estimated by means of the `EnvStats::ebeta` or `ExtDist::eBeta` functions and then used to calculate probability values, after which all CpGs with p-values not greater than `qval.cutoff` are retained. Another filtering step then excludes all CpGs within `exclude.range`. Next, the retained (significant) CpGs are merged within a window of `merge.window`, and the final filtering is applied to the AMR genomic ranges (by `min.cpgs` and `min.width`).

See also

plotAMR for plotting AMRs, getUniverse for info on enrichment analysis, simulateAMR and simulateData for the generation of simulated test data sets, and `ramr` vignettes for the description of usage and sample data.

Examples

  data(ramr)
  getAMR(ramr.data, ramr.samples, ramr.method="beta",
         min.cpgs=5, merge.window=1000, qval.cutoff=1e-3, cores=2)
#> Identifying AMRs
#>  [7.610s]
#> GRanges object with 22 ranges and 5 metadata columns:
#>        seqnames          ranges strand |             revmap      ncpg
#>           <Rle>       <IRanges>  <Rle> |             <list> <integer>
#>    [1]     chr1 2443577-2453006      * | 2722,2723,2724,...        30
#>    [2]     chr1 1589891-1590941      * | 1459,1460,1461,...        10
#>    [3]     chr1 1589891-1590941      * | 1459,1460,1461,...        10
#>    [4]     chr1 1589891-1590941      * | 1459,1460,1461,...        10
#>    [5]     chr1   874697-877876      * |    165,166,167,...        13
#>    ...      ...             ...    ... .                ...       ...
#>   [18]     chr1 1709203-1715039      * | 1595,1596,1597,...        20
#>   [19]     chr1 1709203-1715039      * | 1595,1596,1597,...        20
#>   [20]     chr1   566172-569687      * |       17,18,19,...        15
#>   [21]     chr1 1138931-1146903      * |    726,727,728,...        27
#>   [22]     chr1 1160713-1165393      * |    789,790,791,...        15
#>             sample     dbeta         pval
#>        <character> <numeric>    <numeric>
#>    [1]    sample25 -0.484617  1.26772e-19
#>    [2]    sample33  0.489562  1.27388e-07
#>    [3]    sample34  0.503785  5.93348e-08
#>    [4]    sample35  0.512633  3.30858e-08
#>    [5]    sample44  0.451475  1.30458e-07
#>    ...         ...       ...          ...
#>   [18]    sample79  0.455621  2.80743e-23
#>   [19]    sample80  0.453449  3.24593e-23
#>   [20]    sample95  0.498337 4.60616e-171
#>   [21]    sample98  0.380622 3.95738e-232
#>   [22]    sample98 -0.514668  4.27297e-18
#>   -------
#>   seqinfo: 24 sequences from hg19 genome; no seqlengths