A comprehensive guide to using the ramr package for detection of rare aberrantly methylated regions (epimutations).
ramr is an R package for detection of low-frequency
aberrant methylation events (epimutations) in large data sets obtained
by methylation profiling using array or high-throughput methylation
sequencing. In addition, the package provides functions to visualize the detected
aberrantly methylated regions (AMRs), to generate sets of all possible
regions to be used as reference sets for enrichment analysis, and to
generate biologically relevant test data sets for performance evaluation
of AMR/DMR search algorithms.
Current Features
- Identification of aberrantly methylated regions (AMRs, i.e., epimutations)
- AMR visualization
- Generation of reference sets for third-party analyses (e.g., enrichment)
- Generation of test data sets for performance evaluation of algorithms for search of differentially (DMR) or aberrantly (AMR) methylated regions
Major improvements
v1.16 [BioC 3.21]
- Major rewrite of getAMR and related functions, which are now much faster (C/C++, OpenMP threads) and more robust (they correctly handle methylation sequencing data, which often contains 0 and 1 values).
- Old functions as they were described in the ramr paper are now obsolete, but are kept under different names (e.g., getAMR.obsolete) for consistency.
- Cleaner and more robust AMR plotting.
Reading data
ramr methods operate on objects of the class
GRanges. The input object for AMR search must in addition
contain metadata columns with sample beta values. A typical input object
looks like this:
GRanges object with 383788 ranges and 845 metadata columns:
seqnames ranges strand | GSM1235534 GSM1235535 GSM1235536 ...
<Rle> <IRanges> <Rle> | <numeric> <numeric> <numeric> ...
cg13869341 chr1 15865 * | 0.801634776091808 0.846486905008704 0.86732154737116 ...
cg24669183 chr1 534242 * | 0.834138820071765 0.861974610731835 0.832557979806823 ...
cg15560884 chr1 710097 * | 0.711275180750356 0.70461945838556 0.699487225634589 ...
cg01014490 chr1 714177 * | 0.0769098196182058 0.0569443780518647 0.0623154673389864 ...
cg17505339 chr1 720865 * | 0.876413362222415 0.885593263385521 0.877944732153869 ...
... ... ... ... . ... ... ... ...
cg05615487 chr22 51176407 * | 0.84904178467798 0.836538383875097 0.81568519870099 ...
cg22122449 chr22 51176711 * | 0.882444486059592 0.870804215405886 0.859269224277308 ...
cg08423507 chr22 51177982 * | 0.886406345093286 0.882430879852752 0.887241923657461 ...
cg19565306 chr22 51222011 * | 0.0719084295670266 0.0845209871264646 0.0689074604483659 ...
cg09226288 chr22 51225561 * | 0.724145303755024 0.696281176451351 0.711459675603635 ...
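Before beta values are attached as metadata columns, it can help to confirm that every sample has a column and that all values lie within [0, 1]. The sketch below uses base R only; check_betas and the toy matrix are hypothetical and not part of ramr.

```r
# Hypothetical helper (not part of ramr): basic sanity checks for a beta matrix
check_betas <- function(betas, sample.ids) {
  stopifnot(is.matrix(betas), is.numeric(betas))
  # every requested sample must have a column of beta values
  missing.ids <- setdiff(sample.ids, colnames(betas))
  if (length(missing.ids) > 0)
    stop("samples without a beta column: ", paste(missing.ids, collapse=", "))
  # beta values are proportions, so they must lie within [0, 1]
  rng <- range(betas[, sample.ids], na.rm=TRUE)
  if (rng[1] < 0 || rng[2] > 1)
    stop("beta values must lie within [0, 1]")
  invisible(TRUE)
}

# toy data: 3 CpGs x 2 samples (made-up values)
betas <- matrix(c(0.80, 0.83, 0.08, 0.85, 0.86, 0.06), ncol=2,
                dimnames=list(c("cg1", "cg2", "cg3"), c("S1", "S2")))
check_betas(betas, c("S1", "S2"))
```

A matrix that passes these checks can then be assigned to the metadata columns of a GRanges object with the same number of rows.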
The ramr package is supplied with sample data, which were
simulated using the GSE51032 data set as described in the ramr
reference paper. The sample data set ramr.data contains beta
values for 10000 CpGs and 100 samples (ramr.samples), and
carries 6 unique (ramr.tp.unique) and 15 non-unique
(ramr.tp.nonunique) true positive AMRs containing at least
10 CpGs with their beta values increased/decreased by 0.5.
plotAMR(data.ranges=ramr.data, amr.ranges=ramr.tp.nonunique[c(1,6,11)])
#> Plotting 1 genomic ranges
#> 100%
#> [0.107s]
#> $`chr1:874697-877876`
The input (or template) object may be obtained using data from various sources. Here we provide two examples:
Using data from NCBI GEO
The following code pulls (NB: very large) raw files from the NCBI GEO
database, performs normalization, and creates a GRanges
object for further analysis using ramr
(requirements: 22GB of disk space, 64GB of RAM):
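library(GEOquery)  # getGEOSuppFiles
library(R.utils)   # gunzip
library(minfi)     # read.metharray.exp, preprocessQuantile, getBeta, granges
# destination for temporary files
dest.dir <- tempdir()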
# downloading and unpacking raw IDAT files
suppl.files <- getGEOSuppFiles("GSE51032", baseDir=dest.dir, makeDirectory=FALSE, filter_regex="RAW")
untar(rownames(suppl.files), exdir=dest.dir, verbose=TRUE)
idat.files <- list.files(dest.dir, pattern="idat.gz$", full.names=TRUE)
sapply(idat.files, gunzip, overwrite=TRUE)
# reading IDAT files
geo.idat <- read.metharray.exp(dest.dir)
colnames(geo.idat) <- gsub("(GSM\\d+).*", "\\1", colnames(geo.idat))
# processing raw data
genomic.ratio.set <- preprocessQuantile(geo.idat, mergeManifest=TRUE, fixOutliers=TRUE)
# creating the GRanges object with beta values
data.ranges <- granges(genomic.ratio.set)
data.betas <- getBeta(genomic.ratio.set)
sample.ids <- colnames(geo.idat)
mcols(data.ranges) <- data.betas
# data.ranges and sample.ids objects are now ready for AMR search using ramr
Using Bismark cytosine report files
library(methylKit)      # methRead, unite, percMethylation
library(GenomicRanges)  # GRanges, mcols
# file.list is a user-defined character vector with full file names of Bismark cytosine report files
# sample.ids is a user-defined character vector holding sample names
# methylation context string, defines if the reads covering both strands will be merged
context <- "CpG"
# fitting beta distribution (filtering using ramr.method "beta" or "wbeta") requires
# that most of the beta values are not equal to 0 or 1
min.beta <- 0.001
max.beta <- 0.999
# reading and uniting methylation values
# (the pipeline value is inferred from the input file type named above)
meth.data.raw <- methRead(as.list(file.list), as.list(sample.ids), assembly="hg19", header=TRUE,
                          context=context, resolution="base", treatment=rep(0,length(sample.ids)),
                          pipeline="bismarkCytosineReport")
meth.data.utd <- unite(meth.data.raw, destrand=isTRUE(context=="CpG"))
# creating the GRanges object with beta values
data.ranges <- GRanges(meth.data.utd)
data.betas <- percMethylation(meth.data.utd)/100
data.betas[data.betas<min.beta] <- min.beta
data.betas[data.betas>max.beta] <- max.beta
mcols(data.ranges) <- data.betas
# data.ranges and sample.ids objects are now ready for AMR search using ramr
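Sequencing-derived beta values often contain exact 0 and 1 values, which is why the code above clamps them. The following base-R sketch shows the same clamping on a toy matrix and reports how many values sit on a boundary; clamp_betas is a hypothetical helper, and the default limits simply mirror min.beta and max.beta used above.

```r
# Hypothetical helper (not part of ramr): clamp beta values to [min.beta, max.beta]
clamp_betas <- function(betas, min.beta=0.001, max.beta=0.999) {
  # pmax/pmin keep the matrix dimensions and leave NA values untouched
  pmin(pmax(betas, min.beta), max.beta)
}

# toy matrix: 2 CpGs x 2 samples, including boundary and missing values
betas <- matrix(c(0, 0.25, 1, NA), ncol=2)

# number of values that sit exactly on a boundary before clamping
n.boundary <- sum(betas == 0 | betas == 1, na.rm=TRUE)

clamped <- clamp_betas(betas)
```

Missing values stay missing, so any NA handling still has to be decided separately.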
Simulating data
ramr provides methods to create sets of random AMRs and
to generate biologically relevant methylation beta values using real
data sets as templates. The following code provides an example; however,
it is recommended to use real experimental data (e.g., GSE51032) to
create a test data set for assessing the performance of
ramr or other AMR/DMR search engines. The results of data
simulation are fully reproducible when the same seed has been set (at a
cost of serial random number generation).
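library(ramr)
data(ramr)  # loads ramr.data, ramr.samples, ramr.tp.unique, ramr.tp.nonunique
# set the seed if reproducible results are required (the value itself is arbitrary)
set.seed(1)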
# unique random AMRs
amrs.unique <-
simulateAMR(template.ranges=ramr.data, nsamples=25, regions.per.sample=2,
min.cpgs=5, merge.window=1000, dbeta=0.2)
# non-unique AMRs outside of regions with unique AMRs
amrs.nonunique <-
simulateAMR(template.ranges=ramr.data, nsamples=4, exclude.ranges=amrs.unique,
sample.names=sprintf("sample%02i", c(2, 4, 8, 16)),
samples.per.region=2, min.cpgs=5, merge.window=1000)
# random noise outside of AMR regions
noise <-
simulateAMR(ramr.data, nsamples=25, regions.per.sample=20,
exclude.ranges=c(amrs.unique, amrs.nonunique),
min.cpgs=1, max.cpgs=1, merge.window=1, dbeta=0.5)
# "smooth" methylation data without AMRs (negative control)
smooth.data <-
simulateData(template.ranges=ramr.data, nsamples=25)
# can we find them?
found <- getAMR(
  compute="beta+binom", compute.estimate="amle", compute.weights="logInvDist",
  combine.min.cpgs=5, combine.threshold=1e-2, combine.window=1000
)
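To illustrate the idea of template-based simulation and the role of the seed, here is a minimal base-R sketch. It is not ramr's implementation: it assumes that the values for one CpG can be drawn from a beta distribution whose parameters are estimated from a template row by the method of moments. The helper simulate_row and the template values are hypothetical.

```r
# Hypothetical helper (not part of ramr): draw n beta values for one CpG from a
# beta distribution fitted to the template values by the method of moments
simulate_row <- function(x, n) {
  m <- mean(x)
  v <- var(x)
  k <- m * (1 - m) / v - 1   # requires v < m * (1 - m)
  rbeta(n, shape1 = m * k, shape2 = (1 - m) * k)
}

# made-up template beta values for a single CpG across five samples
template <- c(0.80, 0.85, 0.87, 0.78, 0.83)

# the same seed gives the same simulated values
set.seed(42); a <- simulate_row(template, 10)
set.seed(42); b <- simulate_row(template, 10)
identical(a, b)
```

Without a fixed seed, repeated runs would give different values with a similar mean and spread.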
AMR identification
This code shows how to do a basic analysis with ramr
using the provided data files:
library(ramr)
data(ramr)
# identify AMRs
amrs <- getAMR(
  data.ranges=ramr.data, data.samples=ramr.samples, compute="beta+binom",
  combine.min.cpgs=5, combine.threshold=1e-2, combine.window=1000
)
The results of parallel AMR search are fully reproducible (do not depend on the seed).
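The combine.* arguments above suggest that significant CpGs are merged into regions. The base-R sketch below shows one plausible reading of these parameters, for intuition only: it assumes that combine.threshold is a p-value cutoff, combine.window is the maximum gap between neighbouring significant CpGs, and combine.min.cpgs is the minimum region size. It is not ramr's algorithm; combine_cpgs and the toy values are hypothetical, and positions are assumed to be sorted and to belong to one chromosome and one sample.

```r
# Hypothetical helper (not part of ramr): merge significant CpGs into regions
combine_cpgs <- function(pos, pval, threshold=1e-2, window=1000, min.cpgs=5) {
  keep <- which(pval < threshold)  # significant CpGs only
  if (length(keep) == 0)
    return(data.frame(start=integer(0), end=integer(0), ncpg=integer(0)))
  p <- pos[keep]
  # start a new group whenever the gap to the previous significant CpG exceeds the window
  group <- cumsum(c(TRUE, diff(p) > window))
  regions <- data.frame(start = as.integer(tapply(p, group, min)),
                        end   = as.integer(tapply(p, group, max)),
                        ncpg  = as.integer(tapply(p, group, length)))
  # keep only regions with enough CpGs
  regions[regions$ncpg >= min.cpgs, , drop=FALSE]
}

# toy positions and p-values for a single sample
pos  <- c(100, 300, 500, 700, 900, 5000, 5200, 9000)
pval <- c(1e-4, 1e-3, 5e-3, 1e-5, 2e-3, 1e-3, 0.5, 1e-6)
res  <- combine_cpgs(pos, pval)
```

In this toy example only the first five CpGs form a region; the isolated significant CpGs at 5000 and 9000 are dropped by the minimum size filter.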
AMR annotation and enrichment analysis
If necessary, AMRs can be annotated to known genomic elements using the R
library annotatr, or tested for potential enrichment in
epigenetic or other marks using the R library LOLA:
# annotating AMRs using R library annotatr
library(annotatr)
annotation.types <- c("hg19_cpg_inter", "hg19_cpg_islands", "hg19_cpg_shores",
"hg19_cpg_shelves", "hg19_genes_intergenic", "hg19_genes_promoters",
"hg19_genes_5UTRs", "hg19_genes_firstexons", "hg19_genes_3UTRs")
annotations <- build_annotations(genome='hg19', annotations=annotation.types)
Other information
Citing the ramr package
Oleksii Nikolaienko, Per Eystein Lønning, Stian Knappskog, ramr: an R/Bioconductor package for detection of rare aberrantly methylated regions, Bioinformatics, 2021, btab586, https://doi.org/10.1093/bioinformatics/btab586
The data underlying ramr
Replication Data for: “ramr: an R package for detection of rare aberrantly methylated regions”, https://doi.org/10.18710/ED8HSD
Session Info
