A brief guide to RRBS

What is RRBS?

Typically, RRBS samples are generated by digesting genomic DNA with the restriction endonuclease MspI, followed by end-repair, A-tailing, adapter ligation and finally bisulfite conversion. Often, the library is also size-selected for fragments between 40 and 220 bp in length. Fragments in this size range have been shown to be plentiful in the sample and to yield information on the vast majority of CpG islands (CGIs) in the human or mouse genome. Fig. 1 shows that quite a few MspI-MspI fragments (generated in silico for the mouse genome) are even shorter than 40 bp. Since size selection is less precise in practice than in theory, a sizeable number of fragments below 40 bp often end up in the RRBS library.

In silico MspI fragment-length distribution for the human genome
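The in silico digest described above can be approximated in a few lines. The following is a minimal sketch, not the procedure used to produce the figure: the function names `mspi_fragment_lengths` and `size_select` are hypothetical, the toy sequence is made up, and the fill-in of the sticky ends during end-repair is ignored. It only relies on the fact that MspI recognises CCGG and cuts after the first C (C^CGG).

```python
import re

# MspI recognises CCGG and cuts between the two cytosines (C^CGG).
MSPI_SITE = "CCGG"
CUT_OFFSET = 1  # cut position relative to the start of the site


def mspi_fragment_lengths(sequence):
    """Return the lengths of all fragments flanked by MspI cuts on both sides.

    The pieces between the sequence ends and the first/last cut are skipped,
    because they are not MspI-MspI fragments.
    """
    seq = sequence.upper()
    cuts = [m.start() + CUT_OFFSET for m in re.finditer(MSPI_SITE, seq)]
    return [end - start for start, end in zip(cuts, cuts[1:])]


def size_select(lengths, lo=40, hi=220):
    """Keep only fragment lengths inside an idealised size-selection window."""
    return [n for n in lengths if lo <= n <= hi]


# Toy sequence (illustrative only): three MspI sites, two MspI-MspI fragments.
toy = "AAAACCGG" + "T" * 50 + "CCGG" + "T" * 10 + "CCGGAAAA"
lengths = mspi_fragment_lengths(toy)
print(lengths)               # [54, 14]
print(size_select(lengths))  # [54]
```

In this toy case the 14 bp fragment falls below the 40 bp cut-off; in a real library, imperfect size selection means that some such short fragments would still be sequenced.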

The fairly small size of RRBS fragments can become a problem, especially for long sequencing reads (e.g. > 75 bp or > 100 bp). If the read is longer than the MspI-MspI fragment itself, the sequencing read may continue into the adapter sequence on the 3' end:

Adapter readthrough: read continues past the second MspI site into the adapter

Such read-through adapter contamination may result in a lower mapping efficiency if the read does not align at all, or it may lead to false alignments, which can result in incorrect methylation calls. As a simple rule, the longer the read length, the higher the proportion of reads with adapter contamination. If such adapter contamination is not spotted and removed appropriately, a longer read length will most likely result in a lower mapping efficiency!
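The "longer reads, more contamination" rule follows directly from comparing read length with fragment length. Here is a rough sketch under simplifying assumptions: the helper names and the list of fragment lengths are invented for illustration, and every fragment is assumed to be sequenced equally often.

```python
def adapter_bases(read_length, fragment_length):
    """Number of bases at the 3' end of a read that come from the adapter."""
    return max(0, read_length - fragment_length)


def fraction_with_readthrough(fragment_lengths, read_length):
    """Fraction of fragments shorter than the read, i.e. with adapter read-through."""
    if not fragment_lengths:
        return 0.0
    hits = sum(1 for n in fragment_lengths if n < read_length)
    return hits / len(fragment_lengths)


# Hypothetical fragment lengths (bp), chosen only to illustrate the trend.
fragments = [30, 45, 60, 90, 150, 200]

for read_length in (50, 100):
    frac = fraction_with_readthrough(fragments, read_length)
    print(f"{read_length} bp reads: {frac:.0%} of fragments read into the adapter")

print(adapter_bases(100, 60))  # 40 adapter bases on a 60 bp fragment
```

With these made-up numbers, doubling the read length from 50 to 100 bp doubles the share of fragments affected, which is why adapter trimming matters more as reads get longer.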

If the read is longer than the MspI fragment, it will also cover (and produce a methylation call for) a cytosine that was filled in with a predefined methylation state during the end-repair step. This is discussed further in Directional libraries and Non-directional & paired-end.

Single-end or paired-end?

It seems to be a common misconception that paired-end reads yield methylation results for both the forward and the reverse strand. In reality, a paired-end read results from PCR amplification of either the original top strand (OT) or the original bottom strand (OB). Thus, the other ends that are sequenced in the second round are sequences from the strand complementary to OT (CTOT) or complementary to OB (CTOB). These complementary strands are informative for the same strand as their partner reads. As a consequence, paired-end reads that overlap in the middle yield redundant methylation information for the same strand:

Paired-end overlap producing redundant methylation calls

Granted, the paired-end nature might result in a somewhat increased mapping efficiency of paired-end reads over single-end reads. However, in addition to reading into the adapter on the other side, paired-end reads face the additional problem of generating potentially redundant methylation information. Redundant methylation calls need to be discarded if positions are filtered for a certain coverage by independent reads, since regions of overlap for paired-end reads would otherwise be over-represented. In short, because of the redundant overlapping parts, paired-end RRBS reads are not simply 'twice as many reads, therefore twice as many methylation calls'. A single-end experiment with as many reads as both mates of a paired-end experiment added together is more likely to yield more genuine methylation information, as long as the read length is long enough to allow for a fair mapping efficiency for single-end reads (40–50 bp reads are probably long enough to get mapping efficiencies in the range of 60 to 70%).
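The redundancy argument can be made concrete with a simple geometric model. This is a sketch under stated assumptions rather than a description of how any tool counts calls: the function name `paired_end_coverage` is hypothetical, both mates are assumed to have the same length, adapter bases are assumed to be trimmed off perfectly, and mapping and quality effects are ignored.

```python
def paired_end_coverage(fragment_length, read_length):
    """Return (unique_bases, overlapping_bases) covered by one read pair.

    Each mate contributes at most `fragment_length` genuine bases; anything
    beyond that would be adapter sequence and is treated as trimmed.
    """
    usable = min(read_length, fragment_length)
    overlap = max(0, 2 * usable - fragment_length)
    unique = 2 * usable - overlap
    return unique, overlap


# Hypothetical fragment lengths sequenced with 2 x 100 bp reads.
for fragment_length in (60, 120, 250):
    unique, overlap = paired_end_coverage(fragment_length, 100)
    print(f"{fragment_length} bp fragment: {unique} unique bp, {overlap} bp covered twice")
```

Under this model a 60 bp fragment gives 60 unique bases from the pair, all of them covered by both mates, so the second mate adds no new positions. Only fragments longer than twice the read length (here, above 200 bp) avoid overlap altogether.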

In a nutshell

RRBS reads suffer disproportionately from problems associated with long read lengths because they are, compared to other -Seq applications, size-selected for rather short fragment sizes. The following sections cover further aspects to consider when analysing RRBS samples, and how Trim Galore handles read-length-related problems and experimentally introduced biases.

Directional libraries. OT / OB strands only.
Non-directional & paired-end. All four bisulfite strands.
QC measures for RRBS. What to check, and how Trim Galore helps.
