QC measures for RRBS

As for almost all high throughput sequencing applications, some quality control on the data is recommended — it often points directly to the next steps that need to be taken (e.g. with FastQC). As outlined in A brief guide to RRBS, Directional libraries, and Non-directional & paired-end, thorough QC and taking appropriate steps to remove problems is absolutely critical for proper analysis of RRBS libraries since they are susceptible to a variety of errors or biases that one could probably get away with in other sequencing applications. In summary, the examples discussed here were:

poor qualities. Affect mapping, may lead to incorrect methylation calls and/or mis-mapping.
adapter contamination. May lead to low mapping efficiencies, or, if mapped, may result in incorrect methylation calls and/or mis-mapping.
positions filled in during end-repair will infer the methylation state of the cytosine used for the fill-in reaction but not of the true genomic cytosine.
paired-end RRBS libraries (especially with long read length) yield redundant methylation information if the read pairs overlap.
RRBS libraries with long read lengths suffer more from all of the above due to the short size-selected fragment size.

How Trim Galore addresses each issue

Issue	What `trim_galore` does	Flag(s)
Poor 3' qualities	BWA-style Phred trimming from the 3' end	`-q` (default 20); `--2colour N` for 2-colour instruments
Adapter contamination	Auto-detected adapter, semi-global alignment, 1 bp default stringency	(auto) or `--illumina` / `--nextera` / `--small_rna` / `--stranded_illumina` / `--bgiseq`
End-repair fill-in (R1, 3' end)	2 bp clip on adapter-trimmed reads	`--rrbs`
End-repair fill-in (R2, 5' end, directional)	Auto 2 bp 5' clip on R2	`--rrbs` (paired-end, directional) auto-sets `--clip_R2 2`
CTOT/CTOB start bias (non-directional)	Strip first 2 bp if read starts with CAA/CGA	`--non_directional`
Read pairs that became too short	Drop pair (or rescue singletons)	`--length`, `--retain_unpaired`, `--length_1`, `--length_2`
2-colour no-signal G-runs	Sequence-based poly-G trim	`--poly_g` (auto-detected on 2-colour data)

Exploiting the filled-in position to determine the bisulfite conversion efficiency

If the end-repair was performed using unmethylated cytosines, this position can theoretically be exploited as a built-in bisulfite conversion efficiency control. To assess the bisulfite conversion efficiency one needs to identify sequences that contain the adapter on their 3' end, and count up the number of times the filled in position (marked in RED) was not converted to T.

Bisulfite conversion efficiency: counting filled-in C vs T at the read-end fill-in position

The percentage of non-bisulfite converted cytosines at the fill-in position can be calculated as:

% non-conversion = CX / (TY + CX) * 100

(with X and Y being the number of times these residues were observed).

To make this calculation more reliable one can vary the required overlaps with the adapter sequence, e.g. requiring 3, 5, 7 bp etc., before the non-conversion rate of the filled-in position is determined.

Recommended downstream tooling

After trimming with Trim Galore, the standard RRBS workflow is:

Alignment with Bismark.
Methylation extraction with bismark_methylation_extractor.
Deduplication as appropriate (e.g. UmiBam for UMI libraries).
Reporting with MultiQC, which natively parses the Trim Galore text and JSON reports.