Skip to content

Concordance across reads

Methylation is typically thought to be laid down in longer stretches rather than appearing sporadically at individual cytosines. As a consequence, several CpGs within a given read are often found either completely methylated or completely unmethylated. A mixed methylation pattern may be used to assess the concordance of the methylation signal, which could be a proxy of the fidelity of maintenance methylation etc.

The script methylation_consistency takes in a BAM file generated by Bismark and splits it into three smaller BAM files, based on the consistency/concordance of the methylation calls within the file. The reads are split into those which show consistent methylation, un-methylation and mixed methylation, and only methylation in CpG context is considered. In its default condition, reads are considered as:

unmethylated  0 -  10% methylated (inclusive)
mixed:       10 -  90% methylated
methylated:  90 - 100% methylated (inclusive)

Methylation Concordance Plot

A consistency report looks roughly like this:

Total paired-end records     -  478498
-------------------------------------------------
All methylated    [ >= 90% ] -  10304 (2.15%)
All unmethylated  [ <= 10% ] -  31806 (6.65%)
Mixed methylation [ 10-90% ] -  30176 (6.31%)
Too few CpGs   [min-count 5] -  406212 (84.89%)

By default, only reads with 5 or more CpGs are taken into consideration (this can be changed via the flag --min-count). The thresholds for un-methylated, mixed methylated, and methylated can be set with --lower_threshold [percentage] and --upper_threshold [percentage]. methylation_consistency works for both single-end and paired-end Bismark BAM files, which are auto-detected by default. For paired-end files, all CpGs of both R1 and R2 are taken into account. Overlap detection/exclusion can take place afterwards at the methylation extraction stage, if desired.