SNPsplit

Full list of options for SNPsplit


USAGE: SNPsplit [options] --snp_file <SNP.file.gz> [input file(s)]

Input file(s)

Mapping output file in SAM or BAM format. SAM files (ending in .sam) will first be converted to BAM files.
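SNPsplit performs this SAM-to-BAM conversion itself. The sketch below only illustrates what an equivalent manual conversion might look like, assuming Samtools is installed. The function name `sam_to_bam_command` and the output naming (replacing `.sam` with `.bam`) are hypothetical and not necessarily how SNPsplit names its intermediate file.

```python
def sam_to_bam_command(sam_file, samtools="samtools"):
    """Build a 'samtools view' command that converts a .sam file to .bam.

    The BAM name is derived from the SAM name (hypothetical naming scheme).
    """
    if not sam_file.endswith(".sam"):
        raise ValueError(f"expected a .sam file, got {sam_file!r}")
    bam_file = sam_file[: -len(".sam")] + ".bam"
    # -b: write BAM output; -o: name of the output file
    return [samtools, "view", "-b", "-o", bam_file, sam_file], bam_file

cmd, bam = sam_to_bam_command("sample.sam")
print(" ".join(cmd))  # samtools view -b -o sample.bam sample.sam
```

To run the conversion, the command list can be passed to `subprocess.run(cmd, check=True)`.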

--snp_file

Mandatory file specifying the SNP positions to be considered; it may be a plain-text or a gzip-compressed file. Currently, the SNP file is expected to be in the following format:

   SNP-ID     Chromosome   Position   Strand   Ref/SNP
   33941939   9            68878541   1        T/G

Only the information in the fields 'Chromosome', 'Position' and 'Ref/SNP' is used for the analysis. The genome carrying the 'Ref' base will be used as genome 1, and the genome carrying the 'SNP' base as genome 2.
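The following Python sketch shows one way to read the SNP file format described above, keeping only the 'Chromosome', 'Position' and 'Ref/SNP' columns. It is an illustration, not SNPsplit's own parser: the function names are hypothetical, detecting gzip compression from the `\x1f\x8b` magic bytes is an assumption, and skipping every line whose Position field is not numeric is simply a convenient way to drop the header row.

```python
import gzip
import io

def open_snp_file(path):
    """Open a SNP file as text, whether plain or gzip-compressed.

    Compression is detected from the gzip magic bytes, not the file suffix.
    """
    with open(path, "rb") as fh:
        is_gzip = fh.read(2) == b"\x1f\x8b"
    return gzip.open(path, "rt") if is_gzip else open(path, "r")

def parse_snp_lines(lines):
    """Yield (chromosome, position, ref_base, snp_base) for each SNP line.

    SNP-ID and Strand are ignored; lines whose Position column is not a
    number (such as the header) are skipped.
    """
    for line in lines:
        fields = line.split()
        if len(fields) < 5 or not fields[2].isdigit():
            continue
        ref, snp = fields[4].split("/")  # 'Ref' -> genome 1, 'SNP' -> genome 2
        yield fields[1], int(fields[2]), ref, snp

example = io.StringIO(
    "SNP-ID\tChromosome\tPosition\tStrand\tRef/SNP\n"
    "33941939\t9\t68878541\t1\tT/G\n"
)
snps = {(chrom, pos): (ref, snp) for chrom, pos, ref, snp in parse_snp_lines(example)}
print(snps)  # {('9', 68878541): ('T', 'G')}
```

For a real file, `parse_snp_lines` can iterate over `open_snp_file(path)` inside a `with` block instead of the in-memory example.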

--single_end

Manually sets the data to single-end and skips AUTO-DETECT.

--paired

Paired-end mode. (Default: AUTO-DETECT)

-o/--outdir <dir>

Write all output files into this directory. By default the output files will be written into the same folder as the input file(s). If the specified folder does not exist, SNPsplit will attempt to create it first. The path to the output folder can be either relative or absolute.
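A minimal sketch of how the --outdir rules above could be implemented with Python's `pathlib`. The helper name `resolve_outdir` is hypothetical, and resolving relative paths against the current working directory and creating missing parent folders (`parents=True`) are assumptions, not documented SNPsplit details.

```python
from pathlib import Path

def resolve_outdir(input_file, outdir=None):
    """Return the directory that output files should be written to.

    Without --outdir, output goes next to the input file. A given directory
    may be relative (resolved against the current working directory) or
    absolute, and is created if it does not exist yet.
    """
    if outdir is None:
        return Path(input_file).resolve().parent
    path = Path(outdir).resolve()
    path.mkdir(parents=True, exist_ok=True)  # no-op if it already exists
    return path

print(resolve_outdir("data/sample.bam"))  # absolute path of the 'data' folder
```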

--singletons

If the allele-tagged paired-end file also contains singleton alignments (which is the default for e.g. TopHat), these will be written out to extra files (ending in _st.bam) instead of writing everything to combined paired-end and singleton files. Default: OFF.
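To illustrate how singleton alignments might be told apart from properly paired ones, the sketch below inspects SAM flags. The rule used here (a record is a singleton if it lacks the paired flag 0x1 or has the mate-unmapped flag 0x8 set) is an assumption for illustration; SNPsplit's own criterion is not described in this section, and the helper names are hypothetical.

```python
FLAG_PAIRED = 0x1         # SAM flag: read is part of a pair
FLAG_MATE_UNMAPPED = 0x8  # SAM flag: the mate is unmapped

def is_singleton(flag):
    """Assumption: a record is a singleton if it is not flagged as paired,
    or if its mate is unmapped."""
    return not (flag & FLAG_PAIRED) or bool(flag & FLAG_MATE_UNMAPPED)

def split_by_singleton(records):
    """Route (read_name, flag) records into paired and singleton lists."""
    paired, singletons = [], []
    for qname, flag in records:
        (singletons if is_singleton(flag) else paired).append(qname)
    return paired, singletons

records = [("r1", 99), ("r1", 147), ("r2", 73), ("r3", 0)]
print(split_by_singleton(records))  # (['r1', 'r1'], ['r2', 'r3'])
```

With --singletons, records in the second list would be the ones written to the extra `_st.bam` files.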

--no_sort

This option skips the sorting step if BAM files are already sorted by read name (e.g. Hi-C files generated by HiCUP). Please note that setting --no_sort for paired-end files that are not sorted by read name will break the tagging process!
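Because --no_sort relies on Read 1 and Read 2 of each pair already being adjacent, a quick pre-check may be useful. The hypothetical helper below tests whether a list of read names (in file order) comes in consecutive same-name pairs. It compares names as-is, so inputs whose names carry /1 and /2 suffixes would need those stripped first.

```python
def pairs_are_adjacent(qnames):
    """Return True if records come in consecutive pairs sharing a read name,
    which is the layout --no_sort assumes for paired-end input."""
    if len(qnames) % 2:
        return False  # an odd number of records cannot form complete pairs
    return all(qnames[i] == qnames[i + 1] for i in range(0, len(qnames), 2))

print(pairs_are_adjacent(["a", "a", "b", "b"]))  # True
print(pairs_are_adjacent(["a", "b", "a", "b"]))  # False
```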

--hic

Assumes Hi-C data processed with HiCUP as input, i.e. the input BAM file is paired-end and Read 1 and Read 2 of each pair follow each other. Thus, this option also sets the flags --paired and --no_sort. Default: OFF.
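The flag implication described for --hic can be sketched with Python's `argparse`. This illustrates the option logic only and is not SNPsplit's actual command-line parser; the program name `snpsplit-sketch` is hypothetical.

```python
import argparse

def parse_args(argv):
    parser = argparse.ArgumentParser(prog="snpsplit-sketch")
    parser.add_argument("--paired", action="store_true")
    parser.add_argument("--no_sort", action="store_true")
    parser.add_argument("--hic", action="store_true")
    args = parser.parse_args(argv)
    if args.hic:
        # HiCUP output is paired-end with Read 1 and Read 2 adjacent,
        # so --hic switches on both --paired and --no_sort.
        args.paired = True
        args.no_sort = True
    return args

args = parse_args(["--hic"])
print(args.paired, args.no_sort)  # True True
```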

--bisulfite

Assumes Bisulfite-Seq data processed with Bismark as input. In paired-end mode (--paired), Read 1 and Read 2 of a pair are expected to follow each other in consecutive lines. SNPsplit will run a quick check at the start of a run to see whether the provided file appears to be a Bismark file, and will set the flags --bisulfite and/or --paired automatically. In addition, it will perform a quick check to see whether a paired-end file appears to have been positionally sorted, and if not, will set the flag --no_sort.
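A rough sketch of the kind of auto-detection described above, operating on SAM text lines. The heuristics are assumptions, not SNPsplit's documented implementation: Bismark output is recognised by an `XM:Z:` methylation-call tag on every sampled record, paired-end data by SAM flag 0x1, and "positionally sorted" means positions never decrease while the reference name stays the same. All function names, including the example helper `sam_line`, are hypothetical.

```python
def _first_records(sam_lines, n):
    """Return up to n alignment lines split into fields, skipping @ headers."""
    records = []
    for line in sam_lines:
        if line.startswith("@"):
            continue
        records.append(line.rstrip("\n").split("\t"))
        if len(records) >= n:
            break
    return records

def _position_sorted(records):
    """True if positions never decrease while the reference name is unchanged."""
    prev_chrom, prev_pos = None, -1
    for fields in records:
        chrom, pos = fields[2], int(fields[3])
        if chrom == prev_chrom and pos < prev_pos:
            return False
        prev_chrom, prev_pos = chrom, pos
    return True

def auto_detect(sam_lines, n=1000):
    records = _first_records(sam_lines, n)
    # Assumption: Bismark writes an XM:Z methylation call string on each alignment.
    bisulfite = bool(records) and all(
        any(tag.startswith("XM:Z:") for tag in fields[11:]) for fields in records
    )
    # SAM flag 0x1: the read is part of a pair.
    paired = bool(records) and all(int(fields[1]) & 0x1 for fields in records)
    # Paired data that is not position-sorted is assumed to still have
    # Read 1 and Read 2 on consecutive lines, so sorting can be skipped.
    no_sort = paired and not _position_sorted(records)
    return {"bisulfite": bisulfite, "paired": paired, "no_sort": no_sort}

def sam_line(name, flag, chrom, pos):
    # Minimal SAM record: 11 mandatory fields plus a Bismark-style XM tag.
    return "\t".join([name, str(flag), chrom, str(pos), "42", "4M", "=",
                      "0", "0", "ACGT", "IIII", "XM:Z:...."])

sam = ["@HD\tVN:1.0\n",
       sam_line("r1", 99, "chr1", 200), sam_line("r1", 147, "chr1", 300),
       sam_line("r2", 99, "chr1", 100), sam_line("r2", 147, "chr1", 150)]
print(auto_detect(sam))  # {'bisulfite': True, 'paired': True, 'no_sort': True}
```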

--samtools_path

The path to your Samtools installation, e.g. /home/user/samtools/. Does not need to be specified explicitly if Samtools is in the PATH environment already.

SNPsplit-sort specific options (tag2sort):

--sam

The output will be written out in SAM format instead of BAM (default). SNPsplit will attempt to use the path to Samtools that was specified with --samtools_path, or, if it hasn't been specified, attempt to find Samtools in the PATH environment. If no installation of Samtools can be found, the SAM output will be compressed with GZIP instead (yielding a .sam.gz output file).
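The Samtools lookup and the GZIP fallback described for --samtools_path and --sam can be sketched as follows. The helper names are hypothetical; treating --samtools_path as a directory containing a `samtools` executable follows the example path given for that option, and the filename choice mirrors the documented fallback from `.sam` to `.sam.gz`.

```python
import os
import shutil

def find_samtools(samtools_path=None):
    """Return a samtools executable path, or None if none can be found.

    If a directory is given (as with --samtools_path), look for 'samtools'
    inside it; otherwise search the PATH environment.
    """
    if samtools_path:
        candidate = os.path.join(samtools_path, "samtools")
        return candidate if os.access(candidate, os.X_OK) else None
    return shutil.which("samtools")

def sam_output_name(stem, samtools):
    """Plain .sam output if Samtools is available, else GZIP-compressed .sam.gz."""
    return f"{stem}.sam" if samtools else f"{stem}.sam.gz"

print(sam_output_name("sample", find_samtools()))  # depends on the local install
```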

--skip_tag2sort

Carry out the allele-tagging process, and exit afterwards. This might be desirable when using SNPsplit in pipelining systems, such as Nextflow, when a deduplication step is to be added following allele tagging.

--conflicting/--weird

Reads or read pairs that were classified as 'Conflicting' (XX:Z:CF) will be written to an extra file (ending in .conflicting.bam) instead of being simply skipped. A read may be classified as 'Conflicting' if it contains SNP information for both genomes at the same time, or if the SNP position was deleted from the read. Read pairs are considered 'Conflicting' if either read was tagged XX:Z:CF. Default: OFF.
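The classification rules above can be expressed as a small Python sketch. Only the 'CF' rules come from this section; the other tag values (assumed here to be G1, G2 and UA for genome 1, genome 2 and unassigned) and the choice to treat a G1 read paired with a G2 read as conflicting are assumptions made for illustration. The per-read inputs (`g1_hits`, `g2_hits`, `snp_deleted`) are hypothetical summaries of a read's SNP evidence.

```python
def classify_read(g1_hits, g2_hits, snp_deleted=False):
    """Assign an allele tag to one read from its SNP evidence.

    g1_hits / g2_hits: number of SNP positions where the read carries the
    genome 1 ('Ref') or genome 2 ('SNP') base.
    """
    if snp_deleted or (g1_hits and g2_hits):
        return "CF"  # conflicting: SNP deleted, or evidence for both genomes
    if g1_hits:
        return "G1"
    if g2_hits:
        return "G2"
    return "UA"  # assumed tag for reads without informative SNPs

def classify_pair(tag1, tag2):
    if "CF" in (tag1, tag2):
        return "CF"  # documented rule: either read CF -> pair CF
    assigned = {t for t in (tag1, tag2) if t != "UA"}
    if len(assigned) == 2:
        return "CF"  # assumption: G1 + G2 within one pair is conflicting
    return assigned.pop() if assigned else "UA"

print(classify_read(2, 0), classify_read(1, 1), classify_pair("G1", "UA"))  # G1 CF G1
```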

--help

Displays this help information and exits.

--verbose

(Very!) verbose output (for debugging).

--version

Displays version information and exits.