Utilities

mpileupCounts

mpileupCounts uses samtools to access input BAM and calculates statistics for reads pileup at each position in the specified region, returns counts in RCK format.

Arguments

    usage: granite mpileupCounts [-h] -i INPUTFILE -o OUTPUTFILE -r REFERENCE
                                 [--region REGION] [--MQthr MQTHR] [--BQthr BQTHR]

    optional arguments:
      -i INPUTFILE, --inputfile INPUTFILE
                            input file in BAM format
      -o OUTPUTFILE, --outputfile OUTPUTFILE
                            output file to write results as RCK format (TSV), use
                            .rck as extension
      -r REFERENCE, --reference REFERENCE
                            reference file in FASTA format
      --region REGION       region to be analyzed [e.g. chr1:1-10000000,
                            1:1-10000000, chr1, 1], chromosome name must match the
                            reference
      --MQthr MQTHR         minimum mapping quality for an alignment to be used
                            (>=) [0]
      --BQthr BQTHR         minimum base quality for a base to be considered (>=)
                            [13]

toBig

toBig converts counts from bgzip and tabix indexed RCK format into BIG format. Positions are “called” by read counts or allelic balance for single or multiple files (joint calls) in specified regions. Positions “called” are set to True (or 1) in BIG binary structure.

Arguments

    usage: granite toBig [-h] [-i INPUTFILE [INPUTFILE ...]] -o OUTPUTFILE -r
                         REGIONFILE -f CHROMFILE [--ncores NCORES] --fithr FITHR
                         [--rdthr RDTHR] [--abthr ABTHR]

    optional arguments:
      -f FILE, --file FILE  file to be used to call positions. To do joint calling
                            specify multiple files as: "-f file_1 -f file_2 -f ...".
                            Expected bgzip and tabix indexed RCK file
      -o OUTPUTFILE, --outputfile OUTPUTFILE
                            output file to write results as BIG format (binary
                            hdf5), use .big as extension
      -r REGIONFILE, --regionfile REGIONFILE
                            file containing regions to be used [e.g.
                            chr1:1-10000000, 1:1-10000000, chr1, 1] listed as a
                            column, chromosomes names must match the reference
      -c CHROMFILE, --chromfile CHROMFILE
                            chrom.sizes file containing chromosomes size
                            information
      --ncores NCORES       number of cores to be used if multiple regions are
                            specified [1]
      --fithr FITHR         minimum number of files with at least "--rdthr" for
                            the alternate allele or having the variant, "calls" by
                            allelic balance, to jointly "call" position (>=)
      --rdthr RDTHR         minimum number of alternate reads to count the file in
                            "--fithr", if not specified "calls" are made by
                            allelic balance (>=)
      --abthr ABTHR         minimum percentage of alternate reads compared to
                            reference reads to count the file in "--fithr" when
                            "calling" by allelic balance (>=) [15]

Examples

toBig can be used to calculate positions to blacklist for common variants by using unrelated samples. This command will set to True in BIG structure positions with allelic balance for alternate allele equal/higher than --abthr in more that --fithr samples (joint calling).

granite toBig -f file -f file -f file -f file -f ... -o file.out.big -c file.chrom.sizes -r file.regions --fithr <int> --abthr <int>

Absolute reads count can be used instead of allelic balance to call positions. This command will set to True in BIG structure positions with reads count for alternate allele equal/higher than --rdthr in more that --fithr samples (joint calling).

granite toBig -f file -f file -f file -f file -f ... -o file.out.big -c file.chrom.sizes -r file.regions --fithr <int> --rdthr <int>

rckTar

rckTar creates a tar archive from bgzip and tabix indexed RCK files. Creates an index file for the archive.

Arguments

    usage: granite rckTar [-h] -t TTAR -f FILE

    optional arguments:
      -t TTAR, --ttar TTAR  target tar to write results, use .rck.tar as extension
      -f FILE, --file FILE  file to be archived. Specify multiple files as: "-f
                            SampleID_1.rck.gz -f SampleID_2.rck.gz -f ...". Files
                            order is maintained while creating the index