Variant Filtering

whiteList

whiteList allows to select and filter-in a subset of variants from input VCF file based on specified annotations and positions. The software can use provided VEP, ClinVar or SpliceAI annotations. Positions can be also specfied as a BED format file.

Arguments

    usage: granite whiteList [-h] -i INPUTFILE -o OUTPUTFILE [--SpliceAI SPLICEAI]
                             [--SpliceAItag SPLICEAITAG] [--CLINVAR]
                             [--CLINVARonly CLINVARONLY [CLINVARONLY ...]]
                             [--CLINVARtag CLINVARTAG] [--VEP] [--VEPtag VEPTAG]
                             [--VEPrescue VEPRESCUE [VEPRESCUE ...]]
                             [--VEPremove VEPREMOVE [VEPREMOVE ...]]
                             [--VEPsep VEPSEP] [--BEDfile BEDFILE]

    optional arguments:
      -i INPUTFILE, --inputfile INPUTFILE
                            input VCF file
      -o OUTPUTFILE, --outputfile OUTPUTFILE
                            output file to write results as VCF, use .vcf as
                            extension
      --SpliceAI SPLICEAI   threshold to whitelist variants by SpliceAI delta
                            scores value (>=)
      --SpliceAItag SPLICEAITAG
                            by default the program will search for SpliceAI delta
                            scores (DS_AG, DS_AL, DS_DG, DS_DL) to calculate the
                            max delta score for the variant. If a max value is
                            already defined, use this parameter to specify the TAG
                            | TAG field to be used
      --CLINVAR             flag to whitelist all variants with a ClinVar entry
                            [ALLELEID]
      --CLINVARonly CLINVARONLY [CLINVARONLY ...]
                            ClinVar "CLNSIG" terms or keywords to be saved. Sets
                            for whitelist only ClinVar variants with specified
                            terms or keywords
      --CLINVARtag CLINVARTAG
                            by default the program will search for ClinVar
                            "ALLELEID" TAG, use this parameter to specify a
                            different TAG to be used
      --VEP                 use VEP "Consequence" annotations to whitelist exonic
                            and relevant variants (removed by default variants in
                            intronic, intergenic, or regulatory regions)
      --VEPtag VEPTAG       by default the program will search for "CSQ" TAG
                            (CSQ=<values>), use this parameter to specify a
                            different TAG to be used (e.g. VEP)
      --VEPrescue VEPRESCUE [VEPRESCUE ...]
                            additional terms to overrule removed flags to rescue
                            and whitelist variants
      --VEPremove VEPREMOVE [VEPREMOVE ...]
                            additional terms to be removed
      --VEPsep VEPSEP       by default the program expects "&" as separator for
                            subfields in VEP (e.g.
                            intron_variant&splice_region_variant), use this
                            parameter to specify a different separator to be used
      --BEDfile BEDFILE     BED format file with positions to whitelist

Examples

Whitelists variants with ClinVar entry. If available, ClinVar annotation must be provided in INFO column.

granite whiteList -i file.vcf -o file.out.vcf --CLINVAR

Whitelists only “Pathogenic” and “Likely_pathogenic” variants with ClinVar entry. ClinVar “CLNSIG” annotation must be provided in INFO column.

granite whiteList -i file.vcf -o file.out.vcf --CLINVAR --CLINVARonly Pathogenic

Whitelists variants based on SpliceAI annotations. This filters in variants with SpliceAI score equal/higher than --SpliceAI. If available SpliceAI annotation must be provided in INFO column.

granite whiteList -i file.vcf -o file.out.vcf --SpliceAI <float>

Whitelists variants based on VEP “Consequence” annotations. This withelists exonic and functional relevant variants by removing variants flagged as “intron_variant”, “intergenic_variant”, “downstream_gene_variant”, “upstream_gene_variant”, “regulatory_region_”, “non_coding_transcript_”. It is possible to specify additional terms to remove using --VEPremove and terms to rescue using --VEPrescue. To use VEP, annotation must be provided for each variant in INFO column.

granite whiteList -i file.vcf -o file.out.vcf --VEP
granite whiteList -i file.vcf -o file.out.vcf --VEP --VEPremove <str> <str>
granite whiteList -i file.vcf -o file.out.vcf --VEP --VEPrescue <str> <str>
granite whiteList -i file.vcf -o file.out.vcf --VEP --VEPrescue <str> <str> --VEPremove <str>

Whitelists variants based on positions specified as a BED format file.

granite whiteList -i file.vcf -o file.out.vcf --BEDfile file.bed

Combine the above filters.

granite whiteList -i file.vcf -o file.out.vcf --BEDfile file.bed --VEP --VEPrescue <str> <str> --CLINVAR --SpliceAI <float>

blackList

blackList allows to filter-out variants from input VCF file based on positions set in BIG format file and/or provided population allele frequency.

Arguments

    usage: granite blackList [-h] -i INPUTFILE -o OUTPUTFILE [-b BIGFILE]
                             [--aftag AFTAG] [--afthr AFTHR]

    optional arguments:
      -i INPUTFILE, --inputfile INPUTFILE
                            input VCF file
      -o OUTPUTFILE, --outputfile OUTPUTFILE
                            output file to write results as VCF, use .vcf as
                            extension
      -b BIGFILE, --bigfile BIGFILE
                            BIG format file with positions set for blacklist
      --aftag AFTAG         TAG (TAG=<float>) or TAG field to be used to filter by
                            population allele frequency
      --afthr AFTHR         threshold to filter by population allele frequency
                            (<=) [1]

Examples

Blacklist variants based on position set to True in BIG format file.

granite blackList -i file.vcf -o file.out.vcf -b file.big

Blacklist variants based on population allele frequency. This filters out variants with allele frequency higher than --afthr. Allele frequency must be provided for each variant in INFO column.

granite blackList -i file.vcf -o file.out.vcf --afthr <float> --aftag tag

Combine the two filters.

granite blackList -i file.vcf -o file.out.vcf --afthr <float> --aftag tag -b file.big

cleanVCF

cleanVCF allows to clean INFO field of input VCF file. The software can remove a list of TAG from INFO field, or can be used to clean VEP annotations.

Arguments

    usage: granite cleanVCF [-h] -i INPUTFILE -o OUTPUTFILE [-t TAG] [--VEP]
                            [--VEPtag VEPTAG]
                            [--VEPrescue VEPRESCUE [VEPRESCUE ...]]
                            [--VEPremove VEPREMOVE [VEPREMOVE ...]]
                            [--VEPsep VEPSEP] [--SpliceAI SPLICEAI]
                            [--SpliceAItag SPLICEAITAG]

    optional arguments:
      -i INPUTFILE, --inputfile INPUTFILE
                            input VCF file
      -o OUTPUTFILE, --outputfile OUTPUTFILE
                            output file to write results as VCF, use .vcf as
                            extension
      -t TAG, --tag TAG     TAG to be removed from INFO field. Specify multiple
                            TAGs as: "-t TAG -t TAG -t ..."
      --VEP                 clean VEP "Consequence" annotations (removed by
                            default terms for intronic, intergenic, or regulatory
                            regions from annotations)
      --VEPtag VEPTAG       by default the program will search for "CSQ" TAG
                            (CSQ=<values>), use this parameter to specify a
                            different TAG to be used (e.g. VEP)
      --VEPrescue VEPRESCUE [VEPRESCUE ...]
                            additional terms to overrule removed flags to rescue
                            annotations
      --VEPremove VEPREMOVE [VEPREMOVE ...]
                            additional terms to be removed from annotations
      --VEPsep VEPSEP       by default the program expects "&" as separator for
                            subfields in VEP (e.g.
                            intron_variant&splice_region_variant), use this
                            parameter to specify a different separator to be used
      --SpliceAI SPLICEAI   threshold to save intronic annotations, from VEP
                            "Consequence", for variants by SpliceAI delta scores
                            value (>=)
      --SpliceAItag SPLICEAITAG
                            by default the program will search for SpliceAI delta
                            scores (DS_AG, DS_AL, DS_DG, DS_DL) to calculate the
                            max delta score for the variant. If a max value is
                            already defined, use this parameter to specify the TAG
                            | TAG field to be used

Examples

Remove tag from INFO field.

granite cleanVCF -i file.vcf -o file.out.vcf -t tag

Clean VEP based on VEP “Consequence” annotations. This removes annotations flagged as “intron_variant”, “intergenic_variant”, “downstream_gene_variant”, “upstream_gene_variant”, “regulatory_region_”, “non_coding_transcript_”. It is possible to specify additional terms to remove using --VEPremove and terms to rescue using --VEPrescue. VEP annotation must be provided for each variant in INFO column.

granite cleanVCF -i file.vcf -o file.out.vcf --VEP
granite cleanVCF -i file.vcf -o file.out.vcf --VEP --VEPremove <str> <str>
granite cleanVCF -i file.vcf -o file.out.vcf --VEP --VEPrescue <str> <str>
granite cleanVCF -i file.vcf -o file.out.vcf --VEP --VEPrescue <str> <str> --VEPremove <str>

The program also accepts a SpliceAI threshold that will rescue annotations for “intron_variant” by SpliceAI. SpliceAI annotation must be provided in INFO column.

granite cleanVCF -i file.vcf -o file.out.vcf --VEP --SpliceAI <float>

Combine the above filters.

granite cleanVCF -i file.vcf -o file.out.vcf -t tag --VEP --VEPrescue <str> <str> --SpliceAI <float>

geneList

geneList allows to filter VEP annotations from input VCF file using a list of genes. If a transcript is not mapping to any of the genes in the list, the transcript is removed from VEP annotation in INFO field. If all transcripts are removed, the VEP tag is removed from INFO field for the variant.

Arguments

    usage: granite geneList [-h] -i INPUTFILE -o OUTPUTFILE -g GENESLIST
                            [--VEPtag VEPTAG]

    optional arguments:
      -i INPUTFILE, --inputfile INPUTFILE
                            input VCF file
      -o OUTPUTFILE, --outputfile OUTPUTFILE
                            output file to write results as VCF, use .vcf as
                            extension
      -g GENESLIST, --geneslist GENESLIST
                            text file listing ensembl gene (ENSG) IDs for all
                            genes to save annotations for, IDs must be listed as a
                            column
      --VEPtag VEPTAG       by default the program will search for "CSQ" TAG
                            (CSQ=<values>), use this parameter to specify a
                            different TAG to be used (e.g. VEP)