Inheritance Mode Callers
novoCaller
novoCaller is a Bayesian calling algorithm for de novo mutations. The model uses read-level information from both the pedigree (trio) and unrelated samples to rank each call and assign it a probability. The software is an updated and improved implementation of the original algorithm described in Mohanty et al. 2019.
Arguments
usage: granite novoCaller [-h] -i INPUTFILE -o OUTPUTFILE -u UNRELATEDFILES -t
TRIOFILES [--ppthr PPTHR] [--afthr AFTHR]
[--aftag AFTAG] [--bam] [--MQthr MQTHR]
[--BQthr BQTHR] [--ADthr ADTHR]
optional arguments:
-i INPUTFILE, --inputfile INPUTFILE
input VCF file
-o OUTPUTFILE, --outputfile OUTPUTFILE
output file to write results as VCF, use .vcf as
extension
-u UNRELATEDFILES, --unrelatedfiles UNRELATEDFILES
TSV index file containing SampleID<TAB>Path/to/file
for unrelated files used to train the model (BAM or
bgzip and tabix indexed RCK)
-t TRIOFILES, --triofiles TRIOFILES
TSV index file containing SampleID<TAB>Path/to/file
for family files, the PROBAND must be listed as FIRST
(BAM or bgzip and tabix indexed RCK)
--ppthr PPTHR threshold to filter by posterior probability for de
novo calls (>=) [0]
--afthr AFTHR threshold to filter by population allele frequency
(<=) [1]
--aftag AFTAG TAG (TAG=<float>) or TAG field to be used to filter by
population allele frequency
--bam by default the program expects bgzip and tabix indexed
RCK files for "--triofiles" and "--unrelatedfiles",
add this flag if files are in BAM format instead
(SLOWER)
--MQthr MQTHR (only with "--bam") minimum mapping quality for an
alignment to be used (>=) [0]
--BQthr BQTHR (only with "--bam") minimum base quality for a base to
be considered (>=) [0]
--ADthr ADTHR threshold to filter by alternate allele depth in
parents. This will ignore and set to "0" the posterior
probability for variants with a number of alternate
reads in parents higher than specified value
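Both --triofiles and --unrelatedfiles take a two-column index file (SampleID<TAB>Path/to/file). The Python sketch below reads such a file while preserving row order, since the proband must come first in the trio index. The function name, the blank-line handling, and the sample IDs and paths are hypothetical; this is not granite's own parser.

```python
import csv
import io


def read_index(handle):
    """Parse a SampleID<TAB>Path/to/file index into an ordered list of (id, path).

    Row order is kept because, for --triofiles, the proband must be listed first.
    """
    entries = []
    for lineno, row in enumerate(csv.reader(handle, delimiter="\t"), start=1):
        if not row:
            continue  # assumption: blank lines are simply skipped
        if len(row) != 2:
            raise ValueError(f"line {lineno}: expected 2 tab-separated columns, got {len(row)}")
        entries.append((row[0].strip(), row[1].strip()))
    return entries


# Hypothetical trio index: proband first, then the two parents.
trio_tsv = "PROBAND_ID\tproband.rck.gz\nMOTHER_ID\tmother.rck.gz\nFATHER_ID\tfather.rck.gz\n"
trio = read_index(io.StringIO(trio_tsv))
proband_id, proband_path = trio[0]
print(proband_id, proband_path)  # PROBAND_ID proband.rck.gz
```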
Input
novoCaller accepts files in VCF format as input. In addition to the standard VCF columns, files must contain genotype information for the trio. Column IDs for the trio must match the sample IDs provided together with the list of RCK/BAM files (--triofiles).
Required VCF format structure:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT PROBAND_ID MOTHER_ID FATHER_ID ...
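Because the sample column IDs must match the IDs in the --triofiles index, a quick pre-flight check can catch mismatches before a long run. The Python sketch below is a hypothetical helper, not part of granite; note that real VCF header lines are tab-separated, unlike the space-separated layout shown above.

```python
VCF_FIXED = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT"]


def trio_columns(header_line, trio_ids):
    """Map each trio sample ID to its 0-based column index in a #CHROM header line."""
    columns = header_line.rstrip("\n").split("\t")
    if columns[:9] != VCF_FIXED:
        raise ValueError("expected a tab-separated #CHROM header with a FORMAT column")
    samples = columns[9:]
    missing = [s for s in trio_ids if s not in samples]
    if missing:
        raise ValueError(f"trio samples missing from VCF header: {missing}")
    return {s: 9 + samples.index(s) for s in trio_ids}


header = "\t".join(VCF_FIXED + ["PROBAND_ID", "MOTHER_ID", "FATHER_ID"])
print(trio_columns(header, ["PROBAND_ID", "MOTHER_ID", "FATHER_ID"]))
# {'PROBAND_ID': 9, 'MOTHER_ID': 10, 'FATHER_ID': 11}
```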
Output
novoCaller generates output in VCF format. Two new tags report additional information for each call: RSTR stores read counts by strand at the position for the reference and alternate alleles, and novoPP stores the posterior probability calculated for the call. Variants are sorted by posterior probability in descending order.
RSTR tag definition (FORMAT):
##FORMAT=<ID=RSTR,Number=4,Type=Integer,Description="Read counts by strand for ref and alt alleles (Rf,Af,Rr,Ar)">
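RSTR values are stored per sample in the order Rf,Af,Rr,Ar (forward and reverse counts for the reference and alternate alleles). A minimal Python sketch that pulls RSTR out of one sample column and totals the counts per allele; the helper names and the example GT:RSTR column are hypothetical.

```python
def format_value(format_str, sample_str, key):
    """Return the value of FORMAT key `key` for one sample column, or None if absent."""
    return dict(zip(format_str.split(":"), sample_str.split(":"))).get(key)


def parse_rstr(value):
    """Split an RSTR value (Rf,Af,Rr,Ar) into per-strand and per-allele read counts."""
    if value in (None, "."):
        return None
    rf, af, rr, ar = (int(x) for x in value.split(","))
    return {"ref_fwd": rf, "alt_fwd": af, "ref_rev": rr, "alt_rev": ar,
            "ref": rf + rr, "alt": af + ar}


counts = parse_rstr(format_value("GT:RSTR", "0/1:10,4,12,5", "RSTR"))
print(counts["ref"], counts["alt"])  # 22 9
```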
novoPP tag definition (INFO):
##INFO=<ID=novoPP,Number=1,Type=Float,Description="Posterior probability from novoCaller">
Note: novoCaller model assumptions do not apply to unbalanced chromosomes (e.g. sex and mitochondrial chromosomes), so the model does not assign a posterior probability to variants on these chromosomes. When filtering by posterior probability (--ppthr), these variants are treated as if their posterior probability were 0.
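The --ppthr behaviour described above (a missing posterior counts as 0, and kept records are ranked by novoPP in descending order) can be sketched in Python as follows. It assumes that variants on sex or mitochondrial chromosomes simply carry no numeric novoPP value; the record layout and helper names are hypothetical.

```python
def info_dict(info):
    """Parse a VCF INFO string into {key: value}; flags map to True."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def novo_pp(info):
    """Return novoPP as a float, treating a missing or '.' value as 0."""
    value = info_dict(info).get("novoPP")
    if isinstance(value, str) and value not in ("", "."):
        return float(value)
    return 0.0


def filter_and_rank(records, ppthr=0.0):
    """Keep records with novoPP >= ppthr, sorted by novoPP in descending order."""
    kept = [r for r in records if novo_pp(r["INFO"]) >= ppthr]
    return sorted(kept, key=lambda r: novo_pp(r["INFO"]), reverse=True)


records = [
    {"CHROM": "chr1", "INFO": "novoPP=0.42"},
    {"CHROM": "chrX", "INFO": "DP=30"},  # no posterior assigned
    {"CHROM": "chr2", "INFO": "novoPP=0.97;DP=41"},
]
print([r["CHROM"] for r in filter_and_rank(records, ppthr=0.1)])  # ['chr2', 'chr1']
```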
Examples
Calls de novo variants. This will return the calls ranked and sorted by calculated posterior probability.
granite novoCaller -i file.vcf -o file.out.vcf -u file.unrelatedfiles -t file.triofiles
It is possible to filter out variants with posterior probability lower than --ppthr.
granite novoCaller -i file.vcf -o file.out.vcf -u file.unrelatedfiles -t file.triofiles --ppthr <float>
It is possible to filter out variants with population allele frequency higher than --afthr. Allele frequency must be provided for each variant in the INFO column.
granite novoCaller -i file.vcf -o file.out.vcf -u file.unrelatedfiles -t file.triofiles --afthr <float> --aftag tag
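With --aftag, the allele frequency is read from a TAG=<float> entry in INFO and compared with --afthr (a value <= the threshold passes). A Python sketch of that comparison, assuming a single ALT allele per record; the tag name gnomAD_AF is a hypothetical example, and raising an error when the tag is absent mirrors the requirement above rather than documented granite behaviour.

```python
def allele_frequency(info, tag):
    """Return the float stored as TAG=<float> in an INFO string."""
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        if key == tag and sep and value not in ("", "."):
            return float(value)
    raise ValueError(f"INFO has no usable {tag}=<float> entry")


def passes_afthr(info, tag, afthr=1.0):
    """True when the population allele frequency is <= afthr (the default of 1 keeps everything)."""
    return allele_frequency(info, tag) <= afthr


info = "DP=35;gnomAD_AF=0.0004"  # hypothetical tag name
print(passes_afthr(info, "gnomAD_AF", afthr=0.001))   # True
print(passes_afthr(info, "gnomAD_AF", afthr=0.0001))  # False
```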
Filters can be combined.
granite novoCaller -i file.vcf -o file.out.vcf -u file.unrelatedfiles -t file.triofiles --afthr <float> --aftag tag --ppthr <float>
comHet
comHet is a calling algorithm for compound heterozygous mutations. The model uses genotype-level information in the pedigree (trio) and VEP-based annotations to call possible compound heterozygous pairs. VEP annotations are used to assign variants to genes and transcripts, and genotype information is used to refine calls based on inheritance mode. Calls are further flagged as “Phased” or “Unphased”, where “Phased” means that genotype information supports in-trans inheritance of the alternate alleles from the parents.
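The pairing step can be illustrated with a deliberately simplified Python sketch. It assumes a full trio, attributes an allele to a parent only when that parent is heterozygous and the other is homozygous reference, drops pairs whose alleles both come from the same parent (in cis), and skips variants with undefined parental genotypes, as the default (without --allow_undef) does. These rules are an approximation for illustration, not granite's exact decision logic; variant IDs and gene names are hypothetical.

```python
from itertools import combinations

HET = {"0/1", "1/0", "0|1", "1|0"}
HOM_REF = {"0/0", "0|0"}


def parent_of_origin(gt_mother, gt_father):
    """Return 'mother' or 'father' when genotypes point to a single parent, else None."""
    if gt_mother in HET and gt_father in HOM_REF:
        return "mother"
    if gt_father in HET and gt_mother in HOM_REF:
        return "father"
    return None


def call_pairs(variants):
    """Return (gene, variant_a, variant_b, phase) for candidate compound heterozygous pairs."""
    by_gene = {}
    for v in variants:
        if v["proband"] not in HET:
            continue  # the proband must be heterozygous for the alternate allele
        if "." in v["mother"] or "." in v["father"]:
            continue  # undefined parental genotype: ignored by default
        for gene in v["genes"]:
            by_gene.setdefault(gene, []).append(v)
    calls = []
    for gene, members in sorted(by_gene.items()):
        for a, b in combinations(members, 2):
            origin_a = parent_of_origin(a["mother"], a["father"])
            origin_b = parent_of_origin(b["mother"], b["father"])
            if origin_a is not None and origin_a == origin_b:
                continue  # both alleles from the same parent: in cis, ruled out
            phase = "Phased" if origin_a and origin_b else "Unphased"
            calls.append((gene, a["id"], b["id"], phase))
    return calls


variants = [
    {"id": "v1", "genes": ["GENE_A"], "proband": "0/1", "mother": "0/1", "father": "0/0"},
    {"id": "v2", "genes": ["GENE_A"], "proband": "0/1", "mother": "0/0", "father": "0/1"},
    {"id": "v3", "genes": ["GENE_A"], "proband": "0/1", "mother": "0/1", "father": "0/0"},
    {"id": "v4", "genes": ["GENE_A"], "proband": "0/1", "mother": "0/1", "father": "0/1"},
]
for call in call_pairs(variants):
    print(call)
```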
Arguments
usage: granite comHet [-h] -i INPUTFILE -o OUTPUTFILE --trio TRIO [TRIO ...]
[--VEPtag VEPTAG] [--sep SEP] [--filter_cmpHet]
[--allow_undef] [--SpliceAItag SPLICEAITAG] [--impact]
optional arguments:
-i INPUTFILE, --inputfile INPUTFILE
input VCF file
-o OUTPUTFILE, --outputfile OUTPUTFILE
output file to write results as VCF, use .vcf as
extension
--trio TRIO [TRIO ...]
list of sample IDs for trio, PROBAND is required and
must be listed FIRST (e.g. --trio PROBAND_ID
[PARENT_ID] [PARENT_ID])
--VEPtag VEPTAG by default the program will search for "CSQ" TAG
(CSQ=<values>), use this parameter to specify a
different TAG to be used (e.g. VEP)
--sep SEP by default the program uses "&" as separator for
subfields in annotating VCF (e.g.
ENST00000643759&ENST00000643774), use this parameter
to specify a different separator to be used
--filter_cmpHet by default the program returns all variants in the
input VCF file. This flag will produce a shorter
output containing only variants that are potential
compound heterozygous
--allow_undef by default the program ignores variants with undefined
genotype in parents. This flag extends the output to
include these cases
--SpliceAItag SPLICEAITAG
by default the program will search for SpliceAI delta
scores (DS_AG, DS_AL, DS_DG, DS_DL) to calculate the
max delta score for the variant. If a max value is
already defined, use this parameter to specify the TAG
| TAG field to be used
--impact use VEP "IMPACT" or "Consequence" terms to assign an
impact to potential compound heterozygous. If
available, SpliceAI and ClinVar "CLNSIG" information
is used together with VEP
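The max delta score mentioned under --SpliceAItag is the largest of the four SpliceAI delta scores. The Python sketch below assumes the annotation follows SpliceAI's own VCF INFO layout (ALLELE|SYMBOL|DS_AG|DS_AL|DS_DG|DS_DL|DP_AG|DP_AL|DP_DG|DP_DL, one comma-separated entry per allele/gene); granite may read the scores from other fields, and the example values are made up.

```python
def max_delta_score(spliceai_value):
    """Return the largest DS_AG/DS_AL/DS_DG/DS_DL across all SpliceAI entries, or None."""
    best = None
    for entry in spliceai_value.split(","):
        fields = entry.split("|")
        if len(fields) < 6:
            continue  # malformed or truncated entry
        for ds in fields[2:6]:  # DS_AG, DS_AL, DS_DG, DS_DL
            if ds in ("", "."):
                continue
            score = float(ds)
            best = score if best is None else max(best, score)
    return best


value = "T|GENE_A|0.01|0.00|0.91|0.12|-3|10|2|-25,T|GENE_B|0.05|0.30|0.00|0.00|4|-1|7|12"
print(max_delta_score(value))  # 0.91
```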
Input
comHet accepts files in VCF format as input. In addition to the standard VCF columns, files must contain genotype information for the trio members to be used. Column IDs for the trio must match the sample IDs provided as arguments (--trio). Proband genotype information is mandatory. If available, parent information is used to improve specificity by ruling out false calls based on inheritance mode. VEP annotations for “Gene” and “Feature” (transcript) are also required in the INFO column.
Required VCF format structure:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT PROBAND_ID [MOTHER_ID] [FATHER_ID]
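To show how the Gene and Feature annotations are typically laid out, here is a Python sketch that reads the field order from the VEP CSQ header line and groups transcript IDs by gene. It assumes the standard VEP header ("... Format: Allele|Consequence|..."); the shortened field list and the gene and transcript IDs are illustrative only, and a tag other than CSQ would correspond to --VEPtag.

```python
def csq_fields(header_line, tag="CSQ"):
    """Return the ordered CSQ field names from a VEP ##INFO header line."""
    if not header_line.startswith(f"##INFO=<ID={tag},"):
        raise ValueError(f"not the header line for {tag}")
    _, found, rest = header_line.partition("Format: ")
    if not found:
        raise ValueError("header has no 'Format: ' description")
    return rest.split('"', 1)[0].split("|")


def transcripts_by_gene(csq_value, fields):
    """Map each Gene in a CSQ value to the set of its Feature (transcript) IDs."""
    genes = {}
    for entry in csq_value.split(","):
        record = dict(zip(fields, entry.split("|")))
        gene, feature = record.get("Gene", ""), record.get("Feature", "")
        if gene:
            genes.setdefault(gene, set())
            if feature:
                genes[gene].add(feature)
    return genes


header = ('##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations '
          'from Ensembl VEP. Format: Allele|Consequence|IMPACT|Gene|Feature">')
fields = csq_fields(header)
csq = "T|missense_variant|MODERATE|GENE_A|TX_1,T|intron_variant|MODIFIER|GENE_A|TX_2"
print(transcripts_by_gene(csq, fields))
```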
Output
comHet generates output in VCF format. The program adds a VEP-like tag to the INFO field to report the calls associated with each variant. comHet stores information for each compound heterozygous pair (cmpHet) that involves the variant.
comHet tag definition (INFO):
##INFO=<ID=comHet,Number=.,Type=String,Description="Putative compound heterozygous pairs. Subembedded:'cmpHet':Format:'phase|gene|transcript|mate_variant'">
comHet tag definition (INFO) with --impact:
##INFO=<ID=comHet,Number=.,Type=String,Description="Putative compound heterozygous pairs. Subembedded:'cmpHet':Format:'phase|gene|transcript|impact_gene|impact_transcript|mate_variant'">
A cmpHet is defined for each gene and for each possible mate variant. Multiple cmpHets are separated by commas.
Example:
comHet=Phased|ENSG00000069424||STRONG_PAIR||chr1:6051661C>T,Phased|ENSG00000069424|ENST00000652845|STRONG_PAIR|STRONG_PAIR|chr1:6082358C>T,Phased|ENSG00000084636|ENST00000373672&ENST00000488897|STRONG_PAIR|STRONG_PAIR|chr1:6051661G>A
All shared transcripts for a given pair are listed in the transcript field. If the pair does not share any transcript, the field is empty.
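Downstream scripts may need to unpack the comHet tag. A minimal Python sketch that accepts both formats defined above (four fields, or six with --impact) and splits the transcript field on the default "&" separator (the --sep value); the function name is hypothetical.

```python
BASIC = ("phase", "gene", "transcript", "mate_variant")
WITH_IMPACT = ("phase", "gene", "transcript", "impact_gene", "impact_transcript", "mate_variant")


def parse_comhet(value, sep="&"):
    """Split a comHet INFO value into one dict per cmpHet."""
    pairs = []
    for cmphet in value.split(","):
        parts = cmphet.split("|")
        if len(parts) == len(WITH_IMPACT):
            names = WITH_IMPACT
        elif len(parts) == len(BASIC):
            names = BASIC
        else:
            raise ValueError(f"unexpected cmpHet with {len(parts)} fields: {cmphet}")
        record = dict(zip(names, parts))
        # an empty transcript field means the two mates share no transcript
        record["transcript"] = record["transcript"].split(sep) if record["transcript"] else []
        pairs.append(record)
    return pairs


value = ("Phased|ENSG00000069424||STRONG_PAIR||chr1:6051661C>T,"
         "Phased|ENSG00000084636|ENST00000373672&ENST00000488897|STRONG_PAIR|STRONG_PAIR|chr1:6051661G>A")
for pair in parse_comhet(value):
    print(pair["gene"], pair["transcript"], pair["mate_variant"])
```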
Examples
Calls compound heterozygous variants.
granite comHet -i file.vcf -o file.out.vcf --trio PROBAND_ID [PARENT_ID] [PARENT_ID]
It is possible to add impact information for the gene (impact_gene) and for shared transcripts (impact_transcript). impact_gene is the worst impact calculated at gene level, considering all of the gene's associated transcripts. impact_transcript is the worst impact calculated considering only the transcripts shared by the two mates, if any. VEP annotations for “IMPACT” or “Consequence” must be provided in the INFO column in order to assign an impact. If available, SpliceAI and ClinVar “CLNSIG” information is used together with VEP to refine the assignment.
granite comHet -i file.vcf -o file.out.vcf --trio PROBAND_ID [PARENT_ID] [PARENT_ID] --impact
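One possible reading of impact_gene versus impact_transcript, written as a Python sketch: each variant carries a hypothetical per-transcript STRONG flag within the gene, a variant counts as STRONG at gene level if it is STRONG in any of its transcripts, and impact_transcript takes the most severe pair label over shared transcripts only (empty when none are shared). This is an interpretation of the description above, not granite's code.

```python
SEVERITY = {"WEAK_PAIR": 0, "MEDIUM_PAIR": 1, "STRONG_PAIR": 2}
LABELS = ("WEAK_PAIR", "MEDIUM_PAIR", "STRONG_PAIR")


def pair_label(strong_a, strong_b):
    """Label a pair by how many of its two variants are STRONG."""
    return LABELS[int(strong_a) + int(strong_b)]


def pair_impacts(tx_strong_a, tx_strong_b):
    """Return (impact_gene, impact_transcript) from {transcript_id: is_strong} maps."""
    impact_gene = pair_label(any(tx_strong_a.values()), any(tx_strong_b.values()))
    shared = sorted(set(tx_strong_a) & set(tx_strong_b))
    labels = [pair_label(tx_strong_a[t], tx_strong_b[t]) for t in shared]
    impact_transcript = max(labels, key=SEVERITY.get) if labels else ""
    return impact_gene, impact_transcript


a = {"TX_1": True, "TX_2": False}
b = {"TX_2": False, "TX_3": True}
print(pair_impacts(a, b))  # ('STRONG_PAIR', 'WEAK_PAIR')
```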
It is possible to reduce the output to only variants that are potential compound heterozygous.
granite comHet -i file.vcf -o file.out.vcf --trio PROBAND_ID [PARENT_ID] [PARENT_ID] --filter_cmpHet
Impact
A variant is considered to have a potential STRONG impact if the VEP impact is HIGH or MODERATE, the SpliceAI score is >= 0.8, or the ClinVar assignment is Pathogenic or Likely Pathogenic. If both variants are STRONG, the pair is assigned as a STRONG_PAIR. If only one of the two variants is STRONG, the pair is assigned as a MEDIUM_PAIR. If neither variant is STRONG, the pair is assigned as a WEAK_PAIR.
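The rules above translate directly into a small Python sketch. The argument names and the ClinVar spellings (Pathogenic and Likely_pathogenic, combined with "/" as in ClinVar's VCF CLNSIG values) are assumptions about how the annotations are stored.

```python
STRONG_VEP = {"HIGH", "MODERATE"}
STRONG_CLNSIG = {"Pathogenic", "Likely_pathogenic"}


def is_strong(vep_impact=None, spliceai_max=None, clnsig=None):
    """A variant is STRONG if any one of the three criteria holds."""
    if vep_impact in STRONG_VEP:
        return True
    if spliceai_max is not None and spliceai_max >= 0.8:
        return True
    if clnsig and any(term in STRONG_CLNSIG for term in clnsig.split("/")):
        return True
    return False


def classify_pair(variant_a, variant_b):
    """Count STRONG mates: 2 -> STRONG_PAIR, 1 -> MEDIUM_PAIR, 0 -> WEAK_PAIR."""
    n_strong = sum(is_strong(**v) for v in (variant_a, variant_b))
    return ("WEAK_PAIR", "MEDIUM_PAIR", "STRONG_PAIR")[n_strong]


print(classify_pair({"vep_impact": "HIGH"}, {"spliceai_max": 0.85}))                      # STRONG_PAIR
print(classify_pair({"vep_impact": "LOW"}, {"clnsig": "Pathogenic/Likely_pathogenic"}))  # MEDIUM_PAIR
print(classify_pair({"vep_impact": "MODIFIER", "spliceai_max": 0.2}, {}))                # WEAK_PAIR
```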