VCF Parser

The granite library can be used directly to access and manipulate information in VCF format.

Import the library

from granite.lib import vcf_parser

Usage

The library implements the objects Vcf, Header and Variant.

Vcf

This is the main object and has methods to read and write VCF format.

Initialize the object

vcf_obj = vcf_parser.Vcf('inputfile.vcf')

This will automatically read the file header into a Header object.

Read and access variants

The method parse_variants() reads the file and returns a generator of Variant objects, each storing the information for one variant.

for vnt_obj in vcf_obj.parse_variants():
    # do something with vnt_obj
    pass

Write to file

The method write_header(fo) writes the header definitions and the columns line to the specified buffer (fo).

with open('outputfile.vcf', 'w') as fo:
    vcf_obj.write_header(fo)

The methods write_definitions(fo) and write_columns(fo) write only the definitions or only the columns line, respectively.

with open('outputfile.vcf', 'w') as fo:
    vcf_obj.write_definitions(fo)
    vcf_obj.write_columns(fo)

The method write_variant(fo, Variant_obj) writes the information from a Variant object to the specified buffer (fo).

with open('outputfile.vcf', 'w') as fo:
    vcf_obj.write_variant(fo, vnt_obj)
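
The read and write methods above can be combined into a simple filtering loop. The following is a minimal sketch, not part of granite: copy_variants and keep are hypothetical names, and the function only calls write_header, parse_variants and write_variant as documented above.

```python
def copy_variants(vcf_obj, fo, keep=lambda vnt_obj: True):
    """Write the header, then every variant for which keep(vnt_obj) is true.

    vcf_obj is assumed to behave like vcf_parser.Vcf. Only the documented
    methods write_header, parse_variants and write_variant are called.
    copy_variants and keep are hypothetical names, not granite API.
    """
    vcf_obj.write_header(fo)
    n_written = 0
    for vnt_obj in vcf_obj.parse_variants():
        if keep(vnt_obj):
            vcf_obj.write_variant(fo, vnt_obj)
            n_written += 1
    return n_written

# Possible usage with granite installed (file names are placeholders):
#   from granite.lib import vcf_parser
#   vcf_obj = vcf_parser.Vcf('inputfile.vcf')
#   with open('outputfile.vcf', 'w') as fo:
#       copy_variants(vcf_obj, fo, keep=lambda v: v.FILTER == 'PASS')
```

Because parse_variants() returns a generator, variants are handled one at a time and the whole file does not need to be held in memory.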

Variant

This is the object used to store information for variants in VCF format.

Attributes

CHROM <str>

Stores chromosome name (e.g. 1, chr1), as in the VCF file.

vnt_obj.CHROM

POS <int>

Stores variant position.

vnt_obj.POS

ID <str>

Stores variant ID(s), as in the VCF file.

vnt_obj.ID

REF <str>

Stores reference allele at position.

vnt_obj.REF

ALT <str>

Stores alternate allele(s) at position.

vnt_obj.ALT

QUAL <str>

Stores phred-scaled quality score for the assertion made in ALT.

vnt_obj.QUAL

FILTER <str>

Stores filter status.

vnt_obj.FILTER

INFO <str>

Stores additional information for the variant.

vnt_obj.INFO

FORMAT <str>

Stores specification for the genotype column(s) structure.

vnt_obj.FORMAT

IDs_genotypes <list>

Stores the sample ID(s) available in the VCF as a list. With multiple samples, the order from the VCF is preserved.

vnt_obj.IDs_genotypes

GENOTYPES <dict>

Stores a dictionary linking genotype(s) for the variant to corresponding sample ID(s).

# {ID_genotype: genotype, ...}
vnt_obj.GENOTYPES
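
To make the attribute layout concrete, the sketch below splits one VCF record by hand into the same kinds of fields listed above. It is a standalone illustration, not granite's parser: split_record is a hypothetical helper and the sample names and values are made up.

```python
def split_record(columns_line, record_line):
    """Split a VCF columns line and one record into attribute-like pieces.

    Hypothetical helper for illustration only. Returns the nine fixed
    fields as a dict, the sample IDs in file order, and a dict mapping
    each sample ID to its genotype string.
    """
    columns = columns_line.lstrip('#').rstrip('\n').split('\t')
    values = record_line.rstrip('\n').split('\t')
    fixed = dict(zip(columns[:9], values[:9]))
    fixed['POS'] = int(fixed['POS'])  # POS is the only integer field
    IDs_genotypes = columns[9:]       # sample columns follow FORMAT
    GENOTYPES = dict(zip(IDs_genotypes, values[9:]))
    return fixed, IDs_genotypes, GENOTYPES

columns_line = '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample_1\tsample_2'
record_line = 'chr1\t12345\t.\tA\tG\t50\tPASS\tDP=40\tGT:AD\t0/1:12,9\t0/0:20,0'

fixed, IDs_genotypes, GENOTYPES = split_record(columns_line, record_line)
print(fixed['CHROM'], fixed['POS'], fixed['FORMAT'])
print(IDs_genotypes)
print(GENOTYPES)
```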

Format variants

The method to_string() returns the variant representation in VCF format.

vnt_vcf = vnt_obj.to_string()  # <str>

The method repr() returns the variant representation as CHROM:POSREF>ALT.

vnt_repr = vnt_obj.repr()  # <str>
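
As an illustration of the CHROM:POSREF>ALT layout, the following standalone sketch builds such a string from example values. variant_repr is a hypothetical function written for this example, not the library's code.

```python
def variant_repr(chrom, pos, ref, alt):
    """Build a CHROM:POSREF>ALT string (hypothetical helper, not granite API)."""
    return '{0}:{1}{2}>{3}'.format(chrom, pos, ref, alt)

print(variant_repr('chr1', 12345, 'A', 'G'))  # chr1:12345A>G
```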

Manipulate genotype(s)

The method remove_tag_genotype(tag, sep=':') removes a tag from FORMAT and GENOTYPES. sep is the tag separator used in the format definition and genotype(s).

# remove AD tag from format definition and genotype(s)
tag = 'AD'
vnt_obj.remove_tag_genotype(tag)
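
Conceptually, removing a tag means dropping it from the format definition and dropping the value at the same position from every genotype. The sketch below shows that idea on plain strings. It is an assumption about the general approach, not granite's implementation: remove_tag is a hypothetical function, and returning the input unchanged when the tag is absent is a choice made for this example.

```python
def remove_tag(format_str, genotypes, tag, sep=':'):
    """Return (FORMAT, GENOTYPES) with tag removed (hypothetical helper)."""
    tags = format_str.split(sep)
    if tag not in tags:
        # Assumption for this sketch: nothing to do if the tag is absent
        return format_str, dict(genotypes)
    idx = tags.index(tag)
    new_format = sep.join(t for i, t in enumerate(tags) if i != idx)
    new_genotypes = {}
    for ID_genotype, genotype in genotypes.items():
        values = genotype.split(sep)
        # Trailing fields may have been dropped, so idx can be out of range
        if idx < len(values):
            del values[idx]
        new_genotypes[ID_genotype] = sep.join(values)
    return new_format, new_genotypes

new_format, new_genotypes = remove_tag(
    'GT:AD:DP', {'sample_1': '0/1:12,9:21', 'sample_2': './.'}, 'AD')
print(new_format)     # GT:DP
print(new_genotypes)  # {'sample_1': '0/1:21', 'sample_2': './.'}
```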

The method complete_genotype(sep=':') fills in the trailing fields in GENOTYPES that are missing because they are dropped by default. sep is the tag separator used in the format definition and genotype(s).

vnt_obj.complete_genotype()

The method empty_genotype(sep=':') returns an empty genotype based on the FORMAT structure. sep is the tag separator used in the format definition and genotype(s).

empty = vnt_obj.empty_genotype()  # <str>
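
The two methods above can be pictured with a small standalone sketch. It assumes that GT is the first tag in FORMAT, that a missing GT is written as './.' and that any other missing value is written as '.', following common VCF conventions. These are assumptions of this example; the exact output of the library methods is not stated here, and both functions are hypothetical.

```python
def empty_genotype(format_str, sep=':'):
    """Build an all-missing genotype matching FORMAT (hypothetical sketch)."""
    n_tags = len(format_str.split(sep))
    # Assumption: first tag is GT ('./.'), every other tag is '.'
    return './.' + (sep + '.') * (n_tags - 1)

def complete_genotype(format_str, genotype, sep=':'):
    """Pad dropped trailing fields with '.' (hypothetical sketch)."""
    n_missing = len(format_str.split(sep)) - len(genotype.split(sep))
    return genotype + (sep + '.') * max(n_missing, 0)

print(empty_genotype('GT:AD:DP'))                 # ./.:.:.
print(complete_genotype('GT:AD:DP', '0/1:12,9'))  # 0/1:12,9:.
```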

The method add_tag_format(tag, sep=':') adds a tag at the end of the FORMAT structure. sep is the tag separator used in the format definition and genotype(s).

# add RSTR tag to format
tag = 'RSTR'
vnt_obj.add_tag_format(tag)

The method add_values_genotype(ID_genotype, values, sep=':') appends values to the end of the genotype identified by the given sample ID. sep is the tag separator used in the format definition and genotype(s).

vnt_obj.add_values_genotype(ID_genotype, values)
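
The methods above are typically used together when a new per-sample annotation is added. The sketch below is one possible way to combine them, assuming values is passed as a string and that a placeholder '.' is acceptable for samples without a value. tag_all_samples, value_by_ID and default are hypothetical names; only complete_genotype, add_tag_format, add_values_genotype and IDs_genotypes come from the documentation above.

```python
def tag_all_samples(vnt_obj, tag, value_by_ID, default='.'):
    """Append tag to FORMAT and one value per sample (hypothetical helper).

    vnt_obj is assumed to behave like a granite Variant object.
    value_by_ID maps sample ID -> value string; samples not in the
    mapping receive default.
    """
    # Restore dropped trailing fields first, so that the appended value
    # lines up with the position of the new tag in FORMAT
    vnt_obj.complete_genotype()
    vnt_obj.add_tag_format(tag)
    for ID_genotype in vnt_obj.IDs_genotypes:
        values = value_by_ID.get(ID_genotype, default)
        vnt_obj.add_values_genotype(ID_genotype, values)

# Possible usage inside a parse_variants() loop:
#   tag_all_samples(vnt_obj, 'RSTR', {'sample_1': '1,2,3,4'})
```

Calling complete_genotype() before appending matters because a genotype with dropped trailing fields would otherwise receive the new value in the wrong position.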

The method get_genotype_value(ID_genotype, tag, sep=':') returns the value of tag from the genotype identified by the given sample ID. sep is the tag separator used in the format definition and genotype(s).

tag_val = vnt_obj.get_genotype_value(ID_genotype, tag)  # <str>

Manipulate INFO

The method remove_tag_info(tag, sep=';') removes a tag from INFO. sep is the tag separator used in INFO.

vnt_obj.remove_tag_info(tag)

The method add_tag_info(tag_value, sep=';') adds a tag and its value at the end of INFO. sep is the tag separator used in INFO.

# add tag and value to INFO
tag_value = 'tag=value'
vnt_obj.add_tag_info(tag_value)

The method get_tag_value(tag, sep=';') returns the value of tag in INFO. sep is the tag separator used in INFO.

tag_val = vnt_obj.get_tag_value(tag)  # <str>
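
The INFO operations above amount to simple string handling on a sep-separated list of tag=value fields. The standalone sketch below illustrates the idea on a plain INFO string. The function names are hypothetical, and the handling of flag tags (no '=') and of absent tags is an assumption of this example, not documented library behavior.

```python
def info_get(info, tag, sep=';'):
    """Return the value of tag in INFO, '' for a flag, None if absent."""
    for field in info.split(sep):
        key, _, value = field.partition('=')
        if key == tag:  # exact match, so lookups are case sensitive
            return value
    return None

def info_remove(info, tag, sep=';'):
    """Return INFO without the field whose key is tag."""
    kept = [f for f in info.split(sep) if f.partition('=')[0] != tag]
    return sep.join(kept)

def info_add(info, tag_value, sep=';'):
    """Return INFO with tag_value ('tag=value') appended at the end."""
    return info + sep + tag_value

info = 'DP=40;AF=0.5;DB'
print(info_get(info, 'AF'))           # 0.5
print(info_remove(info, 'AF'))        # DP=40;DB
print(info_add(info, 'tag=value'))    # DP=40;AF=0.5;DB;tag=value
```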

Note: tag and ID are case sensitive.