## File Formats

The program is compatible with the standard BED, BAM and VCF (VCFv4.x) formats.

### ReadCountKeeper (.rck)

RCK is a tabular format that efficiently stores per-strand (ForWard-ReVerse) counts of reads that support the REFerence allele, ALTernate alleles, INSertions or DELetions at each CHRomosome and POSition. RCK files can be further compressed with *bgzip* and indexed with *tabix* for storage, portability and faster random access. Positions are 1-based.

Tabular format structure:

    #CHR   POS   COVERAGE   REF_FW   REF_RV   ALT_FW   ALT_RV   INS_FW   INS_RV   DEL_FW   DEL_RV
    13     1     23         0        0        11       12       0        0        0        0
    13     2     35         18       15       1        1        0        0        0        0

Commands to compress and index files:

```text
bgzip PATH/TO/FILE
tabix -b 2 -s 1 -e 0 -c "#" PATH/TO/FILE.gz
```

### BinaryIndexGenome (.big)

BIG is an HDF5-based binary format that stores boolean values for each genomic position as bit arrays. Each position is represented in three complementary arrays that account for SNVs (Single-Nucleotide Variants), insertions and deletions, respectively. Positions are 1-based.

HDF5 format structure, e.g.:

    chr1_snv: array(bool)
    chr1_ins: array(bool)
    chr1_del: array(bool)
    chr2_snv: array(bool)
    ...
    ...
    chrM_del: array(bool)

*note*: HDF5 keys are built from the chromosome name as it appears in the reference (e.g. chr1) plus a suffix specifying whether the array represents SNVs (_snv), insertions (_ins) or deletions (_del).

### Pedigree in JSON format

When the program requires pedigree information, the expected format is as follows:

    [
      {
        "individual": "NA12877",
        "sample_name": "NA12877_sample",
        "gender": "M",
        "parents": []
      },
      {
        "individual": "NA12878",
        "sample_name": "NA12878_sample",
        "gender": "F",
        "parents": []
      },
      {
        "individual": "NA12879",
        "sample_name": "NA12879_sample",
        "gender": "F",
        "parents": ["NA12878", "NA12877"]
      }
    ]

where `individual` is the unique identifier for a member within the pedigree, `sample_name` is the corresponding sample ID in the VCF file, and `parents` is the list of unique identifiers for the parents, if any.
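
As an illustration of how a pedigree in this format could be consumed, here is a minimal Python sketch that parses the JSON and resolves a member's parents to their VCF sample IDs. The function names `load_pedigree` and `parent_samples` are hypothetical and not part of the program. The checks for duplicate identifiers and for parents missing from the pedigree are assumptions about what a well-formed file looks like, not documented behavior.

```python
import json

# Example pedigree, copied from the format description above.
PEDIGREE_JSON = """
[
  {"individual": "NA12877", "sample_name": "NA12877_sample",
   "gender": "M", "parents": []},
  {"individual": "NA12878", "sample_name": "NA12878_sample",
   "gender": "F", "parents": []},
  {"individual": "NA12879", "sample_name": "NA12879_sample",
   "gender": "F", "parents": ["NA12878", "NA12877"]}
]
"""


def load_pedigree(text):
    """Parse pedigree JSON text and index the members by `individual`.

    Hypothetical helper: the validation rules are assumptions.
    """
    members = json.loads(text)
    by_id = {}
    for member in members:
        individual = member["individual"]
        if individual in by_id:
            # Assumption: `individual` must be unique within the pedigree.
            raise ValueError(f"duplicate individual: {individual}")
        by_id[individual] = member
    for member in members:
        for parent in member["parents"]:
            if parent not in by_id:
                # Assumption: every listed parent is also a pedigree member.
                raise ValueError(
                    f"{member['individual']} lists unknown parent {parent}"
                )
    return by_id


def parent_samples(by_id, individual):
    """Return the VCF sample IDs of the parents, in the order listed."""
    return [by_id[p]["sample_name"] for p in by_id[individual]["parents"]]


if __name__ == "__main__":
    pedigree = load_pedigree(PEDIGREE_JSON)
    print(parent_samples(pedigree, "NA12879"))
```

For the example pedigree, `parent_samples(pedigree, "NA12879")` returns `["NA12878_sample", "NA12877_sample"]`, while members with an empty `parents` list yield an empty list.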