
Poor support for gaps in VCF

If a VCF contains a gap (coded as *), the gap is reported as such in the allele list but translated to - when genotypes are requested (either through get_genotypes(), or through as_site() and consequently iter_sites()).
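The mismatch between the two representations can be modelled with a small, hypothetical sketch. `allele_list` and `decode_genotype` are illustrative names, not EggLib's API, and the sketch assumes genotype calls contain no missing (`.`) entries.

```python
# Hypothetical model of the reported behaviour; not EggLib's actual code.
# A VCF record lists REF and ALT alleles; GT calls index into that list.

def allele_list(ref, alt):
    """Return the alleles as listed in the record, '*' included."""
    return [ref] + alt.split(",")

def decode_genotype(gt, alleles):
    """Translate a GT string such as '0/2' into allele strings, mapping the
    gap symbol '*' to '-' as happens when genotypes are requested.
    Assumes no missing ('.') calls."""
    out = []
    for idx in gt.replace("|", "/").split("/"):
        allele = alleles[int(idx)]
        out.append("-" if allele == "*" else allele)
    return out

alleles = allele_list("ACT", "A,*")
print(alleles)                          # ['ACT', 'A', '*']
print(decode_genotype("0/2", alleles))  # ['ACT', '-']
```

The same gap therefore appears as `*` in one place and `-` in the other.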

An actual bug occurs when as_site() is called at an indel position that contains a gap. The alphabet is built from get_alleles() and therefore includes *, so the - alleles coming from the genotypes are rejected as invalid.
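A minimal, hypothetical reproduction of the failure follows. The `Alphabet` class here is an illustrative stand-in that only accepts listed alleles; it is not EggLib's `Alphabet` and its method names are assumptions.

```python
# Hypothetical model of the bug; names are illustrative, not EggLib's API.

class Alphabet:
    """Minimal string alphabet: only listed alleles are valid."""

    def __init__(self, alleles):
        self.alleles = list(alleles)

    def code(self, allele):
        if allele not in self.alleles:
            raise ValueError(f"invalid allele: {allele!r}")
        return self.alleles.index(allele)

# Alphabet built from the allele list, which still contains '*'
alphabet = Alphabet(["ACT", "A", "*"])

# Genotype data in which '*' has already been translated to '-'
genotype = ["ACT", "-"]

error = None
try:
    codes = [alphabet.code(a) for a in genotype]
except ValueError as exc:
    error = str(exc)
print(error)  # invalid allele: '-'
```

The alphabet knows `*` but never sees it, while the `-` it does see is unknown.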

  • Change in get_alleles(): the * allele is no longer included.
  • Change in the alphabet created by as_site() and iter_sites() for indels (alphabets of type string): the - allele is always included as missing data.
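The two changes can be sketched as follows. This is a hypothetical illustration: `get_alleles`, `site_alphabet` and `encode` are simplified stand-ins, and encoding missing alleles with negative codes is a choice made here for illustration only.

```python
# Hypothetical sketch of the fixed behaviour; names are illustrative.

def get_alleles(ref, alt):
    """Allele list with the '*' gap allele excluded."""
    return [a for a in [ref] + alt.split(",") if a != "*"]

def site_alphabet(alleles):
    """Alphabet for an indel (string-type) site: observed alleles are
    exploitable, and '-' is always included as missing data."""
    return {"exploitable": list(alleles), "missing": ["-"]}

def encode(allele, alphabet):
    """Return an integer code; missing alleles get negative codes
    (an illustrative convention assumed for this sketch)."""
    if allele in alphabet["exploitable"]:
        return alphabet["exploitable"].index(allele)
    if allele in alphabet["missing"]:
        return -1 - alphabet["missing"].index(allele)
    raise ValueError(f"invalid allele: {allele!r}")

alleles = get_alleles("ACT", "A,*")
alphabet = site_alphabet(alleles)
print(alleles)                                     # ['ACT', 'A']
print([encode(a, alphabet) for a in ["ACT", "-"]]) # [0, -1]
```

With `*` removed from the allele list and `-` always declared as missing, gap genotypes are accepted and treated as missing data instead of raising an error.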
Edited by Stéphane De Mita