Derek Conkle-Gutierrez authored
inversionfix.py still creates a temp.fasta file with the inversions undone, which the rest of biodiff then uses. It now also outputs a temp.vcf file that records the inversions, in terms of reference positions, in VCF format. biodiff now appends its udiff2vcf output to temp.vcf, which is then sorted by position (2nd column) and written to standard out.

Tested with test/lambda-phage/inversion.fasta.

Currently the CHROM value of the inversion entries does not match the other entries in the VCF output. The next change to inversionfix.py will have it read in the query FASTA file to get the correct chromosome name, to match the rest of biodiff.

Eventually inversionfix.py should BLAST a small (100 bp) section of the query, centered on each inversion start and end, against the reference to find the breakpoints more precisely. The positions provided by dnadiff (nucmer) are often off by a few bases, resulting in several incorrect biodiff calls around the edges of inversions (though not nearly as many as before).
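The merge-and-sort step described above could look roughly like the sketch below. It is an illustration only, not the code in biodiff: the function name `merge_and_sort_vcf` and the sample records are hypothetical, and the `<INV>` ALT with `SVTYPE=INV;END=...` is just one common VCF convention for inversions, not necessarily what inversionfix.py writes. It assumes tab-separated records and a single chromosome, since it sorts on the 2nd column only.

```python
def merge_and_sort_vcf(*vcf_texts):
    """Merge VCF texts: header lines first, then records sorted by POS.

    Hypothetical helper for illustration; assumes one chromosome, so
    sorting on the 2nd column (POS) alone is enough.
    """
    meta = []             # "##" meta-information lines, de-duplicated
    column_header = None  # the single "#CHROM ..." line, if present
    records = []          # data lines
    for text in vcf_texts:
        for line in text.splitlines():
            if not line.strip():
                continue
            if line.startswith("##"):
                if line not in meta:
                    meta.append(line)
            elif line.startswith("#"):
                if column_header is None:
                    column_header = line
            else:
                records.append(line)
    # POS is the 2nd tab-separated column; compare it as an integer
    # so that 150 sorts before 2000 (a plain string sort would not).
    records.sort(key=lambda rec: int(rec.split("\t")[1]))
    header = meta + ([column_header] if column_header else [])
    return "\n".join(header + records) + "\n"


# Made-up example data: one inversion record plus two ordinary calls.
inversions = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "ref\t2000\t.\tN\t<INV>\t.\t.\tSVTYPE=INV;END=2500\n"
)
diffs = (
    "ref\t3100\t.\tC\tT\t.\t.\t.\n"
    "ref\t150\t.\tA\tG\t.\t.\t.\n"
)
merged = merge_and_sort_vcf(inversions, diffs)
print(merged, end="")
```

If the output ever covers more than one chromosome, the sort key would need to include CHROM as well as POS.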
fe94f2eb