    biodiff now integrates record of inversions into vcf format output · fe94f2eb
    Derek Conkle-Gutierrez authored
    In addition to creating a temp.fasta file with the inversions
    undone, which the rest of biodiff then uses, inversionfix.py now
    also outputs a temp.vcf file that records the inversions, in terms
    of the reference positions, in vcf format.
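
    As a rough illustration of what one inversion entry in temp.vcf
    could look like, here is a minimal sketch. It assumes the symbolic
    <INV> allele and the SVTYPE/END INFO keys from the VCF
    structural-variant conventions; the function name and the "N"
    placeholder for REF are hypothetical, and the actual layout written
    by inversionfix.py may differ.

```python
def inversion_vcf_line(chrom, pos, end, ref_base="N"):
    """Format one inversion as a tab-separated VCF data line.

    pos and end are 1-based reference positions. The <INV> ALT and the
    SVTYPE/END INFO keys are an assumption, not necessarily what
    inversionfix.py emits.
    """
    info = f"SVTYPE=INV;END={end}"
    # Columns: CHROM POS ID REF ALT QUAL FILTER INFO
    fields = [chrom, str(pos), ".", ref_base, "<INV>", ".", ".", info]
    return "\t".join(fields)
```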
    
    biodiff now appends its udiff2vcf output to temp.vcf, which is then
    sorted by position (2nd column) and output to standard out.
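
    The sort step can be sketched as follows. This is only an
    illustration, not the code biodiff runs: sort_vcf_lines is a
    hypothetical helper, and it assumes a single chromosome, so that
    ordering by the 2nd column alone is enough. The position is compared
    as an integer, since a plain text sort would place 30 before 5.

```python
def sort_vcf_lines(lines):
    """Return header lines first, then records ordered by POS (column 2)."""
    headers = [ln for ln in lines if ln.startswith("#")]
    records = [ln for ln in lines if ln.strip() and not ln.startswith("#")]
    # Numeric key: POS is the second tab-separated field.
    records.sort(key=lambda ln: int(ln.split("\t")[1]))
    return headers + records
```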
    
    Tested with test/lambda-phage/inversion.fasta
    
    Currently the CHROM value of the inversion entries does not match
    the other entries in the vcf output. The next change to
    inversionfix.py will have it read in the query fasta file to get the
    correct chromosome name, to match the rest of biodiff.
    
    Eventually inversionfix.py should blast a small (100 bp) section of
    the query, centered on each inversion start and end, against the
    reference to find the breakpoints more precisely.
    The positions provided by dnadiff (nucmer) are often off by a few
    bases, resulting in several incorrect biodiff calls around the edges
    of inversions (though not nearly as many as before).
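
    The window extraction for that planned refinement might look like
    the sketch below. It covers only cutting the roughly 100 bp slice
    around an approximate breakpoint, not the blast search itself. The
    function name and the 0-based coordinate convention are assumptions
    made for illustration.

```python
def breakpoint_window(seq, breakpoint, width=100):
    """Return (start, window) for a slice of seq centered on breakpoint.

    breakpoint is a 0-based index into seq. The window is clipped at the
    sequence ends, so it can be shorter than width near either edge.
    """
    half = width // 2
    start = max(0, breakpoint - half)
    end = min(len(seq), breakpoint + half)
    return start, seq[start:end]
```

    Returning the start offset alongside the window lets a later
    alignment hit be mapped back to query coordinates.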