  1. 14 Jun, 2018 2 commits
  2. 13 Jun, 2018 1 commit
  3. 08 Jun, 2018 1 commit
    • inversionfix.py now refines INV positions in VCF output · 60dbf8cd
      Derek Conkle-Gutierrez authored
      inversionfix.py now also calls its refineposition function
      on the reference positions of each inversion for the VCF
      output (this BLASTs the regions around the inversion start
      and inversion stop in the reference against the query
      sequence, and uses the breakpoints to calculate a more
      accurate position for each inversion).
      
      inversionfix.py also uses 150 bp on either side of the INV
      position for BLASTing instead of 50, to give BLAST more to
      work with when refining the positions.
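
      A minimal Python sketch of the flank extraction described above.
      It assumes 1-based inclusive positions (as in VCF) and a plain
      sequence string. The name `flank_window` and the clamping at
      sequence ends are illustrative assumptions, not code taken from
      inversionfix.py.

```python
def flank_window(seq: str, pos: int, flank: int = 150) -> tuple[str, int]:
    """Return the subsequence within `flank` bp of 1-based `pos`, plus the
    1-based start of that subsequence in `seq`.

    Hypothetical helper: the window is clamped at both ends of `seq`, so an
    inversion near a contig edge gets a shorter window instead of an error.
    """
    if not 1 <= pos <= len(seq):
        raise ValueError(f"position {pos} outside sequence of length {len(seq)}")
    start = max(1, pos - flank)       # 1-based, inclusive
    end = min(len(seq), pos + flank)  # 1-based, inclusive
    # Python slices are 0-based and end-exclusive, hence [start - 1:end].
    return seq[start - 1:end], start
```

      Take an inversion start at position 1,000 in a sequence at least
      1,150 bp long. With flank=150 this returns positions 850 to 1,150
      (301 bp) and a window start of 850. That window start is the value
      needed later to map a BLAST hit back to full-sequence coordinates.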
  4. 25 May, 2018 1 commit
    • inversionfix.py now BLASTs query region around inversion against reference · 47b04833
      Derek Conkle-Gutierrez authored
      During the BLAST position-refinement step, inversionfix.py was
      previously BLASTing the region of the inversion (in terms of
      query position) against the query region of the inversion,
      which is incorrect, as the query position does not correspond
      to the equivalent aligned region in the reference.
      
      Now inversionfix.py takes a region of the query sequence around
      the inversion start/stop and BLASTs it against the whole
      reference FASTA. It then uses the first breakpoint (the start of
      the query alignment plus the start of the region spliced out of
      the query) as the new inversion start/stop.
      
      This did not affect output for the test data, and biodiff can
      now run on isolate 1-0031 without error.
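
      A sketch of the coordinate arithmetic in Python. It assumes blastn
      was run with `-outfmt 6`, whose default tab-separated columns put
      qstart in the 7th field, and that the hit listed first is the one
      taken as the "first breakpoint". `refined_position` is a
      hypothetical name, not a function from inversionfix.py. BLAST's
      qstart and the region start are both 1-based, so 1 is subtracted
      when adding them. Getting this term wrong would produce a 1 bp
      shift, the size of offset noted in commit d93b3eef, though that
      commit does not identify the cause.

```python
from typing import Optional


def refined_position(blast_tab: str, region_start: int) -> Optional[int]:
    """Map the query start of the first BLAST hit back to a full-query position.

    blast_tab: text of blastn `-outfmt 6` output (12 tab-separated columns).
    region_start: 1-based position in the full query where the BLASTed
        region begins.
    Returns None when there are no hits.
    """
    for line in blast_tab.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines (as in -outfmt 7)
        fields = line.split("\t")
        qstart = int(fields[6])  # 1-based alignment start within the region
        # Both values are 1-based, so subtract 1 to avoid counting a base twice.
        return region_start + qstart - 1
    return None
```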
  5. 24 May, 2018 4 commits
    • inversionfix.py now uses blastn to refine position of inversions · d93b3eef
      Derek Conkle-Gutierrez authored
      This did not make the output worse, though it did not improve the
      accuracy as I had hoped. The start and end positions of the
      inversions are still often off by 1 bp. This may be an off-by-one
      error somewhere in my code, though the offset does not appear
      consistently.
      
      Dnadiff is definitely off by 1 bp in its inversion detection.
      I created two reference/query FASTA file pairs (one based on
      lambda phage, one based on H37Rv) with 2 inversions in the same
      2 locations, and ran dnadiff on both. The reported positions
      differ between the two pairs for the end of the first inversion
      and for the start and end of the second inversion.
    • biodiff now uses temp directory consistently · f8b5adad
      Derek Conkle-Gutierrez authored
      biodiff now uses the same temp directory for dnadiff output
      as the rest of biodiff. The same is true for the intermediate
      files made by inversionfix.py.
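
      A minimal Python sketch of the shared-temp-directory idea. It
      assumes one `tempfile.TemporaryDirectory` is created once and every
      intermediate path is built from it. biodiff itself may not be
      written in Python, and the helper `intermediate_paths` and the file
      names are illustrative only.

```python
import tempfile
from pathlib import Path


def intermediate_paths(tmpdir: str) -> dict[str, Path]:
    """Build every intermediate file path from one shared temp directory.

    Hypothetical names: the point is that dnadiff's output prefix and
    inversionfix.py's intermediate files all live under the same directory.
    """
    base = Path(tmpdir)
    return {
        "dnadiff_prefix": base / "out",      # prefix for dnadiff's output files
        "fixed_fasta": base / "temp.fasta",  # query with inversions undone
        "inversion_vcf": base / "temp.vcf",  # inversions in reference coordinates
    }


with tempfile.TemporaryDirectory() as tmp:
    paths = intermediate_paths(tmp)
    # ... run dnadiff, inversionfix.py and the rest of the pipeline here ...
# The directory and everything in it is removed when the `with` block exits.
```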
    • biodiff now integrates record of inversions into vcf format output · fe94f2eb
      Derek Conkle-Gutierrez authored
      In addition to creating a temp.fasta file with the inversions
      undone (which the rest of biodiff then uses), inversionfix.py
      now also outputs a temp.vcf file that records the inversions,
      in terms of reference positions, in VCF format.
      
      biodiff now appends its udiff2vcf output to temp.vcf, which is
      then sorted by position (2nd column) and written to standard out.
      
      Tested with test/lambda-phage/inversion.fasta
      
      Currently the CHROM value of the inversion entries does not match
      the other entries in the VCF output. The next change to
      inversionfix.py will have it read in the query FASTA file to get
      the correct chromosome name, to match the rest of biodiff.
      
      Eventually inversionfix.py should BLAST a small (100 bp) section
      of the query centered on each inversion start and end against
      the reference, to find the breakpoints more precisely. The
      positions provided by dnadiff (nucmer) are often off by a few
      bases, resulting in several incorrect biodiff calls around the
      edges of inversions (though not nearly as many as before).
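
      A sketch of the append-and-sort step in Python. It assumes both
      inputs are VCF text with `#` header lines and that only the first
      file's header block should be kept. `merge_vcf` is a hypothetical
      name. The sort is numeric and uses POS (column 2) only, as
      described above. Sorting on CHROM first would separate the
      inversion entries from the rest while their CHROM values still
      differ.

```python
def merge_vcf(inversion_vcf: str, variant_vcf: str) -> str:
    """Combine two VCF texts and sort the data lines by POS (column 2).

    Header lines (starting with '#') from the first input stay at the top in
    their original order; header lines from the second input are dropped so
    the output has a single header block.
    """
    header = [line for line in inversion_vcf.splitlines() if line.startswith("#")]
    records = [
        line
        for text in (inversion_vcf, variant_vcf)
        for line in text.splitlines()
        if line and not line.startswith("#")
    ]
    # Numeric sort on POS, so position 100 sorts after 20. list.sort() is
    # stable, so records at the same POS keep their input order.
    records.sort(key=lambda line: int(line.split("\t")[1]))
    return "\n".join(header + records) + "\n"
```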
  6. 23 Apr, 2018 1 commit
  7. 21 Apr, 2018 1 commit
    • inversionfix.py now works with inversions, but not nested inversions. · e67f3206
      Derek Conkle-Gutierrez authored
      inversionfix.py interprets dnadiff output more accurately, and undoes
      inversions using reverse complementation. When the integrated biodiff
      ran on a test query and reference with 2 known inversions,
      inversionfix.py correctly identified the inversions and created a
      temp file with the inversions fixed, which it then passed to the rest
      of biodiff, resulting in a cleaner output VCF. A few bases along the
      edges of the inversions were left unaltered, producing a couple of
      biodiff entries, but only a few. When run on isolate 1-0007,
      inversionfix.py finds and reverses some inversions.
      
      However, for isolate 4-0010, which has a large inversion with several
      smaller inversions overlapping it, inversionfix.py fails to correct
      them properly. It interprets the start of a sub-inversion as the end
      of the current inversion, so the large inversion is interpreted as a
      series of consecutive inversions. Thus the synteny of the components
      is still flipped, even if their contents are now the right way
      around. I do not yet know whether dnadiff records any information
      regarding the nested structure of these inversions, or whether I will
      need another method to find and reverse nested inversions. Perhaps
      someone has already written a paper...
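
      Undoing a single inversion by reverse complementation can be
      sketched in Python as below. It assumes 1-based inclusive start/end
      in query coordinates and plain ACGTN sequence. `undo_inversion` and
      `reverse_complement` are hypothetical names. Applying the function
      to each interval in turn is only safe when the intervals do not
      overlap. For a nested case like 4-0010, reverse-complementing an
      outer interval [s, e] also moves an inner position p to s + e - p.
      Any inner coordinates would therefore have to be remapped after
      each step.

```python
# Complement table for plain DNA; N maps to N and case is preserved.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the result."""
    return seq.translate(_COMPLEMENT)[::-1]


def undo_inversion(seq: str, start: int, end: int) -> str:
    """Return seq with positions start..end (1-based, inclusive) reverse-complemented."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("inversion coordinates out of range")
    i, j = start - 1, end  # convert to a 0-based, end-exclusive slice
    return seq[:i] + reverse_complement(seq[i:j]) + seq[j:]
```

      Because reverse complementation is its own inverse, applying
      undo_inversion twice to the same interval restores the original
      sequence. This makes a quick sanity check on the coordinates.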
  8. 07 Apr, 2018 1 commit
    • inversionfix.py and biodiff should work with inversions now · 813a455a
      Derek Conkle-Gutierrez authored
      inversionfix.py now outputs a temp FASTA with the inversions undone,
      and a temp VCF noting the inversions. I checked its output with
      4-0010, and the temp file has its sequence inverted compared to the
      copy in $GROUPHOME/data/genomes, where I expect it to be.
      
      biodiff now calls dnadiff and inversionfix with appropriate arguments
      for the isolate it is running on, and uses the temp FASTA as the
      query. However, biodiff still cannot finish when running on 4-0010,
      nor can the original biodiff.
      
      Next I will have inversionfix.py change the transposons; hopefully
      biodiff will then be able to run on 4-0010. If not, I will attempt
      debugging.
      
      Eventually I will write a script to take in udiff2vcf's standard
      output, integrate it with the temp VCF from inversionfix.py, and
      label the GAPs as insertions and deletions as appropriate.
  9. 03 Apr, 2018 1 commit
  10. 14 Feb, 2018 4 commits
  11. 08 Feb, 2018 4 commits
  12. 21 Feb, 2017 1 commit
  13. 17 Feb, 2017 2 commits
  14. 16 Feb, 2017 6 commits
  15. 27 Jan, 2017 6 commits
  16. 18 Aug, 2016 3 commits
  17. 28 Jul, 2016 1 commit