inversionfix.py now blasts querry region around inversion to reference

During the blast position refinement step, inversionfix.py was
previously blasting the region of inversion
(in terms of querry position) against querry region of inversion,
which is incorrect, as the querry position is not the equivalent
region of alingment in the reference.

Now inversionfix.py takes a region of the querry sequence around
the inversion start/stop, and blasts that against the whole
reference fasta. Then uses the first breakpoint (start of
querry alignment, + the start of the region splice in the querry)
as the new inversion start/stop.

This did not effect output for the test data, and now biodiff
can run on isolate 1-0031 without error.
parent d93b3eef
......@@ -5,14 +5,13 @@ from Bio import SeqIO
from Bio.Blast.Applications import NcbiblastnCommandline
def refineposition(refasta, queryseq, position, out):
def refineposition(refasta, queryseq, position):
# blast 100 bp of query sequence around the start and end or each inversion against reference, to refine position
blast_command = NcbiblastnCommandline(cmd='blastn', task='megablast', subject=refasta,
outfmt=6, subject_loc=str(position-50) + '-' + str(position+50))
joinseq = ''.join(queryseq)
blast_command = NcbiblastnCommandline(cmd='blastn', task='megablast', subject=refasta, outfmt=6)
joinseq = ''.join(queryseq[(position-50):(position+50)])
stdout, stderr = blast_command(stdin=joinseq)
qstop = stdout.rstrip('\n').split('\t')[out]
return int(qstop)
qstop = stdout.rstrip('\n').split('\t')[6]
return int(qstop) + (position-50) - 1 # -1 to put into 0 based indexing for biopython
def main():
......@@ -49,8 +48,8 @@ def main():
# refine inversion coordinates using blast
for inv in inversions:
inv['Start'] = refineposition(refasta=args.rfasta, queryseq=qseq, position=inv['Start'], out=9)
inv['End'] = refineposition(refasta=args.rfasta, queryseq=qseq, position=inv['End'], out=8)
inv['Start'] = refineposition(refasta=args.rfasta, queryseq=qseq, position=inv['Start'])
inv['End'] = refineposition(refasta=args.rfasta, queryseq=qseq, position=inv['End'])
# undo the inversions found in args.qdiff
fixedrecord = qrecord
......
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment