changed how >10bp variants from diff are reported

Large ALT and REF column entries now read <SUBLEN= followed by
the length of the ALT or REF produced by udiff2vcf and a closing >.
This is not really VCF format, but it will do for now. Biodiff seems
to report variation as substitutions where possible, rather than as
indels. Perhaps the VCF output should embrace that.
parent b06e560f
@@ -23,10 +23,10 @@ def main():
     # Note: All SV entrees are at the start of the input file
 else: # parse other entries and adjust based on length of REF/ALT and position relative to inversions
-    if len(vcfbits[3]) > 10 or len(vcfbits[4]) > 10:
-        vcfbits[3] = '.'
-        vcfbits[4] = '.'
-        vcfbits[7] = 'SVLEN=' + str(len(vcfbits[4]))
+    if len(vcfbits[3]) > 10:
+        vcfbits[3] = '<SUBLEN=' + str(len(vcfbits[3])) + '>'
+    if len(vcfbits[4]) > 10:
+        vcfbits[4] = '<SUBLEN=' + str(len(vcfbits[4])) + '>'
     intrainv = False # flag to track if variant is inside an inversion
     nearinv = False # flag to track if variant is near the edge of an inversion
     for inv in inversionlist:
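The new behaviour can be sketched as a standalone function. This is only an illustration, not the code in the repository: the name `mask_long_alleles`, the constant `MAX_ALLELE_LEN`, and the sample record are all made up here, and the record is assumed to be a tab-split VCF line with REF at index 3 and ALT at index 4, as in the hunk above.

```python
MAX_ALLELE_LEN = 10  # same threshold as in the hunk above (hypothetical constant name)


def mask_long_alleles(vcfbits, max_len=MAX_ALLELE_LEN):
    """Return a copy of a split VCF record with long REF/ALT replaced by <SUBLEN=n>."""
    masked = list(vcfbits)  # copy, so the caller's list is left untouched
    for col in (3, 4):  # 0-based VCF columns: 3 = REF, 4 = ALT
        if len(masked[col]) > max_len:
            # Take the length before overwriting, so the original size is reported.
            masked[col] = '<SUBLEN=' + str(len(masked[col])) + '>'
    return masked


# Made-up record: 15 bp REF, 1 bp ALT.
record = "chr1\t100\t.\tACGTACGTACGTACG\tA\t.\t.\t.".split("\t")
print("\t".join(mask_long_alleles(record)))
```

REF and ALT are checked independently, so a long REF with a short ALT keeps the ALT sequence. The removed lines instead blanked both columns and computed the SVLEN value after ALT had already been set to '.', so the length it recorded would always have been 1.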