Adding pseudo tags to RATT annotations whose original start codon (according to the reference) has been lost
We currently post-process RATT's annotations and add pseudo tags to genes that have premature stop codons (w.r.t. the reference), but we don't look for cases where the start codon has been lost and the next available start codon occurs much further downstream.
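To make the "start codon lost, next one much later" condition concrete, here is a minimal Python sketch. It assumes the query sequence is supplied starting at the position aligned to the reference start codon, treats ATG, GTG and TTG as start codons, and uses a hypothetical `max_shift_fraction` cutoff. None of these names come from RATT or from our pipeline, and the sketch deliberately ignores in-frame stop codons upstream of the new start.

```python
from typing import Optional

# ASSUMPTION: standard bacterial start codons.
START_CODONS = frozenset({"ATG", "GTG", "TTG"})


def start_codon_shift(seq: str) -> Optional[int]:
    """Return the offset (in nt) of the first in-frame start codon.

    Hypothetical input convention: `seq` begins at the position aligned to
    the reference start codon. 0 means the original start is intact;
    None means no in-frame start codon was found at all.
    """
    seq = seq.upper()
    # Step through complete codons only, staying in the reference frame.
    for offset in range(0, len(seq) - 2, 3):
        if seq[offset:offset + 3] in START_CODONS:
            return offset
    return None


def start_lost_far(seq: str, ref_len: int, max_shift_fraction: float = 0.2) -> bool:
    """Flag genes whose nearest start codon lies "much later" than the reference one.

    `max_shift_fraction` is a hypothetical threshold, expressed as a fraction
    of the reference gene length.
    """
    shift = start_codon_shift(seq)
    if shift is None:
        return True
    return shift > max_shift_fraction * ref_len
```

For example, a query whose first in-frame start sits 33 nt downstream of the reference start would be flagged against a 100 nt reference gene, but a query that still begins with ATG would not.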
@derekcg (hat tip to him for pointing this out) brought up the case of hsdS', where a tandem pseudogenized duplicate recombines with the working copy to create phase variation:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7662400/ https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4190663/
A similar hsdS' situation might also be happening in Mtb.
Potential solution
We can check RATT's report log to see which genes had a "Corrected start", then check each of those genes' alignment coverage w.r.t. the reference gene. If the gene is much shorter than the reference (i.e., it falls below the configured alignment coverage thresholds), we tag it as pseudo.
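A minimal Python sketch of that check follows. It is not tied to RATT's actual report format: it assumes each relevant log line begins with the gene ID followed by a tab, and it uses the query/reference length ratio as a stand-in for alignment coverage. `Gene`, `corrected_start_ids`, `tag_truncated_starts` and the 0.8 default threshold are all hypothetical placeholders for our real data structures and configured values.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set


@dataclass
class Gene:
    """Hypothetical minimal record for a transferred annotation."""
    gene_id: str
    length: int  # CDS length in nucleotides
    pseudo: bool = False


def corrected_start_ids(log_lines: Iterable[str]) -> Set[str]:
    """Collect gene IDs from report lines that mention "Corrected start".

    ASSUMPTION: the gene ID is the first tab-separated field of the line.
    Check the real RATT report layout before relying on this.
    """
    ids = set()
    for line in log_lines:
        if "Corrected start" in line:
            ids.add(line.split("\t", 1)[0].strip())
    return ids


def tag_truncated_starts(
    genes: Dict[str, Gene],
    ref_lengths: Dict[str, int],
    log_lines: Iterable[str],
    min_coverage: float = 0.8,  # hypothetical; use the configured threshold
) -> Set[str]:
    """Mark start-corrected genes as pseudo when they are much shorter than the reference."""
    tagged = set()
    for gene_id in corrected_start_ids(log_lines):
        gene = genes.get(gene_id)
        ref_len = ref_lengths.get(gene_id)
        if gene is None or not ref_len:
            continue  # no matching annotation or reference length; skip
        # Length ratio as a proxy for alignment coverage of the reference gene.
        coverage = gene.length / ref_len
        if coverage < min_coverage:
            gene.pseudo = True
            tagged.add(gene_id)
    return tagged
```

Using the length ratio keeps the sketch self-contained; a real implementation would more likely take the aligned span of the reference gene from the alignment step and reuse the existing coverage thresholds.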