Explanation of the nucleotide probability scoring mechanism
Hi,
I'm reading through the quality-score output code that was added after issue #4. I'm trying to understand the math behind these scores, and while I've made a fair amount of progress, I still have questions about the reasoning behind some of the calculations. I haven't found a description of these calculations in the papers associated with lamassemble (references at the bottom). Are there other papers that formed the basis for how the probability of choosing a particular nucleotide for each alignment column is calculated? The closest I've found are the papers describing the phred quality score, but lamassemble makes some specific choices that differ from that approach (for example, relying on the probability matrix in the required last-train .mat file).
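For context, here is my understanding of the standard phred relationship I'm comparing against: a base called with error probability P gets quality Q = -10 log10(P). The Python sketch below assumes a per-column probability that the chosen base is correct is already available. How lamassemble derives that probability from the last-train matrix is exactly my question, so it is not modeled here. The function names, the cap of Q = 93, and the phred+33 (Sanger FASTQ) encoding are my own assumptions for illustration, not confirmed lamassemble behavior.

```python
import math


def phred_quality(p_correct: float, max_q: int = 93) -> int:
    """Hypothetical helper: convert the probability that a consensus base is
    correct into a phred quality score, Q = -10 * log10(P_error)."""
    p_error = 1.0 - p_correct
    if p_error <= 0.0:
        # A perfectly certain call would give Q = infinity, so cap it
        # (93 is the highest score printable with phred+33 encoding).
        return max_q
    q = -10.0 * math.log10(p_error)
    return min(max_q, int(round(q)))


def fastq_char(q: int) -> str:
    """Hypothetical helper: encode a quality score as a FASTQ character
    using the Sanger (phred+33) offset."""
    return chr(q + 33)


# Made-up probabilities for three consensus columns, not real lamassemble output.
for p in (0.9, 0.99, 0.999):
    q = phred_quality(p)
    print(p, q, fastq_char(q))
# 0.9 -> Q10 '+', 0.99 -> Q20 '5', 0.999 -> Q30 '?'
```

Each tenfold drop in error probability adds 10 to Q, which is why I'm curious where lamassemble's per-column probabilities depart from this plain phred scheme.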
References:
A pipeline for complete characterization of complex germline rearrangements from long DNA reads. Genome Med 12, 67 (2020).
Frith MC, Mitsuhashi S, Katoh K. lamassemble: multiple alignment and consensus sequence of long reads. Methods Mol Biol (2020), in press.