Performance difference to NeMo results
Find out why our results differ from the (greedy) results reported by NeMo.
The model conversion seems to be correct, so the gap may come from differences in the dataset or in the post-processing steps.
Other projects such as https://github.com/domcross/german-stt-evaluation or https://arxiv.org/pdf/2204.05617.pdf (Table 4) report yet other results.
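
One way to test the post-processing hypothesis is to compute the word error rate (WER) on the same transcripts with and without text normalization. The sketch below is only an illustration: `normalize`, `word_errors` and `wer` are hypothetical helpers, and the normalization rules (lowercasing, umlaut replacement, punctuation removal) are assumptions, not the rules used by NeMo or by this project.

```python
import re


def normalize(text: str) -> str:
    """Hypothetical German normalization; the real pipelines may differ."""
    text = text.lower()
    for src, dst in (("ä", "ae"), ("ö", "oe"), ("ü", "ue"), ("ß", "ss")):
        text = text.replace(src, dst)
    # Replace everything except a-z and spaces, then collapse whitespace.
    text = re.sub(r"[^a-z ]", " ", text)
    return " ".join(text.split())


def word_errors(reference: str, hypothesis: str) -> int:
    """Word-level Levenshtein distance between two sentences."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, 1):
            curr.append(min(
                prev[j] + 1,                           # deletion
                curr[j - 1] + 1,                       # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(pairs, normalizer=None) -> float:
    """Corpus-level WER: total word errors / total reference words."""
    errors = words = 0
    for ref, hyp in pairs:
        if normalizer is not None:
            ref, hyp = normalizer(ref), normalizer(hyp)
        errors += word_errors(ref, hyp)
        words += len(ref.split())
    return errors / words if words else 0.0


# Made-up example pair, not taken from any evaluation dataset.
pairs = [("Das Mädchen läuft nach Hause.", "das maedchen laeuft nach hause")]
raw_wer = wer(pairs)
normalized_wer = wer(pairs, normalizer=normalize)
print(f"raw WER: {raw_wer:.2f}, normalized WER: {normalized_wer:.2f}")
```

If the two numbers differ a lot on the real test set, the normalization applied before scoring is a likely source of the gap.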
Edited by DANBER