• Author Owner

    workflow concept (a rough data-model sketch follows the list):

    1. launch game and tool
    2. play until dialogue
    3. push "capture" button/hotkey
    4. advance dialogue
    5. repeat 3 and 4
    6. push "tag" button
    7. captured moment is shown (screenshot or savestate), draw boxes over glyphs
    8. push "next" button/hotkey
    9. repeat 7 and 8
    10. push "map" button
    11. glyph is shown, enter Unicode equivalent (native text)
    12. push "next" button/hotkey
    13. repeat 11 and 12
    14. push "translate" button
    15. play until dialogue
    16. push "capture" button/hotkey
    17. copy transcribed text into Google Translate and hope
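
    Roughly what that implies data-wise, as a sketch only (every type and member name below is hypothetical, nothing the tool actually defines): capture yields a moment, tagging yields boxes over a moment, mapping fills a glyph→text table, and the translate step just concatenates lookups.

    ```csharp
    // Hypothetical data model for the workflow above; names are invented for illustration.
    using System.Collections.Generic;
    using System.Drawing;
    using System.Linq;

    public sealed record CapturedMoment(int Frame, byte[] Screenshot); // "capture" step; could also hold a savestate

    public sealed record GlyphBox(int MomentIndex, Rectangle Bounds);  // one box drawn during "tag"

    public sealed class GlyphMap // filled in during "map"
    {
        // keyed by a hash of the glyph's pixels so the same glyph only has to be entered once
        private readonly Dictionary<string, string> _toText = new();

        public void Map(string glyphHash, string text) => _toText[glyphHash] = text;

        // "translate" step input: transcribe a sequence of tagged glyphs, leaving unmapped ones visible
        public string Transcribe(IEnumerable<string> glyphHashes)
            => string.Concat(glyphHashes.Select(h => _toText.TryGetValue(h, out var t) ? t : "□"));
    }
    ```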

    edit 2023-08-10: I've mostly implemented the first part, glyph capture. The UX is very different from what I initially envisioned. You can see the bright-yellow OSD in one of the Yoshi's Island screenshots below. It should be extensible to variable-width text, though when I come back to this project in a few days I'd prefer to just find another game and get right into the next part, mapping. Seeing as I already spent a whole day making an on-screen keyboard for it...

    edit 2024-07-16: "when I come back to this project in a few days" ...is that so? Anyway, ScHlAuChi expressed interest in the project, which led me to finally come back and start on the latter workflow, mapping glyphs to codepoints. After much ado I have something that half works, with the caveat that the grid is hardcoded for this one textbox since I never finished that: screencap
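
    Concretely, "the grid is hardcoded" means something along these lines, with the textbox sliced into fixed-size cells at fixed offsets; the origin and cell numbers here are invented, not the real ones.

    ```csharp
    // Sketch of a hardcoded textbox grid; origin and cell dimensions are made up for illustration.
    using System.Collections.Generic;
    using System.Drawing;

    public static class TextboxGrid
    {
        private static readonly Point Origin = new(16, 144); // top-left of the textbox, in screen pixels
        private const int CellW = 8, CellH = 16, Cols = 28, Rows = 3;

        // enumerate the screen rectangle of every glyph cell, row by row
        public static IEnumerable<Rectangle> Cells()
        {
            for (var row = 0; row < Rows; row++)
                for (var col = 0; col < Cols; col++)
                    yield return new Rectangle(Origin.X + col * CellW, Origin.Y + row * CellH, CellW, CellH);
        }
    }
    ```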

  • Author Owner

    older games have nice grids (update: AAAAAAAAAAAAA):

    Yoshi's Island screencap

    Yoshi's Island screencap
    Yoshi's Island screencap in GIMP

    but newer games, while still using sprites, have variable-width glyphs:

    Mario and Luigi screencap
    Mario and Luigi screencap in GIMP

    (at first glance it looks like it just needs aligning: the kana seem to all be 8 px wide and the kanji all 12 px, suggesting a 4 px alignment grid, but the small イ is a pixel narrower and throws it off)
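
    One alternative to guessing an alignment grid would be segmenting on blank pixel columns. A sketch, not what the tool currently does; it assumes adjacent glyphs always have at least one fully blank column between them, which tight kerning could break.

    ```csharp
    // Sketch: segment variable-width glyphs in one line of text by splitting on blank columns.
    using System.Collections.Generic;

    public static class GlyphSegmenter
    {
        // ink[x, y] is true where a text pixel is set, for a single line of text
        public static IEnumerable<(int Start, int Width)> Segment(bool[,] ink)
        {
            int width = ink.GetLength(0), height = ink.GetLength(1);
            var start = -1;
            for (var x = 0; x <= width; x++) // one column past the end so a trailing glyph gets closed
            {
                var blank = true;
                if (x < width)
                    for (var y = 0; y < height && blank; y++)
                        if (ink[x, y]) blank = false;

                if (!blank && start < 0) start = x;   // first inked column: a glyph begins
                else if (blank && start >= 0)         // blank column after a glyph: it ends
                {
                    yield return (start, x - start);
                    start = -1;
                }
            }
        }
    }
    ```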

  • Author Owner

    NEVERMIND, GOT IT! https://www.nuget.org/packages/TesserNet

    ...but now I'm seeing that Tesseract doesn't agree with the Yoshi's Island font in the slightest
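
    No idea whether it rescues this particular font, but the usual first step before pointing Tesseract at tiny pixel fonts is an integer nearest-neighbour upscale plus a hard black/white threshold. A library-agnostic sketch over raw grayscale bytes:

    ```csharp
    // Sketch of common Tesseract pre-processing for pixel fonts: integer upscale + binarise.
    public static class OcrPreprocess
    {
        // pixels: 8-bit grayscale, row-major, width*height long; returns the image scaled up by `factor`
        public static byte[] Upscale(byte[] pixels, int width, int height, int factor)
        {
            var outW = width * factor;
            var result = new byte[outW * height * factor];
            for (var y = 0; y < height * factor; y++)
                for (var x = 0; x < outW; x++)
                    result[y * outW + x] = pixels[(y / factor) * width + (x / factor)];
            return result;
        }

        // crush anti-aliasing and textbox background down to pure black/white
        public static byte[] Threshold(byte[] pixels, byte cutoff = 128)
        {
            var result = new byte[pixels.Length];
            for (var i = 0; i < pixels.Length; i++)
                result[i] = pixels[i] >= cutoff ? (byte)255 : (byte)0;
            return result;
        }
    }
    ```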


    https://www.nuget.org/packages/TesseractOCR its API isn't psychotic like the other Tesseract wrapper's, but it needs a tiny patch before I can even try hacking on a POC.

    https://www.nuget.org/packages/Sdcb.PaddleOCR promising, but doesn't ship binaries for Linux

    https://github.com/cyanfish/naps2/tree/master/NAPS2.Sdk hey look, not only is this PDF scanner FOSS, this one is actually modular! unfortunately, while the modules look nice, they're not published to NuGet yet. also it's just an abstraction over Tesseract.

    https://scribeocr.com https://github.com/scribeocr/scribe.js pros: FOSS, not Tesseract; cons: JS, only English and other Latin-script languages atm

  • Author Owner

    need to handle scrolling textboxes
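
    Assuming "handle" means merging overlapping captures as the box scrolls, one sketch: keep a running transcript and only append the lines that haven't been seen yet (lines here being whatever the capture step produces, e.g. strings of glyph hashes or transcribed text).

    ```csharp
    // Sketch: merge captures of a scrolling textbox by dropping the overlapping lines.
    using System;
    using System.Collections.Generic;
    using System.Linq;

    public sealed class ScrollingTranscript
    {
        private readonly List<string> _lines = new();

        public void Add(IReadOnlyList<string> capturedLines)
        {
            // find the longest run where the tail of the transcript matches the head of this capture
            var overlap = 0;
            for (var k = Math.Min(_lines.Count, capturedLines.Count); k > 0; k--)
            {
                if (_lines.Skip(_lines.Count - k).SequenceEqual(capturedLines.Take(k)))
                {
                    overlap = k;
                    break;
                }
            }
            _lines.AddRange(capturedLines.Skip(overlap)); // append only the new lines
        }

        public IReadOnlyList<string> Lines => _lines;
    }
    ```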

  • Author Owner

    Great results! Sadly not FOSS... screencap

  • Author Owner

    Just remembered the electronic jisho (dictionary) on the DS has OCR... The input region is fairly low-resolution; I wonder if its algorithms would work on NN-upscaled sprites?

  • Author Owner

    found this, mainly intended for Chinese<->English from photographs of printed or handwritten text https://paddlepaddle.github.io/PaddleOCR/en/ppocr/overview.html

    some ML solutions:

  • Author Owner

    prior art (heavily reliant on Google) https://gitlab.com/spherebeaker/vgtranslate https://ztranslate.net/docs

    and another existing tool, powered by manual (cutscene detection and?) localisation https://github.com/eadmaster/RetroSubs

  • Author Owner

    FCEUX already has this??? https://fceux.com/web/help/TextHooker.html
