On Wednesday, 19 February 2025, Andrew C via <atc-HPequipgroup=auscal.net@groups.io> wrote:
> John - re the Google workflow
>
> This is new to me, and interesting, but before I head down that rabbit hole do you know how it handles non-English languages?
> My current problem is a trove of vintage calculator service manuals, written in technical German and presented as hundreds of pages of image PDF.
> Goal is to extract English text - I will deal with the diagrams/schematics later.
>
> Do you know anything in the Google workflow that may be an advantage, or a dealbreaker?
>
> Thanks, just let me know what you know off the top of your head, I will dig into it myself if there are no obvious dealbreakers.
No comments about the workflow, but I'm doing this for a 1930s calculator manual in High German.
After posterisation, Tesseract with the relevant plugin is, say, 99% accurate, but it is still necessary to compare each character with the original by hand and edit the text. The edited text is then fit for input to Google Translate.
Slow going; improvements welcome.
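
For anyone wanting to try the same posterise-then-OCR step, here is a minimal sketch in Python. It is not my actual setup: it assumes the third-party Pillow and pytesseract packages, a local Tesseract install with German language data, and the names `posterise_levels` and `ocr_german_page` are made up for illustration. The language code "deu" is also an assumption; a Fraktur-specific model may suit better if the manual is set in blackletter.

```python
def posterise_levels(pixels, bits=2):
    """Keep only the top `bits` bits of each 8-bit grey value.

    Pure-Python illustration of what posterisation does to pixel values.
    """
    mask = ~(2 ** (8 - bits) - 1) & 0xFF
    return [p & mask for p in pixels]


def ocr_german_page(path, bits=2, lang="deu"):
    """Posterise one scanned page and run Tesseract on it.

    Assumes Pillow and pytesseract are installed and that Tesseract
    has language data for `lang` (an assumption, not from the thread).
    """
    # Imported here so the helper above works without these packages.
    from PIL import Image, ImageOps
    import pytesseract

    page = Image.open(path).convert("L")        # greyscale
    page = ImageOps.posterize(page, bits)       # reduce to 2**bits grey levels
    return pytesseract.image_to_string(page, lang=lang)
```

Usage would be something like `text = ocr_german_page("page_001.png")` for each page image exported from the PDF (the file name is hypothetical). The output still needs the character-by-character check against the original before it goes to Google Translate.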