On Sun, 2 Mar 2025 at 16:25, Bert via <el_bert0=[email protected]> wrote:
The one thing with image cleanup routines is that they are mostly made for text-based documents and aim to clean up the text. As such, they have no "understanding" of the particularities of schematics. Back in the day, there were vectorizing tools to help engineering firms convert scanned drawings, the ones on dead, ground-up trees, into CAD files. But they required a lot of work to verify the accuracy of the resulting CAD file, and possibly to re-verify and re-authorize it.
A very useful amount of cleanup can be done simply by comparing each pixel with the average of the pixels in a wide neighbourhood around it. That works equally well for text and schematics, since it has no concept of either and both are high contrast. It does not work well with photos.
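The neighbourhood comparison above is what image-processing libraries usually call adaptive (local-mean) thresholding. Below is a minimal pure-Python sketch, assuming a greyscale image held as a list of rows (0 = black, 255 = white). The window `radius`, the `offset` margin and the synthetic test page are hypothetical choices for illustration, not values taken from any real scan. A summed-area table keeps each window average cheap however wide the neighbourhood is.

```python
def local_mean_threshold(img, radius=15, offset=10):
    """Binarise a greyscale image (list of rows, 0=black .. 255=white).

    Each pixel is compared with the mean of the (2*radius+1)^2 window
    centred on it (clipped at the image borders). A pixel darker than that
    local mean by more than `offset` becomes ink (0), otherwise background (255).
    `radius` and `offset` are illustrative defaults, not tuned values.
    """
    h, w = len(img), len(img[0])
    # Summed-area table with an extra zero row/column:
    # sat[y][x] = sum of img[0:y][0:x]
    sat = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    out = []
    for y in range(h):
        y0, y1 = max(0, y - radius), min(h, y + radius + 1)
        row = []
        for x in range(w):
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            # Sum of the clipped window in O(1) from four table lookups
            total = sat[y1][x1] - sat[y0][x1] - sat[y1][x0] + sat[y0][x0]
            mean = total / ((y1 - y0) * (x1 - x0))
            row.append(0 if img[y][x] < mean - offset else 255)
        out.append(row)
    return out


def make_test_page(width=60, height=20, ink_cols=(10, 11, 50, 51)):
    """Hypothetical synthetic 'scan': the background brightens from left to
    right (uneven illumination) and the ink strokes sit 80 levels below it."""
    page = []
    for _ in range(height):
        row = []
        for x in range(width):
            background = 120 + 2 * x  # 120 at the left edge, 238 at the right
            row.append(background - 80 if x in ink_cols else background)
        page.append(row)
    return page


if __name__ == "__main__":
    page = make_test_page()
    local = local_mean_threshold(page)
    # A single global slicing level of 150 cannot cope with the gradient:
    # the dim left-hand background (120..148) gets classed as ink.
    global_ = [[0 if p < 150 else 255 for p in row] for row in page]
    print("local :", "".join("#" if v == 0 else "." for v in local[0]))
    print("global:", "".join("#" if v == 0 else "." for v in global_[0]))
```

Running it prints one row of each result. The local version keeps only the four stroke columns, while the fixed global level also blackens the dim left-hand margin, which is one reason uneven illumination makes a single slicing level hard to choose.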
How well such text/schematic cleanup works depends on the original's quality. Unsurprisingly, higher XY resolution and uniform illumination/background are advantageous, since both make the "slicing level" (the threshold between ink and background) easier to determine and more uniformly applicable across the page.
Provided there are sufficient pixels for the text characters to be reasonably well formed (open loops, no "spurs"), OCR is then possible. Evidence: a couple of manuals that I quickly scanned on an ordinary printer/scanner/copier and post-processed into TIFFs inside a PDF file have since been OCRed by somebody else.
I have no comment about vectorising, since I have never had any use for such a tool. I would expect it to be as (un)successful as decompiling binary object code back into high-level language source code.