Service manual scan post processing
Adobe Photoshop (Automate -> Photomerge). On 19.02.2025 at 09:59, Peter Brown wrote:
|
I've tried a panoramic photo tool on both scans and photographs, but the results were unsatisfactory. I've had to resort to using GIMP on the posterised TIFF files, or just posting separate pages so that a user can do the job if it matters to them. On Wed, 19 Feb 2025 at 09:59, Peter Brown via <peter=[email protected]> wrote:
|
I do the same. It also allows one to clean up the document, particularly to remove fold shadows, spots and other minor but annoying defects.
DaveD
On 2/19/2025 5:09 AM, DF6NA Rainer via groups.io wrote:
|
John - re the Google workflow
This is new to me, and interesting, but before I head down that rabbit hole do you know how it handles non-English languages?
My current problem is a trove of vintage calculator service manuals, written in technical German and presented as hundreds of pages of image PDF.
The goal is to extract English text - I will deal with the diagrams/schematics later.
Do you know of anything in the Google workflow that may be an advantage, or a dealbreaker?
Thanks, just let me know what you know off the top of your head; I will dig into it myself if there are no obvious dealbreakers.
|
On Wednesday, 19 February 2025, Andrew C via <atc-HPequipgroup=[email protected]> wrote:
> This is new to me, and interesting, but before I head down that rabbit hole do you know how it handles non-English language?
> My current problem is a trove of vintage calculator service manuals, written in technical German and presented as hundreds of pages of image PDF.
No comments about the workflow, but I'm doing this for a 1930s calculator manual in High German. After posterisation, Tesseract with the relevant plugin is, say, 99% accurate, but it is still necessary to manually compare each character with the original and edit the text. The edited text is fit for input to Google Translate. Slow going; improvements welcome.
|
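For anyone wanting to script the OCR step described above, here is a minimal sketch. It only builds (and optionally runs) a Tesseract command line of the form `tesseract <image> <outputbase> -l <lang>`. The file names and the choice of the `deu` language code are assumptions for illustration; older German print may well need different trained data, and the posterisation step is not shown.

```python
import shutil
import subprocess


def tesseract_command(image_path, output_base, lang="deu"):
    """Build the argument list for: tesseract <image> <outputbase> -l <lang>.

    The default language code "deu" is an assumption; pick whatever
    trained data suits the manual being processed.
    """
    return ["tesseract", image_path, output_base, "-l", lang]


def run_ocr(image_path, output_base, lang="deu"):
    """Run Tesseract if it is installed and return the text file it writes."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    subprocess.run(tesseract_command(image_path, output_base, lang), check=True)
    # Tesseract appends ".txt" to the output base name for plain-text output.
    return output_base + ".txt"
```

Calling `run_ocr("page_001.tif", "page_001")` (hypothetical file names) would leave the recognised text in `page_001.txt`, ready for the manual character-by-character check mentioned above.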
That's a great question and I honestly don't know. I would think that Google has a fairly robust language capability (they have the translate tools, after all) but I don't know how that might play into OCR.
I bet that Google Translate is available as an API that you could run your OCR'd text through, but I don't have any actual knowledge about that. On 2/19/25 17:59, Andrew C via groups.io wrote:
|
I use software called Hugin. Bit of a learning curve, but it works semi-reliably once you get it down. However, I would not want to do hundreds of stitches using Hugin, so if the Adobe solution is a one-click type of thing, that might be the way to go (I have no idea about it).
I processed a bit of the 229 MB PDF file you uploaded (the vol 4) and made a new folder for my processed version with some notes. Not exactly pretty, but the file sizes are down and the text-only pages look OK and are quite small in file size. The bulk of the file size comes from the large stitched page. The last stitched page could probably be converted to G4 to save some space. The initial page is ugly. The original files are indeed some kind of "halftone" type of black dot scheme, which is less than ideal.
-Michael Bierlein
On Wed, Feb 19, 2025 at 03:59 Peter Brown <peter@...> wrote:
|
On Thu, 20 Feb 2025 at 07:14, Michael Bierlein via <bierl008=[email protected]> wrote:
I've tried Hugin for schematics, and was unsuccessful: many of the continuous lines were no longer continuous, and it connected some lines incorrectly. What settings did you use for the stitching? |
I pretty much just follow this guide, which can be confusing at times (the wording is not totally clear at some points). Especially after you create the control points, you have to do this strange zoom, move, and crop, and hope everything turns out OK. And the preview it creates during the process will look like crap; you have to remember to click "Calculate Optimal Size" at the end.
I have definitely had Hugin fail to connect some schematic lines before, and some schematics have been completely unsuccessful using it, but I have had successes elsewhere. To my knowledge, it hasn't connected any lines incorrectly; that could be a control points issue.
-Michael
On Thu, Feb 20, 2025 at 02:59 Tom Gardner via <tggzzz=[email protected]> wrote:
|
Whilst looking through the service manual for the Canon MS-800 I noticed that it listed a 128Mb SO-DIMM as being optional and necessary for the greyscale option.
My machine did not have this fitted, so I popped a PC66 stick in and, hey presto, the software at the PC end of the system now reports that 16 / 256 level grayscale is available as a scan parameter (previously only black and white). It also appears to have opened up a contrast option on the PC end.
Great that all the software appears to be resident on the hardware by default.
I wonder how much they charged for the grayscale option back in the day?
Peter |
Nice! That is a pleasant surprise. I would have expected that an EPROM change might be required. I wonder if my Minolta has a similar upgrade. I will look into it if I ever get the SCSI communication to work. Of course, the grayscale will be much better for the image, schematic, and diagram pages. The text-only pages are probably fine as black and white, but as a rule I always ingest as color so I don't nitpick each and every page at the scanning ingest step.
-Michael
On Fri, Feb 21, 2025 at 09:15 Peter Brown <peter@...> wrote:
|
Peter, Apologies about the late reply. I have a Minolta MS-6000 MKii. The thing is, SCSI was an option, and my Minolta originally did NOT have that option. I found someone parting out a SCSI equipped version and I installed the SCSI hardware on my Minolta, but it is not a completely straightforward process. I have not worked on the project for a number of years now. Any luck on getting some of the fiche scanned in grayscale? -Michael On Fri, Feb 21, 2025 at 14:47 Peter Brown <peter@...> wrote:
|
Following the unlocking of gray scale mode on the Canon MS-800, I have scanned the service manual for the 11707A test plugin.
I have uploaded .tiff files for each page to the temporary file section here
temporary directory for photographs and help relating to emails and posting/11707A Service PFX 1525A 11707-90007 SEP 1977 rev 1
This should represent 'the best the machine can do', though I do need to experiment a little with the automatic exposure / manual exposure options.
I would be interested in feedback from members as to how well their post processing for .pdf works with these files.
I am especially interested in ways of preserving the picture quality whilst also tidying up / compressing the pages of text.
Peter
|
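One way to approach the "keep the pictures, squeeze the text" question above is to decide per page whether it is safe to reduce to black and white. The sketch below is only an illustration of that idea under stated assumptions: pages are represented as flat lists of 8-bit grey values (as an image library could supply), and the cut-offs of 64, 192, 10% and 128 are made-up starting points, not values tuned against the uploaded files.

```python
def midtone_fraction(pixels, low=64, high=192):
    """Fraction of 8-bit pixels strictly between low and high (mid-greys)."""
    if not pixels:
        return 0.0
    mid = sum(1 for p in pixels if low < p < high)
    return mid / len(pixels)


def choose_encoding(pixels, max_midtone=0.10):
    """Return 'bilevel' for text-like pages, 'grayscale' for picture-like ones.

    A page that is almost entirely near-black and near-white is assumed
    to be text or line art; the 10% limit is an arbitrary assumption.
    """
    if midtone_fraction(pixels) <= max_midtone:
        return "bilevel"
    return "grayscale"


def binarize(pixels, threshold=128):
    """Map 8-bit grey values to 0 (black) or 255 (white)."""
    return [0 if p < threshold else 255 for p in pixels]
```

Pages classified as "bilevel" could then be thresholded and stored with a bilevel compression scheme, while "grayscale" pages are left at full depth. Pages mixing text and photographs would need a per-region decision, which this sketch does not attempt.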
As long as there is a TWAIN Windows driver for the beast you might try NAPS2.
On 27/02/2025 15:42, Peter Brown wrote:
|
As far as I can tell all of the TWAIN files are being installed correctly. The scanner appears in the Windows hardware list but does not appear when you interrogate TWAIN devices.
I do not have a good enough knowledge of how TWAIN works.
My fear is that the TWAIN driver looks for a specific name string to be reported by the attached scanner and my scanner is not reporting that exact string.
I am still trying to find a reference that shows, at a machine level, how TWAIN works or how you go about writing a TWAIN driver - any pointers gratefully received.
Peter
|
Any other takers for trying to post process the 256 level .tiff files? They can be found at
files - temporary directory for photographs and help relating to emails and posting/11707A Service PFX 1525A 11707-90007 SEP 1977 rev 1
The files contain pages of just text, pages with text and images, and pages with circuit diagrams.
I'll leave them up for another couple of days and then remove them, as they are pretty big.
Peter |
On 2/28/25 10:00, Peter Brown wrote:
> Any other takers for trying to post process the 256 level .tiff files?
Peter, I'll take a whack at it. Stand by.
-Dave
--
Dave McGuire, AK4HZ
New Kensington, PA
|
I've tried "my" scancvt mechanism on file 11707A0014.tif and I'm not satisfied with the results. The file size is 93 kB, but characters "4", "6", "R", "B" and similar have the loop filled in, for reasons that aren't clear to me. Have you used a Gaussian filter when generating those files, or are they the raw scanner output? On Fri, 28 Feb 2025 at 17:45, Dave McGuire via <mcguire=[email protected]> wrote: On 2/28/25 10:00, Peter Brown wrote: |
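One possible explanation for the filled-in loops, offered as an assumption rather than a diagnosis of these particular files: if blur (from the optics or from a smoothing filter) leaves the small enclosed areas of those characters mid-grey instead of white, a global threshold set above that grey level turns them black. The toy sketch below shows the effect on a single row of invented pixel values cut through such a loop; the values and both thresholds are purely illustrative.

```python
def binarize_row(row, threshold):
    """Render a row of 8-bit grey values as '#' (black) or '.' (white)."""
    return "".join("#" if p < threshold else "." for p in row)


# Invented cross-section through a character loop:
# paper, stroke, stroke, blurred interior, stroke, stroke, paper.
row = [250, 40, 40, 140, 40, 40, 250]

open_loop = binarize_row(row, 128)    # interior (140) stays white
filled_loop = binarize_row(row, 160)  # interior (140) falls below the threshold

print(open_loop)
print(filled_loop)
```

If this is what is happening, lowering the threshold or thresholding locally rather than globally might keep the loops open, at the cost of thinner strokes elsewhere.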