Service manual scan post processing
Hi Michael,

Many thanks for responding. Please do take a look at the files; I have posted some PDFs here:
Files - A temporary directory for photographs and help relating to emails and posting - 8430A 08340-90021 Service vol 1 section1
If it is of use, I can also add some of the raw .tiff images that the scanner produces.

After lurking on auction sites for ages, I also have a Kodak 2400DSV-E, which I believe is the same as the Minolta MS-6000. I am not sure yet whether it works electrically (I am building up an XP PC to work with it), but mechanically it seems fine.
We have a saying here in the UK: it's like a number 10 bus, you wait around for ages then two come at once.

Let me know how you get on with the PDFs and if you need a few .tiff files to try.

Peter |
Hi Martin,

Thanks for the steer on ImageOptim, I will take a look.

I am in touch with a service engineer who used to run a company that printed microfiche and also supplied the reader / scanners.
He is recommending something called PaperPort. Do any members have experience with this package?

Peter |
I think you may be missing a contrast maximization step. If you zoom in, the pages appear to be greyscale, dithered down to black and white. I don't think you're going to get much better compression than what you've got.
When I'm scanning, I scan to individual 8-bit greyscale (or full RGB color) TIFF files, then push everything through a contrast enhancement step to blow out the contrast before conversion to black and white, i.e. black areas are fully black and whites are fully white. G4 compression works *much* better on areas of absolute color than on dithering.
I've got some commands from NetPBM / pamthreshold that'll do it under Linux, but I don't know of a tool to conveniently do it w/ Windows.
On Tue, Feb 18, 2025 at 9:45 AM Peter Brown via groups.io <peter@...> wrote:
|
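A rough way to see the point about G4 and dithering: CCITT Group 4 codes each scanline in terms of where the colour changes, so the number of black/white transitions per row is a crude proxy for how well a page will compress. The sketch below is plain Python with made-up sample rows (not data from any of the uploaded scans), and `count_transitions` is a hypothetical helper name.

```python
def count_transitions(row):
    """Count colour changes along one scanline of 0 (black) / 1 (white) pixels."""
    return sum(1 for a, b in zip(row, row[1:]) if a != b)

# A mid-grey area rendered by dithering: pixels alternate black and white.
dithered = [0, 1] * 16

# The same width after the contrast has been blown out:
# solid white background with one solid black stroke.
clean = [1] * 12 + [0] * 8 + [1] * 12

print("dithered row transitions:", count_transitions(dithered))
print("clean row transitions:   ", count_transitions(clean))
```

Fewer transitions per row means fewer things for the coder to describe, which is consistent with solid areas compressing far better than dither patterns.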
On Tue, 18 Feb 2025 at 16:03, David Holland via <david.w.holland=[email protected]> wrote:
> I think you may be missing a contrast maximization step. If you zoom
That is highly beneficial, and is the way the scancvt script I posted earlier reduces pages to ~80kBytes for a page of text. The scancvt script determines the average colour, then quantises the image into two levels. |
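The "average, then two levels" idea can be sketched as follows. This is not the scancvt script itself, only a minimal illustration of the description above, assuming an 8-bit greyscale image held as a list of rows; `quantise_two_level` and the sample `page` are made up for the example.

```python
def quantise_two_level(pixels):
    """Map every pixel to 0 (black) or 255 (white) around the image mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return [[255 if p > mean else 0 for p in row] for row in pixels]

# Tiny greyscale "page": light, slightly uneven paper with a few dark ink pixels.
page = [
    [250, 245, 240, 248],
    [247,  30,  25, 244],
    [249,  35, 243, 246],
]

for row in quantise_two_level(page):
    print(row)
```

The uneven paper tones all land on pure white and the ink on pure black, which is the kind of input a bilevel compressor handles well.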
Nope, those appear to be dithered too (and show indications of JPEG compression). I uploaded a highly zoomed screenshot of what I'm seeing. Sorry...
On Tue, Feb 18, 2025 at 11:51 AM Peter Brown via groups.io <peter@...> wrote:
|
On Tue, Feb 18, 2025 at 11:49 AM Tom Gardner via groups.io <tggzzz@...> wrote:
> That is highly beneficial, and is the way the scancvt script I posted earlier reduces pages to ~80kBytes for a page of text.
Thanks, I'll have to try and remember that script in the future. My current process converts things to PNM, runs pamthreshold at each threshold from 0.0 to 0.9 (in steps of 0.1), then converts back to TIFF. I look through all the thresholded images and, most times, pick the one that looks best. My process is clunky and disk intensive. |
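The threshold sweep described above can be illustrated without touching the disk. The sketch below is plain Python rather than pamthreshold, assumes pixel values normalised to 0.0-1.0, and uses a made-up sample `page`; it reports the fraction of black pixels at each level, which is one simple number to look at alongside the visual check.

```python
def threshold(pixels, level):
    """Return a 0/1 image: 1 (white) where the pixel value exceeds level."""
    return [[1 if p > level else 0 for p in row] for row in pixels]

def black_fraction(bilevel):
    """Fraction of pixels that ended up black (0)."""
    flat = [p for row in bilevel for p in row]
    return flat.count(0) / len(flat)

def sweep(pixels, steps=10):
    """Threshold at 0.0, 0.1, ... 0.9 and record the black fraction of each result."""
    results = []
    for i in range(steps):
        level = i / steps
        results.append((level, black_fraction(threshold(pixels, level))))
    return results

# Made-up greyscale sample: bright paper, a dark stroke, two mid-grey smudges.
page = [
    [0.95, 0.89, 0.15, 0.92],
    [0.93, 0.45, 0.05, 0.88],
    [0.97, 0.91, 0.12, 0.55],
]

for level, frac in sweep(page):
    print(f"threshold {level:.1f}: {frac:.2f} black")
```

As the level rises, more of the mid-grey pixels are pulled to black, so strokes get fatter; the useful range is usually the plateau where the black fraction stops changing much.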
Hi David,
I am wondering if the claimed 600 DPI is actually the result of interpolation / dithering.
I will upload a page at 600 DPI in BMP format for completeness, in case the "no compression" .tiff actually has some compression,
and then drop the resolution down to 300 DPI and save as BMP and TIFF.
Thanks for your help with this.
Peter |
On Tue, 18 Feb 2025 at 17:37, Peter Brown via <peter=[email protected]> wrote:
If I understand correctly, you are using a digital camera to take a picture, then processing that. Could you please upload the jpg file produced by the camera, without any other processing? |
In lieu of having a simple .jpg file, I've created something vaguely similar and processed that:
1) Take your TIFF_1.tif and remove the "speckle" with a Gaussian blur at the GIMP default setting (whatever that might be). Then save that as TIFF_1.jpg.
2) Use scancvt to produce b-TIFF_1.tif. That's a 5120x3865 file of only 88 kBytes. IMHO it is more legible than the original TIFF_1.tif file. Processing time: 7 s.
3) I have uploaded that output as tggzzz-b-TIFF_1.tif. Feel free to delete it.

The 5120 is derived from a magic constant in the scancvt script, and can easily be changed using a text editor. Increasing it is useful when converting multi-page schematics.

The key to that is removing the "speckle". I used GIMP, but no doubt there is a simple command-line tool which could be trivially inserted into an automated workflow.

It might be possible to avoid introducing the speckle in the first place by saving the scan as a jpg file, possibly at a low resolution. I've scaled my TIFF_1.jpg down to 1560x1178 (i.e. 1/16 the pixels), used scancvt, and the result is just as legible. |
Yeah, it kind of looks like the B&W dithering is occurring in the hardware/device software, as all 4 of the BMPs are only monochrome, w/ varying levels of dithering. (The 600DPI/No Dithering one reads best, but I suspect photos will be completely terrible w/ that setting.) There may simply not be a way to get greyscale out of the device's software.
Tom's tggzzz test file looks pretty good w/ a Gaussian blur step applied.
W/ that in mind I'd probably go with this plan:
1) Scan at as high a resolution as possible
2) Apply Gaussian blur
3) Reduce resolution, and save in a grayscale format
4) Maximize contrast still
5) Convert to 2 color / monochromatic format(s)
6) Assemble into PDF
How do you do that in an automated fashion under Windows? I don't know.
On Tue, Feb 18, 2025 at 2:01 PM Tom Gardner via groups.io <tggzzz@...> wrote:
|
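The middle of that plan (blur, reduce resolution, maximize contrast, convert to two colours) can be sketched in plain Python to show the order of operations. This is an illustration only: a 3x3 box blur stands in for the Gaussian blur, the 8x8 test image is made up, PDF assembly is left out, and a real workflow would use an imaging tool rather than nested lists.

```python
def box_blur(img):
    """3x3 mean filter with edge clamping; a simple stand-in for a Gaussian blur."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            total = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    total += img[yy][xx]
            row.append(total / 9)
        out.append(row)
    return out

def downscale2(img):
    """Halve the resolution by averaging each 2x2 block (assumes even dimensions)."""
    return [
        [(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4
         for x in range(0, len(img[0]), 2)]
        for y in range(0, len(img), 2)
    ]

def stretch(img):
    """Rescale so the darkest pixel becomes 0.0 and the brightest 1.0."""
    flat = [p for row in img for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [row[:] for row in img]  # flat image: nothing to stretch
    return [[(p - lo) / (hi - lo) for p in row] for row in img]

def to_bilevel(img, level=0.5):
    """1 (white) where the pixel exceeds level, else 0 (black)."""
    return [[1 if p > level else 0 for p in row] for row in img]

def process(img):
    """Blur, reduce resolution, maximize contrast, then convert to two levels."""
    return to_bilevel(stretch(downscale2(box_blur(img))))

# Made-up monochrome scan: black band on the left, white paper on the right,
# plus one isolated black speckle in the white area.
scan = [[0, 0, 0, 0, 1, 1, 1, 1] for _ in range(8)]
scan[3][6] = 0

for row in process(scan):
    print(row)
```

In this toy case the speckle is averaged away before thresholding while the solid band survives intact; on real pages the blur radius and threshold would need tuning per document.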
Firstly, what I did is a starting point, not a refined process.
I'm not clever enough to run Windows except for a couple of games, so treat the following with caution.
Fundamentally, for anything repetitive I prefer a command line. I last played with the MS-DOS shell a quarter of a century ago; using it to script GUI apps was frustrating. I believe that modern PowerShell is much improved, and MS also has a Linux subsystem inside Windows, whatever that might mean. Maybe that might help.
On Tuesday, 18 February 2025, David Holland via <david.w.holland=[email protected]> wrote: |
Just looking through the specs for the MS800, I noticed: 'Greyscale up to 256 levels *' and '* 128MB ram required'.
Looks like there is a 144 pin SIM for memory that is currently empty.
Might see if I can find the right module and see if the greyscale option then appears.

Have put the page from the service manual that shows the processing chain into the files directory.

Peter |
We are a Tungsten reseller and implement document / image processing solutions. Tungsten, formerly Kofax, is a heavy hitter in that sector. If you are just looking for image processing and output production, the document management side of PaperPort might be a bit overkill. Tungsten has a few imaging solutions, but all rely on the same base image processing engine. Just think "small, medium, large".

Image processing is a complicated area, and trying to re-invent the wheel, while admirable, is a very daunting task.

At the very base of it, it is the image processing capabilities that you are looking for. Of course, the processing is only as good as the source electronic image, so help yourself out by scanning at as high a DPI as reasonably possible.

It's mentioned that there are some example files somewhere. I am new here and to Groups, so if someone could point me in the right direction to get them, I could run them through Tungsten Express to see the outcome. |
I use Ghostscript to compress PDFs, among many other PDF operations. I never notice any quality difference in the output after compression with the default settings, and I'm usually working with music that was poorly scanned but has a lot of fine detail that needs to be easily readable afterwards.
I also use ScanTailor Advanced for fixing up the initial scans. It fixes rotation / skew problems, can do page splitting (if you scan or photograph 2 pages of a book simultaneously and want to split those images into separate PDF pages), re-adjusts margins, de-speckles, and has an adjustable threshold for converting from grayscale to true black and white. Changing that threshold can have the effect of fattening or thinning all of the lines in the document. I find that, especially with poorly scanned music, thick lines lead to all the white space getting filled up, which makes it much harder to read later, so adjusting the threshold at the grayscale -> b/w conversion can really help thin the lines down and open the whitespace back up, increasing legibility a lot.
Both Ghostscript and ScanTailor Advanced are open source / free. My workflow usually involves various other Linux TIFF / PDF utilities as well, including PDFJam, which is a wrapper for some LaTeX PDF utilities.
My typical workflow is:
1) start from a poorly scanned input PDF (pages not straight / skewed, margins all over the place)
2) convert the PDF into a multi-page TIFF file
3) run it through ScanTailor, which produces an individual TIFF per page
4) recombine those individual page TIFFs into a multi-page TIFF
5) convert the multi-page TIFF to PDF
6) compress with Ghostscript if needed
7) use PDFJam to either "n-up" (for only 2 pages) or "pdfbook"-ize (for more than 2 pages) the resulting 8.5x11 PDF onto 11x17 paper
The main command I use for Ghostscript to compress PDFs is:
gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=./out.pdf [list of input files - .ps or .pdf]
I often see original PDFs which, uncompressed, can be a few hundred MB shrink down to a MB or 2, with no discernible loss of quality.
On Mon, Feb 17, 2025 at 6:42 AM Peter Brown via <peter=[email protected]> wrote:
|
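For anyone wanting to call that Ghostscript command from a script, here is a minimal Python sketch that only assembles the argument list quoted in the post. The helper name `build_gs_command` and the file name `scan.pdf` are made up for the example, and nothing is executed unless you uncomment the `subprocess` line (which assumes `gs` is installed and on the PATH).

```python
import shlex

def build_gs_command(inputs, output="./out.pdf"):
    """Return the Ghostscript argument list from the post for the given input files."""
    if not inputs:
        raise ValueError("at least one input file is required")
    return [
        "gs", "-dBATCH", "-dNOPAUSE", "-q",
        "-sDEVICE=pdfwrite",
        f"-sOutputFile={output}",
        *inputs,
    ]

cmd = build_gs_command(["scan.pdf"])  # "scan.pdf" is a placeholder file name
print(shlex.join(cmd))

# To actually run it (requires Ghostscript on PATH):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than one shell string avoids quoting problems with file names that contain spaces.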
Hi Bert,

Many thanks for the kind offer. I am still getting to grips with the MS800, so the current scans are not the best quality.

The scanner (a Canon MS-800) hails from a time when post processing by PC was a lot harder. It looks as though the designers tried to put as much processing as possible into the onboard hardware, and by default the machine sends this processed data to the attached PC.
The machine appears to have two basic modes, a text mode and an image mode. The image mode works OK for text-based pages but generates files that are around 3 or 4 MB per page.

Sample images are in the files section:
From the left-hand home menu, select Files.
Then locate and click on "A temporary directory for photographs and help relating to emails and postings".
Then locate and click on "8430A 08340-90021 Service vol 1 section1".

PDFs in this directory are the processed output from the scanner.

There is a ray of hope: it looks like a greyscale option can be installed on the machine that appears to bypass much of the onboard processing and pass raw greyscale data to the connected PC. |
Yes, image processing has changed vastly in the past decades. I remember the days of dedicated image processing boards and the like. It's just not like that anymore. And to be fair, coming from fiche has its own challenges, if only because of the source material quality and constraints, e.g. stitching a schematic that spans several cells on a fiche. Also, the base image quality per fiche and / or of the underlying source document may shift, so the image cleanup settings for one page may not work for another.

Don't worry about the image size on the way out of the device; worry about the output size. All modern image processing systems can output an image at a different resolution / colour depth. The one rule is that you can't get back what you never had. Of course there is a trade-off in speed depending on settings, e.g. a scanner rated at 80 pages a minute will typically achieve that at 300 DPI B/W. Push the scanner to the max, usually something like 600 DPI, and you get half speed. It depends on the specs of the job. Our rule of thumb is 300 DPI B/W for documents (contracts, invoices, reports, etc.) and 200 DPI grayscale (256 levels). If time is no object, then go as high as you can and get a coffee.
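To put numbers on "image size on the way out of the device", here is a small worked calculation of the uncompressed size of one page at the settings mentioned above. It assumes an 8.5 x 11 inch page (an assumption for the example; fiche frames and fold-outs will differ) and rows padded to whole bytes; `raw_page_bytes` is a hypothetical helper.

```python
def raw_page_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size of one page image in bytes (rows padded to whole bytes)."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    row_bytes = (width_px * bits_per_pixel + 7) // 8
    return row_bytes * height_px

settings = [
    (300, 1, "300 DPI B/W"),
    (200, 8, "200 DPI grayscale (256 levels)"),
    (600, 1, "600 DPI B/W"),
]
for dpi, bpp, label in settings:
    size = raw_page_bytes(8.5, 11, dpi, bpp)
    print(f"{label}: {size} bytes ({size / 1e6:.2f} MB)")
```

These are raw sizes before any compression; the compressed output, which is what ends up in the final file, is typically far smaller for clean bilevel text.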
I would avoid at all costs getting a PDF off your scan device. Work with a TIFF, preferably using lossless or no compression if at all possible. Generally JPEG is lossy. At the base of it, a PDF file just has the page images embedded in it as objects, and depending on the image processor, it may not like or support extracting the page images from a PDF.

Modern systems, being basically unconstrained by resources, will now usually work on a full colour image off the wire and convert it as best suits the task at hand, e.g. OCR, indexing / validation display, final output. Perhaps that is what is missing in your current image processing pipeline. Also, you have to consider that these are "mixed content" documents, that is drawings, written text, tables, etc., which makes it all the harder to get a single consistent profile working. E.g. looking at one of the rougher examples, applying a despeckle profile to page 1 (the top and bottom view of the chassis and an assembly part location) I can lose some of the dots... That is NOT a good thing.
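The risk of losing real dots can be shown with a deliberately naive despeckle rule. The sketch below is not what Tungsten or any other product does; it is a made-up minimal filter (remove any black pixel with no black neighbour) run on a made-up image, to show that such a rule cannot tell scanner noise from a deliberate dot on a drawing.

```python
def despeckle(img):
    """Turn black pixels (0) white (1) when none of their 8 neighbours is black."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            if img[y][x] != 0:
                continue
            neighbours = [
                img[yy][xx]
                for yy in range(max(y - 1, 0), min(y + 2, h))
                for xx in range(max(x - 1, 0), min(x + 2, w))
                if (yy, xx) != (y, x)
            ]
            if 0 not in neighbours:
                out[y][x] = 1
    return out

page = [
    [1, 1, 1, 1, 1, 1],
    [1, 0, 1, 1, 0, 1],  # left: scanner speckle; right: a deliberate dot on a drawing
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0],  # a drawn line, which survives
]

for row in despeckle(page):
    print(row)
```

Both isolated pixels disappear while the line is kept, which is why a despeckle setting that suits text pages may need to be relaxed or disabled for drawings.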
Hope this helps in some way. |
I have done a huge amount of scanning, converting and OCRing over the last 30+ years, so I will offer a viewpoint.

I use the free program IrfanView to do any post processing I need, plus changes of resolution or file type. It allows easy batch processing, so you can adjust contrast, brightness and compression, crop, change file type and rename, all in the same batch over dozens or hundreds of images. I use it on my Windows and Fedora Linux (under WINE) machines.

GIMP is also great for image editing, but IrfanView is quick and easy to use.

I have several commercial and open-source OCR packages. Tesseract is hands down the best of the bunch. TIFF is the way to go as a scanning format if you are going to OCR, but I have had good luck OCRing JPGs in many cases where the original text was pretty clear.

OCRmyPDF is good for post processing PDF files (scanned to a PDF or a bunch of PDFs) to make them readable. It uses Tesseract behind the scenes.

I use an Epson GT-10000 large format scanner to reduce the need to stitch schematics or other fold-outs together, plus it has autofeed and a duplexer. I have a couple of HP photo/slide scanners but have not really been happy with microfiche scanning on these. I think you really need a purpose-built microfiche scanner, but I can't justify getting one.
--
T. Gerbic Central California |