
Service manual scan post processing


 

To the best of my understanding, they should be the raw scanner output.

The scanning program is a little obtuse, though, so it is possible that some additional processing has been applied. I will check again.
Peter


 

In the scanning application, when you select either 16-level or 256-level gray scale, the software automatically:

- locks you out of contrast adjustments, setting the value to midband
- sets the dither control to 'none' and locks you out of making changes

In this mode, the program only gives you access to a brightness slider and a DPI dropdown.

This being said, I am wondering if the highest DPI setting that I am using (600 DPI) is not the maximum optical resolution but is interpolated from a lower physical resolution.

I will check this.

Peter


 

Looking at the service manual for the Canon MS-8000 reveals the following:

Photosensitive element:
1) Type: CMOS CIS (contact image sensor)
2) Density of element: 300 dpi
3) Effective elements: 3488 (effective reading length 295.3 mm; 3488 / (295.3 / 25.4) ≈ 300 dpi)

Output resolution:
1) Standard: 300 x 300 dpi
2) Fine: 600 x 600 dpi
3) High speed: 200 x 200 dpi

So I guess that the native resolution is 300 x 300 dpi, and there must be some sort of interpolation for the 600 x 600 scans.

I will rescan a couple of representative pages at 300 x 300 dpi, 256-level gray scale, for members to take a look at.
Hopefully this will close out the topic of finding the best resolution / bit depth / post processing to use when getting scans from the archive to people.


 

On 2/28/25 10:00, Peter Brown wrote:
Any other takers for trying to post process the 256-level .tiff files? Found at:
files - temporary directory for photographs and help relating to emails and posting/11707A Service PFX 1525A 11707-90007 SEP 1977 rev 1
The files contain pages of just text, pages with text and images, and pages with circuit diagrams.
I'll leave them up for another couple of days and then remove them, as they are pretty big.
I took a whack at it and didn't get anywhere near as much of a size reduction as you did. But, the file looks great, passes a PDF/A validator, and has been OCRed. And, no embedded ads. ;) I've put it in the directory with the TIFF files.

I applied a binary threshold to all of the text pages, and posterized the pages with graphical content at varying levels, mostly 24 and 32, to preserve the image quality.
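
In Pillow terms, the two operations look roughly like this; a sketch rather than the exact pass I ran, and the page names are made up:

```python
from PIL import Image

def threshold_page(src, dst, cutoff=128):
    # Text-only page: force every pixel to pure black or white.
    img = Image.open(src).convert("L")
    bw = img.point(lambda p: 255 if p >= cutoff else 0, mode="1")
    bw.save(dst, compression="group4")   # lossless bilevel compression

def posterize_page(src, dst, levels=32):
    # Page with graphics: snap the 256 gray values down to 'levels'
    # evenly spaced ones; halftones stay legible but compress far better.
    img = Image.open(src).convert("L")
    step = 255.0 / (levels - 1)
    img.point(lambda p: int(round(p / step) * step)).save(dst, compression="tiff_lzw")

threshold_page("11707A0003.tif", "text_0003.tif")            # hypothetical page names
posterize_page("11707A0020.tif", "gfx_0020.tif", levels=24)
```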

-Dave

--
Dave McGuire, AK4HZ
New Kensington, PA


 

On 2/28/25 16:18, Peter Brown wrote:
Looking at the service manual for the Canon MS-8000 reveals the following:
Photosensitive element:
1) Type: CMOS CIS (contact image sensor)
2) Density of element: 300 dpi
3) Effective elements: 3488 (effective reading length 295.3 mm; this calculates out at 300 dpi)
Output resolution:
1) Standard: 300 x 300 dpi
2) Fine: 600 x 600 dpi
3) High speed: 200 x 200 dpi
So I guess that the native resolution is 300 x 300 dpi, and there must be some sort of interpolation for the 600 x 600 scans.
I will rescan a couple of representative pages at 300 x 300 dpi, 256-level gray scale, for members to take a look at.
Hopefully this will close out the topic of finding the best resolution / bit depth / post processing to use when getting scans from the archive to people.
I sure would like to get my hands on one of those scanners.

-Dave

--
Dave McGuire, AK4HZ
New Kensington, PA


 


I've had a play with GIMP on Linux, and I can (to my eye anyway) sharpen things up quite nicely.

But as yet, I've not hit a file with a schematic!

Can you tell us which file(s) have schematic diagrams, as their needs are often different from those of text?

Regards,

Dave 'WBX.



 

On 2/28/25 17:11, Dave_G0WBX via groups.io wrote:
I've had a play with GIMP on Linux, and I can (to my eye anyway) sharpen things up quite nicely.
But as yet, I've not hit a file with a schematic!
Can you tell us which file(s) have schematic diagrams, as their needs are often different from those of text?
I didn't bother with any sharpening; the original scans really aren't too bad. One example of a page in that document that has a schematic is page 20.

I did a quick pass in GIMP on each TIFF file to determine the most reasonable approach on a per-page basis. Then I threw together a quick script that used ImageMagick to apply those functions per file, concatenate the results into a PDF, and then do the OCR pass.
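
Something along these lines would do it; this is a sketch rather than my actual script, the page names and per-page settings are hypothetical, and it assumes ImageMagick 7 and ocrmypdf are installed:

```python
import subprocess
from pathlib import Path

# Per-page recipe, chosen by eye in GIMP beforehand (illustrative values).
RECIPES = {
    "11707A0003.tif": ["-threshold", "50%"],   # text-only page
    "11707A0020.tif": ["-posterize", "32"],    # page with a schematic
}

def build_pdf(src_dir, out_pdf):
    pages = []
    for name, ops in sorted(RECIPES.items()):
        out = Path(src_dir) / ("proc_" + name)
        # Apply the chosen ImageMagick operations to this page.
        subprocess.run(["magick", str(Path(src_dir) / name), *ops, str(out)], check=True)
        pages.append(str(out))
    # Concatenate the processed pages into a single PDF...
    subprocess.run(["magick", *pages, "pages.pdf"], check=True)
    # ...then add a searchable text layer and emit PDF/A.
    subprocess.run(["ocrmypdf", "--output-type", "pdfa", "pages.pdf", out_pdf], check=True)

build_pdf("tiffs", "11707A.pdf")
```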

Some of those pages would benefit from some manual editing, like cropping, etc., but they're perfectly readable as they are, at least to me, and having the text layer underneath the page image is good.

-Dave

--
Dave McGuire, AK4HZ
New Kensington, PA


 



On Friday, 28 February 2025, Peter Brown via <peter=[email protected]> wrote:
> So I guess that the native resolution is 300 x 300 dpi, and there must be some sort of interpolation for the 600 x 600 scans.
>
> I will rescan a couple of representative pages at 300 x 300 dpi, 256-level gray scale, for members to take a look at.
> Hopefully this will close out the topic of finding the best resolution / bit depth / post processing to use when getting scans from the archive to people.

To some extent it is possible to trade XY resolution for depth resolution, and vice versa. Just make clear what has been used.
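
For example, 600 dpi at 1 bit/pixel and 300 dpi at 4 bits/pixel come to the same raw bit count: 600 × 600 × 1 = 300 × 300 × 4 = 360,000 bits per square inch.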

For a paper source, 300 dpi is more than enough. Your scans seem reasonable enough, hence my surprise that holes in letters had disappeared.

Reducing the pictures, e.g. *19.tif, to a two-level bitmap is unlikely to be successful. To reduce the size, I would try reducing the XY resolution to largely remove the "grain effect", then convert to .jpg format.
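
A minimal sketch of that recipe, assuming a 600 dpi greyscale source (the filename is hypothetical):

```python
from PIL import Image

# Halve the XY resolution to average out the paper grain,
# then save as a moderately compressed JPEG.
img = Image.open("11707A0019.tif").convert("L")
half = img.resize((img.width // 2, img.height // 2), Image.LANCZOS)
half.save("11707A0019.jpg", quality=85, dpi=(300, 300))
```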


 

On 2/28/25 17:50, Tom Gardner via groups.io wrote:
Reducing the pictures, e.g. *19.tif, to a two-level bitmap is unlikely to be successful. To reduce the size, I would try reducing the XY resolution to largely remove the "grain effect", then convert to .jpg format.
Using JPEG for those will result in nasty artifacting around the text, border box, etc. I'd not recommend that at all.

-Dave

--
Dave McGuire, AK4HZ
New Kensington, PA


 

I don't want to have to dig for that, so give me a link; I'll post process and get back to you.

I have long since flat-out mastered document scanning and restoration; I've done 1,000s upon 1,000s of pages and brought documents back from nothing to virtually as-printed appearance.

You need Acrobat 9.xx and Photoshop 7.01 with some well-configured filters. Scan 600 dpi .tiffs !!ONLY!!, use Acrobat to assemble the Photoshopped .tiffs, then ClearScan OCR. Don't worry about the .tiff file sizes; ClearScan will reduce the size about 10-fold.

I use an Epson V600 that talks USB2, with the companion Epson software, and a Canon DR-4010C that talks USB2 and SCSI (which I use), with its companion driver and software. It'll duplex 24-bit color @ 600 dpi at about one second per face.

I would be interested in feedback from members as to how well their post processing for .pdf works with these files.
Especially interested in ways of preserving the picture quality whilst also tidying up / compressing the pages of text.
Peter
Bill @ PEARL, Inc.


 

I should have said:

[...] scan 600 dpi, ==greyscale== .tiffs !!ONLY!!

I don't want to have to dig for that, so give me a link; I'll post
process and get back to you.
I have long since flat-out mastered document scanning and
restoration; I've done 1,000s upon 1,000s of pages and brought documents
back from nothing to virtually as-printed appearance.
You need Acrobat 9.xx and Photoshop 7.01 with some well-configured
filters. Scan 600 dpi .tiffs !!ONLY!!, use Acrobat to assemble the
Photoshopped .tiffs, then ClearScan OCR. Don't worry about the .tiff file
sizes; ClearScan will reduce the size about 10-fold.
I use an Epson V600 that talks USB2, with the companion Epson software,
and a Canon DR-4010C that talks USB2 and SCSI (which I use), with its
companion driver and software. It'll duplex 24-bit color @ 600 dpi at
about one second per face.

I would be interested in feedback from members as to how well their post
processing for .pdf works with these files.
Especially interested in ways of preserving the picture quality whilst
also tidying up / compressing the pages of text.
Peter
Bill @ PEARL, Inc.


 

Don't know where I read this, but it said: better to use high resolution with JPEG compression than low resolution without, for the same file size.

My experience: 300 dpi is largely sufficient even for schematics, with compression. If the original is bad, or has exceptionally small lettering or such, I take 600 dpi as a precautionary measure.

Of course JPEG produces artefacts around sharp contrasts, but as long as it's perfectly readable I prefer that to humongous files that take minutes to open...

cheers
Martin

Using JPEG for those will result in nasty artifacting around the text, border box, etc. I'd not recommend that at all.


 

Liam / Bill / William: Rather than browsing mailing lists, please send me
my manual, which I haven't received for a very long time now.



On Sat, Mar 1, 2025 at 4:39 AM Liam Perkins via groups.io
<sales@...> wrote:

I don't want to have to dig for that, so give me a link; I'll post
process and get back to you.

I have long since flat-out mastered document scanning and
restoration; I've done 1,000s upon 1,000s of pages and brought documents
back from nothing to virtually as-printed appearance.

You need Acrobat 9.xx and Photoshop 7.01 with some well-configured
filters. Scan 600 dpi .tiffs !!ONLY!!, use Acrobat to assemble the
Photoshopped .tiffs, then ClearScan OCR. Don't worry about the .tiff file
sizes; ClearScan will reduce the size about 10-fold.

I use an Epson V600 that talks USB2, with the companion Epson software,
and a Canon DR-4010C that talks USB2 and SCSI (which I use), with its
companion driver and software. It'll duplex 24-bit color @ 600 dpi at
about one second per face.

I would be interested in feedback from members as to how well their post
processing for .pdf works with these files.
Especially interested in ways of preserving the picture quality whilst
also tidying up / compressing the pages of text.
Peter
Bill @ PEARL, Inc.





 

Just took a whack at it myself. It went better than my "08340-90021 Vol 4 Process Attempt", I think.

Peter and Dave, both of your processed files look good to me. That final schematic (page 26, 11707A0026.tif) is a rough one no matter what. Pages 14-17 are a bit tricky, and the final two pages are a bit of a nightmare. The question is whether we are going for "archival" or "readable". My processed output is certainly not archival, but mostly readable. I would be in favor of having a heavily processed (yet readable) version that can easily be downloaded and printed, and then saving/uploading the raw ingested TIFFs (losslessly compressed, LZW) as the "archival" masters. For a working readable copy, certain things are more important (like the parts lists and component values), whereas the final two pages (Sales & Service Offices) are pretty useless for us, but still certainly important from an archival perspective.

Regarding dpi, I have found 300 dpi sufficient for many things. Of course 600 dpi, where possible, would always be preferred for true archiving. A good example of schematics that need a high-dpi (600+) scan would be the 4194A schematics. Those freebie 4194A PDFs floating around are unreadable!

-Michael


 

On 3/1/25 01:20, Martin via groups.io wrote:
Don't know where I read this, but it said: better to use high resolution with JPEG compression than low resolution without, for the same file size.
Don't believe everything you read. There's a vast amount of misunderstanding and bad information about JPEG floating around; it has been that way for a very long time. This is a result of people thinking it's just fine to use technology without first learning even the most basic things about it, while at the same time cheerfully ignoring the advice of people who do know something about it.

My experience: 300 dpi is largely sufficient even for schematics, with compression. If the original is bad, or has exceptionally small lettering or such, I take 600 dpi as a precautionary measure.
At the museum we use 600 DPI as a minimum for most things. Data storage is cheap, compression works great, and cheap paper is aging quickly.

Of course JPEG produces artefacts around sharp contrasts, but as long as it's perfectly readable I prefer that to humongous files that take minutes to open...
That depends on the goal, but I personally believe that when any document is scanned, it should be treated as if it is the only known copy (which it may be), and that getting it scanned may be a matter of life and death for someone in the future (which it may be). Either may be true; we have no way to know.

The artifacts introduced by JPEG compression, in particular those which result in 8x8 blocking (JPEG's DCT and other steps operate on 8x8-pixel blocks), make subsequent OCR very difficult. This can never be undone, as JPEG is a lossy compression algorithm. JPEG *discards data*, changing the nature of an image file to exploit weaknesses in the human visual system. This is detrimental to most any sort of subsequent processing that may need to be performed.
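
If anyone wants to see this for themselves, a minimal Python/Pillow round trip makes the loss measurable (the filename is hypothetical):

```python
from PIL import Image, ImageChops

# Round-trip a scanned page through JPEG and measure how far the pixels moved.
orig = Image.open("11707A0003.tif").convert("L")
orig.save("roundtrip.jpg", quality=75)
back = Image.open("roundtrip.jpg").convert("L")
diff = ImageChops.difference(orig, back)
print("max per-pixel error:", diff.getextrema()[1])  # nonzero: data was discarded
```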

So, having researched this exhaustively many years ago, and having been "all up in there" with the JPEG algorithm, I stand by my assertion that JPEG should never, ever be used for something like this. There are plenty of other compression algorithms which will result in the same, if not better, image size reductions.

-Dave

--
Dave McGuire, AK4HZ
New Kensington, PA


 

While we seem to all agree that JPEG compression is not ideal, there ARE lossless options for the TIFF, and thus PDF, formats. While yes, storage and networking and processing and scanning time are all relatively cheap, when someone is paying for all that on their own dime, there are decisions to be made.

Where I come from, the rule for text-ish documents is 300 dpi B&W, and for image-ish ones it's 200 dpi at 256 gray levels, and this is for financial and governmental document types.

The one thing with image cleanup routines is that they are mostly made for text-based documents and are looking to clean up the text. As such, they don't have the "understanding" of the particularities of schematics. In the way-back days, there were vectorizing tools to help engineering firms convert scanned drawings (those things on dead, ground-up trees) to CAD files. But they required a lot of work to verify the exactitude of the resulting CAD file, and possibly to re-verify and re-authorize it.

The basic rule here is that you can clean up or otherwise "alter" a document as long as the understanding does not change.


 

On 3/2/25 11:25, Bert via groups.io wrote:
While we seem to all agree that JPEG compression is not ideal, there ARE lossless options for the TIFF, and thus PDF, formats.
At the museum, we use ITU-T T.6 (Group 4 FAX), which is lossless and very effective.
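
In Pillow, for instance, writing a Group 4 TIFF is a one-liner on a bilevel page (a sketch; the filename is hypothetical):

```python
from PIL import Image

# Group 4 (ITU-T T.6) is defined only for bilevel images,
# so convert to 1-bit first; Pillow calls the codec "group4".
img = Image.open("11707A0003.tif").convert("1")
img.save("11707A0003_g4.tif", compression="group4")
```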

-Dave

--
Dave McGuire, AK4HZ
New Kensington, PA


 

On Sun, 2 Mar 2025 at 16:25, Bert via <el_bert0=[email protected]> wrote:
The one thing with image cleanup routines is that they are mostly made for text-based documents and are looking to clean up the text. As such, they don't have the "understanding" of the particularities of schematics. In the way-back days, there were vectorizing tools to help engineering firms convert scanned drawings (those things on dead, ground-up trees) to CAD files. But they required a lot of work to verify the exactitude of the resulting CAD file, and possibly to re-verify and re-authorize it.

A very useful amount of cleanup can be done simply by comparing each pixel on its own to the average of the wide neighbourhood pixels. That works equally well for text and schematics, since it has no concept of either and they are both high contrast. It does not work well with photos.
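
A sketch of that idea in Python (NumPy/SciPy/Pillow; the window size, offset, and filenames are illustrative):

```python
import numpy as np
from PIL import Image
from scipy.ndimage import uniform_filter

def local_mean_threshold(src, dst, window=51, offset=10):
    # Compare each pixel to the average of a wide surrounding window;
    # slow illumination changes across the page then cancel out.
    gray = np.asarray(Image.open(src).convert("L"), dtype=np.float32)
    neighbourhood = uniform_filter(gray, size=window)
    is_background = gray > (neighbourhood - offset)   # True = white paper
    Image.fromarray(is_background).save(dst, compression="group4")

local_mean_threshold("scan.tif", "clean.tif")
```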

How well such text/schematic cleanup works depends on the original's quality. Unsurprisingly, higher XY resolution and uniform illumination/background are advantageous, since both allow the "slicing level" to be more easily determined and more uniformly applicable.

Provided there are sufficient pixels for the text characters to be reasonably well formed (open loops, no "spurs"), OCR is then possible. Evidence: a couple of manuals I have quickly scanned on an ordinary printer/scanner/copier and post processed into TIFFs inside a PDF file, which somebody else has then OCRed.

I have no comment about vectorising; I have never had any use for such a tool. I would expect it to be about as (un)successful as decompiling binary object code back into high-level language source code.