
Service manual scan post processing


 

I have recently been scanning sections of microfiched service manuals for a couple of group members using a Canon MS-800.
There is a significant tradeoff between file size and readability (especially with circuit diagrams).
To simplify the scanning process I have been acquiring everything at maximum equipment resolution, but this leads to files that might be 200 MB+ per fiche.
These are unwieldy but get the job done.

Does anyone in the group have experience with tools that might be used to post-process these scans to reduce size whilst maintaining small-font fidelity?
Any recommendations?


 

I scan a lot of manuals, convert them to PDF, and put them in my esquemateca (schematics library; link in my sig).

I use Adobe Acrobat Pro (I believe v11), using low compression and ClearScan. The output is great. Remember to set the "full page on screen" option in the document properties before saving.

---8<---Cut here---8<---

- High quality schematics and service manuals FREE, scanned by me
---8<---Cut here---8<---




 

Thanks, Alexandre, I will take a look.
Seems like Acrobat Pro v11 is no longer supported - any experience with their current product?


 

I usually scan the manual pages at 1200 or 600 dpi to PDF form and save them. Then if I want to load them on a reading device or share them, I'll use Adobe to downsample the file(s) to a lower resolution, usually 300 dpi.

DaveD
KC0WJN
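The downsample step DaveD describes can be done with Ghostscript's pdfwrite device; here is a minimal Python sketch of that step (it assumes `gs` is on the PATH, and the file names are placeholders):

```python
# Sketch: re-emit a scanned PDF with its images downsampled to a target
# dpi, as in DaveD's "share at 300 dpi" step. Assumes Ghostscript (`gs`)
# is installed; file names are placeholders.
import subprocess

def gs_downsample_cmd(src_pdf, dst_pdf, dpi=300):
    """Build the Ghostscript command that downsamples colour, grayscale,
    and mono images alike to `dpi` while rewriting the PDF."""
    return [
        "gs", "-dBATCH", "-dNOPAUSE", "-sDEVICE=pdfwrite",
        "-dDownsampleColorImages=true", f"-dColorImageResolution={dpi}",
        "-dDownsampleGrayImages=true", f"-dGrayImageResolution={dpi}",
        "-dDownsampleMonoImages=true", f"-dMonoImageResolution={dpi}",
        "-o", dst_pdf, src_pdf,
    ]

def downsample(src_pdf, dst_pdf, dpi=300):
    subprocess.run(gs_downsample_cmd(src_pdf, dst_pdf, dpi), check=True)
```

Keeping the full-resolution master and only downsampling the copy you share, as described above, means the detail is never thrown away.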




 

Check ebay, etc. to see if someone is selling an older copy.

I've been using Acrobat X (10) for a long time with no problems.

DaveD
KC0WJN




 

I'm hesitant to bring this up because I'm only just barely beginning to understand it and create a workflow, but as an alternative to Adobe, there is a Google Cloud "Vision" API that does OCR of PDF files. According to ChatGPT, it does a better job than the various open source tools would, though I don't know how it compares to Acrobat.

You need a Google cloud or workspace account, and from there you set up a cloud bucket to hold the raw PDFs, and then create an API Key to the Vision API. Then a Python script can call the Google APIs to trigger conversion of the PDF to a text only document. Most of the pain is getting the bucket and API set up with the right permissions and account info.

Believe it or not, I used ChatGPT to walk me through the whole process and even write the Python script! (Which I'm happy to share.)

Google lets you process 1000 pages per month for free, and it's an additional $1.50/1000 pages thereafter. But I found that my Google Workspace account gave me a $300 credit, so I can do a lot of conversion before I have to pay any real money.

Anyway, this may be too far down the rabbit hole, but looks like it would work well for processing large numbers of documents automatically; even at $1.50 per thousand pages, it's pretty inexpensive.

John
----
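To make the pieces concrete, here is a sketch of the request that workflow builds; the bucket URIs are placeholders, and actually submitting it needs the google-cloud-vision package plus an authenticated Google Cloud account, so only the request-building part is shown as plain data:

```python
# Sketch of the Vision API async PDF OCR request John describes.
# gs:// URIs are placeholders; submitting requires google-cloud-vision
# and cloud credentials, so that part is isolated in submit().

def build_ocr_request(gcs_src_uri, gcs_dst_uri, batch_size=20):
    """Build one async_batch_annotate_files request as a plain dict:
    PDF in a bucket in, JSON text results to a bucket out."""
    return {
        "input_config": {
            "gcs_source": {"uri": gcs_src_uri},
            "mime_type": "application/pdf",
        },
        "features": [{"type_": "DOCUMENT_TEXT_DETECTION"}],
        "output_config": {
            "gcs_destination": {"uri": gcs_dst_uri},
            "batch_size": batch_size,  # pages of output per JSON file
        },
    }

def submit(requests):
    # Requires: pip install google-cloud-vision, plus credentials set up
    # (this is the part where the bucket/API permissions pain lives).
    from google.cloud import vision
    client = vision.ImageAnnotatorClient()
    return client.async_batch_annotate_files(requests=requests)
```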



 

We switched from the exorbitantly priced Adobe to PDF-XChange several years ago at work (and I did personally) and could not be happier. Excellent and very functional product at 60-70 USD per user.

Again, highly recommended.

Hal


 

Hi John, sounds interesting.
Would you be interested in running a few sample pages through the process?
Peter


 

This +1.

600dpi, G4 compression. (Not JPEG compression -- Never JPEG)

I've got some terrible Linux scripts that use NETPBM/ImageMagick/Tiff
tools to build PDFs for the few I've ever scanned, but the process
varies greatly for every different document.
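As a rough sketch of that recipe (ImageMagick's `convert` and libtiff's `tiff2pdf` are assumed to be installed; file names and the threshold are placeholders to tune per document):

```python
# Sketch of the 600 dpi + Group 4 recipe: binarize each page to a CCITT
# Group 4 TIFF (lossless, far smaller than JPEG for text, no ringing
# around glyphs), then wrap it as a PDF page.
import subprocess

def g4_tiff_cmd(src_img, dst_tif, threshold="60%"):
    """Build the ImageMagick command: grayscale, binarize, G4-compress."""
    return ["convert", src_img, "-colorspace", "gray",
            "-threshold", threshold, "-compress", "group4", dst_tif]

def tiff_to_pdf_cmd(src_tif, dst_pdf):
    """Build the libtiff command that wraps the TIFF as a PDF."""
    return ["tiff2pdf", "-o", dst_pdf, src_tif]

def page_to_pdf(src_img, dst_pdf, threshold="60%"):
    tif = dst_pdf + ".tif"
    subprocess.run(g4_tiff_cmd(src_img, tif, threshold), check=True)
    subprocess.run(tiff_to_pdf_cmd(tif, dst_pdf), check=True)
```

As noted above, the right threshold varies greatly per document, which is why a one-size-fits-all script is hard.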


 

I just saw John's post. I forgot to add to my earlier post (below) that I also convert all documents that I scan to OCR format as well.

I do not intend for this follow-up post about my omission to be any comment, observation, endorsement or criticism about John's post.

DaveD
KC0WJN





 

If anyone wants to have a go, there are some sample scans here:

Files - A temporary directory for photographs and help relating to emails and posting - 8430A 08340-90021 Service vol 1 section1

They are PDF'd from the scanning software with minimum compression. The software will also produce .tiff files if these are a better place to start.

Peter


 

Why not just upload the uncompressed output and let people compress it in the future? I think if you upload the raw JPEG files they even do that for you.




 

Hi Evan,
I understand where you are coming from; best possible is best possible ... but one manual I have been asked about is 38 fiches long - almost 8 GB.

I am also wondering if current predictive tools might be able to repair parts of the scans where low contrast causes portions of e.g. single letters to drop out.
If the scans can be post-processed so that all of the text information is 100% there, there is less case for storing it at higher resolution. Images are different; storing those at best possible resolution seems wise.
Peter


 

John,

Just so I understand, "Vision" (only) performs the OCR process step of reproducing a printed document in OCR PDF form? Or did I mis-read your post?

DaveD
KC0WJN








 

My workflow on a Linux system ends up with approx 80 kBytes/page, which can be halved by using djvu format. I'm a skinflint, so it only uses free-as-in-beer software... I've had compliments about the quality, but judge for yourself.

The workflow is described for a single page. Normally there will be a multiplicity of files/pages with numeric suffixes; simply use the traditional shell script wildcards to process them all at once.

1) scan at 300dpi to produce a colour jpg file, e.g. "filename.jpg"

2) normally convert filename.jpg to small tiff files using the scancvt script below: "scancvt filename.jpg". That creates two variants of the input, "b-filename.tif" and "g-filename.tif". The g-filename.tif variant is better for grayscale images, but b-filename.tif is good for black and white images and is much smaller. Select the b-* variant unless g-* is necessary.

3) use the standard command "tiff2pdf filename.tif filename.pdf"

4) concatenate all the pdf files into one using "pdfunite filename*.pdf finalManual.pdf"

N.B. Occasionally a jpg image is required, in which case replace (2) and (3) to reduce the size using gimp and its posterising actions to produce "colour.jpg", then convert that file to a pdf using "convert colour.jpg colour.pdf"

If you want to see a manual created like that, see [link]; that's a 180-page file, mostly text and schematics with a small number of "photos". Average page size is around 80 kBytes (pdf), 40 kBytes (djvu).
Ed at BAMA postprocessed the pdf to produce the djvu file, which is of identical quality but half the size. I believe that is achieved by spotting common bits across each page, e.g. a letter "e".

Alternatively see [link], which contains a higher proportion of photos and colour PCB layouts. Still only 100 kBytes/page for the pdf.

The scancvt script is...

#!/bin/bash
#
# Digital Camera + This Software + Printer = A Document Photocopier
#
# Input:  pictures of B&W Text documents taken with a digital camera using
#         flash from about 3 feet away with no dark border around the page.
#
# Output1: b-file.tif (a very small B&W TIF file)
# Output2: g-file.jpg (an alternative grayscale file)
#
# If input is purely black and white, Output1 should be better
# If input is not purely black and white, Output2 may be better
#
# Corey Satten, corey @ , March 2007

do1 () {
   echo starting $1 1>&2
   BASE="${1##*/}"; NAME=${BASE%.[jJ][pP][gG]}; TMP1="t-$BASE"; TMP2="x-$BASE"
   trap 'rm -f "$TMP1" "$TMP2"; exit' 0 1 2 13 15
   CGQ="-colorspace gray -quality"
   CGT="-compress group4 -density 480x480"

   convert $CGQ 99 "$1" -resize 5120x5120 "$TMP2"
   convert $CGQ 99 "$1" -resize 1024x1024 -negate -blur 15,15 -resize 5120x5120 "$TMP1"
   composite $CGQ 99 -compose plus "$TMP2" "$TMP1" "$TMP1"
   convert $CGQ 60 "$TMP1" -normalize -level 50,85% "g-$BASE"
   convert $CGT "$TMP1" -normalize -threshold 85% "b-$NAME.tif"
   rm -f "$TMP1" "$TMP2"
}

# This tries to detect multiprocessors and run 2 conversions in parallel.
# Move CPUS=1 after the test to effectively disable the test.
CPUS=1
if [ -f /proc/cpuinfo ] ;then
    CPUS=`grep ^processor /proc/cpuinfo | wc -l`
    if [ "$CPUS" -lt 2 ] ;then CPUS=1; fi
fi

for i in "$@"; do
   case $#/$CPUS in
    0/*) exit;;                                      # done
    1/*) do1 "$1"; shift;;                           # only one file to do
    */1) do1 "$1"; shift;;                           # only one cpu to use
      *) do1 "$1" & do1 "$2"; wait; shift; shift;;   # process 2 files at once
   esac
done

exit 0


 

Yes, my project is focusing on OCR of existing scanned PDFs that lack it. I'm not doing any new scanning (yet).

The Google tool can read other image formats, not just PDFs, so you could directly OCR a pile of JPGs or TIFFs or whatever. But I'm working through a stack of existing manual scans.

John
----


 

And I should also clarify, that the output of my workflow is a separate text-only file, not a PDF that includes both image and text. I am sure there's a way to combine the text and images into a new PDF, but that's not needed for my project.

John
----



 

If you send me a PDF of reasonable size, I can give it a try and send the text file output. It may take a day or three as I'm in the middle of setting up a separate workstation that will handle this (along with other tasks) and moving the tools off my desktop system.



 

Peter,

I have scanned quite a few manuals. I will take a look at the scans. I find it is always best to ingest the scans as tif (lossless compressed with lzw or something) and use a separate tool to convert to pdf as one of the final steps in processing. As a rule I never use the scanner's scanning software beyond what is absolutely necessary.

Never successfully scanned a fiche, though. Have a Minolta MS-6000 with a broken SCSI interface. The MS-800 looks nice. Have been eyeing a ScanPro 2000/3000 for fiche for years but never found the right auction.

As for OCR I use Tesseract. Free, very good, and easy to use.

A tool like tiff2pdf, as Mr. Gardner pointed out, works well. I think my script uses img2pdf, which is probably similar.

-Michael Bierlein
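For anyone who hasn't used it, the Tesseract step above can be sketched as a command builder; the `tesseract` CLI can emit a searchable PDF directly (this assumes the binary is installed, and the file names are placeholders):

```python
# Sketch of the Tesseract OCR step: with "pdf" as the output config,
# Tesseract writes out_base + ".pdf" containing the page image with an
# invisible, searchable text layer over it. Assumes the `tesseract`
# binary is installed; file names are placeholders.
import subprocess

def tesseract_pdf_cmd(src_img, out_base, lang="eng"):
    """Build the command: OCR one page image into a searchable PDF."""
    return ["tesseract", src_img, out_base, "-l", lang, "pdf"]

def ocr_page(src_img, out_base, lang="eng"):
    subprocess.run(tesseract_pdf_cmd(src_img, out_base, lang), check=True)
```

Running this per page and then concatenating with pdfunite (or img2pdf for the image-only route) matches the TIFF-first workflow described above.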



 

Hi Peter,

I'm using "ImageOptim". I think it's free. It works on JPEG and other formats and has a number of compression algorithms, choosing the one that fits best. You can decide whether or not to accept some loss of information.

Usually, taking the pics from my camera that come in at 3-4 MB (medium size setting on the camera, whatever that means), they are reduced by 90% on average without my seeing any loss of information.

cheers
Martin