Digitization is done in three stages: image capture, clean-up/OCR work, and metadata. Image capture is fairly straightforward and is performed with either a scanner or a digital camera.
MINIMUM SCANNING GUIDELINES
“Master files” are intended to be archival-quality digital images. These files are used to generate derivative (“access” and “thumbnail”) files for present-day web delivery. They are archived and will be used to generate other derivative versions for future uses.
Text based originals (Books, pamphlets, archival materials, etc.)
Master File Format: TIFF
Bit Depth: 1 bit bitonal / 8 bit grayscale / 24 bit color
Spatial Resolution: 300-600 ppi (400 ppi and up for OCR purposes)
Spatial Dimensions: 100% of original
Grayscale scanning is most often done for these materials, unless the original document had important color information such as a seal, or if information best captured with a color scan, such as paper deterioration, is deemed important for a specific project.
Text based copies (Scans from old photocopies or prints from microfilm). Image quality is less of a concern here, since virtually none of these sources can yield a “good” copy.
Master File Format: TIFF
Bit Depth: 1 bit bitonal / 8 bit grayscale / 24 bit color
Spatial Resolution: 300-600 ppi (400 ppi and up for OCR purposes)
Spatial Dimensions: 100% of original
Photographic originals
Master
File Format: TIFF
Bit Depth: 8 bit grayscale / 24 bit color
Spatial Resolution: 300-800 ppi or 3000 to 5000 pixels across the long dimension (600 ppi preferred)
Spatial Dimensions: 100% of original
Access
File Format: JPEG
Bit Depth: 8 bit grayscale / 24 bit color
Spatial Resolution: 72 ppi
Spatial Dimensions: 600 pixels across the long dimension
Thumbnail
File Format: JPEG
Bit Depth: 8 bit grayscale / 24 bit color
Spatial Resolution: 72 ppi
Spatial Dimensions: 150-200 pixels across the long dimension
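As a rough illustration of the access and thumbnail sizes above, the pixel dimensions of a derivative can be worked out from the master's dimensions. This is only a sketch: the function name scale_to_long_side and the example master size are assumptions made for illustration, and the actual resizing would be done in whatever image software is on hand.

```python
def scale_to_long_side(width, height, long_side):
    """Return (width, height) scaled so the longer dimension equals long_side."""
    longest = max(width, height)
    # Multiply before dividing so whole-number results stay exact.
    return round(width * long_side / longest), round(height * long_side / longest)


# Hypothetical master: an 8x10 inch print scanned at 600 ppi.
master = (4800, 6000)

access = scale_to_long_side(*master, 600)     # access file: 600 px on the long side
thumbnail = scale_to_long_side(*master, 200)  # thumbnail: 150-200 px on the long side

print("access:", access)
print("thumbnail:", thumbnail)
```

The aspect ratio of the master is preserved; only the long dimension is pinned to the target.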
Minimum settings should be geared towards 3000 pixels on the long side (e.g. an 8x10 image scanned at 300 ppi, or a 4x5 at 600 ppi). [Note: A 35mm contact print (1 3/8” on the long side) would need about 2200 ppi, but in fact many inexpensive scanners can’t go higher than 1200 ppi (they just pretend to, using interpolation software). Look for the scanner’s true optical resolution.]
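The arithmetic behind the 3000-pixel minimum can be sketched as follows. The function names and the 1200 ppi scanner figure used in the check are illustrative assumptions, not settings taken from any particular scanner.

```python
import math

TARGET_LONG_SIDE_PX = 3000  # minimum long-side pixel count from the guideline


def required_ppi(width_in, height_in, target_px=TARGET_LONG_SIDE_PX):
    """Scanning resolution needed for the longer side to reach target_px."""
    return math.ceil(target_px / max(width_in, height_in))


def scanner_can_deliver(ppi_needed, optical_ppi):
    """True only if the scanner's true optical resolution covers the need."""
    return ppi_needed <= optical_ppi


print(required_ppi(8, 10))  # 8x10 inch print
print(required_ppi(4, 5))   # 4x5 inch print

# A small original against a hypothetical scanner with 1200 ppi optical resolution.
small = required_ppi(1, 1.375)
print(small, scanner_can_deliver(small, 1200))
```

Interpolated resolution is deliberately left out of the check, since it adds pixels without adding detail.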
In general, color photographs should be scanned as 24-bit RGB color and black & white photographs in 8-bit grayscale. There are many cases, however, when black & white photographs would benefit from color scanning, for example, when they are sepia-toned or badly faded.
Scan the negatives before prints.
Verso of Photographic originals
When the back (verso) of a reflective image (e.g., a photograph) contains annotations, drawings, or other significant markings, the back should be digitized as well. Scan it in grayscale at 200 dpi, unless two or more colors are present, in which case scan in color.
Naming Conventions:
Use the collection number for all images with the following annotations.
…r = Recto (or front)
…v = Verso (or back)
…u = Uncompressed/unaltered digital master
…r = Screen size reference/Access files
…f = Full screen files
Steps:
- Capture the original image. Use a flatbed scanner or an overhead camera/scanner. Please note that an 8.5 x 11 sheet of paper at 300 dpi is 8.4 megapixels, and the same sheet at 400 dpi is 15 megapixels. Also, a clean sheet of glass should be placed over the pages to minimize distortion from pages bulging away from the gutter of a book. Reflections on this glass can be countered by placing two bright lights at 45 degrees and turning off all the other lights in the room.
- Save as a TIFF (JPEG compression is lossy, which makes JPEGs a poor choice for long-term storage; RAW file types are proprietary and incompatible with one another). Ideally, name the images something like Author_Titlep#u.tiff: that is, author, title, p for page, the page number, and u for unaltered digital master. If your scanner or camera saves as a JPEG, convert to TIFF as soon as possible. These masters should be saved temporarily on a CD and sent either to our webmaster or to the librarian (I really don’t care which). From there they should be saved more permanently on a terabyte external drive, which will need to be purchased by the HCC from cash donations.
- Convert the TIFFs to a temporary PDF. These should be fairly low resolution and should not exceed 10 MB per file. If the text has to be broken up into smaller PDFs, that’s fine, since these are only temporary.
- Create the permanent format from copies taken from the unaltered master TIFFs. Use the OCR software you have available. Illustrations and photographs can be cleaned up with any drawing or photo software you are comfortable with. The outcome may depend on the PDF resolution.
- The standard for PDF creation is Adobe, and other packages should be avoided unless it can be positively determined that they don’t take proprietary coding shortcuts (I’ve lost too many files to other PDF programs whose output is unreadable by any system other than their own).
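Two pieces of the steps above lend themselves to a quick check: the megapixel figures for a letter-size page, and the Author_Titlep#u.tiff naming pattern. The sketch below is one possible way to do both. The function names, the example author and title, and the choice to strip spaces and punctuation from names are all assumptions made for illustration.

```python
import re


def megapixels(width_in, height_in, dpi):
    """Pixel count, in millions, of a sheet captured at the given dpi."""
    return (width_in * dpi) * (height_in * dpi) / 1_000_000


def master_filename(author, title, page):
    """Build a name following the Author_Titlep#u.tiff pattern."""
    def clean(text):
        # Assumption: drop spaces and punctuation so names stay portable.
        return re.sub(r"[^A-Za-z0-9]+", "", text)

    return f"{clean(author)}_{clean(title)}p{page}u.tiff"


print(round(megapixels(8.5, 11, 300), 1))  # letter-size page at 300 dpi
print(round(megapixels(8.5, 11, 400), 1))  # letter-size page at 400 dpi

# Hypothetical author and title, page 12 of the unaltered master.
print(master_filename("Smith", "Town History", 12))
```

Knowing the megapixel count ahead of time helps when judging whether a given digital camera can capture a full page at the target resolution in one shot.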
Metadata is the encoded data that tells us what the files are and who created them. It should be put in place when the PDF is created in Adobe.