Miller Center

Options for Scanning

This resource is meant to serve as a very basic starting point for someone dealing with a scanning project, large or small. You may wish to seek out more (expert!) advice depending on the nature of your project, but we hope this will help you begin!

There are a few key areas we’d like to unpack. Your decisions in these areas will depend on your goals, so we hope to empower you to make the best decisions for your particular project! These areas include:

  • Scan quality: web quality vs. print quality vs. archival quality. What’s best?
  • Who should do the scanning: an outside company? Students/interns? You? 
  • Metadata: what matters?

So many organizations seem reluctant to post material until it’s perfect and complete. But how often do we actually find “perfect”?! Let us help you get started.

Scan Quality

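The most essential measure of scan quality is an image’s resolution. The word “resolution” can be confusing because it has two related-but-different meanings: an image’s pixel dimensions (for example, 400 x 300px) and its pixel density (for example, 300dpi).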

“Resolution” (A confusing word!)

[Figure: the same image shown at 3 dpi, 10 dpi, 50 dpi, and 72 dpi]

These are simulated examples of pixel density: the more pixels per inch, the higher quality the image.

For our purposes, “resolution” refers to pixel density, which determines how much detail an image can hold. For example, you might have seen references to an image’s DPI, or “dots per inch.” This tells you how many pixels (dots) fit into each inch of the image, measured along its width or height. The lower the number, the fewer the dots, the less visual information, and the lower the quality of the image.

“Dots per inch” comes from the world of print media, but it is often used in web contexts too, since pixel density varies from screen to screen. Note that each measurement below assumes the image is shown at 100% of its final size (400 x 300px, 2 x 3in, etc.). When you stretch or shrink an image, its pixel density (dpi) changes.
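Since dpi is always measured against a physical size, pixel count, dpi, and size are linked by simple arithmetic: pixels = dpi × inches. Here is a small Python sketch of that relationship (the function names are our own, purely for illustration), including why stretching an image lowers its effective dpi:

```python
def pixel_dimensions(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Pixel width and height of a scan of an item with the given physical size."""
    return round(width_in * dpi), round(height_in * dpi)


def effective_dpi(pixels: int, displayed_inches: float) -> float:
    """Pixel density when an image `pixels` wide is shown or printed `displayed_inches` wide."""
    return pixels / displayed_inches


# A 2 x 3 in photograph scanned at 300 dpi:
w, h = pixel_dimensions(2, 3, 300)
print(w, h)  # 600 900

# Stretching that 600-pixel-wide image to 4 in wide halves its density:
print(effective_dpi(w, 4))  # 150.0
```

The pixels themselves don’t change when you resize; they are just spread over more (or fewer) inches, which is why enlarging a low-dpi scan looks grainy.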

Here are some common resolutions:

  • 72dpi = Traditional “best practice” image quality for use on the web. Can appear a bit more grainy than a higher quality scan.
  • 300dpi = Considered “print quality” or “high res.” Most people can’t see any pixelation at this level of quality, and it captures nearly all of the original’s detail.
  • 600dpi or higher = Photographic quality. Only a very keen eye can tell the difference between a 600dpi digitally processed image and an analog original (for example, a letter or a photograph). Often a good choice for archival scans.

Two notes:
- Higher dpi = more data to store = bigger file size.
- Historically, digital screens could not display more detail than 72dpi on the web. Now, with higher-resolution screens in newer devices, that is quickly changing. That said, higher resolution brings larger file sizes and slower download times... often for naught, since many devices still can't display the extra detail. If you want to make high-res images available online, we'd recommend showing a low-res (72dpi) thumbnail of the image and providing a link (not an embedded image) to an optional high-res download.
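To see why higher dpi means bigger files, here is a rough Python estimate of uncompressed file size for a letter-size (8.5 x 11 in) page scanned in full color. It assumes 3 bytes per pixel (24-bit color) and no compression; real files saved as JPEG or compressed TIFF will be smaller, but the trend holds. Because the pixel count grows in both directions, doubling the dpi roughly quadruples the file size.

```python
def uncompressed_bytes(width_in: float, height_in: float, dpi: int,
                       bytes_per_pixel: float = 3) -> int:
    # Assumes 24-bit color (3 bytes per pixel) and no compression.
    # Pixel count grows with dpi in BOTH directions, so size grows with dpi squared.
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return round(width_px * height_px * bytes_per_pixel)


for dpi in (72, 300, 600):
    size_mb = uncompressed_bytes(8.5, 11, dpi) / 1_000_000
    print(f"{dpi:>3} dpi: {size_mb:.1f} MB")
#  72 dpi: 1.5 MB
# 300 dpi: 25.2 MB
# 600 dpi: 101.0 MB
```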

Colors, greys, and the gaps between (… aka white)

There are three common color settings for scanning: full color, greyscale, and black and white.

Full color is what it sounds like: it captures an image (to the extent the scanner is able) in its full original color. It captures the most information about the original item, but all of that information has to be stored, so full-color scans make for large files that require more storage space on your computer or hard drive.

Greyscale allows for a wide range of variation… but only variations of grey. It can be great for capturing “just enough” detail without creating the burden of giant file sizes. However, it’s important to note that visual material often has small amounts of color that we don’t notice. For example, ink on a letter might have a slight tinge of red or blue. If scanned in greyscale, the ink will appear fully colorless (though it will still show variation in darkness, thanks to the wide range of greys). Note that greyscale images also require a lot of storage space, though less than full color.

Black and white is just that: only black and the space between black (white). It does not capture much nuance. For archival purposes, it’s very rare that you would want black and white scans. That said, because they capture and store less visual information, they make for smaller file sizes. Significantly smaller! But the trade-off is a lower-quality representation of the image, so use it cautiously.
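The storage differences between these settings come down to how many bits each pixel needs. Common (but not universal) defaults are 24 bits per pixel for full color, 8 for greyscale, and 1 for black and white; your scanner may offer other bit depths, such as 48-bit color or 16-bit greyscale, so treat these numbers as assumptions. A rough, uncompressed comparison for the same letter-size page at 300dpi:

```python
# Typical bit depths per color setting (assumed defaults; check your scanner's options).
BITS_PER_PIXEL = {
    "full color": 24,
    "greyscale": 8,
    "black and white": 1,
}


def uncompressed_mb(width_in: float, height_in: float, dpi: int, bits_per_pixel: int) -> float:
    # Total pixels times bits per pixel, converted to bytes (/8) and then megabytes.
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel / 8 / 1_000_000


for mode, bits in BITS_PER_PIXEL.items():
    print(f"{mode}: {uncompressed_mb(8.5, 11, 300, bits):.1f} MB")
# full color: 25.2 MB
# greyscale: 8.4 MB
# black and white: 1.1 MB
```

Compression changes the absolute numbers, but the roughly 24 : 8 : 1 ratio is why full color needs the most space and black and white the least.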

If you'd like to take a deeper dive into technical specifications for archival scanning, the National Archives has published a fairly comprehensive guide that is rich with details.

Who should do the work?

Broadly speaking, there are two ways you might go about scanning or digitizing items: 1) do it yourself, or 2) outsource it. Below are suggestions for both cases, including times when you might lean toward one or the other.

1. Do it yourself!

When this is a good idea:

  • You have only a few items (or you have a fast scanner) (or you have a lot of time to kill).
  • Your items are not particularly fragile.
  • You have access to a scanner.
    • NOTE: If you don’t own a scanner, you might look into local organizations who might be willing to loan one. Historical societies and colleges or universities might be good places to start. One organization remarked that they can borrow a scanner from their local chapter of the DAR. Be creative—you may only need it for a few hours! Also check to see if a local DPLA service hub can help.
  • Are you documenting 3D items? You probably can “scan” them yourself simply by taking photographs.

The Intern Option

Several of the organizations we talked to have had good luck with making scanning an intern project. If you are affiliated with (or near) a college or university, student services can often help you find individuals interested in an internship. Be sure to do some test batches yourself, since your interns will likely have less experience with archives. Cost-wise, the DLF's digitization calculator can help you estimate what to expect.

As for specific equipment recommendations, the University of North Texas has put together a wide range of materials related to scanning recommendations, including this list of scanners and scanning systems. Their site also includes recommendations about quality standards and metadata—it's an essential resource!

Scanning in-house might be a less-than-optimal choice when...

  • your items are very fragile.
  • you have a large number of items and your scanner doesn’t allow bulk processing.
  • you’re scanning microfilm: locating a microfilm scanner can be difficult, and you might consider outsourcing (see suggestions below).

2. Outsource it!

If scanning in-house doesn't seem like the right option, there are plenty of companies that will capture your scans for a fee. Again, we recommend checking with local libraries or museums to see if they have equipment (or labor!) that you could use for free or at a reduced price. Additionally, DPLA service hubs exist specifically for this purpose. Check to see if there's one near you!

Finally: believe it or not, scanning technology (microfilm and otherwise) is still improving, so be sure to do a bit of research before committing the funds.


Metadata

Good news: scanning is a huge step forward! Bad news: it’s just the start of the journey. To make content available online, you need some kind of useful metadata about it. For more introductory information about metadata, check out this short video.

“Scanning is the least of it!” - one of the many wise librarians we interviewed

Digital images need metadata because computers are generally not able to “read” an image to know what’s in it. Scanning machines will often add basic technical metadata to a file: the file size, the image number (for microfilm), its resolution, etc. But non-technical, descriptive metadata usually needs to be added by hand. Think about what’s doable in your situation: Could you hire a student to add very basic descriptive information (like keywords, materials, or context)? Perhaps you or a colleague could allocate a few hours a week to the task? Talk to your team about approachable ways to gradually build this crucial descriptive information.

With that said: do not let yourself be bogged down by partial or imperfect metadata! So many organizations seem reluctant to post material until it’s perfect and complete. But how often do we actually find “perfect”? Remember that material can always be improved! Make it available to your visitors (online, in person) sooner rather than later! Who knows, maybe your visitors will even help you!

How do I decide what metadata to include?

Generally speaking, more data is better. But we have to be practical. Often these types of projects can be approached in phases. A phased approach might look like this:

  1. Scan documents, post online. At this point it's basically a pile of digital images. But you're making them available, which is a huge step forward!
    • Metadata for this phase:
      - collection name
      - item number
  2. Use student assistance, or crowd-sourcing, to add some very basic descriptive information. This moves your items toward being searchable, and thus more visible on the public stage.
    • Metadata for this phase:
      - collection name
      - item number
      - basic title
      - basic description
      - people names (if relevant)
      - place names (if relevant)
  3. Professionally refine metadata.* The better your data, the more discoverable you are. But every step forward—even small—is progress!
    • Metadata for this phase:
      - collection name
      - item number
      - refined title
      - refined description
      - subject headings: people, places, topics
      - other relevant information (size, provenance, transcript, etc.)

*This is likely to be the longest phase—it may last years! Trust that you’re doing good work and that forward momentum is valuable!
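If you track metadata in a spreadsheet or simple database, the phased approach above can be treated as a growing set of required fields. Below is one hypothetical Python sketch for checking which phase each item record has reached. The field names loosely mirror the lists above, but the record layout, the `phase_reached` helper, and the sample items are our own illustration, not a metadata standard. People and place names are left out of the required fields because they are only needed “if relevant.”

```python
# Hypothetical required fields per phase, loosely mirroring the lists above.
PHASE_FIELDS = {
    1: ["collection_name", "item_number"],
    2: ["collection_name", "item_number", "title", "description"],
    3: ["collection_name", "item_number", "title", "description", "subject_headings"],
}


def phase_reached(record: dict) -> int:
    """Return the highest phase whose required fields are all filled in (0 if none)."""
    reached = 0
    for phase in sorted(PHASE_FIELDS):
        if all(record.get(field) for field in PHASE_FIELDS[phase]):
            reached = phase
        else:
            break  # phases build on each other, so stop at the first gap
    return reached


# Hypothetical sample records.
items = [
    {"collection_name": "Smith Family Letters", "item_number": "001"},
    {"collection_name": "Smith Family Letters", "item_number": "002",
     "title": "Letter to Mary", "description": "Two-page handwritten letter."},
]
for item in items:
    print(item["item_number"], "-> phase", phase_reached(item))
# 001 -> phase 1
# 002 -> phase 2
```

A check like this can only tell whether a field is filled in, not whether a title is “basic” or “refined”; that judgment still belongs to people.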

Conclusion: Save us from paper!

Physical media—paper, photographs, film—degrades. The more time goes on, the more quality you lose.

Erring on the side of quality (you can always downgrade!), adding any metadata that's practical to add, and making the material as broadly available as possible will very quickly pull your collections into the modern digital world. Time to take the leap!