Miller Center

All About Metadata

Transcript

My name is Amber Lautigar Reichert and I’m a web developer with the University of Virginia’s Miller Center. The Miller Center focuses on the study of the presidency and seeks to apply the lessons of history to modern life.

In the course of building presidentialcollections.org, we’ve encountered a lot of understandable confusion about metadata. This video is intended to clear up some of that confusion and provide general guidelines for working with metadata.

What is metadata?


Metadata is information about something.

Metadata is simply data about something. For example, when you’re listening to a friend describe a photograph, that person is giving you metadata. They might say,

“I found this wonderful black and white photo of President Nixon with Elvis! It was taken at the White House in 1970.”

That is all metadata. But if your friend then says,

“Here it is, see?”

This is the item itself—not metadata.

Metadata lets us know things about an item, like its description, year of origin, ownership, or creator, without necessarily seeing the item.

A library record is metadata. Your pet’s ID tag is metadata. Metadata is the information that describes a thing—not the thing itself.

Metadata is what lets us search for an item, or sort a list based on type or origin. Without metadata, a collection is simply a chaotic pile of stuff.
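To make that concrete, here is a minimal sketch in Python of searching and sorting a tiny collection using nothing but its metadata. The records, the field names (title, year, type), and the search helper are all made up for illustration; only the Nixon and Elvis photo comes from the example above.

```python
# Hypothetical metadata records. Each dict describes an item;
# none of them contains the item itself.
records = [
    {"title": "Nixon with Elvis", "year": 1970, "type": "photograph"},
    {"title": "Inaugural address", "year": 1961, "type": "audio"},
    {"title": "Oval Office portrait", "year": 1981, "type": "photograph"},
]


def search(items, **criteria):
    """Return the items whose metadata matches every given field value."""
    return [
        item
        for item in items
        if all(item.get(field) == value for field, value in criteria.items())
    ]


# Search by type, then sort the whole collection by year of origin.
photos = search(records, type="photograph")
by_year = sorted(records, key=lambda item: item["year"])
```

Neither operation ever opens a photo or plays a recording. Everything happens on the descriptions.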

What is a format/schema?

A format, or schema, is simply the container we use to hold our metadata in a structured way. You’ve probably used many schemas in your life and not even realized it. For example, iTunes holds metadata about every album or song in its library: title, artist, and year of release are all pieces of metadata.

iTunes gathers and displays metadata about every item in its collection. In this case, the metadata is the title, date, artist, and so on, but not the music itself. The music itself is the data (or "item").

In the digital library world, there is a wide variety of metadata formats that both computers and humans can understand.

Those metadata formats are defined in something called a schema. Some examples of different formats are: TEI, Dublin Core, METS, and MODS. A schema simply defines a common structure and set of vocabulary for use with that particular format.

One of the most common ways of building metadata is with a markup language called XML (Extensible Markup Language). You might be familiar with HTML, which we use to structure web pages. XML looks very similar to HTML, but XML is used to describe content rather than presentation.

Another common way to structure data is using a format called JSON (JavaScript Object Notation). JSON is the native format for use with JavaScript, so it’s often preferable on the web. That said, as long as your metadata is in good shape, it’s usually not too difficult to translate it from one format to another.
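As a rough illustration of that kind of translation, here is a small Python sketch that reads a flat XML record and writes it back out as JSON. The record and its element names (title, date, place, format) are invented for this example and do not follow any particular schema. It also assumes every field is a simple, non-repeating element, which real records often are not.

```python
import json
import xml.etree.ElementTree as ET

# A hypothetical XML record describing the photo, not the photo itself.
XML_RECORD = """
<record>
  <title>President Nixon with Elvis</title>
  <date>1970</date>
  <place>White House</place>
  <format>black and white photograph</format>
</record>
"""


def xml_record_to_dict(xml_text):
    """Turn a flat XML record into a dict of element name -> text."""
    root = ET.fromstring(xml_text.strip())
    return {child.tag: (child.text or "").strip() for child in root}


record = xml_record_to_dict(XML_RECORD)

# The same metadata, now expressed as JSON text.
json_text = json.dumps(record, indent=2)
print(json_text)
```

The translation is easy here because the record is clean and regular. Nested elements, repeated fields, or attributes would each need a deliberate decision about how they map to JSON.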

Pro tips! (but first...)

I’m about to give some advice for building high-quality metadata. But first I want to acknowledge something: it is very, very rare that someone would be building metadata from scratch. Usually we’re sorting through previously-abandoned projects, outdated schemas, or partial catalogs. I know it, you know it, our bosses and colleagues know it.

The most important thing—new project or old—is to start somewhere. Here are some tips to help you be successful:

1. Pick an appropriate schema

Different schemas have different strengths. Try to pick a metadata schema that will serve a variety of purposes you might realistically encounter. For example, Dublin Core doesn’t currently have a built-in field to hold a transcription. If you think you might need that someday, lean toward a format that might have more options. No format will be perfect in the long run, but thinking about your future challenges will help you pick a long-lasting solution.

2. Respect standardized fields

Some fields in a metadata schema are meant to serve very specific purposes. It’s important to have an unyielding respect for those purposes.

In iTunes, genre is a rigidly standardized field.

Remember our iTunes library? Genre is an example of a standardized field: its options are limited, and it uses a controlled vocabulary to keep entries consistent.
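A controlled vocabulary can be enforced with a very small check. The sketch below is in Python, and the genre list and function names are assumptions made up for illustration, not the actual list iTunes uses.

```python
# Hypothetical controlled vocabulary for a "genre" field.
ALLOWED_GENRES = {"Blues", "Classical", "Country", "Jazz", "Rock"}


def validate_genre(value):
    """Return the value unchanged if it is in the vocabulary, else raise."""
    if value not in ALLOWED_GENRES:
        raise ValueError(f"{value!r} is not in the controlled vocabulary")
    return value


def find_invalid_genres(records):
    """List the records whose genre is missing or outside the vocabulary."""
    return [r for r in records if r.get("genre") not in ALLOWED_GENRES]
```

Note that the check is exact: "jazz" and "Jazz " would both be rejected. That strictness is the point, since it keeps every entry spelled the same way.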

Some other examples of standardized fields might be ID numbers, dates, addresses, or size.

Let’s take ID number as a real-world example: an item’s ID should always be unique and remain unchanged for the full lifetime of the item. You might be tempted to build an ID out of a date, a format, or a cataloger’s name. Avoid this temptation (!), since all of these things can change over time.
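One way to follow this advice is to use opaque IDs that encode nothing about the item. The Python sketch below uses random UUIDs as one possible approach. The function names and the "id" field are assumptions for illustration, and your schema or institution may already prescribe its own identifier scheme.

```python
import uuid
from collections import Counter


def mint_id():
    """Create an opaque identifier that says nothing about the item.

    Because it contains no date, format, or cataloger name, nothing
    that later changes about the item can make the ID out of date.
    """
    return uuid.uuid4().hex


def duplicate_ids(records):
    """Return, in sorted order, every ID used by more than one record."""
    counts = Counter(record["id"] for record in records)
    return sorted(item_id for item_id, n in counts.items() if n > 1)
```

An ID is minted once, when the record is created, and then stored with it. Running a duplicate check like this over an inherited catalog is a quick way to find out whether its IDs can be trusted.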

Relatedly,

3. Be consistent!

This might be the most important of all. Once you start building metadata, document your practices. For example, how do you format names? Firstname Lastname? Lastname comma firstname? How do you handle names with three words? Or four? Being consistent now will make your collections more discoverable to users in the long run, AND it increases your flexibility in case you ever need to change your schema.
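Writing a rule down as code is one way to document it. The Python sketch below assumes a house rule of "Lastname, Firstname" in which the last word is always treated as the surname. That rule is only an example and is wrong for many real names (multi-word surnames, suffixes like "Jr."), so treat it as a starting point to adapt, not a recommendation.

```python
def normalize_name(name):
    """Format a personal name as 'Lastname, Firstname Middlename'.

    Assumed house rule: if the name already contains a comma, trust it
    and only tidy the spacing; otherwise treat the last word as the
    surname and everything before it as given names.
    """
    name = " ".join(name.split())  # collapse stray whitespace
    if "," in name:
        last, _, rest = name.partition(",")
        return f"{last.strip()}, {rest.strip()}"
    parts = name.split(" ")
    if len(parts) == 1:
        return parts[0]
    return f"{parts[-1]}, {' '.join(parts[:-1])}"
```

With this rule, "Richard Nixon" and "Nixon,  Richard" both come out as "Nixon, Richard", and a three-word name such as "Amber Lautigar Reichert" becomes "Reichert, Amber Lautigar". Whatever rule you choose, the value is in applying the same one everywhere.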

Remember: you’re building a system that might be in use far beyond your own involvement. So the clearer you can make it, the better it will be!