Introduction to Metadata

Our understanding of the world is facilitated by our ability to associate things, to compare and contrast, to categorize, and to form abstract relationships. To help others understand information, we deliberately describe it, and in doing so we create new forms of knowledge. When communicating with computers, we can do this using metadata.

Metadata is simply information that describes other information. For example, let's look at some text, a news headline and its summary:

Bush Continues to Push Congress for Resolution on Iraq


President Bush today kept up pressure on Congress to approve action against Iraq amid new criticism from Democrats.

The data in this case is the headline and summary:

Bush Continues to Push Congress for Resolution on Iraq

President Bush today kept up pressure on Congress to approve action against Iraq amid new criticism from Democrats.

The metadata is the surrounding information that helps us understand the context or to categorize the data:


Publish time: 12:30 PM ET

Related information:

There may also be other metadata that isn't displayed but which helps the system display or organize the data:

Desk: National

Information Type: News

Format: Column
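
The split between data and metadata can be made concrete in code. What follows is a minimal Python sketch, not a description of how any news site actually stores its articles. The dictionary layout, the key names (data, displayed_metadata, system_metadata) and the describe function are assumptions made for illustration; the values are the ones from the example above.

```python
# Illustrative only: the structure and key names are assumptions,
# the values are taken from the example in the text.
article = {
    "data": {
        "headline": "Bush Continues to Push Congress for Resolution on Iraq",
        "summary": (
            "President Bush today kept up pressure on Congress to approve "
            "action against Iraq amid new criticism from Democrats."
        ),
    },
    # Metadata shown to the reader alongside the data.
    "displayed_metadata": {
        "Publish time": "12:30 PM ET",
    },
    # Metadata the system uses to display or organize the data.
    "system_metadata": {
        "Desk": "National",
        "Information Type": "News",
        "Format": "Column",
    },
}


def describe(record):
    """Return one 'name: value' line per metadata element of a record."""
    lines = []
    for group in ("displayed_metadata", "system_metadata"):
        for name, value in record[group].items():
            lines.append(f"{name}: {value}")
    return lines


for line in describe(article):
    print(line)
```

Note that the headline and summary never appear in the output: describe only walks the metadata, which is exactly the part that describes the data rather than being the data.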

To allow readers to search or browse their news, the New York Times might collect one taxonomy of terms - a form of metadata - and display all these terms together. For example, the Desk taxonomy looks like this:









New York Region




This collection is called a metadata schema, meaning a systematic combination of elements.
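
One way to picture a schema is as a mapping from each element to the terms it allows. The Python sketch below assumes that representation. The names SCHEMA and validate are hypothetical, and the term sets contain only the terms that appear in this article, so they should not be read as the complete New York Times taxonomies.

```python
# Hypothetical schema: each element maps to its allowed terms.
# Only terms mentioned in this article are listed; the real
# taxonomies would be longer.
SCHEMA = {
    "Desk": {"National", "New York Region"},
    "Information Type": {"News"},
    "Format": {"Column"},
}


def validate(metadata, schema=SCHEMA):
    """Return a list of problems; an empty list means the record fits the schema."""
    problems = []
    for element, allowed in schema.items():
        if element not in metadata:
            problems.append(f"missing element: {element}")
        elif metadata[element] not in allowed:
            problems.append(f"unknown term for {element}: {metadata[element]}")
    return problems


good = {"Desk": "National", "Information Type": "News", "Format": "Column"}
bad = {"Desk": "Nationl", "Format": "Column"}  # misspelled term, missing element

print(validate(good))
print(validate(bad))
```

A check like this is what lets a system rely on the metadata later: if every record uses terms from the schema, browsing by Desk will not miss articles because of a typo.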

Metadata can describe other things as well, such as people or places.


There are several types of schemas that can be used when organizing metadata:

[ insert chart ]

Adapted from "Levels of Control" from and "An Ontology Spectrum" from Deborah McGuinness


Essentially, the benefits of these metadata schemas are:

Here are some basic definitions to help tell the different kinds of schemas apart:

It might help to define some related terms:

Controlled Vocabularies - a defined set of preferred terms. Types of controlled vocabularies include Synonym Rings, Authority Files, Taxonomies, Faceted Taxonomies, and Thesauri. Ontologies are not usually considered a form of controlled vocabulary but rather a form of knowledge representation.
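
As a rough illustration of what "preferred terms" means in practice, the Python sketch below maps variant spellings to a single preferred term, loosely in the spirit of an authority file. The variant strings and the preferred_term function are invented for this example and are not drawn from any real vocabulary.

```python
# Hypothetical variants, invented for illustration; each maps to
# the single preferred term the vocabulary allows.
VARIANTS = {
    "new york times": "New York Times",
    "the new york times": "New York Times",
    "ny times": "New York Times",
    "nyt": "New York Times",
}


def preferred_term(raw):
    """Return the preferred term for a raw string, or None if it is not in the vocabulary."""
    key = " ".join(raw.lower().split())  # normalize case and whitespace
    return VARIANTS.get(key)


print(preferred_term("  The New York  Times "))
print(preferred_term("NYT"))
print(preferred_term("Some Other Paper"))
```

Returning None for an unknown string, rather than passing it through, keeps unvetted terms out of the vocabulary; a real system might instead queue such strings for review.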

Attribute - an aspect of an object, such as the publisher name. Attributes are alternatively called "facets" when applied to taxonomies, "slots" when applied to ontologies, or "fields" when applied to databases.

Attribute Value - a value assigned to an attribute. For example, the attribute "Publisher Name" can have a value of "New York Times".

{show examples of all these}

A note on metatags: metadata and metatags are related, but are different things. Metatags appear within markup (such as HTML pages) and identify certain attributes of the marked-up information. Metadata goes *into* metatags, but metadata has many other uses as well.
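
To show metadata going into metatags and coming back out, here is a small Python sketch that uses the standard library's html.parser module. The HTML page, the meta names (desk, information-type, format) and the MetaTagParser class are made up for illustration; the values reuse the example from earlier in this article.

```python
from html.parser import HTMLParser

# A made-up HTML page for illustration; the meta names are
# assumptions, the values reuse the example from this article.
PAGE = """
<html>
  <head>
    <title>Example article</title>
    <meta name="desk" content="National">
    <meta name="information-type" content="News">
    <meta name="format" content="Column">
  </head>
  <body><p>Article text goes here.</p></body>
</html>
"""


class MetaTagParser(HTMLParser):
    """Collect name/content pairs from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        # Skip meta tags that do not carry a name/content pair.
        if name is not None and content is not None:
            self.metadata[name] = content


parser = MetaTagParser()
parser.feed(PAGE)
print(parser.metadata)
```

The metatags are only the container here. The same three values could just as well live in a database row or a file header, which is the sense in which metadata has uses beyond metatags.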

Saturday, September 28, 2002 | Permalink | Filed in Knowledge Base-Driven Systems

