Information Management - Metadata

Noy and McGuinness, Shirky

In recent years the development of ontologies—explicit formal specifications of the terms in the domain and relations among them (Gruber 1993)—has been moving from the realm of Artificial-Intelligence laboratories to the desktops of domain experts. Ontologies have become common on the World-Wide Web. The ontologies on the Web range from large taxonomies categorizing Web sites (such as on Yahoo!) to categorizations of products for sale and their features (such as on Amazon.com).

Many disciplines now develop standardized ontologies that domain experts can use to share and annotate information in their fields.

An ontology defines a common vocabulary for researchers who need to share information in a domain. It includes machine-interpretable definitions of basic concepts in the domain and relations among them. Why would someone want to develop an ontology? Some of the reasons are:

To share common understanding of the structure of information among people or software agents
To enable reuse of domain knowledge
To make domain assumptions explicit
To separate domain knowledge from the operational knowledge
To analyze domain knowledge

Often an ontology of the domain is not a goal in itself. Developing an ontology is akin to defining a set of data and their structure for other programs to use. Problem-solving methods, domain-independent applications, and software agents use ontologies and knowledge bases built from ontologies as data.

Object-oriented programming centers primarily around methods on classes—a programmer makes design decisions based on the operational properties of a class, whereas an ontology designer makes these decisions based on the structural properties of a class. As a result, a class structure and relations among classes in an ontology are different from the structure for a similar domain in an object-oriented program.

For the purposes of this guide, an ontology is a formal, explicit description of three things: concepts in a domain of discourse, called classes (or sometimes concepts); properties of each concept describing its various features and attributes, called slots (or sometimes roles or properties); and restrictions on slots, called facets (or sometimes role restrictions). An ontology together with a set of individual instances of classes constitutes a knowledge base. In reality, there is a fine line where the ontology ends and the knowledge base begins.
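
The vocabulary above (classes, slots, facets, instances) can be sketched as a small data structure. This is only an illustration under stated assumptions: the names `Facet`, `OntologyClass`, and `Instance`, and the tiny "Book" domain, are hypothetical and do not come from the guide. Real ontology languages support far richer restrictions than a value type and a cardinality.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Facet:
    """A restriction on a slot: allowed value type and cardinality."""
    value_type: type
    min_cardinality: int = 0
    max_cardinality: Optional[int] = None  # None means unbounded

    def allows(self, values: list) -> bool:
        if len(values) < self.min_cardinality:
            return False
        if self.max_cardinality is not None and len(values) > self.max_cardinality:
            return False
        return all(isinstance(v, self.value_type) for v in values)


@dataclass
class OntologyClass:
    """A concept in the domain, with named slots and one facet per slot."""
    name: str
    slots: dict = field(default_factory=dict)  # slot name -> Facet


@dataclass
class Instance:
    """An individual of a class; a set of instances forms the knowledge base."""
    of: OntologyClass
    values: dict = field(default_factory=dict)  # slot name -> list of values

    def is_valid(self) -> bool:
        # Every slot's values must satisfy that slot's facet.
        return all(
            facet.allows(self.values.get(slot, []))
            for slot, facet in self.of.slots.items()
        )


# Hypothetical domain, invented for illustration only.
book = OntologyClass("Book", {
    "title": Facet(str, min_cardinality=1, max_cardinality=1),
    "subject": Facet(str, min_cardinality=1),
})
knowledge_base = [
    Instance(book, {"title": ["Moby-Dick"], "subject": ["whaling", "obsession"]}),
    Instance(book, {"subject": ["untitled draft"]}),  # missing the required title
]
print([inst.is_valid() for inst in knowledge_base])  # prints [True, False]
```

The class and its facets play the role of the ontology; the list of instances plays the role of the knowledge base. The line between the two is a modeling decision, which is the "fine line" the passage mentions.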

There is no one correct way to model a domain—there are always viable alternatives. The best solution almost always depends on the application that you have in mind and the extensions that you anticipate.
Ontology development is necessarily an iterative process.
Concepts in the ontology should be close to objects (physical or logical) and relationships in your domain of interest. These are most likely to be nouns (objects) or verbs (relationships) in sentences that describe your domain.

Ontology design is a creative process and no two ontologies designed by different people would be the same. The potential applications of the ontology and the designer’s understanding and view of the domain will undoubtedly affect ontology design choices. “The proof is in the pudding”—we can assess the quality of our ontology only by using it in applications for which we designed it.

The strategy of tagging – free-form labeling, without regard to categorical constraints – seems like a recipe for disaster, but as the Web has shown us, you can extract a surprising amount of value from big messy data sets.

It's tempting to think that the classification schemes that libraries have optimized for in the past can be extended in an uncomplicated way into the digital world. This badly underestimates, in my view, the degree to which what libraries have historically been managing is an entirely different problem.

The essence of a book isn't the ideas it contains. The essence of a book is “book.” Thinking that library catalogs exist to organize concepts confuses the container for the thing contained.

The categorization scheme is a response to physical constraints on storage, and to people's inability to keep the location of more than a few hundred things in their mind at once. Once you own more than a few hundred books, you have to organize them somehow.

It isn't the ideas in a book that have to be in one place – a book can be about several things at once. It is the book itself, the physical fact of the bound object, that has to be in one place, and if it's in one place, it can't also be in another place. And this in turn means that a book has to be declared to be about some main thing. A book which is equally about two things breaks the 'be in one place' requirement, so each book needs to be declared to be about one thing more than others, regardless of its actual contents.

In the digital world, there is no physical constraint that's forcing this kind of organization on us any longer.
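
The difference between the shelf and the digital record can be shown with two mappings. This is a minimal sketch, not anything from the source: the names `shelf_of`, `labels_of`, and the example title and subjects are hypothetical. The shelf forces one subject per item, while a digital index lets one item carry many.

```python
# Hypothetical physical catalog: each book gets exactly one location.
shelf_of = {"Gödel, Escher, Bach": "mathematics"}

# Hypothetical digital index: the same item can carry any number of labels.
labels_of = {"Gödel, Escher, Bach": {"mathematics", "music", "art", "cognition"}}


def found_on_shelf(title, subject):
    """True only if the book was filed under exactly this subject."""
    return shelf_of.get(title) == subject


def found_by_label(title, subject):
    """True if the subject is any one of the item's labels."""
    return subject in labels_of.get(title, set())


print(found_on_shelf("Gödel, Escher, Bach", "music"))  # prints False
print(found_by_label("Gödel, Escher, Bach", "music"))  # prints True
```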

There is no shelf. There is no file system. The links alone are enough.

One reason Google was adopted so quickly when it came along is that Google understood there is no shelf, and that there is no file system. Google can decide what goes with what after hearing from the user, rather than trying to predict in advance what it is you need to know.

Browse versus search is a radical increase in the trust we put in link infrastructure, and in the degree of power derived from that link structure. Browse says the people making the ontology, the people doing the categorization, have the responsibility to organize the world in advance. Given this requirement, the views of the catalogers necessarily override the user's needs and the user's view of the world. If you want something that hasn't been categorized in the way you think about it, you're out of luck.

The search paradigm says the reverse. It says nobody gets to tell you in advance what it is you need. Search says that, at the moment that you are looking for it, we will do our best to service it based on this link structure, because we believe we can build a world where we don't need the hierarchy to coexist with the link structure.

When people were offered search and categorization side-by-side, fewer and fewer people were using categorization to find things.

Users have a terrifically hard time guessing how something they want will have been categorized in advance, unless they have been educated about those categories in advance as well, and the bigger the user base, the more work that user education is.

The list of factors making ontology a bad fit is, also, an almost perfect description of the Web – largest corpus, most naïve users, no global authority, and so on.

The problem is, because the catalogers assume their classification should have force on the world, they underestimate the difficulty of understanding what users are thinking, and they overestimate the degree to which users will agree, either with one another or with the catalogers, about the best way to categorize. They also underestimate the loss from erasing difference of expression, and they overestimate loss from the lack of a thesaurus.

Tags are important mainly for what they leave out. By forgoing formal classification, tags enable a huge amount of user-produced organizational value, at vanishingly small cost.

Much of the expense of existing catalogue systems is in trying to prevent one-off categories. With tagging, what you say is “As long as a lot of people are tagging any given link, the rare tags can be used or ignored, as the user likes. We won't even have to expend the cost to prevent people from using them. We'll just help other users ignore them if they want to.”
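
One way to read "help other users ignore them" is as a display threshold over aggregated tags: nothing is rejected at write time, and rare tags are filtered only at read time. The sketch below assumes this reading. The `(user, url, tag)` triples, the function names, and the `min_users` parameter are all hypothetical and do not describe any particular tagging service.

```python
from collections import Counter

# Hypothetical data: (user, url, tag) triples, invented for illustration.
taggings = [
    ("ann", "http://example.com/a", "ontology"),
    ("bob", "http://example.com/a", "ontology"),
    ("cat", "http://example.com/a", "metadata"),
    ("dan", "http://example.com/a", "metadata"),
    ("eve", "http://example.com/a", "toread"),
]


def tag_counts(taggings, url):
    """Count how many distinct users applied each tag to one URL."""
    seen = {(user, tag) for user, u, tag in taggings if u == url}
    return Counter(tag for _, tag in seen)


def common_tags(taggings, url, min_users=2):
    """Return the tags used by at least min_users people; nothing is deleted."""
    counts = tag_counts(taggings, url)
    return sorted(tag for tag, n in counts.items() if n >= min_users)


print(common_tags(taggings, "http://example.com/a"))
# prints ['metadata', 'ontology']
print(common_tags(taggings, "http://example.com/a", min_users=1))
# prints ['metadata', 'ontology', 'toread']
```

The one-off tag `toread` stays in the data and remains useful to the person who wrote it. The threshold only decides what other users see by default.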

We move from a binary choice between saying two tags are the same or different to the Venn diagram option of “kind of is/somewhat is/sort of is/overlaps to this degree”. That is a really profound change.
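
The "overlaps to this degree" idea can be made concrete with a set-overlap measure. The sketch below uses Jaccard similarity over the sets of URLs that carry each tag. This is one possible measure chosen for illustration, not one the source names, and the tags and URL identifiers are hypothetical.

```python
def tag_overlap(urls_by_tag, tag_a, tag_b):
    """Jaccard similarity of the URL sets for two tags, from 0.0 to 1.0."""
    a = urls_by_tag.get(tag_a, set())
    b = urls_by_tag.get(tag_b, set())
    union = a | b
    if not union:
        return 0.0  # neither tag has been used, so there is nothing to compare
    return len(a & b) / len(union)


# Hypothetical data: which URLs (by short id) carry which tag.
urls_by_tag = {
    "movies": {"u1", "u2", "u3", "u4"},
    "film": {"u3", "u4", "u5"},
    "cooking": {"u6"},
}

print(tag_overlap(urls_by_tag, "movies", "film"))     # prints 0.4
print(tag_overlap(urls_by_tag, "movies", "cooking"))  # prints 0.0
```

A score of 0.4 says the two tags are neither the same nor unrelated, which is the graded answer a binary same-or-different decision cannot express.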

It comes down ultimately to a question of philosophy. Does the world make sense or do we make sense of the world?

The signal benefit of these systems is that they don't recreate the structured, hierarchical categorization so often forced onto us by our physical systems. Instead, we're dealing with a significant break – by letting users tag URLs and then aggregating those tags, we're going to be able to build alternate organizational systems, systems that, like the Web itself, do a better job of letting individuals create value for one another, often without realizing it.