Information Management - Metadata
In recent years the development of ontologies—explicit formal specifications of the terms in the domain and relations among them (Gruber 1993)—has been moving from the realm of Artificial-Intelligence laboratories to the desktops of domain experts. Ontologies have become common on the World-Wide Web. The ontologies on the Web range from large taxonomies categorizing Web sites (such as on Yahoo!) to categorizations of products for sale and their features (such as on Amazon.com).
[OntologyDev, p.1]
Many disciplines now develop standardized ontologies that domain experts can use to share and annotate information in their fields.
[OntologyDev, p.1]
An ontology defines a common vocabulary for researchers who need to share information in a domain. It includes machine-interpretable definitions of basic concepts in the domain and relations among them. Why would someone want to develop an ontology? Some of the reasons are:
- To share common understanding of the structure of information among people or software agents
- To enable reuse of domain knowledge
- To make domain assumptions explicit
- To separate domain knowledge from the operational knowledge
- To analyze domain knowledge
[OntologyDev, p.1]
Often an ontology of the domain is not a goal in itself. Developing an ontology is akin to defining a set of data and their structure for other programs to use. Problem-solving methods, domain-independent applications, and software agents use ontologies and knowledge bases built from ontologies as data.
[OntologyDev, p.2]
Object-oriented programming centers primarily around methods on classes—a programmer makes design decisions based on the operational properties of a class, whereas an ontology designer makes these decisions based on the structural properties of a class. As a result, a class structure and relations among classes in an ontology are different from the structure for a similar domain in an object-oriented program.
[OntologyDev, p.2]
For the purposes of this guide an ontology is a formal explicit description of concepts in a domain of discourse (classes (sometimes called concepts)), properties of each concept describing various features and attributes of the concept (slots (sometimes called roles or properties)), and restrictions on slots (facets (sometimes called role restrictions)). An ontology together with a set of individual instances of classes constitutes a knowledge base. In reality, there is a fine line where the ontology ends and the knowledge base begins.
[OntologyDev, p.3]
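The definition above (classes, slots, facets, plus instances that together form a knowledge base) can be sketched as plain data structures. This is a minimal illustration under our own assumptions. It does not show how any particular ontology tool represents these things. The names `OntologyClass`, `Facet`, `Instance`, and `KnowledgeBase` are hypothetical, as are the choice of value type and cardinality as the only facets and the wine example data.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Facet:
    """Restrictions on a slot. Only value type and cardinality are modeled (an assumption)."""
    value_type: type
    min_cardinality: int = 0
    max_cardinality: Optional[int] = None  # None means "no upper bound"


@dataclass
class OntologyClass:
    """A concept in the domain: a name, its own slots, and an optional superclass."""
    name: str
    slots: dict[str, Facet] = field(default_factory=dict)
    superclass: Optional[OntologyClass] = None

    def all_slots(self) -> dict[str, Facet]:
        # A class inherits its superclass's slots; its own definitions override them.
        inherited = self.superclass.all_slots() if self.superclass else {}
        return {**inherited, **self.slots}


@dataclass
class Instance:
    """An individual of a class; slot values are lists because slots may be multi-valued."""
    name: str
    cls: OntologyClass
    values: dict[str, list[Any]] = field(default_factory=dict)

    def violations(self) -> list[str]:
        # Check every slot the class defines (including inherited ones) against its facets.
        problems = []
        for slot, facet in self.cls.all_slots().items():
            vals = self.values.get(slot, [])
            if len(vals) < facet.min_cardinality:
                problems.append(f"{self.name}.{slot}: needs at least {facet.min_cardinality} value(s)")
            if facet.max_cardinality is not None and len(vals) > facet.max_cardinality:
                problems.append(f"{self.name}.{slot}: allows at most {facet.max_cardinality} value(s)")
            for v in vals:
                if not isinstance(v, facet.value_type):
                    problems.append(f"{self.name}.{slot}: {v!r} is not a {facet.value_type.__name__}")
        return problems


@dataclass
class KnowledgeBase:
    """An ontology (its classes) together with a set of individual instances."""
    classes: list[OntologyClass]
    instances: list[Instance] = field(default_factory=list)

    def violations(self) -> dict[str, list[str]]:
        # Map each instance that breaks a facet to its list of problems.
        report = {}
        for inst in self.instances:
            problems = inst.violations()
            if problems:
                report[inst.name] = problems
        return report


# Hypothetical example: a Wine class and a RedWine subclass that inherits its slots.
wine = OntologyClass("Wine", {"maker": Facet(str, 1, 1), "grape": Facet(str, 1)})
red_wine = OntologyClass("RedWine", superclass=wine)
kb = KnowledgeBase(
    [wine, red_wine],
    [
        Instance("ExampleRed", red_wine, {"maker": ["ExampleWinery"], "grape": ["Gamay"]}),
        Instance("BrokenRed", red_wine, {"grape": [42]}),
    ],
)
print(kb.violations())
```

Only `BrokenRed` is reported. It lacks its required `maker` value, and its `grape` value is not a string. Slots and facets are held as data here rather than as methods. This matches the point that an ontology is organized around the structural properties of classes, not their operational behavior.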
- There is no one correct way to model a domain—there are always viable alternatives. The best solution almost always depends on the application that you have in mind and the extensions that you anticipate.
- Ontology development is necessarily an iterative process.
- Concepts in the ontology should be close to objects (physical or logical) and relationships in your domain of interest. These are most likely to be nouns (objects) or verbs (relationships) in sentences that describe your domain.
[OntologyDev, p.4]
Ontology design is a creative process and no two ontologies designed by different people would be the same. The potential applications of the ontology and the designer’s understanding and view of the domain will undoubtedly affect ontology design choices. “The proof is in the pudding”—we can assess the quality of our ontology only by using it in applications for which we designed it.
[OntologyDev, p.23]
The strategy of tagging – free-form labeling, without regard to categorical constraints – seems like a recipe for disaster, but as the Web has shown us, you can extract a surprising amount of value from big messy data sets.
[OntologyOverrated, p.2]
It's tempting to think that the classification schemes that libraries have optimized for in the past can be extended in an uncomplicated way into the digital world. This badly underestimates, in my view, the degree to which what libraries have historically been managing is an entirely different problem.
[OntologyOverrated, p.7-8]
The essence of a book isn't the ideas it contains. The essence of a book is “book.” Thinking that library catalogs exist to organize concepts confuses the container for the thing contained.
The categorization scheme is a response to physical constraints on storage, and to people's inability to keep the location of more than a few hundred things in their mind at once. Once you own more than a few hundred books, you have to organize them somehow.
[OntologyOverrated, p.8]
It isn't the ideas in a book that have to be in one place – a book can be about several things at once. It is the book itself, the physical fact of the bound object, that has to be in one place, and if it's in one place, it can't also be in another place. And this in turn means that a book has to be declared to be about some main thing. A book which is equally about two things breaks the 'be in one place' requirement, so each book needs to be declared to be about one thing more than others, regardless of its actual contents.
[OntologyOverrated, p.8-9]
In the digital world, there is no physical constraint that's forcing this kind of organization on us any longer.
[OntologyOverrated, p.9]
There is no shelf. There is no file system. The links alone are enough.
[OntologyOverrated, p.13]
One reason Google was adopted so quickly when it came along is that Google understood there is no shelf, and that there is no file system. Google can decide what goes with what after hearing from the user, rather than trying to predict in advance what it is you need to know.
[OntologyOverrated, p.14]
Browse versus search is a radical increase in the trust we put in link infrastructure, and in the degree of power derived from that link structure. Browse says the people making the ontology, the people doing the categorization, have the responsibility to organize the world in advance. Given this requirement, the views of the catalogers necessarily override the user's needs and the user's view of the world. If you want something that hasn't been categorized in the way you think about it, you're out of luck.
The search paradigm says the reverse. It says nobody gets to tell you in advance what it is you need. Search says that, at the moment that you are looking for it, we will do our best to service it based on this link structure, because we believe we can build a world where we don't need the hierarchy to coexist with the link structure.
[OntologyOverrated, p.14-15]
When people were offered search and categorization side-by-side, fewer and fewer people were using categorization to find things.
[OntologyOverrated, p.15]
Users have a terrifically hard time guessing how something they want will have been categorized in advance, unless they have been educated about those categories in advance as well, and the bigger the user base, the more work that user education is.
[OntologyOverrated, p.17]
The list of factors making ontology a bad fit is, also, an almost perfect description of the Web – largest corpus, most naïve users, no global authority, and so on.
[OntologyOverrated, p.18]
The problem is, because the catalogers assume their classification should have force on the world, they underestimate the difficulty of understanding what users are thinking, and they overestimate the amount to which users will agree, either with one another or with the catalogers, about the best way to categorize. They also underestimate the loss from erasing difference of expression, and they overestimate loss from the lack of a thesaurus.
[OntologyOverrated, p.20]
Tags are important mainly for what they leave out. By forgoing formal classification, tags enable a huge amount of user-produced organizational value, at vanishingly small cost.
[OntologyOverrated, p.24-25]
Much of the expense of existing catalogue systems is in trying to prevent one-off categories. With tagging, what you say is “As long as a lot of people are tagging any given link, the rare tags can be used or ignored, as the user likes. We won't even have to expend the cost to prevent people from using them. We'll just help other users ignore them if they want to.”
[OntologyOverrated, p.28]
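The policy of letting users use or ignore rare tags can be sketched as aggregation plus a read-time filter. Nothing is rejected when a tag is written. Each reader picks how common a tag must be before it is shown. This minimal sketch rests on assumptions of our own. Taggings arrive as hypothetical `(user, url, tag)` triples, and each user's tag counts once per URL. The `min_count` threshold is an illustrative knob that the source does not specify. Tags are kept exactly as typed, so no difference of expression is erased.

```python
from collections import Counter, defaultdict


def aggregate_tags(taggings):
    """Count, per URL, how many distinct users applied each tag.

    Tags are kept exactly as typed; no spelling or case normalization is applied.
    """
    seen = set()
    counts = defaultdict(Counter)
    for user, url, tag in taggings:
        if (user, url, tag) in seen:
            continue  # a user repeating the same tag on the same URL counts once
        seen.add((user, url, tag))
        counts[url][tag] += 1
    return counts


def visible_tags(tag_counts, min_count=2):
    """Return tags used at least min_count times, most common first.

    Rare tags are hidden at read time, not forbidden at write time;
    passing min_count=1 shows everything.
    """
    return [tag for tag, n in tag_counts.most_common() if n >= min_count]


# Hypothetical tagging data for a single URL.
taggings = [
    ("ann", "example.org/a", "python"),
    ("ann", "example.org/a", "python"),  # duplicate, ignored
    ("bob", "example.org/a", "python"),
    ("cat", "example.org/a", "programming"),
    ("dan", "example.org/a", "python"),
    ("dan", "example.org/a", "programming"),
    ("eve", "example.org/a", "toread"),
]
counts = aggregate_tags(taggings)
print(visible_tags(counts["example.org/a"]))               # ['python', 'programming']
print(visible_tags(counts["example.org/a"], min_count=1))  # ['python', 'programming', 'toread']
```

The one-off tag `toread` is stored like any other. It only disappears for readers who choose a threshold above 1, so no cost is spent preventing it.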
We move from a binary choice between saying two tags are the same or different to the Venn diagram option of “kind of is/somewhat is/sort of is/overlaps to this degree”. That is a really profound change.
[OntologyOverrated, p.29]
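One way to replace a same-or-different judgment with a degree of overlap is to compare the sets of URLs each tag has been applied to. This sketch uses Jaccard similarity, defined as the size of the intersection divided by the size of the union. It is a standard set-overlap measure chosen here for illustration. The source does not say how overlap should be computed. The `(user, url, tag)` input format and the tag names are hypothetical.

```python
from collections import defaultdict


def urls_by_tag(taggings):
    """Index (user, url, tag) triples into tag -> set of URLs carrying that tag."""
    index = defaultdict(set)
    for _user, url, tag in taggings:
        index[tag].add(url)
    return index


def tag_overlap(index, a, b):
    """Jaccard similarity of two tags' URL sets: 0.0 means disjoint, 1.0 means identical."""
    urls_a = index.get(a, set())
    urls_b = index.get(b, set())
    union = urls_a | urls_b
    if not union:
        return 0.0  # neither tag has been used; treat as no overlap
    return len(urls_a & urls_b) / len(union)


# Hypothetical taggings using three near-synonymous tags.
taggings = [
    ("ann", "u1", "movies"), ("bob", "u1", "film"),
    ("ann", "u2", "movies"), ("cat", "u2", "film"),
    ("bob", "u3", "film"),
    ("dan", "u4", "movies"), ("dan", "u4", "cinema"),
]
index = urls_by_tag(taggings)
print(tag_overlap(index, "movies", "film"))    # 0.5
print(tag_overlap(index, "movies", "cinema"))  # 0.3333333333333333
print(tag_overlap(index, "film", "cinema"))    # 0.0
```

A score of 0.5 says the two tags overlap to that degree. No one has to declare whether they are synonyms, and the score can change as more users tag more URLs.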
It comes down ultimately to a question of philosophy. Does the world make sense or do we make sense of the world?
[OntologyOverrated, p.33]
The signal benefit of these systems is that they don't recreate the structured, hierarchical categorization so often forced onto us by our physical systems. Instead, we're dealing with a significant break – by letting users tag URLs and then aggregating those tags, we're going to be able to build alternate organizational systems, systems that, like the Web itself, do a better job of letting individuals create value for one another, often without realizing it.
[OntologyOverrated, p.34]