eTech05: Ontology is Overrated - All the Pages Are My Days

Continuing a theme, Clay Shirky’s eTech presentation focused on categorization, tagging and labeling. Deep hierarchies, wide ones, or something else entirely? I face these issues myself in managing my image and music libraries.

We start by looking at the meaning of ontology. (It depends on what the meaning of “is” is.)

* Entities and their relations.

Clay plans to argue that the classic goals of ontology are no longer relevant.

The periodic table shows the descriptive and predictive value a classification scheme can embody. But even it isn’t a perfect scheme.

Card catalogs are another major example of a classification scheme.

Look at the religion category in the Dewey system, or the same category in the Library of Congress system. The major problem is that the subcategories are badly incomplete.

What’s being optimized is the number of books on the shelves. That has nothing to do with the real world, at least not the political real world. In the local real world, storing all those books is the actual problem being solved.

It’s not an organization of ideas, but an organization of physical objects.

Librarians say “there is no shelf”: they provide the interface between your request and the physical shelving. In my experience, they do. But libraries are now only a small part of the body of information available.

Yahoo is used to further illustrate the problem; it’s still a provider-centric view, not a user-centric perspective.

Next comes hierarchy, plus links. Eventually, a lot of links.

Why did Google do so well? For the same reason Gmail did: no folders, no shelves, no categories set up in advance. Just search. The links represent the categories, as expressed by a user’s query.
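To make the “no folders, just search” idea concrete, here is a minimal sketch of an inverted index, assuming plain whitespace-separated text. No document is filed under a category in advance; a “category” exists only as the set of documents matching whatever query the user types. This is a toy illustration, not how Google or Gmail actually work, and the `build_index`/`search` names and sample documents are hypothetical.

```python
from collections import defaultdict

def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercase word to the set of document ids containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return dict(index)

def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return ids of documents containing every query word (AND semantics)."""
    words = query.lower().split()
    if not words:
        return set()
    # Copy the first posting set so intersecting doesn't mutate the index.
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

# Hypothetical documents; none of them is assigned a category up front.
docs = {
    "a": "dresden travel photos",
    "b": "dresden history east germany",
    "c": "travel tips for germany",
}
index = build_index(docs)
print(sorted(search(index, "dresden")))
print(sorted(search(index, "germany travel")))
```

The same three documents fall into different ad hoc groupings depending on the query, which is the point: the grouping is made at request time by the user, not decided up front by whoever stocked the shelves.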

All this raises the question: is an ontological approach appropriate for dealing with the volumes of information available today? For specific use cases, yes, but in general, no. The slides do a good job of supporting this idea, and I’ll link to them when available.

One argument for formal categorization: we need a thesaurus, for example, to deal with “Mac/Apple/OSX”. Maybe in that case, but what about “Movies/Film/Cinema”? Those terms mean different things to the people who use them. It’s impossible to collapse categories, via a thesaurus or anything else, without some signal loss.

Another example: Dresden in East Germany. Dresden is real; (East) Germany is a political construct, and East Germany no longer exists, so a category built on it has gone stale.

The future: del.icio.us and similar services. User categorization almost always produces a long-tail distribution, which implies that any rigid, limited set of categories is always slightly wrong.
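A minimal sketch of what the long tail means in practice, using invented (user, tag) pairs for a single bookmarked URL rather than any real del.icio.us data. Counting tags shows a few popular ones and a tail of tags used once or twice; `uncovered_share` (a hypothetical helper) measures how much of that user signal a fixed scheme of the top k categories would throw away.

```python
from collections import Counter

def tag_distribution(taggings):
    """Count how often each tag was applied, most frequent first."""
    return Counter(tag for _user, tag in taggings).most_common()

def uncovered_share(ranked, k):
    """Fraction of all taggings that fall outside the k most popular tags."""
    total = sum(count for _tag, count in ranked)
    tail = sum(count for _tag, count in ranked[k:])
    return tail / total

# Invented (user, tag) pairs for one bookmarked URL, for illustration only.
taggings = [
    ("u1", "python"), ("u2", "python"), ("u3", "python"), ("u4", "python"),
    ("u1", "programming"), ("u2", "programming"), ("u5", "programming"),
    ("u3", "tutorial"), ("u6", "tutorial"),
    ("u4", "language"), ("u7", "toread"), ("u8", "work"),
]

ranked = tag_distribution(taggings)
for tag, count in ranked:
    # Crude text histogram: one '#' per tagging.
    print(f"{tag:12} {'#' * count}")
print(f"a fixed 2-category scheme misses {uncovered_share(ranked, 2):.0%} of taggings")
```

On this toy data, keeping only the two most popular tags drops 5 of 12 taggings. The exact numbers mean nothing, but the shape is the argument: however you pick k, something in the tail is left out.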

(meta: this session is almost completely full. Fascinating that so many people are interested in this topic.)

He concludes with a review of the characteristics of organic categorization (folksonomies).

* Aggregation/summing is very robust.
* Merges create overlap, not sync.
* Merges are probabilistic, not binary.
* User and time are core attributes.

See the slides for the whole list.
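One way to read “merges are probabilistic, not binary” is to score how strongly two tags overlap instead of declaring them synonyms. The sketch below uses Jaccard similarity over the URLs each tag was applied to, and keeps user and date on every record so the aggregate can be recomputed as tagging continues. The choice of Jaccard, the record layout, the helper names, and the sample data are all my assumptions, not something taken from the slides.

```python
from collections import defaultdict
from datetime import date

# Hypothetical tagging records: (user, url, tag, date).
records = [
    ("ann", "url-1", "film", date(2005, 3, 1)),
    ("bob", "url-1", "movies", date(2005, 3, 2)),
    ("cat", "url-2", "film", date(2005, 3, 2)),
    ("cat", "url-2", "cinema", date(2005, 3, 3)),
    ("dan", "url-3", "movies", date(2005, 3, 4)),
    ("eve", "url-4", "cinema", date(2005, 3, 5)),
    ("fay", "url-2", "movies", date(2005, 3, 6)),
]

def urls_by_tag(records, until=None):
    """Aggregate: for each tag, the URLs anyone applied it to (optionally up to a date)."""
    result = defaultdict(set)
    for _user, url, tag, when in records:
        if until is None or when <= until:
            result[tag].add(url)
    return result

def overlap(a, b):
    """Jaccard similarity: 0.0 means disjoint URL sets, 1.0 means identical ones."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

by_tag = urls_by_tag(records)
for t1, t2 in [("film", "movies"), ("film", "cinema"), ("movies", "cinema")]:
    print(f"{t1} ~ {t2}: {overlap(by_tag[t1], by_tag[t2]):.2f}")
```

Instead of a yes/no synonym decision, each pair of tags gets a degree of overlap, and that degree shifts as more people tag over time (compare `urls_by_tag(records, until=date(2005, 3, 2))` with the full set).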

Journalists often think the web needs an editor, not understanding that the web is the editor: if no one links to you, you’re not read. The same goes for user categorization. One-off categories are ignored, yet shared conventions still emerge without anyone decreeing them; decaf coffee pots are orange.

Does the world make sense, or do we make sense of the world? The assertion is that we make sense of the world, and overall, I’m inclined to agree. There is value in aggregation/summing, and its power derives from the combined authority of many real people, not from top-down categorizing done by a few.