Taxonomy's Role in Content Management

Good article in eContent discussing the role of taxonomies in classifying content in a CMS. It defines taxonomies in the context of content management, describes current product offerings, offers caveats, and presents two case studies.

Interesting observations about what makes a good taxonomy:

    A good taxonomy, according to Weinstein, is one in which content is distributed evenly across the classification scheme. "The depth of the taxonomy should be relatively uniform," he said. When some categories have too much or too little information, "it usually means that the people didn't understand the nature of the content they were classifying, or they believe that they had more or less than they actually did."

    A good taxonomy also is one in which "everything has a place and only one place," Weinstein says. "The sum total of the taxonomy is mutually exclusive of all of the content, and it's collectively exhaustive as well." Also, "the terms used in the taxonomy should be native terms to the user community. They have to be terms that the users will understand instantly, intuitively, and clearly."
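Weinstein's criteria can be checked mechanically once content has been filed. This is a minimal sketch under my own assumptions: the taxonomy is a set of category paths, each content item lists the leaf categories it was filed under, and the names `TAXONOMY`, `leaf_categories` and `check_taxonomy` are hypothetical. It flags items filed nowhere (not collectively exhaustive) or in more than one place (not mutually exclusive), and reports leaf depths and item counts per leaf so that uneven depth or distribution stands out.

```python
from collections import Counter

# Hypothetical example taxonomy: each category is a path from the root.
TAXONOMY = {
    ("Products",),
    ("Products", "Hardware"),
    ("Products", "Software"),
    ("Support",),
    ("Support", "Manuals"),
}


def leaf_categories(taxonomy):
    """Categories with no children, i.e. the places where content is filed."""
    return {
        cat for cat in taxonomy
        if not any(len(other) > len(cat) and other[:len(cat)] == cat for other in taxonomy)
    }


def check_taxonomy(taxonomy, assignments):
    """Check filed content against Weinstein's criteria.

    `assignments` maps a content ID to the list of leaf categories it was filed under.
    """
    leaves = leaf_categories(taxonomy)
    unplaced = sorted(doc for doc, cats in assignments.items() if not cats)
    multi_placed = sorted(doc for doc, cats in assignments.items() if len(cats) > 1)
    unknown = sorted(doc for doc, cats in assignments.items()
                     if any(c not in leaves for c in cats))
    counts = Counter(c for cats in assignments.values() for c in cats)
    depths = [len(leaf) for leaf in leaves]
    return {
        "collectively_exhaustive": not unplaced,   # every item has a place
        "mutually_exclusive": not multi_placed,    # ...and only one place
        "unplaced": unplaced,
        "multi_placed": multi_placed,
        "unknown_category": unknown,
        "depth_range": (min(depths), max(depths)),  # should be narrow
        "items_per_leaf": {leaf: counts[leaf] for leaf in leaves},  # should be even
    }


docs = {
    "doc1": [("Products", "Hardware")],
    "doc2": [("Products", "Software"), ("Support", "Manuals")],  # filed in two places
    "doc3": [],                                                  # not filed anywhere
}
report = check_taxonomy(TAXONOMY, docs)
print(report["unplaced"], report["multi_placed"])  # ['doc3'] ['doc2']
```

The checks only cover structure; whether the category names are "native terms to the user community" still takes human judgment.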

Goes on to discuss the short-sightedness of many corporations in overlooking an obvious pool of resources, the corporate library staff, who could support classification functions in the CMS. This is all too true.

    "No matter what taxonomy product you look at, it's not going to be a turnkey solution," cautions Rasmus at Giga. "Most of the systems, when you do automatic taxonomy generation, there still is quite a bit of manual effort involved to go back and change the names. The systems just come up with what they think a concept should be called. It's a machine name. It may be just a string of characters that are put together. So you have to go back and give it a real name that means something in the context of your business."

    Exacerbating that problem is the fact that "a lot of organizations have dropped off the corporate librarians and other people who have the skills for organizing content. I recommend to companies that they keep their librarians, and they may want to hire knowledge engineers even if they're using automated tools because the tools are really black boxes you throw content in. You read a document and put it into a training algorithm and say, 'Now every time I throw content at you, classify any documents that are like this one in this category.'"


Definitions used by vendors

That idea of everything having only one place is the purist concept of taxonomy and is the reason why taxonomies are NOT always ideal. The real goal, I think, in classifying content is simply to use a system for describing what content is about. Taxonomies, in the purist sense, force that description into a rigid hierarchy. In mainstream use today, however, it seems that software vendors aren't necessarily following this definition, and their taxonomies can be flexible enough to support poly-hierarchy (?).

For me, that is why it seems most sensible to look to traditional practices employed in indexing (print and electronic) when thinking about classifying web content. Back-of-the-book indexes are a good example of how to display the many different ways in which a concept can be described. Often they use a controlled vocabulary, but sometimes they display a flat view of concepts that mainly supports known-item searching. Couple your indexing with a thesaurus showing broader/narrower/related terms, and browsing and learning can be supported as well.
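The thesaurus idea maps onto a small data structure. This is a minimal sketch, assuming a thesaurus that stores broader (BT), narrower (NT) and related (RT) links plus "use for" synonyms, and a back-of-the-book style index that maps preferred terms to page locations; `Thesaurus`, `lookup` and the sample terms are hypothetical. A term may have more than one broader term, which is the poly-hierarchy a strict taxonomy rules out.

```python
from collections import defaultdict


class Thesaurus:
    """Hypothetical controlled vocabulary with BT/NT/RT links and entry-term synonyms."""

    def __init__(self):
        self.broader = defaultdict(set)   # BT: a term may have several (poly-hierarchy)
        self.narrower = defaultdict(set)  # NT: inverse of BT
        self.related = defaultdict(set)   # RT: symmetric, non-hierarchical
        self.use_for = {}                 # non-preferred entry term -> preferred term

    def add_broader(self, term, broader_term):
        self.broader[term].add(broader_term)
        self.narrower[broader_term].add(term)

    def add_related(self, a, b):
        self.related[a].add(b)
        self.related[b].add(a)

    def add_synonym(self, entry_term, preferred):
        self.use_for[entry_term] = preferred

    def preferred(self, term):
        return self.use_for.get(term, term)

    def expand(self, term):
        """The preferred form of `term` plus all of its narrower terms, transitively."""
        seen, stack = set(), [self.preferred(term)]
        while stack:
            t = stack.pop()
            if t not in seen:
                seen.add(t)
                stack.extend(self.narrower[t])
        return seen


def lookup(thesaurus, index, term):
    """Index locations for a term, following 'use for' references and narrower terms."""
    return sorted(loc for t in thesaurus.expand(term) for loc in index.get(t, []))


th = Thesaurus()
th.add_broader("Laptops", "Computers")
th.add_broader("Laptops", "Portable devices")  # second broader term: poly-hierarchy
th.add_broader("Tablets", "Portable devices")
th.add_related("Laptops", "Docking stations")
th.add_synonym("Notebooks", "Laptops")

index = {"Laptops": ["p. 12", "p. 40"], "Tablets": ["p. 55"]}
print(lookup(th, index, "Portable devices"))  # ['p. 12', 'p. 40', 'p. 55']
print(lookup(th, index, "Notebooks"))         # ['p. 12', 'p. 40']
```

The index alone supports known-item lookup ("Laptops"); the BT/NT/RT links are what let a reader browse from "Portable devices" down to "Tablets" or sideways to "Docking stations".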