Taxonomies on business blogs

On drop.org, Kika pointed to a blog entry by Doug Kaye that discusses the use of taxonomies on business blogs. He states that library science is inadequate for the range of knowledge and thought [that] are encountered with weblogs. Kaye writes a blog about business blogs called "Blog the Organization". He also had this to say about the use of taxonomies:

    At the very least people will spend a whole lot of time debating and complaining about the taxonomy. And people will find another person's taxonomy frustrating and will avoid using it.

He makes some correct observations: requiring users to apply metadata may be a deterrent, and taxonomies are subjective and require great effort to maintain. Kaye says that library science is not up to the task when it comes to blogs. I think what he is really saying is that taxonomies in general cannot serve every perspective. And he is right. I think there has been a widespread misconception in the business world lately that taxonomies are the key to controlling your data. But they are just one piece of the information retrieval solution.

If you build it will they tag it?
I don't disagree with what Kaye had to say about the added step of applying metatags being a possible deterrent to use; I believe that this is really an interface usability issue. I do, however, disagree strongly with what he has to say about the usefulness and relevance of taxonomies in business weblogs.

Business blogs and blogs with narrow subject scopes are perfectly apt candidates for a taxonomy. But the onus of metatagging documents should lie with a select few individuals who maintain the blog's index and are steeped in its subject scope.

Kaye is right in suggesting that keeping up with the application of metadata is a daunting task. This is a human resources issue. If you don't have qualified resources to do your indexing, then human-maintained taxonomies are probably not for you. But if you do have the resources, taxonomies may provide substantial value in retrieving relevant documents with a high level of specificity. Want to know how? First I'll tell you what I've seen and then tell you what I propose for iaslash using Drupal.

The value of metadata
Is Kaye correct in saying that taxonomies are too problematic to be useful for business blogs? I don't think so. I think they are just one piece of an information retrieval solution that includes automated and human indexing and the creation and maintenance of a controlled vocabulary of index terms. Together, these pieces support a classification scheme meant to serve browsing or learning types of information seeking behavior. They are not panaceas for satisfying all information seeking activities. They are tools that attempt to support some information seeking behaviors, namely those of browsers or learners who are not looking for a known item.

Classification systems (Dewey, UDC, Library of Congress) are not the end-all solution for all information seeking; they are problematic for several reasons. Success in retrieving documents based on a predetermined set of categories and subcategories depends on your understanding of that system of categorization (this is also referred to as knowledge representation in the library and information science field). The success of retrieving relevant documents also depends on matching your description (your representation of that knowledge) during an information search with the description the indexer gave to it (the indexer's representation). This is made more problematic because the way in which we describe ideas changes over time, even within one person.

I think anyone who maintains a system for controlling data for information retrieval will tell you that maintaining that system (taxonomy, controlled vocabulary) is a constant task, and that how well the system helps users find data tends to track how much effort goes into building and maintaining it.

That said, is there really no point in maintaining a system of classification? Actually, there still is. Even the best automated classification tools, such as Verity's intelligent classifier or Semio, fail to work well without the help of a human intermediary. And even when social signals such as the web's link structure are used, as Google does, the relevance of retrieved documents is not always excellent. The point is that, within any field of knowledge, computer algorithms cannot yet understand the complex concepts in human language the way a person steeped in the language and subject matter of the field can.

[A good article related to the topic of automated classification is "Extracting Value from Automated Classification Tools", by Kat Hagedorn]

Human indexing of documents using a controlled vocabulary is one way to increase the relevance of retrieved documents. In the organization I work for, we maintain a narrowly focused controlled vocabulary (CV) for indexing all of the data we house, including internal documents and vendor data. That CV is used to support various taxonomies in use within the company.
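To make the matching problem concrete, here is a minimal sketch of how a controlled vocabulary can bridge the searcher's wording and the indexer's wording. It assumes a simple model in which each preferred index term lists the variant "entry terms" people might use instead; the terms, document IDs, and function names (`normalize`, `search`) are hypothetical and not taken from any real CV or product.

```python
from typing import List, Optional

# Hypothetical controlled vocabulary: preferred term -> variant entry terms.
CONTROLLED_VOCABULARY = {
    "information architecture": ["ia", "info architecture", "site structure"],
    "taxonomy": ["taxonomies", "category scheme", "classification scheme"],
    "weblog": ["blog", "blogs", "weblogs"],
}

# Invert the vocabulary so any entry term (or the preferred term itself)
# resolves to its preferred form.
ENTRY_TO_PREFERRED = {
    variant: preferred
    for preferred, variants in CONTROLLED_VOCABULARY.items()
    for variant in [preferred, *variants]
}

# Hypothetical index: documents carry only preferred terms, applied by indexers.
INDEX = {
    "doc-1": {"taxonomy", "weblog"},
    "doc-2": {"information architecture"},
}


def normalize(term: str) -> Optional[str]:
    """Return the preferred index term for `term`, or None if the CV lacks it."""
    return ENTRY_TO_PREFERRED.get(term.strip().lower())


def search(query_term: str) -> List[str]:
    """Find documents indexed with the preferred form of the searcher's term."""
    preferred = normalize(query_term)
    if preferred is None:
        return []  # the searcher's wording is not covered by the vocabulary
    return sorted(doc for doc, terms in INDEX.items() if preferred in terms)


print(search("Blogs"))           # ['doc-1']
print(search("site structure"))  # ['doc-2']
```

The design choice is that indexers only ever apply preferred terms, while the entry-term table absorbs the variation in how searchers describe the same idea. The table is also where the ongoing maintenance effort lands: every new way of phrasing a concept has to be added by hand.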

Drupal
I have some ideas that I'm going to experiment with using Drupal. In Drupal terms, think of index terms of the CV as "attributes" and the taxonomy as the "meta tags".

What I plan to do is decide on a set of facets under which I will create index terms for each document I blog. For example, the facets might be: subject matter, names of persons (individuals and groups), names of places, and names of events. At some point I will then develop a two- to three-level taxonomy using terms under the subject matter facet.
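The facet plan above can be sketched as a simple data structure: each post carries one set of index terms per facet, and a small taxonomy groups subject terms under broader terms for browsing. This is a hypothetical model written for illustration, not Drupal's taxonomy module; the taxonomy entries, post titles, and function names (`expand`, `browse`, `by_facet`) are assumptions, and only two levels are shown.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Post:
    title: str
    # One set of index terms per facet, following the facets listed above.
    subjects: set[str] = field(default_factory=set)
    persons: set[str] = field(default_factory=set)  # individuals and groups
    places: set[str] = field(default_factory=set)
    events: set[str] = field(default_factory=set)


# Hypothetical two-level taxonomy built from subject-facet terms:
# broader term -> narrower terms.
SUBJECT_TAXONOMY = {
    "knowledge organization": ["taxonomy", "controlled vocabulary", "faceted classification"],
    "information retrieval": ["automated classification", "relevance"],
}


def expand(term: str) -> set[str]:
    """A broader term matches itself plus its narrower terms (one level down)."""
    return {term, *SUBJECT_TAXONOMY.get(term, [])}


def browse(posts: list[Post], subject: str) -> list[str]:
    """Titles of posts indexed under `subject` or any of its narrower terms."""
    wanted = expand(subject)
    return [p.title for p in posts if p.subjects & wanted]


def by_facet(posts: list[Post], facet: str, value: str) -> list[str]:
    """Titles of posts whose `facet` (e.g. "persons") contains `value`."""
    return [p.title for p in posts if value in getattr(p, facet)]


posts = [
    Post("Taxonomies on business blogs", subjects={"taxonomy", "weblogs"}, persons={"Doug Kaye"}),
    Post("Notes on automated classification", subjects={"automated classification"}, persons={"Kat Hagedorn"}),
]

print(browse(posts, "knowledge organization"))     # ['Taxonomies on business blogs']
print(by_facet(posts, "persons", "Kat Hagedorn"))  # ['Notes on automated classification']
```

Keeping the facets separate means a name like "Doug Kaye" never gets mixed into the subject taxonomy, so the taxonomy can stay small and be reworked later without re-indexing persons, places, or events.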

How I am going to apply this will be somewhat experimental for me. A while ago, I wrote a paper on indexing images. One of the concepts I liked at the time was faceted analysis, specifically the Modern Language Association's Contextual Indexing and Faceted Taxonomic Access System (CIFT). CIFT is a post-coordinate method for displaying faceted descriptors. It takes the terms that indexers have extracted to represent the subject matter facet and arranges them into an ordered string. Each term is shown in the index (paper or electronic) at the lead position of the string, with the connected terms following it. Which term is shifted to the front of the string depends on the term the user searched for.
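The display idea described above can be sketched in a few lines: for every descriptor in a document's ordered string, build an index entry with that descriptor in the lead position, then show the entry that matches the searcher's term. This is a simplified illustration under my own assumptions (the remaining terms keep their original order, and entries are joined with semicolons); it is not the MLA's actual CIFT implementation, and the documents and function names are hypothetical.

```python
# Hypothetical documents, each with an ordered string of subject-facet descriptors.
DOCS = {
    "paper-1": ["images", "indexing", "faceted analysis"],
    "paper-2": ["weblogs", "taxonomy", "business"],
}


def shift_to_front(descriptors, term):
    """Move `term` to the lead position, keeping the other terms in their original order."""
    return [term] + [d for d in descriptors if d != term]


def build_index(docs):
    """One index entry per (term, document): the descriptor string led by that term."""
    index = {}
    for doc_id, descriptors in docs.items():
        for term in descriptors:
            entry = "; ".join(shift_to_front(descriptors, term))
            index.setdefault(term.lower(), []).append((entry, doc_id))
    return index


def search(index, query):
    """Return the entries whose lead term matches the user's search term."""
    return sorted(index.get(query.strip().lower(), []))


INDEX = build_index(DOCS)
print(search(INDEX, "indexing"))  # [('indexing; images; faceted analysis', 'paper-1')]
```

Because every descriptor gets its own entry, a searcher who starts from any one facet term still sees the full context of the other terms the indexer assigned, which is the point of displaying the whole string rather than a single heading.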

For more on the CIFT approach see:

I hope to try some things with Drupal using this approach and may ask the Drupal community what they think. I think the approach may prove valuable in teaching me a bit about the practical application of faceted analysis, although I fear it might be a bit ambitious.


Lou Rosenfeld talks about Kaye's message

In Damning Metadata, Lou gives us a little history of the love-hate relationship businesses have had with classification since the mid-'90s.

I can't remember which IA blog pointed me to Doug Kaye's blog, but I found his frustration with metadata to be... well, frustrating.


Some more related comments by Már Örlygsson on Kblog, and a related discussion on drop.

More on eleganthack

The EH comments include a short discussion here.