A client-side Web agent for document categorization

Journal: Internet Research, v8n5, 1998, p387-399
Author: Boley, Daniel; Gini, Maria; Hastings, Kyle; Mobasher, Bamshad; Moore, Jerry

A client-side agent for exploring and categorizing documents on the World Wide Web is proposed. As the user browses the Web with a standard Web browser, the agent assists by classifying the documents the user finds most interesting into clusters. The agent carries out this task automatically and autonomously, with as little intervention as the user desires. The principal novel components that make this possible are a scalable hierarchical clustering algorithm and a taxonomic label generator. This paper describes the overall architecture of the agent and discusses the algorithms within its key components.