ia/ - Search

Google Suggest does type-ahead for Google Index in near real-time

Fri, 10 Dec 2004 11:48:44 -0800

Go try Google Suggest now, if you haven't. Google Suggest shows the feasibility of using type ahead with very large collections of terms, like tags in a folksonomy.

Now, one of the drawbacks of using ad hoc tags in social classification is the lack of vocabulary control - people use different tags to mean the same thing. This is fine for organizing personal information architectures, but the lack of consistency, while reducing the cognitive cost of classification, actually increases effort in finding information.

To deal with the issue, there needs to be a feedback loop. Flickr has the most popular tags float to the top; 43 Things and others use type size to show more popular tags. There's an argument for that kind of subtle feedback. However, to really bridge between levels of classification, to move from a distributed folksonomy to a controlled vocabulary and then to a formal thesaurus, we need more than an implicit incentive to use a particular tag. Using type ahead to show other tags is one way of doing that, as James Spahr illustrates so well. But I've always wondered how scalable this approach would be with a massive tagset. With Google Suggest, instead of wondering how type ahead would scale, I'm wondering how we can implement a system of similar scale for tags...
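To make the idea concrete, here is a minimal sketch of type ahead over a tagset, assuming tags are stored in lowercase with a usage count. The function name, the sample tags, and the counts are all made up for illustration; this is not how Google Suggest or any tagging site actually does it.

```python
from bisect import bisect_left

def suggest_tags(tag_counts, prefix, limit=5):
    """Return up to `limit` tags that start with `prefix`, most used first.

    `tag_counts` maps a lowercase tag to how often it has been used
    (a hypothetical structure chosen for this sketch).
    """
    prefix = prefix.lower()
    tags = sorted(tag_counts)          # a real system would sort once and reuse
    start = bisect_left(tags, prefix)  # first tag that is >= the prefix
    matches = []
    for tag in tags[start:]:
        if not tag.startswith(prefix):
            break                      # sorted order: no later tag can match
        matches.append(tag)
    # Popular tags first, so people are nudged toward the common form.
    matches.sort(key=lambda t: (-tag_counts[t], t))
    return matches[:limit]

# Made-up tag counts for illustration only.
tag_counts = {
    "ia": 120,
    "infoarch": 45,
    "information-architecture": 80,
    "search": 200,
    "searching": 15,
}
print(suggest_tags(tag_counts, "in"))
```

Ranking by popularity is the feedback loop in miniature: someone typing "in" sees the more common tag first and has a reason to reuse it rather than coin another variant.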

More IA Heuristics - Search

Fri, 03 Sep 2004 12:42:47 -0700

Lou Rosenfeld shares some more IA heuristics, this time focused on search.

Enhance Usability by Highlighting Search Terms

Tue, 10 Aug 2004 16:03:38 -0700

A List Apart offers a practical implementation of highlighting terms in the page that were searched for by the user. You can check out their demo search to see the script in action.
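For a rough idea of the technique, here is a small server-side sketch in Python. It is not the A List Apart script; the function name and the `searchword` class name are assumptions for illustration.

```python
import re
from html import escape

def highlight_terms(text, query):
    """Wrap each occurrence of a query term in a span, escaping the rest as HTML."""
    # Longer terms first so they win over shorter overlapping terms.
    terms = sorted(set(query.split()), key=len, reverse=True)
    if not terms:
        return escape(text)
    pattern = re.compile(
        "(" + "|".join(re.escape(t) for t in terms) + ")", re.IGNORECASE
    )
    out = []
    # With one capturing group, re.split alternates: text, match, text, match...
    for i, part in enumerate(pattern.split(text)):
        if i % 2 == 1:
            out.append('<span class="searchword">' + escape(part) + "</span>")
        else:
            out.append(escape(part))
    return "".join(out)

print(highlight_terms("Federated search & metasearch", "search"))
```

Note that this matches substrings, so "search" is also highlighted inside "metasearch"; matching whole words only would need word boundaries in the pattern.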

The Truth About Federated Searching

Thu, 20 May 2004 11:00:27 -0700

WebFeat, a provider of federated search technology, has compiled a list of the five most commonly repeated misconceptions about federated searching, published in Information Today.

Do query string operators matter in search interfaces?

Mon, 10 Nov 2003 11:20:30 -0800

Research has reported that only about 10% of search engine users utilize query string operators, while the remaining 90% perform simple queries. Do boolean operators and "must include" (+) and phrase ("") operators make a difference in search engine results? Mostly no but sometimes yes, according to this paper in ACM Transactions on Information Systems (Volume 21, Issue 4, October 2003). Caroline Eastman and Bernard Jansen tested the effects of using query string operators on major search engines in their paper, "Coverage, relevance, and ranking: The impact of query operators on Web search engine results", to determine if these operators improved the effectiveness of web searching. When they say effectiveness, they are referring to relevance and relative precision of retrieval.

The paper attempts to find out if the use of certain query string operators makes any difference in search engine results. They found that implicit OR combination had a negative effect on performance and implicit AND had a positive effect on performance. As of their writing, MSN and AOL used implicit OR while Google appears to be using implicit AND. They found, generally, that most query string operators did not have a great effect on precision in the search engines tested. Precision was as high for simple queries as for advanced queries using query string operators. They did find, however, that in search engines using implicit OR, phrase operators sometimes had a positive effect on performance. [Note that this research didn't test exclusion operators (i.e. boolean NOT or the minus (-) operator).]
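A toy illustration of the difference, assuming a four-document collection and a hand-picked set of relevant documents. None of this comes from the paper and it is not how any real engine ranks; it only shows why implicit OR tends to pull in more off-topic results and lower precision.

```python
def search(docs, query, mode="AND"):
    """Return ids of docs matching all query terms (AND) or any of them (OR)."""
    terms = query.lower().split()
    combine = all if mode == "AND" else any
    return [
        doc_id
        for doc_id, text in docs.items()
        if combine(term in text.lower().split() for term in terms)
    ]

def precision(retrieved, relevant):
    """Fraction of retrieved documents that are relevant."""
    if not retrieved:
        return 0.0
    return len(set(retrieved) & set(relevant)) / len(retrieved)

# Made-up collection; "a" and "d" are assumed to be the relevant documents.
docs = {
    "a": "digital library federated search",
    "b": "public library opening hours",
    "c": "digital camera reviews",
    "d": "building a digital library collection",
}
relevant = {"a", "d"}

for mode in ("AND", "OR"):
    hits = search(docs, "digital library", mode)
    print(mode, hits, precision(hits, relevant))
```

With implicit AND, only the two documents containing both words come back; with implicit OR, the camera reviews and the opening hours come along too, and precision drops by half.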

To summarize, there is limited advantage to using OR, and possibly some advantage to using PHRASE operators in some search engines. But generally speaking, these query string operators provide little or no benefit to users and are counterproductive in some cases. Interesting? Maybe. I suppose this is saying that most search engines are doing a better job of matching users' expectations on simple searches. With 90% of the population using simple searches, those sophisticated algorithms on the back end become more important. The authors note that while query string operators may be less important in general search engines, there is a place where they are still necessary in order to achieve satisfactory results -- in IR systems that do not have sophisticated matching and ranking algorithms.

Web searches: are they fixed?

Fri, 03 Oct 2003 04:38:36 -0700

Interesting article in Business Week Online regarding paid placements and some potential controversy involving small businesses. I found the link at searchengineposition.com.

Web Searches: The Fix Is In
by Ben Elgin, October 6, 2003

Dublin Core 2003: Seattle, WA

Fri, 08 Jun 2007 03:53:25 -0700

The Dublin Core 2003 Conference is going on in Seattle this week. A couple of the attendees and I will be sharing our notes (and photos) when we've recovered (it's actually still going on). But until then, enjoy the conference proceedings online.

Trumping Google? Metasearching's Promise

Wed, 01 Oct 2003 07:57:13 -0700

Information services organizations (libraries) continue to be challenged by information seeking behaviors and expectations of web search engine users. In a recent Library Journal article, Judy Luther discusses issues related to metasearch engines. In the article she writes, "For many searchers, the quality of the results matter less than the process -- they just expect the process to be quick and easy." Anecdotally, I've found this to be true of users I've encountered within my organization. For more exhaustive and relevant searching, these searchers can turn to researchers for help -- that is, real subject matter experts who know the sources and how to search them.

Searching multiple databases is a special kind of problem because the databases don't always share the same controlled vocabularies or use the same protocols (e.g. Z39.50, XML). But there is great advantage to users viewing intermixed and deduped search results from multiple sources. The search engine DogPile or the SpotLight federated search engines of the California Digital Library are good examples that show how this works. At the SLA conference, federated search seemed to still be a buzzword among search vendors.
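Here is a minimal sketch of the intermixing and deduping step, assuming each source has already returned a ranked list of records with a `title` field. The record shape, the sample titles, and the match-on-normalized-title rule are assumptions for illustration; real federated search products have to do far more work to decide that two records are the same item.

```python
from itertools import zip_longest

def merge_results(*result_lists):
    """Interleave ranked result lists round-robin, dropping duplicate titles."""
    seen = set()
    merged = []
    # zip_longest yields the 1st hit from every source, then the 2nd, and so on.
    for tier in zip_longest(*result_lists):
        for record in tier:
            if record is None:  # this source has run out of results
                continue
            # Crude dedup key: lowercase title with whitespace collapsed.
            key = " ".join(record["title"].lower().split())
            if key in seen:
                continue
            seen.add(key)
            merged.append(record)
    return merged

# Made-up results from two hypothetical sources.
catalog = [
    {"title": "Metasearch Basics", "source": "catalog"},
    {"title": "Z39.50 Explained", "source": "catalog"},
]
journals = [
    {"title": "metasearch  basics", "source": "journals"},
    {"title": "Federated Search Today", "source": "journals"},
]

for record in merge_results(catalog, journals):
    print(record["source"], "-", record["title"])
```

Round-robin interleaving sidesteps the problem that relevance scores from different databases are not comparable, at the cost of treating every source as equally good.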

See also the related article on federated search by Roy Tennant.

Putting it Together: Taxonomy, Classification & Search

Wed, 24 Sep 2003 13:56:04 -0700

A good overview of the current state of the art in combining taxonomies and search from Jeff Morris in Transform magazine. Combining taxonomy and classification with search gives people a map of the resources available to them. This kind of taxonomy, classification and search combination is becoming essential for the major search vendors. [thanks Infodesign]

Video search on PBS.org

Thu, 31 Jul 2003 05:50:21 -0700

Gary Price points out that PBS is offering free keyword and/or title search of some of its video. Being the father of a two-year-old, I make regular visits to JungleWalk with my son to look for animal videos. Now I can add Nature to my bookmarks.
Searching is quite nice on PBS. You can do keyword searches or browse by show/program title. Odd that they don't let you view the metadata, though. After searching the Nature archives for "leech", I wondered why vampire bat and mosquito videos were returned when what I wanted to find was the bloodsucking leeches from the same "Blood Suckers" show. I guess there is one metadata record shared per show, which makes sense when there are only two or three short videos available per program. That way, obviously related videos are presented in your search results.
Available presently on PBS:

Amazon Plan to offer full-text search of some non-fiction texts

Mon, 21 Jul 2003 12:56:55 -0700

Very interesting news from Amazon today in an article in the NY Times. The retailer is planning a new full-text searching service called "Look Inside the Book II" that will combine some of the functionalities of a digital library with the retailer's current methods for helping customers find and evaluate products. The full-text service will extend the "Peek inside" service that users get when previewing TOCs, indexes, and sample pages with "Look Inside the Book". I couldn't surmise from the article whether full-text searching would be offered only when viewing a single book or if it would be possible to do full-text searching across a corpus of digitized e-texts.

The new service is being met with some wariness from publishers and authors, who worry that it will make Amazon more like an information service a la ebrary and netLibrary. Undoubtedly, Amazon will have to do a lot to protect copyright.

Being someone who uses e-text vendors and full-text digital libraries, I think the service could be a boon to the bookselling industry. There is no reason that full-text searching of some non-fiction works cannot be offered while still protecting copyright. If brief keyword in context (KWIC) displays of search terms are given to help you filter and refine your search without publishing too much information, then how can this hurt publishers? No doubt, some works such as reference books would give away too much in even a brief KWIC display, but surely there must be a way to make this work. I think it's a good step in making the Amazon shopping experience even more valuable. It's amazing that they continue to innovate the experience of buying online.
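For anyone unfamiliar with KWIC displays, here is a minimal sketch of the idea: show each hit with only a few characters of surrounding text. The function name, the `width` parameter, and the sample sentence are made up for illustration, and this says nothing about how Amazon's service works.

```python
import re

def kwic(text, term, width=30):
    """Return a short snippet around each occurrence of `term` in `text`."""
    snippets = []
    for match in re.finditer(re.escape(term), text, re.IGNORECASE):
        start = max(0, match.start() - width)
        end = min(len(text), match.end() + width)
        # Mark where the snippet was cut off from the full text.
        left = "..." if start > 0 else ""
        right = "..." if end < len(text) else ""
        snippets.append(left + text[start:end] + right)
    return snippets

# Made-up sample text.
text = "Publishers worry about copyright. A brief display protects copyright holders."
for snippet in kwic(text, "copyright", width=10):
    print(snippet)
```

The `width` setting is where the copyright trade-off lives: a narrow window gives enough context to judge a hit while exposing very little of the text, and a publisher of reference works might want it narrower still.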

Federated search overview

Tue, 04 Nov 2003 07:52:34 -0800

I recently heard Roy Tennant tell a group of information professionals that "only librarians like to search, everyone else likes to find." Roy should get together with Peter to combine his findability meme with this appropriate tagline.

In keeping with this findability theme, Tennant describes some of the current offerings in the federated search space in his latest Digital Libraries column in Library Journal. This is an area that is hot with vendors in the information searching space right now.

CIO Article on Auto/Semi-categorization software

Wed, 14 May 2003 13:39:12 -0700

CIO article "Sleuthing out data" by Fred Hapgood features a couple of examples of how automatic and semi-automatic categorization helps businesses and reduces costs. There is a company list included if you're interested in this arena.

Data Management meets Unstructured Information

Thu, 08 May 2003 04:18:17 -0700

Just came back from a conference on data management (the Wilshire Metadata/DAMA International 2003 Conference). A recurring topic was the relevance of data management work to unstructured information. A reality check for everyone was that most corporate information actually exists in semi-structured or unstructured form, not in databases. From this thought, I was directed to DM Review and in particular this article: Digging Into the Web: XML, Meta Data and Other Paths to Unstructured Data, by Robert Blumberg and Shaku Atre. I definitely see an opportunity for IA (metadata/UX) folks to cross-pollinate with data modelers and data managers. It will be interesting to see, and I look forward to hearing more. Thoughts?

New Yahoo! Search debuts

Mon, 07 Apr 2003 11:57:14 -0700