Coveo is a site search engine from the makers of Copernic desktop search. Looks comparable to others in the entry-to-mid market, but is currently free for 5,000 documents or fewer. I'm digging into the technical docs right now to look for things like synonym control and best bets, but haven't found them yet. Runs on Windows. Anyone tried it out? Thanks Ben Skelton
Brainboost, a new search engine, offers answers to regular questions - like What is information architecture? It extracts text snippets from a wide variety of sources that help answer the question - though I think we've got a way to go in defining the damn thing.
The Brainboost algorithm is useful, but sometimes lacking, pulling sentences that contain "...information architecture is..." even when the sentence is about something else. I don't think Google has much to fear, but the approach is helpful for basic questions. Thanks metacool
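To illustrate why that kind of false positive happens, here is a minimal sketch of naive pattern-based answer extraction. It is not Brainboost's actual algorithm, which isn't public as far as I know; the function name `definition_candidates` and the sample sentences are made up for illustration.

```python
import re

def definition_candidates(term, sentences):
    """Return sentences that contain '<term> is' or '<term> are' (case-insensitive).

    Hypothetical helper: a purely lexical match with no understanding of
    whether the sentence actually defines the term.
    """
    pattern = re.compile(r"\b" + re.escape(term) + r"\s+(is|are)\b", re.IGNORECASE)
    return [s for s in sentences if pattern.search(s)]

# Made-up sample sentences.
sentences = [
    "Information architecture is the structural design of shared information environments.",
    "The reason I care about information architecture is that my site is a mess.",
    "We hired a librarian last year.",
]

matches = definition_candidates("information architecture", sentences)
for m in matches:
    print(m)
```

Both of the first two sentences match, even though only the first one is a definition. The second is the kind of off-topic hit described above.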
Just a reminder to us all that there is a ton of Internet activity that doesn’t take place in front of a beige box (or a shiny metal one if you’ve got a G5 :-) ).
Google has launched their integrated desktop search in public beta. The most interesting thing is that rather than being a desktop application, it simply adds another tab to Google’s search results, and displays indexed desktop content from email, Office documents, etc.
FIND/SVP, Empire Media and Triplehop Technologies launched www.Find.com, a search engine for business professionals that aggregates results from several major search engines and hand-picked business-related sites.
A results sidebar shows you the topics it found, which can be used for filtering by ANDing one or more terms to your search input. You have to re-submit the form to see your results. It takes a bit of figuring out at first, but functionally, it allows you to select multiple terms (I assume these are clusters your initial term fell into) before refining (re-executing) your search. This interaction could be improved quite a bit, I think. Sidebar tabs also allow you to filter by format, site and source.
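As a rough sketch of what the sidebar refinement amounts to, the snippet below ANDs selected topic terms onto the original query. The function name `refine_query`, the quoting of multi-word topics, and the `AND` syntax are my assumptions; I don't know how Find.com builds its queries internally.

```python
def refine_query(query, selected_topics):
    """Build a refined query by ANDing each selected topic onto the original query.

    Hypothetical helper: multi-word topics are wrapped in quotes so they
    are treated as phrases.
    """
    parts = [query]
    for topic in selected_topics:
        parts.append(f'"{topic}"' if " " in topic else topic)
    return " AND ".join(parts)

# Example: the user searched for "outsourcing" and ticked two sidebar topics.
refined = refine_query("outsourcing", ["india", "call centers"])
print(refined)
```

With no topics selected, the function returns the original query unchanged, which matches the idea that refinement is just a re-executed search with extra terms.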
Probably most interesting is that they have a “Research” search tab that allows you to find results from premium research sources including Find’s research, Frost and Sullivan, and more. Other tabs include Directory (open directory listings) and News. I’ve been finding that their beta release is also not without its DHTML bugs (using Firefox). It looks like it might become a business user search alternative to watch, however.
Using search engines to compile a list - like the top 50 greatest blues guitarists by record sales, say - involves a lot of drudge work because you have to visit many web pages to gather the data you need. But the next step in search engine technology could make creating such lists possible with a single mouse click.
KnowItAll, a search engine under development at the University of Washington, Seattle, trawls the web for data and then collates it in the form of a list. The approach is unique, says its developer, Oren Etzioni, because it generates information that probably does not exist on any single web page.
The US Department of Defense’s research arm, DARPA, and Google, are so impressed that they are providing funding for the project.
Research has reported that about 10% of search engine users use query string operators, while the remaining 90% perform simple queries. Do boolean operators and "must include" (+) and phrase ("") operators make a difference in search engine results? Mostly no, but sometimes yes, according to this paper in ACM Transactions on Information Systems (Volume 21, Issue 4, October 2003). Caroline Eastman and Bernard Jansen tested the effects of using query string operators on major search engines in their paper, "Coverage, relevance, and ranking: The impact of query operators on Web search engine results", to determine if these operators improved the effectiveness of web searching. When they say effectiveness, they are referring to relevance and relative precision of retrieval.
The paper attempts to find out if the use of certain query string operators makes any difference in search engine results. They found that implicit OR combination had a negative effect on performance and implicit AND had a positive effect on performance. As of their writing, MSN and AOL used implicit OR while Google appears to be using implicit AND. They found, generally, that most query string operators did not have a great effect on precision in the search engines tested. Precision was as high for simple queries as for advanced queries using query string operators. They did find, however, that in search engines using implicit OR, phrase operators sometimes had a positive effect on performance. [Note that this research didn't test exclusion operators (i.e. boolean NOT or the minus (-) operator). ]
So, summarizing, there is limited advantage to using OR, and possibly some advantage to using PHRASE operators in some search engines. But generally speaking, these query string operators provide little or no benefit to users and are counterproductive in some cases. Interesting? Maybe. I suppose this is saying that most search engines are doing better at matching users' expectations when doing simple searches. With 90% of the population using simple searches, those sophisticated algorithms on the back end become more important. The authors note that while query string operators may be less important for general search engines, there is a place where they are still necessary in order to achieve satisfactory results -- in IR systems that do not have sophisticated matching and ranking algorithms.
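The difference between implicit AND and implicit OR is easy to see on a toy example. The sketch below is only an illustration with made-up documents and relevance judgments; it does not reproduce the paper's method or results, and the function names `search` and `precision` are my own.

```python
def search(query, docs, mode="AND"):
    """Return ids of docs matching all query terms (AND) or any query term (OR)."""
    terms = query.lower().split()
    combine = all if mode == "AND" else any
    hits = []
    for doc_id, text in docs.items():
        words = text.lower().split()
        if combine(t in words for t in terms):
            hits.append(doc_id)
    return hits

def precision(retrieved, relevant):
    """Fraction of retrieved documents that are relevant."""
    if not retrieved:
        return 0.0
    return len(set(retrieved) & relevant) / len(retrieved)

# Made-up collection; assume the user wants pages about blues guitar.
docs = {
    1: "blues guitar lessons for beginners",
    2: "blues music history",
    3: "guitar repair shop",
    4: "classic blues guitar records",
}
relevant = {1, 4}

and_hits = search("blues guitar", docs, mode="AND")
or_hits = search("blues guitar", docs, mode="OR")
print(and_hits, precision(and_hits, relevant))
print(or_hits, precision(or_hits, relevant))
```

In this toy collection the implicit OR pulls in every document that mentions either word, so precision drops. A real engine with good ranking can hide much of that difference, which may be part of why operators mattered so little in the study.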
The automatic clustering done by the new Mooter search engine seems interesting. An article in the Herald Sun interviews Liesl Capper, the proprietor of the new search engine company, which will be offering enterprise search services:
"What Mooter does is that we look at the long lists of results from other search engines and then we group them using artificial intelligence algorithms. But also we look at what you're doing and while you're working we actually move with you and push up things that you seem to be interested in".
Interesting article in Business Week Online regarding paid placements and some potential controversy involving small businesses. I found the link at searchengineposition.com.
Web Searches: The Fix Is In
by Ben Elgin, October 6, 2003
Nutch is a nascent effort to implement an open-source web search engine.
Stumbled on an AP article through the NY Times (free registration required) reporting that Yahoo is to buy Overture in a $1.6B deal.
Interesting article on Google Dance Syndrome by Chris Sherman over at SearchEngineWatch.com. Apparently there are many webmasters out there who are fixated on how they rank in Google, to the point that they worry and try to optimize. I have to admit I kind of review the sites of many of our web authors in the Google index, but I also review other engines such as Teoma and MSN :) Who doesn't? I'm curious about freshness, coverage, and depth for these engines, and it gives me a good idea of how our sites are doing on referrals from these engines. I'm curious to hear if others monitor their company's sites in the various search engines.
CIO article "Sleuthing out data" by Fred Hapgood features a couple of examples of how automatic and semi-automatic categorization enables businesses to reduce costs. There is a company list included if you're interested in this arena.
Just came back from a conference on data management (the Wilshire Metadata/DAMA International 2003 Conference). A recurring topic was the relevance of data management work to unstructured information. A reality check for everyone was that most corporate information actually exists in semi-structured or unstructured form and not in databases. From this thought, I was directed to DM Review and in particular this article: Digging Into the Web: XML, Meta Data and Other Paths to Unstructured Data, by Robert Blumberg and Shaku Atre. I definitely see an opportunity for IA (metadata/UX) type folks to cross-pollinate with data modelers and data managers. It will be interesting to see, and I look forward to hearing more from here. Thoughts?
The April 21 Alertbox combines 2 old thoughts into one:
But any short-term gain from text-ads will vanish if they do not provide any value to users.
We saw this first with "banner blindness" - people visually ignoring rectangular images once they figured out most were useless ads.
I continue to see this across the board - not just with banners. If users regularly encounter a design element that is useless to them, then they quickly start to ignore it. Could be banners, or global navigation at the top, or related links on the left, or promotions on the right - does not matter.
I call this "feckless blindness" - as people discover that a part of the page is routinely useless, they become blind to it over time.
Forrester weighs in on Yahoo!'s new search features (account required), claiming that a new emphasis on user experience will give search engine leaders a competitive advantage. Forrester likes the new Yahoo! for its streamlined (more Google-like) search entry page, cleaner and easier-to-read search results, and use of text ads over banners. The market research company makes a few suggestions to the top search engines to put their results in context and add to the user experience: