IA blogs and SIGIA-L search

The IA site search engine. The link to this thing is now located in the primary navigation.

I went through the list of sites that I surf somewhat frequently and came up with a short list of sites to crawl (check the list). I made decisions based on who kept their links to IA and design resources fresh and relevant. I will crawl these sites once a week to start. If you think your site should be on this list because you blog relevant IA resources frequently, please let me know by commenting here or contacting me directly. If you're on the list below and don't want to be, contact me.

And by the way, the SIGIA archives are indexed here too! This thing is set to crawl weekly at 2AM EST on Saturday.

expand

I think this could be very handy if interfaced correctly. I do, however, think that the list of sites searched should be expanded in a major way. Right now it is only representing a small portion of what is out there.

- Nick Finck
http://www.nickfinck.com
http://www.digital-web.com

Of course! I'm just twiddling with the thing to see if it gives good results on a known set of sites for now. If it doesn't work so well I'm going to scrap it.
-m

How about using their RSS feeds instead?

I've written some code that took your list and looked up the sites on the Syndic8.com database.

Some of them have RSS feeds available. Some are known to Syndic8 users and have been suggested as desirable candidates for creating a feed. Some are not known. A few are marked as rejected. This usually means it's a broken feed or one that previously existed but is now gone.

A few duplicates are shown because more than one feed URL was known in the database.

Right now you can join Syndic8 and add feeds to your personal list. You can download that list for use in your RSS reader.

Let me know if this is of interest.
Bill Kearney wkearney99@hotmail.com

Here's a list of them:

URL | Status | RSS feed
http://www.37signals.com/svn/ | DesiringSyndication | -
http://www.agwright.com/blog/ | Not Found | -
http://www.anitrapavka.com/ | Not Found | -
http://www.blackbeltjones.com/work/ | DesiringSyndication | -
http://www.bogieland.com/infodesign/ | Syndicated | http://www.newsisfree.com/HPE/xml/feeds/35/2535.xml
http://www.bradlauster.com/ | Syndicated | http://bradlauster.com/index.xml
http://www.brightlycoloredfood.com/ | Not Found | -
http://www.cuberootconsulting.com/~madonnalisa/blog/ | Not Found | -
http://www.digital-web.com/ | Syndicated | http://rss.blogspace.com/blogify?public=1&url=http%3A%2F%2Fwww.digital-web.com%2Fnew%2Findex.shtml
http://www.eleganthack.com/ | Rejected | http://www.ourfavoritesongs.com/users/bill.litfin@motivo.com/eleganthack.xml
http://www.eleganthack.com/ | Syndicated | http://www.eleganthack.com/index.xml
http://www.emdezine.com/designwritings/ | Not Found | -
http://www.evolt.org/category/IA_Usability/4090/index.html | Not Found | -
http://www.giantant.com/antenna/ | Rejected | http://www.ourfavoritesongs.com/users/bill.litfin@motivo.com/antenna.xml
http://www.heyotwell.com/heyblog/index.html | Not Found | -
http://www.iaslash.org/ | Syndicated | http://www.iaslash.org/ia/backend.php
http://www.iaslash.org/ | Rejected | http://www.iaslash.org/ia/module.php?mod=node&op=feed
http://www.iaslash.org/ | Rejected | http://www.iaslash.org/ia/module.php?mod=node&op=feed
http://www.iawiki.net/ | Not Found | -
http://www.info-arch.org/lists/sigia-l/ | Not Found | -
http://www.jjg.net/ | Not Found | -
http://www.kottke.org/ | Syndicated | http://www.newsisfree.com/HPE/xml/feeds/60/2560.xml
http://www.louisrosenfeld.com/ | DesiringSyndication | -
http://www.makovision.com/ | Not Found | -
http://www.mersault.com/thinking/ | Not Found | -
http://www.meryl.net/blog/ | Syndicated | http://www.meryl.net/blog/index.xml
http://www.noisebetweenstations.com/personal/weblogs/ | Not Found | -
http://www.othermedia.com/blog/ | Not Found | -
http://www.peterme.com/ | DesiringSyndication | -
http://www.poorbuthappy.com/ease/ | Not Found | -
http://www.resourceshelf.blogspot.com/ | Not Found | -
http://www.semanticstudios.com/ | Not Found | -
http://www.studioid.com/ | DesiringSyndication | -
http://www.syntax-design.de/usability/ | Not Found | -
http://www.tomalak.org/ | Syndicated | http://static.userland.com/tomalak/links2.xml
http://www.tomalak.org/ | Rejected | http://www.tomalak.org/recentTodaysLinks.xml
http://www.uoguelph.ca/~stuartr/ | Not Found | -
http://www.usabilitynews.com/ | Not Found | -
http://www.useit.com/ | Syndicated | http://www.newsisfree.com/HPE/xml/feeds/27/827.xml
http://www.v-2.org/main.shtml | Not Found | -
http://www.vanderwal.net/random/index.php | Not Found | -
http://www.web-graphics.com/ | Not Found | -
http://www.website-analyst.co.il/lucdesk/lucdesk.html | DesiringSyndication | -
http://www.website-analyst.co.il/lucdesk/lucdesk.html | Syndicated | http://www.website-analyst.co.il/lucdesk/lucdesk_rss.xml
http://www.webword.com/ | Not Found | -
http://www.wiremine.org/ | Not Found | -

Bill,

If we do RSS feeds instead, doesn't that mean we'll only get the current content? I wanted to be able to search old content as well.

-Michael

RSS and past content

If you crawl a site's HTML and the site keeps its old content online, the crawl will get you more data. If you want to read the current material without the hassle of scraping, reading the RSS files might be worth considering. You could also grab the RSS files and keep their content for later searching.

Perhaps a combination of the two might be appropriate.
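
To illustrate the "grab the RSS files and keep the content" idea, here is a minimal Python sketch. It is not the code I ran against Syndic8; the function name and the in-memory dict archive are made up for the example, and it assumes plain RSS 0.9x/2.0 feeds whose <item> elements are not namespaced (RSS 1.0/RDF feeds would need namespace handling).

```python
import xml.etree.ElementTree as ET


def merge_feed_into_archive(rss_xml, archive):
    """Add the items of one RSS snapshot to `archive` (a dict keyed by link).

    Returns the number of items that were not already archived.
    Assumes un-namespaced <item>, <link>, <title>, <description> elements.
    """
    root = ET.fromstring(rss_xml)
    added = 0
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        if not link or link in archive:
            continue  # skip items without a link and items seen before
        archive[link] = {
            "title": (item.findtext("title") or "").strip(),
            "description": (item.findtext("description") or "").strip(),
        }
        added += 1
    return added
```

Run once per fetch, the archive keeps growing even though each feed only ever shows its newest items, so older posts stay searchable from the day you start collecting.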

RSS not the complete solution

Technically (IIRC), RSS 0.92 has a max-items count of 15, so if more than 15 new items have appeared since the RSS file was last fetched, some will be missing.
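
A rough way to notice when that may have happened, sketched in Python. The function name is hypothetical, and the default cap of 15 is just the figure recalled above, not something checked against the spec. The idea: if the feed is full and none of its items were in the previous fetch, at least a full feed's worth of items is new, and anything older may have scrolled off unseen.

```python
def may_have_missed_items(previous_links, current_links, max_items=15):
    """Return True when a fetch shows signs that items were dropped.

    previous_links: links seen in the last fetch of the feed.
    current_links:  links seen in this fetch.
    max_items:      assumed cap on items per feed (an assumption).
    """
    if not previous_links:
        return False  # first fetch: nothing to compare against
    feed_is_full = len(current_links) >= max_items
    no_overlap = not (set(previous_links) & set(current_links))
    return feed_is_full and no_overlap
```

This can only say "maybe": exactly 15 new items and 50 new items look the same from the feed alone, which is an argument for fetching busy feeds more often than weekly.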

Good direction

I like where this is going. This could be a helpful resource. I tried using the search term 'metaphor' to find the discussion on peterme from November that triggered the metaphor of attraction. Was the indexing done on just the current index/home pages?

Yeah, I'm hitting some bugs. Not sure it is going as deeply as I'm setting it to go. Hang in, maybe I can get it to work. If not, I'm going to try to compile htdig.

Even with htdig, I still couldn't find anything using all of the terms "metaphor of attraction" in one search. Found a lot using just "metaphor" but didn't sift through them all.

Using other terms worked

I used "navigation" and "metaphor" to track down the peterme discussion that I knew existed.

I really like this tool.

smarter IAwiki crawling...

I would recommend that you start the IAwiki crawl from http://www.IAwiki.net/cgi-bin/wiki.pl?action=rc&days=7&all=0&showedit=1, and then only follow links which don't include /cgi-bin/ in the URL, and crawl no deeper after that (other than the first time ever).

This way you will get all the changed pages (including minor edits), and none of the variable guff like page diffs or revision lists.
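
The rule above could be sketched in Python roughly like this. It is only an illustration, not how ht://dig is configured: the class and function names are made up, and limiting links to the wiki's own host is an extra assumption. Given the HTML of the recent-changes page, it returns the page URLs to fetch, with no further link-following after that.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def pages_to_crawl(recent_changes_html, base_url):
    """Return same-host links from the page that do not contain /cgi-bin/."""
    collector = LinkCollector()
    collector.feed(recent_changes_html)
    base_host = urlparse(base_url).netloc.lower()
    pages = []
    for href in collector.hrefs:
        url = urljoin(base_url, href)  # resolve relative links
        parts = urlparse(url)
        if parts.netloc.lower() != base_host:
            continue  # assumption: stay on the wiki's own host
        if "/cgi-bin/" in parts.path:
            continue  # skip diffs, revision lists and other script URLs
        if url not in pages:
            pages.append(url)
    return pages
```

The crawler would fetch each returned URL once and stop there, which is the "crawl no deeper" part.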

Blatant blanket repetitive dumb crawling already accounts for way more bandwidth than I care to know.

btw - do you respect robots.txt?

Thanks, E

I changed the URL to crawl the wiki. And yes, ht://dig respects robots.txt, so if you want sections to be ignored, they will be. -m
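
For anyone curious what "respects robots.txt" means in practice, here is a small sketch using Python's standard urllib.robotparser. It is not ht://dig's code, and the rules, the "htdig" user-agent string, and the page names are invented for the example rather than taken from any real site's robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules a wiki owner might publish to keep crawlers
# out of script-generated pages.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
"""


def allowed(url, user_agent="htdig", robots_txt=SAMPLE_ROBOTS_TXT):
    """Return True if the given rules let `user_agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With these sample rules, allowed("http://www.iawiki.net/SomePage") is True and allowed("http://www.iawiki.net/cgi-bin/wiki.pl?action=rc") is False, so a well-behaved crawler would skip the second URL.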

phrase as query term?

thanks - this will be very handy for searching the SIGIA list alone. I can never get Google to do that.

I couldn't get it to search on an exact phrase however, like "window navigation diagram" (including the quotes). Choosing ALL doesn't respect that word placement, and Boolean coughs blood.

I wonder how IAs write search queries? Are you keeping track? Please tell us!

I know...

I know you can't do phrase or proximity searching presently. You apparently can in the current beta version of ht://dig, but I chickened out and installed the stable version. When the stable version allows this, I will install it.

Sorry. -m

no apologies necessary

Stable = good.