Posted by: Kay at Suicyte | August 4, 2008

Search engines

Everybody seems to be blogging about new search engines these days. Most of them discuss the new Cuil search, which I found mostly disappointing. But so did everybody else. Over the last few months, I have tried a couple of other search engines. My method is a highly sophisticated benchmark involving a well-balanced testbed of three common search tasks:

  1. Search for the term “ubiquitin OR proteasome”. Check how many entries are found, and read all of them.
  2. Search for my own name. Vanity rules. High-scoring matches are typically 15-year-old Usenet postings of mine, asking silly questions that nobody cared to answer. The runners-up come from other people appropriating my name, including one European scientist accused of scientific misconduct. Yuck. That wasn’t me, I promise!
  3. Search for the term “ataxin-3” or “Rpn13” (according to taste). If the highest scoring matches start like “comparison shop for ataxin-3 at cooldealz.com” or similar, the search engine gets bonus points.

Here are a few observations on some notable search engines.

Google: Finds the largest number of entries, some of them sensible. It also lists a fair number of shopping sites, typically offering antibodies to everything biological that I throw at it (except for my name). For a while, the top entry on Rpn13 was a blog post here at my site, but those days are long gone and my post has disappeared into the abyss (i.e., beyond position 10). What I hate about Google is the Microsoft-esque attempt to outsmart the user by also showing results with ‘spelling corrections’ applied to, say, my name. Unfortunately, there are lots of people with a name similar to mine, and I am tired of seeing their pages on top when all I want to see is me, me, me.

Cuil: Used to find a lot of entries, most of them with funny pictures of no obvious connection to the query. By now, the pictures seem to have gone, but so have a lot of the hits. As of today, the top hit on ‘ataxin-3’ points to a paper of mine (10 bonus points!), though strangely not to the paper itself but to a Nodalpoint page linking to it.

Gigablast: Not an NCBI product, despite the name. Ugly color scheme. It has two useful features. First, the categories on top are simple and intuitive and even seem to work (unlike some other search engines with built-in categorization). Even better, there is an easy way to filter for new entries (“freshness=2” restricts the search to entries from the last two days). This is really useful, and I haven’t figured out how to do it in Google (although I guess it is possible somehow).

Vadlo: I learned about this one just today from GTO. It is supposed to be dedicated to life science content, e.g. protocols and the like. It fails miserably on my benchmarks. I don’t want to buy ataxin-2 antibodies when searching for ataxin-3. And no, I didn’t mean ‘Rp13’ when I typed ‘Rpn13’, thank you very much. The main feature of this site seems to be a daily science cartoon. I had to gasp when looking at today’s cartoon (see below, the original is here: Life_in_Research_Cartoon_1138.html). Are they really making fun of a guy who talks about ubiquitination, shows a structure slide that looks a lot like mine, and is supposedly unable to run a decent agarose gel? How could they possibly know? But wait, I am much fatter than the guy in the cartoon, I don’t usually wear a tie, and there are more than 3 people in the audience. This cannot be me, what a relief.

Cartoon on stupid ubiquitin guy



Responses

  1. Google has advanced search options that let you limit the search to the last day, week, month, etc., but it’s not clear to me how to do it straight from the search line, if that’s what you’re interested in.

  2. Nice experiment.
    Maybe you could also have tried a metacrawler like Dogpile or clusty.com (I can’t post many links here because your spam filter doesn’t seem to like it).

    Other scientific engines are:

    – scirus
    – hubmed
    http://citeseer.ist.psu.edu/
    http://www.ihop-net.org/

  3. You should have tried the keyword ataxin 2 in the databases category of Vadlo. You would have received some pretty useful info.

    They don’t have a category for products, so searching for products there and then being disappointed makes no sense!!

