Everybody seems to be blogging about new search engines these days. Most of them discuss the new Cuil search, which I found mostly disappointing. But so did everybody else. Over the last few months, I have tried a couple of other search engines. What I typically do is run a highly sophisticated benchmark involving a well-balanced testbed of three common search tasks:
- Search for the term “ubiquitin OR proteasome”. Check how many entries are found, and read all of them.
- Search for my own name. Vanity rules. High-scoring matches are typically 15-year-old Usenet postings of mine, asking silly questions that nobody cares to answer. Runners-up are caused by other people appropriating my name, including one European scientist accused of scientific misconduct. Yuck. That wasn’t me, I promise!
- Search for the term “ataxin-3” or “Rpn13” (according to taste). If the highest-scoring matches start like “comparison shop for ataxin-3 at cooldealz.com” or similar, the search engine gets bonus points.
Here are a few observations with notable search engines.
Google: Finds the largest number of entries, some of them sensible. Lists a certain number of shopping sites, typically offering antibodies to everything biological that I throw at it (except for my name). For a while, the top entry on Rpn13 was a blog post here at my site, but that time is long gone and my blog post has disappeared into the abyss (i.e. after position 10). What I hate about Google is the Microsoft-esque attempt to outsmart the user by also showing results with ‘spelling corrections’ applied to, say, my name. Unfortunately, there are lots of people with a name similar to mine, and I am tired of seeing their pages on top when all I want to see is me, me, me.
Cuil: Used to find a lot of entries, most of them with funny pictures of no obvious connection to the query. By now, the pictures seem to have gone, but so have a lot of the hits. As of today, the top hit on ‘ataxin-3’ points to a paper of mine (10 bonus points!), but strangely not to the paper itself but rather to a Nodalpoint page linking to the paper.
Gigablast: Not an NCBI product, despite the name. Ugly color scheme. There are two useful features: The categories on top are simple and intuitive and even seem to work (unlike some other search engines with built-in categorization). Even better: There is an easy way to filter for new entries (“freshness=2” restricts search to entries from the last two days). This is really useful, and I haven’t figured out how to do it in Google (although I guess that it is possible somehow).
Vadlo: I learned about this just today from GTO. Supposed to be dedicated to life science content, e.g. protocols and the like. Fails miserably on my benchmarks. I don’t want to buy ataxin-2 antibodies when searching for ataxin-3. And no, I didn’t mean ‘Rp13’ when I typed ‘Rpn13’, thank you very much. The main feature of this site seems to be a daily science cartoon. I had to gasp when looking at today’s cartoon (see below, the original is here: Life_in_Research_Cartoon_1138.html). Are they really making fun of a guy who talks about ubiquitination, shows a structure slide that looks a lot like mine, and is supposedly unable to run a decent agarose gel? How could they possibly know? But wait, I am much fatter than the guy in the cartoon, I don’t usually wear a tie, and there are more than 3 people in the audience. This cannot be me, what a relief.