Posted by: Kay at Suicyte | July 22, 2007

Text mining in Vienna

I am currently attending the ISMB conference in Vienna, my first ISMB after a 3-year hiatus. As usual, a packed program that does not leave much time for blogging. My group is not presenting anything, so we can enjoy the 200 talks and 1000 posters. ISMBs are always a good occasion to meet people – yesterday I met a guy I had last seen in 1991 during my first bioinformatics conference ever.

This year, I am particularly interested in text mining – not that I want to start doing text mining myself, but I am interested to see what resources based on text-mining efforts are available. Yesterday, I attended a tutorial called “automatic text analysis based on web services”, which looked just right for my interest. Although the presenter, EBI’s Dietrich Rebholz-Schuhmann, did quite a good job explaining the text mining concepts, I was nevertheless disappointed. Most of the tutorial was just like all the other text mining talks I have heard in recent years: many slides explaining why text mining is important (which I already know), many more slides on how complicated text mining can be (which is the reason why I am not doing it myself). Every self-respecting text miner has a collection of slides showing how difficult it is to recognize a gene name (or a chemical name) in a scientific text, and what the pitfalls are. Apparently, there is a Drosophila gene called ‘how’, with the alternative name ‘who’. Unfortunately, the tutorial was rather short on what kind of ‘web services’ actually are available, and how they can be used. There were a couple of links, though. I will try them when I am back in the office – in case any of this turns out to be useful, I will write about it.

Today, we heard a very lively keynote talk by Michael Eisen, the brother of the equally famous Jonathan Eisen. The talk was mostly about patterning genes in the fly embryo, and how to find the relevant transcription factor binding sites. At the end, Michael surprised the audience with the recommendation that bioinformaticians should stop writing software for microarray analysis, as in two years’ time nobody will be using microarrays anyway. And that we should either publish in open access journals or die. I am not sure if I agree with his first suggestion (disclaimer: I am working for a microarray company). The latter suggestion made me think of Pierre’s nightmare #9 (with me, wearing an Elsevier T-shirt).

More on ISMB when I find the time. Got to sleep now – the meeting starts at 8.15 (yuck)



  1. I guess what he meant was that the gene expression information will come directly from sequencing instead of microarrays. Still, we need tools to analyze the gene expression information no matter where it comes from.

  2. OMG – It is in print now! I am equally famous to my brother. I mean – I know I am known by some people but alas, when your brother has like the single most cited paper in the history of PNAS (his array clustering paper) and is one of the founders of PLoS, it is hard to keep up. Though I am certain you are not correct in your statement, I will use it for all it is worth.

  3. Pedro: yes, this was exactly what he was referring to. I don’t know enough about the modern sequencing techniques to judge if this is feasible – maybe. I was not totally serious; even if this is true, my job is not in danger, as I am more concerned with high-level analysis. But at least the folks inventing new Affy probe level methods should be worried.

    Jonathan: I have heard that you give frequent interviews on the radio and in newspapers. For me, this is as close to being famous as it gets – at least for somebody in our field. I wouldn’t overrate a highly-cited method paper. Even your brother said that he doesn’t want to talk about it anymore. I can understand him: my own top-cited paper is also far from being the scientifically most important one – just a method paper in NAR (although probably cited less than 10% as often as Michael’s clustering paper).

  4. Fine fine … I am on the radio and in papers. And yes, my brother may not want to talk about his clustering paper. But alas, he is still more famous than I am (not complaining — just stating a fact).

  5. I believe you may have misheard Michael. I believe he said that genomic tiling arrays will be obsolete in a couple of years and thus people working on algorithms for them should run for the hills.

    If this is true then I couldn’t agree more. These arrays are used to scan unsequenced strains of bacteria or individuals of a population that have a reference genome already sequenced. Basically, this will be pointless once sequencing is quicker than the time it takes to design the array.

    It would be nice if Michael could clarify whether he was referring to regular expression arrays or genomic arrays.

    I was actually going to blog about this, but alas you beat me to it!

    P.S. Drop by poster K13 if you would like to chat about blogging.

