In the last few days, I have written two posts on the Gene Ontology project (GO): the first, a rather polemical one, on the problems I encountered when using GO; the second making some suggestions on how to improve it.
Now I am having second thoughts about whether it was a good idea to criticize a resource that takes so much effort to create and maintain, yet is free for all. It is clearly the kind of work that is useful to many scientists, and also something most people (including myself) would not want to do on their own. So, after all, we should be thankful that this project exists. Still, after pondering this question for the past day, I think that some amount of constructive criticism is warranted.
For one thing, there is no shortage of reports on how great GO is and how many problems in biology it is going to solve. To give just a few examples:
- All systems GO for understanding mouse gene function
- Get ready to GO! A biologist’s guide to the Gene Ontology
- It’s All GO for Plant Scientists
There is another, perhaps more serious issue: I have seen an increasing number of papers that describe new tools for tasks such as clustering or function prediction, and that use GO annotations as a gold standard for benchmarking their methods. This makes my hair stand on end.
It seems that I am not the only one concerned about GO quality. A number of papers (all freely available) deal with evaluating GO and correcting its errors:
- Evaluation of high-throughput functional categorization of human disease genes
- GOChase: correcting errors from Gene Ontology-based annotations for gene products
- Quality control for terms and definitions in ontologies and taxonomies
- Estimating the annotation error rate of curated GO database sequence annotations
In particular, the last of these papers estimates the annotation error rate to be in the range of 28-30%. To my surprise, the authors call this error rate "reasonably low"; maybe my expectations are just too high?
Finally, here is a paper that makes suggestions on how to use expert systems for creating GO annotations: An evaluation of GO annotation retrieval for BioCreAtIvE and GOA. It is more or less the opposite of what I would like to see, but I suspect this is the future of GO.
I have to say that I expected more comments, maybe even to get roasted by infuriated GO fans. I am not sure whether the lack of feedback means that people tend to agree, or that nobody is interested in GO. One interesting comment from Jacob Frelinger pointed me to an online article by Clay Shirky, entitled "Ontology is Overrated: Categories, Links, and Tags". While this text is a pleasure to read and has many interesting facets, it mostly addresses a different kind of problem with ontologies. I think the concept of an ontology is well suited to GO's purpose, but it should be handled in a more quality-controlled manner.
P.S. I will write an email to the GO helpdesk and point them to the relevant entries in this blog; maybe that will provoke the toxic comments I am expecting.