Today could turn out to be a dark day for my approach to research. I just had a look at the latest online issue of Nature and found the first report on the ENCODE project. For those of you who don’t know about ENCODE: it is a huge project trying to analyze the genome by both experimental and bioinformatic methods, with the aim of finding and cataloging all ‘functional elements’ encoded by the genomic DNA. In this context, ‘functional elements’ include various classes, such as coding regions, 5′ and 3′ UTRs, promoters, enhancers, noncoding RNAs, chromatin organizers and lots of other protein binding sites.
I must admit that I have not read the entire paper, which comprises 18 information-dense pages. Catching up on this is rather high on my to-do list, but I first have to deal with some urgent job-related issues. However, what I picked up by skimming through the pages and reading Nature’s news article is enough to give me some headaches. Here is what I am worried about:
But when the different groups compared their results, they found that their predictions about key portions of the genome didn’t always agree: the biologists’ list of functional sequences didn’t match the computational group’s list of constrained sequences.
This quote comes from the news article by Erika Check, where she goes on to explain some of the observed discrepancies and to mention what several ENCODE participants offer by way of explanation. None of this offers much comfort to me. As I have mentioned in previous posts, I am a bioinformatician trying to predict function from sequence. One of my mantras is “if it is not conserved, it is not important”. If ENCODE turns out to be right, I will have to re-think this assumption. Not good.
As I haven’t read the full paper so far, there is still hope that the authors use a concept of ‘functionality’ that differs from mine. Actually, chances are not bad, as ‘functionality’ is one of the issues where I am the proud owner of a minority opinion. I always intended to make this a central topic of another posting, and probably will do so in the near future. Just briefly, I don’t subscribe to the idea that everything that exists in biology has a particular purpose. For reasons that I don’t understand, almost everybody in the biosciences seems to think differently.
Let me just give you one example: everybody seems to believe that alternative splicing is a pervasive and incredibly important phenomenon. Not me. Some aspects I do agree on, like i) there are some examples where alternative splicing is really important, and ii) at a certain level of detection sensitivity, splicing variants can be seen for many genes. However, I do not agree that one can conclude from those two observations that alternative splicing is important for many (if not all) genes. My alternative explanation is that most of the perceived ‘alternative splicing’ should rather be called ‘faulty splicing’. It is not functionally important, but is just tolerated by the cell. In my opinion, a certain degree of splicing variability is the default for any gene, which is perfectly o.k. as long as there is enough of the ‘correct’ splice form and the splice variants are not toxic. According to this alternative explanation, a 100% unique splicing pattern is only observed for those genes where any variant would be detrimental. In most other cases, splice variants either don’t hurt or are being taken care of by nonsense-mediated decay and related QC mechanisms.
Here is my point: to convince me that a certain alternative splicing event is meaningful, someone would have to show that the splicing pattern is conserved in evolution. If the ENCODE guys now come along and tell me that there are many functionally important things going on without any evolutionary constraints, this would kill (or at least severely wound) my idea of how biology works.
What I really have to check now is what kind of functionality the ENCODE people find in the absence of evolutionary constraint. If the ‘functionality’ turns out to be the mere presence within a transcript, or the observed binding of a protein, I could easily dismiss it: I would just argue that the presence of that region in a transcript doesn’t matter to the cell and thus is not actively suppressed. The same argument could be applied to protein-DNA binding or various other alleged ‘functionalities’. Problem solved.
However, if ENCODE shows real evidence for a function, e.g. a detrimental effect of a mutation, or the vital importance of having this region included in a transcript, I would probably have to look for a new job.
Finally, a search on who else has reported or blogged on this topic: New Scientist, Konrad’s considerations, Egghead, Nobel Intent, Eureka, GenomeWeb, Genomicron, Science Blog, and probably lots of others that I have missed.