Posted by: Kay at Suicyte | June 14, 2007

ENCODEing functionality

Today could turn out to be a dark day for my approach to research. I just had a look at the latest online issue of Nature and found the first report on the ENCODE project. For those of you who don’t know about ENCODE: it is a huge project that analyzes the genome by both experimental and bioinformatic methods, with the aim of finding and cataloging all ‘functional elements’ encoded by the genomic DNA. In this context, ‘functional elements’ include various classes, such as coding regions, 5′ and 3′ UTRs, promoters, enhancers, noncoding RNAs, chromatin organizers and lots of other protein binding sites.

I must admit that I have not read the entire paper, which comprises 18 information-dense pages. Catching up on it is rather high on my to-do list, but I first have to deal with some urgent job-related issues. However, what I gathered from skimming the pages and reading Nature’s news article is enough to give me a headache. Here is what I am worried about:

“But when the different groups compared their results, they found that their predictions about key portions of the genome didn’t always agree: the biologists’ list of functional sequences didn’t match the computational group’s list of constrained sequences.”

This quote comes from the news article by Erika Check, who goes on to explain some of the observed discrepancies and to cite the explanations offered by several ENCODE participants. None of this offers me much comfort. As I have mentioned in previous posts, I am a bioinformatician trying to predict function from sequence. One of my mantras is “if it is not conserved, it is not important”. If ENCODE turns out to be right, I will have to rethink this assumption. Not good.

As I haven’t yet read the full paper, there is still hope that the authors use a concept of ‘functionality’ that differs from mine. Actually, the chances are not bad, as ‘functionality’ is one of the issues on which I am the proud owner of a minority opinion. I have always intended to make this the central topic of another post, and probably will do so in the near future. Just briefly, I don’t subscribe to the idea that everything that exists in biology has a particular purpose. For reasons that I don’t understand, almost everybody in the biosciences seems to think differently.

Let me give you just one example: everybody seems to believe that alternative splicing is a pervasive and incredibly important phenomenon. Not me. I do agree on some points: i) there are some examples where alternative splicing is really important, and ii) at a certain level of detection sensitivity, splice variants can be seen for many genes. However, I do not agree that one can conclude from these two observations that alternative splicing is important for many (if not all) genes. My alternative explanation is that most of the perceived ‘alternative splicing’ should rather be called ‘faulty splicing’. It is not functionally important, but is simply tolerated by the cell. In my opinion, a certain degree of splicing variability is the default for any gene, which is perfectly o.k. as long as there is enough of the ‘correct’ splice form and the splice variants are not toxic. According to this explanation, a 100% unique splicing pattern is observed only for those genes where any variant would be detrimental. In most other cases, splice variants either do no harm or are taken care of by nonsense-mediated decay and related quality-control mechanisms.

Here is my point: to convince me that a certain alternative splicing event is meaningful, you would have to show that the splicing pattern is conserved in evolution. If the ENCODE people now come along and tell me that there are many functionally important things going on without any evolutionary constraint, this would kill (or at least severely wound) my idea of how biology works.

What I really have to check now is what kind of functionality the ENCODE people find in the absence of evolutionary constraint. If the ‘functionality’ turns out to be the mere presence within a transcript, or the observed binding of a protein, I could easily dismiss it: I would just argue that the presence of that region in a transcript doesn’t matter to the cell and thus is not actively suppressed. The same argument works for protein-DNA binding and various other alleged ‘functionalities’. Problem solved.

However, if ENCODE shows real evidence for a function, e.g. a detrimental effect of a mutation, or the vital importance of having this region included in a transcript, I would probably have to look for a new job.

Finally, a quick search for who else has reported or blogged on this topic turned up: New Scientist, Konrad’s considerations, Egghead, Nobel Intent, Eureka, GenomeWeb, Genomicron, Science Blog, and probably lots of others that I have missed.



  1. If every important thing were conserved, wouldn’t we be a single species? I also use conservation in the same way in my analyses, but there must be important functional elements that are not conserved.

  2. Pedro,
    when I am talking about ‘conserved’, I do not mean ‘identical’. Obviously, not all of the important bits can be identical. But I have problems believing that important sequence features are floating freely – and that is what’s being claimed in the ENCODE paper. They talk about being “free from evolutionary constraints”.

  3. I stumbled upon your website while looking for some other information and thought I would leave a quick comment here. I believe that there is a huge challenge in identifying what is truly conserved versus the perceived “junk”, as it’s so often called. If you look at our understanding of riboswitches, very small regions of strict sequence conservation can give rise to very robust regulatory interactions. It is the structures that are highly conserved, while there are many opportunities for sequence variability.

    As for the comment about alternative splicing not being important, I direct you to a recent Nature publication from some of my colleagues in my research group.

    I think it is especially relevant to your comments. While I do think there is some degree of mis-processing going on, I think quite a bit of this will turn out to have some function as more details are elucidated. Just my $0.02.

  4. Ben,
    thanks for your comment and the link to the riboswitch paper. I need to do some catching up on this topic.
    As I said in my post, I have some doubts that the majority of alternative splicing is of fundamental importance, but I am ready to change my mind if there is good evidence.
