Posted by: Kay at Suicyte | August 12, 2007

Never trust a hypothesis?

The last decade has seen a gradual replacement of hypothesis-driven research by hypothesis-free high-throughput studies – at least when looking at the selection of publications that made the biggest splash. The reasons for this shift are obvious: the advent of new techniques, the availability of entire genome sequences, and progress in lab automation have enabled high-throughput research. Without any doubt, these large-scale studies yield more results per Euro, as it has always been much easier to find out anything in general than something in particular. Imagine an old-fashioned biologist, casting his hypothesis-driven fishing line into the pond of science, hoping that a particular fish will be attracted by the highly specific bait. Then imagine a group of contemporary, way cool high-throughput researchers (those guys always come in large groups), trawling the same pond with their fishing net and catching literally thousands of fish in the same amount of time. With a little luck, they also catch the particular fish that the first guy was after. If so, great. If not, who cares – there are lots of other fishermen out there, and some of them will be interested in the fish caught in the trawling effort.

When I come to think about this kind of situation, which I do quite often, I always feel terribly old-fashioned, if not old – much older than I really am. I tend to sympathize with the lone fisherman (think: old man and the sea). Leaving aside thoughts about who is more deserving of the catch, there are quite a few arguments in favour of hypothesis-driven approaches. Here are the two I consider most important: i) the pond of science is actually quite big, and if it is a rare fish you are after, you might not be able to find it in a non-targeted trawling approach, and ii) given the large numbers of fish that the trawlers have to deal with, plus the fact that they are often more expert in trawling techniques than in marine biology, chances are good that they don’t even recognize what they have caught – or at least do not appreciate its significance.

There is a third issue that comes up often in discussions of hypothesis-driven vs. high-throughput approaches: reliability. The rest of this post will be devoted to this particular issue, and you will see that I entertain a rather unconventional opinion on the topic. For simplicity, I will restrict the discussion to a particular field, the detection of protein-protein interactions; however, most of my points will also apply to other research areas. In the protein interaction field, errors come in two flavours: reporting interactions that are not real, and missing interactions that are real. There is little doubt that systematic studies produce, on average, more errors than a good hypothesis-driven study. The reason is simple: with more results, less time and effort can be devoted to reliability testing per result. However, several factors counteract this trend; for example, notorious troublemakers among the interaction constructs are much easier to detect in a large data set than in a single experiment.

Before discussing the relative merits of the approaches, I would like to ask a question. Imagine that you are interested in a protein X. You check the literature and various databases and find a single report that protein X interacts with the p53 protein, which is quite unexpected but could have interesting implications. Before you start to pursue this line of research, you ponder the reliability of the published evidence. At this point, please imagine 5 different scenarios concerning the evidence’s provenance:

  1. A researcher working on protein X had the hypothesis that it might interact with p53, tested it and found it to be true.
  2. A researcher working on p53 had the hypothesis that it might interact with protein X, tested it and found it to be true.
  3. A researcher working on protein X wanted to find out what it does, did a screen, and found that it binds to p53.
  4. A researcher working on p53 wanted to find further interactors, did a screen, and found protein X.
  5. The interaction p53 :: protein X was found in a systematic high-throughput study and was verified.

Let us also assume that in all five cases the same experimental methods were used (e.g. an initial yeast two-hybrid assay, followed by a confirmation through Co-IP). My question is: in which scenario would you place the most trust in the reported interaction? To an unfaltering believer in published science, this should be a non-issue: the result has been verified by two independent methods, it has passed peer review, it is published, thus you should assume that it is true. Most of us (at least those who have ever done experiments on their own) should be more skeptical. From what I wrote in the initial paragraphs of this post, you would assume that I put most trust in scenarios 1 and 2, followed by 3 and 4, and am very skeptical about scenario 5. Here is what I really think: most trust in 4 and 5, followed by 2 and 3, most skeptical about 1.

Odd, huh? Let me try to explain. I am somewhat skeptical (you may say: paranoid) about scenarios 1 and 2, because here the researcher had a theory that X and p53 should interact. Without alleging any (intentional) misconduct, I would still be afraid that those researchers tried their best to find this interaction. If it didn’t work the first time, they might have changed the conditions. If it still didn’t work, they might have used a slightly different construct, and so on. One should also not overestimate the value of controls and verifications. In a typical hypothesis-driven study, only one negative control is provided. Strictly speaking, this only shows that protein X interacts better with p53 than protein Y does. It is still possible that half of the proteome interacts with p53 just as convincingly as protein X. Verifications with independent methods are certainly useful, but at that stage the researcher is even more convinced that the proteins interact: it is predicted by the hypothesis and has been shown in the Y2H experiment. As a consequence, even more parameter tweaking will take place to confirm the interaction. There are also statistical issues, which I don’t want to discuss here in any detail. Thinking all this through, I can only conclude that I have lost most of my faith in published experimental confirmations of hypotheses.
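
To see why this parameter tweaking matters statistically, here is a minimal sketch in Python. All the numbers are made-up assumptions for illustration only: a 5% chance that a single assay “shows” an interaction that is not real, a 20% chance that it misses a real one, and up to five retries under changed conditions:

```python
import random

def assay(interacts, fp_rate=0.05, fn_rate=0.20):
    """One experiment under one condition: does it 'show' an interaction?"""
    if interacts:
        return random.random() > fn_rate   # real interaction, missed 20% of the time
    return random.random() < fp_rate       # no interaction, spurious hit 5% of the time

def retry_until_seen(interacts, max_tries=5):
    """Hypothesis-driven persistence: new buffer, new construct, and so on."""
    return any(assay(interacts) for _ in range(max_tries))

trials = 100_000
single  = sum(assay(False) for _ in range(trials)) / trials
retried = sum(retry_until_seen(False) for _ in range(trials)) / trials
print(f"false-positive rate, single assay:       {single:.3f}")   # ~0.050
print(f"false-positive rate, retried five times: {retried:.3f}")  # ~0.226
```

With five tries against a truly non-interacting partner, the chance of at least one false positive grows from 5% to roughly 1 - 0.95^5 ≈ 23%. Each new construct or buffer condition is, statistically speaking, another draw from the same lottery, and only the winning draw gets published.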

Some of my scepticism about scenarios 1 and 3 is explained in my previous post about p53. The observation that people like to see interactions with exciting proteins makes me favour scenario 2 over 1, and 4 over 3; p53 researchers have little to gain from seeing an interaction with protein X. The reasons why I would mostly trust scenario 5 are the following: the approach was unbiased, and the researchers do not see the result as a proof of their superior intellect. Moreover, in a high-throughput setting, chances are good that the screen and the confirmation have only been tried under a single condition.

It is true that high-throughput studies are error-prone too, but theirs are different kinds of errors. High-throughput errors are often false negatives (type II) rather than false positives (type I), and they are often of a statistical rather than a systematic nature. By contrast, I would argue that hypothesis-driven research runs a greater risk of type I and systematic errors, always biased towards confirming the theory.
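
The same toy numbers can illustrate the false-negative side. Suppose (again, a pure assumption for illustration) that a real interaction is only detectable under one of four possible conditions, say only with the full-length construct. A single-pass screen tests one fixed condition; the lone fisherman patiently tries all four:

```python
import random

FN_RATE = 0.20  # assumed per-assay miss rate, as in the sketch above

def assay(right_condition):
    """The interaction is only ever visible under the right condition."""
    return right_condition and random.random() > FN_RATE

def screen_single_pass():
    """High-throughput screen: one fixed condition, the right one 1 time in 4."""
    return assay(random.random() < 0.25)

def try_all_four():
    """Hypothesis-driven lab: tries all four conditions, one of which works."""
    return any(assay(i == 0) for i in range(4))

trials = 100_000
found_screen = sum(screen_single_pass() for _ in range(trials)) / trials
found_lab    = sum(try_all_four() for _ in range(trials)) / trials
print(f"detection rate, single-pass screen:  {found_screen:.3f}")  # ~0.20
print(f"detection rate, all four conditions: {found_lab:.3f}")     # ~0.80
```

Under these assumptions the screen misses the condition-dependent interaction four times out of five, a false negative that no downstream statistics will recover, while the very tweaking that inflates false positives in scenario 1 is exactly what rescues it here.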

Is the situation really all that bad, and is there anything we can do about it? Maybe I am exaggerating, maybe I am paranoid – I wouldn’t disagree on either count. Certainly, not all scientists are equally susceptible to self-delusion. Nevertheless, I think the problem is real. If you don’t believe me, go have a look at the QED paper published in 2001 in Genome Biology, a controversial but very instructive piece of work to which I have contributed some minor examples.


Responses

  1. Good write-up of the current situation in biology.

    My only complaint is that I think your fisherman analogy sets big science and little science up as adversarial, whereas I see it as more of a synergistic relationship.

    Big biology (high-throughput screens, genome-wide association studies, etc) is there to narrow down the list of potential targets to a manageable size. This knowledge allows your lone fisherman to test 2 or 3 likely hypotheses, rather than facing overwhelming numbers of genes or proteins in a list.

    I understand that the two approaches are competing for grant dollars right now, but I don’t think it’s a situation that will last much longer. Sequencing genomes is getting ridiculously cheap, and I expect that we’ll see similar advances in proteomics over the next 20 years or so.

    As the high-throughput technologies mature, they’ll require less R&D funding, and the small science labs will be able to pick up that money and go back to what they do best – proving hypotheses. The difference is, at that point, they’ll be doing it with vastly more information at their disposal. I see it as a win-win situation.

  2. I think the author of this post is capable in science. In my view, a hypothesis must be proved to get a satisfying result. As an example, suppose there is a person you have never met before. What steps can you take to solve this problem? First you get the name. Then check it and find out everything about that person’s identity. Proving things one by one is important in the field. Maybe you have learned how a hypothesis helped a scientist toward an invention. Thank you to all readers.

  3. Great writeup man, thanks 🙂
    I agree with Chris: as these high-throughput technologies become cheaper and standardized, with nice analytical engines (built with the help of those lonely fishermen coming from mathematics, computer science, physics, chemistry, biology and linguistics), it will be easier to do replicate studies at a global level and do meaningful statistical analysis, unlike the GIGO state prevalent now. Sound (unbiased, as you said) hypotheses will be built on comprehensive data with replicates. Then the wet-lab territory will be more focused for the lone sailor to work on 🙂

  4. Nice post. It reminded me of Jabberwock’s column in the Journal of Cell Science on “life-changing moments and the death of the hypothesis”. Not exactly the same as your topic, but it adds more issues on (non)hypotheses & high-throughput. (doi: 10.1242/jcs.02591)
    http://jcs.biologists.org/cgi/content/full/118/18/4075?maxtoshow=&HITS=20&hits=20&RESULTFORMAT=&searchid=1&FIRSTINDEX=0&displaysectionid=Sticky+Wicket&resourcetype=HWCIT

  5. I think it was at the interactome meeting last year that there was a general discussion along exactly these lines, between protein-interaction screening and hypothesis-driven small-scale studies. Several people were comparing protein interaction mapping with genome sequencing, and how the development of high-throughput sequencing, with standardization of methods and quality measures, led to a huge improvement in method accuracy, to the point where new high-throughput sequencing was more accurate than the sequencing that was done in individual labs. Today few people actually think about sequencing errors, and I don’t know anyone sequencing in the lab.

    One thing about the fishing expeditions: what is the point of forcing someone to find interesting cases in a dataset to showcase in a glorified press release, if the dataset itself is an interesting outcome of the research? Give out the data with proper quality assessment and methods, give it a DOI, and put out as many real press releases as needed.

  6. that was an interesting read! 🙂

  7. First, thanks for all the positive feedback! Here are some responses:

    #1 Chris: You are right, hypothesis-driven research and large-scale research typically do not compete. They might compete for grants, which can be a serious issue, but this is not what I was talking about. The only competition relevant here is the competition for results. Despite the very different approaches, there can be some overlap, and I plan to make this the subject of a future posting.

    #2 Karta: Checking names and affiliations is the typical reaction of scientists when they encounter published data. It is not considered good style, but it is common practice. Well, maybe looking at names is o.k. (you know that Dr. A is a careful and scrupulous worker, while Dr. B is a maverick with a history of selling preliminary data as final results). Often, you don’t know the people involved. What most people do then is to look at the affiliation (Dr. C is at Harvard, he must be correct, while Dr. D. is at U. of Southern Antarctica, which is not in the premier league of science places – let’s rather trust Dr. C).

    #3 Animesh: Let us all hope that your vision of the future will come true. I am a pessimist by nature, and rather envisage a HTGIHTGO world (where HT means high-throughput).

    #4: Marijke: Thanks for the link! Very interesting reading. This ‘Jabberwock’ really seems to be an old-school hypothesis-driven scientist. I am not yet sure whether I would subscribe to his/her view of discarding non-hypothesis-driven work as non-science. I am not that rigorous, although there are quite a few things that are commonly considered ‘science’ which I would rather call something else (which does not mean that they are not important or not useful!).

    #5: Pedro: As I mentioned in my post, I agree that large-scale research and method standardization tend to reduce some kinds of errors (mainly the false-positive or type I errors). The tweaking of conditions (which I had described rather negatively in my post) will increase false positives, but I believe that it can be necessary and useful for avoiding false negatives. I know of several examples where a protein interaction in multi-domain proteins works either exclusively with the isolated domains or exclusively with the entire proteins. Both results can have interesting implications, but you cannot expect this kind of result from your average high-throughput approach. With regard to glory and press releases: on the one hand there is the old school that considers unbiased work non-science (cf. Jabberwock), on the other hand there are also plenty of Nature papers and press releases for high-throughput research. I think future science needs both approaches.

    #6: narziss: thank you!

  8. Very nice post and I like your views on the reliability of unbiased evidence! I just wanted to add that there are also people who do not eat fish 😉
    In other words, I don’t think that HTP experiments should necessarily all be reduced to “screens” that, as Chris writes, “narrow down the list of potential targets”. Genome-wide datasets, and in particular the integration of different large datasets, also allow asking questions that the lone old fisherman would not be able to ask. Do fishes swim alone or in groups (“modularity”)? How many fishes can I fish without endangering the species (“robustness”)? How will this affect prey and predators (“dynamics”)? Etc. The analogy has its limits, but properties like modularity, robustness, stability, number and dynamics of functional states, pleiotropy, redundancy, etc. require different strategies than the classical fishing-for-my-gene-of-interest approach. Even more fundamentally, at the foundation of systems biology is the realization that functional states of a biological system are best understood by considering all the system’s components (think of functional states of a metabolic network, for example). If you decide that the system is the entire cell, which is an ambitious but natural choice, then the need for HTP data to generate cell-wide models follows as a logical consequence.

  9. #8: Thomas, a very interesting remark. I have not touched the aspect of systems biology in my post; I was just thinking of large-scale studies that try to achieve the same thing as small-scale hypothesis-driven studies, but on a larger scale (and without a hypothesis). My somewhat narrow view is caused by my own field of work. Several years in the biotech industry amount to a brainwashing: I am nowadays mainly interested in results that can be turned into drug targets and the like, not so much in finding general trends in the organization of biological networks. I am not implying that the latter is not interesting; it is just not my field of work.

  10. […] from Suicyte Notes has a thoughtful post regarding scientific research approaches. A good […]

