Posted by: Kay at Suicyte | December 16, 2008

Microarrays may be bad, but not that bad.

I normally do not blog about topics related to my daytime job, which involves a lot of microarray data analysis. However, a series of recent blog posts [here, here and here] discuss microarray-related problems that differ so much from my own experience that I cannot let them pass without comment.

I am the last person to claim that microarrays are a perfect tool for tackling every conceivable question. They are not. DNA microarrays can be seen as a kind of hammer that is (rightfully) applied to a few nails, but unfortunately also to lots of objects with no nail-like properties whatsoever. Microarray data are problematic in many different ways. However, we should be careful not to throw out the baby with the bath water.

Here are the main points of criticism raised in the recent posts, along with my comments. I might exaggerate to some extent, but only to make my point clearer.

1) Microarrays are useless, because it has been shown that protein levels correlate poorly with mRNA levels. You hear this argument a lot, especially from the mass-spec people, who want to convince you that only they have a handle on the truth. I admit freely: microarrays are mostly useless if you want to learn about protein levels. This is not a nail; go use another tool. You should use microarrays mainly if you are interested in mRNA levels. There are lots of interesting applications for that, e.g. learning which transcription factors are activated. Several stress responses, including those to toxic substances, lead to a dramatic and very specific induction of certain mRNAs. You don’t have to know whether the corresponding proteins are really being made; the transcriptional response is the earliest and most specific indicator of many stress conditions. This knowledge can be very useful in its own right. Just don’t try to predict whether there is more protein A in the cell than protein B just by looking at their microarray signals.

By the way, microarrays are somewhat better at judging changes in protein levels than the protein levels themselves. But still, if protein levels are what you are after, you should turn to another tool.

2) Microarray experiments cannot be trusted because the statistical significance values are wrong. This argument is reiterated here, and the author certainly has a point. Somewhat surprisingly, the examples used in the blog post talk about genetic association studies rather than the common gene-expression microarrays. There also seems to be some confusion about the number of SNPs vs. the number of genes. Nevertheless, the main problem is shared between GWAS and transcriptomics studies: a microarray gives you tons of data, and the chance that one of the genes appears strongly regulated by chance alone is substantial. On the other hand, this ‘multiple testing’ problem is well known in the microarray field and is routinely taken into account. There are methods to correct the p-values for this bias (the best known is the ‘Bonferroni correction’). Thus, a situation similar to the one described in the blog post would certainly not reach a p-value of 0.05, at least not in a responsible microarray analysis.
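To make the multiple-testing point concrete, here is a minimal sketch (my own illustration, not code from any of the posts): the Bonferroni correction simply scales each p-value by the number of tests, so a nominal p = 0.01 that would look convincing for a single gene is no longer significant once ten thousand genes are tested at the same time.

```python
# Illustration of the multiple-testing problem and the Bonferroni
# correction (a sketch, not from the original post).

def bonferroni(pvals):
    """Scale each p-value by the number of tests, capped at 1.0."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]

# One gene with a nominal p = 0.01 among 10,000 tested genes:
nominal = [0.01] + [0.5] * 9999
corrected = bonferroni(nominal)
print(corrected[0])  # 1.0 -- no longer significant after correction
```

Bonferroni is deliberately conservative; false-discovery-rate procedures are less strict, but the basic lesson is the same: a raw p-value from one of thousands of parallel tests means little on its own.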

3) Batch effects play a major role and often conceal the real regulation. Admittedly, there are batch effects. However, with modern microarray platforms and hybridization methods they can be safely neglected – at least in comparison to other common noise sources. Obviously, batch effects depend on the technology used. I have experience with three different microarray platforms (two major vendors and one type developed by the company I work for), and for each of them the batch effects were typically much smaller than the noise from sample preparation or inter-individual differences.

While we are talking about noise sources, here are what I consider the main offenders:

1) Sampling. Particularly problematic when dealing with surgical or biopsy samples. Are you sure that each of your biopsies samples exactly the same tissue structure? With the same relative proportions of cells? The same amount of blood in the tissue samples? Least problematic when comparing things like treated and untreated cell lines.

2) Inter-individual differences. This problem is often under-appreciated but is slowly gaining publicity. Most problematic when dealing with human samples or other outbred (animal) populations. The differences between ‘healthy’ tissue of two donors are often much more pronounced than between ‘healthy’ and ‘diseased’ tissue of the same donor. Less problematic when dealing with inbred strains or cell culture. Even then, there might still be inter-individual differences related to e.g. nutrition status, circadian effects, etc.

3) Extreme amplification protocols. For many microarray studies, the available material is severely limited. Compliance of tissue donors is often inversely correlated with the size of the biopsy needle. There are several protocols for getting sufficient cDNA for microarray analysis out of very small samples, and some of them are clearly better than others. However, all of them share one common problem: less starting material means more dramatic amplification, which in turn means more noise.

Needless to say, most of these problems can be overcome by using really large sample numbers. Unfortunately, this is often impossible due to limited availability of samples or money. As a consequence, we have to live with the shortcomings mentioned above. I usually recommend that microarray results not be considered the final outcome of an experiment, but rather a method for identifying candidate genes that can be used for a more detailed follow-up study.

For the sake of full disclosure: if you haven’t noticed already, I work for a company that sells microarrays, microarray services and microarray data analysis. Obviously, this affiliation might bias my view of things. Nevertheless, I speak only for myself and not for my employer. I have tried my best to keep this brief discussion as unbiased as possible; it is just meant to reflect my personal experience from about 10 years of microarray data analysis.


Responses

  1. Nice summary of the issues surrounding microarray analysis.

    I don’t want to imply that microarray studies are not worthwhile, only that they have to be planned and executed carefully and the results need to be interpreted skeptically. Of course the same can be said of any experiment, but for the reasons you mention, it especially applies to microarray experiments.

  2. I could not agree more with that. Careful planning and execution is crucial for any large-scale experiment, including microarrays.

  3. To say batch effects can be safely neglected supports the general idea that the modern platforms we are using (Illumina/Affymetrix/Agilent in my case) are so good that we can forget basic experimental design. It is still common, in fact far too common, for users of microarrays to go ahead with a badly planned experiment. I still see instances of people who have analysed all of group A separately from group B and expect to determine the biological difference between the two groups.
    The extra effort required to design an experiment such that technical and biological noise is minimised is usually only an hour or so of discussion and thought by two or three people. Ideally this would include the biologist asking the question, an experienced microarray user and a statistician/bioinformatician. Simple planning of sample collection, or of cell culture and the layout of array processing, can reduce technical and biological noise to a minimum.
    Modern commercial platforms are also allowing us to use quite large replicate numbers. The latest arrays are close to the cost of a plate of real-time PCR (an order of magnitude more, perhaps) but cover whole-genome expression. The amount of validation required after a well-designed experiment is drastically less in most cases, saving lots of time, effort and expense in the end.
    Array experiments should be given out like guns and tattoos: with a cooling-off period before you are allowed to start collecting data!

  4. James,

    maybe “neglected” was not the right word to use. What I meant to say was that in my own experience, microarray batch effects are small compared to other error sources. Unless there are REAL technical problems, that is. Of course everybody should still use randomization protocols, but I consider it even more important that the other problem areas are covered appropriately.

    You say that you have seen people analysing group A samples separately from group B samples. This is not the way to go, but it is not as bad as using females for obtaining ‘normal’ samples and males for the ‘affected’ samples. Or drawing blood for the group A and group B samples at different times of the day. Or getting biopsy material from healthy and diseased patients at different hospitals, each with their own biopsy protocols. There are many parameters that need to be controlled, and all of this is not specific to microarrays.

    I still maintain that in a lab that knows its microarrays, biological noise heavily outweighs the technical noise. You may disagree. As I said, this is just my personal opinion based on my very own observations – which are not even statistically validated!

  5. Very nice article. Thanks for it and for the links to other discussions.

  6. Great post! I often hear the argument to your first point about gene expression not matching protein levels and how microarrays will become extinct now that mass-spec is improving. You concisely point out why this is not necessarily true. Thanks.

  7. “Needless to say that most of these problems can be overcome by using really large sample numbers.”

    So how is your suggestion on the sample number?

    50? or 100?

  8. Woody, this is a very important question. Unfortunately, it cannot be answered in a general way. The number of required samples depends heavily on 1) the strength of the effect you want to detect and 2) the inherent noise levels of your system.

    If you want to study the influence of LPS stimulation on responsive cell lines, you could probably see robust regulation effects with 3-5 samples. Many genes will be regulated >50-fold and you won’t need sophisticated statistics to notice this effect.
    On the other hand, if you use human biopsy samples (high sample variability) and want to study potential differences between people susceptible or resistant to condition X (small expression differences, if any), you will probably need hundreds of samples to really be sure if there is an effect.
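    As a rough illustration of why the required numbers scale this way (my own back-of-the-envelope sketch, not from the thread; the function name and defaults are made up): under a normal approximation for a two-group comparison, the samples needed per group grow with the inverse square of the standardized effect size, and tightening alpha for multiple testing pushes the number up further.

```python
# Back-of-the-envelope sample-size estimate for a two-group comparison
# (normal approximation; an illustrative sketch, not a power tool).
import math
from statistics import NormalDist

def samples_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate n per group to detect a standardized effect
    (mean difference / SD) at the given alpha and power."""
    z = NormalDist().inv_cdf
    n = 2 * (z(1 - alpha / 2) + z(power)) ** 2 / effect_size ** 2
    return math.ceil(n)

# Strong, clean effect (e.g. LPS-stimulated cell lines): a handful per group.
print(samples_per_group(3.0))

# Subtle effect in noisy human biopsies: hundreds per group.
print(samples_per_group(0.2))

# Same subtle effect with alpha Bonferroni-corrected for 20,000 genes:
print(samples_per_group(0.2, alpha=0.05 / 20000))
```

    The exact numbers depend on assumptions you rarely know in advance (the true effect size and variance), which is why the honest answer is a range, not a single figure.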

  9. Really Very nice article….

