Sorry for the delay in starting this mini-series. My job has kept me busier than I expected, and what I am trying to do here requires a lot of free time, which is currently a very limited resource.
Just a brief reminder: in a recent post, I discussed a series of papers (pre-)published in Science, which link the susceptibility to myocardial disease and type 2 diabetes to a common region on human chromosome 9p21.3. Francis Collins has likened this region to the “soul of the genome”, and it is this soul that I am going to search for. The effort will be documented here on this blog, and you are all invited to advise or correct me by using the comment function. In this first part of the series, I will look in somewhat more detail at what the papers say about the limits of the critical region.
Let us start with the two papers on coronary heart disease (CHD), more specifically with the McPherson et al. paper. I have selected this paper for a more detailed description because it is well written (i.e., even I can understand it) and it gives a lot of details. As in the other studies, the authors (led by a Canadian/Texan consortium) start with a genome-wide association study, using 100,000 SNPs and a case-control design with slightly more than 300 subjects in either group. It is interesting to note that patients with diabetes or other risk factors for heart disease were excluded. After a number of steps that I will skip here, the authors came up with the two SNPs that showed the strongest association (rs10757274 and rs2383206), which lie within 20kb of each other.
To find out whether these associations are trustworthy, the authors looked at the same SNPs in patients from several other studies and found the SNPs to be consistently associated with CHD. Subsequently, the authors performed a fine-mapping of the region surrounding the two SNPs on 9p21.3. They looked at additional SNPs at ~5kb intervals in a region extending 175kb up- and downstream. Eight additional SNPs were found to be significantly associated with CHD; they all lie within a 58kb interval that also contains the two initial SNPs. This 58kb critical region (in sequence coordinates: 22,062,301 to 22,120,389) is flanked on both sides by 50kb regions where none of the tested SNPs shows any association with CHD.
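As a quick sanity check on these numbers, here is a minimal Python sketch that recomputes the size of the critical region from the coordinates quoted above and derives the two 50kb flanks. The `Interval` class and its method names are my own invention for illustration, and I am assuming 1-based inclusive coordinates and flanks that start immediately next to the critical region; the paper may define either differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed interval on one chromosome (assumed 1-based, inclusive)."""
    start: int
    end: int

    def length(self) -> int:
        # Both end points count, hence the +1.
        return self.end - self.start + 1

    def contains(self, pos: int) -> bool:
        return self.start <= pos <= self.end

    def flanks(self, size: int) -> tuple["Interval", "Interval"]:
        # Assumption: flanks are directly adjacent to the interval.
        left = Interval(self.start - size, self.start - 1)
        right = Interval(self.end + 1, self.end + size)
        return left, right


# Coordinates as quoted from the paper (NCBI-36 assembly).
CRITICAL_REGION = Interval(22_062_301, 22_120_389)
LEFT_FLANK, RIGHT_FLANK = CRITICAL_REGION.flanks(50_000)

if __name__ == "__main__":
    print(f"critical region: {CRITICAL_REGION.length() / 1000:.1f} kb")
    print(f"left flank:  {LEFT_FLANK.start:,} to {LEFT_FLANK.end:,}")
    print(f"right flank: {RIGHT_FLANK.start:,} to {RIGHT_FLANK.end:,}")
```

With positions for individual SNPs looked up in the same assembly, `CRITICAL_REGION.contains(position)` would tell whether a given marker falls inside the region or in one of the non-associated flanks.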
Figure 1 shows the 58kb critical region for CHD susceptibility in the Ensembl genome browser, using the NCBI-36 assembly. The critical region is devoid of established genes, contains a small number of spliced ESTs, and is relatively close to the genes for the cell cycle inhibitors CDKN2A and CDKN2B.
Finally, the authors have performed a re-sequencing of the 58kb critical region for two carriers of the high-risk allele and one carrier of the low-risk allele. Overall, 66 polymorphisms (mostly SNPs) were detected, of which 35 were specific to the high-risk allele. These variants are tabulated in a supplementary table of the paper and might be useful at the later stages of my planned analysis.
The second paper on this subject comes from deCode (Iceland) with some US collaborators and describes a quite similar procedure. In a genome-wide screen, a significant association was found for three closely linked SNPs, rs1333040, rs2383207 and rs10116277, which could be confirmed using data from other studies. All three SNPs lie in the 58kb critical region mentioned above, but the authors of the second paper emphasize that the three SNPs (plus some others coming from a fine-mapping exercise) are part of a 190kb linkage block, as defined by the HapMap project. In contrast to the gene-less 58kb critical region, this 190kb linkage block contains the CDKN2A/2B genes.
What can we learn from these two papers? First, there seems to be solid evidence for an association of CHD risk with this particular region on chromosome 9, as the association has been seen by different author consortia in several different patient groups. Both papers also agree that the markers with the strongest association lie in a 58kb critical region right (centromeric) of the CDKN2 genes. However, there seem to be slight differences in the interpretation of the haplotype structure of the interesting region. Both papers show figures derived from HapMap data, which I would love to reproduce here (but don’t dare to, because these are copyrighted closed-access papers). I will try to describe the key features: the version in the deCode paper shows a pronounced linkage block of 190kb that spans both the CDKN2 genes and the strongly associated markers. In the analogous figure of the Texan-Canadian paper, this block is also visible, but seems to consist of two sub-blocks: one containing the CDKN2 genes, the other containing the CHD-associated markers. Consequently, the interpretation of the results also differs with regard to the possible inclusion of CDKN2 in the critical region. Either way, a re-sequencing of the coding and non-coding parts of the CDKN2 genes did not reveal any significant correlation with CHD risk.
In addition to providing the linkage data, both papers also describe different attempts to look for interesting features within the critical region. This will be a subject of my later posts, once I have begun to do some analysis on my own. Tomorrow (I hope) will bring the 2nd post in the series, dealing with the paper data on association with type 2 diabetes. After that, I will discuss the sequence features in the CHD and diabetes critical regions and start to do some real sequence analysis. Who knows, maybe I can find something interesting that has been overlooked by the other groups? After reading the papers and seeing what these groups actually did, I am afraid that my chances are slim.