Posted by: Kay at Suicyte | November 20, 2007

Smallest primate ever discovered! (updated)

Nobody who follows the scientific literature can possibly have missed the reports on Craig Venter’s metagenomics effort, trawling the oceans on his “Sorcerer II” yacht in search of new DNA sequences. Unlike some of my fellow bloggers who know their metagenome, I have never been quite convinced that metagenomics is good for anything (except maybe for yachting on grant money).

This notion changed today, when at around 2 pm CET, people in my group made an earth-shattering discovery while analysing a portion of the global ocean sampling data. It appears that the book on primate biology has to be rewritten, or in other words, by overturning a century-old dogma, primate science is experiencing a veritable paradigm shift. To put our epochal discovery into context: until recently, the smallest known primate was probably the pygmy mouse lemur (Microcebus myoxinus), maybe this guy here. With a weight of 30-70 grams and a length of 10-14 centimeters, these primates are firmly anchored within the macroscopic realm. However, as we have learned now, there is at least one more species of primate, living in the oceans, which has eluded us due to its small size.

By using the most advanced computing hardware (HP dc7700) and software technology (BLAST), we were able to show that at least 1090 different sequences from various oceanic regions contain Alu elements, which are a hallmark of the primate genome. Our first suspicion was that the trawling device inadvertently sampled a scuba diver, who did not manage to escape the subsequent homogenization step. However, this theory had to be abandoned, as i) the ocean sample sequences did not yield a perfect match to the human genome, ii) there was no homogenization step, and iii) the sampling procedure described in the original publication was selective for the size range of 0.1–0.8 μm, which does not accommodate divers.
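For anyone who wants to repeat the count, here is a minimal sketch of the bookkeeping step, assuming the ocean reads were searched against Alu consensus sequences and the hits were saved in BLAST's standard 12-column tabular format (`-outfmt 6`). The function name, the read IDs, the example rows, and both cutoffs are hypothetical and chosen only for illustration; they are not the values used for the number quoted above.

```python
# Column order of BLAST's standard 12-column tabular output (-outfmt 6).
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def reads_with_alu_hits(tabular_text, max_evalue=1e-10, min_length=100):
    """Return the set of query read IDs with at least one hit passing both cutoffs.

    The default cutoffs are illustrative assumptions, not values from the post.
    """
    hits = set()
    for line in tabular_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        row = dict(zip(BLAST_COLUMNS, line.split("\t")))
        # Keep the read if the alignment is both significant and long enough.
        if float(row["evalue"]) <= max_evalue and int(row["length"]) >= min_length:
            hits.add(row["qseqid"])
    return hits


# Made-up example rows: read_001 has two good hits, read_002 only a short,
# insignificant one, read_003 a single good hit.
example = "\n".join([
    "read_001\tAluY\t92.5\t280\t21\t0\t10\t289\t1\t280\t1e-110\t400",
    "read_001\tAluSx\t90.0\t275\t27\t1\t12\t286\t3\t277\t1e-100\t370",
    "read_002\tAluY\t88.0\t40\t5\t0\t1\t40\t100\t139\t0.5\t30",
    "read_003\tAluJo\t85.0\t150\t22\t1\t5\t154\t20\t169\t2e-40\t160",
])

print(sorted(reads_with_alu_hits(example)))  # ['read_001', 'read_003']
```

Collecting read IDs in a set means a read that hits several Alu subfamilies is counted only once, which is what "different sequences" requires.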

Thus, we are currently pursuing alternative explanations, the most likely of which is the existence of a hitherto unnoticed ocean-dwelling micro-primate. While our analyses are still ongoing, the results obtained so far fully support this theory. Not only are the ocean-derived Alu elements and their flanking DNA sequences distinct from the human genome, they also do not match the genome of J.C. Venter, J.D. Watson, or any other sequenced model primate. We are currently trying to obtain more genomic information on this elusive primate, which might give us hints about its evolutionary ancestry and its planktonic habitat. A first important observation in that respect is the apparent overabundance of DNA fragments derived from odorant receptors. This enrichment suggests that the micro-primate makes thorough use of its olfactory sense, most likely for finding nutrients and evading predators.

There is little doubt that the bioinformatical discovery of the micro-primate will have as far-reaching consequences as the ant populations living in outer space, which were identified in 1999 through the discovery of dense clumps of formic acid in interstellar molecular clouds.


I have adjusted some numbers and links in the main article. For those interested in further pursuing this line of research, here is a list of the Alu-containing ocean samples. Unfortunately, we don’t have enough time to work on this, as we are already preparing our next scoop: a comprehensive metagenomic analysis of the ALH84001 meteorite.



  1. 🙂

    I also remain to be convinced regarding the value of metagenomes. My favourite example is that the Sargasso dataset contains a lot of short ORFs annotated as “hypothetical proteins”. They are in fact just the result of translating 23S genes in 6 frames – for some reason 16S, but not 23S, were annotated. If metagenomes mean junk in the public databases, I’d rather do without.

    On the other hand, there are some quite nice studies of smaller communities that are better defined ecologically than “the ocean”, such as “stuff up the nose”.

  2. I have one comment on your words: “except maybe for yachting on grant money”. As far as I know, Venter’s recent expedition was funded primarily by private money – the only public contributor was the US Department of Energy. I believe this needs to be clearly said, since I heard people complaining about NIH (?!) spending money on Venter’s yacht :).

  3. I would like to point you to a short story I wrote on a similar topic. It was written for a class I took from Carl Djerassi (by the way, the group of us who took the class got a short story published in Nature … I think the first fiction there that was not portrayed as fact).

    Anyway – here is a link to the story about some contaminants leading to mistakes in studies of the human-Neanderthal split.

    As for the metagenomics being worthwhile or not. I accept your skepticism. But I think you are missing the big picture — random samples of the genomes of microbes from various ecosystems have enormous potential value. This is why the NAS, NIH, EU, and others are all pumping money into the use of metagenomics. Personally, I think metagenomics is most useful in simple communities right now, but as sequencing gets cheaper, we will be able to get better coverage of complex communities too.

    The notion that metagenomics has uses does not mean that all metagenomics-based studies are perfect. But let’s not kid ourselves into thinking that there is not also a ton of junk coming from standard genome projects. Certainly the hype associated with metagenomics allows crappier work to be published … but that will change as the hype fades. And when that happens I would bet that most environmental microbiologists would not be caught dead without using some amount of metagenomic data.

  4. […] This is super weird. DNA samples taken from the sea show signatures (the Alu elements) of primate. This is, of course, a joke. […]

  5. Wow, some important blog must have linked here. The number of page visits yesterday is close to what usually takes a month or two.
    I would like to respond to some of the comments.
    Neil and Jonathan: Agreed, my sweeping statement that metagenomics might not be good for anything was a little off the mark. As I have detailed in a previous post, I prefer large-scale efforts that can reach a point of completion (or at least saturation). Ocean sequencing is probably not one of those. In general, I tend to be sceptical about anything overhyped. However, being a scientist, I am ready to be convinced by good data.

    With regard to quality of the data: No mistake, other large-scale projects have a comparable amount of contamination. This is something we have to live with.

    freesci: I am aware that Venter financed the Sorcerer II trip mainly from private money. I just couldn’t resist …

    Hsin-Hao: while my posting is obviously a joke, the Alu elements in the ocean metagenome are real. It appears that (at least) most of them are human contaminants, as they can be found in human genome sequences. For some reason, many of these sequences did not make it into the final assembly. At least they are not found in Ensembl.

  6. Brilliant piece of writing; I love the “inadvertently sampled a scuba diver” line.

  7. These are clearly sea monkeys (

  8. I will pass on the debate over whether metagenomics is worthwhile or not, but I can state that part of the GOS was paid for through grant money. However, much of this grant money came from a foundation and not the federal government.

  9. I would like to point out a few things relating to large scale sequencing projects and “contamination”

    First, it remains to be seen what these elements are in the GOS data. I looked at some of them and the hits to Alu-like sequences are strong, but not necessarily proof that there is primate DNA in the database. One thing to remember: this is a MASSIVE amount of sequence data. One needs to adjust one’s expectations as to what one might find in the data based on the size of the database.

    Second, if these really are from some primate, then most likely they are in fact contamination of the database. I should note that such contamination is VERY common in large scale sequencing projects. Contamination can happen anywhere along the pipeline of producing sequence data, from collecting samples (e.g., the scuba diver in the sampler), to making libraries, to tracking lanes, to releasing data.

    The difficulty in avoiding contamination is one of the reasons I (and others) have argued that if you can, you want to finish genomes wherever possible. That was an argument we made in regard to sequencing genomes of cultured microbes. For metagenomics, we do not have the luxury of finishing everything. So this means that one needs to interpret with caution any findings that are based solely on analysis of the sequences themselves.

    I will forward the blog link to some of the Venter Institute folks (I worked on the project). Maybe they can look at the reads and see if they can figure out what is going on.

    Also – in regard to the funding, it is there in the papers. For example, from the Rusch et al. GOS paper:

    “We acknowledge the Department of Energy (DOE), Office of Science, and Office of Biological and Environmental Research (DE-FG02-02ER63453), the Gordon and Betty Moore Foundation, the Discovery Channel, and the J. Craig Venter Science Foundation for funding to undertake this study.”

    I believe, but am not certain, that the Moore Foundation funded the sequencing and the analysis of much of the new data, that the DOE funded some of the initial sequencing and analysis, and that the subtly named J. Craig Venter Science Foundation paid for the boat related expenses. But I do not know that for sure. I am not sure what the Discovery Channel funding was for.

  10. […] Smallest primate ever discovered! (updated) Nobody who follows the scientific literature can possibly have missed the reports on Craig Venter’s metagenomics […] […]

  11. “This is madness”


    I really couldn’t resist. Intriguing article, a little out of my comfort zone knowledge-wise, but fascinating.

  12. Hilarious! I enjoyed this post, but of course it isn’t surprising that there are lots of contaminants in the ocean survey data. In fact, one of their main findings in the original paper (on the “Sargasso Sea sample”) was later shown to be a bacterial contaminant, which is why it was present in surprisingly high quantity. The disproof appeared in the journal PLoS ONE.
    I also remember a few years ago, when I was working on the human genome project, “discovering” about a dozen malaria genes (from the parasite Plasmodium falciparum) in the human genome! Wow, lateral gene transfer from a parasite to the human host! After a few minutes envisioning a major paper describing my finding, I calmed down and realized these genes were contaminants. After I notified the sequencing center (the Sanger Institute, in this case), the problem was quickly corrected.
    But “microscopic ocean-dwelling monkeys” really does have a nice ring to it, doesn’t it?

  13. It’s obviously Homo sapiens miniorientalis, previously known from a fossil from Japan. See:,5500,1169713,00.html

  14. Yes! I had forgotten the “Reports of the Okamura Fossil Laboratory”. But seriously, what could these sequences be? If it was just human contamination (the obvious explanation), wouldn’t they match sequenced Alu repeats? Could Alu repeats maybe not be limited to primates? Or maybe Alu repeats vary more than we thought among individuals. When I used to work in proteomics, human keratin (from skin flakes) was the primary component of many of our samples. We always wondered if we could maybe identify the person(s) who had handled the sample based on the keratin fragments seen.

  15. Thanks for all your comments, both the funny and the helpful ones!
    Jonathan: I am relatively convinced that most, if not all, of the Alu sequences are human. I did not check the majority of them, but for a few examples I managed to find perfect database matches. In some of the cases, I do not find a perfect match in the Ensembl version of the human genome (but I did not investigate the causes). However, I do find perfect matches in human genome contigs when BLASTing against the NCBI genome contig database. Maybe the sequences with the hits did not get into the final assembly? But again, I did not follow this issue thoroughly. BTW, I did my BLAST searches not with the Alu repeats themselves, but rather with the non-Alu portion of the sequences (to avoid browsing through tons of sequences).
    Maybe someone should scan the Venterome specifically, to see if JCV did manage to sneak his private microbiome into the ocean project.

  16. Reminds me of the ‘hack till it works’ scenario. Maybe the Alu sequence is the “hello world” program of this language.

