Posted by: Kay at Suicyte | November 8, 2007

A bad day in structure space

Frequent readers of my blog will know that I am more into sequences than into structure. I don’t want to go into details why this is so, maybe another time. Here is the one-sentence version: My goal is the prediction of protein function, and – contrary to what is being reiterated in the literature – I am convinced that this is better done directly from sequence, paying close attention to evolution. Going via structure is a detour, if not a dead end. Ok, two sentences.

Occasionally, though, I have to do some structure analysis. No mistake, I am using structures all the time. Nothing beats a structural superposition if you have to align two extremely divergent sequences. If there is a structure. I also use structures (if available) to make plausibility checks of my sequence-based predictions. And, from time to time, I have to deal with structures for different reasons.

Today was such a day. I was analyzing a sequence family (nothing exciting, it was contract work for a customer), with one member having a high-resolution X-ray structure. For one aspect of my analysis, I decided it was a good idea to select residues that are highly conserved but whose side chains point to the outside of the structure. The idea was that in real life these conserved surface residues might contact a potential interaction partner that is not present in the structure. This task looks easy enough, in particular as I have done similar things before. I could not remember how I had done it the last time, but at first I did not expect this to be a problem. There must be hundreds of programs out there that take a PDB file and give you a list of residues indicating their degree of surface exposure.

I turned to Google, but despite two hours spent at the computer, and using all the tricks in the book, the only promising hit I could find was a program called naccess. Plus lots of papers in NAR and Bioinformatics, either pointing to naccess or to some web pages talking about Error 404. This latter trend must have been blogged about before: it is great that journal articles describe useful software and web services, and that they also provide links. But why are (a perceived) 90% of these links broken after 2-3 years?

Anyway, if everybody is using naccess, why shouldn’t I? Well, there are a number of reasons. The first one is a note on the home page of naccess, saying “Industrial user/Profit-makers, please read this”. I hate it if a web page starts like this. It means that you have to pay for using the program or service. What’s more, those people (almost) never say “obtaining this software will cost you x $”. They always ask you to get in contact with some IP protection department, which involves talking to lawyers and signing some contract, which (of course) I am not allowed to do without talking to our lawyers first, who in turn find some conditions in the contract utterly unacceptable, while the other lawyers insist that this clause is essential, and so on, ad nauseam. If, after some discussion, we get to a point where the amount of money is actually mentioned, it typically exceeds my group’s annual budget (including hardware!) considerably. Ok, I must admit that I did not even try to buy naccess, so I have no idea about the special conditions applying here, but I do have related experience from previous occasions. Why do these departments always assume that people working for a company are swimming in money? Maybe some do, but I don’t.

This is not all. Even honorable academics (a.k.a. Jedi researchers) have a hard time getting their hands on naccess. As the download page says:

You will receive a compressed tar file via ftp containing everything you need. The tar file has been encrypted. You need to get a decryption key to decrypt the file. See later.

We ask users to complete a a short Confidentiality agreement. Please print it out, sign it and return it to us via normal mail (Not email or fax please!).

You see, very convenient. When I read this page, it reminded me of other such examples from ancient times, before I succumbed to the dark side of science. It occurred to me that this kind of restriction only seems to be used for software dealing with structure analysis, never with sequence analysis. Did you have to sign a contract before using BLAST? FASTA? Do you have to pay $20,000 for using ClustalW (or T-Coffee, MUSCLE or MAFFT)? Is the use of Pfam or InterPro restricted to academics? Do you have to send something by paper mail to Sean Eddy before using HMMER? Of course not, what a ridiculous idea!

Structure analysis seems to be a different culture. Lots of restrictions wherever you look. There are free structure viewers (check the interesting posting on “Freelancing Science”), but if you read the fine print, almost all viewers capable of producing publication-quality output may not be used by company employees. (Ok, they may be used, but the $$$ and other conditions amount to them being useless for me). There are some examples of viewers that are cheap or free for people like me; these are often scaled-down versions of commercial software. And if you look for structure analysis other than viewing and rotating PDBs on screen, matters look even bleaker.

Let us get back to my original surface exposure problem. After abandoning the idea of using naccess, I found one (probably) quick & dirty solution that worked for my one-off application: I turned to spdbv (DeepView), one of the PDB viewers I am allowed to use. This might not be the nicest program to look at (strange user interface, poor fonts, some Linux problems), but it is great for structure superpositions. It also turned out to have a feature to select residues on the basis of their % surface exposure. This was exactly what I needed, although I did not find a way of getting a text output of this information. Anyway, a little manual work – case closed. I should add that – after finishing my work – I did find a fine web server that does exactly what I wanted. It is called GetArea, and I have no idea why I didn’t see it in my previous searches. Maybe I should work on my Google skills.
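For what it is worth, once per-residue exposure values are available as text (from whatever tool is used), the selection step itself is only a few lines. The following is a minimal sketch, not the procedure actually used above: the function name, the dictionary layout, and both threshold values are assumptions made purely for illustration, and the numbers in the usage example are invented.

```python
def conserved_surface_residues(exposure, conservation,
                               min_exposure=30.0, min_conservation=0.8):
    """Return residue numbers that are both conserved and surface exposed.

    exposure:      dict mapping residue number -> relative surface exposure in %
                   (hypothetical input, e.g. parsed from a tool's text output)
    conservation:  dict mapping residue number -> conservation score in [0, 1]
                   (hypothetical input, e.g. derived from a family alignment)
    Both thresholds are arbitrary illustrative defaults, not recommendations.
    """
    selected = []
    for res_id in sorted(exposure):
        # Skip residues that have no conservation score at all.
        if res_id not in conservation:
            continue
        exposed = exposure[res_id] >= min_exposure
        conserved = conservation[res_id] >= min_conservation
        if exposed and conserved:
            selected.append(res_id)
    return selected


if __name__ == "__main__":
    # Made-up example values, for illustration only.
    exposure = {10: 55.0, 11: 5.0, 12: 80.0, 13: 40.0}
    conservation = {10: 0.95, 11: 0.99, 12: 0.30, 14: 0.90}
    print(conserved_surface_residues(exposure, conservation))  # prints [10]
```

Residue 11 is conserved but buried, residue 12 is exposed but variable, and residue 13 has no conservation score, so only residue 10 survives the filter. Sensible cutoffs depend on the protein family and on how exposure is defined.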

Here are a few more links on bioinformatics software availability:

  • This is the position of the ISCB (via Deepak).
  • Some interesting statistics on software availability, found on Flags and Lollipops.
  • And finally, a missing link. A few months ago there was an interesting posting on this topic somewhere, followed by a lively discussion (with contributions by Sean Eddy and a few other notable figures). If I am not mistaken, I also posted a comment, but I cannot find the posting anymore. Is it possible to search for blog comments?


  1. What’s even worse than forcing corporate users to buy tools (and I agree that published tools should be free to use for everyone), is that many university authors don’t get the fact that researchers at non-profit research institutes aren’t “corporate”. I’m supported by grants just like university people are, but because I work at the JCVI (Craig Venter’s institute, not his company) people assume I’m made out of money or something.

  2. It’s a little strange that outside of PyMol, which comes with its own baggage, and QuteMol, there haven’t been too many good structure visualization programs that have come out over the years.

    One possibility, other than cultural issues (which are almost certainly there): good OpenGL programming and molecular rendering are a little more difficult to visualize and program than your usual sequence-based program.

  3. Academics are sometimes also affected, when some automated registration system rejects emails from non-edu domains (like Max-Planck Institutes). One of my previous employers, a non-profit research institute, was desperately sticking to its old .edu domain for that reason alone.

    Deepak, I’m not a programmer, but I don’t think that 3D rendering is more difficult than anything else – people are writing software for that (only static images) even in Perl, as an exercise. I think it’s a matter of using components that are protected. If I’m right, one such component was APBS (calculation of electrostatic properties), which only very recently moved from a not-for-profit license to the GPL.

    Kay, maybe Jmol will do? It’s open source, but I’m not sure about the features…

  4. I think you’re absolutely right about the different cultures. Many structural informaticists come from a chemistry background, and chemists don’t have the same issues with charging for their work that afflict biologists. CAS, for example, has never been free. Papers published by the ACS are only available to members. The software follows the same course.

    On the other hand, the problem of abandonware and dead links isn’t going to go away. Maintaining software and updating web sites has a cost. Until it’s clear whether those costs will be recovered by a “fee for service” type of model or by government funding (in the case of BLAST), the problem will persist.

  5. Jonathan, maybe it wasn’t such a great idea to rename the TIGR. For several reasons, I guess.

    Deepak, isn’t Pymol also restricted to academics? QuteMol might be great, but won’t run on my computer (seems to require a particular graphics card). For figure making, I found Yasara to be quite useful. It has a very strange user interface, but it is fast, even on my dated computer, and makes nice graphics. See e.g. the UBA figure in my post on the Lake Garda ubiquitin meeting.

    Deepak and freesci, I cannot comment on the relative difficulties of OpenGL programming, I for sure am not able to do it.

    Freesci, I might try Jmol. I don’t think it will help me too much, as in my experience Java applications are very slow on my Linux box. On my Windows PC, I am running spdbv, Yasara and Weblab viewer. These programs do most of the things that I need (except for this surface calculation bit I mentioned in my post).

  6. Sandra, I agree that there is most likely a difference in culture. But not an insurmountable one. I come from a chemical background myself, but never had any problem putting my own software into the public domain.

    Concerning ‘abandonware’ (nice word, your creation?), I also don’t expect it to go away. Nevertheless, I wonder if it is really that hard to find a hosting institution for your web applications. Agreed, if you are running a BLAST service with gazillions of simultaneous users and substantial CPU load, it is a different story. By contrast, the servers I had been looking for were low-volume things with (probably) small CPU requirements.

    When I left academia 10 years ago, I had no problems convincing people at my former institute to keep my web services running (they still do) and quite a few other sites run mirror services without me asking for this (clearly helped by the fact that this was public domain software). So, it cannot be that hard. (I should hasten to add that these are rather useless pieces of software, which haven’t seen an update in 12 years. I really cannot recommend their use to anybody living in the 21st century).

  7. Good to see you back Kay 🙂
    I went through a similar experience a couple of months back. It is worse for me, because I am not working for a company, nor am I enrolled at a university. I realised that nothing much has happened since Bernstein’s Rasmol was launched in 1993; it is capable of almost all the basic stuff one needs (yes, there are script junkies out there who can do what you have mentioned here with just Rasmol and some beer), while the binary is less than one MB. For molecular dynamics I could find gromacs, and autodock for docking. I got the overall feeling of an underground black hat hacking culture, protecting some dark secret which outsiders are not allowed to use (unless they pay for services!), and this made me really uncomfortable.
    I do agree with Deepak that visualisation is a big deal, because we hardly see good open games on Linux, nor do we see production-quality graphics software. But as far as I can see, do we really need to visualise so much, even in structural biology? For example, in Kay’s case, all he wants is potential sites, which one can find by some intricate matching and thresholding.
    Cultural issue might be the problem and probably this is the reason that people like Grove saying .

  8. Thanks Kay,

    I can’t take credit for “abandonware.” It comes from a 2004 article in The Scientist (Scientists Abandon their Software, 18(3):47, not open access I’m afraid.)

    I think some of the most popular programs, i.e. BLAST and Cn3D, remain freely available and get updated consistently because the US government invests in them and in their upkeep through government institutions like the NCBI. Other programs, like the Phylip package, are kept alive and updated as a labor of love.

    But as you said, when you click on the links in the NAR database issue, many are dead. Many databases live short lives after the paper is published. Why don’t they get updated or maintained? No incentive? No funds? I really don’t know.

  9. Not sure what the restrictions are on VMD and NAMD but those are what I use for visualization of structures, energy minimization, molecular dynamics, etc. As well as the previously mentioned GROMACS.

  10. Hi! I was surfing and found your blog post… nice! I love your blog. 🙂 Cheers! Sandra. R.

  11. That’s very good to know… thanks
