A number of protein families have internal repeats (TPR, Armadillo, ankyrin, etc.). I’m very interested in many of them, for reasons I will explain in another post. Assessing the arrangement of these repeats is straightforward in the majority of cases: most of them tend to occur next to each other, with few or no insertions between them (finding them in the first place is a completely different story). However, there are proteins where internal repeats are separated by other domains or repeats, which can result in a real mess (or, in scientific language, a mosaic-like architecture). A couple of months ago, when I was looking for a visualization method that would give me a quick overview of the internal structure of such proteins, I stumbled across The Shape of Song, a visualization method developed by Martin Wattenberg, a researcher at IBM. It fitted my requirements, so I implemented it with some help from Processing (and later added it to a protein analysis server that has a chance of being published next month). The resulting visualization is below:

Repeats are colored according to repeat type and connected according to repeat family. If you think about it in terms of the SCOP (Structural Classification of Proteins) hierarchy, colors represent class, while arcs connect superfamilies. The longer and more complicated the analysed sequence is, the more useful this approach seems to be; for short proteins, typical domain bubbles would work better.
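For the curious, here is a minimal sketch of the arc-building step in Python (not the Processing code I actually used). It assumes each repeat is already annotated with residue coordinates, a type and a family, and that arcs join each repeat to the next one of the same family. The names `Repeat`, `Arc` and `build_arcs` and the example coordinates are all made up for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Repeat(NamedTuple):
    start: int    # first residue of the repeat (inclusive)
    end: int      # last residue of the repeat (inclusive)
    rtype: str    # repeat type, would decide the fill color
    family: str   # repeat family, decides which repeats get connected


class Arc(NamedTuple):
    left: float    # midpoint of the earlier repeat along the sequence
    right: float   # midpoint of the later repeat
    radius: float  # half the distance between the two midpoints
    family: str


def build_arcs(repeats):
    """Connect every repeat to the next repeat of the same family."""
    by_family = defaultdict(list)
    for rep in sorted(repeats, key=lambda r: r.start):
        by_family[rep.family].append(rep)

    arcs = []
    for family, members in by_family.items():
        # Pair consecutive members: (1st, 2nd), (2nd, 3rd), ...
        for a, b in zip(members, members[1:]):
            left = (a.start + a.end) / 2
            right = (b.start + b.end) / 2
            arcs.append(Arc(left, right, (right - left) / 2, family))
    return sorted(arcs, key=lambda arc: arc.left)


# Hypothetical annotation of a mosaic-like protein (coordinates are invented).
example = [
    Repeat(10, 43, "type A", "family 1"),
    Repeat(44, 77, "type A", "family 1"),
    Repeat(120, 152, "type B", "family 2"),
    Repeat(200, 233, "type A", "family 1"),
    Repeat(240, 272, "type B", "family 2"),
]

for arc in build_arcs(example):
    print(arc)
```

Each arc can then be drawn as a half-circle above the sequence line, centred halfway between `left` and `right`. Long arcs spanning other repeats are exactly what makes a mosaic-like architecture stand out.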
People who are into genomic sequences may notice the similarity of this approach to Circos, developed by Martin Krzywinski (whose work I really admire, especially on HDTR). The idea behind both is pretty much the same, but I never thought about straightening that circle until I saw The Shape of Song. My thinking is sometimes dramatically schematic…






Thoughts on CASP – Critical assessment of methods of protein structure prediction
I’ve just read the introduction to the supplemental issue of the journal PROTEINS dedicated to the most recent round of the CASP experiment. It describes the progress in protein structure prediction over the last few CASP editions.
The list of advancements includes:
I believe that this was possible thanks to the progress that has been made in the area of sequence homology searches. Finding similarity between two sequences well beyond any reasonable identity threshold is now doable thanks to profile-to-profile comparison, meta-servers (which combine predictions from many different methods) and recent HMM-to-HMM algorithms (comparison of Hidden Markov Models). If you can find a suitable template for your protein, the rest is then much easier, isn’t it?
There are, of course, fields that still need some work. One of these often stirs a lot of discussion: automated assessment of a model’s similarity to the real structure. The current methods have proven their suitability, I definitely agree. However, I hope that at some point protein structure comparison software will refuse to superimpose eight- and ten-stranded beta-barrels, or left- and right-handed coiled coils, with a message: “It doesn’t make sense.”
Posted by Pawel Szczesny on October 10, 2007 in Comments, Papers, Research, Structure prediction
Tags: bioinformatics, casp, Proteins, Research, Structure prediction