
Visualization of internal repeats in proteins (or DNA)

There are a number of protein families that contain internal repeats (TPR, Armadillo, ankyrin, etc.). I’m very interested in many of them, for reasons I will explain in another post. Assessing the arrangement of these repeats is straightforward in the majority of cases: most of them occur next to each other, with few or no insertions between them (finding them in the first place is a completely different story). However, in some proteins the internal repeats are separated by other domains or repeats, which can result in a real mess (or, in scientific language, a mosaic-like architecture).

A couple of months ago, when I was looking for a visualization method that would give me a quick overview of the internal structure of such proteins, I stumbled across The Shape of Song, a visualization method developed by Martin Wattenberg, a researcher at IBM. It fit my requirements, so I implemented it with some help from Processing (and later added it to a protein analysis server that will hopefully be published next month). The resulting visualization is below:

Internal repeats in a protein

Repeats are colored according to repeat type and connected by arcs according to repeat family. In terms of the SCOP (Structural Classification of Proteins) hierarchy, colors represent class, while arcs connect superfamilies. The longer and more complicated the analysed sequence, the more useful this approach seems to be; for short proteins, typical domain “bubble” diagrams would work better.
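A minimal sketch of this kind of linear arc layout, assuming the repeats have already been detected and each one comes with a start, end, type and family. The `Repeat` record, the palette, and the family names in the demo are hypothetical, and this is plain Python writing SVG rather than the Processing implementation mentioned above. Repeats are drawn as colored boxes on the sequence axis (color by type), and semicircular arcs join repeats of the same family.

```python
from collections import defaultdict
from typing import List, NamedTuple, Tuple


class Repeat(NamedTuple):
    start: int   # first residue of the repeat (1-based)
    end: int     # last residue of the repeat (1-based, inclusive)
    rtype: str   # repeat type: decides the colour
    family: str  # repeat family: repeats sharing a family are joined by arcs


# Hypothetical palette; any type -> colour mapping would do.
PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd", "#8c564b"]


def midpoint(rep: Repeat, scale: float) -> float:
    """x coordinate of the centre of a repeat (residue k spans [k-1, k])."""
    return (rep.start - 1 + rep.end) / 2 * scale


def family_arcs(repeats: List[Repeat]) -> List[Tuple[int, int]]:
    """Index pairs of consecutive same-family repeats, ordered along the sequence."""
    by_family = defaultdict(list)
    for idx, rep in enumerate(repeats):
        by_family[rep.family].append(idx)
    pairs = []
    for members in by_family.values():
        # Order family members by their centre, then link each one to the next.
        members.sort(key=lambda k: repeats[k].start + repeats[k].end)
        pairs.extend(zip(members, members[1:]))
    return pairs


def render_svg(repeats: List[Repeat], seq_len: int, width: int = 800,
               bar_h: int = 12, margin: int = 10) -> str:
    """Draw repeats as coloured boxes on a line, with semicircular arcs above."""
    scale = width / seq_len
    types = sorted({r.rtype for r in repeats})
    colour = {t: PALETTE[i % len(PALETTE)] for i, t in enumerate(types)}

    # Geometry of every arc: (left x, right x, colour of the repeat type).
    arcs = []
    for i, j in family_arcs(repeats):
        x1, x2 = sorted((midpoint(repeats[i], scale), midpoint(repeats[j], scale)))
        arcs.append((x1, x2, colour[repeats[i].rtype]))

    # The widest arc (radius = half its span) decides where the axis sits.
    max_r = max(((x2 - x1) / 2 for x1, x2, _ in arcs), default=0.0)
    baseline = max_r + margin
    height = baseline + bar_h + margin

    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" '
             f'width="{width}" height="{height:.0f}">']
    # Arcs first, so that the repeat boxes are painted on top of them.
    for x1, x2, col in arcs:
        r = (x2 - x1) / 2
        # sweep-flag=1 from the left end to the right end bends the arc upwards
        parts.append(f'<path d="M {x1:.1f} {baseline:.1f} A {r:.1f} {r:.1f} 0 0 1 '
                     f'{x2:.1f} {baseline:.1f}" fill="none" stroke="{col}" '
                     f'stroke-opacity="0.6"/>')
    for rep in repeats:
        x = (rep.start - 1) * scale
        w = (rep.end - rep.start + 1) * scale
        parts.append(f'<rect x="{x:.1f}" y="{baseline:.1f}" width="{w:.1f}" '
                     f'height="{bar_h}" fill="{colour[rep.rtype]}"/>')
    parts.append("</svg>")
    return "\n".join(parts)


if __name__ == "__main__":
    # Hypothetical mosaic protein: two TPR families interleaved with ankyrin repeats.
    demo = [
        Repeat(10, 43, "TPR", "TPR-a"),
        Repeat(60, 92, "ANK", "ANK-a"),
        Repeat(120, 153, "TPR", "TPR-a"),
        Repeat(160, 193, "TPR", "TPR-b"),
        Repeat(200, 232, "ANK", "ANK-a"),
        Repeat(260, 293, "TPR", "TPR-b"),
    ]
    print(render_svg(demo, seq_len=320))
```

Redirecting the printed output to a `.svg` file and opening it in a browser shows the picture. This sketch links only consecutive members of each family, which keeps the number of arcs below the number of repeats; connecting every pair within a family is the other obvious choice, but it gets cluttered quickly for long mosaic proteins.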

People who work with genomic sequences may notice the similarity of this approach to Circos, developed by Martin Krzywinski (whose work I really admire, especially on HDTR). The idea behind both is essentially the same, but I had never thought about straightening that circle until I saw The Shape of Song. My thinking is sometimes dramatically schematic…

