Category Archives: Proteins

Bioinformatics is visual analytics (sometimes)

A short description of my research interest is “I do proteins” (I took this phrase from my friend Ana). I try to figure out what a particular protein, protein family, or set of proteins does in a wider context. Usually I start where automated methods have ended: I have all kinds of annotation, so I try to put the data together and form a hypothesis. I recently realized that the process is basically visualizing different kinds of data, or rather looking at the same issue from many different perspectives.

It starts with alignments. Lots of alignments. And they all end up in different forms of visual representation. Sometimes it’s conservation combined with secondary structure prediction (with AlignmentViewer or Jalview):


Sometimes I look for transmembrane beta-barrels (with ProfTMB):


Sometimes I try to find a pattern in hydrophobicity and side-chain size values across the alignment (Aln2Plot):


Afterwards I look for patterns and interesting correlations in domain organization (Pfam, SMART):


Sometimes I map all these findings onto a structure, or onto a model that I build along the way from the data I have found (PyMOL, VMD, Chimera):


I also try to make sense of the genomic context (works for eukaryotic organisms as well – The SEED):


I investigate how the proteins cluster together according to their similarity (CLANS):


And figure out how the protein or the system I’m studying fits into interaction or metabolic networks (Cytoscape, Medusa, STRING, STITCH):


If there’s some additional numerical information, I dump it into analysis software (R, or DiVisa for simpler things):


And I take notes along the way in the form of a mind map (FreeMind; I recently switched to XMind because it can store attachments and images in the mind map file, not just link to them as FreeMind does):

So it turns out that I mainly do visual analytics. I spend a considerable amount of time preparing various representations of biological data, and the rest of the time I look at the pictures. While that’s not something every bioinformatician does, many of my colleagues have their own workflows that also rely heavily on pictures. In some areas it’s more prominent than in others, but the fact is that pictures are everywhere.

There are two reasons I use a manual workflow with a lot of looking at intermediate results: either I work with weak signals (for example, sometimes I need to run BLAST at an E-value of 1000) or I need to understand the system I study deeply. Making connections between two seemingly unrelated biological entities requires wrapping one’s brain around the problem and… lots of looking at it.

And here comes the frustration. I counted that I use more than twenty (!) different programs for visualization. Even though I enjoy a monitor setup 4500 pixels wide, which is almost enough to put all that data on screen, the main issue is that the software isn’t connected. AlignmentViewer cannot adjust its display automatically based on the domain I’m looking at or the network node I’m investigating; I need to do it myself. Of course I can couple alignments and structure in Jalview, Chimera or VMD, but I don’t find such a solution usable in the long run. To have the best of all worlds, I need to juggle all these applications.

For some time I’ve been longing for a generic visualization platform that can show 2D and 3D data within a single environment, so I follow the development of the SecondLife visualization environment and the Croquet/Cobalt initiatives. While these don’t look very exciting right now, I hope they will provide a common platform for different visualization methods (and, of course, a visual collaboration environment).

But to be realistic, visual analytics in biology is not going to become mainstream. It’s far more efficient to improve algorithms for multidimensional data analysis than to spend more time looking at pictures. I have already had a few situations where I could see some weak signal, and within a year or two it became obvious. But I’m still going to enjoy scientific visualization. I came to science for aesthetic reasons after all. 🙂


Structure of usher pore is available

Structure of usher pore

Some time ago I posted breaking news about the solved structure of the usher pore. A few days ago it was deposited in the PDB as 2VQI (the publication appeared in Cell, here’s the abstract). The structure is a beautiful dimer (see above) of 24-stranded beta-barrels, the first of its kind. The paper also contains structures of the whole complex reconstructed from cryo-EM data.

Interestingly, while the structure of the native dimer is symmetrical, the function of its units is not. Both of the twinned pores are involved in alternating recruitment of chaperone:pilus-subunit complexes, but only one actually transports pilus subunits out. Overall, given the large number of detailed studies on the mechanistic properties of pilus transport and formation, this is the best understood translocation process at a structural level.

Read the paper and draw your own conclusions, but for me it changes the way of thinking about protein translocation in bacteria. We have learnt a lot about bacterial secretion by observing how similar proteins are involved in fundamentally different processes (for example, DNA export and toxin secretion may use the same system). Similarly, the usher pore is going to serve as an exemplar for newly found translocation elements.


Posted on May 31, 2008 in Papers, Proteins



Breaking news: structure of usher pore solved

My colleague is in Grenoble at a conference on host-pathogen interactions. Today he sent me important news: Gabriel Waksman (no surprise to anybody interested in the structural biology of bacterial pathogenesis) showed the structure of the usher pore, soon to be published.

Why is that important? The usher is the membrane part of a two-component system responsible for the assembly and transport of fimbriae/pili in Gram-negative bacteria, a pretty essential element in these organisms. This protein was identified in the early 90s (or even earlier), and for quite a while lots of people tried to solve/predict/model its structure. The structure was assumed to resemble a porin, but a large insert right in the middle of the beta-barrel made it hard to predict the correct topology. Now we know what the final structure looks like (at least my colleague saw it; the rest of us need to wait), and I was also told that its functional aspects are highly relevant to other secretion systems. Have a look at this protein when it’s out (I’ll definitely post about it). I think you will be surprised even if the nuances of host-pathogen interactions are not very appealing to you.

Studying any niche area at the molecular level can be very rewarding. A novel protein fold by itself is not a big deal anymore (it used to be: browse through the archives of Nat. Struc. Biol. from several years ago). But putting a novel structure into a well-known functional context and understanding the constraints that led to a new solution is still considered first-class science.



Visualization of internal repeats in proteins (or DNA)

There are a number of protein families that have internal repeats (like TPR, Armadillo, ankyrin etc.). I’m very interested in many of them for reasons I will explain in another post. Assessing the arrangement of these repeats is straightforward in the majority of cases: most of them tend to occur next to each other, with few or no insertions between them (finding them in the first place is a completely different story). However, there are proteins where internal repeats are separated by other domains or repeats, which can result in a real mess (or in scientific language: a mosaic-like architecture). A couple of months ago, when I was looking for a visualization method that would give me a quick overview of the internal structure of such proteins, I stumbled across The Shape of Song, a visualization method developed by Martin Wattenberg, a researcher at IBM. It fitted my requirements, so I implemented it with some help from Processing (and later added it to a protein analysis server that has a chance of being published next month). The resulting visualization is below:

Internal repeats in a protein

Repeats are colored according to repeat type and are connected according to repeat family. If you think about it in terms of the SCOP (Structural Classification of Proteins) hierarchy, colors represent class, while arcs connect superfamilies. The longer and more complicated the analysed sequence is, the more useful this approach seems to be; for short proteins, typical domain bubbles would work better.
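For the curious, here is a minimal sketch of how the arcs in such a diagram could be computed. It is not the code behind the server: the `Repeat` record, the function name and the example coordinates are all hypothetical, and I assume that consecutive repeats of the same family are joined by a semicircle drawn between their midpoints.

```python
from collections import defaultdict
from typing import NamedTuple


class Repeat(NamedTuple):
    start: int   # first residue of the repeat (1-based)
    end: int     # last residue of the repeat (inclusive)
    rtype: str   # repeat type, used to pick the colour
    family: str  # repeat family, used to decide which repeats get an arc


def arcs_between_repeats(repeats):
    """Return (x_left, x_right, radius, family) for each arc.

    Assumption: consecutive repeats of the same family are joined by a
    semicircle spanning the distance between their midpoints.
    """
    by_family = defaultdict(list)
    for r in repeats:
        by_family[r.family].append(r)

    arcs = []
    for family, members in by_family.items():
        members.sort(key=lambda r: r.start)
        for left, right in zip(members, members[1:]):
            x1 = (left.start + left.end) / 2
            x2 = (right.start + right.end) / 2
            arcs.append((x1, x2, (x2 - x1) / 2, family))
    return sorted(arcs)


# Made-up mosaic protein: two families of repeats that interleave.
example = [
    Repeat(10, 43, "type-A", "fam-1"),
    Repeat(50, 82, "type-B", "fam-2"),
    Repeat(90, 123, "type-A", "fam-1"),
    Repeat(130, 162, "type-B", "fam-2"),
]
arcs = arcs_between_repeats(example)
```

Each tuple is enough to draw one arc in Processing or any other plotting tool; interleaved families show up as crossing arcs, which is exactly the "mess" that is hard to read from domain bubbles.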

People who are into genomic sequences may notice the similarity of this approach to Circos, developed by Martin Krzywinski (whose work I really admire, especially on HDTR). The idea behind both is pretty much the same, but I never thought about straightening that circle until I saw The Shape of Song. My thinking is sometimes dramatically schematic…



Imaginary protein nanodevices #1

Simple nanodevice - coiled-coil and leucine-rich-repeat protein

This post starts a series devoted to imaginary nanodevices made of proteins. I’m going to play around with known protein structures to see if some of them can form an interesting arrangement. The basic requirement is a lack of obvious steric clashes at the level of the main chain trace. If that is fulfilled, I will assume there is a very slight chance that the particular arrangement is possible. However, in most cases I won’t bother working out how to recreate it in the lab, since I don’t feel competent enough. The whole series is more fiction than science, and my goal is mainly to stretch my own and my readers’ imagination.

Let’s start with something simple. The structure depicted above is a dimer of a leucine-rich repeat (LRR) protein (PDB: 1A4Y, chains A and D) with a trimeric coiled-coil (my own model made with BeammotifCC) fitted in. The opening is wide enough to accommodate three helices without any problems. The picture below shows the main chain trace of the coiled-coil (in red) surrounded by the LRR dimer (all atoms, blue and sea green). As you can see, any coiled-coil made of amino acids with small side chains would not create any steric issues. In fact, the approximate size of the opening (~35 Angstroms) is much larger than the opening of the membrane anchor of trimeric autotransporter adhesins (a twelve-stranded beta-barrel, PDB: 2GR7), which also accommodates a trimeric coiled-coil. So why not use a beta-barrel instead of an LRR? Well, beta-barrels are hardly present outside membranes 🙂 .
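The “no obvious clashes at the main chain level” criterion can be sketched as a minimum-distance check between two traces. This is only an illustration under stated assumptions: the 3.0 Å cutoff is an arbitrary value I picked, the ~35 Å opening is treated as a diameter, and the coordinates are toy points rather than anything taken from 1A4Y or my coiled-coil model.

```python
import math


def min_distance(trace_a, trace_b):
    """Smallest distance (in Å) between any point of one trace and any point of the other."""
    return min(math.dist(a, b) for a in trace_a for b in trace_b)


def has_clash(trace_a, trace_b, cutoff=3.0):
    """True if any pair of points is closer than `cutoff` Å (assumed value)."""
    return min_distance(trace_a, trace_b) < cutoff


# Toy coordinates only: a ring of points with radius 17.5 Å (half of ~35 Å)
# standing in for the LRR opening, and three points 7 Å off the axis
# standing in for the helices of a trimeric coiled-coil.
ring = [
    (17.5 * math.cos(math.radians(a)), 17.5 * math.sin(math.radians(a)), 0.0)
    for a in range(0, 360, 10)
]
coil = [
    (7.0 * math.cos(math.radians(a)), 7.0 * math.sin(math.radians(a)), 0.0)
    for a in (0, 120, 240)
]

clash = has_clash(ring, coil)
gap = min_distance(ring, coil)
```

With real structures the two lists would hold C-alpha coordinates read from the PDB files, and a check on all atoms would of course be stricter than this main-chain-only test.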

Simple nanodevice - coiled-coil and leucine-rich-repeat protein

One can ask whether a single LRR protein can form a full ring. It looks possible from the structure of the single repeat (beta-turn-alpha): interactions with the preceding and following repeats are virtually the same. However, the secondary structure elements of these repeats are not perfectly aligned with the axis of the opening. Their tilt forces consecutive repeats to form an imaginary spiral, not a circle (although the tilt does not seem large enough to actually allow spiral folding of a larger number of repeats; that’s only my assumption, and it would be worth checking).

So that’s it for now. If you feel that I’m reinventing the wheel or writing something completely silly, or if you have any suggestions, please feel free to discourage/encourage me with comments.


Posted on January 21, 2008 in Imaginary nanodevice, Proteins, Structural biology



Type VII secretion system

Yet another secretion system has been described, this time from Gram-positive bacteria (types I to VI were from Gram-negative ones). I expect that the further microbiology moves from E. coli, the more secretion systems will be found. Across the large spectrum of bacterial species, we still know very little about bacteria outside the proteobacterial group.

This is from Nature Reviews Microbiology, and subscription may be required.


Recent evidence shows that mycobacteria have developed novel and specialized secretion systems for the transport of extracellular proteins across their hydrophobic, and highly impermeable, cell wall. Strikingly, mycobacterial genomes encode up to five of these transport systems. Two of these systems, ESX-1 and ESX-5, are involved in virulence — they both affect the cell-to-cell migration of pathogenic mycobacteria. Here, we discuss this novel secretion pathway and consider variants that are present in various Gram-positive bacteria. Given the unique composition of this secretion system, and its general importance, we propose that, in line with the accepted nomenclature, it should be called type VII secretion.


Posted on October 9, 2007 in Clipped, Proteins, Secretion system



Structure of molecular needle

After yesterday’s post it’s no secret anymore that I’m interested (among other things) in oligomeric prokaryotic proteins. I often browse recent additions to the PDB to see if a new, exciting and pretty (symmetric) structure has been deposited. A week ago a picture similar to the one below caught my attention.

PDB 2v6l

This nice ring of helices is a model of the molecular needle of the type III secretion system (T3SS). This system is used by many bacterial species during infection: attachment to the host cell is followed by the insertion of a needle into the host cell and the transport of effector proteins directly into that cell. The model here is a combination of the crystal structure of a single subunit and a 3D reconstruction of the needle from electron microscopy.

I believe that the coming years will bring more full-atom models of important cell structures. It can be seen directly from the publications: certain research groups are solving the structures of missing elements of large protein complexes, one by one. The model above is certainly one step closer to having the whole T3SS, including its cytoplasmic, transmembrane and extracellular parts, at atomic resolution.

The paper about this model was published last year in PNAS (free access) by Janet E. Deane and Pietro Roversi et al. The model is deposited in the PDB as 2V6L. PubMed ID for the abstract is: 16888041.


Posted on August 8, 2007 in Papers, Proteins, Secretion system