Tag Archives: protein

Bioinformatics is visual analytics (sometimes)

A short description of my research interest is “I do proteins” (I took this phrase from my friend Ana). I try to figure out what a particular protein, protein family, or set of proteins does in a wider context. Usually I start where automated methods have ended: I have all kinds of annotation, so I try to put the data together and form a hypothesis. I recently realized that the process is basically visualizing different kinds of data, or rather looking at the same issue from many different perspectives.

It starts with alignments. Lots of alignments. And they all end up in different forms of visual representation. Sometimes it’s conservation with secondary structure prediction (with AlignmentViewer or Jalview):

blog-0005

Sometimes I look for transmembrane beta-barrels (with ProfTMB):

blog-0005

Sometimes I try to find a pattern in hydrophobicity and side-chain size values across the alignment (Aln2Plot):

blog-0005
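
For illustration, a per-column hydrophobicity profile can be sketched in a few lines of Python. This is not how Aln2Plot works internally; it is only a minimal sketch that averages Kyte-Doolittle hydropathy values over each alignment column and skips gaps. The function name and the toy alignment are made up for the example, and side-chain size would need a second lookup table of the same shape.

```python
# Kyte-Doolittle hydropathy scale (positive = hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def column_hydropathy(alignment):
    """Return the mean hydropathy of every alignment column.

    `alignment` is a list of equal-length aligned sequences.
    Gaps and unknown symbols are ignored; a column with no
    scorable residue yields None.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    profile = []
    for column in zip(*alignment):
        values = [KD[res.upper()] for res in column if res.upper() in KD]
        profile.append(sum(values) / len(values) if values else None)
    return profile


# Toy alignment (hypothetical sequences, for illustration only).
toy = ["MKVLA-", "MRILAG", "MKVIS-"]
for position, value in enumerate(column_hydropathy(toy), start=1):
    print(position, value)
```

Plotting the resulting list against column number gives the kind of profile in which alternating or periodic patterns become visible.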

Afterwards I look for patterns and interesting correlations in domain organization (Pfam, SMART):

blog-0008

Sometimes I map all these findings onto a structure, or onto a model that I build along the way from the data I have found (PyMOL, VMD, Chimera):

blog-0006

I also try to make sense out of genomic context (works for eukaryotic organisms as well – The SEED):

blog-0005

I investigate how the proteins cluster together according to their similarity (CLANS):

blog-0013

And figure out how the protein or the system I’m studying fits into interaction or metabolic networks (Cytoscape, Medusa, STRING, STITCH):

blog-0007

If there’s some additional numerical information, I dump it into analysis software (R, or DiVisa for simpler things):

blog-0005

And I take notes along the way in the form of a mind map (FreeMind; I recently switched to XMind because it can store attachments and images in the mind map file itself, not just link to them as FreeMind does):

blog-0010

So it turns out that I mainly do visual analytics. I spend a considerable amount of time preparing various representations of biological data, and the rest of the time I look at the pictures. While that’s not something every bioinformatician does, many of my colleagues have their own workflows that also rely heavily on pictures. In some areas it’s more prominent than in others, but the fact is that pictures are everywhere.

There are two reasons I use a manual workflow with lots of looking at intermediate results: either I work with weak signals (for example, sometimes I need to run BLAST at an E-value of 1000) or I need to deeply understand the system I study. Making connections between two seemingly unrelated biological entities requires wrapping one’s brain around the problem and… lots of looking at it.
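
A search that permissive returns mostly noise, so the hits still have to be sorted before anyone looks at them. Below is a minimal sketch of that triage, assuming the search was run with NCBI BLAST+ and saved in its standard 12-column tabular format (`-outfmt 6`). The function names, the cutoff and the sample rows are hypothetical and only serve the example.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    identity: float
    evalue: float
    bitscore: float


def parse_tabular(text: str) -> List[Hit]:
    """Parse BLAST+ tabular output (-outfmt 6, default 12 columns)."""
    hits = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # not a standard 12-column row
        # Columns: qseqid sseqid pident ... evalue (11th) bitscore (12th)
        hits.append(Hit(fields[0], fields[1], float(fields[2]),
                        float(fields[10]), float(fields[11])))
    return hits


def weak_hits(hits: List[Hit], strong_cutoff: float = 1e-3) -> List[Hit]:
    """Return hits above the cutoff, best E-value first, for manual review."""
    return sorted((h for h in hits if h.evalue > strong_cutoff),
                  key=lambda h: h.evalue)


# Hypothetical rows, made up for illustration only.
SAMPLE = "\n".join("\t".join(row) for row in [
    ["query1", "hitA", "62.5", "200", "75", "0", "1", "200", "5", "204", "2e-30", "118"],
    ["query1", "hitC", "21.0", "60", "47", "2", "10", "69", "33", "90", "850", "24.3"],
    ["query1", "hitB", "28.4", "95", "68", "1", "40", "134", "12", "105", "0.5", "33.1"],
])

for hit in weak_hits(parse_tabular(SAMPLE)):
    print(hit.subject, hit.evalue, hit.bitscore)
```

The list that comes out is only a reading order; deciding which of those weak hits mean something is still the manual part.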

And here comes the frustration. I counted that I use more than twenty (!) different programs for visualization. And even though I enjoy a monitor setup 4500 pixels wide, which is almost enough to put all that data on screen, the main issue is that the software isn’t connected. AlignmentViewer cannot adjust its display automatically based on the domain I’m looking at or the network node I’m investigating; I need to do it myself. Of course I can couple alignments and structure in Jalview, Chimera or VMD, but I don’t find such a solution usable in the long run. To have the best of all worlds, I need to juggle all these applications.

For some time I’ve been longing for a generic visualization platform that can show 2D and 3D data within a single environment, so I follow the development of the Second Life visualization environment and the Croquet/Cobalt initiatives. While these don’t look very exciting right now, I hope they will provide a common platform for different visualization methods (and, of course, a visual collaboration environment).

But to be realistic, visual analytics in biology is not going to become mainstream. It’s far more efficient to improve algorithms for multidimensional data analysis than to spend more time looking at pictures. I have already had a few situations where I could see a weak signal that became obvious a year or two later. But I’m still going to enjoy scientific visualization. I came to science for aesthetic reasons after all. 🙂

Posted by Pawel Szczesny on December 18, 2008 in bioinformatics, Proteins, Research, Software, Visualization

Tags: bioinformatics, biology, Chimera, Cytoscape, Online Services, protein, Protein family, Visual analytics, Visualization

BadA head structure

09 Aug

Modularity is one of the most interesting features of the trimeric autotransporter adhesins (TAAs), and probably one of the most frustrating. As I wrote before, domain annotation is quite difficult, especially since these proteins are often a few thousand residues long.

BadA, the major adhesin of Bartonella henselae, is probably the best known large TAA out there. Its sequence served us as an unofficial benchmark for a domain annotation tool. Its head consists of three domains, one resembling the head of YadA and two others which we claimed are similar to Hia head domains. When we started this project the claim wasn’t supported very well: E-values of HHpred alignments were around 1 (and of course all less sensitive tools didn’t see anything), but we knew the domains must be similar, because two or three conserved residues were exactly where we expected them. The crystal structure of these two domains from BadA couldn’t be solved directly, so we attempted molecular replacement, and that worked. In the picture above you can see the three known head structures for TAAs, BadA (ours), Hia and YadA (the full BadA head model is on the right), and the arrangement of the corresponding domains in all three proteins. The whole story, with lots of pretty pictures (you must see the EM figures), was published yesterday in PLoS Pathogens (OA).

Today the story isn’t as exciting as it was at the beginning. Currently HHpred easily identifies the domains from Hia and BadA as similar with high probability, thanks to a bigger database and more intermediate sequences. But I’m still pretty happy about how it went; such projects build confidence in one’s analysis skills.

Domain annotation in trimeric autotransporter adhesins

Posted by Pawel Szczesny on August 9, 2008 in bioinformatics

Tags: Annotation, bioinformatics, protein, Protein domain, protein structure

Type VII secretion system

09 Oct

Yet another secretion system has been described, this time from Gram-positive bacteria (types I to VI were from Gram-negative ones). I expect that the further microbiology moves away from E. coli, the more secretion systems will be found. Within the large spectrum of bacterial species, we still know very little about bacteria outside the proteobacterial group.

This is from Nature Reviews Microbiology, and subscription may be required.

clipped from www.nature.com

Recent evidence shows that mycobacteria have developed novel and specialized secretion systems for the transport of extracellular proteins across their hydrophobic, and highly impermeable, cell wall. Strikingly, mycobacterial genomes encode up to five of these transport systems. Two of these systems, ESX-1 and ESX-5, are involved in virulence — they both affect the cell-to-cell migration of pathogenic mycobacteria. Here, we discuss this novel secretion pathway and consider variants that are present in various Gram-positive bacteria. Given the unique composition of this secretion system, and its general importance, we propose that, in line with the accepted nomenclature, it should be called type VII secretion.