biology | Freelancing science

The Life Scientists room over at FriendFeed

This is one of the slides I prepared for the “Open Science in Poland” conference. I captured screenshots of the subscribers to The Life Scientists room over at FriendFeed (as you’ll notice, the number of rows is uneven, so not all of the subscribers fit). I hesitated to share it at high quality, but the new FriendFeed layout does not look anywhere near as pretty as the old one (and shows far fewer avatars per page), so here it is.

The Life Scientists slide – PowerPoint format, PNG format.

Posted by Pawel Szczesny on May 3, 2009 in Community

Tags: biology, FriendFeed, Social network

HMMER3 testing notes – my skills are (finally) becoming obsolete

22 Apr

It has been quite a while since I started to extensively test the performance of HMMER3. As many other people have noticed, the speed of the search has improved dramatically – I’m really impressed by how fast it is. However, that’s only part of the story. The smaller part, actually.

As some readers may know, most of my projects so far have revolved around protein sequence analysis and sequence-structure relationships. Mainly I analyzed sequences that had no clear similarity to anything known and no functional annotation. The usual task was to run sequence comparison software and look at the end of the hit list, trying to make sense of hits beyond any reasonable E-value threshold (for example, I often run BLAST at an E-value of 100 or 1000). I use a very limited number of tools, because it takes quite a while to understand on which specific patterns a particular piece of software fails.
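Looking at the end of the hit list can be sketched in a few lines of Python. This is only an illustration, assuming BLAST was run with a permissive E-value and produced the standard 12-column tabular output (`-m 8` in legacy BLAST, `-outfmt 6` in BLAST+). The names `Hit` and `tail_hits`, the `min_evalue` cutoff and the example rows are invented for this sketch.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    subject: str
    evalue: float
    bitscore: float


def tail_hits(tabular_text: str, min_evalue: float = 1.0) -> list:
    """Return the hits with E-value >= min_evalue, sorted so the weakest come last."""
    hits = []
    for line in tabular_text.splitlines():
        # Skip blank lines and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        # In the 12-column tabular format, column 2 is the subject id,
        # column 11 the E-value and column 12 the bit score.
        hit = Hit(fields[1], float(fields[10]), float(fields[11]))
        if hit.evalue >= min_evalue:
            hits.append(hit)
    return sorted(hits, key=lambda h: h.evalue)


# Invented example rows (not real BLAST results).
EXAMPLE = "\n".join([
    "query1\tsubjA\t98.0\t200\t4\t0\t1\t200\t1\t200\t1e-100\t400",
    "query1\tsubjB\t35.0\t120\t70\t3\t10\t130\t5\t125\t0.002\t45.1",
    "query1\tsubjD\t22.0\t60\t40\t2\t50\t110\t7\t66\t870\t22.3",
    "query1\tsubjC\t27.0\t80\t55\t2\t20\t100\t15\t95\t12\t28.9",
])

for hit in tail_hits(EXAMPLE):
    print(hit.subject, hit.evalue, hit.bitscore)
```

The filter only collects the weak hits in one place; judging which of them are meaningful still has to be done by eye.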

The high-end tool I use most often is HHpred – HMM-HMM comparison software. It’s slow but very sensitive: my personal benchmarks show that it can identify very subtle sequence patterns that go slightly beyond the level of similar secondary structure (in other words, from a set of equally dissimilar sequences with the same order of secondary structure elements, it correctly picks the ones with similar tertiary structure).

The most surprising thing about HMMER3 is that in my personal benchmarks it’s almost as sensitive as HHpred. I wasn’t expecting HMM-sequence comparison to be as good as HMM-HMM comparison. This observation suggests that there’s still room for improvement in the latter approach, but it already has big implications.

PFAM will soon migrate to HMMER3 (the PFAM team is now resolving overlaps between families that arose due to the increased sensitivity), and the moment it is available, it will make a huge number of publications obsolete, or simply wrong. There are thousands of articles that discuss in detail the evolutionary history of some particular domain (many of these will become obsolete) or draw conclusions from the observation that some domain is not present in the analyzed sequence or system (many of these will need to be revised). It will also make my skills quite obsolete, but that is always to be expected, no matter what branch of science one works in. I also imagine that systems biology people will be very happy to have much better functional annotation of proteins.

I don’t want to call the development of HMMER3 a revolution, but it will definitely have an impact on biology similar to that of BLAST and HMMER2. Not only because of its speed, but also because it will create a picture of similarities between all proteins comparable to the one that state-of-the-art methods could so far calculate only for a small subset of them.

Related article: The curse of BLAST (mndoci.com)

Posted by Pawel Szczesny on April 22, 2009 in bioinformatics, Research, Software

Tags: bioinformatics, biology, HMM, HMMER, PFAM

Structure prediction without structure – visual inspection of BLAST results

03 Feb

portschema

My recent post on visual analytics in bioinformatics lacked a specific example, but I’m happy to finally provide one (the happiness also comes from the fact that the respective publication is finally in press). The image above shows a multiple pairwise alignment from BLAST of a putative inner membrane protein from Porphyromonas gingivalis. The image is small, but it does not really matter – the colour patches are visible anyway.

Regions marked with ovals are clearly less conserved than the rest of the protein. There are five hydrophobic regions in this alignment (green patches, underlined with blue lines; I ignore the N-terminus, as this is likely the signal peptide). The three inner ones appear to be of similar length, while the outer ones seem to be about half as long as the inner ones. If we assume that the single unit is the short one, we can summarize the protein as follows: eight beta strands, four long loops, four short loops. It looks like an eight-stranded outer membrane beta-barrel. Almost structure prediction, but without a structure.
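The counting argument can be written down as a tiny sketch. The region lengths below are hypothetical (the post does not give residue counts), and `count_strands` is a name made up for this illustration; the only assumption taken from the text is that the shortest hydrophobic region corresponds to a single strand.

```python
def count_strands(region_lengths):
    """Estimate the number of strands, taking the shortest region as one strand."""
    unit = min(region_lengths)
    # Each region contributes as many strands as units fit into it.
    return sum(round(length / unit) for length in region_lengths)


# Hypothetical lengths: two short outer regions, three inner ones about twice as long.
regions = [10, 21, 19, 20, 11]

strands = count_strands(regions)
long_loops = len(regions) - 1  # less conserved stretches between the hydrophobic regions

print(strands, long_loops)
```

With these made-up lengths, the outer regions count as one strand each and the inner ones as two, which gives the eight strands and four long loops described above.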

I could end the story here, but the model didn’t fit previously published data. The protein’s localization in the inner membrane had been confirmed by an experiment, yet pores in the inner membrane are considered very harmful 😉 . Fortunately, one of my colleagues explained to me that this particular localization technique is not 100% reliable, so I gathered more evidence and created a detailed description of the topology, and another group designed experiments that confirmed my visual analysis.

Lessons learned? Maybe without this feedback on the quality of that experimental technique, I would still claim that this is an OM beta-barrel. Or maybe not. But I’ve learned that to safely ignore experimental results, one needs more than intuition. It also shows that sometimes looking at the results is all one needs to make a reasonable prediction (I still have no idea what the E-values of these BLAST hits were, but does it matter?).

Posted by Pawel Szczesny on February 3, 2009 in bioinformatics, Research, Visualization

Tags: bioinformatics, biology, Inner membrane, Membrane protein, Porphyromonas gingivalis, Visual analytics

Bioinformatics is a visual analytics (sometimes)

18 Dec

The short description of my research interests is “I do proteins” (I took this phrase from my friend Ana). I try to figure out what a particular protein, protein family, or set of proteins does in the wider context. Usually I start where automated methods have ended – I have all kinds of annotation, so I try to put the data together and form a hypothesis. I recently realized that the process is basically visualizing different kinds of data – or rather looking at the same issue from many different perspectives.

It starts with alignments. Lots of alignments. And they all end up in different forms of visual representation. Sometimes it’s conservation combined with secondary structure prediction (with AlignmentViewer or Jalview):

blog-0005

Sometimes I look for transmembrane beta-barrels (with ProfTMB):

blog-0005

Sometimes I try to find a pattern in hydrophobicity and side-chain size values across the alignment (Aln2Plot):

blog-0005
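A hydrophobicity profile across an alignment can be approximated with a per-column average. The sketch below assumes the Kyte-Doolittle hydropathy scale and a toy alignment invented for the example; it is not how Aln2Plot works internally, and `column_hydropathy` is a hypothetical name.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def column_hydropathy(alignment):
    """Mean hydropathy per alignment column; gaps and unknown symbols are skipped."""
    profile = []
    for column in zip(*alignment):
        values = [KYTE_DOOLITTLE[aa.upper()] for aa in column
                  if aa.upper() in KYTE_DOOLITTLE]
        # A column made only of gaps has no defined value.
        profile.append(sum(values) / len(values) if values else None)
    return profile


# Toy alignment (invented); all rows must have the same length.
toy_alignment = [
    "MKLIV-",
    "MRLLVA",
    "MKIVVG",
]

for position, value in enumerate(column_hydropathy(toy_alignment), start=1):
    print(position, value)
```

Plotting such a profile next to a similar one for side-chain size is one way to spot alternating patterns by eye.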

Afterwards I look for patterns and interesting correlations in domain organization (PFAM, SMART):

blog-0008

Sometimes I map all these findings onto a structure, or onto a model that I build somewhere along the way from the data I have found (PyMOL, VMD, Chimera):

blog-0006

I also try to make sense of the genomic context (works for eukaryotic organisms as well – The SEED):

blog-0005

I investigate how the proteins cluster together according to their similarity (CLANS):

blog-0013

And figure out how the protein or the system I’m studying fits into interaction or metabolic networks (Cytoscape, Medusa, STRING, STITCH):

blog-0007

If there’s some additional numerical information, I dump it into analysis software (R, or DiVisa for simpler things):

blog-0005

And I take notes along the way in the form of a mindmap (FreeMind; I recently switched to XMind, because it can store attachments and images in the mindmap file rather than just linking to them as FreeMind does):

blog-0010

So it turns out that I mainly do visual analytics. I spend a considerable amount of time preparing various representations of biological data, and the rest of the time I look at the pictures. While that’s not something every bioinformatician does, many of my colleagues have their own workflows that also rely heavily on pictures. In some areas it’s more prominent than in others, but the fact is that pictures are everywhere.

There are two reasons I use a manual workflow with lots of looking at intermediate results: either I work with weak signals (for example, sometimes I need to run BLAST at an E-value of 1000), or I need to deeply understand the system I study. Making connections between two seemingly unrelated biological entities requires wrapping one’s brain around the problem and… lots of looking at it.

And here comes the frustration. I counted that I use more than twenty (!) different programs for visualization. And even though I enjoy a monitor setup 4500 pixels wide, which is almost enough to put all that data on screen, the main issue is that the software isn’t connected. AlignmentViewer cannot adjust its display automatically based on the domain I’m looking at or the network node I’m investigating – I need to do it myself. Of course I can couple alignments and structure in Jalview, Chimera or VMD, but I don’t find such a solution usable in the long run. To have the best of all worlds, I need to juggle all these applications.

I’ve been longing for some time for a generic visualization platform that can show 2D and 3D data within a single environment, so I follow the development of the SecondLife visualization environment and the Croquet/Cobalt initiatives. While these don’t look very exciting right now, I hope they will provide a common platform for different visualization methods (and, of course, a visual collaboration environment).

But to be realistic, visual analytics in biology is not going to become mainstream. It’s far more efficient to improve algorithms for multidimensional data analysis than to spend more time looking at pictures. I have already had a few situations where I could see some weak signal, and a year or two later it became obvious. But I’m still going to enjoy scientific visualization. I came to science for aesthetic reasons after all. 🙂

Posted by Pawel Szczesny on December 18, 2008 in bioinformatics, Proteins, Research, Software, Visualization

Tags: bioinformatics, biology, Chimera, Cytoscape, Online Services, protein, Protein family, Visual analytics, Visualization

Human genobiome and disease risk assessment

06 Jul

I recently attended a talk on the progress of human metagenomics projects. As the speaker admitted, the whole field is a researcher’s gold mine – almost everything they find is new and interesting. There were a couple of interesting points, mainly concerning how limited our knowledge in this area is. For example, there was an unconfirmed feeling among microbiologists that all of modern microbiology is in fact nothing more than the biology of E. coli and its relatives. Now we know that for sure – the number of microbial species known to us is estimated at 0.5% of all existing microbial species. I also heard a nice story about a Polish doctor who in the 19th century described Helicobacter pylori and its role in gastric diseases (there was a Nobel Prize for that in 2005), wrote a book, and then trashed the whole thing because he couldn’t grow the bacteria in pure culture. Another important issue was the amount of data and the lack of new ways of handling it.

But the most interesting thing for me was the connection between the human microbiome and diseases. Or rather the possibility of such a connection. I am not aware of a single case in which the composition of the human microbiome has been proven to influence the chance of getting ill, and I don’t think many such correlations will be found soon. My impression is that the correlations will be found once we have both a complete human genome and a complete metagenome of everything that lives on a particular person – a human genobiome, as I’ve called it (BTW, the word “genobiome” is not present in Google – is there a better word for that?). And I believe that getting the first full human genobiome will be an achievement comparable to sequencing the human genome for the first time. Not because of the technical difficulties, but because of all the discoveries that need to be made to make it happen. For example, the gut of every human carries a species that performs some sulfur reaction, but its population is only up to a few thousand cells. How many such cases do we have in our bodies? That is a very good question. The field is brand new, and the possibilities for speculation are endless.

Related article: SNPWatch: Researchers Find SNP Associated with Diffuse-type Gastric Cancer

Posted by Pawel Szczesny on July 6, 2008 in bioinformatics, Research

Tags: biology, Helicobacter pylori, metagenomics, microbiology

Structure of usher pore is available

31 May

Structure of usher pore

Some time ago I posted breaking news about the solved structure of the usher pore. A few days ago it was deposited in the PDB as 2VQI (the publication appeared in Cell; here’s the abstract). The structure is a beautiful dimer (see above) of 24-stranded beta-barrels, the first of its kind. The paper also contains structures of the whole complex reconstructed from cryo-EM data.

Interestingly, while the structure of the native dimer is symmetrical, the function of the units is not. Both of the twinned pores are involved in the alternating recruitment of chaperone:pili-subunit complexes, but only one actually transports pili subunits out. Overall, given the large number of detailed studies on the mechanistic properties of pili transport and formation, this is the best understood translocation process at a structural level.

Read the paper and draw your own conclusions, but for me it changes the way of thinking about protein translocation in bacteria. We have learnt a lot about bacterial secretion by observing how similar proteins are involved in fundamentally different processes (for example, DNA export and toxin secretion may use the same system). Similarly, the usher pore is going to serve as an exemplar for newly found translocation elements.

Posted by Pawel Szczesny on May 31, 2008 in Papers, Proteins

Tags: biology, Biomolecules, Proteins and Enzymes, usher

Can a biologist fix a radio?

05 Feb

[via Molecule of the Day] Go and read (if you haven’t already) this brilliant piece on modern biology: Can a Biologist Fix a Radio? Read it twice if you call yourself a bioinformatician…

Posted by Pawel Szczesny on February 5, 2008 in bioinformatics, Fun

Tags: bioinformatics, biology, humor
