
Tag Archives: Visualization

Another collaborative environment: Project Wonderland

This is a short post on Sun’s Project Wonderland. Quoting from its home page:

Project Wonderland is a 100% Java and open source toolkit for creating collaborative 3D virtual worlds. Within those worlds, users can communicate with high-fidelity, immersive audio, share live desktop applications and documents and conduct real business. Wonderland is completely extensible; developers and graphic artists can extend its functionality to create entire new worlds and new features in existing worlds.

In a recent post I mentioned Second Life and Croquet, two platforms that could evolve into decent 3D visualization environments. Obviously I didn’t research the topic enough, as I have only just found Project Wonderland. It seems to have the best of both worlds: a professional team of developers, a pretty flexible architecture and the possibility of running your own instance of a “virtual world”.

Have you spotted "Biogang" written on the whiteboard? 🙂

I didn’t play with it for long. The current version is not very feature-rich (although it already contains a video player with webcam support, a PDF viewer, a VNC viewer and a crude whiteboard), but the roadmap looks very interesting. I really liked the extensive audio features: true stereo, sounds that fade out with distance and a special “cone of silence” (a place where you can have a private conversation). They suggest that Sun is really trying to build an effective collaboration platform.

I haven’t yet seen much about data visualization in Wonderland, although below you can find an interesting example of a molecular simulation trajectory shown inside Wonderland.


Posted by Pawel Szczesny on December 29, 2008 in Education, Research, Visualization

Tags: collaboration, Online Services, Software, Visualization

Bioinformatics is visual analytics (sometimes)

18 Dec

A short description of my research interest is “I do proteins” (I took this phrase from my friend Ana). I try to figure out what a particular protein, protein family, or set of proteins does in a wider context. Usually I start where automated methods end: I have all kinds of annotation, so I try to put the data together and form a hypothesis. I recently realized that the process is basically visualizing different kinds of data, or rather looking at the same issue from many different perspectives.

It starts with alignments. Lots of alignments. And they all end up in different forms of visual representation. Sometimes it’s conservation combined with secondary structure prediction (with AlignmentViewer or Jalview):

blog-0005

Sometimes I look for transmembrane beta-barrels (with ProfTMB):

blog-0005

Sometimes I try to find a pattern in hydrophobicity and side-chain size values across the alignment (Aln2Plot):

blog-0005

Afterwards I look for patterns and interesting correlations in domain organization (PFAM, SMART):

blog-0008

Sometimes I map all these findings onto a structure, or onto a model that I build somewhere along the way from the data I have found (PyMOL, VMD, Chimera):

blog-0006

I also try to make sense of genomic context (this works for eukaryotic organisms as well – The SEED):

blog-0005

I investigate how the proteins cluster together according to their similarity (CLANS):

blog-0013

And figure out how the protein or the system I’m studying fits into interaction or metabolic networks (Cytoscape, Medusa, STRING, STITCH):

blog-0007

If there’s some additional numerical information, I dump it into analysis software (R, or DiVisa for simpler things):

blog-0005

And I take notes along the way in the form of a mindmap (Freemind; I recently switched to Xmind, because it can store attachments and images in the mindmap file itself rather than just linking to them as Freemind does):

blog-0010

So it turns out that I mainly do visual analytics. I spend a considerable amount of time preparing various representations of biological data, and the rest of the time I look at the pictures. While that’s not something every bioinformatician does, many of my colleagues have their own workflows that also rely heavily on pictures. In some areas it’s more prominent than in others, but the fact is that pictures are everywhere.

There are two reasons I use a manual workflow with lots of looking at intermediate results: either I work with weak signals (for example, sometimes I need to run BLAST with an E-value threshold of 1000) or I need to understand the system I study deeply. Making connections between two seemingly unrelated biological entities requires wrapping one’s brain around the problem and… lots of looking at it.

And here comes the frustration. I counted that I use more than twenty (!) different programs for visualization. And even though I enjoy a monitor setup 4500 pixels wide, which is almost enough to put all that data on screen, the main issue is that the software isn’t connected. AlignmentViewer cannot adjust its display automatically based on the domain I’m looking at or the network node I’m investigating – I need to do that myself. Of course I can couple alignments and structure in Jalview, Chimera or VMD, but I don’t find such a solution usable in the long run. To have the best of all worlds, I need to juggle all these applications.

I have been longing for some time for a generic visualization platform that can show 2D and 3D data within a single environment, so I follow the development of the Second Life visualization environment and the Croquet/Cobalt initiatives. While these don’t look very exciting right now, I hope they will provide a common platform for different visualization methods (and, of course, a visual collaboration environment).

But to be realistic, visual analytics in biology is not going to become mainstream. It’s far more efficient to improve algorithms for multidimensional data analysis than to spend more time looking at pictures. I have already had a few situations where I could see a weak signal and within a year or two it became obvious. But I’m still going to enjoy scientific visualization. I came to science for aesthetic reasons after all. 🙂

6 Comments

Posted by Pawel Szczesny on December 18, 2008 in bioinformatics, Proteins, Research, Software, Visualization

Tags: bioinformatics, biology, Chimera, Cytoscape, Online Services, protein, Protein family, Visual analytics, Visualization

Photography is not a hobby. Updated CV and feedback request.

18 Nov

Yesterday I asked over at FriendFeed for feedback on my early attempt at making a visual CV (big thanks to all who commented). Here’s a revised version that hopefully looks much better. The key to reading the image above (click to see a larger version) is as follows: the Y-axis represents time (with the dotted line indicating more or less the present moment); areas of interest are laid out along the X-axis; the color of a phrase indicates my confidence level; font size denotes the amount of time I have spent on the topic (so in this case I have spent lots of time using Perl, but I still don’t feel very confident about it); the placement of a phrase shows which areas of interest the particular project spans; the area below the dotted line shows my approximate plans and hopes for the future.

The first version had a “Photography” area instead of “Visualization”, but I needed to change that since it confused everybody and raised questions about why I had put a hobby on a professional CV. Photography (or visual arts) is not my hobby. My hobby is choir singing (which I have been doing for over 14 years, currently singing jazz and gospel). Visualization/Photography is there to indicate that I consider data visualization one of the most important elements of the scientific method. What I’m trying to figure out is what kind of presentation can help us understand really complex systems, such as human (genetic, to make it easier) diseases. And once we understand them, curing them is going to be much easier. At least I hope it will.

Anyway, the true reason for posting this is to ask my readers for feedback on what is missing from my plans. So far my ideas for future research projects split into a few paths. The first path is to work further on bacterial systems (or subsystems, such as secretion systems). This work would later translate into something I call the Synthetic Biology Framework, a tool to help in designing new biological systems, which might later result in a programming language for a cell. My first idea for the framework was to design engineered bacteria that produce some important compounds, maybe drugs, but now I think the cooler use would be to design bionano machines. The second path is about modelling human diseases, with an important milestone being the analysis of the human genome and metagenome (the genobiome, as I call it), if the data become available. Because I don’t think I could do better here than thousands of scientists if I were using the same information, this is where synthetic biology comes into play again: I hope that I could design nanomachines that would serve as quick diagnostic tools or would report the state of the body in some mostly non-invasive way (aimed at questions like “how is my cholesterol level building up”). The third path is mostly empty and concerns visualization methods. So far I have no clear idea how to build a system that would visually assist in understanding how cells work. I plan to experiment with 3D printing and 3D visualization of biological networks, but I don’t know yet where this will lead me.

So if you have an opinion, a comment, an idea how to connect some dots or how to jump from one area to another (for example, I have no idea yet how to approach pharmacogenomics), or if you think that it doesn’t make sense at all, feel free to comment.

3 Comments

Posted by Pawel Szczesny on November 18, 2008 in Career, Research

Tags: CV, Data visualization, FriendFeed, Human genome, Information Visualization, Photography, Resume, Scientific method, Scientific Visualization, Synthetic biology, Visualization

Qutemol and Ubuntu – native support

16 Oct


A week ago I got an email from a long-time-no-see friend, Marcin Feder, telling me that Qutemol works fine on the Hardy and Gutsy versions of Ubuntu (the binary packages were prepared by Morten Kjeldgaard; see https://blueprints.launchpad.net/~mok0/+related-software, where there are some other interesting titles as well). According to Marcin, the following steps are enough to enjoy Qutemol on your Linux box:

# Install the runtime libraries Qutemol depends on
sudo aptitude install libungif4g libwxbase2.8-0 libwxgtk2.8-0
# Download the GLEW and Qutemol packages (two separate URLs)
wget http://mirrors.kernel.org/ubuntu/pool/main/g/glew/libglew1.4_1.4.0-1ubuntu1_i386.deb http://ppa.launchpad.net/mok0/ubuntu/pool/main/q/qutemol/qutemol_0.4.1~cvs20080130-0ubuntu1~gutsy~ppa1_i386.deb
# Install GLEW first, then Qutemol itself
sudo dpkg -i libglew1.4_1.4.0-1ubuntu1_i386.deb
sudo dpkg -i qutemol_0.4.1~cvs20080130-0ubuntu1~gutsy~ppa1_i386.deb

My Ubuntu version is too ancient to check this right now, but not all of you are so lazy about upgrades, so have fun.

5 Comments

Posted by Pawel Szczesny on October 16, 2008 in Visualization

Tags: linux, Qutemol, Ubuntu, Visualization

Many Eyes and literature summary

04 Oct

I’m not the first one to come up with this idea – Ntino posted about it before. However, I hadn’t really understood how powerful it could be. Using the Many Eyes visualization capabilities I’ve created a quick browsable summary of abstracts related to a particular protein. I took all the abstracts PubMed returned for a particular query (in this case “YadA Yersinia”; YadA is a prominent adhesin and an important pathogenicity factor in Yersiniae) and uploaded them as text into Many Eyes. I chose the “Word Tree” representation and searched for “yada”, which gave a nice graph of the most prominent phrases related to this protein/gene name. Maybe it’s not a breakthrough, but compared to the classification/semantification provided by GoPubMed, such an approach works much better for entities that aren’t well described in biological ontologies.

Given that the whole concept is pretty straightforward, it would be nice if one of the alternative PubMed search engines provided a similar way of summarizing a user’s query, don’t you think?
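To show how little is needed for the core of a word tree, here is a minimal Python sketch. It is not how Many Eyes works internally; it only counts the phrases that follow a keyword, which is the data a word tree draws. The function name `word_tree`, the `depth` parameter and the three toy sentences are assumptions made up for illustration, not real PubMed abstracts.

```python
import re
from collections import Counter


def word_tree(text, keyword, depth=3):
    """Count the phrases (1..depth words) that follow each occurrence of keyword."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    keyword = keyword.lower()
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != keyword:
            continue
        following = tokens[i + 1 : i + 1 + depth]
        # Count every prefix, so shared beginnings add up like tree branches.
        for n in range(1, len(following) + 1):
            counts[tuple(following[:n])] += 1
    return counts


# Toy text standing in for the concatenated abstracts (hypothetical input).
abstracts = (
    "YadA is an adhesin of Yersinia enterocolitica. "
    "YadA is a virulence factor. "
    "The YadA protein forms trimers."
)

tree = word_tree(abstracts, "yada", depth=2)
for phrase, n in tree.most_common():
    print(n, " ".join(phrase))
```

In a real word tree the counts would set the font size of each branch; here they are simply printed.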

9 Comments

Posted by Pawel Szczesny on October 4, 2008 in Papers, Research, Visualization

Tags: Abstract, Knowledge Management, Many Eyes, Publications, PubMed, Research, Scientific Visualization, Visualization

Skyrails and STRING

09 Sep

Of course I couldn’t resist playing a little with Skyrails after I saw it on the Flowing Data blog. Skyrails is a graph visualization system that was designed with expandability and awesome looks in mind. All menus can be programmed in an odd-looking but quite easy-to-learn language, which helps in writing a customized interface for particular data.

My quick attempt was to take some sample data from STRING, feed it into Skyrails and see if that made any sense. I chose the first example from the STRING main page, the trpA protein from E. coli K12. The main graph on the trpA interactions page looks as follows:

The same graph in Skyrails:

Of course Skyrails has a 3D representation and is fully interactive; with a little work one can filter out some of the connections, put images of structures in place of the green dots, and so on. In the screenshot it doesn’t look as clear as STRING, because it wasn’t optimized for such use – in interactive use it’s much clearer. The video below shows the basic interactions with this dataset.

Is it useful? At the moment, not really. It already has lots of features that more mature programs lack (completely programmable menus are a great idea), but usage is still crude and in some cases the flashy effects are distracting. However, it’s worth keeping an eye on Skyrails. First, development is pretty much guaranteed, as the author said he is starting a PhD on this project. Second, the basic roadmap includes features that again aren’t present anywhere else, like a client-server architecture (so you can talk to the Skyrails system from an external application – dynamic, time-aware visualization?). And third – it’s the coolest-looking visualization system I’ve found so far (will it make it into a movie, like Genome Valence from Ben Fry did?).

2 Comments

Posted by Pawel Szczesny on September 9, 2008 in Software, Visualization

Tags: Graph, Information Visualization, Skyrails, STRING, Visualization

Visualization of internal repeats in proteins (or DNA)

24 Jan

There are a number of protein families that have internal repeats (TPR, Armadillo, ankyrin etc.). I’m very interested in many of them for reasons I will explain in another post. Assessing the arrangement of these repeats is straightforward in the majority of cases – most of them tend to occur next to each other, with few or no insertions between them (finding them in the first place is a completely different story). However, there are proteins where internal repeats are separated by other domains or repeats, which can result in a real mess (or, in scientific language, a mosaic-like architecture). When a couple of months ago I looked for a visualization method that would give me a quick overview of the internal structure of such proteins, I stumbled across The Shape of Song, a visualization method developed by Martin Wattenberg, a researcher at IBM. It fitted my requirements, so I implemented it with some help from Processing (and later added it to a protein analysis server that has a chance of being published next month). The resulting visualization is below:

Internal repeats in a protein

Repeats are colored according to repeat type and connected according to repeat family. If you think about it in terms of the SCOP (Structural Classification of Proteins) hierarchy, colors represent class, while arcs connect superfamilies. The longer and more complicated the analysed sequence is, the more useful this approach seems to be; for short proteins, typical domain bubbles would work better.
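The geometry behind such an arc diagram can be sketched in a few lines of Python. This is not the Processing code used for the picture above; it is a minimal illustration that assumes each repeat is given as residue coordinates plus a type and a family label, and that consecutive repeats of the same family are joined by a semicircle. The names `Repeat`, `Arc` and `repeat_arcs`, and the example coordinates, are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Repeat(NamedTuple):
    start: int        # first residue of the repeat
    end: int          # last residue of the repeat
    repeat_type: str  # would decide the fill color
    family: str       # repeats of one family get connected by arcs


class Arc(NamedTuple):
    family: str
    center: float     # x position of the semicircle's center
    radius: float     # half the distance between the two repeat midpoints


def repeat_arcs(repeats):
    """Connect consecutive repeats of the same family with semicircular arcs."""
    by_family = defaultdict(list)
    for repeat in sorted(repeats):  # sorted by start position
        by_family[repeat.family].append(repeat)

    arcs = []
    for family, members in by_family.items():
        for left, right in zip(members, members[1:]):
            mid_left = (left.start + left.end) / 2
            mid_right = (right.start + right.end) / 2
            arcs.append(Arc(family, (mid_left + mid_right) / 2,
                            (mid_right - mid_left) / 2))
    return arcs


# Made-up example: three TPR repeats interrupted by a single ankyrin repeat.
repeats = [
    Repeat(10, 40, "alpha", "TPR"),
    Repeat(50, 80, "alpha", "TPR"),
    Repeat(100, 130, "alpha", "ANK"),
    Repeat(200, 230, "alpha", "TPR"),
]
arcs = repeat_arcs(repeats)
for arc in arcs:
    print(arc)
```

Each `Arc` can be handed to any drawing library as a half circle above the sequence line; the large arc spanning the ankyrin repeat is what makes a mosaic-like architecture visible at a glance.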

People who are into genomic sequences may notice the similarity of this approach to Circos, developed by Martin Krzywinski (whose work I really admire, especially on HDTR). The idea behind both is pretty much the same, but I never thought about straightening that circle until I saw The Shape of Song. My thinking is sometimes dramatically schematic…

8 Comments

Posted by Pawel Szczesny on January 24, 2008 in bioinformatics, Proteins, Research, Visualization

Tags: bioinformatics, java, processing, protein analysis, repeats, Visualization

Blender in visualization of molecules

17 Oct

Yes, you can use Blender to prepare figures for your next paper, and the results will certainly look different from the ones obtained with standard software (hemoglobin [1HBG] as an example below)… But given the amount of work and the really steep learning curve (at least for somebody who tries it for the very first time), I would not recommend Blender that much… 🙂

Hemoglobin

UPDATE: if you look for a way to import a PDB file into Blender, some instructions are at the bottom of this page.

10 Comments

Posted by Pawel Szczesny on October 17, 2007 in Software, Visualization

Tags: blender, hemoglobin, rendering, Visualization

Survey of domain bubbles in protein sequence analysis

24 Sep

One of the key steps in the analysis of an unknown protein sequence is the identification of the domains that constitute the protein. There are many online tools that will search for the presence of known domains or identify them ab initio. Usually only the former present results graphically, as so-called “domain bubbles”. Below you can find examples of common approaches to presenting the results of sequence annotation. Since most of them use the same domain definitions, the names of the hits are the same in almost all cases.

One note: it’s not a comparison of the servers’ performance. The sequence is the same in all cases, but that was to show the differences between visualization methods, not the quality of the annotation.

SMART

SMART domain bubbles

This is an example of sequence annotation by the SMART server. Domains are colored according to their source (SMART has a collection of domain definitions from various sources), and non-domain sequence features (like transmembrane segments, low-complexity regions, disorder) are clearly differentiated from domains. The picture is generated with GIMP and its Perl-Fu extension, and the script is available for download from the homepage of Ivica Letunic.

PFAM

PFAM domain bubbles

The color scheme used by PFAM is quite clear – the same domains have the same colors. PFAM (as well as the following two servers) also shows partial hits in the picture – cases where the similarity between the domain and the protein spans only a fragment of the domain (which may indicate many things, like genomic rearrangements, frameshifts, a weak domain definition, etc.). The PFAM script can also plot many other sequence features onto the picture. You can use the script with your own annotation data here – the input is coded as an XML file conforming to PFAM’s schema.

CDD

CDD domain bubbles

CDD looks pretty similar to PFAM and shares some visual features. However, the CD-Search page shows more than one line of hits graphically. Usually the first line contains the best hits for a particular fragment, and the following lines show overlapping hits with worse scores. Only the first line is shown here.
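A layout like this can be produced with a simple greedy rule. The Python sketch below is not CD-Search’s actual algorithm; it only assumes that each hit is a `(start, end, score, name)` tuple, that a higher score is better, and that a hit drops to the next line as soon as it overlaps something already placed. The function name `layout_rows` and the example hits are made up.

```python
def layout_rows(hits):
    """Place hits into rows: best-scoring first, overlapping hits pushed down."""
    rows = []
    for hit in sorted(hits, key=lambda h: h[2], reverse=True):
        start, end = hit[0], hit[1]
        for row in rows:
            # The hit fits if it overlaps nothing already in this row.
            if all(end < other[0] or start > other[1] for other in row):
                row.append(hit)
                break
        else:
            rows.append([hit])  # no existing row had room
    # Order each row along the sequence for drawing.
    return [sorted(row) for row in rows]


# Hypothetical hits: (start, end, score, name)
hits = [
    (1, 100, 50.0, "domain_A"),
    (50, 150, 30.0, "domain_B"),
    (120, 200, 40.0, "domain_C"),
]
rows = layout_rows(hits)
for number, row in enumerate(rows, start=1):
    print(number, [name for _, _, _, name in row])
```

With these hits, `domain_A` and `domain_C` share the first line, and the weaker `domain_B`, which overlaps both, moves to the second.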

HHpred

HHpred domain bubbles

OK, I may be biased here, since HHpred is coded by my former colleagues, but I really like the domain bubbles from this server. The color scheme is different from that of any other server: bubbles are colored according to the score, from red (the best) to blue (the worst). It also shows partial and overlapping hits (only a few are shown here; the actual results page spans a few screens in my browser). Like CDD, HHpred does not plot any sequence features other than domains.
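Coloring by score is easy to reproduce for your own bubbles. The sketch below is not HHpred’s code and does not use its exact palette; it is a plain linear blend from blue to red, assuming you know the worst and best score in your result set. The function name `score_to_rgb` and its parameters are hypothetical.

```python
def score_to_rgb(score, worst, best):
    """Map a score to an RGB color: blue for the worst hit, red for the best."""
    if best == worst:
        return (255, 0, 0)  # a single score: treat it as the best one
    t = (score - worst) / (best - worst)
    t = max(0.0, min(1.0, t))  # clamp scores outside the expected range
    return (round(255 * t), 0, round(255 * (1 - t)))


# Example with scores assumed to lie between 0 and 100.
for score in (100, 50, 0):
    print(score, score_to_rgb(score, worst=0, best=100))
```

Hits in the middle of the range come out purple, so the eye separates strong and weak hits before reading any numbers.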

So these are the major domain annotation servers that present prediction results in a nice graphical way (there are many others, but not all of them use this simple way of presenting data; InterPro, to mention just one). Do these approaches, which are after all pretty similar, explore all possible ways of presenting the domain structure of a protein? I don’t think so. Watch this site, I may have something to add pretty soon.

8 Comments

Posted by Pawel Szczesny on September 24, 2007 in Software, Visualization

Tags: domain bubbles, protein sequence analysis, sequence annotation, Visualization