Tag Archives: Visualization

Another collaborative environment: Project Wonderland

This is a short post on Sun’s Project Wonderland. Quoting from its home page:

Project Wonderland is a 100% Java and open source toolkit for creating collaborative 3D virtual worlds. Within those worlds, users can communicate with high-fidelity, immersive audio, share live desktop applications and documents and conduct real business. Wonderland is completely extensible; developers and graphic artists can extend its functionality to create entire new worlds and new features in existing worlds.

In a recent post I mentioned Second Life and Croquet, two platforms that could evolve into decent 3D visualization environments. Obviously I didn’t research the topic enough, as I’ve just found Project Wonderland. It seems to have the best of both worlds: a professional team of developers, a fairly flexible architecture and the possibility of running your own instance of a “virtual world”.


Have you spotted "Biogang" written on the whiteboard? 🙂

I haven’t played with it for long. The current version is not very feature-rich (although it already includes a video player with webcam support, a PDF viewer, a VNC viewer and a crude whiteboard), but the roadmap looks very interesting. I really liked the extensive audio features: true stereo, sounds that fade with distance, and a special “cone of silence” (a place where you can have a private conversation). They show that Sun is serious about building an effective collaboration platform.

I haven’t yet seen much about data visualization in Wonderland, although below you can find an interesting example of a molecular simulation trajectory shown inside Wonderland.


Posted by on December 29, 2008 in Education, Research, Visualization



Bioinformatics is visual analytics (sometimes)

The short description of my research interest is “I do proteins” (I took this phrase from my friend Ana). I try to figure out what a particular protein, protein family, or set of proteins does in a wider context. Usually I start where automated methods end: I have all kinds of annotations, so I try to put the data together and form a hypothesis. I recently realized that the process is basically visualizing different kinds of data, or rather looking at the same issue from many different perspectives.

It starts with alignments. Lots of alignments. And they all end up in different forms of visual representation. Sometimes it’s conservation together with secondary structure prediction (with AlignmentViewer or Jalview):


Sometimes I look for transmembrane beta-barrels (with ProfTMB):


Sometimes I try to find a pattern in hydrophobicity and side-chain size values across the alignment (Aln2Plot):


Afterwards I look for patterns and interesting correlations in domain organization (Pfam, SMART):


Sometimes I map all these findings onto a structure, or onto a model that I build along the way from the data I have found (PyMOL, VMD, Chimera):


I also try to make sense of the genomic context (The SEED; this works for eukaryotic organisms as well):


I investigate how the proteins cluster together according to their similarity (CLANS):


And figure out how the protein or the system I’m studying fits into interaction or metabolic networks (Cytoscape, Medusa, STRING, STITCH):


If there’s some additional numerical information, I dump it into analysis software (R, or DiVisa for simpler things):


And I take notes along the way in the form of a mind map (FreeMind; I recently switched to XMind, because it can store attachments and images in the mind map file instead of just linking to them as FreeMind does):

So it turns out that I mainly do visual analytics. I spend a considerable amount of time preparing various representations of biological data, and the rest of the time I look at the pictures. While that’s not something every bioinformatician does, many of my colleagues have their own workflows that also rely heavily on pictures. In some areas this is more prominent than in others, but the fact is that pictures are everywhere.

There are two reasons I use a manual workflow with lots of looking at intermediate results: either I work with weak signals (for example, sometimes I need to run BLAST at an E-value of 1000), or I need to deeply understand the system I study. Making connections between two seemingly unrelated biological entities requires wrapping one’s brain around the problem and… lots of looking at it.
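Why an E-value of 1000 lets weak signals through follows from the Karlin–Altschul statistics that BLAST uses: for a hit with raw score S, a query of length m and a database of length n, the expected number of chance hits is E = K m n e^{-λS}, so the minimum score that passes a threshold falls only logarithmically as the threshold grows. A worked sketch comparing a threshold of 10 with one of 1000, assuming λ ≈ 0.267 (roughly the value BLAST reports for BLOSUM62 with gap costs 11/1; treat the number as illustrative):

```latex
E = K\,m\,n\,e^{-\lambda S}
\quad\Longrightarrow\quad
S_{\min}(E) = \frac{\ln(K m n) - \ln E}{\lambda}

S_{\min}(10) - S_{\min}(1000)
  = \frac{\ln 1000 - \ln 10}{\lambda}
  = \frac{\ln 100}{\lambda}
  \approx \frac{4.61}{0.267}
  \approx 17
```

Under these assumed parameters the cutoff drops by roughly 17 raw score units, which is enough to admit hits that a search at E = 10 would discard, and also enough to let in plenty of noise that then has to be inspected by eye.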

And here comes the frustration. I counted that I use more than twenty (!) different programs for visualization. And even though I enjoy a monitor setup 4500 pixels wide, which is almost enough to fit all that data on screen, the main issue is that the software isn’t connected. AlignmentViewer cannot adjust its display automatically based on the domain I’m looking at or the network node I’m investigating; I need to do it myself. Of course I can couple alignments and structures in Jalview, Chimera or VMD, but I don’t find such solutions usable in the long run. To have the best of all worlds, I need to juggle all these applications.
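What is missing is essentially linked views: a selection made in one program is broadcast, and every other view refocuses on it. The classes below are hypothetical and are not the API of any of the programs named above; they are only a minimal Python sketch of the publish/subscribe idea behind such coupling.

```python
from typing import Callable, List, Optional


class SelectionBus:
    """Hypothetical hub that forwards a selection from one view to all others."""

    def __init__(self) -> None:
        self._listeners: List[Callable[[str, str], None]] = []

    def subscribe(self, listener: Callable[[str, str], None]) -> None:
        self._listeners.append(listener)

    def publish(self, item: str, source: str) -> None:
        # every listener receives the selected item and the name of the view that chose it
        for listener in self._listeners:
            listener(item, source)


class View:
    """Stand-in for an alignment, structure or network viewer."""

    def __init__(self, name: str, bus: SelectionBus) -> None:
        self.name = name
        self.focus: Optional[str] = None
        bus.subscribe(self.on_select)

    def on_select(self, item: str, source: str) -> None:
        # ignore our own announcements; refocus on selections made elsewhere
        if source != self.name:
            self.focus = item


if __name__ == "__main__":
    bus = SelectionBus()
    alignment = View("alignment", bus)
    structure = View("structure", bus)
    network = View("network", bus)
    # clicking a (hypothetical) domain node in the network view
    bus.publish("YadA_head_domain", source="network")
    print(alignment.focus, structure.focus, network.focus)
```

The point of routing everything through one bus is that no viewer needs to know about any other; adding a twenty-first program would only mean subscribing it.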

For some time now I’ve been longing for a generic visualization platform able to show 2D and 3D data within a single environment, so I follow the development of the Second Life visualization environment and the Croquet/Cobalt initiatives. While these don’t look very exciting right now, I hope they will provide a common platform for different visualization methods (and, of course, a visual collaboration environment).

But to be realistic, visual analytics in biology is not going to become mainstream. It’s far more efficient to improve algorithms for multidimensional data analysis than to spend more time looking at pictures. I’ve already had a few situations where I could see a weak signal that became obvious a year or two later. But I’m still going to enjoy scientific visualization. I came to science for aesthetic reasons, after all. 🙂


Photography is not a hobby. Updated CV and feedback request.

Yesterday I asked over at FriendFeed for feedback on my early attempt at a visual CV (big thanks to all who commented). Here’s a revised version that hopefully looks much better. The key to reading the image above (click to see a larger version) is as follows: the Y-axis represents time (the dotted line marks, more or less, the present moment); areas of interest are along the X-axis; the color of a phrase indicates my confidence level; font size denotes the amount of time I spent on the topic (so, for example, I have spent lots of time using Perl, but I still don’t feel very confident about it); the placement of a phrase shows which areas of interest that project or skill spans; and the area below the dotted line shows my approximate plans and hopes for the future.
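The encoding described above (time on the Y-axis, areas on the X-axis, color for confidence, font size for time spent, placement for the areas a phrase spans) can be written down as a small data-to-layout mapping. A minimal Python sketch; the field names, the grey-scale color ramp, the font range and the 2000-hour cap are hypothetical choices for illustration, not how the image above was actually made.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Entry:
    phrase: str
    year: float                                      # Y-axis: when the work happened
    areas: List[str] = field(default_factory=list)   # areas of interest it spans
    confidence: float = 0.5                          # 0..1, mapped to color
    hours: float = 0.0                               # time spent, mapped to font size


def layout(entry: Entry, area_order: List[str],
           min_font: float = 8.0, max_font: float = 32.0,
           max_hours: float = 2000.0) -> Dict[str, object]:
    """Map one CV entry to position, font size and color (hypothetical scheme)."""
    # X: centre of the spanned areas, so multi-area phrases sit between columns
    columns = [area_order.index(a) for a in entry.areas]
    x = sum(columns) / len(columns)
    # font size grows linearly with time spent, capped at max_hours
    share = min(entry.hours, max_hours) / max_hours
    font_size = min_font + (max_font - min_font) * share
    # darker grey means more confident
    grey = int(200 * (1 - entry.confidence))
    color = f"#{grey:02x}{grey:02x}{grey:02x}"
    return {"x": x, "y": entry.year, "font_size": font_size, "color": color}


if __name__ == "__main__":
    # hypothetical area order and entry
    areas = ["Bacterial systems", "Synthetic biology", "Visualization"]
    perl = Entry("perl", 2004, ["Bacterial systems", "Visualization"],
                 confidence=0.5, hours=1000)
    print(layout(perl, areas))
```

Keeping the mapping in one function makes it easy to try alternative encodings (for example, hue instead of grey for confidence) without touching the data.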

The first version had a “Photography” area instead of “Visualization”, but I needed to change that since it confused everybody and raised questions about why I put a hobby on a professional CV. Photography (or visual arts) is not my hobby. My hobby is choir singing (which I have been doing for over 14 years, currently singing jazz and gospel). Visualization/Photography is there to indicate that I consider data visualization one of the most important elements of the scientific method. What I’m trying to figure out is what kind of presentation can help us understand really complex systems, such as human diseases (genetic ones, to make it easier). And once we understand them, curing them should be much easier. At least I hope so.

Anyway, the real reason for posting it is to ask my readers for feedback on missing elements of my plans. So far my ideas for future research projects split into a few paths. The first path is to work further on bacterial systems (or subsystems, such as secretion systems). This work would later translate into something I call a Synthetic Biology Framework: a tool to help design new biological systems, which might eventually result in a programming language for a cell. My first idea for the framework was to design engineered bacteria producing important compounds, maybe drugs, but now I think a cooler use would be to design bionano machines. The second path is about modelling human diseases, with an important milestone being the analysis of the human genome and metagenome (the genobiome, as I call it), if the data become available. Because I don’t think I could do better here than thousands of scientists working with the same information, this is where synthetic biology comes into play again: I hope I could design nanomachines that would serve as quick diagnostic tools or report the body’s state in a mostly non-invasive way (addressing questions like “how is my cholesterol level building up?”). The third path is mostly empty and concerns visualization methods. So far I have no clear idea how to build a system that would visually assist in understanding how cells work. I plan to experiment with 3D printing and 3D visualization of biological networks, but I don’t know yet where this will lead me.

So if you have an opinion, a comment, an idea how to connect some dots or how to jump from one area to another (for example, I have no idea yet how to approach pharmacogenomics), or if you think that it doesn’t make sense at all, feel free to comment.


Posted by on November 18, 2008 in Career, Research



Qutemol and Ubuntu – native support


A week ago I got an email from a long-time-no-see friend, Marcin Feder, saying that QuteMol works fine on the Hardy and Gutsy versions of Ubuntu (binary packages were prepared by Morten Kjeldgaard, whose package archive has some other interesting titles too). According to Marcin, the following steps are enough to enjoy QuteMol on your Linux box:

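# libraries the QuteMol package needs, from the Ubuntu repositories
sudo aptitude install libungif4g libwxbase2.8-0 libwxgtk2.8-0
# both .deb files come from Morten Kjeldgaard's packages and must be downloaded first; install GLEW before QuteMol
sudo dpkg -i libglew1.4_1.4.0-1ubuntu1_i386.deb
sudo dpkg -i qutemol_0.4.1~cvs20080130-0ubuntu1~gutsy~ppa1_i386.deb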
My Ubuntu version is too old to check this right now, but not all of you are as lazy about upgrades as I am, so have fun.

Posted by on October 16, 2008 in Visualization



Many Eyes and literature summary

I’m not the first one to come up with this idea; Ntino posted about it before. However, I hadn’t really understood how powerful it could be. Using Many Eyes’ visualization capabilities, I’ve created a quick, browsable summary of abstracts related to a particular protein. I took all the abstracts PubMed returned for a particular query (in this case “YadA Yersinia”; YadA is a prominent adhesin and an important pathogenicity factor in Yersiniae) and uploaded them as text into Many Eyes. I chose the “Word Tree” representation and searched for “yada”, which gave a nice graph of the most prominent phrases related to this protein/gene name. Maybe it’s not a breakthrough, but compared to the classification/semantification provided by GoPubMed, this approach works much better for entities that aren’t well described in biological ontologies.
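The core of a word tree is simple: find every occurrence of the search term and merge the word sequences that follow it into a tree, counting how often each branch occurs. A rough Python sketch of that idea, not Many Eyes’ actual implementation; the tokenizer, the depth limit and the sample sentences are made up for illustration.

```python
import re
from typing import Dict, List

# node layout: {next_word: [count, {child_word: [count, {...}]}]}
Tree = Dict[str, list]


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into words (letters, digits, hyphens)."""
    return re.findall(r"[a-z0-9-]+", text.lower())


def word_tree(texts: List[str], root: str, depth: int = 3) -> Tree:
    """Merge the up-to-`depth` words that follow each occurrence of `root`."""
    root = root.lower()
    tree: Tree = {}
    for text in texts:
        words = tokenize(text)
        for i, word in enumerate(words):
            if word != root:
                continue
            node = tree
            for nxt in words[i + 1:i + 1 + depth]:
                entry = node.setdefault(nxt, [0, {}])
                entry[0] += 1
                node = entry[1]
    return tree


def show(node: Tree, indent: int = 0) -> None:
    """Print branches, most frequent continuation first."""
    for word, (count, children) in sorted(node.items(), key=lambda kv: -kv[1][0]):
        print("  " * indent + f"{word} ({count})")
        show(children, indent + 1)


if __name__ == "__main__":
    # made-up sentences standing in for PubMed abstracts
    abstracts = [
        "YadA mediates adhesion to collagen.",
        "YadA mediates binding to fibronectin.",
        "The trimeric YadA protein is an adhesin.",
    ]
    show(word_tree(abstracts, "yada", depth=2))
```

With real data you would pass in the downloaded abstract texts; branches shared by many abstracts then rise to the top, which is what makes the summary browsable.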

Given that the whole concept is pretty straightforward, it would be nice if one of the alternative PubMed search engines provided a similar way of summarizing a user’s query, don’t you think?


Posted by on October 4, 2008 in Papers, Research, Visualization



Skyrails and STRING

Of course I couldn’t resist playing a little with Skyrails after I saw it on the FlowingData blog. Skyrails is a graph visualization system designed with extensibility and an awesome look in mind. All menus can be programmed in an odd-looking but quite easy-to-learn language, which helps in writing an interface customized to particular data.

My quick attempt was to take some sample data from STRING, feed it into Skyrails and see if it made any sense. I chose the first example from the STRING main page, the trpA protein from E. coli K12. The main graph on the trpA interactions page looks as follows:

The same graph in Skyrails:

Of course Skyrails offers a 3D representation and is fully interactive; with a little work one can filter some of the connections out, put images of structures instead of green dots, and so on. The screenshot doesn’t look as clear as STRING’s graph, because Skyrails wasn’t optimized for this kind of use; in practice it’s much clearer. The video below shows the basic interactions with this dataset.
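Filtering connections out before handing a network to a viewer like Skyrails mostly means dropping low-confidence edges. A minimal Python sketch, assuming the interactions were exported from STRING as a tab-separated table with node1, node2 and combined_score columns (scores between 0 and 1); check the column names in your own export. The 0.7 cutoff and the sample rows are made up, and converting the result into Skyrails’ own input format is left out, since that format isn’t described here.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, float]


def load_edges(tsv_text: str, min_score: float = 0.7) -> List[Edge]:
    """Keep only interactions whose combined score reaches min_score."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    # tolerate a header written as "#node1" as well as "node1"
    reader.fieldnames = [name.lstrip("#") for name in reader.fieldnames]
    edges = []
    for row in reader:
        score = float(row["combined_score"])
        if score >= min_score:
            edges.append((row["node1"], row["node2"], score))
    return edges


def adjacency(edges: List[Edge]) -> Dict[str, Set[str]]:
    """Undirected neighbour sets, e.g. for laying out the filtered graph."""
    neighbours: Dict[str, Set[str]] = defaultdict(set)
    for a, b, _ in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return neighbours


if __name__ == "__main__":
    # made-up rows; geneX is a hypothetical low-confidence partner
    sample = ("node1\tnode2\tcombined_score\n"
              "trpA\ttrpB\t0.999\n"
              "trpA\ttrpC\t0.95\n"
              "trpA\tgeneX\t0.4\n")
    edges = load_edges(sample)
    print(edges)
    print(dict(adjacency(edges)))
```

Raising `min_score` is the quickest way to thin out a hairball before any 3D layout is attempted.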

Is it useful? At the moment, not really. It already has lots of features that more mature programs lack (completely programmable menus are a great idea), but usage is still crude and in some cases the flashy effects are distracting. However, it’s worth keeping an eye on Skyrails. First, development is pretty much guaranteed, as the author said he is starting a PhD on this project. Second, the basic roadmap includes features that, again, aren’t present anywhere else, like a client-server architecture (so you can talk to Skyrails from an external application; dynamic, time-aware visualization?). And third, it’s the coolest-looking visualization system I’ve found so far (will it make it into a movie, like Ben Fry’s Genome Valence did?).


Posted by on September 9, 2008 in Software, Visualization



Visualization of internal repeats in proteins (or DNA)

There are a number of protein families that have internal repeats (like TPR, Armadillo, ankyrin etc.). I’m very interested in many of them, for reasons I will explain in another post. Assessing the arrangement of these repeats is straightforward in the majority of cases: most of them tend to occur next to each other, with few or no insertions between them (finding them in the first place is a completely different story). However, there are proteins where internal repeats are separated by other domains or repeats, which can result in a real mess (or, in scientific language, a mosaic-like architecture). A couple of months ago, when I was looking for a visualization method that would give me a quick overview of the internal structure of such proteins, I stumbled across The Shape of Song, a visualization method developed by Martin Wattenberg, a researcher at IBM. It fit my requirements, so I implemented it with some help from Processing (and later added it to a protein analysis server that has a chance of being published next month). The resulting visualization is below:

Internal repeats in a protein

Repeats are colored according to repeat type and connected according to repeat family. If you think about it in terms of the SCOP (Structural Classification of Proteins) hierarchy, colors represent class, while arcs connect superfamilies. The longer and more complicated the analysed sequence, the more useful this approach seems to be; for short proteins, typical domain bubbles would work better.
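The layout itself is easy to reproduce: place each repeat along the sequence as a colored box and draw a semicircular arc between repeats of the same family. The original was written in Processing; what follows is an independent Python sketch that writes SVG. It assumes each repeat is a dict with start, end, type and family keys, and it joins each repeat only to the next repeat of the same family (a simplifying choice for this sketch); the palette and the example repeats are made up.

```python
from typing import Dict, List, Tuple

# Hypothetical palette: one color per repeat type (the SCOP "class" level in the text).
PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd"]


def arcs(repeats: List[Dict]) -> List[Tuple[int, int]]:
    """Pair each repeat with the previous repeat of the same family, in sequence order."""
    order = sorted(range(len(repeats)), key=lambda i: repeats[i]["start"])
    last_seen: Dict[str, int] = {}
    pairs = []
    for i in order:
        family = repeats[i]["family"]
        if family in last_seen:
            pairs.append((last_seen[family], i))
        last_seen[family] = i
    return pairs


def to_svg(repeats: List[Dict], seq_len: int, width: int = 800) -> str:
    """Draw repeats as boxes on a baseline and family links as arcs above it."""
    scale = width / seq_len
    height = width // 2 + 30          # room for an arc spanning the whole sequence
    base = height - 20
    colors: Dict[str, str] = {}
    out = [f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">',
           f'<line x1="0" y1="{base}" x2="{width}" y2="{base}" stroke="#999"/>']
    for r in repeats:
        # a new type gets the next palette color; known types reuse theirs
        color = colors.setdefault(r["type"], PALETTE[len(colors) % len(PALETTE)])
        x = r["start"] * scale
        w = (r["end"] - r["start"]) * scale
        out.append(f'<rect x="{x:.1f}" y="{base - 5}" width="{w:.1f}" height="10" fill="{color}"/>')
    for i, j in arcs(repeats):
        x1 = (repeats[i]["start"] + repeats[i]["end"]) / 2 * scale
        x2 = (repeats[j]["start"] + repeats[j]["end"]) / 2 * scale
        radius = abs(x2 - x1) / 2
        # sweep-flag 1 draws the half circle above the baseline when going left to right
        out.append(f'<path d="M {x1:.1f} {base - 5} A {radius:.1f} {radius:.1f} 0 0 1 '
                   f'{x2:.1f} {base - 5}" fill="none" stroke="#555"/>')
    out.append("</svg>")
    return "\n".join(out)


if __name__ == "__main__":
    # made-up mosaic: two interleaved repeat families
    repeats = [
        {"start": 10, "end": 40, "type": "all-alpha", "family": "TPR"},
        {"start": 60, "end": 90, "type": "all-beta", "family": "WD40"},
        {"start": 110, "end": 140, "type": "all-alpha", "family": "TPR"},
        {"start": 160, "end": 190, "type": "all-beta", "family": "WD40"},
    ]
    print(arcs(repeats))
    with open("repeats.svg", "w") as fh:
        fh.write(to_svg(repeats, seq_len=200))
```

Because arcs only join neighbours within a family, interleaved families show up as overlapping arcs, which is exactly the mosaic pattern this view is meant to expose.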

People who work with genomic sequences may notice the similarity of this approach to Circos, developed by Martin Krzywinski (whose work I really admire, especially HDTR). The idea behind both is basically the same, but I had never thought about straightening that circle until I saw The Shape of Song. My thinking is sometimes dramatically schematic…

