Category Archives: Visualization

CLANS: a Java tool for cluster analysis of sequences

As frequent visitors to this blog have already noticed, I am a big fan of data visualization tools. Today I would like to point you to a Java program called CLANS (CLuster ANalysis of Sequences), developed by my former colleague Tancred Frickey. CLANS runs (PSI)BLAST on your sequences, all vs all, and clusters them in 2D or 3D according to their similarity. This method allows rapid classification of huge datasets and has the advantage over, let’s say, a phylogenetic tree that you can quickly assess the clustering results visually (I cannot imagine making any sense of a phylogenetic tree with 1500 branches, while the graphical output, as in the animation below, is pretty easy to read).

CLANS animation

The beauty of the idea behind CLANS is that you can apply this method to almost any dataset that can be translated into all-vs-all relations. The CLANS page has examples from protein clustering and microarray analysis, and (the one I like the most) an image showing how the standard amino acids cluster in space according to BLOSUM62.
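To make the all-vs-all idea concrete, here is a toy sketch of a force-directed layout in Python. It is not CLANS’s code: the function name layout, the similarity values and the force constants are all made up for illustration. Pairs with high similarity attract each other, every pair repels a little, and after a number of iterations similar items end up close together in 2D.

```python
import math
import random


def layout(similarity, iterations=300, seed=0):
    """Place items in 2D so that similar items end up close together.

    similarity[i][j] is a symmetric score in [0, 1]; 0 means "no hit".
    """
    rng = random.Random(seed)
    n = len(similarity)
    pos = [[rng.random(), rng.random()] for _ in range(n)]
    repulsion = 0.01   # hypothetical constant: keeps points from collapsing
    step_size = 0.1    # hypothetical constant: damps the updates

    for it in range(iterations):
        max_move = 0.1 * (1.0 - it / iterations)   # simple cooling schedule
        disp = [[0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                dx = pos[j][0] - pos[i][0]
                dy = pos[j][1] - pos[i][1]
                dist = math.hypot(dx, dy) or 1e-9
                # Positive force pulls the pair together, negative pushes apart.
                force = similarity[i][j] * dist - repulsion / dist
                fx = step_size * force * dx / dist
                fy = step_size * force * dy / dist
                disp[i][0] += fx
                disp[i][1] += fy
                disp[j][0] -= fx
                disp[j][1] -= fy
        for i in range(n):
            # Cap each point's move so the layout settles as it cools.
            length = math.hypot(disp[i][0], disp[i][1]) or 1e-9
            scale = min(length, max_move) / length
            pos[i][0] += disp[i][0] * scale
            pos[i][1] += disp[i][1] * scale
    return pos


# Two made-up families of three "sequences" each: strong hits inside a
# family, no hits between families.
families = [0, 0, 0, 1, 1, 1]
similarity = [[1.0 if a == b else 0.0 for b in families] for a in families]
coords = layout(similarity)
```

In a real run the similarity matrix would come from the all-vs-all BLAST results instead of being typed in by hand; any other pairwise score (for example correlations between microarray profiles) can be plugged in the same way.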



Tracking changes in a multiple sequence alignment

I had a few free hours this weekend, so I hacked together a couple of scripts that in theory could help me visualize changes between subfamilies in a protein multiple sequence alignment. In essence, I took the alignment, chose a master sequence that corresponds to a known structure, removed all columns with gaps in the master sequence, and visualized fragments of the alignment (a sliding window of 15 sequences) with WebLogo, a program for preparing sequence logos from alignments. In the video below you can see:

  • two boxes showing the same template structure (the second is just rotated); the size of each C-alpha atom corresponds to the overall conservation at that position; the first few residues have no corresponding positions in the alignment
  • a sequence logo of the current alignment window
  • a sequence logo of the whole alignment, as a reference

There are several things I’m not yet happy with. First of all, the changes shown on the structure are hardly readable, even in a video of much higher quality (I should probably use Chimera’s “worm” representation). Second, I have no information about which species or proteins I’m looking at at any given moment (maybe another box with highlights on a species tree of the family?). Also, I should remove some redundancy from the alignment; sometimes the sliding window contains copies of the same protein. But overall it looks promising enough to convince me to spend a few more hours on this small project. However, I would probably do the final version with Processing.
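For the curious, the preprocessing described above (dropping columns that are gaps in the master sequence, then sliding a window over the sequences) can be sketched in a few lines of Python. This is a reconstruction of the idea, not the original scripts: the function names, the gap characters and the conservation measure (fraction of sequences sharing the most common residue) are assumptions.

```python
from collections import Counter

GAPS = "-."  # assumed gap characters


def drop_master_gap_columns(alignment, master_index=0):
    """Keep only the columns where the master sequence has a residue."""
    master = alignment[master_index]
    keep = [i for i, ch in enumerate(master) if ch not in GAPS]
    return ["".join(seq[i] for i in keep) for seq in alignment]


def sliding_windows(alignment, size=15, step=1):
    """Yield consecutive groups of `size` sequences from the alignment."""
    last_start = max(len(alignment) - size, 0)
    for start in range(0, last_start + 1, step):
        yield alignment[start:start + size]


def column_conservation(alignment):
    """Per column: fraction of sequences carrying the most common residue."""
    scores = []
    for column in zip(*alignment):
        residues = [ch for ch in column if ch not in GAPS]
        top = Counter(residues).most_common(1)
        scores.append(top[0][1] / len(column) if top else 0.0)
    return scores


# Toy alignment; the first sequence plays the role of the master.
toy = ["AC-GT", "A-TGT", "ACTG-"]
trimmed = drop_master_gap_columns(toy)
windows = list(sliding_windows(trimmed, size=2))
conservation = column_conservation(trimmed)
```

Each window can then be written out as a small alignment file and handed to WebLogo, and the conservation values can be mapped onto atom sizes in the structure viewer.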



Linux screencasting software

Just a short note today. If you are looking for screencasting software for your Linux box, I recommend two titles: recordMyDesktop and Wink.

The first one is a typical desktop activity recorder: you mark the capture area and that’s all. No fancy options, just a pure video stream from your screen. The video quality is very good (Theora and Vorbis codecs).

Wink is a screencaster oriented towards preparing interactive tutorials and presentations. You can record screen activity, but also pause the video, add text boxes with explanations, and add buttons that wait for user interaction (for example “Next” buttons). Output formats are SWF, standalone EXE (for Windows machines only), PDF, PostScript and HTML. There are no typical video files, which is not really a problem, as the frame rate of the recording is pretty low anyway. Another issue is that it apparently cannot properly record windows rendered with OpenGL (like molecular viewers): the window’s interior comes out black. Even with these limitations I think Wink is better for preparing tutorials (for example on using some online bioinformatics service) than typical screencasting software.


Posted on January 15, 2008 in Software, Visualization



Protein cartoons with Pymol

Here is a short tutorial on protein cartoons with Pymol. I picked hemoglobin as an example and focused only on the cartoon representation of the protein, so keep in mind that this does not explore all the options of the software. Also, since I’m blind to stereo images, I’m not sure whether all of the following tips make sense with a stereo representation of molecules.

  1. Change protein representation to cartoons.
  2. Turn off depth cue (under “Display”): unless you want to put an accent on some part of the protein this option is unnecessary, because it’s hard to get a 3D feeling from a 5 cm by 5 cm print.
  3. Turn off specular reflections (under “Display”): most likely the printer can show fewer colors than your screen and will render specular reflections as harsh white blobs.
  4. Change the background color to white (as above): that’s obvious, a black background is for viewing on screen.
  5. Change the view to orthoscopic (as above): maybe it’s a matter of personal taste, but the perspective view (the default in Pymol) creates unnecessary distortions that, again, do not help shape perception on a small print.
  6. Turn on the “fancy helices” option (“Settings/Cartoons”): this renders helices with tubular edges, like in Molscript (leave it off if you don’t like it).
  7. Turn on the “smooth loops” option (as above): the arrangement of the secondary structure elements becomes much easier to perceive.
  8. Turn on the “highlight color” option (as above): again, it’s a matter of personal taste; this option makes the internal surface of helices grey (you can change the color via the command line).
  9. Turn shadows off (“Settings/Rendering/Shadows”): I feel that on a small print they only disturb the image.

I also turn on the matte finish for the cartoons. While it doesn’t necessarily look better on screen, in print it helps to mask printing artefacts (like the raster) when the image is viewed from a normal distance.

Then you can test these settings by clicking “Ray”. If you like the final image, save it, read its dimensions and multiply them by 3. Then type ray multipliedX, multipliedY into the command-line box and press Enter.
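If you prefer a script to clicking through menus, the numbered steps above map roughly onto calls to Pymol’s Python cmd module. Treat the following as a sketch under assumptions: the setting names are the ones I believe correspond to the menu entries, the grey shade and the output file name are arbitrary, and the matte finish is left out because it is not clear which settings that menu entry changes. The functions take the cmd object as an argument so that the file itself does not need Pymol to be importable.

```python
def apply_print_settings(cmd):
    """Apply the print-oriented cartoon settings through a PyMOL cmd object."""
    cmd.hide("everything")
    cmd.show("cartoon")                           # step 1: cartoon representation
    cmd.set("depth_cue", 0)                       # step 2: no depth cue
    cmd.set("specular", 0)                        # step 3: no specular reflections
    cmd.bg_color("white")                         # step 4: white background
    cmd.set("orthoscopic", 1)                     # step 5: orthoscopic view
    cmd.set("cartoon_fancy_helices", 1)           # step 6: fancy helices
    cmd.set("cartoon_smooth_loops", 1)            # step 7: smooth loops
    cmd.set("cartoon_highlight_color", "grey50")  # step 8: grey inner helix surface
    cmd.set("ray_shadows", 0)                     # step 9: no shadows


def scaled_size(width, height, factor=3):
    """Multiply the on-screen image dimensions for the final render."""
    return width * factor, height * factor


def render(cmd, width, height, filename="cartoon.png"):
    """Ray-trace at three times the given size and save the image."""
    big_width, big_height = scaled_size(width, height)
    cmd.ray(big_width, big_height)
    cmd.png(filename)
```

Inside a Pymol session you would do from pymol import cmd, load your structure, and then call apply_print_settings(cmd) followed by render(cmd, 640, 480), where 640 and 480 stand for whatever dimensions your test image had.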

Below I embedded a video showing more or less what I’ve just described.

Feel free to comment if you have any suggestions on improving this process.


Posted on December 12, 2007 in Software, Visualization



My gallery of images

Readers of this blog who rely on RSS feeds may not have noticed that I have put up a separate page containing computer-generated images of various molecules: Molecular renderings. Any comments, suggestions or critique are always welcome.

From time to time I’ll post new images there – from time to time I need to remind myself that science is pretty too :).


Posted on October 28, 2007 in Comments, Visualization



Qutemol rendering

The impressive thing about Qutemol’s rendering with ambient occlusion is that the method works in real time. I’ve put up a small video showing the difference between typical rendering and Qutemol’s method (well, I hope it’s visible; the quality of this video is pretty bad, but it’s my first file posted on YouTube).

The bad thing about Qutemol is that so far it works mostly on Windows only (I’m not the only person who has had problems running it on OS X). Linux users are out of luck: Qutemol needs hardware support for 3D rendering, so a virtual machine with Windows is not a solution.


Posted on October 26, 2007 in bioinformatics, Software, Visualization



Blender in visualization of molecules

Yes, you can use Blender to prepare figures for your next paper, and the results will certainly look different from the ones obtained with standard software (hemoglobin [1HBG] as an example below)… But given the amount of work and the really steep learning curve (at least for somebody trying it for the very first time), I would not recommend Blender that much… 🙂


UPDATE: if you are looking for a way to import a PDB file into Blender, some instructions are at the bottom of this page.


Posted on October 17, 2007 in Software, Visualization



Survey of domain bubbles in protein sequence analysis

One of the key steps in the analysis of an unknown protein sequence is identifying the domains that constitute the protein. There are many online tools that will search for known domains or identify them ab initio. Usually only the former present their results in a graphical form called “domain bubbles”. Below you can find examples of common approaches to presenting the results of a sequence annotation. Since most of them use the same domain definitions, the names of the hits are the same in almost all cases.

One note: it’s not a comparison of the servers’ performance. The sequence is the same in all cases, but that was to show the differences between visualization methods, not the quality of the annotation.


SMART domain bubbles

This is an example of sequence annotation by the SMART server. Domains are colored according to their source (SMART has a collection of domain definitions from various sources), and non-domain sequence features (like transmembrane segments, low-complexity regions and disorder) are clearly differentiated from domains. The picture is generated with GIMP and its Perl-Fu extension, and the script is available for download from the homepage of Ivica Letunic.


PFAM domain bubbles

The color scheme used by PFAM is quite clear: the same domains have the same colors. PFAM (as well as the following two servers) shows partial hits in the picture. These are cases where the similarity between the domain and the protein spans only a fragment of the domain (which may indicate many things, like genomic rearrangements, frameshifts or a weak domain definition). The PFAM script can also plot many other sequence features onto the picture. You can use the script with your own annotation data here; the input is coded as an XML file conforming to PFAM’s schema.


CDD domain bubbles

CDD looks pretty similar to PFAM and shares some visual features. However, the CD-Search page shows more than one line of hits graphically. Usually the first line contains the best hits for a particular fragment, and the following lines show overlapping hits with worse scores. Only the first line is shown here.


HHpred domain bubbles

OK, I may be biased here, since HHpred is coded by my former colleagues, but I really like the domain bubbles from this server. The color scheme is different from that of the other servers: bubbles are colored according to score, from red (the best) to blue (the worst). It also shows partial and overlapping hits (only a few are shown here; the actual results page spans a few screens in my browser). Like CDD, HHpred does not plot any sequence features other than domains.
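The score-based coloring is easy to play with yourself. Below is a minimal sketch in Python that draws domain bubbles as rounded rectangles in an SVG string and blends the fill linearly from blue (worst score) to red (best score). None of this is HHpred’s actual code or palette: the function names, the hit dictionary layout, the assumption that a higher score is better, and the two example hits are all made up for illustration.

```python
def score_to_color(score, worst, best):
    """Blend linearly from blue (worst score) to red (best score)."""
    if best == worst:
        t = 1.0
    else:
        t = (score - worst) / (best - worst)
    t = min(max(t, 0.0), 1.0)
    red = round(255 * t)
    blue = round(255 * (1 - t))
    return "#{:02x}00{:02x}".format(red, blue)


def domain_bubbles_svg(protein_length, hits, width=600, height=40):
    """Return an SVG string with one rounded box per domain hit."""
    scale = width / protein_length
    scores = [hit["score"] for hit in hits]
    worst, best = (min(scores), max(scores)) if scores else (0.0, 0.0)
    parts = [
        '<svg xmlns="http://www.w3.org/2000/svg" width="{}" height="{}">'.format(
            width, height),
        # Thin line representing the whole protein sequence.
        '<line x1="0" y1="{0}" x2="{1}" y2="{0}" stroke="black"/>'.format(
            height / 2, width),
    ]
    for hit in hits:
        x = hit["start"] * scale
        box_width = (hit["end"] - hit["start"]) * scale
        parts.append(
            '<rect x="{:.1f}" y="5" width="{:.1f}" height="{}" rx="10" '
            'fill="{}"><title>{}</title></rect>'.format(
                x, box_width, height - 10,
                score_to_color(hit["score"], worst, best), hit["name"]))
    parts.append("</svg>")
    return "\n".join(parts)


# Made-up hits on a made-up 300-residue protein.
example_hits = [
    {"name": "DomainA", "start": 10, "end": 120, "score": 95.0},
    {"name": "DomainB", "start": 150, "end": 280, "score": 40.0},
]
svg = domain_bubbles_svg(300, example_hits)
```

Saving the returned string to a file with an .svg extension gives a picture you can open in a browser; partial or overlapping hits would need extra rows, which this sketch does not attempt.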

So these are the major domain annotation servers that present prediction results in a nice graphical way (there are many others, but not all of them use this simple way of presenting data; InterPro is one example). Do these approaches, which are after all pretty similar, explore all possible ways of presenting the domain structure of a protein? I don’t think so. Watch this site, I may have something to add pretty soon.


Posted on September 24, 2007 in Software, Visualization



Visualization software for molecular assemblies by Thomas Goddard and Thomas Ferrin

Recently, among the “in press” articles from Current Opinion in Structural Biology, I found a paper by Thomas Goddard and Thomas Ferrin about software for visualizing large molecular assemblies. Even though the focus of the paper is not the preparation of publication-quality pictures, the software cited there sounds familiar: Chimera, Pymol, VMD, Qutemol. The authors also mention VISION, a visual programming environment capable of presenting molecular data. The Molecular Graphics Lab at the Scripps Research Institute, which works on VISION, has some other interesting tools, including PMV (Python Molecular Viewer), which I hope to cover some other time.

Anyway, this paper reminded me of something. I did mention that the new version of Chimera can produce input for Povray, but I did not realize that this is not the only change in this version. After upgrading to the current version I found that it also has several “presets” of settings suitable for on-screen viewing or for producing a figure. That makes preparing a figure much faster, and if you still don’t like the results you get good starting points for some tweaking.

Test image from Chimera


Posted on September 1, 2007 in Papers, Visualization


RSS is the new WWW

I accidentally found a brilliant way of summarizing news buzz. The picture below documents phrases of the form “x is the new y” collected from various sources in 2005. I wonder what this kind of study would look like in science…

Here is the source.


Posted on August 21, 2007 in Visualization