Category Archives: Software

CLANS – Java tool for cluster analysis of sequences

As frequent visitors to this blog have already noticed, I am a big fan of data visualization tools. Today I would like to point you to a Java program called CLANS (CLuster ANalysis of Sequences), developed by my former colleague Tancred Frickey. CLANS runs (PSI)BLAST on your sequences, all vs all, and clusters them in 2D or 3D according to their similarity. This method allows for rapid classification of huge datasets and has an advantage over, let’s say, a phylogenetic tree: one can quickly assess the results of the clustering visually (I cannot imagine making any sense of a phylogenetic tree with 1500 branches, while the graphical output, as in the animation below, is pretty easy to read).

CLANS animation

The beauty of the idea behind CLANS is that you can apply this method to almost any dataset that can be translated into all-vs-all relations. The CLANS page has examples from protein clustering and microarray analysis, and (the one I like the most) an image showing how the standard amino acids cluster in space according to BLOSUM62.
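To make the “all-vs-all relations to positions in space” idea more concrete, here is a minimal sketch of a force-directed layout in Python. It is not the CLANS implementation: the function name `layout_2d`, the force constants and the step clamp are all assumptions chosen for illustration. Similar items attract each other, every pair repels slightly, and after a number of iterations related items end up in the same neighbourhood.

```python
import math
import random


def layout_2d(similarity, steps=200, seed=0, attract=0.05, repel=0.01, max_step=0.1):
    """Place items in 2D so that similar items end up close together.

    similarity[i][j] is a symmetric, non-negative score (higher = more alike).
    All constants are illustrative assumptions, not values taken from CLANS.
    """
    rng = random.Random(seed)
    n = len(similarity)
    pos = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
    for _ in range(steps):
        moves = [[0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                dx = pos[j][0] - pos[i][0]
                dy = pos[j][1] - pos[i][1]
                dist = math.hypot(dx, dy) or 1e-9
                # Attraction grows with similarity and distance;
                # repulsion acts on every pair and fades with distance.
                force = attract * similarity[i][j] * dist - repel / dist
                moves[i][0] += force * dx / dist
                moves[i][1] += force * dy / dist
        for i in range(n):
            length = math.hypot(moves[i][0], moves[i][1])
            # Clamp the step so that near-overlapping points do not fly apart.
            scale = min(1.0, max_step / length) if length > 0 else 0.0
            pos[i][0] += moves[i][0] * scale
            pos[i][1] += moves[i][1] * scale
    return pos


# Toy input: items 0 and 1 are similar to each other, and so are items 2 and 3.
toy = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]
coords = layout_2d(toy)
```

For sequences, the similarity matrix would presumably be derived from the pairwise BLAST results; how exactly scores are turned into forces is up to the tool, so treat the numbers above purely as a toy.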



Tracking changes in a multiple sequence alignment

I had a few free hours this weekend, so I hacked together a couple of scripts that in theory could help me visualize changes between subfamilies in a protein multiple sequence alignment. In essence, I took the alignment, chose a master sequence that corresponds to a known structure, removed all columns with gaps in the master sequence, and visualized fragments of the alignment (a sliding window of 15 sequences) with WebLogo, software for preparing sequence logos from alignments. In the video below you can see:

  • two boxes showing the same template structure (the second is just rotated); the size of the C-alpha atoms corresponds to overall conservation at that position; the first few residues do not have corresponding positions in the alignment
  • sequence logo of the current alignment window
  • sequence logo of the whole alignment – as a reference
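The preprocessing described above (dropping the columns that are gaps in the master sequence, then sliding a window of 15 sequences down the alignment) can be sketched in a few lines of Python. This is not the original script: the function names, the choice of gap characters and the toy alignment are assumptions for illustration, and the alignment is taken to be a plain list of equal-length strings.

```python
def drop_master_gap_columns(alignment, master_index=0, gap_chars="-."):
    """Remove every column in which the master sequence has a gap."""
    master = alignment[master_index]
    keep = [i for i, ch in enumerate(master) if ch not in gap_chars]
    return ["".join(seq[i] for i in keep) for seq in alignment]


def sliding_windows(alignment, size=15, step=1):
    """Yield consecutive blocks of `size` sequences (rows) of the alignment.

    If the alignment has fewer than `size` rows, a single shorter block is yielded.
    """
    last_start = max(len(alignment) - size, 0)
    for start in range(0, last_start + 1, step):
        yield alignment[start:start + size]


# Toy alignment; the first row plays the role of the master sequence.
toy_alignment = ["AC-GT", "ACTGT", "A--GA"]
trimmed = drop_master_gap_columns(toy_alignment)
windows = list(sliding_windows(trimmed, size=2))
```

Each window could then be written to a file and handed to the logo software, one logo per frame of the video.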

There are several things I’m not yet happy with. First of all, the visualization of changes on the structure is hardly readable, even with video of much higher quality (I should probably do it with Chimera’s “worm” representation). Second, I have no information about which species/proteins I’m looking at at any given moment (another box with highlights on a species tree of the family?). Also, I should remove some redundancy from the alignment; sometimes the sliding window contains copies of the same protein. But overall it looks promising enough to convince me to spend a few more hours on this small project. However, I would probably do the final version with Processing.



DNASIS SmartNote – online notebook for bioinformatics analysis

I recently found a video showing a new web-based application for scientists: DNASIS SmartNote, an online notebook for sequence analysis, project organisation, and sharing results, thoughts and data with other users/collaborators.

This service is provided by MiraiBio, which belongs to the Hitachi Software group. The company provides instruments and software for biological research.

As soon as I resolve the issues with obtaining a working SmartNote account (so far I cannot log in), I’ll post more about this service.


Posted on January 19, 2008 in bioinformatics, Services, Software



Linux screencasting software

Just a short note today. If you are looking for screencasting software for your Linux box, I recommend two titles: recordMyDesktop and Wink.

The first one is a typical desktop activity recorder: you mark the capture area and that’s all. No fancy options, just a pure video stream from your screen. The video has very good quality (Theora and Vorbis codecs).

Wink is a screencaster oriented towards preparing interactive tutorials and presentations. You can record screen activity, but also pause the video and add text boxes with explanations or buttons that wait for user interaction (for example “Next” buttons). Output formats are SWF, standalone EXE (for Windows machines only), PDF, PostScript and HTML. There are no typical video files, which is not really a problem, as the framerate of the recording is pretty low anyway. Another issue is that it apparently cannot properly record windows rendered with OpenGL (like molecular viewers): the window’s interior comes out black. Even with these limitations, I think Wink is better for preparing tutorials (for example on the usage of some online bioinformatics service) than typical screencasting software.


Posted on January 15, 2008 in Software, Visualization



Protein cartoons with Pymol

Here is a short tutorial on protein cartoons with Pymol. I picked hemoglobin as an example and focused only on the cartoon representation of the protein, so keep in mind that it does not explore all the options of this software. Also, since I’m blind to stereo images, I’m not sure whether all of the following tips make sense with a stereo representation of molecules.

  1. Change protein representation to cartoons.
  2. Turn off depth cue (under “Display”) – unless you want to emphasize some part of the protein, this option is unnecessary, because it’s hard to get a 3D feeling from a 5 cm by 5 cm print.
  3. Turn off specular reflections (under “Display”) – most likely the printer can show fewer colors than your screen and will render specular reflections as harsh white blobs.
  4. Change the background color to white (as above) – that’s obvious; a black background is for viewing on screen.
  5. Change the view to orthoscopic (as above) – maybe it’s a matter of personal taste, but the perspective view (the default in Pymol) creates unnecessary distortions that, again, do not help shape perception on a small print.
  6. Turn on the “fancy helices” option (“Settings/Cartoons”) – this renders helices with tubular edges like in Molscript (leave it off if you don’t like it).
  7. Turn on the “smooth loops” option (as above) – the arrangement of the secondary structure elements becomes much easier to perceive.
  8. Turn on the “highlight color” option (as above) – again, it’s a matter of personal taste; this option makes the internal surface of helices grey (you may change the color via the command line).
  9. Turn shadows off (“Settings/Rendering/Shadows”) – I feel that on a small print they only disturb the image

What I also do is turn on a matte finish on the cartoons. While it doesn’t necessarily look better on the screen, in print it helps to mask printing artefacts (like raster) when viewed from a normal viewing distance.

Then you can test these settings by clicking “Ray”. If you like the final image, save it, read its dimensions and multiply them by 3. Then type into the command-line box: ray multipliedX, multipliedY and press Enter.
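If you prefer a script to clicking through the menus, the steps above can be collected into a Pymol command file. Below is a rough sketch in Python that only builds the text of such a file. The setting names are my best guess at the command-line equivalents of the menu options and may differ between Pymol versions, so check them against your installation. The helper name `print_ready_pml`, the `grey50` highlight color and the output file name are assumptions for illustration, and the matte finish is left out.

```python
def print_ready_pml(width, height, scale=3, outfile="figure.png"):
    """Return the text of a Pymol script with print-oriented cartoon settings.

    width and height are the dimensions of the test image in pixels;
    the final ray-traced image is `scale` times larger in each direction.
    """
    commands = [
        "hide everything",
        "show cartoon",                         # step 1: cartoon representation
        "set depth_cue, 0",                     # step 2: no depth cue
        "set specular, 0",                      # step 3: no specular reflections
        "bg_color white",                       # step 4: white background
        "set orthoscopic, 1",                   # step 5: orthoscopic view
        "set cartoon_fancy_helices, 1",         # step 6: fancy helices
        "set cartoon_smooth_loops, 1",          # step 7: smooth loops
        "set cartoon_highlight_color, grey50",  # step 8: grey inner helix surface
        "set ray_shadows, 0",                   # step 9: no shadows
        f"ray {width * scale}, {height * scale}",
        f"png {outfile}",
    ]
    return "\n".join(commands) + "\n"


# Example: the test image was 640 x 480 pixels.
script_text = print_ready_pml(640, 480)
```

Save the returned text to a file such as `print_ready.pml` and run it in Pymol after loading your structure.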

Below I embedded a video showing more or less what I’ve just described.

Feel free to comment if you have any suggestions on improving this process.


Posted on December 12, 2007 in Software, Visualization



Software portability and virtual appliances

Bioinformatics can mean developing new algorithms for biological data analysis. Scientists who code and release software often face the issue of making the program portable. I see three clear solutions to that issue. First, one can spend a lot of time porting the source to other platforms (plus testing, fixing and yelling at incompatibilities). This is not easy even among Linux distributions (remember the broken HMMER binary packages in Debian and Ubuntu?), not to mention porting to OS X or Windows. What else can we do? The second solution is to build a web interface around the software. This is extremely popular and makes almost everyone’s life easier. However, there are drawbacks: maintenance of the service (it costs money, and grant agencies are not willing to spend a dime on it) and batch access requests from some users (there’s always somebody who wants to feed 5 million sequences or 50 thousand structures into your software). The third solution can address at least the second of these drawbacks: one can create a virtual machine with a proper environment for the developed software and release them together. Yes, release the software together with its whole environment. And it’s not as difficult as it seems.

We are surrounded by computing clouds, internet companies that do not own a single server, and virtual appliances for quick installation of, let’s say, a blog server with WordPress, without any knowledge of the software requirements. Virtual appliances, that is, complete virtual machines, can contain already configured software (the most trivial example would be LAMP: Linux, Apache, MySQL and PHP). So far I have found only one such appliance for bioinformatics: it’s called DNALinux Virtual Desktop Edition and contains, among others, BLAST, EMBOSS, Pymol, BioPerl and Biopython. Since VMware Server is free (although registration is required), this makes a pretty nice alternative for those with Windows machines, as it allows running Linux in a window at a speed of ca. two-thirds of a native system. VMware software can create a virtual machine out of a working system, but I wouldn’t recommend that, as we usually have much more software installed than is needed to run our own programs. So creating a virtual appliance for, let’s say, BLAST would mean installing a fresh copy of our favourite Linux under VMware Server with nothing more than the necessary libraries, a copy of the BLAST executables and possibly a web interface. Voilà. Virtual appliance for BLAST, anybody?

While it may seem a bit of an overkill at first, I don’t think it is in the long run. Porting the software to other operating systems is only part of the story; maintenance to keep it working with newer versions of the libraries is another. There are a lot of programs that have not been actively maintained for a long time. I have two quick examples where the virtual appliance approach would save them from being forgotten: PovChem (rendering of molecules, depends on some ancient libraries) and MACAW (it doesn’t work on anything but Mac OS 9; the Windows version crashes the system). OK, MACAW may not be a fair example, as we face legal issues with the operating system here, but I believe any heavy software user has already lost count of how many times they gave up on trying some well-thought-out software because of its requirements.

Have a look and give it a try. I’m already running two operating systems (goodbye, dual-boot), and this is definitely the future for our desktops, which already have too much processing power. But honestly, I dream about the day when all possible bioinformatics algorithms and biological data will be available in some computing cloud, and running Taverna will be a good alternative to all-day data munging.



Posted on November 27, 2007 in bioinformatics, Services, Software



Wolfram Mathematica 6 – no New Kind of Science (yet)

Not so long ago Animesh Sharma pointed to a fairly old interview with Stephen Wolfram about the book “A New Kind of Science” and asked whether its concepts concerning a biological framework had made their way into the Mathematica software.

I’ve just returned from the Poland Mathematica Conference, and I can answer that question: no, they didn’t. While there were people using Modelica and Mathematica to model some stochastic processes in cells, Mathematica itself does not provide much support for any sophisticated description of biological mechanisms. The implications of the concepts from A New Kind of Science looked very promising; it’s a pity that we are not given tools to verify them ourselves.


Posted on October 30, 2007 in bioinformatics, Comments, Software



Qutemol rendering

The impressive thing about Qutemol’s rendering with ambient occlusion is that it is done in real time. I’ve put up a small video showing the difference between typical rendering and Qutemol’s method (well, I hope it’s visible; the quality of this video is pretty bad, but it’s my first file posted on YouTube).

The bad thing about Qutemol is that so far it works mostly on Windows only (I’m not the only person having problems running it on OS X). Linux users are out of luck: Qutemol needs hardware support for 3D rendering, so a virtual machine with Windows is not a solution.


Posted on October 26, 2007 in bioinformatics, Software, Visualization



Blender in visualization of molecules

Yes, you can use Blender to prepare figures for your next paper, and the results will certainly look different from the ones obtained with standard software (hemoglobin [1HBG] as an example below)… But given the amount of work and the really steep learning curve (at least for somebody who tries it for the very first time), I would not recommend Blender that much… 🙂


UPDATE: if you are looking for a way to import a PDB file into Blender, some instructions are at the bottom of this page.


Posted on October 17, 2007 in Software, Visualization



Healia and third party PubMed/Medline tools

David Rothman describes Healia, an easy-to-use interface to PubMed. But it’s just one of many third-party PubMed/Medline tools David has described. Check out his posts related to the one about Healia.

clipped from
Healia’s PubMed search (currently in beta) might be one of the best interfaces available for clinicians who don’t have the search skills to effectively search PubMed through its native interface.

Some notable features:

Automatic “AND”
By default, Healia inserts a boolean “AND” between all search terms (as Google does). While the expert searcher might find this unpleasantly limiting, it is a familiar behavior for many clinical searchers who view Google as their ideal, preferred search interface.


Posted on October 1, 2007 in Clipped, PubMed, Software