Category Archives: bioinformatics

Tracking changes in a multiple sequence alignment

I had a few free hours this weekend, so I hacked together a couple of scripts that, in theory, could help me visualize changes between subfamilies in a protein multiple sequence alignment. In essence, I took the alignment, chose a master sequence that corresponds to a known structure, removed all columns with gaps in the master sequence, and visualized fragments of the alignment (a sliding window of 15 sequences) with WebLogo, software for preparing sequence logos from alignments. In the video below you can see:

  • two boxes showing the same template structure (the second is just rotated); the size of the C-alpha atoms corresponds to overall conservation at that position; the first few residues do not have corresponding positions in the alignment
  • a sequence logo of the current alignment window
  • a sequence logo of the whole alignment, as a reference
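
The preprocessing described above can be sketched in a few lines of Python. This is not the actual set of scripts, just a minimal illustration under some assumptions: the alignment is held as a dictionary of equal-length strings with "-" as the gap character, the function names are made up for this example, and conservation is taken as one minus the normalized Shannon entropy of a column, which is only one of several reasonable definitions.

```python
from collections import Counter
from math import log2

GAP = "-"  # assumed gap character


def strip_master_gaps(alignment, master_id):
    """Drop every column in which the master sequence has a gap."""
    master = alignment[master_id]
    keep = [i for i, ch in enumerate(master) if ch != GAP]
    return {name: "".join(seq[i] for i in keep)
            for name, seq in alignment.items()}


def sliding_windows(names, size=15, step=1):
    """Yield consecutive groups of `size` sequence names (one group if fewer)."""
    last_start = max(len(names) - size, 0)
    for start in range(0, last_start + 1, step):
        yield names[start:start + size]


def column_conservation(seqs, alphabet_size=20):
    """Per-column score in [0, 1]: 1 minus normalized Shannon entropy.

    Gaps are ignored; a column made only of gaps scores 0.0.
    """
    max_entropy = log2(alphabet_size)
    scores = []
    for column in zip(*seqs):
        counts = Counter(ch for ch in column if ch != GAP)
        total = sum(counts.values())
        if total == 0:
            scores.append(0.0)
            continue
        entropy = -sum((n / total) * log2(n / total) for n in counts.values())
        scores.append(1.0 - entropy / max_entropy)
    return scores
```

Each window of names returned by `sliding_windows` would then be written out as a small alignment and passed to the logo software, while the scores from `column_conservation` on the whole alignment could drive the size of the C-alpha atoms.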

There are several things I’m not yet happy with. First of all, the visualization of changes on the structure is hardly readable, even with a video of much higher quality (I should probably do it with Chimera’s “worm” representation). Second, I have no information about which species/proteins I’m looking at right now (another box with highlights on a species tree of the family?). Also, I should remove some redundancy from the alignment; sometimes the sliding window contains copies of the same protein. But overall it looks promising enough to convince me to spend a few more hours on this small project. However, I would probably do the final version with Processing.
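
For the redundancy problem, the simplest possible step is to drop exact copies before building the windows. The sketch below assumes the alignment is a dictionary mapping sequence names to aligned strings; the function name is hypothetical. It only catches identical sequences; near-identical ones would need clustering at some identity threshold, which this example does not attempt.

```python
def drop_duplicate_sequences(alignment):
    """Keep the first occurrence of each distinct aligned sequence.

    Comparison is case-insensitive and the original order is preserved.
    """
    seen = set()
    unique = {}
    for name, seq in alignment.items():
        key = seq.upper()
        if key in seen:
            continue  # an identical sequence was already kept
        seen.add(key)
        unique[name] = seq
    return unique
```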



DNASIS SmartNote – online notebook for bioinformatics analysis

I’ve recently found a video showing a new web-based application for scientists. This is DNASIS SmartNote, an online notebook for sequence analysis, project organisation and sharing results, thoughts and data with other users/collaborators.

This service is provided by MiraiBio, which belongs to the Hitachi Software group. The company provides instruments and software for biological research.

As soon as I resolve the issues with obtaining a working SmartNote account (so far I cannot log in), I’ll post more about this service.


Posted by on January 19, 2008 in bioinformatics, Services, Software



Software portability and virtual appliances

Bioinformatics can mean developing new algorithms for biological data analysis. Scientists who code and release software often face the problem of making the program portable. I see three clear solutions. First, one can spend a lot of time porting the source to other platforms (plus testing, fixing and yelling at incompatibilities). This is not easy even across Linux distributions (remember the broken HMMER binary packages in Debian and Ubuntu?), not to mention porting to OS X or Windows. What else can we do? The second solution is to build a web interface around the software. This is extremely popular and makes almost everyone’s life easier. However, there are drawbacks: maintenance of the service (it costs money, and grant agencies are not willing to spend a dime on it) and batch access requests from some users (there’s always somebody who wants to feed 5 million sequences or 50 thousand structures into your software). The third solution can address at least the second of these drawbacks: one can create a virtual machine with a proper environment for the developed software and release them together. Yes, release the software together with the whole environment. And it’s not as difficult as it seems.

We now live with computing clouds, internet companies that do not own a single server, and virtual appliances for quick installation of, let’s say, a blog server with WordPress, without any knowledge of the software requirements. Virtual appliances, that is, complete virtual machines, can contain already configured software (the most trivial example would be LAMP: Linux, Apache, MySQL and PHP). So far I have found only one such appliance for bioinformatics: it’s called DNALinux Virtual Desktop Edition and contains, among others, BLAST, EMBOSS, PyMOL, BioPerl and Biopython. Since VMware Server is free (although registration is required), this makes a pretty nice alternative for those with Windows machines, as it allows running windowed Linux at about two-thirds of the speed of a native system. VMware software can create a virtual machine out of a working system, but I wouldn’t recommend that, as we usually have much more software installed than is needed to run our own programs. So creating a virtual appliance for, let’s say, BLAST would mean installing a fresh copy of our favourite Linux under VMware Server with nothing more than the necessary libraries, a copy of the BLAST executables and possibly a web interface. Voilà. Virtual appliance for BLAST, anybody?

While it may seem a bit of overkill at first, I don’t think it is in the long run. Porting the software to other operating systems is only part of the story; maintenance to keep it working with newer versions of the libraries is another. There are a lot of programs that have not been actively maintained for a long time. I have two quick examples where the virtual appliance approach would save programs from being forgotten: PovChem (rendering of molecules, depends on some ancient libraries) and MACAW (it doesn’t work on anything but Mac OS 9; the Windows version crashes the system). OK, MACAW may not be a fair example, as here we face legal issues with the operating system, but I believe any heavy software user has already lost count of how many times they passed on some well-designed software because of its requirements.

Have a look and try it. I’m already running two operating systems (goodbye, dual boot), and this is definitely the future for our desktops, which already have too much processing power. But honestly, I dream about the day when all possible bioinformatics algorithms and biological data will be available in some computing cloud, and running Taverna will be a good alternative to all-day data munging.



Posted by on November 27, 2007 in bioinformatics, Services, Software



Computational Biology and Evolution – new blog

[via Simon Greenhill at Henry] Alexei Drummond, a scientist at the Department of Computer Science at the University of Auckland and Chief Scientist at Biomatters Ltd, has launched a blog: Computational Biology and Evolution.

We grow stronger…


Posted by on November 21, 2007 in bioinformatics, Community


Bio::Blogs #16 – Halloween edition


Original image courtesy of Flickr user docman

Welcome to the 16th edition of Bio::Blogs, the monthly digest of highlights from bioinformatics and computational biology blogs. Hat tip to Deepak for suggesting the name (we are actually starting to absorb many sides of Halloween here in Poland).

Everyday science

This time we have interesting posts on day-to-day scientific life in three categories: issues of scientific communication, the bioinformatics workspace, and software tips and news.

Michael Barton from Bioinformatics Zen posted three stories explaining how web technologies may improve scientific communication, and he shared his thoughts about developing skills that are rarely taught in grad school. “As for your research, start a blog”, he writes, “(…) Try alternative communication formats, post videos on your research, persuade other members in your lab as well.” Speaking of alternative formats, you’ve probably heard about Second Life and SciFoo virtual talks. If SL still feels awkward to you, Sandra Porter from Discovering Biology in a Digital World wrote a gentle introduction to attending a Second Life poster session.

From Neil Saunders we have an excellent tutorial (part I and part II) about setting up and using SVN and Trac for tracking bioinformatics projects. In theory, scientists should be able to trace anything they release (not only source code) back to its origins, and Neil has a ready-to-implement solution. As Paulo Nuin from Blind.Scientist found Trac a little bit clumsy, he recommended svn-time-lapse instead, since it makes it easier to compare two versions of a file (see part I and part II). You can test both approaches with your new project inspired by Tiago from Perfect Storm, who started an interesting journey with Scala for bioinformatics.

There are around 10 new software releases every hour at. Fortunately, the pool of scientific software is more manageable. This month’s highlights are: good news from Noel O’Boyle that Frog developers donated their code to OpenBabel (which means a flexible command-line converter from 1D/2D descriptors of molecules to 3D structures in the near future); a tip from Andrew Perry about starting Qutemol, a simple but impressive molecular visualizer, under Linux (hint: use the Windows version); and reports from Animesh Sharma on the new versions of Biopython and Bioconductor.

Outside these categories there is a post by Jason Isheng Tsai from Paradoxus (good timing, only an hour and a half before I finished this post) about his usual day in front of a computer. If you do have friends who are not scientists (do you? 🙂 ) asking about your work, you can point them there, although I have a feeling that they may not understand the irony in the difference between the “grant version” and the “real version” of scientific work…

Trends, predictions and analysis

October started with an announcement from IBM Research about software for 3D visualization of a patient’s medical records. Bertalan Meskó from ScienceRoll posted his interview with Andre Elisseeff, who leads the healthcare projects at IBM Zurich Research Lab. Elisseeff says they got very positive feedback from physicians who used this “Google Earth for human body”. Maybe this and other health-related news were the inspiration for Pedro Beltrao to write a brilliant story about a possible future of personalized genomics. Frightening stuff.

Managing data across therapeutic programs can be challenging, not only because of its volume, but also because of difficulties in sharing the information stored there. Deepak Singh posted his thoughts on the topic of persistent context, that is, maintaining the context of the information along with the information. For me the eye-opening sentence was: “If you are storing relationships, i.e. your queries, and treating them as pieces of data, you are essentially capturing relationships, and the semantic web provides an elegant framework to do so.”

Open access

I am happy to point you to a post by Michael Kuhn about the Max Planck Society canceling its subscription to Springer journals. The reason for canceling? A way too expensive subscription (if paying individually for each downloaded paper is considered cheaper, such a subscription is too expensive). Can you see the smile on the faces of publishers of open-access journals?

If this issue resonates with you, have a look at the Evolgen highlight, an editorial in PLoS Biology entitled “When Is Open Access Not Open Access?” (important update: see Pedro’s comment on that editorial), and point your colleagues who are not yet aware of the significant changes we are facing to screencast of

Research highlights

Among the things I found interesting this month were two posts by Ian York from Mystery Rays from Outer Space. He wrote about a publication demonstrating a potent epitope from HIV that emerged from a frame-shifting event and raised the question of whether this is a major phenomenon. As far as I know, out-of-frame sequences have not yet been thoroughly analyzed; is this a chance for smart bioinformatics?


The scientific blogosphere is getting older: this month Omics! Omics!, written by Keith Robinson, turned one.

So, that’s all for this month. I apologize if anybody felt omitted, but you have a good chance to pay me back by claiming the next edition of Bio::Blogs (just send an email to bioblogs at gmail dot com). I also want to thank Pedro Beltrao for giving me the chance to host this edition, despite my apparent bastardization of the English language.


Posted by on November 1, 2007 in bioblogs, bioinformatics, Community


Wolfram Mathematica 6 – no New Kind of Science (yet)

Not so long ago Animesh Sharma pointed to a rather old interview with Stephen Wolfram about the book “A New Kind of Science” and asked whether its concepts concerning a biological framework had made their way into the Mathematica software.

I’ve just returned from the Poland Mathematica Conference, and I can answer that question: no, they didn’t. While there were people using Modelica and Mathematica to model some stochastic processes in cells, Mathematica itself does not provide much support for any sophisticated description of biological mechanisms. The implications of the concepts from A New Kind of Science looked very promising; it’s a pity that we are not given the tools to verify them ourselves.


Posted by on October 30, 2007 in bioinformatics, Comments, Software



Qutemol rendering

The impressive thing about Qutemol’s rendering with ambient occlusion is that the method runs in real time. I’ve posted a small video showing the difference between typical rendering and Qutemol’s method (well, I hope it’s visible; the quality of this video is pretty bad, but it’s my first file posted on YouTube).

The bad thing about Qutemol is that so far it works mostly on Windows only (I’m not the only person having problems running it on OS X). Linux users are out of luck: Qutemol needs hardware support for 3D rendering, so a virtual machine with Windows is not a solution.


Posted by on October 26, 2007 in bioinformatics, Software, Visualization



Bio::Blogs #16 – call for submissions

I have the privilege of hosting the next edition of Bio::Blogs. If you have anything you would like included, please send an email to szczesny dot pawel at gmail dot com or to bioblogs at gmail dot com before the 1st of November.


Posted by on October 20, 2007 in bioinformatics, Community

