
Monthly Archives: November 2007

Software portability and virtual appliances

Bioinformatics can mean developing new algorithms for biological data analysis. Scientists who write and release such software often face the problem of making it portable. I see three solutions. First, one can spend a lot of time porting the source to other platforms (plus testing, fixing and yelling at incompatibilities). This is not easy even across Linux distributions (remember the broken HMMER binary packages in Debian and Ubuntu?), not to mention porting to OS X or Windows. The second solution is to build a web interface around the software. This is extremely popular and makes almost everyone's life easier. However, there are drawbacks: maintenance of the service (it costs money, and grant agencies are not willing to spend a dime on it) and requests for batch access from some users (there's always somebody who wants to feed 5 million sequences or 50 thousand structures into your software). The third solution can address at least the second of these drawbacks: one can create a virtual machine with an environment properly configured for the software and release the two together. Yes, release the software together with the whole environment. And it's not as difficult as it seems.

We already live with computing clouds, internet companies that do not own a single server, and virtual appliances that let you quickly install, let's say, a WordPress blog server without any knowledge of its software requirements. Virtual appliances, that is, complete virtual machines, can contain already configured software (the most trivial example would be LAMP: Linux, Apache, MySQL and PHP). So far I have found only one such appliance for bioinformatics: DNALinux Virtual Desktop Edition, which contains, among others, BLAST, EMBOSS, PyMOL, BioPerl and Biopython. Since VMware Server is free (although registration is required), this makes a pretty nice alternative for those with Windows machines, as it allows running a windowed Linux at roughly two-thirds of the speed of a native system. VMware software can create a virtual machine out of a working system, but I wouldn't recommend that, as we usually have much more software installed than is needed to run our own programs. So creating a virtual appliance for, let's say, BLAST would mean installing a fresh copy of our favourite Linux under VMware Server with nothing more than the necessary libraries, a copy of the BLAST executables and possibly a web interface. Voilà. Virtual appliance for BLAST, anybody?

While it may seem a bit of overkill at first, I don't think it is in the long run. Porting the software to other operating systems is only part of the story; keeping it working with newer versions of its libraries is another. Many programs have not been actively maintained for a long time. Here are two quick examples that a virtual appliance could have saved from oblivion: PovChem (rendering of molecules, depends on some ancient libraries) and MACAW (it doesn't work on anything but Mac OS 9, and the Windows version crashes the system). OK, MACAW may not be a fair example, as it raises legal issues with the operating system, but I believe any heavy software user has lost count of how many times they skipped some well-thought-out software because of its requirements.

Have a look and try it. I'm already running two operating systems (goodbye, dual-boot), and this is definitely the future for our desktops, which already have too much processing power. But honestly, I dream of the day when all possible bioinformatics algorithms and biological data will be available in some computing cloud, and running Taverna will be a good alternative to a whole day of data munging.


Posted by Pawel Szczesny on November 27, 2007 in bioinformatics, Services, Software

Tags: bioinformatics, software development, virtual appliance, vmware

Computational Biology and Evolution – new blog

21 Nov

[via Simon Greenhill at Henry] Alexei Drummond, a scientist at the Department of Computer Science at the University of Auckland and Chief Scientist at Biomatters Ltd, has launched a blog: Computational Biology and Evolution.

We grow stronger…


Posted by Pawel Szczesny on November 21, 2007 in bioinformatics, Community

Ten simple rules for doing your best research – according to Richard Hamming

06 Nov

There's an editorial in PLoS Computational Biology presenting a condensed version of mathematician Richard Hamming's thoughts on “first-class research”. It is based on a transcript of a brilliant talk Hamming gave in 1986 at the Bell Communications Research Colloquium Seminar. Definitely a must-read.

clipped from compbiol.plosjournals.org

Hamming’s 1986 talk was remarkable. In “You and Your Research,” he addressed the question: How can scientists do great research, i.e., Nobel-Prize-type work? His insights were based on more than forty years of research as a pioneer of computer science and telecommunications who had the privilege of interacting with such luminaries as the physicists Richard Feynman, Enrico Fermi, Edward Teller, Robert Oppenheimer, Hans Bethe, and Walter Brattain, with Claude Shannon, “the father of information theory,” and with the statistician John Tukey.


Posted by Pawel Szczesny on November 6, 2007 in Clipped, Research skills

Tags: plos comp bio, Research, ten simple rules

Bio::Blogs #16 – Halloween edition

01 Nov


Original image courtesy of Flickr user docman

Welcome to the 16th edition of Bio::Blogs, the monthly digest of highlights from bioinformatics and computational biology blogs. Hat tip to Deepak for suggesting the name (we are actually starting to absorb many sides of Halloween here in Poland).

Everyday science

This time we have interesting posts on day-to-day scientific life in three categories: scientific communication, the bioinformatics workspace, and software tips and news.

Michael Barton from Bioinformatics Zen posted three stories explaining how web technologies may improve scientific communication, and he also shared his thoughts on developing skills that are rarely taught in grad school. “As for your research, start a blog”, he writes, “(…) Try alternative communication formats, post videos on your research, persuade other members in your lab as well.” Speaking of alternative formats, you've probably heard about Second Life and the SciFoo virtual talks. If SL still feels awkward to you, Sandra Porter from Discovering Biology in a Digital World wrote a gentle introduction to attending a Second Life poster session.

From Neil Saunders we have an excellent tutorial (part I and part II) on setting up and using SVN and Trac to track bioinformatics projects. In theory, scientists should be able to trace anything they release (not only source code) back to its origins, and Neil offers a ready-to-implement solution. Paulo Nuin from Blind.Scientist found Trac a little clumsy and recommended svn-time-lapse instead, since it makes it easier to compare two versions of a file (see part I and part II). You can test both approaches on a new project, perhaps one inspired by Tiago from Perfect Storm, who has started an interesting journey with Scala for bioinformatics.

There are around 10 new software releases every hour at SourceForge.net. Fortunately, the pool of scientific software is more manageable. This month's highlights are: good news from Noel O'Boyle that the Frog developers donated their code to OpenBabel (which means a flexible command-line converter from 1D/2D molecular descriptors to 3D structures in the near future); a tip from Andrew Perry on running Qutemol, a simple but impressive molecular visualizer, under Linux (hint: use the Windows version); and reports from Animesh Sharma on the new versions of Biopython and Bioconductor.

Outside these categories there is a post by Jason Isheng Tsai from Paradoxus (good timing: it appeared only an hour and a half before I finished this post) about his usual day in front of a computer. If you have friends who are not scientists (do you? 🙂) asking about your work, you can point them there, although I have a feeling they may not get the irony in the difference between the “grant version” and the “real version” of scientific work…

Trends, predictions and analysis

October started with an announcement from IBM Research about software for 3D visualization of patients' medical records. Bertalan Meskó from ScienceRoll posted his interview with Andre Elisseeff, who leads the healthcare projects at IBM Zurich Research Lab. Elisseeff says they got very positive feedback from physicians who used this “Google Earth for the human body”. Maybe this and other health-related news inspired Pedro Beltrao to write a brilliant story about a possible future of personalized genomics. Frightening stuff.

Managing data across therapeutic programs can be challenging, not only because of its volume but also because of the difficulty of sharing the information stored there. Deepak Singh posted his thoughts on persistent context, that is, maintaining the context of information along with the information itself. For me the eye-opening sentence was: “If you are storing relationships, i.e. your queries, and treating them as pieces of data, you are essentially capturing relationships, and the semantic web provides an elegant framework to do so.”

Open access

I am happy to point you to a post by Michael Kuhn about the Max Planck Society canceling its subscription to Springer journals. The reason? The subscription was far too expensive (if paying individually for each downloaded paper is considered cheaper, the subscription is clearly overpriced). Can you see the smiles on the faces of open-access publishers?

If this issue resonates with you, have a look at the Evolgen highlight, an editorial in PLoS Biology entitled “When Is Open Access Not Open Access?” (important update: see Pedro's comment on that editorial), and point colleagues who are not yet aware of the significant changes we are facing to Konrad Förstner's screencast explaining what open science is all about.

Research highlights

Among the things I found interesting this month were two posts by Ian York from Mystery Rays from Outer Space. He wrote about a publication demonstrating a potent HIV epitope that emerged from a frame-shifting event and raised the question of whether this is a major phenomenon. As far as I know, out-of-frame sequences have not yet been thoroughly analyzed. Is this a chance for smart bioinformatics?
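For readers less familiar with the idea, a frame shift means the same nucleotides are read in a different codon register, so a single stretch of DNA can encode quite different peptides. The minimal Python sketch below is my own illustration, not code from the publication Ian discussed: it translates a sequence in the three forward reading frames using the standard genetic code. The helper names `translate` and `three_frame_translation` and the example sequence are hypothetical, chosen only for illustration; a real out-of-frame analysis would also need the reverse strand and real data.

```python
# Hypothetical helpers for illustration: translate DNA in the three forward frames.
BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(dna: str, frame: int = 0) -> str:
    """Translate dna starting at offset `frame` (0, 1 or 2).

    '*' marks a stop codon; codons with ambiguous bases become 'X'.
    Incomplete trailing codons are ignored.
    """
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def three_frame_translation(dna: str) -> dict:
    """Return {frame offset: peptide} for the three forward frames."""
    return {frame: translate(dna, frame) for frame in range(3)}


if __name__ == "__main__":
    seq = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"  # arbitrary example sequence
    for frame, peptide in three_frame_translation(seq).items():
        print(f"frame +{frame + 1}: {peptide}")
```

With the example sequence, frame +1 gives `MAIVMGR*KGAR*`, while frames +2 and +3 give the unrelated peptides `WPL*WAAERVPD` and `GHCNGPLKGCPI`, which is exactly the kind of hidden coding potential an out-of-frame scan would look at.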

Events

The scientific blogosphere is getting older: this month Omics! Omics! by Keith Robinson turned one.

So, that's all for this month. I apologize if anybody feels left out, but you have a good chance to pay me back by claiming the next edition of Bio::Blogs (just send an email to bioblogs at gmail dot com). I also want to thank Pedro Beltrao for giving me the chance to host this edition, despite my apparent bastardization of the English language.


Posted by Pawel Szczesny on November 1, 2007 in bioblogs, bioinformatics, Community