Configuring Torque and InterProScan

10 07 2008
If by any chance you want to use InterProScan with the Torque Resource Manager (a queueing system based on the PBS project), it doesn’t work by default (it’s tested with LSF; configuration files are supplied for the original PBS and Sun Grid Engine). Fortunately, only two small changes in the InterProScan config files are needed to make it work. First, during iprscan configuration, choose PBS54 as your queueing system. Then, in the file pbs54.conf (IPRSCANHOME/conf), remove the “-d” switch from the following two lines:

asyncsub=qsub [%optqueue][%optresource] -d -o /dev/null -e /dev/null "[%toolcmd]"
syncsub=qsub [%optqueue][%optresource] -d -o /dev/null -e /dev/null -I "[%toolcmd]"

Assuming that the Torque binaries (qsub, qdel etc.) are available in the global PATH (on my machine they sit under /usr/local/bin), change the default shell in the environment file pbs54env.sh from #!/bin/sh to #!/bin/bash. You can also add other directories to the PATH in that file (I didn’t). Voilà. InterProScan jobs are now queued.
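If you would rather not edit the file by hand, here is a minimal Python sketch of the first change. The function name strip_d_switch is my own, and it assumes the two lines look like the ones quoted above; it works on the text of the file, so back up pbs54.conf before writing anything back.

```python
import re


def strip_d_switch(conf_text):
    """Remove the standalone -d switch from the asyncsub= and syncsub= lines."""
    fixed = []
    for line in conf_text.splitlines():
        if line.startswith(("asyncsub=", "syncsub=")):
            # Drop "-d" only when it is a separate token,
            # so "/dev/null" and other switches stay untouched.
            line = re.sub(r"\s-d(?=\s)", "", line)
        fixed.append(line)
    return "\n".join(fixed)
```

All other lines of the file pass through unchanged, so the result can be written back as the new pbs54.conf.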





Bug tracking systems in science

18 04 2008

I’m not going to describe the painful process of correcting entries in biological databases or errors in publications when one is not the author - we all know how difficult and unrewarding it is. All major databases contain wrong entries - I see misannotated (or nonexistent) genes in GenBank, artificial domains in Pfam and poorly solved structures in PDB. It’s even worse in publications, where across the whole spectrum of journals I see errors which in theory shouldn’t slip through peer review (this includes such prominent publishers as NPG).

One of the best ideas I’ve heard for addressing this issue was to build a bug tracking system (I would like to give credit to the author, but I cannot find the source; wasn’t that one of the biobloggers?). It’s simple and efficient. Something is wrong? File a bug report. The report would link to the original entry, would be available for aggregation (for example, to track a report author’s activity), and could possibly be closed by somebody other than the database maintainers or authors if the report turns out to be wrong. Because it would be external to all databases, maybe it could grow to provide “community corrected” versions of these databases?
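To make the idea a bit more concrete, here is a rough sketch of what a single report could look like. Everything in it (the class name, the field names, the status values) is hypothetical and only illustrates the three properties above: a link to the original entry, aggregation by author, and closing by anyone.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class BugReport:
    database: str          # e.g. "GenBank", "Pfam" or "PDB"
    entry_id: str          # identifier of the original entry the report links to
    reporter: str
    description: str
    status: str = "open"   # hypothetical values: "open", "fixed", "invalid"
    closed_by: Optional[str] = None

    def close(self, who, status):
        """Anyone can close a report, not only maintainers or authors."""
        self.status = status
        self.closed_by = who


def reports_per_author(reports):
    """Aggregate reports to track the activity of each reporter."""
    return Counter(report.reporter for report in reports)
```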

What do you think? How useful could such a system be?





CLANS - java tool for cluster analysis of sequences

22 01 2008

As frequent visitors of this blog have already noticed, I am a big fan of different tools for data visualization. Today I would like to point you to a Java program called CLANS (CLuster ANalysis of Sequences), developed by my former colleague Tancred Frickey. CLANS runs (PSI)BLAST on your sequences, all vs all, and clusters them in 2D or 3D according to their similarity. This method allows for rapid classification of huge datasets and has the advantage over, let’s say, a phylogenetic tree that one can quickly assess the results of the clustering visually (I cannot imagine making any sense of a phylogenetic tree with 1500 branches, while the graphical output, as in the animation below, is pretty easy to read).

CLANS animation

The beauty of the idea behind CLANS is that you can apply this method to almost any dataset that can be translated into all-vs-all relations. The CLANS page has examples from protein clustering, microarray analysis and (the one I like the most) an image showing how the standard amino acids cluster in space according to BLOSUM62.
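To show what clustering from all-vs-all relations can mean in practice, here is a toy force-directed layout in Python. This is not the CLANS code and not necessarily its algorithm; it is a generic sketch in which similar items attract each other and all items repel each other, and the function name, constants and step counts are my own assumptions.

```python
import math
import random


def layout_2d(similarity, steps=300, seed=0):
    """Place items in 2D so that similar items end up close together.

    similarity is a square matrix of non-negative all-vs-all scores.
    """
    n = len(similarity)
    rng = random.Random(seed)
    pos = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
    for _ in range(steps):
        moves = []
        for i in range(n):
            fx = fy = 0.0
            for j in range(n):
                if i == j:
                    continue
                dx = pos[j][0] - pos[i][0]
                dy = pos[j][1] - pos[i][1]
                dist = math.hypot(dx, dy) or 1e-9
                # Similar pairs pull together; every pair pushes apart a little.
                pull = similarity[i][j] * dist - 0.1 / dist
                fx += pull * dx / dist
                fy += pull * dy / dist
            size = math.hypot(fx, fy)
            if size > 1.0:
                # Cap the force so that very close points do not fly apart.
                fx, fy = fx / size, fy / size
            moves.append((0.05 * fx, 0.05 * fy))
        # Apply all moves at once, after every force has been computed.
        for point, (mx, my) in zip(pos, moves):
            point[0] += mx
            point[1] += my
    return pos
```

Any score that can be written as a square matrix (BLAST-derived similarities, correlations between expression profiles, substitution matrix values shifted to be non-negative) can be fed into such a layout.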





Tracking changes in a multiple sequence alignment

20 01 2008

I had a few free hours this weekend, so I’ve hacked together a couple of scripts that in theory could help me visualize changes between subfamilies in a protein multiple sequence alignment. In essence, I took the alignment, chose a master sequence that corresponds to a known structure, removed all columns with gaps in the master sequence, and visualized fragments of the alignment (a sliding window of 15 sequences) with Weblogo - software for preparing sequence logos from alignments. In the video below you can see:

  • two boxes showing the same template structure (the second is just rotated); the size of the C-alpha atoms corresponds to the overall conservation at that position; the first few residues do not have corresponding positions in the alignment
  • sequence logo of actual alignment window
  • sequence logo of the whole alignment - as a reference
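The two preprocessing steps described above can be sketched in a few lines of Python. This is not the script I used, just a minimal illustration; the function names, the gap character and the step parameter are assumptions, and the alignment is taken to be a list of equal-length strings. Each window would then be passed to Weblogo, which is not shown here.

```python
def drop_master_gap_columns(alignment, master_index=0, gap="-"):
    """Keep only the columns where the master sequence has no gap."""
    master = alignment[master_index]
    keep = [i for i, residue in enumerate(master) if residue != gap]
    return ["".join(seq[i] for i in keep) for seq in alignment]


def sliding_windows(alignment, size=15, step=1):
    """Yield consecutive blocks of `size` sequences from the alignment."""
    last_start = max(len(alignment) - size, 0)
    for start in range(0, last_start + 1, step):
        yield alignment[start:start + size]
```

After the first step every remaining column maps directly to a residue of the master sequence, which is what makes it possible to project conservation onto the structure.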

There are several things I’m not yet happy with. First of all, the visualization of changes on the structure is hardly readable, even with a video of much higher quality (I should probably do it with Chimera’s “worm” representation). Second, I have no information about which species/proteins I’m looking at right now (another box with highlights on a species tree of the family?). Also, I should remove some redundancy from the alignment; sometimes the sliding window contains copies of the same protein. But overall it looks promising enough to convince me to spend a few more hours on this small project. However, I would probably do the final version with Processing.





DNASIS SmartNote - online notebook for bioinformatics analysis

19 01 2008

I recently found a video showing a new web-based application for scientists. This is DNASIS SmartNote - an online notebook for sequence analysis, project organisation and sharing results, thoughts and data with other users/collaborators.

This service is provided by MiraiBio, which belongs to Hitachi Software. The company provides instruments and software for biological research.

As soon as I resolve the issues with obtaining a working account on SmartNote (so far I cannot log in), I’ll post more about this service.





Linux screencasting software

15 01 2008

Just a short note today. If you are looking for screencasting software for your Linux box, I recommend two titles: recordMyDesktop and Wink.

The first one is a typical desktop activity recorder - you mark the capture area and that’s all. No fancy options: just a pure video stream from your screen. The video has very good quality (Theora and Vorbis codecs).

Wink is a screencaster oriented towards preparing interactive tutorials and presentations. You can record screen activity, but also pause the video and add text boxes with explanations or buttons waiting for user interaction (for example “Next” buttons). Output formats are: SWF, standalone EXE (for Windows machines only), PDF, PostScript and HTML. There are no typical video files, which on the other hand is not really a problem, as the framerate of the recording is pretty low. Another issue is that it apparently cannot properly record windows rendered with OpenGL (like molecular viewers) - the window’s interior comes out black. Even with these limitations, I think Wink is better for preparing tutorials (for example on the usage of some online bioinformatics service) than typical screencasting software.





Protein cartoons with Pymol

12 12 2007

Here is a short tutorial on protein cartoons with Pymol. I picked hemoglobin as an example and focused only on the cartoon representation of the protein, so keep in mind that this tutorial does not explore all options of the software. Also, since I’m blind to stereo images, I’m not sure if all of the following tips make sense with a stereo representation of molecules.

  1. Change the protein representation to cartoons.
  2. Turn off depth cue (under “Display”) - unless you want to emphasize some part of the protein, this option is unnecessary, because it’s hard to get a 3D feeling from a 5 cm by 5 cm print.
  3. Turn off specular reflections (under “Display”) - most likely the printer is able to show fewer colors than your screen, and will render specular reflections as harsh white blobs.
  4. Change the background color to white (as above) - that’s obvious, a black background is for viewing on screen.
  5. Change the view to orthoscopic (as above) - maybe it’s a matter of personal taste, but the perspective view (default in Pymol) creates unnecessary distortions, which again do not help in shape perception on a small print.
  6. Turn on the “fancy helices” option (“Settings/Cartoons”) - this renders helices with tubular edges like in Molscript (leave it off if you don’t like it).
  7. Turn on the “smooth loops” option (as above) - perception of the arrangement of secondary structure elements becomes much easier.
  8. Turn on the “highlight color” option (as above) - again, it’s a matter of personal taste; this option makes the internal surface of helices grey (you may change the color via the command line).
  9. Turn shadows off (“Settings/Rendering/Shadows”) - I feel that on a small print they only disturb the image.
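The same nine steps can also be typed into the Pymol command line instead of clicked through the menus. Below is a small Python sketch that writes them out as a Pymol script; the setting names are the ones I know from Pymol’s command line, so treat them as assumptions to verify against your version, and the grey50 highlight color and the function name are my own picks.

```python
# (setting name, value) pairs mirroring steps 2-3 and 5-9 of the list above.
PRINT_SETTINGS = [
    ("depth_cue", "0"),
    ("specular", "0"),
    ("orthoscopic", "1"),
    ("cartoon_fancy_helices", "1"),
    ("cartoon_smooth_loops", "1"),
    ("cartoon_highlight_color", "grey50"),
    ("ray_shadows", "0"),
]


def cartoon_print_script(selection="all"):
    """Return Pymol commands that set up a cartoon view meant for print."""
    lines = [
        "hide everything, " + selection,
        "show cartoon, " + selection,   # step 1
        "bg_color white",               # step 4
    ]
    lines += ["set %s, %s" % (name, value) for name, value in PRINT_SETTINGS]
    return "\n".join(lines)
```

Saving the returned text to a .pml file lets you reapply the same look to any structure you load.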

I also turn on the matte finish for the cartoons. While it doesn’t necessarily look better on the screen, in print it helps to mask printing artefacts (like raster) when the image is viewed from a normal distance.

Then you can test these settings by clicking “Ray”. If you like the final image, save it, read its dimensions and multiply them by 3. Then type into the command-line box: ray multipliedX, multipliedY and press Enter.
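The arithmetic is trivial, but here it is as a tiny Python helper that builds the ray command for you. The function name is hypothetical, and the factor of 3 is simply the one I use above.

```python
def ray_command(width, height, factor=3):
    """Build the Pymol ray command for an image `factor` times larger."""
    return "ray %d, %d" % (width * factor, height * factor)


# A 640 x 480 test image gives the command "ray 1920, 1440".
print(ray_command(640, 480))
```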

Below I embedded a video showing more or less what I’ve just described.

Feel free to comment if you have any suggestions on improving this process.





Software portability and virtual appliances

27 11 2007

Bioinformatics can mean developing new algorithms for biological data analysis. Scientists who code and release software often face the issue of making the program portable. I see three clear solutions to that issue. First, one can spend a lot of time porting the source to other platforms (plus testing, fixing and yelling at incompatibilities). This is not easy even among Linux distributions (remember the broken HMMER binary packages in Debian and Ubuntu?), not to mention porting to OSX or Windows. What else can we do? The second solution is to build a web interface around the software. This is extremely popular and makes almost everyone’s life easier. However, there are drawbacks: maintenance of the service (it costs money and grant agencies are not willing to spend a dime on it) and batch access requests from some users (there’s always somebody who wants to feed 5 million sequences or 50 thousand structures into your software). The third solution to the software portability issue can address at least the second of these drawbacks: one can create a virtual machine with a proper environment for the developed software, and release them together. Yes, release the software together with the whole environment. And it’s not as difficult as it seems.

We face computing clouds, internet companies that do not have a single server, and virtual appliances for quick installation of, let’s say, a blog server with WordPress, without any knowledge of the software requirements. Virtual appliances, that is, complete virtual machines, can contain already configured software (the most trivial example would be LAMP - Linux, Apache, MySQL and PHP). So far I have found only one such appliance for bioinformatics: it’s called DNALinux Virtual Desktop Edition and contains, among others, BLAST, EMBOSS, Pymol, BioPerl and Biopython. Since VMware Server is free (although registration is required), this makes a pretty nice alternative for those with Windows machines, as it allows for running windowed Linux at a speed of ca. two-thirds of a native system. VMware software can create a virtual machine out of a working system, but I wouldn’t recommend that, as we usually have much more software installed than is needed to run our own programs. So creating a virtual appliance for, let’s say, BLAST would mean installing a fresh copy of our favourite Linux under VMware Server with nothing more than the necessary libraries, a copy of the BLAST executables and possibly a web interface. Voilà. Virtual appliance for BLAST, anybody?

While it may seem a bit of an overkill at first, I don’t think it is in the long run. Porting the software to other operating systems is only part of the story - maintenance to keep it working with newer versions of the libraries is another. There are a lot of programs that have not been actively maintained for a long time. I have two quick examples where the virtual appliance approach would save them from being forgotten: PovChem (rendering of molecules, depends on some ancient libraries) and MACAW (it doesn’t work on anything but Mac OS 9; the Windows version crashes the system). OK, MACAW may not be a fair example, as we face legal issues with the operating system here, but I believe any heavy software user has already lost count of how many times they gave up on trying some well-designed software because of its requirements.

Have a look and give it a try. I’m already running two operating systems (goodbye dual-boot) and this is definitely the future for our desktops, which already have too much processing power. But honestly, I dream about the day when all possible bioinformatics algorithms and biological data will be available in some computing cloud, and running Taverna will be a good alternative to all-day data munging.

 





Wolfram Mathematica 6 - no New Kind of Science (yet)

30 10 2007

Not so long ago Animesh Sharma pointed to a quite old interview with Stephen Wolfram about the book “A New Kind of Science” and asked if the concepts concerning a biological framework had made their way into the Mathematica software.

I’ve just returned from the Poland Mathematica Conference, and I can answer that question: no, they didn’t. While there were people using Modelica and Mathematica to model some stochastic processes in cells, Mathematica itself does not provide much support for any sophisticated description of biological mechanisms. The implications of the concepts from the book “A New Kind of Science” looked very promising - it’s a pity that we are not given tools to verify them ourselves.





Qutemol rendering

26 10 2007

The impressive thing about Qutemol rendering with ambient occlusion is that this method is used in real time. I’ve put up a small video showing the difference between typical rendering and Qutemol’s method (well, I hope it’s visible; the quality of this video is pretty bad, but it’s my first file posted on YouTube).

The bad thing about Qutemol is that so far it works mostly only on Windows (I’m not the only person having problems running it on OSX). Linux users are out of luck - Qutemol needs hardware support for 3D rendering, so a virtual machine with Windows is not a solution.