
Tag Archives: bioinformatics

Semi-automated workflows – Taverna Interaction Service

I was still thinking about Neil’s recent musings on the possibility of automating every scientific workflow when I saw this (Bioinformatics Advance Access abstract):

The Taverna Interaction Service: enabling manual interaction in workflows by Anders Lanzén and Tom Oinn

Taverna is an application that eases the integration of tools and databases for life science research by the construction of workflows. The Taverna Interaction Service extends the functionality of Taverna by defining human interaction within a workflow and acting as a mediation layer between the automated workflow engine and one or more users.

I have not tried it yet, but this Taverna plugin is very likely an answer to the doubts I often have when automation of bioinformatics workflows is discussed: we shouldn’t always remove ourselves from the workflow, as interaction with the software can often be critical in making a discovery. For example, a conscious decision about which sequences go into a PSI-BLAST search can dramatically influence the quality of the resulting profile. So I agree with Neil that not every workflow can be automated, but more importantly, not every workflow should be. The possibility of wrapping one’s mind around a problem is gone when there’s no feedback loop in the process.
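The feedback loop described above can be sketched in a few lines of Python. This is only an illustration of a human-in-the-loop iteration, not the Taverna Interaction Service or the PSI-BLAST API: `iterative_search`, `search`, and `review` are hypothetical names, and the "database" is a toy dictionary.

```python
def iterative_search(seeds, search, review, max_rounds=5):
    """Grow a set of sequences round by round, letting a reviewer veto hits.

    `search` and `review` are hypothetical callbacks: `search` maps the
    current set to candidate hits, `review` returns the hits to keep.
    """
    included = set(seeds)
    for _ in range(max_rounds):
        candidates = search(included) - included
        if not candidates:
            break  # nothing new was found
        accepted = set(review(candidates)) & candidates
        if not accepted:
            break  # the reviewer rejected everything
        included |= accepted
    return included


# Toy data: each sequence "hits" its neighbours in the next round.
NEIGHBOURS = {
    "seed": {"hom1", "decoy1"},
    "hom1": {"hom2"},
    "decoy1": {"decoy2"},
    "hom2": set(),
}


def toy_search(included):
    hits = set()
    for name in included:
        hits |= NEIGHBOURS.get(name, set())
    return hits


def human_review(candidates):
    # Stand-in for a person inspecting the hits before the next round.
    return {name for name in candidates if not name.startswith("decoy")}


def accept_all(candidates):
    # Fully automated mode: every hit goes into the next round.
    return set(candidates)
```

In this toy setup the reviewed run stops at the three related sequences, while the fully automated run also pulls in the decoys and everything they hit.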


Posted by Pawel Szczesny on March 12, 2008 in bioinformatics, Papers, PubMed

Tags: bioinformatics, Research, Taverna

Mining PubMed – more tools available

05 Mar

There are two new tools available that semantically mine PubMed abstracts: e-LiSe and Anne O’Tate. The first was made by my colleagues from the Institute of Biochemistry and Biophysics in Warsaw, while the second is from the University of Illinois, Chicago. Female-sounding names are not the only thing that makes them look similar; they both provide analogous functionality, like keywords or author names associated with the user’s query.

There are quite a lot of third-party interfaces to PubMed (see David Rothman’s excellent list), so I couldn’t resist running a few queries on both servers and comparing them to GoPubmed, which currently wins hands down in terms of features and interface. I wasn’t surprised to see that the results overlap significantly, although not completely. Each of the servers found valuable keywords that the other two did not – that’s understandable, as they use different algorithms. I wonder if we will see a meta-server of PubMed data-miners, like there are for protein structure prediction (for example meta.bioinfo.pl). In theory, an exhaustive search for meaningful keywords by different methods, followed by their classification and analysis, should work better than any single method, but this is just a guess.
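As a rough illustration of what such a meta-server might do at its simplest, here is a sketch that merges keyword lists by voting. The function name `consensus_keywords` and the keyword lists are made up for the example; none of the three services is actually queried, and real classification and analysis would need far more than a vote count.

```python
from collections import Counter


def consensus_keywords(results, min_votes=2):
    """Return keywords reported by at least `min_votes` tools.

    `results` maps a tool name to the keywords it returned for one query.
    Keywords are compared case-insensitively and ranked by vote count,
    then alphabetically.
    """
    votes = Counter()
    for keywords in results.values():
        # A set, so each tool votes at most once per keyword.
        votes.update({keyword.lower() for keyword in keywords})
    kept = [keyword for keyword, count in votes.items() if count >= min_votes]
    return sorted(kept, key=lambda keyword: (-votes[keyword], keyword))


# Hypothetical output of the three servers for the same query.
toy_results = {
    "e-LiSe": ["Chaperone", "ATPase", "proteasome"],
    "Anne O'Tate": ["chaperone", "protein folding", "ATPase"],
    "GoPubmed": ["chaperone", "protein folding", "heat shock"],
}
```

With the toy data above, `consensus_keywords(toy_results)` keeps the keywords that at least two tools agree on and drops the ones found by a single tool, which is exactly where a real meta-server would have to be smarter, since the unique hits can be the valuable ones.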


Posted by Pawel Szczesny on March 5, 2008 in bioinformatics, Data mining, PubMed

Tags: bioinformatics, Data mining, literature search, PubMed

Importance of null models – slides by Kevin Karplus

21 Feb

Again, a short note today (but I have some longer posts on the way). I’ve just finished reading the slides of the talk Kevin Karplus gave at 3DSig (a satellite conference of the last ISMB in Vienna). The talk was entitled: Better than chance: the importance of null models. If you weren’t there, I hope the take-home messages will convince you to have a look:

Base your null models on biologically meaningful null hypotheses, not just computationally convenient math.
Generative models and simulation can be useful for more complicated models.
Picking the right model remains more art than science.

A very good combination of math skills and a feel for biological problems.
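To make the idea of a simulation-based null model concrete, here is a minimal sketch that is not taken from the slides. It estimates an empirical p-value by comparing an observed statistic with the same statistic on shuffled copies of the sequence. The statistic (`longest_run`) and the function names are illustrative assumptions, and a composition-preserving shuffle is exactly the kind of computationally convenient null that the first take-home message asks you to question.

```python
import random


def longest_run(seq):
    """Length of the longest stretch of identical consecutive symbols."""
    if not seq:
        return 0
    best = run = 1
    for previous, current in zip(seq, seq[1:]):
        run = run + 1 if current == previous else 1
        best = max(best, run)
    return best


def empirical_p_value(seq, statistic, n_shuffles=1000, seed=0):
    """Fraction of shuffled sequences scoring at least as high as `seq`.

    Null model (an assumption of this sketch): symbols are exchangeable,
    so any permutation with the same composition is equally likely.
    """
    rng = random.Random(seed)
    observed = statistic(seq)
    letters = list(seq)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        if statistic(letters) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_shuffles + 1)
```

Calling `empirical_p_value("A" * 10 + "C" * 10, longest_run)` gives a small p-value, because a run of ten identical letters is very unlikely in a random arrangement of ten A's and ten C's. Whether that says anything biologically depends entirely on whether shuffling is a sensible null for the question at hand.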


Posted by Pawel Szczesny on February 21, 2008 in bioinformatics, Structure prediction

Tags: bioinformatics, Kevin Karplus, null models, slides, talk

Can a biologist fix a radio?

05 Feb

[via Molecule of the Day] Go and read (if you haven’t before) this brilliant piece on modern biology: Can a Biologist Fix a Radio? Read it twice if you call yourself a bioinformatician…


Posted by Pawel Szczesny on February 5, 2008 in bioinformatics, Fun

Tags: bioinformatics, biology, humor

Jane – Journal/Author Name Estimator

28 Jan

Jane – Journal/Author Name Estimator is a new web-based application that can suggest potential reviewers or target journals for a manuscript based on its title and abstract. It was just published by Bioinformatics under Advance Access (but unfortunately it’s not an open access article). I have tested it on two of my upcoming publications and Jane performed well: I wasn’t surprised by most of the predicted names and journal titles. The topic I’m writing about in these papers is rather narrow, so don’t treat this as any performance measure – test it yourself, if you are interested.

I’m probably not going to use it the way the authors suggest – I consider this application a helpful literature research tool.


Posted by Pawel Szczesny on January 28, 2008 in bioinformatics, Papers, Research, Services

Tags: bioinformatics, publication, semantics, service

Visualization of internal repeats in proteins (or DNA)

24 Jan

There are a number of protein families that have internal repeats (like TPR, Armadillo, ankyrin etc.). I’m very interested in many of them for reasons I will explain in another post. Assessing the arrangement of these repeats is straightforward in the majority of cases – most of them tend to occur next to each other, with few or no insertions between them (finding them in the first place is a completely different story). However, there are proteins where internal repeats are separated by other domains or repeats, which can result in a real mess (or in scientific language: mosaic-like architecture). When, a couple of months ago, I looked for a visualization method that would give me a quick overview of the internal structure of such proteins, I stumbled across The Shape of Song – a visualization method developed by Martin Wattenberg, a researcher at IBM. It fitted my requirements, so I implemented it with some help from Processing (and later added it to a protein analysis server that has a chance of being published next month). The resulting visualization is below:

Internal repeats in a protein

Repeats are colored according to repeat type and are connected according to repeat family. If you think about it in terms of the SCOP (Structural Classification of Proteins) hierarchy, colors represent class, while arcs connect superfamilies. The longer and more complicated the analysed sequence is, the more useful this approach seems to be; for short proteins, typical domain bubbles would work better.
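The original was written with Processing; the following is only a minimal Python sketch of the same drawing idea, producing an SVG string. The `Repeat` tuple, the function names, and the rule of connecting consecutive members of a family are assumptions made for the example, not the code behind the figure.

```python
from collections import defaultdict
from typing import NamedTuple


class Repeat(NamedTuple):
    start: int         # first residue of the repeat
    end: int           # last residue of the repeat
    repeat_type: str   # decides the fill colour
    family: str        # decides which repeats get connected


def family_arcs(repeats):
    """Connect consecutive repeats of the same family, midpoint to midpoint."""
    by_family = defaultdict(list)
    for repeat in sorted(repeats, key=lambda r: r.start):
        by_family[repeat.family].append(repeat)
    arcs = []
    for members in by_family.values():
        for left, right in zip(members, members[1:]):
            arcs.append(((left.start + left.end) / 2,
                         (right.start + right.end) / 2))
    return sorted(arcs)


def to_svg(repeats, length, colors):
    """Draw repeats as boxes on a baseline and families as semicircular arcs."""
    height = length // 2 + 30   # tall enough for an arc spanning the sequence
    baseline = height - 20
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" '
             f'viewBox="0 0 {length} {height}">']
    for r in repeats:
        parts.append(f'<rect x="{r.start}" y="{baseline}" '
                     f'width="{r.end - r.start}" height="10" '
                     f'fill="{colors[r.repeat_type]}"/>')
    for x1, x2 in family_arcs(repeats):
        radius = (x2 - x1) / 2
        parts.append(f'<path d="M {x1} {baseline} A {radius} {radius} 0 0 1 '
                     f'{x2} {baseline}" fill="none" stroke="gray"/>')
    parts.append("</svg>")
    return "\n".join(parts)


# Toy protein with two repeat types and two families (made-up coordinates).
toy_repeats = [
    Repeat(10, 44, "A", "A1"), Repeat(50, 84, "A", "A1"),
    Repeat(100, 133, "B", "B1"), Repeat(200, 234, "A", "A1"),
    Repeat(240, 273, "B", "B1"),
]
```

Writing `to_svg(toy_repeats, 300, {"A": "steelblue", "B": "orange"})` to a file gives a picture that can be opened in a browser. Adjacent repeats get small arcs and distant ones get large arcs, which is what makes a mosaic-like architecture stand out.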

People who are into genomic sequences may notice the similarity of this approach to Circos, developed by Martin Krzywinski (whose work I really admire, especially on HDTR). Basically, the idea behind both is pretty much the same, but I had never thought about straightening that circle until I saw The Shape of Song. My thinking is sometimes dramatically schematic…


Posted by Pawel Szczesny on January 24, 2008 in bioinformatics, Proteins, Research, Visualization

Tags: bioinformatics, java, processing, protein analysis, repeats, Visualization

CLANS – Java tool for cluster analysis of sequences

22 Jan

As frequent visitors to this blog have already noticed, I am a big fan of different tools for data visualization. Today I would like to point you to a Java program called CLANS (CLuster ANalysis of Sequences), developed by my former colleague Tancred Frickey. CLANS runs (PSI)BLAST on your sequences, all vs all, and clusters them in 2D or 3D according to their similarity. This method allows for rapid classification of huge datasets and has the advantage over, let’s say, a phylogenetic tree that one can quickly assess the results of the clustering visually (I cannot imagine making any sense of a phylogenetic tree with 1500 branches, while the graphical output, as in the animation below, is pretty easy to read).

CLANS animation

The beauty of the idea behind CLANS is that you can apply this method to almost any dataset that can be translated into all-vs-all relations. The CLANS page has examples from protein clustering, microarray analysis and (the one I like the most) an image showing how the standard amino acids cluster in space according to BLOSUM62.
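The general principle of turning all-vs-all relations into a picture can be sketched as follows. This is a deliberately simple spring relaxation, not the algorithm CLANS actually uses: the function name `layout_2d`, the mapping from similarity to target distance, and the toy matrix are all assumptions for the example.

```python
import math
import random


def layout_2d(similarity, iterations=300, step=0.1, seed=0):
    """Place items in 2D so that similar items end up close together.

    `similarity` is a symmetric matrix with values in [0, 1]. Each pair is
    treated as a spring whose rest length is 1 - similarity.
    """
    rng = random.Random(seed)
    n = len(similarity)
    pos = [[rng.random(), rng.random()] for _ in range(n)]
    for _ in range(iterations):
        for i in range(n):
            for j in range(i + 1, n):
                dx = pos[j][0] - pos[i][0]
                dy = pos[j][1] - pos[i][1]
                dist = max(math.hypot(dx, dy), 1e-9)  # avoid division by zero
                target = 1.0 - similarity[i][j]
                # Positive shift pulls the pair together, negative pushes apart.
                shift = step * (dist - target) / dist
                pos[i][0] += shift * dx
                pos[i][1] += shift * dy
                pos[j][0] -= shift * dx
                pos[j][1] -= shift * dy
    return pos


# Toy all-vs-all matrix: items 0 and 1 are alike, as are items 2 and 3.
toy_similarity = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.9],
    [0.1, 0.1, 0.9, 1.0],
]
```

After `layout_2d(toy_similarity)`, the two pairs form two tight groups that sit far from each other. Any data that can be written as such a matrix, whether BLAST scores, expression profiles, or substitution scores, can be fed through the same kind of procedure.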


Posted by Pawel Szczesny on January 22, 2008 in bioinformatics, Research, Software, Visualization

Tags: bioinformatics, java, sequence analysis, Software

DNASIS SmartNote – online notebook for bioinformatics analysis

19 Jan

I recently found a video showing a new web-based application for scientists. This is DNASIS SmartNote – an online notebook for sequence analysis, project organisation and sharing results, thoughts and data with other users/collaborators.

This service is provided by MiraiBio, which belongs to the Hitachi Software group. The company provides instruments and software for biological research.

As soon as I resolve the issues with obtaining a working SmartNote account (so far I cannot log in), I’ll post more about this service.


Posted by Pawel Szczesny on January 19, 2008 in bioinformatics, Services, Software

Tags: bioinformatics, collaboration, dnasis, online service, video

Software portability and virtual appliances

27 Nov

Bioinformatics can mean developing new algorithms for biological data analysis. Scientists who code and release software often face the issue of making the program portable. I see three clear solutions to that issue. First, one can spend a lot of time porting the source to other platforms (plus testing, fixing and yelling at incompatibilities). This is not easy even among Linux distributions (remember the broken HMMER binary packages in Debian and Ubuntu?), not to mention porting to OS X or Windows. What else can we do? The second solution is to build a web interface around the software. This is extremely popular and makes almost everyone’s life easier. However, there are drawbacks: maintenance of the service (it costs money, and grant agencies are not willing to spend a dime on it) and batch access requests from some users (there’s always somebody who wants to feed 5 million sequences or 50 thousand structures into your software). The third solution to the software portability issue can address at least the second of these drawbacks: one can create a virtual machine with a proper environment for the developed software and release them together. Yes, release the software together with the whole environment. And it’s not as difficult as it seems.

We face computing clouds, internet companies that do not have a single server, and virtual appliances for quick installation of, let’s say, a blog server with WordPress, without any knowledge of the software requirements. Virtual appliances, that is, complete virtual machines, can contain already configured software (the most trivial example would be LAMP – Linux, Apache, MySQL and PHP). So far I have found only one such appliance for bioinformatics: it’s called DNALinux Virtual Desktop Edition and contains, among others, BLAST, EMBOSS, PyMOL, BioPerl and Biopython. Since VMware Server is free (although registration is required), this makes a pretty nice alternative for those with Windows machines, as it allows running windowed Linux at ca. two-thirds of the speed of a native system. VMware software can create a virtual machine out of a working system, but I wouldn’t recommend that, as we usually have much more software installed than is needed to run our own programs. So creating a virtual appliance for, let’s say, BLAST would mean installing a fresh copy of our favourite Linux under VMware Server with nothing more than the necessary libraries, a copy of the BLAST executables and possibly a web interface. Voilà. Virtual appliance for BLAST, anybody?

While it may seem a bit of an overkill at first, I don’t think it is in the long run. Porting the software to other operating systems is only part of the story – maintenance to keep it working with newer versions of the libraries is another. There are a lot of programs that have not been actively maintained for a long time. I have two quick examples where the virtual appliance approach would save them from being forgotten: PovChem (rendering of molecules, depends on some ancient libraries) and MACAW (it doesn’t work on anything but Mac OS 9; the Windows version crashes the system). OK, MACAW may not be a fair example, as we face legal issues with the operating system here, but I believe any heavy software user has already lost count of how many times they didn’t try some well-designed software because of its requirements.

Have a look and try. I’m already running two operating systems (goodbye, dual-boot), and this is definitely the future for our desktops, which already have too much processing power. But honestly, I dream about the day when all possible bioinformatics algorithms and biological data will be available in some computing cloud, and running Taverna will be a good alternative to all-day data munging.


Posted by Pawel Szczesny on November 27, 2007 in bioinformatics, Services, Software

Tags: bioinformatics, software development, virtual appliance, vmware

Wolfram Mathematica 6 – no New Kind of Science (yet)

30 Oct

Not so long ago Animesh Sharma pointed to a quite old interview with Stephen Wolfram about the book “A New Kind of Science” and asked if the concepts concerning a biological framework had made their way into the Mathematica software.

I’ve just returned from the Poland Mathematica Conference, and I can answer that question: no, they didn’t. While there were people using Modelica and Mathematica to model some stochastic processes in cells, Mathematica itself does not provide much support for any sophisticated description of biological mechanisms. The implications of the concepts from A New Kind of Science looked very promising – it’s a pity that we are not given tools to verify them ourselves.


Posted by Pawel Szczesny on October 30, 2007 in bioinformatics, Comments, Software

Tags: bioinformatics, mathematica, wolfram