It has been a while since IBM launched its data visualization framework Many Eyes. Although I was initially resistant (there’s nothing that Gnuplot, R, or Graphviz cannot do), I’ve decided to take a closer look. Obviously I didn’t get it before: Many Eyes is not about making visualization easier (although IBM did quite a lot in that direction). It’s about sharing both the data and the approach to that data.
Many Eyes encourages you to try things out. A single Perl one-liner, and we see the most frequently occurring domains in the proteins of Bacillus anthracis:

Or maybe we want to know which of these domains co-occur (next to each other) in a single protein (only the biggest cluster is shown):

(Note that this is the output of quick hacks; I wouldn’t call it a scientific analysis.)
Many Eyes is a service for general data. What about building something similar for biological data analysis? The workflows could be shared on myExperiment, and the data (input and output, plus a visualization of the latter) on a site like Many Eyes. And what if deposition of the data were required for certain papers? So far, the results of a bioinformatic analysis are (sometimes) attached as supplementary material in some awkward format (PDF or DOC). This at least keeps them accessible for years, but there’s no access to the original data and no way to verify whether the analysis was correct other than by looking at the results (and usually that’s not enough). Is anything like that available? If not, do you think it would be valuable to build such a service?
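The original one-liner is not reproduced here, so what follows is only a rough Python sketch of the same kind of quick hack. It assumes a hypothetical tab-separated input with one domain hit per line (protein ID, domain name, start position); the function names and the toy data are made up for illustration and are not real Bacillus anthracis results.

```python
from collections import Counter, defaultdict

# Toy input (an assumption, not real data): protein_id<TAB>domain<TAB>start
SAMPLE = (
    "P1\tdomA\t10\n"
    "P1\tdomB\t250\n"
    "P2\tdomA\t5\n"
    "P2\tdomC\t300\n"
    "P3\tdomC\t240\n"
    "P3\tdomD\t120\n"
    "P4\tdomE\t3\n"
)


def parse_hits(text):
    """Return {protein: [domains ordered by start position]}."""
    hits = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue
        protein, domain, start = line.split("\t")
        hits[protein].append((int(start), domain))
    return {p: [d for _, d in sorted(h)] for p, h in hits.items()}


def domain_counts(architectures):
    """Count every domain hit across all proteins."""
    counts = Counter()
    for domains in architectures.values():
        counts.update(domains)
    return counts


def adjacent_pairs(architectures):
    """Count unordered pairs of different domains that are direct neighbours."""
    pairs = Counter()
    for domains in architectures.values():
        for left, right in zip(domains, domains[1:]):
            if left != right:
                pairs[tuple(sorted((left, right)))] += 1
    return pairs


architectures = parse_hits(SAMPLE)
counts = domain_counts(architectures)
pairs = adjacent_pairs(architectures)

for domain, n in counts.most_common(3):
    print(domain, n)
for (a, b), n in pairs.most_common():
    print(a, b, n)
```

The pair counts can be read as edges of a graph, which is roughly the form a network visualization needs; picking out the biggest cluster would be a further step that this sketch does not attempt.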





On the scripting skills
The interview with Dr. Alexei Drummond inspired an interesting discussion. While I agree that some level of training in programming would be very beneficial for biologists, I think there’s something more important that people working at the bench should learn: using the tools for biological data analysis. Scripting skills are fine, and they often save an enormous amount of time. However, being unwilling to learn how to run a BLAST search (or any other basic tool in the field) and interpret its results leads to publishing papers with errors (in the best case) or with completely wrong conclusions (which happens more often). I’m not talking about becoming an expert; as in programming, that can take years, and it should be left to people who spend the whole day doing data analysis (aka bioinformaticians). I’m talking about the data-analysis equivalent of “scripting”, a level that is currently taught in undergraduate bioinformatics courses at most universities. Such training would save the world from papers comparing multiple sequence alignments from Clustal and… BLAST (in case some readers do not know: BLAST can at best produce multiple pairwise alignments; it does not align all the sequences together).
These are my two cents. I hope to hear your opinion on that.
Posted by Pawel Szczesny on August 19, 2007 in Comments, Research skills