
Category Archives: Comments

Wolfram Mathematica 6 – no New Kind of Science (yet)

Not so long ago Animesh Sharma pointed to a rather old interview with Stephen Wolfram about his book “A New Kind of Science” and asked whether concepts concerning a biological framework had made their way into the Mathematica software.

I’ve just returned from the Mathematica conference in Poland, and I can answer that question: no, they didn’t. While there were people using Modelica and Mathematica to model stochastic processes in cells, Mathematica itself does not provide much support for any sophisticated description of biological mechanisms. The implications of the concepts from “A New Kind of Science” looked very promising – it’s a pity that we are not given the tools to verify them ourselves.

 

Posted on October 30, 2007 in bioinformatics, Comments, Software

 


My gallery of images

Readers of this blog who rely on RSS feeds may not have noticed that I have added a separate page with computer-generated images of various molecules – Molecular renderings. Any comments, suggestions or critique are always welcome.

I’ll post new images there from time to time – every now and then I need to remind myself that science is pretty too :).

 

Posted on October 28, 2007 in Comments, Visualization

 


Thoughts on CASP – Critical assessment of methods of protein structure prediction

I’ve just read the introduction to the supplemental issue of the journal PROTEINS dedicated to the most recent round of the CASP experiment. It describes the progress of protein structure prediction over the last few CASP editions.

The list of advancements includes:

  • improvement in homology modelling: one of the issues in template-based modelling of protein structures was that the final model was often no closer to the real structure than the template; now there is a statistically significant (although very small) improvement, thanks to multi-template modelling
  • fully automated methods are closer to human predictors than ever: many groups use models from servers as their starting point and usually don’t improve them much

I believe this was possible thanks to the progress that has been made in sequence homology searches. Finding similarity between two sequences well beyond any reasonable identity threshold is now doable thanks to profile-to-profile comparison, meta-servers (combining predictions from many different methods) or recent HMM-to-HMM algorithms (comparison of Hidden Markov Models). If you can find a suitable template for your protein, the rest is much easier, isn’t it?

There are of course areas that still need work. One of them often stirs a lot of discussion: automated assessment of model similarity to the real structure. The current methods have proven their suitability, I definitely agree. However, I hope that at some point protein structure comparison software will refuse to superimpose eight- and ten-stranded beta-barrels, or left- and right-handed coiled-coils, with the message: “It doesn’t make sense.”
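Just to illustrate the kind of refusal I’m hoping for, here is a purely hypothetical sketch – not part of any existing comparison program. It assumes Biopython and the dssp executable are installed, and model.pdb and target.pdb are made-up file names; it simply counts beta-strands on each side and balks when the topologies obviously disagree.

    # Hypothetical pre-superposition sanity check (assumes Biopython + dssp on PATH).
    from Bio.PDB import PDBParser
    from Bio.PDB.DSSP import DSSP

    def count_strands(pdb_file):
        """Count contiguous runs of beta-strand ('E') residues reported by DSSP."""
        model = PDBParser(QUIET=True).get_structure("s", pdb_file)[0]
        dssp = DSSP(model, pdb_file)
        strands, in_strand = 0, False
        for key in sorted(dssp.keys()):
            is_strand = dssp[key][2] == "E"
            if is_strand and not in_strand:
                strands += 1
            in_strand = is_strand
        return strands

    # model.pdb and target.pdb are placeholder file names.
    if abs(count_strands("model.pdb") - count_strands("target.pdb")) > 1:
        print("It doesn't make sense.")   # refuse the superposition
    else:
        print("Topologies look compatible; superimpose away.")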

[Image: CASP 7 logo]

 

Posted on October 10, 2007 in Comments, Papers, Research, Structure prediction

 


Manual sequence analysis – some common mistakes

This is a topic I will probably come back to on many occasions. Publications with badly flawed sequence analysis, like the one Stephen Spiro pointed out on his blog, are not an exception. I can accept that large-scale analysis may stand a quick-and-dirty treatment of protein sequences (and some error propagation along the way). In large-scale analysis nobody cares whether the domain assignment is 100% right (it isn’t), whether there are false positives (there are), or even whether the starting material (protein sequences, for example) is free of errors (it is not) – as long as the overall quality of the work is acceptable. This optimistic approach, however, cannot be applied to manual protein sequence analysis, where the errors introduced are far more consequential. How can some of them be avoided? A few common mistakes that come to my mind are:

  • missing, inaccurate or quick-and-dirty domain annotation: this is probably a topic for a separate post, but in short – relying on a single method or a strict E-value cutoff, excluding overlaps, ignoring internal repeats, forgetting about structural elements like transmembrane helices, etc. all lead to mistakes in domain annotation
  • running PSI-BLAST searches against unclustered databases: the profile for many query sequences will get biased and diverge in a random direction when PSI-BLAST runs on an unclustered database (remember 500 copies of the same protein in the results?); after all these years I still don’t get why NCBI does not provide nr90 (a non-redundant database clustered at a 90% identity threshold) for PSI-BLAST
  • running PSI-BLAST without looking at the results of each run: if you don’t assess what goes into the profile, you risk letting garbage in (a rough sketch of such a round-by-round workflow follows this list)
  • masking low-complexity, coiled-coil and transmembrane regions in BLAST searches on every single occasion: while most of the time this is a valid approach, there are cases where the answer is revealed only after turning the masking off
  • skipping other sequence analysis tools, such as predictors of signal sequences, motifs or functional sites
  • skipping analysis of the genomic context: while not applicable to all systems, genomic context analysis may dramatically influence function prediction
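To make the two PSI-BLAST points above concrete, here is a rough sketch of what a more careful workflow could look like – only an illustration, written against the current BLAST+ tools and assuming cd-hit is installed; nr.fasta and query.fasta are placeholder file names.

    # Sketch: cluster the database first, then run PSI-BLAST round by round,
    # inspecting the hits before the profile is extended.
    # (Assumes cd-hit and BLAST+ on PATH; nr.fasta and query.fasta are placeholders.)
    import subprocess

    # 1. Cluster the database at 90% identity so the profile is not dominated
    #    by hundreds of near-identical copies of the same protein.
    subprocess.run(["cd-hit", "-i", "nr.fasta", "-o", "nr90.fasta",
                    "-c", "0.9", "-n", "5"], check=True)

    # 2. Build a BLAST protein database from the clustered sequences.
    subprocess.run(["makeblastdb", "-in", "nr90.fasta", "-dbtype", "prot",
                    "-out", "nr90"], check=True)

    # 3. Run PSI-BLAST one iteration at a time, carrying the PSSM between rounds,
    #    so each hit list can be checked before it goes into the profile.
    pssm = None
    for rnd in range(1, 4):
        cmd = ["psiblast", "-db", "nr90", "-num_iterations", "1",
               "-inclusion_ethresh", "0.002",
               "-out", "round%d.txt" % rnd,
               "-out_pssm", "round%d.pssm" % rnd,
               "-save_pssm_after_last_round"]
        cmd += ["-in_pssm", pssm] if pssm else ["-query", "query.fasta"]
        subprocess.run(cmd, check=True)
        pssm = "round%d.pssm" % rnd
        input("Check round%d.txt for suspicious hits, then press Enter..." % rnd)

The same effect can of course be achieved by hand on a web interface; the point is simply that every round is looked at before it is trusted.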

That’s all I could think of so far. Do you have any other suggestions? Let me know.

 

Posted on September 25, 2007 in Comments, Research, Research skills

 

On the scripting skills

The interview with Dr Alexei Drummond inspired an interesting discussion. While I agree that some level of training in programming would be very beneficial for biologists, I think there is something more important that people working at the bench should learn: using the tools for biological data analysis. Scripting skills are fine, and they often save an enormous amount of time, but not being willing to learn how to do a BLAST search (or use any other basic tool in the field) and interpret the results leads to publishing papers with errors (in the best case) or with completely wrong conclusions (more often). I’m not talking about becoming an expert – that can take years, just as in programming, and it should be left to people who spend the whole day doing data analysis (a.k.a. bioinformaticians). I’m talking about the data-analysis equivalent of “scripting” in programming – the level currently taught in undergraduate bioinformatics courses at most universities. Such training would save the world from papers comparing multiple sequence alignments from Clustal and… BLAST (for readers who don’t know – BLAST at best produces multiple pairwise alignments; it does not align all the sequences together).
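For readers who would like to see that difference in practice, here is a tiny sketch (Biopython assumed; the file names are made up) of what each program’s output actually looks like when parsed:

    # Sketch: Clustal gives one multiple alignment, BLAST gives independent
    # pairwise alignments. (Assumes Biopython; file names are placeholders.)
    from Bio import AlignIO, SearchIO

    # Clustal output: a single alignment in which all sequences share columns.
    msa = AlignIO.read("family.aln", "clustal")
    print("Clustal: %d sequences over %d shared columns"
          % (len(msa), msa.get_alignment_length()))

    # BLAST output: a separate pairwise alignment (HSP) per hit, each with its
    # own coordinates – never one alignment of all the sequences together.
    result = SearchIO.read("blast_output.xml", "blast-xml")
    for hit in result:
        for hsp in hit:
            print("BLAST: pairwise HSP, query vs %s, %d aligned columns"
                  % (hit.id, hsp.aln_span))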
These are my two cents. I hope to hear your opinion on that.

 

Posted on August 19, 2007 in Comments, Research skills

 

Why freelancing science?

There are many definitions of bioinformatics. They range from “handling biological data with a computer” to very extensive and precise descriptions, some even including subdivisions. In general they agree on one thing: bioinformatics is used for a virtually unlimited number of tasks. Whether it’s sequence analysis, handling microarray data or juggling chemical reaction parameters – as long as it’s around living things, it’s considered bio(chem)informatics.

I don’t see a need to invent yet another name for it. But “freelancing science” keeps coming to my mind all the time. Switching the system you are working on during a coffee break, or doing something your own way instead of following “the protocol”, has a “freelancing” feel to it, doesn’t it? 🙂

 

Posted on August 7, 2007 in Comments