PhD thesis in LaTeX

19 06 2008

For the record: here you can see a single (still unfinished) page of my PhD thesis prepared in LaTeX. I used the PhD thesis style prepared by Jamie Stevens and wrote the whole thing in the Kile editor. An image can be placed in the margin with the following command:

\marginpar{%
    \centering
    % \includegraphics needs \usepackage{graphicx} in the preamble
    \includegraphics[width=3cm]{image.pdf}\\
    Caption text
}




Blogging overtaken by life streaming

15 05 2008

I don’t post new things as often as I used to a couple of months ago, but it’s not all my fault. FriendFeed and Google Reader (especially the newest feature of adding notes to shared items) create a so much better space for a rapid exchange of thoughts than a blog that I comment, link and share most things over there, and that even includes setting up scientific collaborations. This blog is going to lose a little of its dynamics, but after only a few weeks I already see the advantages (like saving time) of moving micro-posts to the World Wide Talk Show, as Robert Scoble calls FF.

The number of interesting conversations on FF and Twitter combined is so large that I don’t do random web browsing anymore (and I’m not the only one who says that). And I don’t even subscribe to thousands of people - it’s fewer than a hundred in total on both services. This list includes scientists (here’s a probably already outdated list of scientists on FF at Nature’s blog Nascent), technologists and other interesting chaps.

So join us on Twitter or FriendFeed - my login on both services is “freesci”. Life is about interesting conversations, isn’t it? :)

UPDATE: Pierre Lindenbaum obviously has similar thoughts.





Bug tracking systems in science

18 04 2008

I’m not going to describe the painful process of correcting entries in biological databases or errors in publications when one is not the author - we all know how difficult and unrewarding it is. All major databases contain wrong entries - I see misannotated (or nonexistent) genes in GenBank, artificial domains in Pfam and poorly solved structures in PDB. It’s even worse in publications, where across the whole spectrum of journals I see errors which in theory shouldn’t slip through peer review (this includes such prominent publishers as NPG).

One of the best ideas I have heard for addressing this issue was to build a bug tracking system (I would like to give credit to the author, but I cannot find the source; wasn’t it one of the biobloggers?). It’s simple and efficient. Something is wrong? File a bug report. The report would link to the original entry, would be available for aggregation (for example to track the activity of a report’s author), and could possibly be closed by somebody other than the database maintainers or authors if the report itself turns out to be wrong. Because it would be external to all databases, maybe it could grow to provide “community corrected” versions of these databases?
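To make the idea a bit more concrete, here is a minimal Python sketch of what a single report could look like. Everything in it is my assumption for illustration only: the class and field names, the three statuses, the reporter names and the example.org placeholder URLs are made up, and no real database exposes such an interface.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    OPEN = "open"
    FIXED = "fixed"        # the database entry was corrected
    REJECTED = "rejected"  # the report itself turned out to be wrong


@dataclass
class BugReport:
    """One report filed against an entry in an external database (hypothetical)."""

    database: str     # e.g. "GenBank", "Pfam" or "PDB"
    entry_url: str    # link back to the original entry (placeholder URLs below)
    reporter: str
    description: str
    status: Status = Status.OPEN
    closed_by: str | None = None

    def close(self, status: Status, closed_by: str) -> None:
        """Close the report; the closer need not be a maintainer or author."""
        if status is Status.OPEN:
            raise ValueError("close a report as FIXED or REJECTED")
        self.status = status
        self.closed_by = closed_by


def reports_per_author(reports: list[BugReport]) -> Counter:
    """Aggregate reports by the person who filed them."""
    return Counter(report.reporter for report in reports)


# Made-up example data: reporters and URLs are placeholders.
reports = [
    BugReport("GenBank", "https://example.org/genbank/entry-1", "alice", "gene does not exist"),
    BugReport("PDB", "https://example.org/pdb/entry-2", "alice", "poorly solved loop region"),
    BugReport("Pfam", "https://example.org/pfam/entry-3", "bob", "artificial domain"),
]
reports[2].close(Status.REJECTED, closed_by="carol")
print(reports_per_author(reports))
```

Keeping the report separate from the database entry, and linking to it only by URL, is what would let such a tracker live outside the databases themselves.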

What do you think? How useful could such a system be?





Changes and updates

21 03 2008

Here’s a summary of the changes that have happened in the meantime on the Freelancing Science blog:

  • Freelancing Science has its own domain (freelancingscience.com in case you wonder), but for readers nothing has changed: all links and feed URLs seem to work as before. The main change is in your browser’s address bar and in my email (pawel at new domain name).
  • I have added another box with Google Reader starred items. Feel free to subscribe, although I star something in GR based on my loose impression (not necessarily valid) that a particular piece is worth coming back to. Because my reading list is constantly growing (more on this below), expect a large number of items in this feed.
  • I’ve added a number of blogs to the link list in the sidebar (this list is constantly expanding). Among others: the blog of Daniel Lemire, a computer science professor from Montreal; Reasonable Deviations - everything scientific and challenging, from math to finance; Molgraph3D - visualization (of course!) of molecules with various techniques, by Ludovic Autin; and Biosingularity - a news blog about advances in bioengineering.
  • I keep updating the Images of molecules page, although I also put all new stuff in my Flickr “Molecular renderings” set, which is easy to track as it provides RSS feeds (to track people’s lifestreams I recommend FriendFeed - a couple of fellow biobloggers have accounts there).




Wolfram Mathematica 6 - no New Kind of Science (yet)

30 10 2007

Not so long ago Animesh Sharma pointed to a rather old interview with Stephen Wolfram about the book “A New Kind of Science” and asked whether its concepts concerning a biological framework had made their way into the Mathematica software.

I’ve just returned from the Poland Mathematica Conference, and I can answer that question: no, they didn’t. While there were people using Modelica and Mathematica to model some stochastic processes in cells, Mathematica itself does not provide much support for any sophisticated description of biological mechanisms. The implications of the concepts from the book “A New Kind of Science” looked very promising - it’s a pity that we are not given the tools to verify them ourselves.





My gallery of images

28 10 2007

Readers of this blog who rely on RSS feeds may not have noticed that I have put up a separate page containing computer-generated images of various molecules - Molecular renderings. Comments, suggestions and critique are always welcome.

From time to time I’ll post new images there - from time to time I need to remind myself that science is pretty too :).





Thoughts on CASP - Critical assessment of methods of protein structure prediction

10 10 2007

I’ve just read the introduction to the supplemental issue of the journal PROTEINS dedicated to the most recent round of the CASP experiment. It describes the progress in protein structure prediction over the last few CASP editions.

The list of advancements includes:

  • improvement in homology modelling: one of the issues in template-based modelling of protein structures was that the final model wasn’t closer to the real structure than the template; now we have a statistically significant (although very small) improvement thanks to multi-template modelling
  • fully automated methods are closer to human predictors than ever: many groups use models from servers as their starting point and usually they don’t improve them that much

I believe that this was possible thanks to the progress that has been made in the area of sequence homology searches. Finding similarity between two sequences well beyond any reasonable identity threshold is now doable thanks to profile-to-profile comparison, meta-servers (joining predictions from many different methods) and recent HMM-to-HMM algorithms (comparison of Hidden Markov Models). If you can find a suitable template for your protein, the rest is much easier, isn’t it?

There are of course fields that still need some work. One of these often stirs a lot of discussion: automated assessment of a model’s similarity to the real structure. The current methods have proven their suitability, I definitely agree. However, I hope that at some point protein structure comparison software will refuse to superimpose eight- and ten-stranded beta-barrels or left- and right-handed coiled coils with a message: “It doesn’t make sense.”






Manual sequence analysis - some common mistakes

25 09 2007

This is a topic I will probably come back to on many occasions. A publication with a badly wrong sequence analysis, like the one Stephen Spiro pointed out on his blog, is not an exception. I may agree that large-scale analysis can stand quick and dirty treatment of protein sequences (and some error propagation at the same time). In large-scale analysis nobody cares if the domain assignment is 100% right (it isn’t), if there are false positives (there are) or even if the starting material (protein sequences, for example) is free of errors (it is not) - as long as the overall quality of the work is acceptable. However, this optimistic approach cannot be applied to manual protein sequence analysis: errors introduced in such cases simply matter far more. How can one avoid some of these errors? A few common mistakes that come to my mind are:

  • missing, inaccurate or quick-and-dirty domain annotation: this is probably a topic for a separate post, but in short - relying on a single method or a strict E-value cutoff, excluding overlaps, ignoring internal repeats, forgetting about structural elements like transmembrane helices etc. lead to mistakes in domain annotation
  • running a PSI-BLAST search on unclustered databases: the profile for many query sequences will get biased and diverge in a random direction if PSI-BLAST runs on an unclustered database (remember 500 copies of the same protein in the results?); after all these years I still don’t get why NCBI does not provide nr90 (a non-redundant db clustered at a 90% identity threshold) for PSI-BLAST
  • running PSI-BLAST without looking at the results of each iteration: if you don’t assess what goes into the profile, you risk letting some garbage in
  • masking low-complexity, coiled-coil and transmembrane regions in BLAST searches on every single occasion: while most of the time this is a valid approach, there are cases where the answer is revealed only after turning the masking off
  • skipping other tools for sequence analysis, such as predictors of signal sequences, motifs and functional sites
  • skipping analysis of the genomic context: while not applicable to all systems, analysis of the genomic context may dramatically influence function prediction
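To illustrate what clustering at a 90% identity threshold means in practice, here is a toy Python sketch of greedy redundancy reduction. It is not how any real clustering tool or NCBI pipeline works: the function names are mine, the sequences are made up, and the identity estimate uses difflib matching blocks as a crude stand-in for a proper pairwise alignment.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def approx_identity(a: str, b: str) -> float:
    """Crude identity estimate: matched residues / length of the shorter sequence.

    Assumption: difflib matching blocks stand in for a real pairwise alignment.
    """
    if not a or not b:
        return 0.0
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / min(len(a), len(b))


def greedy_cluster(seqs: dict[str, str], threshold: float = 0.9) -> dict[str, list[str]]:
    """Greedy clustering: longer sequences are considered first and become
    representatives; each later sequence joins the first representative it
    matches at or above the threshold, otherwise it starts a new cluster."""
    clusters: dict[str, list[str]] = {}
    for name in sorted(seqs, key=lambda n: (-len(seqs[n]), n)):
        for rep in clusters:
            if approx_identity(seqs[name], seqs[rep]) >= threshold:
                clusters[rep].append(name)
                break
        else:
            clusters[name] = [name]
    return clusters


# Made-up toy sequences: two near-copies of "a" and one unrelated sequence.
seqs = {
    "a": "MKTAYIAKQRQISFVKSHFSRQ",
    "a_copy": "MKTAYIAKQRQISFVKSHFSRQ",
    "a_mut": "MKTAYIAKQREISFVKSHFSRQ",
    "b": "GSSHHHHHHSSGLVPRGSHMAS",
}
clusters = greedy_cluster(seqs)
print(clusters)
```

Searching only against the representatives (the dictionary keys) is what keeps hundreds of copies of the same protein from dominating the profile.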

That’s all I could think of so far. Do you have any other suggestions? Let me know.





On the scripting skills

19 08 2007

The interview with Dr Alexei Drummond inspired an interesting discussion. While I agree that some level of training in programming would be very beneficial for biologists, I think there’s something more important that people working at the bench should learn: using the tools for biological data analysis. Scripting skills are fine and often save an enormous amount of time; however, unwillingness to learn how to run a BLAST search (or any other basic tool in the field) and interpret the results leads to publishing papers with errors (the best case) or with completely wrong conclusions (which happens more often). I’m not talking about becoming an expert - this can take years, as in programming, and should be left to people who spend the whole day doing data analysis (aka bioinformaticians). I’m talking about the equivalent of “scripting” in programming, and this level is currently taught in undergraduate bioinformatics courses at most universities. Such training would save the world from papers comparing multiple sequence alignments from Clustal and… BLAST (in case some readers do not know: BLAST can at best produce multiple pairwise alignments; it does not align all the sequences together).
These are my two cents. I hope to hear your opinion on that.





Why freelancing science?

7 08 2007

There are many definitions of bioinformatics. They range from “handling biological data with a computer” to very extensive and precise descriptions, including even subdivisions. In general they agree on one thing: bioinformatics is used for a virtually unlimited number of tasks. Whether it’s sequence analysis, handling microarray data or juggling chemical reaction parameters - as long as it’s about living things, it’s considered bio(chem)informatics.

I don’t see a need to invent yet another name for it. But freelancing science keeps coming to my mind all the time. Switching the system you are working on during a coffee break or doing something in your own way instead of following “the protocol” has a “freelancing” feel to it, doesn’t it? :)