Freelancing science

Bio::Blogs #16 – call for submissions

I have the privilege of hosting the next edition of Bio::Blogs. If you have anything you would like to have included, please send an email to szczesny dot pawel at gmail dot com or to bioblogs at gmail dot com before the 1st of November.

Posted by Pawel Szczesny on October 20, 2007 in bioinformatics, Community

Tags: Bio::Blog, bioblogs, bioinformatics, Community

Genome Commons – knowledgebase of human genetic variation

17 Oct

The title says it all – have a look at Steve Brenner’s commentary in Nature (it looks like it’s freely accessible) and the Genome Commons web page.

clipped from genomecommons.berkeley.edu

The Genome Commons and Genome Commons Navigators are open resources I propose to assist with personal genome interpretation. A commentary describing these has been published in Nature, and additional versions of those musings and more details may be found on the about page of this site.

Thank you very much for your interest. Please explore the site and offer your thoughts. The background page offers some historical context for the Genome Commons idea. More valuable context is given by the resources page, which summarizes some existing resources for personal genome interpretation, with links to much larger lists of resources. The blog will have updates and discussions.

Posted by Pawel Szczesny on October 17, 2007 in Clipped, Research

Tags: genome, open-science

Blender in visualization of molecules

17 Oct

Yes, you can use Blender to prepare figures for your next paper, and the results will certainly look different from those obtained with standard software (hemoglobin [1HBG] as an example below)… But given the amount of work and the really steep learning curve (at least for somebody trying it for the very first time), I would not recommend Blender that much… 🙂

Hemoglobin

UPDATE: if you are looking for a way to import a PDB file into Blender, some instructions are at the bottom of this page.

Posted by Pawel Szczesny on October 17, 2007 in Software, Visualization

Tags: blender, hemoglobin, rendering, Visualization

Thoughts on CASP – Critical assessment of methods of protein structure prediction

10 Oct

I’ve just read the introduction to the supplemental issue of the journal PROTEINS dedicated to the most recent round of the CASP experiment. It describes the progress in protein structure prediction over the last few CASP editions.

The list of advancements includes:

improvement in homology modelling: one of the issues in template-based modelling of protein structures was that the final model wasn’t closer to the real structure than the template; now we have a statistically significant (although very small) improvement thanks to multi-template modelling
fully automated methods are much closer to human predictors than ever: many groups use models from servers as their starting point and usually they don’t improve them that much

I believe that this was possible thanks to the progress that has been made in sequence homology searches. Finding similarity between two sequences well beyond any reasonable identity threshold is now doable thanks to profile-to-profile comparison, meta-servers (which combine predictions from many different methods) and recent HMM-to-HMM algorithms (comparison of Hidden Markov Models). If you can find a suitable template for your protein, the rest is then much easier, isn’t it?

There are of course fields that still need some work. One of these often stirs a lot of discussion: automated assessment of a model’s similarity to the real structure. The current methods have proven their suitability, I definitely agree. However, I hope that at some point protein structure comparison software will refuse to superimpose eight- and ten-stranded beta-barrels, or left- and right-handed coiled coils, with a message: “It doesn’t make sense.”

CASP 7 logo

Posted by Pawel Szczesny on October 10, 2007 in Comments, Papers, Research, Structure prediction

Tags: bioinformatics, casp, Proteins, Research, Structure prediction

Type VII secretion system

09 Oct

Yet another secretion system has been described, this time from Gram-positive bacteria (types I to VI were from Gram-negative bacteria). I expect that the further microbiology moves away from E. coli, the more secretion systems will be found. Across the large spectrum of bacterial species, we still know very little about bacteria outside the proteobacterial group.

This is from Nature Reviews Microbiology, and a subscription may be required.

clipped from www.nature.com

Recent evidence shows that mycobacteria have developed novel and specialized secretion systems for the transport of extracellular proteins across their hydrophobic, and highly impermeable, cell wall. Strikingly, mycobacterial genomes encode up to five of these transport systems. Two of these systems, ESX-1 and ESX-5, are involved in virulence — they both affect the cell-to-cell migration of pathogenic mycobacteria. Here, we discuss this novel secretion pathway and consider variants that are present in various Gram-positive bacteria. Given the unique composition of this secretion system, and its general importance, we propose that, in line with the accepted nomenclature, it should be called type VII secretion.

Posted by Pawel Szczesny on October 9, 2007 in Clipped, Proteins, Secretion system

Tags: microbiology, protein, Secretion system

iBioSeminars

01 Oct

Another example of a web-based educational site: iBioSeminars was launched by The American Society for Cell Biology and contains seminars on medicine, cell biology and biological mechanisms. All are available for download in QuickTime, MP4, iPod video or PowerPoint formats. Via ScienceRoll.

iBioSeminars is a freely available library of seminars from outstanding scientists. Our mission is to host lectures that describe on-going research in leading laboratories (they are not basic, survey-style lectures as might be found in undergraduate or graduate student biology courses). However, iBioSeminars features a more extensive introduction into the subject matter than a typical 50 min university seminar. Thus, these lectures are intended to be more accessible than many typical department seminars to advanced undergraduates/beginning graduate students and researchers outside of the specific field.

Posted by Pawel Szczesny on October 1, 2007 in Education

Healia and third party PubMed/Medline tools

01 Oct

David Rothman describes Healia, an easy-to-use interface to PubMed. It is just one of the many third-party PubMed/Medline tools David has described. Check out his posts related to the one about Healia.

clipped from davidrothman.net

Healia’s PubMed search (currently in beta) might be one of the best interfaces available for clinicians who don’t have the search skills to effectively search PubMed through its native interface.

Some notable features:

Automatic “AND”
By default, Healia inserts a boolean “AND” between all search terms (as Google does). While the expert searcher might find this unpleasantly limiting, it is a familiar behavior for many clinical searchers who view Google as their ideal, preferred search interface.

Posted by Pawel Szczesny on October 1, 2007 in Clipped, PubMed, Software

Tenure dossier

27 Sep

Janet D. Stemwedel from Adventures in Ethics and Science publishes photographs of the three-ring binder containing her tenure dossier. She ends this post with the sentence: “I seem to recall that there are important aspects of life that you can’t cram into a three-hole punch.”

clipped from scienceblogs.com

The dossier itself more or less captures the teaching/scholarship/service categories impressed on us as faculty newbies. The faculty member preparing a dossier is handed a set of eight uniform dividers for a three-ring binder. Four of these impose the main structure on the materials the faculty member assembles, marking out sections dealing with teaching effectiveness, service to students and the university, scholarly or creative activity, and what amounts to service within or related to your field of scholarship.

Posted by Pawel Szczesny on September 27, 2007 in Career, Clipped

Manual sequence analysis – some common mistakes

25 Sep

This is a topic I will probably come back to on many occasions. A publication with badly wrong sequence analysis, like the one Stephen Spiro pointed out on his blog, is not an exception. I may agree that large-scale analysis can stand quick and dirty treatment of protein sequences (and some error propagation at the same time). In large-scale analysis nobody cares if the domain assignment is 100% right (it isn’t), if there are false positives (there are) or even if the starting material (protein sequences, for example) is free of errors (it is not), as long as the overall quality of the work is acceptable. However, this optimistic approach cannot be applied to manual protein sequence analysis. Errors introduced in such cases simply matter far more. How to avoid some of these errors? A few common mistakes that come to my mind are:

missing, inaccurate, or quick and dirty domain annotation: this is probably a topic for a separate post, but in short: relying on a single method or a strict E-value cutoff, excluding overlaps, ignoring internal repeats, forgetting about structural elements like transmembrane helices, etc. all lead to mistakes in domain annotation
running PSI-BLAST searches on unclustered databases: the profile for many query sequences will get biased and diverge in a random direction if PSI-BLAST runs on an unclustered database (remember 500 copies of the same protein in the results?); after all these years I still don’t get why NCBI does not provide nr90 (a non-redundant database clustered at a 90% identity threshold) for PSI-BLAST
running PSI-BLAST without looking at the results of each iteration: if you don’t assess what goes into the profile, you risk letting some garbage in
masking low-complexity, coiled-coil and transmembrane regions in BLAST searches on every single occasion: while most of the time this is a valid approach, there are cases where the answer is revealed only after turning the masking off
skipping other tools for sequence analysis, like predictors of signal sequences, motifs and functional sites
skipping analysis of the genomic context: while not applicable to all systems, analysis of the genomic context may dramatically influence function prediction
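To make the point about unclustered databases more concrete, here is a minimal sketch of how redundancy can be reduced at a 90% identity threshold. It is only an illustration under strong assumptions: the function names are hypothetical, the sequences are assumed to be already aligned to the same length, and real clustering tools are far more sophisticated than this greedy loop.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical residues between two pre-aligned sequences.

    Assumption: both sequences have the same length and use '-' for gaps.
    Columns where either sequence has a gap are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    aligned = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not aligned:
        return 0.0
    matches = sum(1 for x, y in aligned if x == y)
    return matches / len(aligned)


def cluster_representatives(seqs: dict, threshold: float = 0.9) -> list:
    """Greedy redundancy filter (hypothetical helper).

    A sequence becomes a new representative only if its identity to every
    representative kept so far is below the threshold.
    """
    representatives = []
    for name, seq in seqs.items():
        if all(identity(seq, seqs[rep]) < threshold for rep in representatives):
            representatives.append(name)
    return representatives


if __name__ == "__main__":
    toy_db = {
        "a": "MKTAYIAKQR",
        "b": "MKTAYIAKQR",  # exact copy of "a"
        "c": "MKTAYIAKQL",  # 90% identical to "a"
        "d": "GSHMLEDPVA",  # unrelated
    }
    print(cluster_representatives(toy_db))
```

With a filter like this, the 500 copies of the same protein would contribute a single sequence to the profile instead of dominating it.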

That is all I can think of so far. Do you have any other suggestions? Let me know.

Posted by Pawel Szczesny on September 25, 2007 in Comments, Research, Research skills

Survey of domain bubbles in protein sequence analysis

24 Sep

One of the key steps in the analysis of an unknown protein sequence is the identification of the domains that constitute that protein. There are many online tools that will search for the presence of known domains or identify them ab initio. Usually only the former present results in a graphical form called “domain bubbles”. Below you can find examples of common approaches to presenting the results of a sequence annotation. Since most of them use the same domain definitions, the names of the hits are the same in almost all cases.

One note: this is not a comparison of the servers’ performance. The same sequence is used in all cases in order to show the differences between visualization methods, not the quality of the annotation.

SMART

SMART domain bubbles

This is an example of sequence annotation by the SMART server. Domains are colored according to their source (SMART has a collection of domain definitions from various sources), and non-domain sequence features (like transmembrane segments, low-complexity regions and disorder) are clearly differentiated from domains. The picture is generated with GIMP and its Perl-Fu extension, and the script is available for download from Ivica Letunic’s homepage.

PFAM

PFAM domain bubbles

The color scheme used by PFAM is quite clear – the same domains have the same colors. PFAM (as well as the following two servers) shows partial hits in the picture – these are cases where the similarity between the domain and the protein spans only a fragment of the domain (which may indicate many things, like genomic rearrangements, frameshifts, a weak domain definition, etc.). The PFAM script can also plot many other sequence features onto the picture. You can use the script with your own annotation data here – the input is coded as an XML file conforming to PFAM’s schema.
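The idea of a partial hit can be expressed as a simple coverage check. The sketch below is not how PFAM decides it; the `DomainHit` fields, the function names and the 80% cutoff are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class DomainHit:
    """Hypothetical record of one domain hit (not a PFAM data structure)."""
    name: str
    model_length: int  # total length of the domain model
    model_start: int   # first matched model position (1-based, inclusive)
    model_end: int     # last matched model position (inclusive)


def model_coverage(hit: DomainHit) -> float:
    """Fraction of the domain model covered by the match."""
    return (hit.model_end - hit.model_start + 1) / hit.model_length


def is_partial(hit: DomainHit, min_coverage: float = 0.8) -> bool:
    """Flag a hit as partial when it covers less than min_coverage of the model.

    The 0.8 default is an arbitrary illustrative cutoff.
    """
    return model_coverage(hit) < min_coverage


if __name__ == "__main__":
    full = DomainHit("example_domain", model_length=250, model_start=1, model_end=250)
    fragment = DomainHit("example_domain", model_length=250, model_start=120, model_end=250)
    print(is_partial(full), is_partial(fragment))
```

A drawing script could then render flagged hits with a jagged or open edge on the truncated side, which is roughly what the pictures from these servers suggest.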

CDD

CDD domain bubbles

CDD looks pretty similar to PFAM and shares some of its visual features. However, the CD-Search page shows more than one line of hits in its graphical output. Usually the first line contains the best hits for a particular fragment, and the following lines show overlapping hits with worse scores. Only the first line is shown here.
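A layout with the best hits in the first line and overlapping, lower-scoring hits pushed to the lines below can be produced with a simple greedy procedure. This is only a sketch of one way to get such a picture, not a description of what CD-Search actually does; the `Hit` type and `layout_rows` function are hypothetical.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    """Hypothetical domain hit on the query sequence."""
    name: str
    start: int    # first residue of the hit (inclusive)
    end: int      # last residue of the hit (inclusive)
    score: float  # higher is better


def overlaps(a: Hit, b: Hit) -> bool:
    """True if the two hits share at least one residue."""
    return a.start <= b.end and b.start <= a.end


def layout_rows(hits: List[Hit]) -> List[List[Hit]]:
    """Place hits into rows, best score first.

    Each hit goes into the first row where it overlaps nothing already placed;
    if no such row exists, a new row is opened below.
    """
    rows: List[List[Hit]] = []
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        for row in rows:
            if not any(overlaps(hit, other) for other in row):
                row.append(hit)
                break
        else:
            rows.append([hit])
    return rows


if __name__ == "__main__":
    hits = [Hit("A", 1, 100, 90.0), Hit("B", 50, 150, 60.0), Hit("C", 120, 200, 80.0)]
    for row in layout_rows(hits):
        print([h.name for h in row])
```

Because hits are processed in order of decreasing score, a weaker hit can only end up in the first row if it does not collide with anything better.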

HHpred

HHpred domain bubbles

OK, I may be biased here, since HHpred is coded by my former colleagues, but I really like the domain bubbles from this server. The color scheme is different from that of any other server: bubbles are colored according to the score, from red (the best) to blue (the worst). It also shows partial and overlapping hits (only a few are shown here; the actual results page spans a few screens in my browser). Like CDD, HHpred does not plot any sequence features other than domains.

So these are the major domain annotation servers that present the results of the prediction in a nice graphical way (there are many others, but not all of them use this simple way of presenting data; InterPro is one example). Do these approaches, which are after all pretty similar, explore all possible ways of presenting the domain structure of a protein? I don’t think so. Watch this site, I may have something to add pretty soon.
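Coloring by score is easy to reproduce in your own figures. The sketch below maps a score linearly onto a red-to-blue gradient; the function name and the straight interpolation through purple are my assumptions, and HHpred’s real palette may well be different.

```python
from typing import Tuple


def score_to_rgb(score: float, worst: float, best: float) -> Tuple[int, int, int]:
    """Map a score to an (R, G, B) color: red for the best, blue for the worst.

    Assumption: a plain linear blend between blue and red; scores outside
    the [worst, best] range are clamped to the nearest end.
    """
    if best == worst:
        return (255, 0, 0)  # a single score: treat it as the best
    t = (score - worst) / (best - worst)
    t = min(1.0, max(0.0, t))  # clamp to [0, 1]
    red = round(255 * t)
    blue = round(255 * (1.0 - t))
    return (red, 0, blue)


if __name__ == "__main__":
    for s in (100.0, 75.0, 25.0, 0.0):
        print(s, score_to_rgb(s, worst=0.0, best=100.0))
```

The `worst` and `best` bounds would normally be taken from the hit list itself, so that the gradient always spans the full range of the results being shown.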

Posted by Pawel Szczesny on September 24, 2007 in Software, Visualization

Tags: domain bubbles, protein sequence analysis, sequence annotation, Visualization