
Monthly Archives: September 2007

Tenure dossier

Janet D. Stemwedel from Adventures in Ethics and Science publishes photographs of the three-ring binder containing her tenure dossier. She ends this post with the sentence: “I seem to recall that there are important aspects of life that you can’t cram into a three-hole punch.”

clipped from scienceblogs.com

The dossier itself more or less captures the teaching/scholarship/service categories impressed on us as faculty newbies. The faculty member preparing a dossier is handed a set of eight uniform dividers for a three-ring binder. Four of these impose the main structure on the materials the faculty member assembles, marking out sections dealing with teaching effectiveness, service to students and the university, scholarly or creative activity, and what amounts to service within or related to your field of scholarship.


Posted by Pawel Szczesny on September 27, 2007 in Career, Clipped

Manual sequence analysis – some common mistakes

25 Sep

This is a topic I will probably come back to on many occasions. Publications with badly flawed sequence analysis, like the one Stephen Spiro pointed out on his blog, are not rare exceptions. I can agree that large-scale analysis tolerates quick-and-dirty treatment of protein sequences (and some error propagation along with it). In large-scale analysis nobody cares whether the domain assignment is 100% right (it isn't), whether there are false positives (there are), or even whether the starting material, protein sequences for example, is free of errors (it is not), as long as the overall quality of the work is acceptable. However, this optimistic approach cannot be applied to manual protein sequence analysis, where each individual error simply matters far more. How can some of these errors be avoided? A few common mistakes that come to mind are:

missing, inaccurate or quick-and-dirty domain annotation: this is probably a topic for a separate post, but in short, relying on a single method or a strict E-value cutoff, excluding overlaps, ignoring internal repeats, or forgetting about structural elements such as transmembrane helices all lead to mistakes in domain annotation
running PSI-BLAST searches on unclustered databases: for many query sequences the profile will become biased and drift in a random direction if PSI-BLAST runs on an unclustered database (remember 500 copies of the same protein in the results?); after all these years I still don't understand why NCBI does not provide nr90 (the non-redundant database clustered at a 90% identity threshold) for PSI-BLAST
running PSI-BLAST without inspecting the results of each iteration: if you don't assess what goes into the profile, you risk letting garbage in
masking low-complexity, coiled-coil and transmembrane regions in BLAST searches on every single occasion: while most of the time this is a valid approach, there are cases where the answer is revealed only after turning the masking off
skipping other sequence analysis tools, such as predictors of signal sequences, motifs and functional sites
skipping analysis of the genomic context: while not applicable to all systems, genomic context analysis may dramatically influence function prediction
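To make the redundancy problem with unclustered databases concrete, here is a toy sketch in Python. It assumes the hits are already aligned without gaps and have equal length, and it uses a simple greedy clustering at 90% identity loosely in the spirit of an nr90-style database. It is not how PSI-BLAST builds its profile (PSI-BLAST applies its own sequence weighting), and all function names are hypothetical.

```python
from collections import Counter


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two gapless, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("toy example expects pre-aligned sequences of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_greedy(seqs, threshold=0.9):
    """Keep a sequence only if it is below `threshold` identity to every representative kept so far."""
    reps = []
    for s in seqs:
        if all(identity(s, r) < threshold for r in reps):
            reps.append(s)
    return reps


def column_frequencies(seqs, column):
    """Residue frequencies at one alignment column (a crude stand-in for one profile position)."""
    counts = Counter(s[column] for s in seqs)
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


# Hypothetical hit list: 500 copies of one protein plus three genuinely different homologs.
hits = ["MKVLA"] * 500 + ["MRVIA", "MEVFA", "MQVLS"]
reps = cluster_greedy(hits)

print(column_frequencies(hits, 1))  # 'K' dominates column 1 because of the copies
print(column_frequencies(reps, 1))  # after clustering, each homolog contributes equally
```

On the raw list, residue K makes up more than 99% of column 1 simply because one protein is repeated 500 times; after clustering, the four distinct sequences contribute equally. This is the kind of bias that a redundant database can push into an iterated profile.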

That's all I could think of so far. Do you have any other suggestions? Let me know.


Posted by Pawel Szczesny on September 25, 2007 in Comments, Research, Research skills

Survey of domain bubbles in protein sequence analysis

24 Sep

One of the key steps in the analysis of an unknown protein sequence is identifying the domains that make up the protein. There are many online tools that either search for the presence of known domains or identify domains ab initio. Usually only the former present their results graphically, as so-called "domain bubbles". Below you can find examples of common approaches to presenting the results of sequence annotation. Since most of these servers use the same domain definitions, the hits have the same names in almost all cases.

One note: this is not a comparison of the servers' performance. The same sequence was used in all cases, but only to show the differences between visualization methods, not the quality of the annotation.

SMART

SMART domain bubbles

This is an example of sequence annotation by the SMART server. Domains are colored according to their source (SMART collects domain definitions from various sources), and non-domain sequence features (such as transmembrane segments, low-complexity regions and disorder) are clearly distinguished from domains. The picture is generated with GIMP and its Perl-Fu extension, and the script is available for download from Ivica Letunic's homepage.

PFAM

PFAM domain bubbles

PFAM's color scheme is quite clear: identical domains get identical colors. PFAM (like the next two servers) shows partial hits in the picture, i.e. cases where the similarity between the domain and the protein spans only a fragment of the domain (which may indicate many things, such as genomic rearrangements, frameshifts or a weak domain definition). The PFAM script can also plot many other sequence features onto the picture. You can use the script with your own annotation data here; the input is an XML file conforming to PFAM's schema.
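A partial hit can be expressed as how much of the domain model the match actually covers. The sketch below is a hypothetical illustration: the `DomainHit` class, its field names and the 80% coverage cutoff are all assumptions made for the example, not the criterion PFAM uses when it draws partial hits.

```python
from dataclasses import dataclass


@dataclass
class DomainHit:
    name: str
    model_start: int   # first matched position in the domain model (1-based)
    model_end: int     # last matched position in the domain model (1-based, inclusive)
    model_length: int  # total length of the domain model

    def coverage(self) -> float:
        """Fraction of the domain model spanned by the match."""
        return (self.model_end - self.model_start + 1) / self.model_length

    def is_partial(self, min_coverage: float = 0.8) -> bool:
        """Flag hits covering less than `min_coverage` of the model (arbitrary cutoff)."""
        return self.coverage() < min_coverage


# Hypothetical hits: one covering about half of its model, one covering nearly all of it.
half = DomainHit("PAS", model_start=1, model_end=60, model_length=110)
full = DomainHit("GGDEF", model_start=3, model_end=158, model_length=160)
print(half.name, round(half.coverage(), 2), half.is_partial())
print(full.name, round(full.coverage(), 2), full.is_partial())
```

Once such hits are flagged, a plotting script can draw them differently (for example, with a cut edge on the bubble). Deciding *why* a hit is partial still requires the manual inspection described above.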

CDD

CDD domain bubbles

CDD looks pretty similar to PFAM and shares some visual features with it. However, the CD-Search page shows more than one line of hits graphically. Usually the first line contains the best hits for each fragment of the sequence, and the following lines show overlapping hits with worse scores. Only the first line is shown here.
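Arranging hits in several lines like this can be sketched with a simple greedy layout. Hits are sorted by score, and each one goes into the first line where it overlaps nothing already placed, so the best non-overlapping hits end up on the first line. This is one plausible approach written as an assumption, not CD-Search's actual code. The `(name, start, end, score)` tuple format with inclusive residue coordinates, the function name and the example hits are all hypothetical.

```python
def assign_lanes(hits):
    """Greedily place (name, start, end, score) hits into lines of non-overlapping hits.

    Higher score is treated as better; coordinates are inclusive residue positions.
    """
    lanes = []
    for hit in sorted(hits, key=lambda h: h[3], reverse=True):
        _, start, end, _ = hit
        for lane in lanes:
            # The hit fits if it lies entirely before or after every hit already in this lane.
            if all(end < s or start > e for _, s, e, _ in lane):
                lane.append(hit)
                break
        else:
            lanes.append([hit])  # no existing lane had room: open a new one
    for lane in lanes:
        lane.sort(key=lambda h: h[1])  # left-to-right order for drawing
    return lanes


# Hypothetical hits for one query sequence.
hits = [
    ("PAS", 10, 120, 45.0),
    ("PAS_3", 15, 118, 30.0),
    ("GGDEF", 200, 360, 80.0),
    ("DUF1", 100, 210, 12.0),
]
for i, lane in enumerate(assign_lanes(hits)):
    print(i, [name for name, *_ in lane])
```

With these inputs the first line holds PAS and GGDEF, and the weaker overlapping PAS_3 and DUF1 hits each drop to a line of their own.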

HHpred

HHpred domain bubbles

OK, I may be biased here, since HHpred was written by my former colleagues, but I really like the domain bubbles from this server. Its color scheme differs from all the other servers: bubbles are colored according to score, from red (the best) to blue (the worst). HHpred also shows partial and overlapping hits (only a few are shown here; the actual results page spans several screens in my browser). Like CDD, HHpred does not plot any sequence features other than domains.
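Score-based coloring like HHpred's can be approximated with a linear red-to-blue ramp. The sketch below draws a single line of domain bubbles as SVG. The function names, the `(name, start, end, score)` input format and the exact color ramp are assumptions for illustration, not HHpred's implementation. The ramp works whether higher or lower scores are better, because it maps the best score to red and the worst to blue.

```python
from html import escape


def score_to_rgb(score, best, worst):
    """Linear ramp: best score -> red (255, 0, 0), worst score -> blue (0, 0, 255)."""
    t = 0.0 if best == worst else (best - score) / (best - worst)
    t = min(max(t, 0.0), 1.0)  # clamp scores outside the [best, worst] range
    return (round(255 * (1 - t)), 0, round(255 * t))


def domain_bubbles_svg(seq_len, hits, width=600, height=40):
    """Render (name, start, end, score) hits as rounded bars over a line for the sequence."""
    scale = width / seq_len
    scores = [h[3] for h in hits]
    best, worst = max(scores, default=0), min(scores, default=0)  # higher = better here
    mid = height / 2
    bar = height - 8
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">',
        # Thin grey line standing for the whole protein sequence.
        f'<line x1="0" y1="{mid}" x2="{width}" y2="{mid}" stroke="grey" stroke-width="2"/>',
    ]
    for name, start, end, score in hits:
        r, g, b = score_to_rgb(score, best, worst)
        x = (start - 1) * scale       # 1-based inclusive coordinates -> pixels
        w = (end - start + 1) * scale
        parts.append(
            f'<rect x="{x:.1f}" y="4" width="{w:.1f}" height="{bar}" rx="{bar / 2}" '
            f'fill="rgb({r},{g},{b})"><title>{escape(name)} {score}</title></rect>'
        )
    parts.append("</svg>")
    return "\n".join(parts)


svg = domain_bubbles_svg(400, [("PAS", 10, 120, 45.0), ("GGDEF", 200, 360, 80.0)])
print(svg)
```

Saved to a `.svg` file, the output opens in a browser. The higher-scoring GGDEF hit is drawn in red and the lower-scoring PAS hit in blue, and hovering over a bubble shows its name and score.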

So these are the major domain annotation servers that present prediction results in a nice graphical way (there are many others, InterPro to name just one, but not all of them use this simple way of presenting data). Do these approaches, which are pretty similar after all, exhaust all possible ways of presenting the domain structure of a protein? I don't think so. Watch this site; I may have something to add pretty soon.


Posted by Pawel Szczesny on September 24, 2007 in Software, Visualization

Tags: domain bubbles, protein sequence analysis, sequence annotation, Visualization

Visualization software for molecular assemblies by Thomas Goddard and Thomas Ferrin

01 Sep

Recently, among the "in press" articles of Current Opinion in Structural Biology, I found a paper by Thomas Goddard and Thomas Ferrin about software for visualizing large molecular assemblies. Although the paper does not focus on preparing publication-quality pictures, the software it cites sounds familiar: Chimera, PyMOL, VMD, QuteMol. The authors also mention VISION, a visual programming environment capable of presenting molecular data. The Molecular Graphics Lab at the Scripps Research Institute, which develops VISION, has some other interesting tools, including PMV (Python Molecular Viewer), which I hope to cover some other time.

Anyway, this paper reminded me of something. I mentioned earlier that the new version of Chimera can produce input for POV-Ray, but I did not realize that this was not the only change in that version. After upgrading, I found that it also has several "presets" of settings suited either to on-screen viewing or to producing figures. That makes preparing a figure much faster, and if you still don't like the result, the presets are good starting points for further tweaking.

Test image from Chimera