
Monthly Archives: April 2008

Bug tracking systems in science

I’m not going to describe the painful process of correcting entries in biological databases or errors in publications when one is not the author; we all know how difficult and unrewarding it is. All major databases contain wrong entries: I see misannotated (or nonexistent) genes in GenBank, artificial domains in PFAM and poorly solved structures in PDB. It’s even worse in publications, where across the whole spectrum of journals I see errors that in theory shouldn’t slip through peer review (this includes prominent publishers such as NPG).

One of the best ideas I have heard for addressing this issue is to build a bug tracking system (I would like to give credit to the author, but I cannot find the source; wasn’t it one of the biobloggers?). It’s simple and efficient. Something is wrong? File a bug report. The report would link to the original entry, would be available for aggregation (for example, to track a reporter’s activity), and, if the report itself turns out to be wrong, could be closed by someone other than the database maintainers or authors. Because it would be external to all databases, maybe it could grow to provide “community corrected” versions of these databases?
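To make the idea a bit more concrete, here is a minimal Python sketch of what such an external report could look like. Everything in it is an assumption made for illustration: the class names `BugReport` and `Tracker`, the field names and the two resolution labels are hypothetical and do not describe any existing system.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical resolution labels; a real tracker would likely need more.
RESOLUTIONS = {"fixed", "invalid"}


@dataclass
class BugReport:
    """One report, stored outside the database it refers to."""
    database: str     # name of the source database, e.g. "PDB"
    entry_id: str     # identifier of the original entry (placeholder values only)
    reporter: str
    description: str
    status: str = "open"
    closed_by: Optional[str] = None

    def close(self, closed_by: str, resolution: str) -> None:
        # Anyone may close a report, not only maintainers or authors.
        if resolution not in RESOLUTIONS:
            raise ValueError(f"unknown resolution: {resolution}")
        self.status = resolution
        self.closed_by = closed_by


class Tracker:
    """In-memory collection of reports across all databases."""

    def __init__(self) -> None:
        self.reports: List[BugReport] = []

    def file(self, report: BugReport) -> BugReport:
        self.reports.append(report)
        return report

    def for_entry(self, database: str, entry_id: str) -> List[BugReport]:
        # All reports that link to one original entry.
        return [r for r in self.reports
                if r.database == database and r.entry_id == entry_id]

    def activity(self) -> Counter:
        # Aggregation example: number of reports filed per reporter.
        return Counter(r.reporter for r in self.reports)
```

A “community corrected” view of a database could then start from `for_entry`, by listing the open and fixed reports next to each original record.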

What do you think? How useful could such a system be?

 

Posted on April 18, 2008 in Comments, Community, Software

 


Domain annotation in trimeric autotransporter adhesins

The first major outcome of my PhD project has just appeared in Bioinformatics (open access). It describes a system we have designed to annotate a specific group of bacterial proteins.

Trimeric autotransporter adhesins (TAAs) form one of the many families of bacterial surface proteins. In medically relevant species they adhere to host cells (in non-pathogenic species we don’t know what they adhere to), and therefore they are considered essential virulence factors. They are autotransporters, which means that they cross the outer membrane by themselves: the C-terminal part forms a pore through which the rest of the protein is exported. In contrast to many other autotransporters, the exported part is not cleaved off but stays attached to the membrane by the C-terminal autotransport domain. TAAs are also trimeric: the pore is made of three subunits, and the exported fiber is also a trimer. This last feature is pretty unique; so far it is the only family of bacterial surface proteins that forms fibrous trimers. Interestingly, they range in length from a few hundred to five thousand residues.

What’s so special about these proteins for a bioinformatician? The structure of the fiber is not homogeneous: it is a mixture of globular domains and coiled coils. At the sequence level, they have lots of internal repeats (see the picture) and a heavily biased residue composition, and their domain composition and architecture vary from protein to protein. The only part conserved in all TAAs is the autotransport domain. Systems designed to identify and annotate typical protein domains (such as PFAM) don’t handle them very well: the average coverage of TAAs by PFAM annotation is about 30%. The server we have built relies on the fact that the domains of TAAs are exclusive to this family (they do not appear anywhere else because of its unique structural constraints). Therefore we could use different thresholds, manually curated alignments and domain-context-derived rules to improve the annotation.
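For readers wondering what “coverage” means in practice, below is a minimal Python sketch of one way to compute it: the fraction of residues that fall inside at least one annotated domain, counting overlapping hits only once. It is not the code of the server. The function name `annotation_coverage`, the `(start, end, score)` hit format and the `min_score` cutoff are assumptions for illustration, and the numbers in the usage note are made up.

```python
from typing import Iterable, Tuple

# A hit is (start, end, score), with 1-based inclusive residue positions.
Hit = Tuple[int, int, float]


def annotation_coverage(length: int, hits: Iterable[Hit],
                        min_score: float = 0.0) -> float:
    """Fraction of residues covered by at least one hit scoring >= min_score."""
    if length <= 0:
        raise ValueError("protein length must be positive")
    # Keep only hits that pass the (hypothetical) score threshold.
    kept = sorted((start, end) for start, end, score in hits
                  if score >= min_score)
    covered = 0
    last_end = 0  # last residue already counted as covered
    for start, end in kept:
        start = max(start, last_end + 1)  # skip residues counted before
        end = min(end, length)            # ignore anything past the C-terminus
        if end >= start:
            covered += end - start + 1
            last_end = end
    return covered / length
```

With made-up hits `[(1, 100, 50.0), (50, 200, 12.0), (901, 1000, 30.0)]` on a 1000-residue protein, a cutoff of `min_score=25.0` keeps two hits, while `min_score=10.0` also admits the weaker middle hit and so raises the coverage. That is the sense in which a family-specific threshold can improve annotation, provided the weaker hits can be trusted within the family.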

Manual analysis of TAA sequences is pretty tedious (well, it was; now we have the server), but on the other hand I have learnt a lot about how to read a protein sequence. I mean really read it and understand how a particular combination of letters influences the structure.

 

Posted on April 10, 2008 in bioinformatics

 

Freelancing science – today and tomorrow

In response to Neil’s recent comment and to questions that keep coming up in emails, I’ve decided to describe my status as a freelancing scientist in a little more detail. However, keep in mind that I have no idea how such an arrangement works outside of Poland, so things may well look different in other countries.

First of all, I need to explain my unemployment: I have an academic affiliation, but I’m not formally employed and I don’t get a salary. I do get non-financial support, though, and I am able to apply for grants and to access free software and the journals the institute subscribes to. I was told that’s similar to tenure in the US: you get your office and lab space, but little or no salary. The difference would be that instead of applying for an independent position, you just take it :).

My income comes from grants and from subcontracting on other people’s projects. As a bioinformatician I don’t have huge needs, so the grants I applied for were pretty cheap compared to grants for experimental biology. However, it can take from half a year to a year to get an initial cash flow; it’s all about the time between a call and the award of the grant. Often your degree doesn’t matter when applying for a grant, especially if you are not the principal investigator on the application. I still do not have a PhD degree, and while I hope to get one sometime this year (finally), I’m not pushing this that much.

Instead of carefully listing all the good and bad sides of my freelancing status (or explaining why I made such a move), I will try to answer a question I also hear often: where is this heading?

In my probably skewed view of science, to do things that are very novel and very cool one needs to be either a recognized genius or a big shot in a particular field. Otherwise it’s hard to get enough money to fund one’s completely crazy projects. I’m neither a genius nor a big shot, but I have a bunch of ideas I consider cool and would like to get funded. It looks like for that I need to step out of the academic money-flow system and apply for funding to people who are less conservative and who can take the risk of supporting non-established ideas (Deepak, thank you for the inspiration). And that’s the plan: leave the academic (and competitive) funding system and shift to an outcome-oriented one, similar in essence to a startup. And instead of waiting 15 years to get recognition in academia, I hope to get my stuff running within the next few years.

One can argue that it’s risky and that one could achieve a similar outcome by following the traditional academic career path within a similar time. That’s probably true: none of the things I’ve just written are really supported by long-term evidence. But on the other hand, even if the whole idea doesn’t make sense at all, compared to my colleagues I am having much more fun…

 

Posted on April 5, 2008 in Career

 


BioBrick as a functional role

Genetics

When I first saw the MIT Registry of Standard Biological Parts, I just fell in love with the idea. However, after closer inspection I realized that it’s not what I hoped to find. The Registry collects interchangeable functional modules that can be assembled into novel biological systems. And it does that as well as it sounds, but only to a certain extent. Pedro wrote some time ago about unavoidable complexity and potential issues with the collected parts. I completely agree with his arguments, but I have even more doubts about the Registry’s current approach.

First of all, my feeling is that the DNA-centric view of life is starting to limit our understanding of what happens at the molecular level. It has moved science forward a lot and is still extremely useful, but with genetics we are not going to understand, or avoid, the emergent properties of biological systems. DNA, RNA and proteins all interact with each other at the sequence and structure level. These properties are encoded in DNA, I agree. However, as Pedro pointed out, we have no way to predict what happens after a part is transferred to another organism. It is possible to select for mutations that will render the part usable in the other organism, but I don’t think this approach would be of much use with organisms that are hard to grow (imagine you want to insert a specific system into an extremophile). What is more, it’s not necessarily practical if we transfer the part to an organism that already has a similar element encoded in its genome.

In my humble opinion, the Registry could be extended in two directions: transforming parts into containers that have a specific functional role, and including sub-gene elements such as domains or tectons. Let me describe both in more detail.

Currently a BioBrick is assigned a function and a sequence. I would rather see a functional role that can be fulfilled by many different sequences. For example, for an enzymatic function the BioBrick would include not only a single DNA sequence from a single organism, but also a protein sequence, domains, sequence motifs and a structure (whatever is available), and all of these should be available for every organism for which we can reliably assign this information. To clarify, I’m far from suggesting that the Registry be populated with BLAST results. I would rather have it done manually, or at least in the way The SEED allows experts to create subsystems and assign functional roles to proteins. In this way we could just take a gene from the target organism instead of mutating the original one. Having a container would also mean that we could include different flavors of the same gene (for example, after optimization).
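As a rough illustration of the container idea, here is a minimal Python sketch of a functional role that holds several implementations, one or more per organism. It does not reflect how the Registry or The SEED store their data. The class names `FunctionalRole` and `Implementation` and all field names are hypothetical, and the sequences in any example are placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Implementation:
    """One concrete way a functional role is fulfilled in one organism."""
    organism: str
    dna: Optional[str] = None            # fill in whatever is available
    protein: Optional[str] = None
    domains: List[str] = field(default_factory=list)
    motifs: List[str] = field(default_factory=list)
    structure_id: Optional[str] = None
    curated_by: Optional[str] = None     # manual assignment, not a BLAST hit
    note: str = ""                       # e.g. "optimized" for another flavor


@dataclass
class FunctionalRole:
    """A container: one function, many sequences that can fulfil it."""
    name: str
    implementations: List[Implementation] = field(default_factory=list)

    def add(self, impl: Implementation) -> None:
        self.implementations.append(impl)

    def for_organism(self, organism: str) -> List[Implementation]:
        # Look up the target organism's own versions instead of
        # mutating a part that came from somewhere else.
        return [i for i in self.implementations if i.organism == organism]

    def organisms(self) -> List[str]:
        return sorted({i.organism for i in self.implementations})
```

The design choice worth noting is that the role, not the sequence, is the unit one searches for: `for_organism` returns every flavor recorded for the target organism, including optimized variants, and an empty list signals that no curated implementation exists there yet.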

As for the second direction, I’m a big fan of creating novel functions out of existing elements. That’s why I believe the Registry should include building blocks of proteins as well as other fancy things, like riboswitches. One obvious example would be a signal transduction element, where one can attach different receptor domains to the same membrane component. This has already been done thousands of times, so why not standardize it?

Maybe with these two changes we could finally connect some dots and make the complexity of biological systems more understandable, or at least traceable. The future directions of the Registry are not very well defined, so I believe there is room for at least a discussion about such ideas.

 
