
Human genobiome and disease risk assessment


I recently attended a talk on the progress of human metagenomics projects. As the speaker admitted, the whole field is a researchers’ gold mine – almost everything they find is new and interesting. There were a couple of interesting points, mainly about how limited our knowledge in this area is. For example, there was an unconfirmed feeling among microbiologists that all modern microbiology is in fact nothing more than the biology of E. coli and its relatives. Now we know that for sure – the number of microbial species known to us is estimated at 0.5% of all existing microbial species. I also heard a nice story about a Polish doctor who described Helicobacter pylori and its role in gastric diseases in the 19th century (there was a Nobel Prize for that discovery in 2005), wrote a book and then trashed the whole thing because he couldn’t grow the bacteria in pure culture. Another important issue was the amount of data and the lack of new ways of handling it.

But the most interesting part for me was the connection between the human microbiome and diseases. Or rather the possibility of such a connection. I am not aware of a single case in which the composition of the human microbiome has been proven to influence the chance of getting ill, and I don’t think many such correlations will be found soon. My impression is that correlations will be found once we have both a complete human genome and a complete metagenome of everything that lives on a particular person – a human genobiome, as I’ve called it (BTW, the word “genobiome” is not present in Google – is there a better word for that?). And I believe that getting the first full human genobiome will be an achievement comparable to sequencing the human genome for the first time. Not because of technical difficulties, but because of all the discoveries that need to be made to make it happen. For example, the gut of every person carries a species that performs some sulfur reaction, but its population is only up to a few thousand cells. How many such cases do we have in our organisms? That is a very good question. The field is brand new, and the possibilities for speculation are endless.

 

Posted by on July 6, 2008 in bioinformatics, Research

 


PhD thesis in LaTeX

For the record: here you can see a single (still unfinished) page of my PhD thesis prepared in LaTeX. I used the PhD thesis style prepared by Jamie Stevens and wrote the whole thing in the Kile editor. An image can be placed in the margin with the following command:

% assumes \usepackage{graphicx} in the preamble
\marginpar{%
     \centering  % a declaration, not a command with an argument
     \includegraphics[width=3cm]{image.pdf}\\
     Caption text
}
 
6 Comments

Posted by on June 19, 2008 in bioinformatics, Comments

 


Structure of usher pore is available

Structure of usher pore

Some time ago I posted breaking news about the solved structure of the usher pore. A few days ago it was deposited in the PDB as 2VQI (the publication appeared in Cell, here’s the abstract). The structure is a beautiful dimer (see above) of 24-stranded beta-barrels, the first of its kind. The paper also contains structures of the whole complex reconstructed from cryo-EM data.

Interestingly, while the structure of the native dimer is symmetrical, the function of the units is not. Both of the twinned pores are involved in the alternating recruitment of chaperone:pili-subunit complexes, but only one actually transports pili subunits out. Overall, given the large number of detailed studies on the mechanistic properties of pili transport and formation, this is the best understood translocation process at a structural level.

Read the paper and draw your own conclusions, but for me it changes the way of thinking about protein translocation in bacteria. We have learnt a lot about bacterial secretion by observing how similar proteins are involved in fundamentally different processes (for example, DNA export and toxin secretion may use the same system). Similarly, the usher pore is going to serve as an exemplar for newly found translocation elements.

 

Posted by on May 31, 2008 in Papers, Proteins

 


Surprises in biological databases – nr

If you wonder why clustering a recent nr database from NCBI with cd-hit takes ages, here’s an answer:

>gi|10955428|ref|NP_053140.1| hypothetical protein pB171_078 [Escherichia coli]gi|16082681|ref|NP_395228.1|
 transposase/IS protein [Yersinia pestis CO92]gi|16082847|ref|NP_395401.1| transposase/IS protein [Yersinia
 pestis CO92]gi|16120383|ref|NP_403696.1| transposase/IS protein [Yersinia pestis CO92]gi|16120444|ref|NP_4
03757.1| transposase/IS protein [Yersinia pestis CO92]gi|16120514|ref|NP_403827.1| transposase/IS protein [
Yersinia pestis CO92]gi|16120586|ref|NP_403899.1| transposase/IS protein [Yersinia pestis CO92]gi|16120719|
ref|NP_404032.1| transposase/IS protein [Yersinia pestis CO92]gi|16120857|ref|NP_404170.1| transposase/IS p
rotein [Yersinia pestis CO92]gi|16120894|ref|NP_404207.1| transposase/IS protein [Yersinia pestis CO92]gi|1
6120962|ref|NP_404275.1| transposase/IS protein [Yersinia pestis CO92]gi|16121092|ref|NP_404405.1| transpos
ase/IS protein [Yersinia pestis CO92]gi|16121136|ref|NP_404449.1| transposase/IS protein [Yersinia pestis C
O92]gi|16121228|ref|NP_404541.1| transposase/IS protein [Yersinia pestis CO92]gi|16121314|ref|NP_404627.1|
transposase/IS protein [Yersinia pestis CO92]gi|16121385|ref|NP_404698.1| transposase/IS protein [Yersinia
pestis CO92]gi|16121430|ref|NP_404743.1| transposase/IS protein [Yersinia pestis CO92]gi|16121620|ref|NP_40
4933.1| transposase/IS protein [Yersinia pestis CO92]gi|16121706|ref|NP_405019.1| transposase/IS protein [Y
ersinia pestis CO92]gi|16121792|ref|NP_405105.1| transposase/IS protein [Yersinia pestis CO92]gi|16121890|r
ef|NP_405203.1| transposase/IS protein [Yersinia pestis CO92]gi|16121951|ref|NP_405264.1| transposase/IS pr
otein [Yersinia pestis CO92]gi|16121988|ref|NP_405301.1| transposase/IS protein [Yersinia pestis CO92]gi|16
122008|ref|NP_405321.1| transposase/IS protein [Yersinia pestis CO92]gi|16122148|ref|NP_405461.1| transposa
se/IS protein [Yersinia pestis CO92]gi|16122266|ref|NP_405579.1| transposase/IS protein [Yersinia pestis CO
92]gi|16122324|ref|NP_405637.1| transposase/IS protein [Yersinia pestis CO92]gi|16122408|ref|NP_405721.1| t
ransposase/IS protein [Yersinia pestis CO92]gi|16122588|ref|NP_405901.1| transposase/IS protein [Yersinia p
estis CO92]gi|16122620|ref|NP_405933.1| transposase/IS protein [Yersinia pestis CO92]gi|16122738|ref|NP_406
051.1| transposase/IS protein [Yersinia pestis CO92]gi|16122852|ref|NP_406165.1| transposase/IS protein [Ye
rsinia pestis CO92]gi|16122926|ref|NP_406239.1| transposase/IS protein [Yersinia pestis CO92]gi|16123007|re
f|NP_406320.1| transposase/IS protein [Yersinia pestis CO92]gi|16123118|ref|NP_406431.1| transposase/IS pro
tein [Yersinia pestis CO92]gi|16123368|ref|NP_406681.1| transposase/IS protein [Yersinia pestis CO92]gi|161
23410|ref|NP_406723.1| transposase/IS protein [Yersinia pestis CO92]gi|16123439|ref|NP_406752.1| transposas
e/IS protein [Yersinia pestis CO92]gi|16123584|ref|NP_406897.1| transposase/IS protein [Yersinia pestis CO9
2]gi|16123688|ref|NP_407001.1| transposase/IS protein [Yersinia pestis CO92]gi|16123734|ref|NP_407047.1| tr
ansposase/IS protein [Yersinia pestis CO92]gi|16123839|ref|NP_407152.1| transposase/IS protein [Yersinia pe
stis CO92]gi|16123892|ref|NP_407205.1| transposase/IS protein [Yersinia pestis CO92]gi|16123908|ref|NP_4072
21.1| transposase/IS protein [Yersinia pestis CO92]gi|16124133|ref|NP_407446.1| transposase/IS protein [Yer
sinia pestis CO92]gi|22123963|ref|NP_667386.1| transposase/IS protein [Yersinia pestis KIM]gi|22124031|ref|
NP_667454.1| transposase/IS protein [Yersinia pestis KIM]gi|22124203|ref|NP_667626.1| transposase/IS protei
n [Yersinia pestis KIM]gi|22124372|ref|NP_667795.1| transposase/IS protein [Yersinia pestis KIM]gi|22124391
|ref|NP_667814.1| transposase/IS protein [Yersinia pestis KIM]gi|22124420|ref|NP_667843.1| transposase/IS p
rotein [Yersinia pestis KIM]gi|22124556|ref|NP_667979.1| transposase/IS protein [Yersinia pestis KIM]gi|221
24665|ref|NP_668088.1| transposase/IS protein [Yersinia pestis KIM]gi|22124814|ref|NP_668237.1| transposase
/IS protein [Yersinia pestis KIM]gi|22124844|ref|NP_668267.1| transposase/IS protein [Yersinia pestis KIM]g
i|22124913|ref|NP_668336.1| transposase/IS protein [Yersinia pestis KIM]gi|22125025|ref|NP_668448.1| transp
osase/IS protein [Yersinia pestis KIM]gi|22125118|ref|NP_668541.1| transposase/IS protein [Yersinia pestis
KIM]gi|22125219|ref|NP_668642.1| transposase/IS protein [Yersinia pestis KIM]gi|22125447|ref|NP_668870.1| t
ransposase/IS protein [Yersinia pestis KIM]gi|22125565|ref|NP_668988.1| transposase/IS protein [Yersinia pe
stis KIM]gi|22125833|ref|NP_669256.1| transposase/IS protein [Yersinia pestis KIM]gi|22125913|ref|NP_669336
.1| transposase/IS protein [Yersinia pestis KIM]gi|22126032|ref|NP_669455.1| transposase/IS protein [Yersin
ia pestis KIM]gi|22126111|ref|NP_669534.1| transposase/IS protein [Yersinia pestis KIM]gi|22126227|ref|NP_6
69650.1| transposase/IS protein [Yersinia pestis KIM]gi|22126294|ref|NP_669717.1| transposase/IS protein [Y
ersinia pestis KIM]gi|22126458|ref|NP_669881.1| transposase/IS protein [Yersinia pestis KIM]gi|22126621|ref
|NP_670044.1| transposase/IS protein [Yersinia pestis KIM]gi|22126672|ref|NP_670095.1| transposase/IS prote
in [Yersinia pestis KIM]gi|22126967|ref|NP_670390.1| transposase/IS protein [Yersinia pestis KIM]gi|2212702
6|ref|NP_670449.1| transposase/IS protein [Yersinia pestis KIM]gi|22127088|ref|NP_670511.1| transposase/IS
protein [Yersinia pestis KIM]gi|22127284|ref|NP_670707.1| transposase/IS protein [Yersinia pestis KIM]gi|22
127489|ref|NP_670912.1| transposase/IS protein [Yersinia pestis KIM]gi|22127607|ref|NP_671030.1| transposas
e/IS protein [Yersinia pestis KIM]gi|22127670|ref|NP_671093.1| transposase/IS protein [Yersinia pestis KIM]
gi|22127690|ref|NP_671113.1| transposase/IS protein [Yersinia pestis KIM]gi|22127900|ref|NP_671323.1| trans
posase/IS protein [Yersinia pestis KIM]gi|31795384|ref|NP_857837.1| transposase/IS protein [Yersinia pestis
 KIM]gi|31795462|ref|NP_857912.1| transposase/IS protein [Yersinia pestis KIM]gi|32470047|ref|NP_862989.1|
putative ATP-binding protein [Escherichia coli]gi|45439896|ref|NP_991435.1| transposase/IS protein [Yersini
a pestis biovar Microtus str. 91001]gi|45439948|ref|NP_991487.1| transposase/IS protein [Yersinia pestis bi
ovar Microtus str. 91001]gi|45440109|ref|NP_991648.1| transposase/IS protein [Yersinia pestis biovar Microt
us str. 91001]gi|45440257|ref|NP_991796.1| transposase/IS protein [Yersinia pestis biovar Microtus str. 910
01]gi|45440297|ref|NP_991836.1| transposase/IS protein [Yersinia pestis biovar Microtus str. 91001]gi|45440
401|ref|NP_991940.1| transposase/IS protein [Yersinia pestis biovar Microtus str. 91001]

But to tell the honest truth, this is not the whole problem – it is less than 10% of only one of many such problems. This particular protein (gi number: 10955428) has over three hundred other gi numbers in its header in the non-redundant database from NCBI, which apparently made cd-hit stand still for weeks in amazement at such a lengthy description. A quick fix in Perl, and now the clustering should finish within a few hours, as it should.
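The Perl fix itself isn’t shown here. As a rough illustration of the idea, here is a minimal sketch in Python, assuming the fix amounts to keeping only the first record of each merged header. The function names are made up for this example. The sketch relies on nr joining merged records with a Ctrl-A character (not visible in the listing above) and falls back to cutting before a “gi|” glued to a closing bracket when that separator has been lost.

```python
def shorten_header(header):
    """Keep only the first record of a merged nr header line."""
    # In the raw nr file, merged records are separated by Ctrl-A (\x01).
    first = header.split("\x01", 1)[0]
    # Fallback (assumption): if the separator was stripped, the next record
    # starts with "gi|" glued directly to the "]" that ends the previous one.
    cut = first.find("]gi|")
    if cut != -1:
        first = first[:cut + 1]
    return first


def shorten_fasta(lines):
    """Yield FASTA lines with every header shortened; sequences pass through."""
    for line in lines:
        if line.startswith(">"):
            yield shorten_header(line.rstrip("\n")) + "\n"
        else:
            yield line
```

With two open file handles, say src for the original database and dst for the cleaned copy, this would be used as dst.writelines(shorten_fasta(src)). The full list of gi numbers is lost in the cleaned copy, so the original file is still needed to map cluster members back to all their identifiers.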

 
2 Comments

Posted by on May 25, 2008 in bioinformatics

 


Blogging overtaken by life streaming

I don’t post new things as often as I used to a couple of months ago, but it’s not all my fault. FriendFeed and Google Reader (especially the newest feature of adding notes to shared items) create so much better a space for the rapid exchange of thoughts than a blog that I comment, link and share most things over there, and that even includes setting up scientific collaborations. This blog is going to lose a little of its dynamics, but already after a few weeks I see advantages (like saving time) of moving micro-posts to the World Wide Talk Show, as Robert Scoble calls FF.

The amount of interesting conversation at FF and Twitter combined is so huge that I don’t do random web browsing anymore (and I’m not the only one who says that). And I don’t even subscribe to thousands of people – it’s fewer than a hundred in total on both services. This list includes scientists (here’s a probably already outdated list of scientists at FF on Nature’s blog Nascent), technologists and other interesting chaps.

So join us at Twitter or FriendFeed – my login at both services is “freesci”. Life is about interesting conversations, isn’t it? 🙂

UPDATE: Pierre Lindenbaum obviously has similar thoughts.

 
10 Comments

Posted by on May 15, 2008 in Comments

 


Joining ONS club – classification and prediction of bacteriocins

It’s finally time to jump into the Open Notebook Science pool with my small project: classification and prediction of bacteriocins. The main page of this project is on the Freelancing Science wiki: freelancingscience.wikispaces.com/bacteriocins. After reading a recent post by Michael Barton on ONS, I’ve decided to stick to the wiki only – I had already set up another blog for this project, but if a blog doesn’t work very well for Michael, I doubt it will work for me. Since it’s entirely a side project, updates on the project blog would be embarrassingly rare. So far the wiki doesn’t contain much data, nothing more than a plan in fact. But I think it’s important to at least start somewhere.

The direct inspiration for the project was this post at Microbiology Blog. It describes the results of some experiments on growth inhibition of bacteria by haloarchaeal organisms, which could in some cases be explained by novel archaeocins, peptide or protein antibiotics from Archaea. After a quick look I realised that I could see sequence similarity between seemingly unrelated bacteriocins. That of course led to the question of whether I am able to repeat the procedure from my PhD project: understand the protein family, and then write an annotation/prediction tool. I don’t expect outstanding results, but at least this will be a good occasion to document my approach to protein sequence annotation. So if not scientific, it should have at least a little educational value.

 
4 Comments

Posted by on May 3, 2008 in bioinformatics, Research

 


Bug tracking systems in science

I’m not going to describe the painful process of correcting entries in biological databases or errors in publications when one is not the author – we all know how difficult and unrewarding it is. All major databases contain wrong entries – I see misannotated (or nonexistent) genes in GenBank, artificial domains in PFAM and poorly solved structures in PDB. It’s even worse in publications, where across the whole spectrum of journals I see errors which in theory shouldn’t slip through peer review (this includes such prominent publishers as NPG).

One of the best ideas I have heard that addresses this issue is to build a bug tracking system (I would like to give credit to the author, but I cannot find the source; wasn’t that one of the biobloggers?). It’s simple and efficient. Something is wrong? File a bug report. The report would link to the original entry, would be available for aggregation (for example, to track the activity of a report’s author), and could possibly be closed by somebody other than the database maintainers or authors if the report itself is wrong. Because it would be external to all databases, maybe it could grow to provide “community corrected” versions of these databases?

What do you think? How useful could such a system be?
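To make the idea a bit more concrete, here is a rough sketch of what a single report and a simple aggregation could look like. Everything in it (class, field and function names) is hypothetical and only mirrors the properties listed above: a link to the original entry, a reporter, and the ability for anyone to close the report.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class BugReport:
    entry_url: str                    # link to the original database entry or paper
    reporter: str                     # who filed the report
    description: str                  # what is wrong with the entry
    status: str = "open"
    closed_by: Optional[str] = None

    def close(self, who):
        """Close the report; 'who' need not be a maintainer or an author."""
        self.status = "closed"
        self.closed_by = who


def reports_per_author(reports):
    """Count reports per reporter, e.g. to track an author's activity."""
    return Counter(r.reporter for r in reports)
```

Keeping the report separate from the entry it points to is what would let such a tracker live outside every individual database.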

 
10 Comments

Posted by on April 18, 2008 in Comments, Community, Software

 


Domain annotation in trimeric autotransporter adhesins

The first major outcome of my PhD project has just appeared in Bioinformatics (open access). It describes a system we have designed to annotate a specific group of bacterial proteins.

Trimeric autotransporter adhesins (TAAs) form one of the many families of bacterial surface proteins. In medically relevant species they adhere to host cells (in non-pathogenic species we don’t know what they adhere to), and therefore they are considered essential virulence factors. They are autotransporters, which means that they cross the outer membrane by themselves – the C-terminal part makes a pore through which the rest of the protein goes out. In contrast to many other autotransporters, the exported part is not cut off but stays attached to the membrane by the C-terminal autotransport domain. TAAs are also trimeric – the pore is made of three subunits and the exported fiber is also a trimer. The last feature is pretty unique – so far it’s the only family of bacterial surface proteins which forms fibrous trimers. Interestingly, their length varies between a few hundred and five thousand residues.

What’s so special about these proteins for a bioinformatician? The structure of the fiber is not homogeneous – it is a mixture of globular domains and coiled coils. At the sequence level, they have lots of internal repeats (see the picture) and a heavily biased residue composition, and their domain composition and architecture vary from protein to protein. The only part conserved in all TAAs is the autotransport domain. Systems designed to identify and annotate typical protein domains (such as PFAM) don’t handle them very well – the average coverage of TAAs by PFAM annotation is about 30%. The server we have built relies on the fact that the domains of TAAs are exclusive to this family (they do not appear anywhere else because of its unique structural constraints). Therefore we could use different thresholds, manually curated alignments and rules derived from domain context to improve the annotation.

Manual analysis of TAA sequences is pretty tedious (well, it was, now we have the server), but on the other hand I have learnt a lot about how to read a protein sequence. I mean really read it and understand how a particular combination of letters influences its structure.
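For readers wondering what a coverage figure like the 30% above means in practice, here is a small sketch. It assumes that coverage is the fraction of residues lying inside at least one annotated domain; the function name and the interval format (1-based, inclusive start and end) are choices made for this example, not taken from the paper or the server.

```python
def annotation_coverage(length, domains):
    """Fraction of residues covered by at least one domain.

    length  -- protein length in residues
    domains -- iterable of (start, end) pairs, 1-based and inclusive
    """
    covered = 0
    last_end = 0  # last residue already counted
    for start, end in sorted(domains):
        start = max(start, last_end + 1)  # skip residues counted before
        end = min(end, length)            # ignore anything past the C-terminus
        if end >= start:
            covered += end - start + 1
            last_end = end
    return covered / length
```

Overlapping hits are counted only once, which matters for repeat-rich proteins where neighbouring domain matches often overlap.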

 
4 Comments

Posted by on April 10, 2008 in bioinformatics

 

Freelancing science – today and tomorrow

In response to Neil’s recent comment and to questions that keep coming up in emails, I’ve decided to describe my status as a freelancing scientist in a little more detail. However, keep in mind that I have no idea how such an arrangement works outside of Poland, so it is likely that some things look different in other countries.

First of all, I need to explain my unemployment: I have an academic affiliation, but I’m not formally employed and I don’t get a salary. I do, however, get non-financial support, and I am able to apply for grants and to access free software and the journals the institute subscribes to. I was told that’s similar to tenure in the US – you get your office and lab space, but little or no salary. The difference would be that instead of applying for an independent position, you just take it :).

My income comes from grants and from subcontracting on other people’s projects. As a bioinformatician, I don’t have huge needs, so the grants I applied for were pretty cheap compared to grants for experimental biology. However, it can take from half a year to a year to get an initial cash flow – it’s all about the time between a call and the awarding of the grant. Often your degree doesn’t matter when applying for a grant, especially if you are not the principal investigator on the application. I still do not have a PhD degree, and while I hope to get one sometime this year (finally), I’m not pushing this that much.

Instead of carefully listing all the good and bad sides of my freelancing status (or explaining why I made such a move), I will try to answer a question which I also hear often: where is this heading?

In my probably skewed view of science, to do things which are very novel and very cool one needs to be either a recognized genius or a big shot in a particular field. Otherwise, it’s hard to get enough money to fund one’s completely crazy projects. I’m neither a genius nor a big shot, but I have a bunch of ideas I consider cool and which I’d like to get funded. It looks like for that I need to step out of the academic money-flow system and apply for funding to people who are less conservative and who can take the risk of supporting non-established ideas (Deepak, thank you for the inspiration). And that’s the plan: leave the academic (and competitive) funding system and shift to an outcome-oriented one, similar in essence to a startup. And instead of waiting 15 years to get recognition in academia, I hope to get my stuff running within the next few years.

One can argue that it’s risky and that one could achieve a similar outcome by following the traditional academic career path within a similar time. That’s probably true – none of the things I’ve just written is really supported by long-term evidence. But on the other hand, even if the whole idea doesn’t make sense at all, compared to my colleagues I am having much more fun…

 
23 Comments

Posted by on April 5, 2008 in Career

 


BioBrick as a functional role


When I first saw the MIT Registry of Standard Biological Parts, I just fell in love with the idea. However, after closer inspection I realized that it’s not what I hoped to find. The Registry collects interchangeable functional modules that can be assembled into novel biological systems. And it does that as well as it sounds, but only to a certain extent. Pedro wrote some time ago about the unavoidable complexity of, and potential issues with, the collected parts. I completely agree with his arguments, but I have even more doubts about the Registry’s current approach.

First of all, my feeling is that the DNA-centric view of life is starting to limit our understanding of what is happening at the molecular level. It has moved science forward a lot and it is still extremely useful, but with genetics alone we are not going to understand and avoid the emergent properties of biological systems. DNA, RNA and proteins all interact with each other at the sequence and structure level. These properties are encoded in DNA, I agree. However, as Pedro pointed out, we have no way to predict what happens after transferring a part to another organism. It is possible to select for mutations that will render the part usable in the other organism, but I don’t think this approach would be of much use if we deal with organisms that are hard to grow (imagine you want to insert a specific system into an extremophile). What is more, it’s not necessarily practical if we transfer the part to an organism which already has a similar element encoded in its genome.

In my humble opinion, the Registry can be extended in two directions: transforming parts into containers that have a specific functional role, and including sub-gene elements, like domains or tectons. Let me describe both in more detail.

Currently a BioBrick is assigned a function and a sequence. I would rather see a functional role that can be fulfilled by many different sequences. For example, for an enzymatic function the BioBrick would include not only a single DNA sequence from a single organism, but also a protein sequence, domains, sequence motifs and a structure (whatever is available), and all of these should be available for every organism for which we can reliably assign this information. To clarify, I’m far from suggesting populating the Registry with BLAST results. I would rather have it done manually, or at least in the way The SEED allows experts to create subsystems and assign functional roles to proteins. In this way we could just take a gene from the target organism instead of mutating the original one. Having a container would also mean that we could include different flavors of the same gene (for example, after optimization).

As for the second direction, I’m a big fan of creating novel functions out of existing elements. That’s the reason why I believe the Registry should include building blocks of proteins as well as other fancy things, like riboswitches. One obvious example would be a signal transduction element, where one can attach different receptor domains to the same membrane component. This has already been done thousands of times – why not standardize it?

Maybe with these two changes we could finally connect some dots and make the complexity of biological systems more understandable, or at least traceable. The future directions of the Registry are not very well defined, so I believe there’s space at least for discussion of such ideas.
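A container of this kind could be sketched roughly as follows. This is not how the Registry stores parts; all class and field names are invented for this example, and the sequences in the usage note are placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Variant:
    """One curated way of fulfilling a functional role."""
    organism: str
    dna_sequence: str
    protein_sequence: Optional[str] = None
    domains: List[str] = field(default_factory=list)
    motifs: List[str] = field(default_factory=list)
    structure_id: Optional[str] = None  # whatever is available, may stay empty
    note: str = ""                      # e.g. "after optimization"


@dataclass
class FunctionalRole:
    """A container: one function, many sequences that can fulfil it."""
    name: str
    variants: List[Variant] = field(default_factory=list)

    def for_organism(self, organism):
        """Return the curated variants that come from the given organism."""
        return [v for v in self.variants if v.organism == organism]
```

With a role holding, say, one variant from Escherichia coli and one from Bacillus subtilis, calling for_organism("Bacillus subtilis") picks the gene from the target organism instead of a mutated copy of the original one, and an optimized flavor is simply one more Variant with a note.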

 
