
Category Archives: Research

Transitions, transitions

Quite a few things happened while I was away. If you’re interested, here’s a not-so-short summary of my internet hiatus:

Research area

I think I’m done with bioinformatics. My current research area seems to be located somewhere between systems biology, theoretical biology and information/complex systems theory. I hope to build on Dawkins’ work, deal with emergence in biology and study subtle effects in biological systems. While I’m not sure if I will ever have anything interesting to show, I don’t have the energy to do yet another project which involves programming/web interfaces/dealing with data/annotations/modelling etc. I’m done with analytics, time for synthesis :).

Career

Last year I wrote a post dreaming about a small non-profit contract research organisation. This model of Research-as-a-Service has materialized in a virtual research institute, which we finally launched a few days ago (materialized in something virtual, a sign of the times? 😉 ). The setup is quite simple – the institute gets a project (or applies for one) and then searches for researchers/institutions/freelancers who are willing to subcontract parts of the project. We have outsourced not only the research part; even fundraising (writing grants, etc.) is done by an external company. The setup is quite flexible and pretty transparent – for example, we may represent somebody’s rights, but no intellectual property is owned by the institute. Why such an institution? We become a single point of contact for a large and diverse group of scientists who are willing to do some research for real money but don’t have the time and energy to hunt for gigs by themselves. While I have an academic job, I’m in the middle of a transition from being a freelancer to being a job provider for freelance scientists. More on that in some other post.

Open science

I plan to spend way more time on advocating open science (all of its flavors), but… in Polish. This step comes out of great frustration that even prominent figures in Polish science have no idea about the changes in science that internet-aware researchers are watching and creating. Knowledge about even basic things like Open Access is dramatically low in Poland (a number of people here equate OA with low-quality publications that have not been peer-reviewed). With a few friends, we have a number of projects in the pipeline (for example, we hope to launch a nation-wide, professionally produced promotional campaign – billboards, TV commercials etc. – for open science). If any of these actually works, I will let you know whether we have any measurable success 😉 .

Labels, labels

Robert Anton Wilson tells a nice story in his book Prometheus Rising:

William James, father of American psychology, tells of meeting an old lady who told him the Earth rested on the back of a huge turtle.

“But, my dear lady,” Professor James asked, as politely as possible, “what holds up the turtle?”
“Ah,” she said, “that’s easy. He is standing on the back of another turtle.”
“Oh, I see,” said Professor James, still being polite. “But would you be so good as to tell me what holds up the second turtle?”
“It’s no use, Professor,” said the old lady, realizing he was trying to lead her into a logical trap. “It’s turtles-turtles-turtles, all the way!”

Another story is a comment from my advisor about putting my real research plans in some proposal (he supports these plans):

The most likely reaction from reviewers will be something like this: “Nice start, some decent papers, PhD looks good. And then he got crazy.”

I feel like screaming “Labels, labels, labels, all the way!” when facing rigid schemas of what a scientist “is” or what an artist “is” etc. It’s a hard task by itself to integrate multiple passions and multiple interests into a coherent structure. I don’t need another set of issues because of the labels people attach to seemingly creative professions. But limiting myself only to topics consistent with the image of an online scientist became even more frustrating. Therefore expect that this blog (or any other venue I choose to express myself) is going to become a lot more diverse in topics and form.

 
1 Comment

Posted on October 28, 2009 in Comments, Research, Science and Art

 

HMMER3 testing notes – my skills are (finally) becoming obsolete


It has been quite a while since I started extensively testing the performance of HMMER3. As many other people have noticed before, the speed of the search has improved dramatically – I’m really impressed by how fast it is. However, that’s only part of the story. The smaller part, actually.

As some readers may know, most of my projects so far have revolved around protein sequence analysis and sequence-structure relationships. Mainly I was analysing sequences that had no clear similarity to anything known and no functional annotation. The usual task was to run sequence comparison software and look at the end of the hit list, trying to make sense of hits beyond any reasonable E-value threshold (for example, I often run BLAST at an E-value of 100 or 1000). I use a very limited number of tools, because it takes quite a while to understand which specific patterns a particular program fails on.
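To make “looking at the end of the hit list” a bit more concrete, here is a minimal Python sketch. It assumes the BLAST results were saved in the standard 12-column tabular format; the function names, the 0.001 cutoff and the example hits are all made up for illustration and are not taken from the projects described here.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    subject: str
    evalue: float
    bitscore: float


def parse_tabular(text: str) -> List[Hit]:
    """Parse 12-column tabular BLAST output (E-value in column 11, bit score in 12)."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.split("\t")
        hits.append(Hit(cols[1], float(cols[10]), float(cols[11])))
    return hits


def twilight_hits(hits: List[Hit], cutoff: float = 1e-3) -> List[Hit]:
    """Keep only the hits a conventional cutoff would discard, best first.

    The cutoff value is an assumption chosen for this example.
    """
    return sorted((h for h in hits if h.evalue > cutoff), key=lambda h: h.evalue)


# Hypothetical hit list: one clear homolog and two weak hits.
EXAMPLE = "\n".join([
    "query\thomolog_A\t62.0\t210\t80\t2\t1\t210\t5\t214\t1e-80\t250.0",
    "query\tweak_B\t24.0\t95\t72\t4\t30\t124\t10\t104\t3.5\t31.2",
    "query\tweak_C\t21.0\t60\t47\t1\t140\t199\t77\t136\t420\t24.6",
])

if __name__ == "__main__":
    for hit in twilight_hits(parse_tabular(EXAMPLE)):
        print(f"{hit.subject}\tE={hit.evalue:g}\tbits={hit.bitscore}")
```

The filter only narrows the list down; deciding which of the remaining hits mean anything is still the manual part.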

The high-end tool I use most often is HHpred – HMM-HMM comparison software. It’s slow but very sensitive – my personal benchmarks show that it is able to identify very subtle sequence patterns that sit just above the level of shared secondary structure (in other words, from a set of equally dissimilar sequences with identical secondary structure order, it correctly identifies the ones with similar tertiary structure).

The most surprising thing about HMMER3 is that in my personal benchmarks it’s almost as sensitive as HHpred. I wasn’t expecting that HMM-sequence comparison could be as good as HMM-HMM. This observation suggests that there’s still room for improvement in the latter approach, but it already has big implications.

PFAM will soon migrate to HMMER3 (the PFAM team is now resolving overlaps between families that arose due to increased sensitivity) and the moment it is available, it will make a huge number of publications obsolete, or simply wrong. There are thousands of articles that discuss in detail the evolutionary history of some particular domain (many of these will become obsolete) or draw conclusions from the observation that some domain is not present in the analyzed sequence/system (many of these will need to be revised). It will also make my skills quite obsolete, but that is always to be expected, no matter what branch of science one works in. I also imagine that systems biology people will be very happy to have much better functional annotation of proteins.

I don’t want to call the development of HMMER3 a revolution, but it will definitely have an impact on biology similar to the one BLAST and HMMER2 had. Not only because of its speed, but also because it will create a picture of similarities between all proteins comparable to the picture state-of-the-art methods could only calculate for a small subset of them.

 
3 Comments

Posted on April 22, 2009 in bioinformatics, Research, Software

 


The Future of (Life) Scientists

This post is directly inspired by an excellent essay by Michael Nielsen entitled “The Future of Science”. While Michael writes about science itself (and how openness will play a big role in the scientific process), I wanted to write a few words about how and where I see scientists in the near future (or rather how research will be done – I’m not even touching the broad topic of alternative careers for scientists). While it sounds like a complementary essay to Michael’s work, I wouldn’t dare to call it so – think of it as a collection of loose notes gathered over months of learning from the online science community. Also, please keep in mind that it’s written by a biologist and as such is biased towards the life sciences.


It’s no news that the academic environment has changed so much that the joy of research takes up only a small fraction of a scientist’s day-to-day life. “Publish or perish”, bureaucracy, money hunting, lack of tenure-track positions, impact factor and the ever-postdoc are only a few of the many issues within the academic system. There are quite a lot of interesting initiatives that aim at improving the system, and some of them will certainly succeed by directly solving some of the issues above or, more likely, by creating a niche within academia in which these issues do not apply. However, I think that in the long run academia is not going to be the main environment where research is done and, more importantly, there will be an infinite gradation of research jobs, allowing people from many different fields with many different skills to contribute to scientific projects.

That said, I also believe that the amount of data and knowledge produced will lead to enormous specialization of scientists. This does not contradict the previous statement: I don’t think that some teenager will design and develop a new molecular dynamics algorithm in his spare time, but finding new genetic associations or inventing another way to modify a bacterial genome so that it has better biodegradation features sounds to me like a reachable project for many people. Specialization will be one of many factors influencing the creation of new types of scientists. And what are these types? Let me describe a few.

Mind, brain, intelligence amplification – future Nobel Prize winners

This category emerged pretty recently, after reading Deepak’s post on the uniqueness (or lack of it) of someone’s contribution to science. I always had this notion that no matter what I did, it would be done in the near future by someone else, but this time I could put it into words: science is like sports – the winner takes it all and there’s always a winner. Because the prestige of an institution or the fame of a scientist plays a big role in getting one’s research funded, competition for money will lead to the development of procedures aimed at producing Nobel Prize winners (or equivalents), analogous to sports training programs.


Techniques like neurolinguistic programming, biofeedback or binaural beats (just to name a few) are surrounded by so much hype that it’s hard to believe they are worth anything. However, I think there’s a solid field emerging from these inventions that aims at dealing with the issues we create in our lives. Have you heard that Google has opened a School of Personal Growth as a part of the Google University, teaching things like mental development, emotional development, holistic health, well-being and finally a Buddhist notion, “beyond the self”? I think it’s no mistake – it’s an attempt to help employees consistently work at their optimal speed. And there’s the story published by Nature in April last year: the results of a poll on the use of brain-doping drugs among scientists. And there’s an inspiring talk by Juan Enriquez on the arrival of Homo evolutis. I believe it’s just a matter of time before big universities launch (probably secretly) their own programs for training high-profile scientists. And judging from the comments on Nature’s poll, I don’t think many people will object – science, unlike sports, doesn’t have to pretend it’s fair.

Getting research done – staff scientist

This type doesn’t require an introduction. If one doesn’t have to waste time on advancing a career and hunting for money, one becomes a very efficient scientist. Staff scientist positions are available in many countries and I wish there were more of them in the future, especially in bioinformatics – where a single person can be trained to do everything from microarray analysis to molecular dynamics in a relatively (!) short time (and then become a very important asset in the lab).

Experienced specialist – nomadic freelancer


This is the category I was aspiring to. Here you can read some details of how I tried, and here when and why it failed. I still think it can be done, although not in every field and not all the time. My hope was that telecommuting is the future of freelance scientists, but Bora offered an entirely different solution: co-researching spaces/science hostels:

A coworking space has three important components: the physical space, the technological infrastructure, and the people. A Science Hostel that accommodates people who need more than armchairs and wifi, would need to be topical – rooms designed as labs of a particular kind, common equipment that will be used by most people there, all the people being in roughly the same field who use roughly the same tools.

From what I’ve seen, people doing structural biology (especially NMR-related research) tend to enjoy a status similar to a freelancer’s: they can do a crucial high-tech task which takes no more than several weeks to finish, and often the task is needed so rarely that there’s no point in employing the specialist full time (or in doing in-house training).

The main disadvantage of this mode is something called the “consultant’s dilemma” (hat tip Harold Jarche): when you’re working you’re not generating new ideas or business, and vice versa.

In a failure of the interdisciplinary approach – translator, integrator

I expect that lots of people will disagree with me on that, but I think that in the long run interdisciplinary approaches are going to fail. The area where the reason for failure is most visible is genome sequencing. Deep knowledge about a single simple organism such as a bacterium is beyond the capability of most (if not all) laboratories and teams, and that’s why publishing a genome is just a starting point, not the end of the process. It takes years of work by experts in their own small fields to extract all useful information from a single sequence.

Once this situation becomes more of an issue, scientific translators may emerge. Such a person will track the scientific literature in two (or three or four, as language translators do) small fields and will tell a group of researchers from one field what important work has been published in another. Will a similar service become part of libraries, or will such people become independent consultants? I have no idea.

I don’t think that gaps in knowledge will be corrected by talking to colleagues or by the review process. Here’s a perfect example (in a used-to-be prestigious journal): neither the authors nor the reviewers noticed that the structure containing so-called trimerization “octads” is a perfectly fine, quite regular, heptad-based coiled-coil (you guessed it, these “octads” were separated by six residues, giving fourteen together – two coiled-coil heptads). It was already visible in the sequence figure – but only if you knew that things like coiled-coils exist and were already studied by Francis Crick. After almost a year and a half no correction has been submitted, which means the community does not care either.
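The arithmetic behind that remark can be spelled out in a few lines of Python. This is only a sketch of the counting argument: the eight-residue motif and six-residue spacer come from the description above, while the function names and the choice to call residue 0 position “a” are arbitrary assumptions made for the example.

```python
HEPTAD = "abcdefg"   # conventional labels for the seven coiled-coil positions
OCTAD_LEN = 8        # length of the reported "octad" motif
SPACER_LEN = 6       # residues separating consecutive "octads"


def repeat_period(motif_len: int, spacer_len: int) -> int:
    """Distance between the starts of two consecutive motifs."""
    return motif_len + spacer_len


def register_at(index: int) -> str:
    """Heptad position of a residue, arbitrarily calling residue 0 position 'a'."""
    return HEPTAD[index % len(HEPTAD)]


period = repeat_period(OCTAD_LEN, SPACER_LEN)

# Starts of four consecutive motifs spaced by the full period.
registers = [register_at(k * period) for k in range(4)]

# For contrast: what a genuinely eight-residue periodicity would do to the register.
octad_only = [register_at(k * OCTAD_LEN) for k in range(4)]

if __name__ == "__main__":
    print(period, period % 7)  # 14 0
    print(registers)           # ['a', 'a', 'a', 'a']
    print(octad_only)          # ['a', 'b', 'c', 'd']
```

Because the period is a multiple of seven, every motif starts in the same heptad position, which is exactly what a regular heptad repeat looks like; a true eight-residue repeat would drift through the register instead.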

Bioentrepreneur

As soon as we have our own Paul Graham and a clear, well-described path to making a life science startup successful, we will have a bloom of bioentrepreneurs. Life science is a field comparable to the high-tech industry, not the software industry. It requires different skills and a different approach, but no one has so far put it into words that we can follow. Also, we need more hardware providers in the area of life sciences. If you want to build a mobile phone, it’s a matter of days to order every single part. If you want to build your own sequencing machine, I wish you good luck, because it will take considerably longer (you need to wait until the respective companies are built and offer their products).

Nevertheless, I’m sure it will happen. Streamlining life sciences is something that lots of people are talking about.

Clean data needed – biocurator

The more data, the more errors. Recently, I stumbled upon an interesting functional annotation of a protein: will die slowly. A search on NCBI reveals a few dozen proteins with this annotation. It is a terse description of a phenotype, but I don’t think it should be used as a protein name. Paul Davis suggested that this propagated from Drosophila, since fruit fly gene names have a long history of such names, as this blurb illustrates:

Early work refers to the gene as fruity, an apparent pun on both the common name of D. melanogaster, the fruit fly, as well as a slang word for homosexual. As social attitudes towards homosexuality changed, fruity came to be regarded as offensive, or at best, not politically correct. Thus, the gene was re-dubbed fruitless, alluding to the lack of offspring produced by flies with the mutation.

It’s nothing new that to reach the holy grail of many fields (text mining, ontologies, automated discoveries, predictions), we need manual curation of biological data (even Wolfram Alpha is based on curated data). As with staff scientists, biocurator jobs are already appearing in science job listings.

Science as creative hobby – “not even a scientist”

In the introduction I mentioned a teenager inventing a new genetic modification of an organism. While to some it may sound difficult, the unquestionable success of the iGEM competition shows that it doesn’t require 20 years of research experience to come up with such ideas. Lots of knowledge and lots of data create an opportunity for people outside academia to jump in and make a valuable contribution. The necessary requirement is “openness” – as long as the data and publications are freely available, there’s space for outsiders.

I expect (or I hope) amateur science to grow in the coming years – especially in the less bureaucratic countries. If we don’t see many such examples yet, the education system is to blame – kids don’t realize that remixing data and remixing video are very similar things that differ only in their target audience, but both can be cool :).

Knowing your position – “lighthouse” scientist

A lighthouse’s primary role is to assist in navigation – it helps you find your position on the map. A lighthouse is not a point of reference: as a point on the map it is usually no more important than any other point. A lighthouse helps you understand where you are. The tech crowd has its own “lighthouse” people, for example Tim O’Reilly. Our small online science community has Bill Hooker. Neither of them seems to have an outstanding resume (sorry to write that, I’ve seen better ones), but to understand where you are it’s worth paying attention to what they say. They seem to understand a particular part of our world much better than anybody else.

To put it in other words, a lighthouse scientist isn’t necessarily a person with the biggest achievements or a person who has a brilliant vision of the future – it’s a person who sees trends and movements, has a wider perspective and most importantly knows what’s important. In recent discussions on the blogosphere about bioinformatics as a field of science, Sean Eddy didn’t express his opinion – which I think is a very meaningful response.

Final thoughts

I’ve sketched this map to organize lots of thoughts and discussions around the future directions of science. It is far from complete and full of wishful thinking, but it still helped me to wrap my mind around a couple of issues in this area. Probably the most important thing I’ve realized is what I put in the introduction: that the future may open lots of options for people willing to stay close to science. Those who realize this will be the first to benefit from them.

Update: there are interesting comments over at FriendFeed already.

 
3 Comments

Posted on March 26, 2009 in Comments, Community, Research

 


Structure prediction without structure – visual inspection of BLAST results

portschema

My recent post on visual analytics in bioinformatics lacked a specific example, but I’m happy to finally provide one (the happiness also comes from the fact that the respective publication is finally in press). The image above shows a multiple pairwise alignment from BLAST of a putative inner membrane protein from Porphyromonas gingivalis. The image is small but it does not really matter – the colour patches seem to be visible anyway.

Regions marked with ovals are clearly less conserved than the rest of the protein. There are five hydrophobic regions (green patches, underlined with blue lines) in this alignment (I ignore the N-terminus, as this is likely the signal peptide); however, the three inner ones appear to be of similar length, while the outer ones seem to be half as long as the inner ones. If we assume that the single unit is the short one, we can summarize the protein as follows: eight beta structures, four long loops, four short loops. It looks like an eight-stranded outer membrane beta-barrel. Almost structure prediction, but without a structure.
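The analysis above was done by eye, from the colours in the alignment. As a rough computational analogue, here is a minimal Python sketch that averages Kyte-Doolittle hydropathy over alignment columns, reports hydrophobic stretches, and repeats the strand-counting arithmetic. The toy alignment, the threshold of 1.0, the minimum length of 4 and all function names are assumptions made for illustration; none of it is the procedure actually used for this protein.

```python
# Kyte-Doolittle hydropathy scale.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def column_hydropathy(alignment):
    """Mean hydropathy per alignment column, ignoring gaps and unknown symbols."""
    profile = []
    for column in zip(*alignment):
        values = [KD[aa] for aa in column if aa in KD]
        profile.append(sum(values) / len(values) if values else 0.0)
    return profile


def hydrophobic_segments(profile, threshold=1.0, min_len=4):
    """Half-open (start, end) column ranges where the profile stays above threshold."""
    segments, start = [], None
    # A sentinel value closes a segment that runs to the end of the alignment.
    for i, value in enumerate(profile + [float("-inf")]):
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            if i - start >= min_len:
                segments.append((start, i))
            start = None
    return segments


def strand_count(n_long, n_short):
    """Count strands if a short segment is one strand and a long one is two."""
    return 2 * n_long + n_short


# Hypothetical two-sequence alignment with one hydrophobic stretch in the middle.
ALIGNMENT = ["KDGSVLIVAGSDKE", "RDGTILVVAGTDRE"]

if __name__ == "__main__":
    print(hydrophobic_segments(column_hydropathy(ALIGNMENT)))
    print(strand_count(n_long=3, n_short=2))
```

With three long and two short hydrophobic regions, the count comes out at eight strands, matching the reasoning in the text. A real alignment would of course need the threshold tuned, which is part of why looking at the picture can be faster.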

I could end the story here, but the model didn’t fit previously published data. The protein’s localization in the inner membrane was confirmed by an experiment; however, pores in the inner membrane are considered very harmful 😉 . Fortunately, one of my colleagues explained to me that this particular localization technique is not 100% reliable, so I gathered more evidence and created a detailed description of the topology, and another group designed experiments which confirmed my visual analysis.

Lessons learned? Maybe without this feedback on the quality of that experimental technique, I would still claim that this is an OM beta-barrel. Or maybe not. But I’ve learned that to safely ignore experimental results, one needs more than intuition. Also, it shows that sometimes looking at the results is all one needs to make a reasonable prediction (I still have no idea what the E-values of these BLAST hits were, but does it matter?).

 
7 Comments

Posted on February 3, 2009 in bioinformatics, Research, Visualization

 


Another collaborative environment: Project Wonderland

This is a short post on Sun’s Project Wonderland. Quoting from its home page:

Project Wonderland is a 100% Java and open source toolkit for creating collaborative 3D virtual worlds. Within those worlds, users can communicate with high-fidelity, immersive audio, share live desktop applications and documents and conduct real business. Wonderland is completely extensible; developers and graphic artists can extend its functionality to create entire new worlds and new features in existing worlds.

In my recent post I mentioned Second Life and Croquet: two platforms that can evolve into decent 3D visualization environments. Obviously I didn’t research the topic enough, as I’ve just found Project Wonderland. It seems to have the best of both worlds – a professional team of developers, a pretty flexible architecture and the possibility of running your own instance of a “virtual world”.


Have you spotted "Biogang" written on the whiteboard? 🙂

I didn’t play with it for long – the current version is not very feature-rich (although it already contains a video player with webcam support, a PDF viewer, a VNC viewer and a crude whiteboard); however, the roadmap looks very interesting. I really liked the extensive audio features – true stereo, sounds that fade out with distance, a special “cone of silence” (a place where you can have a private conversation) – which show that Sun is really trying to build an effective collaboration platform.

I haven’t yet seen much about data visualization in Wonderland – although below you can find an interesting example of a molecular simulation trajectory shown inside Wonderland.

 

Posted on December 29, 2008 in Education, Research, Visualization

 


Bioinformatics is a visual analytics (sometimes)

A short description of my research interest is “I do proteins” (I took this phrase from my friend Ana). I try to figure out what a particular protein, protein family, or set of proteins does in the wider context. Usually I start where automated methods have ended – I have all kinds of annotation, so I try to put the data together and form some hypothesis. I recently realized that the process is basically visualizing different kinds of data – or rather looking at the same issue from many different perspectives.

It starts with alignments. Lots of alignments. And they all end up in different forms of visual representation. Sometimes it’s conservation combined with secondary structure prediction (with AlignmentViewer or Jalview):

blog-0005

Sometimes I look for transmembrane beta-barrels (with ProfTMB):

blog-0005

Sometimes I try to find a pattern in hydrophobicity and side-chain size values across the alignment (Aln2Plot):

blog-0005

Afterwards I look for patterns and interesting correlations in domain organization (PFAM, Smart):

blog-0008

Sometimes I map all these findings onto a structure or a model that I make somewhere in the meantime based on found data (Pymol, VMD, Chimera):

blog-0006

I also try to make sense out of genomic context (works for eukaryotic organisms as well – The SEED):

blog-0005

I investigate how the proteins cluster together according to their similarity (CLANS):

blog-0013

And figure out how the protein or the system I’m studying fits into interaction or metabolic networks (Cytoscape, Medusa, STRING, STITCH):

blog-0007

If there’s some additional numerical information I dump it into analysis software (R, for simpler things DiVisa):

blog-0005

And I make notes along the way in the form of a mind map (Freemind, recently switched to Xmind, because it allows storing attachments and images in the mind map file, not just linking to them like Freemind does):

blog-0010
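The first item in the list above, conservation across an alignment, is the kind of thing these viewers compute before colouring anything. A minimal Python sketch of one common measure, per-column Shannon entropy, is shown here; the toy alignment and function names are made up for the example, and this is not a description of how AlignmentViewer or Jalview actually score conservation.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of the residues in one column, ignoring gaps."""
    residues = [aa for aa in column if aa != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def conservation_profile(alignment):
    """Entropy per alignment column; low values mean high conservation."""
    return [column_entropy(column) for column in zip(*alignment)]


# Hypothetical four-sequence alignment.
ALIGNMENT = ["ACDE", "ACDF", "AC-G", "AGDH"]

if __name__ == "__main__":
    for position, entropy in enumerate(conservation_profile(ALIGNMENT)):
        print(position, round(entropy, 3))
```

A fully conserved column scores zero and a column where every sequence differs scores highest, so plotting the profile gives the same information as the colour patches, just in one dimension.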

So it turns out that I mainly do visual analytics. I spend a considerable amount of time preparing various representations of biological data, and the rest of the time I look at the pictures. While that’s not something every bioinformatician does, many of my colleagues have their own workflows that also rely heavily on pictures. For some areas it’s more prominent, for others it’s not, but the fact is that pictures are everywhere.

There are two reasons I use a manual workflow with lots of looking at intermediate results: either I work with weak signals (for example, sometimes I need to run BLAST at an E-value of 1000) or I need to deeply understand the system I study. Making connections between two seemingly unrelated biological entities requires wrapping one’s brain around the problem and… lots of looking at it.

And here comes the frustration. I counted that I use more than twenty (!) different programs for visualization. And even though I’m enjoying a monitor setup 4500 pixels wide, which is almost enough to put all that data on screen, the main issue is that the software isn’t connected. AlignmentViewer cannot adjust its display automatically based on the domain I’m looking at or the network node I’m investigating – I need to do it myself. Of course I can couple alignments and structure in Jalview, Chimera or VMD, but I don’t find such a solution usable in the long run. To have the best of all worlds, I need to juggle all these applications.

I’ve been longing for some time for a generic visualization platform that is able to show 2D and 3D data within a single environment, so I follow the development of the SecondLife visualization environment and the Croquet/Cobalt initiatives. While these don’t look very exciting right now, I hope they will provide a common platform for different visualization methods (and of course a visual collaboration environment).

But to be realistic, visual analytics in biology is not going to become mainstream. It’s far more efficient to improve algorithms for multidimensional data analysis than to spend more time looking at pictures. I have already had a few situations where I could see some weak signal and within a year or two it became obvious. But I’m still going to enjoy scientific visualization. I came to science for aesthetic reasons after all. 🙂

 
