Author Archives: Pawel Szczesny

Structure prediction without structure – visual inspection of BLAST results

My recent post on visual analytics in bioinformatics lacked a specific example, but I’m happy to finally provide one (the happiness also comes from the fact that the respective publication is finally in press). The image above shows a multiple pairwise alignment from BLAST of a putative inner membrane protein from Porphyromonas gingivalis. The image is small, but that does not really matter – the colour patches are visible anyway.

Regions marked with ovals are clearly less conserved than the other parts of the protein. There are five hydrophobic regions in this alignment (green patches, underlined with blue lines; I ignore the N-terminus, as this is likely the signal peptide). The three inner ones appear to be of similar length, while the outer ones seem to be half as long as the inner ones. If we assume that the single unit is the short one (so each inner region counts as two units), we can summarize the protein as follows: 8 beta strands, four long loops, four short loops. It looks like an eight-stranded outer membrane beta-barrel. Almost structure prediction, but without a structure.
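The hydrophobic patches above were spotted by eye in a coloured alignment. A rough computational analogue for a single sequence is a sliding-window hydropathy profile. The sketch below assumes the Kyte-Doolittle scale; the window size, the threshold and the function names are illustrative choices, not anything used in the analysis described here. Beta-barrel strands often alternate hydrophobic and polar residues, so a plain average like this is only a first approximation.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_hydropathy(seq, window=9):
    """Mean hydropathy of every full window along the sequence."""
    scores = [KD[aa] for aa in seq.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def hydrophobic_segments(seq, window=9, threshold=1.6):
    """Merge windows with mean hydropathy >= threshold into segments.

    Returns (start, end) pairs, 0-based with an exclusive end.
    The default threshold is an assumed, illustrative cut-off.
    """
    segments = []
    for start, value in enumerate(window_hydropathy(seq, window)):
        if value < threshold:
            continue
        end = start + window
        if segments and start <= segments[-1][1]:
            # Overlaps or touches the previous segment: extend it.
            segments[-1] = (segments[-1][0], max(segments[-1][1], end))
        else:
            segments.append((start, end))
    return segments
```

Comparing the lengths of the returned segments gives a crude numerical version of the "inner regions are twice as long as the outer ones" observation, although for weak signals the eye may still do better.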

I could end the story here, but the model didn’t fit previously published data. The protein’s localization in the inner membrane had been confirmed by an experiment, yet pores in the inner membrane are considered very harmful 😉 . Fortunately, one of my colleagues explained to me that this particular localization technique is not 100% reliable, so I gathered more evidence and created a detailed description of the topology, and another group designed experiments which confirmed my visual analysis.

Lessons learned? Maybe without this feedback on the quality of that experimental technique, I would still claim that this is an OM beta-barrel. Or maybe not. But I’ve learned that to safely ignore experimental results, one needs more than intuition. Also, it shows that sometimes looking at the results is all one needs to make a reasonable prediction (I still have no idea what the E-values of these BLAST hits were, but does it matter?).


Posted by on February 3, 2009 in bioinformatics, Research, Visualization



Microblogging in PLoS

I don’t usually repost news, as my FriendFeed stream (also available from the sidebar of this blog) is a more efficient way to let you know about interesting things, but this one deserves a special mention. The recent coverage of the ISMB 2008 conference over at FriendFeed ended up as a publication in PLoS Computational Biology:

Saunders, N., Beltrão, P., Jensen, LJ, Jurczak, D., Krause, R., Kuhn, M. and Wu, S. (2009).
Microblogging the ISMB: A New Approach to Conference Reporting.
PLoS Comput Biol 5(1): e1000263

This is very exciting, but it also has some interesting implications. Of course it means that more and more people will participate in our community and BioGang projects will finally start to take off (hopefully), but I am also thinking about something else. Do you remember Neil’s post about why you should have an online presence? I think there’s one more thing to add to that list. The authors of this publication and lots of other scientists over at FriendFeed will sooner or later climb to PI-equivalent positions, where they can decide about hiring somebody. And for them a strong online presence will be an important asset in a CV. Much more important than you’d think today ;).


Posted by on January 30, 2009 in Community



Timestamped FriendFeed activity – really public “profile”

Accidentally, I have found a simple way of obtaining a time stamp for each entry and comment that any person with a publicly available lifestream makes on FriendFeed (except “Likes”, which do not seem to be timestamped at all). The activity of a semi-randomly chosen person during the day (summarized over a couple of weeks (!)) is shown below:

FriendFeed usage during 24 hours, summarized over a couple of weeks.


While the relation between the AM and PM periods is correct, the time zone has been manually shifted, so it’s more difficult to guess whose activity this is (but it’s not Robert Scoble, if you want to ask). What does it tell us? Basically, this person does not close the FriendFeed window for most of the day. Additionally, there’s a period of the day in which “catching up” takes place. Nothing interesting so far? The original data has many more details. It is possible, for example, to collect information on when during the day a particular person usually watches videos on YouTube. Guess – is that during working hours? 🙂
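Once the timestamps are in hand, turning them into a daily profile like the one above is a few lines of code. This is a minimal sketch that assumes the timestamps have already been collected as ISO 8601 strings; how they were obtained from FriendFeed is not covered here, and the function name and the `shift_hours` parameter are hypothetical. The shift mimics the manual time-zone offset mentioned above.

```python
from collections import Counter
from datetime import datetime, timedelta


def hourly_activity(timestamps, shift_hours=0):
    """Count entries per hour of day (0-23), pooled over all days.

    timestamps  -- iterable of ISO 8601 strings, e.g. "2009-01-12T08:15:00"
    shift_hours -- offset applied before binning, to obscure the time zone
    """
    counts = Counter()
    for stamp in timestamps:
        moment = datetime.fromisoformat(stamp) + timedelta(hours=shift_hours)
        counts[moment.hour] += 1
    # Always return 24 bins so quiet hours show up as zeros.
    return [counts.get(hour, 0) for hour in range(24)]
```

Filtering the input by entry type before binning (for example, only shared videos) would give the kind of per-service profile hinted at above.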

The ability to get that data for a couple of weeks back without any trouble (I didn’t need to track this person’s activity over that period) was kind of disturbing. I knew it’s very simple to start tracking my habits, but I wasn’t aware that it’s also easy to see what I was doing over the last three weeks. Do you think it makes a difference?


Posted by on January 29, 2009 in Comments, Visualization



Science & Art: what language do you use?

TED 08
Image by cr8it via Flickr

I’ve just realized where the important difference between artists and scientists lies – and probably the biggest challenge in merging or communicating between these two areas. When we do research, we tend to think in words. When we paint, we tend to think in colors. When we compose, we tend to think in sounds. Our right hemisphere thinks in colors, images, feelings or sounds, while the left thinks almost exclusively in words and symbols. This is of course an over-generalization, but I still think it’s a very important point when discussing relations between science and art. Putting right-hemisphere experience into words is such a difficult task that most such attempts sound like gibberish. Have you watched the TED talk “My stroke of insight”? Jill Bolte Taylor shared her first-person observations from a stroke which turned off her left (logical and analytical) hemisphere. While she did a great job (also of not going too much into details), some commenters were still complaining about the scientific quality of these observations (or that she sounded like she was on drugs, which is by the way not a coincidence).

If that sounds too abstract to you, consider the history of the discovery of the structure of benzene. Kekulé had a daydream of a snake seizing its own tail – and interpreted it correctly. And I believe this is not the only example where the solution to a scientific problem presents itself to a researcher in some non-linguistic form (or rather, where the right hemisphere sends the solution to the left hemisphere). However, such stories are rare for a couple of reasons: we are not usually aware that the “artistic” hemisphere can “solve” scientific problems, we lack the skills to identify and translate such messages, and finally it seems unprofessional to admit that we had a “vision” that led to a successful solution.

I’m not sure about the correctness of these speculations. It has been quite difficult to get to this point, exactly because of the limits of linguistic description of Art (I can rarely stand an artist’s statement), so it’s likely I’ve made some mistakes on the way. Therefore I would appreciate any help.


Posted by on January 27, 2009 in Science and Art, Visualization



Database query and ranked results


Some time ago I read a piece by Marcelo Calbucci: Is it a database or a search engine?. While it deals with searching for information within a real estate database, I think his comments are applicable in many areas of life sciences.

In short, Marcelo points out that people miss a lot of interesting entries while looking for a house because of the inflexibility of the query: number of bedrooms, price, distance from some point – these are all fixed. However, users are flexible, and in such a case they need a search engine that gives them a close-enough answer or lets them assign a weight to each filter.

In life sciences we search for similarities and analogies all the time. Sometimes it’s a direct comparison of sequences; on other occasions it’s a high-level meta-comparison between two systems. And while we have various (statistical) metrics of similarity, and they sometimes become part of a database design, interfaces of biological databases don’t allow ranking query results according to these metrics. For example, I can easily find all human proteins related to disease X or disease Y or disease Z, but I cannot specify that I want proteins related to Z AND Y first on the list. Another example would be searching PubMed – I can look for articles related to “synthetic biology”, but I have no way to specify that I want papers by James Collins from HHMI AND articles related to these papers to be first on the list. I guess it is possible to obtain such results without going through the whole list, but I doubt the method would be very simple. Filtering still seems to be a neglected aspect of database design in life sciences.

My dream biological search engine would have a series of sliders (or ideally, I would like to have a device with a series of mechanical knobs attached to the computer) and would allow me to dynamically change the weights of various aspects of the query and immediately see how that affects the results. It would be something resembling the interactivity of Gapminder World, but on dynamically generated data. The technology and proof of concept seem to be there, but I guess we need to wait quite a few years before this approach is adopted within life sciences.
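The core of such a "slider" interface is small: instead of filtering records out, every record gets a score that is a weighted sum of its per-criterion matches, and the list is re-sorted whenever a weight changes. The sketch below is only an illustration under assumed data: the record layout, the `rank` function and the protein identifiers are hypothetical and do not correspond to any existing database API.

```python
def rank(records, weights):
    """Order records by a weighted sum of per-criterion scores, best first.

    records -- list of dicts with an "id" and a "scores" mapping
               (criterion name -> relevance between 0 and 1)
    weights -- mapping of criterion name -> weight (the "slider" values)
    """
    def total(record):
        return sum(
            weight * record["scores"].get(name, 0.0)
            for name, weight in weights.items()
        )

    return sorted(records, key=total, reverse=True)


# Hypothetical proteins annotated with links to diseases X, Y and Z.
proteins = [
    {"id": "P1", "scores": {"X": 1.0}},
    {"id": "P2", "scores": {"Y": 1.0, "Z": 1.0}},
    {"id": "P3", "scores": {"Z": 1.0}},
]

# "Proteins related to Z AND Y first": give Y and Z more weight than X.
for protein in rank(proteins, {"X": 1, "Y": 2, "Z": 2}):
    print(protein["id"])
```

Nothing is ever excluded from the result, so moving one slider only reshuffles the list, which is what makes immediate visual feedback possible.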


Posted by on January 22, 2009 in bioinformatics, Data mining, Software



Collanos Workplace and scientific collaboration


One of my workspaces in Collanos

For some time I had been looking for a tool that would eliminate the need to send files back and forth between people collaborating on the same project. While I’m perfectly aware of various solutions such as wikis, version control systems or online office suites, I didn’t feel like I could convince my collaborators to use any of these. One of the reasons is always a feeling of insecurity when using a publicly hosted platform (BTW, this is not that uncommon among scientists – I know at least one scientific institution in Western Europe that explicitly forbids using Google apps, especially Gmail, for work-related stuff because of Google’s privacy policy). The other reason was that such solutions are not the best choice when working on binary files (most of my projects do not involve collaborative programming). When I stumbled across Collanos Workplace, which offers peer-to-peer synchronization (although without revision control) instead of a central-server-based one, I decided to give it a try. For the last couple of weeks I’ve been using Collanos to collaborate on one relatively simple project and the experience has been quite positive.

At first, I thought that Collanos might serve mainly as a tool for secure peer-to-peer file sharing with information about who changed what, etc. It turned out to be a capable project management application that has a chat and a discussion panel; one can post notes and links, add tasks and assign them to team members. Files are stored in a separate directory – after one adds a file to Collanos, it should be opened from the application, not from the original folder. This seemed a design mistake at first, but I came to appreciate it very quickly: synchronizing the project directory would mean sharing all of its contents, and that can sometimes be in the range of many GBs. From time to time some bug appeared here or there, but overall it worked as expected. Peer-to-peer sharing means that both people have to be online for synchronization, but so far the situation where I switched my computer off before a person could download my changes has happened only once, and it was during a weekend.

As a side note, it’s nice to see that Eclipse is becoming an application platform for quite a number of programs. See for example this list of Eclipse-based software.


Posted by on January 15, 2009 in Research skills, Software



Science and art. New theme for the new year.


In 2007 this blog was mainly scientific. Last year I explored the possibilities of being a freelance scientist. As I announced earlier on Twitter, the theme for this year will be science and art. And I should explain right away: I’m not going to write about such extraordinary artistic endeavours as creating music from DNA/protein sequences, try to convince you that science is beautiful, or state that my pictures of molecules are true art. I’m more interested in seeing whether there’s anything I can learn from Art, its history and its approach. While I’m not yet sure what I will end up writing about, here are two topics I may start with to see in which direction this theme unfolds.

Holistic approach to science

This is something I have been thinking about for a while. I haven’t come up with anything interesting, but I think it’s worth exploring further. Some first ideas came from reading the Wikipedia entry about lateralization of brain functions and Steve Brenner’s comments about the “middle-out approach” (as opposed to top-down or bottom-up). I’ve also found Mihaly Csikszentmihalyi’s answer to the Edge 2009 question peculiar; he wrote about “The end of analytic science”. Very recently I’ve also found an interesting interview with Daniel Tammet, an autistic savant, who explains his theory of exceptional creativity coming from “hyper-connectivity” of distinct brain regions. I have no idea yet whether there’s anything practical to find in such theories, but exploring them will be appealing enough.

Dashboard design for scientific data

This is something more practical, although again I expect to get no points for this topic. The information dashboard is a very cool concept rarely used in life sciences. One of the best-known examples in bioinformatics may be the InterPro domain page (here’s an example entry on the pore-forming lobe of aerolysins) – almost everything is on a single page, it has some nice graphical overviews of particular features (like species distribution), etc. It’s not the prettiest dashboard around, but at least you don’t need to click anywhere to get an overview of the stored information (compare it to the PFAM approach to a similar domain). I hope to learn what makes a great dashboard, experiment a little and see if the result is worth the effort.

Other topics

I will still be blogging about bioinformatics, visualizations and open science – that stays in place. The last topic especially is something I expect to write about quite a lot – my feeling is that this year will bring a couple of interesting events in this area (and I hope to initiate some of them). So if you don’t like the “science and art” theme, I think I will give you some other reasons to visit this blog once in a while.


Posted by on January 11, 2009 in bioinformatics


Another collaborative environment: Project Wonderland

This is a short post on Sun’s Project Wonderland. Quoting from its home page:

Project Wonderland is a 100% Java and open source toolkit for creating collaborative 3D virtual worlds. Within those worlds, users can communicate with high-fidelity, immersive audio, share live desktop applications and documents and conduct real business. Wonderland is completely extensible; developers and graphic artists can extend its functionality to create entire new worlds and new features in existing worlds.

In my recent post I mentioned Second Life and Croquet: two platforms that can evolve into decent 3D visualization environments. Obviously I didn’t research the topic enough, as I’ve just found Project Wonderland. It seems to have the best of both worlds – a professional team of developers, a pretty flexible architecture and the possibility of running your own instance of a “virtual world”.


Have you spotted "Biogang" written on the whiteboard? 🙂

I didn’t play with it for long – the current version is not very feature-rich (although it already contains a video player with webcam support, a PDF viewer, a VNC viewer and a crude whiteboard), but the roadmap looks very interesting. I really liked the extensive audio features – true stereo, sounds that fade out with distance, a special “cone of silence” (a place where you can have a private conversation) – which prove that Sun is really trying to build an effective collaboration platform.

I haven’t seen much yet about data visualization in Wonderland – although below you can find an interesting example of a molecular simulation trajectory shown inside Wonderland.


Posted by on December 29, 2008 in Education, Research, Visualization



Bioinformatics is a visual analytics (sometimes)

A short description of my research interest is “I do proteins” (I took this phrase from my friend Ana). I try to figure out what a particular protein, protein family, or set of proteins does in the wider context. Usually I start where automated methods have ended – I have all kinds of annotation, so I try to put the data together and form some hypothesis. I recently realized that the process is basically visualizing different kinds of data – or rather looking at the same issue from many different perspectives.

It starts with alignments. Lots of alignments. And they all end up in different forms of visual representation. Sometimes it’s conservation with secondary structure prediction (with AlignmentViewer or Jalview):


Sometimes I look for transmembrane beta-barrels (with ProfTMB):


Sometimes I try to find a pattern in hydrophobicity and side-chain size values across the alignment (Aln2Plot):


Afterwards I look for patterns and interesting correlations in domain organization (PFAM, Smart):


Sometimes I map all these findings onto a structure or a model that I make somewhere in the meantime based on found data (Pymol, VMD, Chimera):


I also try to make sense out of genomic context (works for eukaryotic organisms as well – The SEED):


I investigate how the proteins cluster together according to their similarity (CLANS):


And figure out how the protein or the system I’m studying fits into interaction or metabolic networks (Cytoscape, Medusa, STRING, STITCH):


If there’s some additional numerical information I dump it into analysis software (R, for simpler things DiVisa):


And I make notes along the way in the form of a mindmap (Freemind, recently switched to Xmind, because it allows storing attachments and images in the mindmap file, not just linking to them like Freemind does):

So it turns out that I mainly do visual analytics. I spend a considerable amount of time preparing various representations of biological data and then the rest of the time I look at the pictures. While that’s not something every bioinformatician does, many of my colleagues have their own workflows that also rely heavily on pictures. For some areas it’s more prominent, for others it’s not, but the fact is that pictures are everywhere.

There are two reasons I use a manual workflow with lots of looking at intermediate results: I work with weak signals (for example, sometimes I need to run BLAST with an E-value threshold of 1000) or I need to deeply understand the system I study. Making connections between two seemingly unrelated biological entities requires wrapping one’s brain around the problem and… lots of looking at it.

And here comes the frustration. I counted that I use more than twenty (!) different programs for visualization. And even though I’m enjoying a monitor setup 4500 pixels wide, which is almost enough to put all that data on screen, the main issue is that the software isn’t connected. AlignmentViewer cannot adjust its display automatically based on the domain I’m looking at or a network node I’m investigating – I need to do it myself. Of course I can couple alignments and structure in Jalview, Chimera or VMD, but I don’t find such a solution usable in the long run. To have the best of all worlds, I need to juggle all these applications.

I’ve been longing for some time for a generic visualization platform that is able to show 2D and 3D data within a single environment, so I follow the development of the SecondLife visualization environment and the Croquet/Cobalt initiatives. While these don’t look very exciting right now, I hope they will provide a common platform for different visualization methods (and of course a visual collaboration environment).

But to be realistic, visual analytics in biology is not going to become mainstream. It’s far more efficient to improve algorithms for multidimensional data analysis than to spend more time looking at pictures. I have already had a few situations in which I could see some weak signal and in a year or two it became obvious. But I’m still going to enjoy scientific visualization. I came to science for aesthetic reasons after all. 🙂


End of freelancing as scientist (for now)

The patchwork landscape of Masuria
Image via Wikipedia

Almost a year ago I wrote a post about officially becoming a “freelance scientist”. I didn’t really know what I was doing, but taking over the world from a small flat in the middle of nowhere in Poland sounded like a good idea. And it definitely was a good idea, though not in the way I thought it would be. Today I am hoping that all goes well and that from February I’ll be employed in a calm academic environment.

What did I aim for?

My plan was to become a freelance scientist – to be able to thrive financially and intellectually without relying on gaming grant systems. I had hoped to secure support from private sources, form a virtual institute and live happily in the middle of nowhere while still having an impact on the world’s science, possibly all under an “open research” badge.

What didn’t work?

Side comment: if you have read the excellent piece by Hugh MacLeod entitled “How to be creative”, you shouldn’t find the issues below surprising :).

The main reason I started to look for a job some weeks ago was that freelancing as a scientist turned out to be financially unsustainable. And don’t get me wrong – money wasn’t an issue, as long as I was willing to put all my time into other people’s projects. All. My. Time. Booking some time to work on my own ideas meant burning savings at a quick rate. But I had to work on my own ideas. I didn’t feel like I was learning very much, because I worked on things I was already quite good at. The intellectual stretching was not that big.

Because of the issue above, I’ve put together far less work than I aimed to. I have lots of posts, manuscripts and presentations which I didn’t have time to finish. I was too busy doing freelance work, finishing the projects I had promised to do, inventing new projects and hiding under a bed worrying about where this is all going.

Working in the middle of nowhere was a plain mistake. It sounds nice, but f2f networking (“showing up”) is far more important than I thought. Working in Poland is an issue of its own (no matter if you’re a freelance or an academic scientist); working outside of any major city makes it even worse.

Partially connected to the former issue was the fact that I tried to do all things alone. Wrong. Very wrong. Things like virtual institute will not work, unless there’s a team. Period.

And finally, I didn’t give myself enough time to make the whole system work. It turned out that I had no idea about so many things influencing money flow in the system that it’s not surprising at all that it didn’t click in such a short time (12 months).

What worked?

One of the two biggest advantages of these crazy 12 months was that they were a great learning experience. When I look at my older colleagues working in an academic environment, I’m pretty sure they don’t experience “feeling like an idiot” moments all that often. Such moments happen quite frequently in grad school, but seem to become rarer the further a science career advances. On the contrary, I had such moments all the time in the last year. I was experimenting with blog posts, stupid ideas and unbalanced opinions, and I was scared as hell each time. And I have learnt much more than I would have by playing safe. Have you watched Ken Robinson’s talk at TED? He used a beautiful phrase – “prepared to be wrong”. Keep that in mind.

People were the second most important factor here. I was amazed by the number of people that helped me along the way. Lots of them encouraged me, pointed to useful resources or invested a significant amount of time in answering my silly questions. Many times I was blown away by help I had not expected. The Biogang/Life Scientists community rules.

Frequently quoted phrase from Bill Hooker’s essay, “I’ve never had an idea that couldn’t be improved by sharing it with as many people as possible — and I don’t think anyone else has, either.”, turned out to be absolutely true. Each time I presented my ideas, people were interacting with them, not judging them. I was given suggestions I would not come up with by myself, even if it was clear that we’re not going to do business together.

What now?

The job I hope to land next year is going to address the things I’ve written about above. I hope to have some financial stability and the time necessary to advance my plans. It will also provide support for events such as “Startup weekend in science”, to which I plan to invite you all later next year.

The main goal is still valid and I don’t give up on it. I’ve found (or rather the other way round) a real-life example, ProTech Institute from Lithuania, which means that it can be done – it’s just a little harder if you’re a (still,  but not for long) PhD student.

So it’s the end of freelancing for now. Lessons learned. Back to real-life™ again :).


Posted by on December 9, 2008 in Career, Comments