Freelancing science

How far can one push online collaboration in research?

A week or so ago I asked on FriendFeed whether there was interest in writing a periodic publication on the status of Science 2.0. I thought that summarizing each year's advances in the openness of science would be a good idea. During the discussion it turned out that a first major publication on Science 2.0 concepts needs to be written, because none has been published in the life sciences field. The interested people finally concluded that they would meet face to face at the upcoming conference and discuss things in detail.

And this made me think.

I started to wonder why people who live, breathe and do research online still need to meet in person to plan and discuss things. The obvious explanation is that the conference was only two weeks away, so there was nothing more to it than that. But some patterns (brainstorm online, then meet in person, then finish online) repeat so often that I started to believe online collaboration can only go so far.

The tools are no excuse, especially given how fast new ones appear. Examples include the recently launched Adobe Acrobat site, which contains an online editor and a live collaboration suite (screen sharing, notes, chat, audio, video), and its analog for programmers, Assembla (svn, git, trac, wiki, milestones).

My feeling is that what makes the difference is not the quality of interaction while working on a particular project, but the possibility of discussing things not directly related to it. Online collaboration is usually very focused. When editing a document online, it’s hard to sidetrack it so that in the end it is about something other than what was planned. Calls and videoconferences usually have a schedule. No place for unrelated stuff. No place for a beer/coffee/glass of water and a chat about how good life was in the old days. No place for discussing random things and coming back to the main project from a different angle and with completely new ideas.

I’ve been trying for quite a long time to work online with other people on some projects from my home office (which is in the middle of nowhere). As you can guess, it works up to a certain point, and the majority of the projects improved after meeting face to face. This left me wondering how far I can push online collaboration. It usually isn’t an issue in the IT field, but so far it doesn’t look very promising in research.

3 Comments

Posted by Pawel Szczesny on August 27, 2008 in Comments, Community, Research

Tags: FriendFeed, Science 2.0

Ubiquity – coding something useful in less than 20 minutes

27 Aug

Ubiquity is a new experimental extension for Firefox that will (I’m sure it will) make an enormous impact on the way we use the browser. It lets you remix various services and extend the functionality of the browser in a very easy way (if you don’t get the point of Ubiquity yet, I recommend watching the video that came with the official announcement; I needed to see it, because the description didn’t tell me much about how powerful it can be).

I haven’t had much time to play with it yet, but in a spare 20 minutes I attempted to code a command that would show me the image of a structure from PDB given its code and, when I press enter, take me to its homepage. Surprisingly, it was very easy (and I’m not a JS coder). The source is pasted below.

CmdUtils.CreateCommand({
  name: "pdb",
  description: "Goes to Protein Data Bank given PDB code.",
  icon: "http://www.rcsb.org/favicon.ico",
  help: "You can specify the PDB code and pressing enter will take you to particular structure's homepage." +
    " If you type pdb code and press arrow down, you should see an image from PDB site.",

  takes: {"PDB code": noun_arb_text},

  execute: function( directObj) {
    var pdbcode = directObj.text;
    Utils.openUrlInBrowser("http://www.rcsb.org/pdb/explore/explore.do?structureId="+pdbcode);
  },

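  preview: function( pblock, directObj ) {
    var pdbcode = directObj.text;

    // Nothing typed yet: show a hint instead of a broken image.
    if (!pdbcode) {
      pblock.innerHTML = "Type a PDB code to preview the structure.";
      return;
    }

    pblock.innerHTML = "Preview of the structure:<br/>" +
      "<img src=\"http://www.rcsb.org/pdb/images/" + pdbcode + "_bio_r_250.jpg\" />";
  }
});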

It could of course be improved by also using selected text, or by allowing a keyword search of the PDB (or basically any other biological database), but its current functionality suits me just fine. Ubiquity is not yet as stable a platform as Greasemonkey (or Chickenfoot), but it’s worth keeping an eye on. I’m sure that sooner or later we will read an article in a peer-reviewed journal describing Ubiquity commands for life sciences :).
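
The same two URLs can be built outside the browser, for example to fetch images for a list of structures. Below is a minimal Python sketch, assuming the URL patterns used in the command above; the helper name pdb_urls is hypothetical, and the check for a four-character code (a digit followed by three letters or digits) is an assumption about classic PDB identifiers, not something the command itself does.

```python
import re

# Assumption: classic PDB codes are four characters, a digit followed by
# three letters or digits. Used only as a basic sanity check.
PDB_CODE = re.compile(r"^[0-9][A-Za-z0-9]{3}$")


def pdb_urls(code):
    """Return (homepage_url, image_url) for a PDB code.

    The URL patterns are copied from the Ubiquity command above.
    """
    code = code.strip()
    if not PDB_CODE.match(code):
        raise ValueError("not a 4-character PDB code: %r" % code)
    homepage = "http://www.rcsb.org/pdb/explore/explore.do?structureId=" + code
    image = "http://www.rcsb.org/pdb/images/" + code + "_bio_r_250.jpg"
    return homepage, image


if __name__ == "__main__":
    for url in pdb_urls("1hho"):
        print(url)
```

Rejecting malformed input early keeps a typo from turning into a request for a page that does not exist.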

Mozilla gives the passionates one with Ubiquity

1 Comment

Posted by Pawel Szczesny on August 27, 2008 in Software

Tags: Firefox, Mozilla, Mozilla Labs, Protein Data Bank, protein structure, Ubiquity

Relaxing before weekend – PDB file and Panda3D

15 Aug

Software for visualizing molecules is in most cases very focused on its job and rarely allows for anything outside its scope (one exception is VMD: you can plot 3D surfaces using its graphics engine). Every couple of months I check the status of various 3D engines to see how well suited they are to molecular visualization. Recently I had another look at Panda3D, a free 3D engine that Disney uses for some of its games. As an exercise in Python, which I’m learning right now, I tried to import a PDB file into Panda3D and rotate it.

Panda3D doesn’t have native support for molecules; instead it supports its own egg format for models. Fortunately, there’s an egg exporter for Blender, so I imported a hemoglobin molecule in cartoon representation into Blender (the procedure is described at the bottom of this page) and then exported it in Panda3D format. The rest was pure Python (and extensive copy/paste from tutorials found on the web). The following code loads the model from the hbg.egg file, sets up some lights and rotates the camera around it.

import direct.directbase.DirectStart
from direct.showbase.DirectObject import DirectObject
from pandac.PandaModules import *
from direct.task import Task
import math

#Load the protein model
protein = loader.loadModel("hbg")
protein.reparentTo(render)
protein.setScale(1.4)
protein.setPos(0,0,2)

#setup lights
light1 = AmbientLight('light1')
light1.setColor(VBase4(0.12, 0.12, 0.12, 1))
plnp = render.attachNewNode(light1)
render.setLight(plnp)

light2 = PointLight('pointlight')
plnp2 = render.attachNewNode(light2)
plnp2.setPos(0,0,2)
render.setLight(plnp2)

#Task to move the camera
def SpinCameraTask(task):
  angledegrees = task.time * 6.0
  angleradians = angledegrees * (math.pi / 180.0)
  base.camera.setPos(20*math.sin(angleradians),-20.0*math.cos(angleradians),2)
  base.camera.setHpr(angledegrees, 0, 0)
  return Task.cont

base.setBackgroundColor(0.0,0.0,0.0)
taskMgr.add(SpinCameraTask, "SpinCameraTask")

run()

A not so impressive screenshot is shown at the top. It’s not rocket science or state-of-the-art visualization, but I’m positively surprised by how easy it is today to get such a thing up and running. The game industry is a large one and even proprietary engines are quite cheap (for non-commercial purposes one can have them for a few hundred dollars), so I expect quite a few scientific projects built on such platforms to come soon. The SL engine will not be the last one used for such a purpose.
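
The camera motion in SpinCameraTask is plain trigonometry and can be checked without Panda3D installed. The sketch below repeats the same arithmetic in a standalone function; the name orbit_pose and its keyword parameters are hypothetical, and the defaults are the constants from the script above (radius 20, height 2, 6 degrees per second).

```python
import math


def orbit_pose(t, radius=20.0, height=2.0, degrees_per_second=6.0):
    """Return ((x, y, z), heading_in_degrees) of the camera after t seconds.

    Same formula as SpinCameraTask: the camera moves on a circle of the
    given radius around the vertical axis, and its heading follows the angle.
    """
    heading = t * degrees_per_second
    angle = math.radians(heading)
    x = radius * math.sin(angle)
    y = -radius * math.cos(angle)
    return (x, y, height), heading


if __name__ == "__main__":
    for seconds in (0, 15, 30, 60):
        print(seconds, orbit_pose(seconds))
```

At 6 degrees per second the camera needs 60 seconds for a full circle; changing degrees_per_second is the simplest way to speed the rotation up or slow it down.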

2 Comments

Posted by Pawel Szczesny on August 15, 2008 in Visualization

Tags: 3D modeling, blender, Game engine, Panda3D, PDB

One year of blogging – plans for ten years

11 Aug

Following BioBarCamp I missed the one-year anniversary of this blog. With sixty-something posts I cannot say I’m a very productive blogger, but I didn’t aim at being one, like probably all other science types. My goal was to be engaged in the conversation and I reached it much faster than I expected. I got lots of help and encouragement from people I wouldn’t even have dared to email a year ago. I know much more than I did about things outside my research area. While it all sounds pathetic, the advantages of being part of this community are hard to overestimate. I have written about it a couple of times already (and Neils did a great job summarizing why you should have a web presence).

Where is “Freelancing science” heading? That’s a question I asked myself pretty often during the last 12 months. At first, I just blogged about interesting stuff around bioinformatics. Then I made the jump into freelancing as a scientist (and this experiment is going pretty well). Statistics on the keywords people use to find this blog clearly show that there’s some interest within the bioinformatics community in following this path. But the idea I have for this blog right now is not about freelancing anymore. Or rather, it’s about freelancing on the next level, because today I am thinking about starting a non-profit institute.

I believe that small research groups formed as non-profit organizations will have an enormous impact on science within the next ten or twenty years (more about it in an upcoming post about the future of scientists). In the spirit of freelancing they will jump from one project to another (see Deepak’s post about bursty work and the follow-ups), developing solutions and making discoveries much faster (or cheaper) than is possible in a bureaucratic environment. We have tools for effective online collaboration, we have a new generation of scientists who do not feel attached to the academic system, and we have science that is starting to revolve around understanding data rather than performing experiments. Is it time to try such an approach?

So, watch this space to see how the idea develops. I’m also interested in your opinions on and experiences with starting and cooperating with non-profits, if you have any.

6 Comments

Posted by Pawel Szczesny on August 11, 2008 in Career, Community, Research

Tags: bursty work, Conversation, FriendFeed, Non-profit organization, Research

BadA head structure

09 Aug

Modularity is one of the most interesting features of the trimeric autotransporter adhesins, and probably one of the most frustrating. As I wrote before, domain annotation is quite difficult, especially since these proteins are often a few thousand residues long.

BadA, the major adhesin of Bartonella henselae, is probably the best-known large TAA out there. Its sequence served us as an unofficial benchmark for a domain annotation tool. Its head consists of three domains: one resembling the head of YadA and two others which we claimed are similar to Hia head domains. When we started this project the claim wasn’t very well supported: E-values of the HHpred alignments were around 1 (and of course all less sensitive tools didn’t see anything), but we knew the domains must be similar (because two or three conserved residues were exactly where we expected them). The crystal structure of these two BadA domains couldn’t be solved directly, so we attempted molecular replacement, and that worked. In the picture above you can see the three known head structures for TAAs, BadA (ours), Hia and YadA (the full BadA head model is on the right), and the arrangement of the corresponding domains in all three proteins. The whole story, with lots of pretty pictures (you must see the EM figures), was published yesterday in PLoS Pathogens (OA).

Today the story isn’t as exciting as it was at the beginning. HHpred now easily finds that the domains from Hia and BadA are similar, with high probability; that is the advantage of a bigger database and more intermediate sequences. But I’m still pretty happy about how it went: such projects build confidence in one’s analysis skills.

Domain annotation in trimeric autotransporter adhesins

2 Comments

Posted by Pawel Szczesny on August 9, 2008 in bioinformatics

Tags: Annotation, bioinformatics, protein, Protein domain, protein structure

By any measure I’m average at most

08 Aug

As you have probably noticed, yesterday’s BioBarCamp was covered in depth over at FriendFeed, and additionally Cameron was streaming live video from the event (it’s still available at the same address). One particular session drew my attention, because it was about measuring the impact of scientists. It’s something I have had a very strong opinion about for a couple of weeks, so forgive me this rant.

Peter Binfield (PLoS) and Pedro Beltrao did a great job presenting the current status of the issue and presented a potential way to measure the impact of a publication (quoting after Shirley: “your article received x citations, viewed x times, received x comments, bookmarked x times, rated x by experts, discussed on x respected blogs, appeared in x news media, etc etc” instead of a single “your article was published in journal with IF of X”). And while two months ago I was really interested in such discussions and willing to help, today I simply don’t care. The reason is simple and is given in the post title: by any measure, I’m average at most.

It’s absolutely obvious that the majority of scientists are at most average by any standard or measure. And that is not going to change, at least not much. Those who are at the top by Impact Factor today will be at the top by any other measure. Those who do not-that-important stuff, like me, will still be pretty average by any other measure. One of the reasons may be all kinds of issues with normalizing for field size (there are too many problems with biological ontologies to believe that dividing the science space into fields is going to work much better). Another may be the relative importance of the field (which is something different from field size): human research will always draw more attention than electrochemistry. And I could go on and on; none of these issues is novel and they have been described and discussed in thousands of blog posts. The point is that even if such a new, ideal measure is fair, it will not change the life of the majority of scientists. Not only because some of us do average things, but also because some of us have average money (BTW, I haven’t found much discussion on including the research budget in the measure, which surprises me, given that the amount of money spent on a project correlates pretty well with the IF of the journal it is published in afterwards).

So, I don’t really care whether IF stays or not (although people working on improving such measures have my deep respect). Reputation-wise I’m going to be in the middle unless I do something extraordinary. But honestly, to make a scientific breakthrough, the last thing I need is a number describing the quality of my thinking.

8 Comments

Posted by Pawel Szczesny on August 8, 2008 in Career, Comments, Research

Tags: Academic publishing, BioBarCamp, FriendFeed, Impact Factor, Streaming media

FriendFeed: where the conversation happens

28 Jul

The start of this post (see the image above) may be a good reason for many people not to join FriendFeed 🙂. It shows what happened to the number of visitors to this blog after I joined FF: it dropped by half (the actual numbers aren’t relevant; the graph shows monthly statistics). The reason is pretty obvious to any long-time blogger: no posts, no visitors. I don’t post as often as before for a good reason: sharing news and interesting links, and the whole conversation around these, happens on FriendFeed. While I haven’t set up the dream system I wanted (see my comment on the previous post on FF), I don’t have any issues with so-called “information overload”. Actually, I don’t believe in information overload at all; we are just pretty bad at managing incoming information, but that’s a story for another post.

Rooms are a neat feature of FriendFeed: they act as a filter and keep the conversation focused. Instead of looking at a stream of titles ranging from Linux hacks, through hardcore programming stuff and other bioinformatics-related topics, up to cancer research and philosophy of science, I can just go into one of the rooms and see only items related to a particular topic. Yesterday Deepak wrote about the new rooms at FF (for Python, Ruby and R for Bioinformatics) that were created by people from the life science community. There is also a room for Science 2.0 and Open Science, one for DIYBiology, and even a room that collects links to must-read material, BioGang classics (since I started this post, Ricardo has created an OpenWetWare FriendFeed room).

Rooms help keep the flow of links under control, but the conversation is the key point of using FriendFeed. Almost every single item posted to The Life Scientists room generates comments, sometimes turning into a pretty long discussion. Because FF aggregates Twitter updates, the majority of “Dear Lazyweb” Twitter requests result in FriendFeed-based conversations. And more and more people are participating (The Life Scientists room has over 200 members). As usual, there’s a catch: focus and depth are not the strong sides of FF comments (for example, compare the reactions to the very nice essay recently posted by Michael Nielsen on The Future of Science: the numbers of comments on his blog and on FF are comparable, but discussion of and arguing with the essay’s points happened mostly on the blog). But that’s not a problem; it’s just a result of the speed with which items appear and disappear on FriendFeed (some of you saw that while tracking the real-time stream from concurrent sessions at the recent ISMB conference).

Even such shallow and quick interactions with people on FriendFeed generate some level of trust, and that, I think, will lead to a couple of interesting things:

more people will try out how online collaboration works (for example, on reflection after Cameron’s recent talk, Brian Kelly from UKOLN wants to write his article online)
PI-level scientists will join FF to participate in the discussion (we see that already, although so far there are only very few of them)
there will be serious articles on why FriendFeed, Twitter and online collaboration are bad for scientists and how they can break an academic career, similar to those written about blogs (see Pedro’s recent post)
we will see (and read, since it’s going to be open access) the first peer-reviewed publication from an idea that originated on FF/Twitter

Is FriendFeed going to be a hub for science? I don’t really think so. By the time mainstream science picks up FriendFeed, I think we will already be somewhere else, because there will be more interesting and more useful platforms for scientific collaboration (for example cyn.in, which looks promising, although it’s not yet an optimized product). But the time spent at FF will give us an advantage: connections, collaborations, a wide spectrum of information and advice from smart people.

6 Comments

Posted by Pawel Szczesny on July 28, 2008 in Comments, Community

Tags: Conversation, FriendFeed, OpenWetWare, Twitter, Web 2.0

Growing in an open source business model

21 Jul

Image by *w* via Flickr

For the last couple of months I’ve been quite busy writing my PhD thesis and working on a few other projects, but I was also trying to start an open project balancing between academia and industry. This balance sounds like an opportunity, but in fact it was an issue. The issue wasn’t the money: I was lucky to find people willing to help me get funding. The issue was rather what I would need to give away in exchange for the money: openness, control over the project, or all intellectual property rights. Being an already established scientist or a business person would solve such issues immediately, but I am still a PhD student, so I need to face them. And while I still have plenty of people to talk to (I think it will take another month or two), this left me thinking about a career on the border between industry and academia.

On both sides, in academia and in industry, the career path (and I’m talking here not only about having a job, but also about starting a business yourself) is fairly clear and one can get significant help along the way, but I haven’t found such a clear path on the border between the two. The open source business model seems to work well mostly for very established players (such as Apache or RedHat); growing in such a model looks much more difficult than on either side. Probably Antony Williams from ChemSpider (who was one of the people who inspired and encouraged me to follow this path) could say much more here, especially about how easy it is (or is not) to get financial support for working on a project like ChemSpider.

I don’t think about working in one or the other environment anymore. Being a freelancing scientist has a lot of good sides and growing wouldn’t be an issue (for example, I have enough collaborations and ideas to cover the next 3-4 years financially from grants; publications would follow). But, as I wrote before, some of the projects I’d love to work on are unlikely to be funded in the academic system. On the other hand, openness is too important for me to give away, so only a merger of the two sounds interesting. There are a few examples of successfully merging industry and academia, but they all seem to operate on different principles from my recent attempts. Craig Venter’s model was, as far as I know, double-sided most of the time: he had a non-profit research unit and a company that commercialized its discoveries. David E. Shaw has a pretty similar setup. So I have started to wonder whether sticking to the borderline is actually the best idea. Being involved on two fronts at the same time sounds pretty overwhelming, but so far these are the only examples where this whole idea seems to work. Are you aware of any others?

My other hope is that new ways of growing on the borderline will emerge very soon. There’s quite a lot happening right now in support of innovation (including open models), so maybe that is where I will find my niche. We’ll see.

(The image above is not my desk. While I work in a home office, mine doesn’t look so clean.)

Further reading:

A microfunding system for research and innovation.

Pharma looks at new ways to innovate.

Discussion of business models around Open Data is building up.

Posted by Pawel Szczesny on July 21, 2008 in Career, Comments, Research

Tags: ChemSpider, Craig Venter, David E. Shaw, Intellectual property, Open Data, Open source

Bioinformatics Career Survey – two weeks left

21 Jul

If you haven’t filled in the survey yet, please spend a few minutes doing so over at Bioinformatics Zen. There are only two weeks left.

Posted by Pawel Szczesny on July 21, 2008 in bioinformatics

Configuring Torque and InterProScan

10 Jul

If by any chance you want to use InterProScan with the Torque Resource Manager (a queueing system based on the PBS project), it doesn’t work by default (it is tested with LSF, and configuration files are supplied for the original PBS and Sun Grid Engine). Fortunately, only two small changes to the InterProScan config files are needed to make it work. First, during iprscan configuration, choose PBS54 as your queueing system. Then, in the file pbs54.conf (IPRSCANHOME/conf), remove the “-d” switch from the following two lines:

asyncsub=qsub [%optqueue][%optresource] -d -o /dev/null -e /dev/null "[%toolcmd]"
syncsub=qsub [%optqueue][%optresource] -d -o /dev/null -e /dev/null -I "[%toolcmd]"

Assuming that the Torque binaries (qsub, qdel etc.; on my machine they sit under /usr/local/bin) are available in the global PATH, change the default shell in the environment file pbs54env.sh from #!/bin/sh to #!/bin/bash. You can also add other directories to the PATH in that file (I didn’t). Voilà. InterProScan jobs are now queued.
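
If the same edit has to be repeated on several machines, removing the “-d” switch can be scripted. Below is a minimal Python sketch that works on the text of pbs54.conf; the function name strip_d_switch is hypothetical, and the sketch assumes that the switch appears as a standalone “-d” on the asyncsub= and syncsub= lines, as in the two lines quoted above. Reading and writing the actual file is left out on purpose, since the location of IPRSCANHOME differs between installations.

```python
import re


def strip_d_switch(conf_text):
    """Remove the standalone "-d" switch from asyncsub=/syncsub= lines.

    All other lines, and any other text on the edited lines, are left as is.
    """
    fixed = []
    for line in conf_text.splitlines(keepends=True):
        if line.startswith(("asyncsub=", "syncsub=")):
            # Match "-d" only when surrounded by whitespace, so paths such
            # as /dev/null and switches such as -e are not touched.
            line = re.sub(r"\s-d(?=\s)", "", line)
        fixed.append(line)
    return "".join(fixed)


if __name__ == "__main__":
    sample = (
        'asyncsub=qsub [%optqueue][%optresource] -d -o /dev/null '
        '-e /dev/null "[%toolcmd]"\n'
    )
    print(strip_d_switch(sample), end="")
```

Keeping a copy of the original pbs54.conf before overwriting it makes it easy to go back if a later InterProScan release changes these lines.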

3 Comments

Posted by Pawel Szczesny on July 10, 2008 in bioinformatics, Software

Tags: InterProScan, qsub, queueing system, torque