Category Archives: Research skills

Collanos Workplace and scientific collaboration

One of my workspaces in Collanos

For some time I had been looking for a tool that would eliminate the need to send files back and forth between people collaborating on the same project. While I’m perfectly aware of various solutions such as wikis, version control systems or online office suites, I didn’t feel I could convince my collaborators to use any of these. One reason is the feeling of insecurity when using a publicly hosted platform (BTW, this is not that uncommon among scientists – I know at least one scientific institution in Western Europe that explicitly forbids using Google apps, especially Gmail, for work-related stuff because of Google’s privacy policy). The other reason is that such solutions are not the best choice when working on binary files (most of my projects do not involve collaborative programming). When I stumbled across Collanos Workplace, which offers peer-to-peer synchronization (although without revision control) instead of a central-server model, I decided to give it a try. For the last couple of weeks I’ve been using Collanos to collaborate on one relatively simple project, and the experience has been quite positive.

At first, I thought that Collanos might serve mainly as a tool for secure peer-to-peer file sharing, with information about who changed what. It turned out to be a capable project management application: it has a chat and a discussion panel, and one can post notes and links, add tasks and assign them to team members. Files are stored in a separate directory – after one adds a file to Collanos, it should be opened from the application, not from the original folder. This seemed a design mistake at first, but I came to appreciate it very quickly: synchronizing the whole project directory would mean sharing all of its contents, which can sometimes run to many GBs. From time to time a bug appeared here or there, but overall it worked as expected. Peer-to-peer sharing means that both people have to be online for synchronization, but so far I have switched my computer off before a collaborator could download my changes only once, and that was during a weekend.

As a side note, it’s nice to see Eclipse becoming an application platform for quite a number of programs. See, for example, this list of Eclipse-based software.

Posted by Pawel Szczesny on January 15, 2009 in Research skills, Software

Tags: Google apps, Office suite, Peer-to-peer, Project management

Outsiders and great scientists

25 Feb

The last few weeks brought more pieces worth reading on being a scientist: one in PLoS Computational Biology (found via The Evilutionary Biologist) and one over at the Adaptive Complexity blog (found via Genome Technology). I would add a third one, albeit not strictly about scientists: “The power of the marginal” by Paul Graham. Graham generally writes about start-ups, but in this particular essay he gives a piece of advice that I keep repeating to myself over and over again:

If most of your ideas aren’t stupid, you’re probably being too conservative. You’re not bracketing the problem.

When I look back over the ideas I have had, they fall into four main groups: the ones that were published a couple of years before I thought of them, the ones that were published just before, the ones that were published just after I started to work on them, and finally the ideas I’m still working on because they have not been published yet. In this light, Graham’s advice seems to me a pretty good way to escape this pattern.

Posted by Pawel Szczesny on February 25, 2008 in Career, Research skills

Tags: advice, essay, ideas, Paul Graham, Research

“Startup weekends” in science

30 Jan

News about yet another “startup-weekend-like” event reaches me more and more often. These events are not always about creating a company or a product. Sometimes they are about collaboratively coding a game or writing a novel – all in a very short time. In many cases it works amazingly well – being so tight on time forces people to be ultra-productive and to focus only on the important parts of the project. I envy the people attending such meetings, not necessarily because of the possible outcomes, but because of the energetic atmosphere there.

Deepak wrote some time ago about “Bursty work” – the idea that work can be done by distributed teams focused around high-value projects, instead of teams gathered around a company or startup. That made me wonder whether we can join these two ideas in science: an ultra-productive, distributed team working on a time-constrained project.

Let’s assume that the average publication in bioinformatics/computational biology takes six months of work by one scientist. It doesn’t really matter whether it’s a new server, a database or a protein family annotation. So a team of four people should do the same work in six weeks or faster (why faster? knowledge and skills are not distributed evenly, so someone else may code the necessary script faster than I would). If we increased the number of people involved even further, created a distraction-free environment and prepared enough coffee for everyone, the whole process could be done in a week. Even if the assumptions here are not really correct, I’m pretty sure that quite a number of valuable papers could be produced this way in a week.
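
As a rough back-of-the-envelope check of the numbers above, here is a minimal sketch in Python. It assumes perfectly linear scaling (no coordination overhead at all) and a 52-week year; the function names are made up for illustration.

```python
import math

WEEKS_PER_YEAR = 52
MONTHS_PER_YEAR = 12


def weeks_needed(person_months, team_size):
    """Calendar weeks for a team, assuming the work splits perfectly evenly."""
    effort_in_weeks = person_months * WEEKS_PER_YEAR / MONTHS_PER_YEAR
    return effort_in_weeks / team_size


def team_size_for(person_months, weeks):
    """Smallest team that fits the same effort into the given number of weeks."""
    effort_in_weeks = person_months * WEEKS_PER_YEAR / MONTHS_PER_YEAR
    return math.ceil(effort_in_weeks / weeks)


print(weeks_needed(6, 4))   # 6.5 weeks for four people
print(team_size_for(6, 1))  # 26 people for a one-week sprint
```

Under these idealized assumptions four people need about six and a half weeks, and squeezing the same effort into a single week would take a team of 26.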

So what do you think? What about creating a platform that allows for:

creating a project that has a clear and appealing outcome (for example a publication, or at least a manuscript in Nature Precedings)
creating a project workspace with all necessary tools (wiki, chat, svn, etc. plus a small computational backend for testing)
creating a number of roles that need to be filled by people with certain skills
joining the project if the skills match the requirements
setting a clear deadline (for example, a countdown clock that blocks commits to the project after a certain amount of time, leaving the workspace read-only)
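
The countdown-clock idea from the last point could look roughly like the sketch below. Everything here is hypothetical: `Workspace`, `WorkspaceClosed` and the in-memory list of commits are invented for illustration and do not correspond to any existing platform.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


class WorkspaceClosed(Exception):
    """Raised when someone tries to commit after the project deadline."""


@dataclass
class Workspace:
    # Hypothetical project workspace that turns read-only at the deadline.
    deadline: datetime
    commits: list = field(default_factory=list)

    def is_read_only(self, now):
        return now >= self.deadline

    def commit(self, author, message, now):
        # The current time is passed in explicitly so the behaviour is easy to test.
        if self.is_read_only(now):
            raise WorkspaceClosed("deadline has passed, workspace is read-only")
        self.commits.append((now, author, message))


start = datetime(2008, 2, 1, 9, 0)
ws = Workspace(deadline=start + timedelta(days=7))
ws.commit("alice", "first draft of the annotation script", now=start + timedelta(days=1))
```

A commit attempted after the seven days raises `WorkspaceClosed`, while everything committed earlier stays readable.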

I agree that science takes time, especially quality science. But on the other hand, I have a feeling that we waste a lot of time learning things by ourselves instead of learning from others, that we waste time because the outcome is not well defined, and finally that we waste time solving everything ourselves instead of bouncing ideas off other people (this is what collaboration is all about). So what about creating an artificial environment that forbids wasting time?

Utopian? Maybe. Naive? Most likely. Worth considering? I hope so. Let me know.

Posted by Pawel Szczesny on January 30, 2008 in bioinformatics, Research skills, Services

Tags: collaboration, Research, Services, startups

Ten simple rules for doing your best research – according to Richard Hamming

06 Nov

There’s an editorial in PLoS Computational Biology presenting the condensed thoughts of mathematician Richard Hamming on “first-class research”. It is based on a transcript of a brilliant talk given by Hamming in 1986 at the Bell Communications Research Colloquium Seminar. Definitely a must-read.

clipped from compbiol.plosjournals.org

Hamming’s 1986 talk was remarkable. In “You and Your Research,” he addressed the question: How can scientists do great research, i.e., Nobel-Prize-type work? His insights were based on more than forty years of research as a pioneer of computer science and telecommunications who had the privilege of interacting with such luminaries as the physicists Richard Feynman, Enrico Fermi, Edward Teller, Robert Oppenheimer, Hans Bethe, and Walter Brattain, with Claude Shannon, “the father of information theory,” and with the statistician John Tukey.

Posted by Pawel Szczesny on November 6, 2007 in Clipped, Research skills

Tags: plos comp bio, Research, ten simple rules

Publishers, please provide images in the journal’s feed

23 Oct

Currently almost all scientific journals provide an RSS feed of their content. I am very grateful for that, as are many other scientists. However, I see a large gap between what I would like to scan through and what I’m actually able to. Abstracts, which are usually provided in these feeds, look almost exactly the same, and after thirty or fifty I’m not really sure whether I understand what I’m reading (most likely even Robert Scoble would have the same problem, although at a different level). The journal Biochemistry found a solution (see below) – providing images instead of abstracts in the feed (in fact, they used this approach on their homepage long before RSS feeds became a standard). It’s simple and very effective, and I have no idea why it’s not more popular. So here I’m asking publishers of scientific journals: please give me images in the journal’s feed. I don’t mind having the abstracts as well, but a visual summary would be much more useful. In return I promise to read these feeds instead of relying on PubMed searches, my colleagues’ recommendations and GR shared items.

Posted by Pawel Szczesny on October 23, 2007 in Papers, PubMed, Research skills

Tags: biochemistry, PubMed, Research, science journals

Manual sequence analysis – some common mistakes

25 Sep

This is a topic I will probably come back to on many occasions. A publication with badly wrong sequence analysis, like the one Stephen Spiro pointed out on his blog, is not an exception. I agree that large-scale analysis can tolerate quick and dirty treatment of protein sequences (and some error propagation at the same time). In large-scale analysis nobody cares if the domain assignment is 100% right (it isn’t), if there are false positives (there are) or even if the starting material (protein sequences, for example) is free of errors (it is not) – as long as the overall quality of the work is acceptable. However, this optimistic approach cannot be applied to manual protein sequence analysis: errors introduced in such cases matter far more. How can some of these errors be avoided? A few common mistakes that come to my mind are:

missing, inaccurate or quick and dirty domain annotation: this is probably a topic for a separate post, but in short – relying on a single method or a strict E-value cutoff, excluding overlaps, ignoring internal repeats, forgetting about structural elements like transmembrane helices etc. all lead to mistakes in domain annotation
running PSI-BLAST searches on unclustered databases: the profile for many query sequences will get biased and diverge in a random direction if PSI-BLAST runs on an unclustered database (remember 500 copies of the same protein in the results?); after all these years I still don’t get why NCBI does not provide nr90 (the non-redundant db clustered at a 90% identity threshold) for PSI-BLAST
running PSI-BLAST without looking at the results of each iteration: if you don’t assess what goes into the profile, you risk letting in some garbage
masking low-complexity, coiled-coil and transmembrane regions in BLAST searches on every single occasion: while most of the time this is a valid approach, there are cases where the answer is revealed only after turning the masking off
skipping other sequence analysis tools, such as predictors of signal sequences, motifs and functional sites
skipping analysis of the genomic context: while not applicable to all systems, analysis of the genomic context may dramatically influence function prediction
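
To make the redundancy problem from the PSI-BLAST point more concrete, here is a toy sketch of greedy clustering at a 90% threshold: sequences are processed from longest to shortest, and each one either falls under an existing representative or becomes a new one. This is only an illustration. The similarity measure is a crude stand-in from Python’s standard library rather than a real alignment-based identity, the sequences are made up, and for real databases one would use a dedicated clustering tool.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    # Crude stand-in for percent identity; real tools compare aligned sequences.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def greedy_cluster(sequences, threshold=0.9):
    """Return one representative per cluster, longest sequences first."""
    representatives = []
    for seq in sorted(sequences, key=len, reverse=True):
        # Keep the sequence only if no existing representative already covers it.
        if not any(similarity(seq, rep) >= threshold for rep in representatives):
            representatives.append(seq)
    return representatives


# Made-up example: two near-identical copies of one protein and an unrelated one.
sequences = [
    "MKTAYIAKQRQISFVKSHFSRQ",
    "MKTAYIAKQRQISFVKSHFSRQ",
    "MKTAYIAKQRQISFVKSHFSRA",
    "WWWWWWWWWWWWWWWWWWWW",
]
representatives = greedy_cluster(sequences)
print(len(sequences), "->", len(representatives))  # 4 -> 2
```

With the redundant copies collapsed, a single over-represented protein can no longer dominate the set of sequences that feeds the profile.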

That’s all I could think of so far. Do you have any other suggestions? Let me know.

Posted by Pawel Szczesny on September 25, 2007 in Comments, Research, Research skills

On the scripting skills

19 Aug

The interview with Dr Alexei Drummond inspired an interesting discussion. While I agree that some level of training in programming would be very beneficial for biologists, I think there’s something more important that people working at the bench should learn – using the tools for biological data analysis. Scripting skills are fine, and they often save an enormous amount of time; however, being unwilling to learn how to run a BLAST search (or use any other basic tool in the field) and interpret the results leads to publishing papers with errors (the best case) or with completely wrong conclusions (which happens more often). I’m not talking about becoming an expert – this can take years, as in programming, and should be left to people who spend the whole day doing data analysis (aka bioinformaticians). I’m talking about the equivalent of “scripting” in programming, and this level is currently taught in undergraduate bioinformatics courses at most universities. Such training would save the world from papers comparing multiple sequence alignments from Clustal and… BLAST (in case some readers do not know – BLAST at best can produce multiple pairwise alignments; it does not align all the sequences together).
These are my two cents. I hope to hear your opinion on that.

Posted by Pawel Szczesny on August 19, 2007 in Comments, Research skills