Author Archives: Pawel Szczesny

Closing down Freelancing Science shop

It’s finally time to close down the Freelancing Science shop. I will post in a different place, under a more general domain name and on a self-hosted WordPress installation. Visit my new site over at www.pawelszczesny.org.

I’m moving because the existing form and scope of this blog have become more and more frustrating. I’m going to continue experiments with different approaches to a scientific career, but this is not going to be the main topic of the new site. Additionally, many people subscribed to this blog when I was more interested in bioinformatics, and I don’t want to suddenly spam them with non-scientific topics.

The new site will explore a large number of different fields, such as systems science, photography, dynamic processes, biocomplexity and memetics, but also the topics covered here, such as science 2.0, bioinformatics, structural biology and data visualization. If you aren’t interested in any of the new topics, you can subscribe only to selected notebooks (categories).

Within a month or so, commenting will be closed.

https://freelancingscience.com/2009/06/22/open-science-what-is-your-message/

2 Comments

Posted by Pawel Szczesny on April 8, 2010 in Comments

Proposal for Science 2.0 lectures

07 Dec

I’ve just submitted a proposal for three lectures about different aspects of Science 2.0. The target audience is PhD students. Below you can find a brief overview. The details will probably change a bit when I start to prepare the lectures (for example, I’m aware that Etherpad is on its way out), but you are nevertheless very welcome to comment and suggest a different approach.

Science 2.0 – practical aspects of the internet revolution

Part 1 – communication, collaboration, visibility

New communication channels (blogs, microblogs, aggregators, virtual conferences and poster sessions) and examples of their successful application in science. New roles of blogs, Research Blogging initiative. Wikis, Etherpad and Google Documents/Wave – platforms for document co-writing. Collaboration for programmers, Git. Visibility and recognition in the internets: StackOverflow and ResearcherID.

Part 2 – practical open science

Spectrum of openness in science. Community annotation of genes/proteins/structures and why these aren’t so successful. Crowdsourcing and citizen-science. Overview of open data repositories, focusing on open data coming from pharma industry. Mechanisms of Open Access and Open Notebook Science. Current discussions on intellectual property – what’s not protected and what’s not licensable?

Part 3 – searching for information and literature management

Information overflow – myth or fact? Searching for information – differences between PubMed and Google Scholar. Semantic analysis of abstracts based on GoPubMed and NovoSeek. Targeted text-mining tools. Literature management: online (Connotea, CiteULike) and desktop (Zotero, Mendeley) approaches. Alternatives to EndNote. Automated or not – literature recommendations.

4 Comments

Posted by Pawel Szczesny on December 7, 2009 in Community

Tags: lectures, open-science, science, science2.0

Complex systems and biology – introduction

04 Dec

What you can read here is a set of my loose notes on complex systems and biology. I want to learn about the topic as fast as I can, so if I’m wrong anywhere, please point that out to me. This post is an overview and an indication of the issues I’d like to cover.

Image source: http://commons.wikimedia.org/wiki/File:Biocomplexity_spiral.jpg

Complex adaptive systems (CAS) are at the heart of many phenomena we observe every day, such as global trade, ecosystems, the human body, the immune system, the internet and even language. The complexity of a CAS does not equal the amount of information; rather, it’s an indication of complex, positive and negative interactions between its components. All CAS feature a common set of dualisms:

distinct/connected – CAS are built of a large number of agents that interact simultaneously and independently, but all together become a tightly regulated system (other names: individual/system or distributed/collective)
robust/sensitive – CAS are pretty robust, yet at the same time are quite sensitive to initial conditions and some signals (see butterfly effect); both features are unpredictable
local/global – a protein is a CAS, a protein network is a CAS, a cell is a CAS, a tissue is a CAS, an organism is a CAS, a society is a CAS; agents of a CAS can be CAS themselves
adaptive/evolving – a CAS is able to adapt as a system and usually its agents are also mutually adaptive, and at the same time the CAS is evolving; even if the local landscape prefers simpler solutions (adaptation), CAS usually evolve toward greater complexity

These dualisms are in some sense as artificial as wave-particle dualism. A complex system has all these features at the same time – their visibility depends only on the design of an experiment. As a result, CAS present a common set of features: they are self-organizing, coherent, emergent and non-linear.

Probably the best representation of CAS so far is a network, which has a number of important features: it is scale-free (the distribution of links in the network tends to follow a power law), clustered (“friend of my friend is likely my friend too”) and small-world-like (the diameter of the network is small, aka “six degrees of separation”). Such a representation has been applied to biological complex systems, such as metabolic networks or protein-protein interaction networks, with great success. However, please remember that it’s only a representation, and people have argued many times that scale-free networks may not be the best approximation of natural networks (see for example this recent paper).
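To make the three network features above a bit more tangible, here is a minimal Python sketch. It assumes a toy network grown by preferential attachment (new nodes prefer to link to already well-connected nodes), which is only one common way to obtain a heavy-tailed degree distribution and is not a model of any particular biological network. All function names are hypothetical, chosen for illustration.

```python
import random
from collections import Counter, deque


def preferential_attachment_graph(n, m, seed=0):
    """Grow an undirected graph with n nodes; each new node links to m
    existing nodes chosen with probability proportional to their degree."""
    rng = random.Random(seed)
    # Start from a small fully connected core of m + 1 nodes.
    adj = {i: set(range(m + 1)) - {i} for i in range(m + 1)}
    # Every node appears in this list once per link it has.
    endpoints = [u for u in adj for _ in adj[u]]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        adj[new] = set()
        for t in sorted(targets):
            adj[new].add(t)
            adj[t].add(new)
            endpoints.extend([new, t])
    return adj


def degree_distribution(adj):
    """Map degree -> number of nodes with that degree."""
    return Counter(len(nbrs) for nbrs in adj.values())


def average_clustering(adj):
    """Mean fraction of a node's neighbour pairs that are themselves linked."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for v in nbrs for w in nbrs if v < w and w in adj[v])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def average_path_length(adj):
    """Mean shortest-path length over all reachable pairs (BFS from each node)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


graph = preferential_attachment_graph(n=500, m=2, seed=42)
print("most common degrees:", degree_distribution(graph).most_common(5))
print("largest degree:", max(len(nbrs) for nbrs in graph.values()))
print("average clustering:", round(average_clustering(graph), 3))
print("average path length:", round(average_path_length(graph), 2))
```

A few hubs with a degree far above the average hint at a heavy-tailed distribution, and a short average path length relative to the number of nodes is the small-world signature. Note that plain preferential attachment tends to give rather low clustering, which is one illustration of why a single model rarely captures all three properties at once.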

Scale-free or not, the network representation doesn’t address all the dualities mentioned above, especially the last two. Naturally emerging levels of organisation and the relation between adaptation and evolution of complex systems are rarely studied from a biological point of view, probably because we don’t have a clear idea how to reduce these phenomena to something measurable.

In the next posts, I will try to cover other CAS representations and computational approaches to CAS modeling.

2 Comments

Posted by Pawel Szczesny on December 4, 2009 in bioinformatics

Science 2.0 in Poland – getting popular, recognized as important

28 Nov

A few days ago I had a chance to speak about Science 2.0 at the Institute of Biochemistry and Biophysics of the Polish Academy of Sciences (the one I’m affiliated with). Compared to the seminar on the same topic I gave at the same place (but for a much smaller audience) 4 years ago, I had many more stories to tell, way more real-life examples and a better idea of where the whole “2.0” meme is leading us. I also got better at speaking (4 years ago some of my colleagues literally slept through my seminar). So, the message got clearer, and the messenger improved.

But given the wide interest in the topic from inside and outside the academic environment, already before the seminar, I think two things have happened in Poland in the last 4 years. First, the internet got recognized as a game-changing technology, and people are simply interested in any new way they can use this tool (yes, I know it’s 2009 – if you live on the nets it’s hard to realize how slow the adoption rate is outside of virtual worlds). Second, the internet as a tool is also recognized as important – for example, people had ideas to include Science 2.0 topics in the program of PhD studies (I will follow up on this topic in a week or two). Getting popular, important… Wide adoption is all we still need :).

Comments Off

Posted by Pawel Szczesny on November 28, 2009 in bioinformatics

Notes from Next Generation Sequencing Workshop in Rome

21 Nov

I was in Rome for two days attending the Next Generation Sequencing Workshop organized by EMBRACE (EU FP6 NoE), UPPMAX and CASPUR with the support of the Italian Society of Bioinformatics. It was a pretty interesting event and I want to share a couple of interesting things I learned there.

Hardware layer

The first day was devoted mainly to the hardware side of NGS. It started with a presentation from Tony Cox from the Sanger Institute, who described the hardware setup used to support their sequencing projects. At 400 gigabases a week (current output) the Sanger IT infrastructure is stretched in every direction (capacity, availability, redundancy), and Tony pointed out that each sequencing laboratory is going to face similar issues sooner or later. His advice for such labs was to first estimate the number of bases produced and then use multipliers to assess storage requirements for the project. A minor thing that I noticed in his talk was exposing databases as a filesystem via a FUSE layer – I might use that approach in some projects too.
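The “bases first, multipliers second” advice can be sketched as a back-of-the-envelope calculation. The 400 gigabases a week is the Sanger figure quoted above; the bytes-per-base multipliers and the number of copies below are hypothetical placeholders, not numbers from the talk, and would have to be replaced with values measured on your own pipeline.

```python
def estimate_storage_tb(gigabases_per_week, weeks, bytes_per_base, copies=1):
    """Estimate storage in terabytes from sequencing output.

    bytes_per_base and copies play the role of the 'multipliers'; the values
    used in the demo below are made up for illustration only.
    """
    bases = gigabases_per_week * 1e9 * weeks
    return bases * bytes_per_base * copies / 1e12


# Hypothetical multipliers (bytes stored per sequenced base), NOT from the talk.
HYPOTHETICAL_BYTES_PER_BASE = {
    "sequence + quality": 2,
    "sequence + quality + alignments": 5,
    "everything incl. intensities": 30,
}

for label, multiplier in HYPOTHETICAL_BYTES_PER_BASE.items():
    tb = estimate_storage_tb(400, weeks=52, bytes_per_base=multiplier, copies=2)
    print(f"{label}: {tb:.0f} TB per year")
```

Even with made-up multipliers, the point of the exercise is visible: the decision about which data types to keep changes the storage bill far more than the raw sequencing output does.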

George Magklaras from The Biotechnology Centre of Oslo described a number of approaches they took while implementing their infrastructure. He talked about FCoE, Fibre Channel over Ethernet, and pointed out that it’s cheaper than, and almost as efficient as, Fibre Channel alone. At the Centre they use Lustre (Sanger does too), a high performance networked file system, but they benchmark other solutions too, because some situations/projects require transparent and efficient data encryption (mostly medical data). Like Tony, George pointed out that compartmentalization of data is necessary, as moving large amounts of files over the network creates an unnecessary bottleneck.

Another interesting talk was from Guy Cochrane from EBI about the Sequence Read Archive. It was an overview of the project, but again with a few interesting tidbits that drew my attention. One of them was Aspera, a much faster (and at the same time secure) alternative to good old FTP. He also presented a data reduction strategy that, if I understood correctly, is not yet implemented over at SRA, but might be some day in the future. The first step was deletion of intensity data – that’s something perfectly reasonable but is heavily opposed by a number of scientists. Then, only the consensus is preserved, plus the second most frequent base (important for polymorphism studies). The minimum for long-term storage was proposed to consist only of sequence and quality data.
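The “consensus plus second most frequent base” idea can be illustrated with a toy Python sketch. It assumes the reads are already aligned, have equal length and contain no gaps; the function names are hypothetical, and this is in no way how SRA stores or would store data, just an illustration of the principle as described above.

```python
from collections import Counter


def reduce_column(bases):
    """Return (consensus, runner_up) for one alignment column.

    runner_up is None when only a single base was observed in the column.
    Ties are broken by order of first appearance.
    """
    counts = Counter(bases).most_common(2)
    consensus = counts[0][0]
    runner_up = counts[1][0] if len(counts) > 1 else None
    return consensus, runner_up


def reduce_pileup(reads):
    """Reduce equal-length aligned reads to one (consensus, runner_up) pair
    per position."""
    return [reduce_column(column) for column in zip(*reads)]


# Toy input: three aligned reads covering the same four positions.
reads = ["ACGT", "ACGA", "ACTA"]
for position, (consensus, runner_up) in enumerate(reduce_pileup(reads)):
    print(position, consensus, runner_up)
```

Keeping the runner-up base is what preserves a hint of possible polymorphisms after the individual reads are gone.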

Software

The majority of the second day was devoted to software. It doesn’t make sense to list all the projects described – I will share only my general impression.

Despite the large number of scientists devoting their time to developing new tools for next generation sequencing data, I think that software lags a little behind other technological advances in this area. With really large amounts of data, assembly becomes hard or impossible, mapping erroneous, and annotation too slow (the pilot study of the 1000 Genomes Project generated so much data that a computing farm was busy for a full 60 days – on a single CPU it would take 25 000 days). Software development for NGS differs dramatically from scientific software development in general and needs much, much better programmers than we usually are. For example, Desmond Higgins was praising open source software – they found an extremely fast implementation of the UPGMA algorithm (much faster than their own), and they could speed up their tool (SeedMap) so much that it now runs on even the largest families of sequences in a reasonable time.

Another bottleneck was the data presentation layer – there are some attempts to make digging into data easier, but getting a biologically meaningful overview is as hard as it was before. Other people pointed out that problem too (I wasn’t the only biologist there).

Need for stronger community

Probably the funniest part of the workshop was the discussion about creating an organized community of people working with next generation sequencing technologies. It was funny in the sense that some consensus about the community emerged quite fast. How to build it – that was another story. Obviously lots of participants were sure that if they build a site, people will come. Yeah, sure. 🙂 I suggested using a wiki in the first place and additionally hiring a community manager if they really want to gather people from many different forums, sites, groups etc. Lots of people didn’t buy these ideas, suggesting a more traditional approach, so, curious whether they were right, I’m going to follow the development of this community.

NGS = high tech

Probably the most important lesson was realizing that sequencing is a field with very high requirements for infrastructure and even higher requirements for skilled staff. Basically every element of the infrastructure may become a bottleneck, and if you want to avoid that, the cost of data maintenance and analysis very quickly exceeds the cost of producing the data. When I talked about this to many people during the last year (I’m involved in some sequencing projects at the analysis/annotation step), people often felt I was overestimating infrastructure needs. Now I have some specific numbers to back it up :).

2 Comments

Posted by Pawel Szczesny on November 21, 2009 in bioinformatics

Tags: next generation sequencing, ngs, workshop

Microstocks are for scientists too

05 Nov

Money is rarely discussed directly on science blogs, but science bloggers rarely say that they don’t care. Quite a number of them run advertisements or affiliate programs on their sites, trying to monetize the traffic they generate. And while I don’t know specific numbers, my estimate (some time ago I ran such programs on a photography blog which was way more popular than this one) is that in the majority of cases it buys them a coffee or two per week. This blog is hosted over at WordPress.com and the WP.com team forbids inserting your own scripts into the blog (occasional affiliate links seem to be fine, if you’re interested). Making money from Google ads wasn’t an option for me. But I have tried to earn money by sending images of molecules to microstock sites, and that seems to be more profitable than the previous strategy.

The inspiration to write this post came from the fact that I recently logged into one of the sites and was quite surprised to see that, despite the fact that I haven’t uploaded anything for almost two years, my images are still selling quite well. On the majority of microstock sites your gallery exposure is bigger if you upload new stuff on a regular basis. So the conclusion is that after two years there are still not many similar images of molecules to choose from.

hemoglobin

Above you can see one of my attempts to create a nice picture of a hemoglobin molecule. That should give you an idea of what images are selling well. Simple, clean, bright colours etc. A few other suggestions:

pay attention to the license under which the software you use to generate images is distributed. For example, you cannot use VMD or Chimera (both have non-commercial licenses), while Qutemol (under GPL) is fine.
use automated submitters (available for all platforms) instead of relying on FTP. You just don’t want to manually annotate dozens of images on the web. The other route is to fill in IPTC tags.
submit to all microstock sites that let you in, but start with the bigger ones (iStockPhoto, Dreamstime, Fotolia, Shutterstock etc.)
if you have time, experiment with graphics or 3D software. Additional modifications in GIMP or Blender occasionally produce interesting images.
if you live in a strange country, first check the regulations under which you can earn money via microstock. In Poland, for example, you need to start a company first (which, as my Polish readers can confirm, is a really painful process).

2 Comments

Posted by Pawel Szczesny on November 5, 2009 in Money

Tags: GNU General Public License, Microstock photography, Wordpress.com

Basket as a writing tool, SCAN as a collector

03 Nov

Basket was my favourite note-taking software for a long time, until I switched to mindmaps. Quite recently I discovered another use for it – as a writing aid. Basket in one-column mode lets you rearrange your notes just by dragging them up or down (there are keyboard shortcuts for that as well). When I’m writing a longer piece, I don’t need to hold the structure of the article in my head. I just collect all the pieces (quotes, blog post fragments, my own notes, links, tweets etc.) and then rearrange them as much as needed. When the flow of thoughts is optimal, I start to connect these pieces by writing some text in between :).

I don’t have DevonThink (I don’t have a Mac), but for finding similar things in my archive I use SCAN. SCAN can aggregate content from a number of sources (it has plugins to read PDFs, OpenOffice and MSOffice files or even RSS feeds), analyze it, automatically assign tags, extract metadata etc. It has the Lucene engine built in and does quite a good job of finding related pieces in the archive. It’s quite buggy, doesn’t read all PDFs (such as encrypted ones), and metadata extraction doesn’t work as expected, but overall the tool has potential (and there’s no similar program available on the Linux platform anyway). Its development was recently restarted, so there’s hope it’s going to be improved in the near future. Additionally, it has some nice eye-candy – a visual overview of relations between tags.

This strategy is similar to the workflow described by Steven Johnson, but without DevonThink. So far I haven’t found anything better under Linux, but probably I need to check online apps – things do change every month.

1 Comment

Posted by Pawel Szczesny on November 3, 2009 in Research, Software

Tags: Basket, Metadata, RSS, SCAN, Workflow

Transitions, transitions

28 Oct

Quite a few things happened while I was away. If you’re interested, here’s a not-so-short summary of my internet hiatus:

Research area

I think I’m done with bioinformatics. My current research area seems to be located somewhere between systems biology, theoretical biology and information/complex systems theory. I hope to build on Dawkins’ work, deal with emergence in biology and study subtle effects in biological systems. While I’m not sure if I will ever have anything interesting to show, I don’t have the energy to do yet another project which involves programming/web interfaces/dealing with data/annotations/modelling etc. I’m done with analytics, time for synthesis :).

Career

Last year I wrote a post dreaming about a small non-profit contract research organisation. This model of Research-as-a-Service has materialized in a virtual research institute which we finally launched a few days ago (materialized in something virtual, sign of the times? 😉 ). The setup is quite simple – the institute gets a project (or applies for one) and then searches for researchers/institutions/freelancers who are willing to subcontract parts of the project. We have outsourced not only the research part; even money gathering (writing grants, etc.) is done by an external company. The setup is quite flexible and pretty transparent – for example, we may represent somebody’s rights, but no intellectual property is owned by the institute. Why such an institution? We become a single point of contact for a large and diverse group of scientists who are willing to do some research for real money but don’t have the time and energy to hunt for gigs by themselves. While I have an academic job, I’m in the middle of a transition from being a freelancer to being a jobs provider for freelance scientists. More on that in some other post.

Open science

I plan to spend way more time on advocating open science (all of its flavors), but… in Polish. This step comes out of a large frustration that even prominent figures in Polish science have no idea about the changes in science that internet-aware researchers are watching and creating. Knowledge about even basic things like Open Access is dramatically low in Poland (a number of people here equate OA with low quality publications which have not been peer-reviewed). With a few friends, we have a number of projects in the pipeline (for example, we hope to launch a nation-wide promotional campaign for open science, created by professionals – billboards, TV commercials etc.). If any of these actually works, I will let you know if we have any measurable success 😉 .

Labels, labels

Robert Anton Wilson tells a nice story in his book Prometheus Rising:

William James, father of American psychology, tells of meeting an old lady who told him the Earth rested on the back of a huge turtle.

“But, my dear lady,” Professor James asked, as politely as possible, “what holds up the turtle?”
“Ah,” she said, “that’s easy. He is standing on the back of another turtle.”
“Oh, I see,” said Professor James, still being polite. “But would you be so good as to tell me what holds up the second turtle?”
“It’s no use, Professor,” said the old lady, realizing he was trying to lead her into a logical trap. “It’s turtles-turtles-turtles, all the way!”

Another story is a comment from my advisor about putting my real research plans in some proposal (he supports these plans):

The most likely reaction from reviewers will be something like this: “Nice start, some decent papers, PhD looks good. And then he got crazy.”

I feel like screaming “Labels, labels, labels, all the way!” when facing stiff schemas of what a scientist “is” or what an artist “is” etc. It’s a hard task by itself to integrate multiple passions and multiple interests into a coherent structure. I don’t need another set of issues because of labels people attach to seemingly creative professions. But limiting myself only to topics consistent with the image of an online scientist became even more frustrating. Therefore expect that this blog (or any other venue I choose to express myself) is going to become a lot more diverse in topics and form.

1 Comment

Posted by Pawel Szczesny on October 28, 2009 in Comments, Research, Science and Art

Open Science: a step towards Open Innovation

02 Jul

Open Innovation is a catchy phrase, but I don’t think we are as close to it as many people claim. Innocentive, InnovationXchange or NineSigma operate in a very small market, and this market does not seem to grow as fast as we would wish. Innocentive posted some statistics as of 2nd of June, 2009, so given these numbers and the number of open challenges, it’s safe to assume that as of today a total of around 1000 challenges have been posted and ca. half of them have been awarded. If you compare those numbers with the almost 200 000 patents issued by the US Patent Office alone in 2006, it gives a clear picture of the size of the market that open innovation crowdsourcing companies (edit: as Jean-Claude points out in the FriendFeed comment, Innocentive and the other two companies mentioned earlier are rather crowdsourcing, not “open innovation” companies) are operating in. There are plenty of reasons why OI has not yet become mainstream (too many to list), and for that to happen, there are two important steps that we need to make first.

Open Science must become mainstream

I’ve been advocating Open Science for some time and I’ve been following Open Science luminaries for much, much longer. At some point it hit me that Open Science in its fullest form is not an issue that scientists can truly solve by themselves. Open Science crosses the domain of Science – it’s an issue for Science, Politics and Business. We should experiment with various ways research is done, collaborate openly, attempt to invent new business models to fund science and spread the “open” meme as much as we can. However, the real deal will be made between people in power from these three domains. Why is it necessary to achieve this before we can fully innovate in the open? Because in this step we will sort out all the problems we have today with intellectual property and technology transfer (both being not efficient enough for today’s standards). I cannot envision that happening in any other domain – we are paid to collaborate and test ideas. This community is able to hit every major obstacle to “open” in a very short time. And once we have these obstacles removed there’s a next step:

Working models of Open Science should be tested outside of Science

In other words, I postulate that whatever solutions work in the domain of Science should be tested outside of it, in other domains. Not vice versa. Principles of Open Source software did not prove to be useful in open drug development (see Joerg’s post on the topic). Crowdsourcing will not advance quantum physics. Not all aspects of collective intelligence work in Science. We simply need to invent working solutions within the domain first, and then test them in other domains, such as art or engineering. This step will provide another set of protocols, changes and adjustments that will allow seekers and solvers (to use Innocentive’s nomenclature) to work efficiently together across every domain.

Open Innovation is not a single step

I may be proved wrong by some genius who will solve Open Innovation issues in a single brilliant step, but so far I believe that we need more than one step to achieve this goal. And it is important to recognize that Open Science is a great opportunity to come closer to it. The sooner we realize it, the better.

5 Comments

Posted by Pawel Szczesny on July 2, 2009 in Comments, open-science

Tags: InnoCentive, InnovationXchange, NineSigma, Open innovation, Science in Society, United States Patent and Trademark Office

Visual analysis is not only about seeing

29 Jun

I’ve just stumbled across this short video on the work of Turkish artist Esref Armagan, born blind, who nonetheless paints and draws. I will let you draw your own conclusions – mine are briefly expressed in the title of this post.

Hat tip Mayer Spivack.

Update: if you cannot see the embedded video, here’s a link.

Comments Off

Posted by Pawel Szczesny on June 29, 2009 in Comments

Tags: ART, visual analysis, Visual arts