Proposal for Science 2.0 lectures

December 7, 2009 Pawel Szczesny 4 comments

I’ve just submitted a proposal for three lectures about different aspects of Science 2.0. The target audience is PhD students. Below you can find a brief overview. The details will probably change a bit when I start to prepare the lectures (for example, I’m aware that Etherpad is on its way out), but you are nevertheless very welcome to comment and suggest a different approach.

Science 2.0 – practical aspects of the internet revolution

Part 1 – communication, collaboration, visibility

New communication channels (blogs, microblogs, aggregators, virtual conferences and poster sessions) and examples of their successful application in science. New roles of blogs, the Research Blogging initiative. Wikis, Etherpad and Google Documents/Wave – platforms for document co-writing. Collaboration for programmers, Git. Visibility and recognition in the internets: StackOverflow and ResearcherID.

Part 2 – practical open science

Spectrum of openness in science. Community annotation of genes/proteins/structures and why these aren’t so successful. Crowdsourcing and citizen science. Overview of open data repositories, focusing on open data coming from the pharma industry. Mechanisms of Open Access and Open Notebook Science. Current discussions on intellectual property – what’s not protected and what’s not licensable?

Part 3 – searching for information and literature management

Information overflow – myth or fact? Searching for information – differences between PubMed and Google Scholar. Semantic analysis of abstracts based on GoPubMed and NovoSeek. Targeted text-mining tools. Literature management: online (Connotea, CiteULike) and desktop (Zotero, Mendeley) approaches. Alternatives to EndNote. Automated or not – literature recommendations.

Complex systems and biology – introduction

December 4, 2009 Pawel Szczesny 2 comments

What you can read here is a set of my loose notes on complex systems and biology. I want to learn about the topic as fast as I can, so if I’m wrong anywhere, please point that out to me. This post is an overview and an indication of the issues I’d like to cover.

Image source: http://commons.wikimedia.org/wiki/File:Biocomplexity_spiral.jpg

Complex adaptive systems (CAS) are at the heart of many phenomena we observe every day, such as global trade, ecosystems, the human body, the immune system, the internet and even language. The complexity of a CAS is not equal to the amount of information it holds; rather, it is an indication of the complex positive and negative interactions of its components. All CAS feature a common set of dualisms:

  • distinct/connected – CAS are built of a large number of agents that interact simultaneously and independently but together form a tightly regulated system (other names: individual/system or distributed/collective)
  • robust/sensitive – CAS are pretty robust, yet at the same time quite sensitive to initial conditions and to some signals (see butterfly effect); both features are unpredictable
  • local/global – a protein is a CAS, a protein network is a CAS, a cell is a CAS, a tissue is a CAS, an organism is a CAS, a society is a CAS; the agents of a CAS can be CAS themselves
  • adaptive/evolving – a CAS is able to adapt as a system, and usually its agents are also mutually adaptive, while at the same time the CAS is evolving; even if the local landscape favours simpler solutions (adaptation), CAS usually evolve toward greater complexity

These dualisms are in some sense as artificial as wave-particle dualism. A complex system has all these features at the same time – their visibility depends only on the design of an experiment. As a result, CAS present a common set of features: they are self-organizing, coherent, emergent and non-linear.

Probably the best representation of CAS so far is a network, which has a number of important features: it is scale-free (the distribution of links in the network tends to follow a power law), clustered (“friend of my friend is likely my friend too”) and small-world-like (the diameter of the network is small, aka “six degrees of separation”). Such a representation has been applied with great success to biological complex systems, such as metabolic networks or protein-protein interaction networks. However, please remember that it’s only a representation, and people have argued many times that scale-free networks may not be the best approximation of natural networks (see for example this recent paper).
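
To make the three network properties named above a bit more concrete, here is a minimal Python sketch that computes a degree histogram, the average clustering coefficient and the average shortest path length for a tiny hand-made graph. The graph and all function names are made up for illustration, and six nodes are far too few to say anything about power laws; a real analysis would use a proper network library and real interaction data.

```python
from collections import Counter, deque
from itertools import combinations

# Toy undirected graph (hypothetical data): node -> set of neighbours.
GRAPH = {
    "a": {"b", "c", "d", "e"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
    "e": {"a", "f"},
    "f": {"e"},
}

def degree_histogram(graph):
    """Count how many nodes have each degree (input for a scale-free check)."""
    return Counter(len(neigh) for neigh in graph.values())

def local_clustering(graph, node):
    """Fraction of a node's neighbour pairs that are themselves linked."""
    neigh = graph[node]
    if len(neigh) < 2:
        return 0.0
    pairs = list(combinations(sorted(neigh), 2))
    linked = sum(1 for u, v in pairs if v in graph[u])
    return linked / len(pairs)

def average_clustering(graph):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(graph, n) for n in graph) / len(graph)

def shortest_path_lengths(graph, source):
    """Breadth-first search distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def average_path_length(graph):
    """Mean distance over all ordered pairs of distinct, connected nodes."""
    total, count = 0, 0
    for source in graph:
        for target, d in shortest_path_lengths(graph, source).items():
            if target != source:
                total += d
                count += 1
    return total / count

print(dict(degree_histogram(GRAPH)))
print(round(average_clustering(GRAPH), 3))
print(average_path_length(GRAPH))
```

In this toy graph one hub ("a") holds most of the links, which is the kind of skewed degree distribution that the scale-free description refers to.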

Scale-free or not, the network representation doesn’t address all the dualities mentioned above, especially the last two. Naturally emerging levels of organisation and the relation between adaptation and evolution of complex systems are rarely studied from a biological point of view, probably because we don’t have a clear idea how to reduce these phenomena to something measurable.

In the next posts, I will try to cover other CAS representations and computational approaches to CAS modeling.

Categories: bioinformatics

Science 2.0 in Poland – getting popular, recognized as important

November 28, 2009 Pawel Szczesny Leave a comment

A few days ago I had a chance to speak about Science 2.0 at the Institute of Biochemistry and Biophysics of the Polish Academy of Sciences (the one I’m affiliated with). Compared to the seminar on the same topic I gave at the same place (but for a much smaller audience) 4 years ago, I had many more stories to tell, way more real-life examples and a better idea of where the whole “2.0” meme is leading us. I also got better at speaking (4 years ago some of my colleagues literally slept through my seminar). So, the message got clearer, and the messenger improved.

But given the wide interest in the topic from inside and outside the academic environment even before the seminar, I think two things have happened in Poland in the last 4 years. First, the internet got recognized as a game-changing technology, and people are simply interested in any new way they can use this tool (yes, I know it’s 2009 – if you live on the nets it’s hard to realize how slow the adoption rate is outside of virtual worlds). Second, the internet as a tool is also recognized as important – for example, people had ideas to include Science 2.0 topics in the program of PhD studies (I will follow up on this topic in a week or two). Getting popular, important… Now wide adoption is all we need :) .

Categories: bioinformatics

Notes from Next Generation Sequencing Workshop in Rome

November 21, 2009 Pawel Szczesny 2 comments

I was in Rome for two days attending the Next Generation Sequencing Workshop organized by EMBRACE (EU FP6 NoE), UPPMAX and CASPUR with the support of the Italian Society of Bioinformatics. It was a pretty interesting event and I want to share with you a couple of interesting things I’ve learned there.

Hardware layer

The first day was devoted mainly to the hardware side of NGS. It started with a presentation from Tony Cox from the Sanger Institute, who described the hardware setup used to support their sequencing projects. At 400 gigabases a week (current output), Sanger’s IT infrastructure is stretched in every direction (capacity, availability, redundancy), and Tony pointed out that each sequencing laboratory is going to face similar issues sooner or later. His advice for such labs was to first estimate the number of bases produced and then use multipliers to assess the storage requirements for the project. A minor thing that I noticed in his talk was exposing databases as a filesystem via a FUSE layer – I might use that approach in some projects too.
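
The sizing rule (estimate the bases first, then apply multipliers) can be sketched as a small calculation. Only the 400 gigabases a week comes from the talk; the bytes-per-base figure and both multipliers below are placeholders I made up to show the shape of the estimate, not values from the presentation.

```python
def estimate_storage_bytes(bases, bytes_per_base, multipliers):
    """Scale raw sequence volume by a chain of project-specific multipliers."""
    total = bases * bytes_per_base
    for factor in multipliers.values():
        total *= factor
    return total

# 400 gigabases a week is the figure quoted in the talk; everything else
# below is a hypothetical placeholder, not a value from the presentation.
WEEKLY_BASES = 400e9
BYTES_PER_BASE = 2               # assumed: one byte for the base, one for its quality
MULTIPLIERS = {
    "intermediate_files": 3.0,   # assumed
    "redundant_copies": 2.0,     # assumed
}

weekly = estimate_storage_bytes(WEEKLY_BASES, BYTES_PER_BASE, MULTIPLIERS)
print(f"{weekly / 1e12:.1f} TB per week")
```

Keeping the multipliers in a named dictionary makes it easy to argue about each assumption separately, which is probably the real value of such an estimate.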

George Magklaras from The Biotechnology Centre of Oslo described a number of approaches they took while implementing their infrastructure. He talked about FCoE, Fibre Channel over Ethernet, and pointed out that it’s cheaper than and almost as efficient as Fibre Channel alone. At the Centre they use Lustre (Sanger does too), a high-performance networked file system, but they benchmark other solutions too, because some situations/projects require transparent and efficient data encryption (mostly medical data). Like Tony, George pointed out that compartmentalization of data is necessary, as moving large amounts of files over the network creates an unnecessary bottleneck.

Another interesting talk was from Guy Cochrane from EBI about the Sequence Read Archive. It was an overview of the project, but again with a few interesting tidbits that drew my attention. One of them was Aspera, a much faster (and at the same time secure) alternative to good old FTP. He also presented a data reduction strategy that, if I understood correctly, is not yet implemented at SRA but might be some day. The first point was deletion of intensity data – that’s perfectly reasonable but is heavily opposed by a number of scientists. Then, only the consensus is preserved, plus the second most frequent base (important for polymorphism studies). The minimum for long-term storage was proposed to consist only of sequence and quality data.
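
The "consensus plus second most frequent base" step, as I understood it, can be illustrated with a few lines of Python. This is only a sketch of the idea, not how SRA does or would do it; the function name and the pileup data are hypothetical, and it assumes every position has at least one observed base.

```python
from collections import Counter

def reduce_column(bases):
    """Return (consensus, runner_up) for the bases observed at one position.

    runner_up is None when only one distinct base was observed.
    Assumes `bases` is non-empty.
    """
    counts = Counter(bases).most_common(2)
    consensus = counts[0][0]
    runner_up = counts[1][0] if len(counts) > 1 else None
    return consensus, runner_up

# Hypothetical pileup: each string holds the bases that reads show at one position.
pileup = ["AAAAAAGA", "CCCCCCCC", "TTTGGTTG"]
reduced = [reduce_column(column) for column in pileup]
print(reduced)
```

Everything below the two most frequent bases is thrown away, which is where the storage saving comes from and also why the runner-up matters for polymorphism studies.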

Software

The majority of the second day was devoted to software. It doesn’t make sense to list all the projects described – I will share with you only my general impression.

Despite the large number of scientists devoting their time to developing new tools for next generation sequencing data, I think that software lags a little behind other technological advances in this area. With really large amounts of data, assembly becomes hard or impossible, mapping erroneous and annotation too slow (the pilot study of the 1000 Genomes Project generated so much data that a computing farm was busy for a full 60 days – on a single CPU it would take 25 000 days). Software development for NGS differs dramatically from scientific software development in general and needs much, much better programmers than we usually are. For example, Desmond Higgins was praising open source software – they found an extremely fast implementation of the UPGMA algorithm (much faster than their own), and they could speed up their tool (SeedMap) so much that it now runs on even the largest families of sequences in a reasonable time.
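
Taking the two 1000 Genomes pilot figures quoted above at face value, and assuming the work splits perfectly across CPUs (an idealisation that real pipelines never reach), the implied scale can be worked out in a few lines:

```python
SINGLE_CPU_DAYS = 25_000   # figure quoted at the workshop
FARM_DAYS = 60             # figure quoted at the workshop

# Assumes perfect parallel scaling, so this is a lower bound on farm size.
equivalent_cpus = SINGLE_CPU_DAYS / FARM_DAYS
single_cpu_years = SINGLE_CPU_DAYS / 365

print(f"about {equivalent_cpus:.0f} CPUs busy for {FARM_DAYS} days")
print(f"or about {single_cpu_years:.0f} years on one CPU")
```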

 

Another bottleneck was the data presentation layer – there are some attempts to make digging into data easier, but getting a biologically meaningful overview is as hard as it was before. Other people pointed out that problem too (I wasn’t the only biologist there).

Need for stronger community

Probably the funniest part of the workshop was the discussion about creating an organized community of people working with next generation sequencing technologies. It was funny in the sense that some consensus about the need for a community emerged quite fast. How to build it – that was another story. Obviously lots of participants were sure that if they build a site, people will come. Yeah, sure. :) I suggested using a wiki in the first place and additionally hiring a community manager if they really want to gather people from many different forums, sites, groups etc. Lots of people didn’t buy these ideas, suggesting a more traditional approach, so, curious whether they were right, I’m going to follow the development of this community.

NGS = high tech

Probably the most important lesson was realizing that sequencing is a field with very high requirements for infrastructure and even higher requirements for skilled staff. Basically every element of the infrastructure may become a bottleneck, and if you want to avoid that, the cost of data maintenance and analysis very quickly exceeds the cost of producing the data. When I talked about it to many people during the last year (I’m involved in some sequencing projects at the analysis/annotation step), people often felt I overestimated infrastructure needs. Now I have some specific numbers to back it up :) .

Microstocks are for scientists too

November 5, 2009 Pawel Szczesny 2 comments

Money is rarely discussed directly on science blogs, but science bloggers rarely say that they don’t care. Quite a number of them run advertisements or affiliate programs on their sites, trying to monetize the traffic they generate. And while I don’t know specific numbers, my estimate (some time ago I ran such programs on a photography blog which was way more popular than this one) is that in the majority of cases it buys them a coffee or two per week. This blog is hosted at WordPress.com and the WP.com team forbids inserting your own scripts into the blog (occasional affiliate links seem to be fine, if you’re interested). Making money from Google ads wasn’t an option for me. But I have tried to earn money by sending images of molecules to microstock sites, and that seems to be more profitable than the previous strategy.

The inspiration to write this post came from the fact that I recently logged into one of the sites and was quite surprised to see that, even though I hadn’t uploaded anything for almost two years, my images are still selling quite well. On the majority of microstock sites your gallery exposure is bigger if you upload new stuff on a regular basis. So the conclusion is that after two years there are still not many similar images of molecules to choose from.

[Image: hemoglobin]

Above you can see one of my attempts to create a nice picture of a hemoglobin molecule. That should give you an idea of what images sell well: simple, clean, bright colours etc. A few other suggestions:

  • pay attention to the license under which the software you use to generate images is distributed. For example, you cannot use VMD or Chimera (both have non-commercial licenses), while Qutemol (under GPL) is fine.
  • use automated submitters (available for all platforms) instead of relying on FTP. You just don’t want to manually annotate dozens of images on the web. The other route is to fill in IPTC tags.
  • submit to all microstock sites that let you in, but start with the bigger ones (iStockPhoto, Dreamstime, Fotolia, Shutterstock etc.)
  • if you have time, experiment with graphics or 3D software. Additional modifications in GIMP or Blender occasionally produce interesting images.
  • if you live in a strange country, first check the regulations under which you can earn money via microstock. In Poland, for example, you need to start a company first (which, as my Polish readers can confirm, is a really painful process)

 


Transitions, transitions

October 28, 2009 Pawel Szczesny 1 comment

Quite a few things happened while I was away. If you’re interested, here’s a not-so-short summary of my internet hiatus:

Research area

I think I’m done with bioinformatics. My current research area seems to be located somewhere between systems biology, theoretical biology and information/complex systems theory. I hope to build on Dawkins’ work, deal with emergence in biology and study subtle effects in biological systems. While I’m not sure if I will ever have anything interesting to show, I don’t have the energy to do yet another project which involves programming/web interfaces/dealing with data/annotations/modelling etc. I’m done with analytics, time for synthesis :) .

Career

Last year I wrote a post dreaming about a small non-profit contract research organisation. This model of Research-as-a-Service has materialized in a virtual research institute which we finally launched a few days ago (materialized in something virtual, a sign of the times? ;) ). The setup is quite simple – the institute gets a project (or applies for one) and then searches for researchers/institutions/freelancers who are willing to subcontract parts of the project. We have outsourced not only the research part; even money gathering (writing grants, etc.) is done by an external company. The setup is quite flexible and pretty transparent – for example, we may represent somebody’s rights, but no intellectual property is owned by the institute. Why such an institution? We become a single point of contact for a large and diverse group of scientists who are willing to do some research for real money but don’t have the time and energy to hunt for gigs by themselves. While I have an academic job, I’m in the middle of a transition from being a freelancer to being a jobs provider for freelance scientists. More on that in some other post.

Open science

I plan to spend way more time advocating open science (in all of its flavors), but… in Polish. This step comes out of great frustration that even prominent figures in Polish science have no idea about the changes in science that internet-aware researchers are watching and creating. Knowledge about even basic things like Open Access is dramatically low in Poland (a number of people here equate OA with low-quality publications which have not been peer-reviewed). With a few friends, we have a number of projects in the pipeline (for example, we hope to launch a nation-wide, professionally made promotional campaign – billboards, TV commercials etc. – for open science). If any of these actually works, I will let you know if we have any measurable success ;) .

Labels, labels

Robert Anton Wilson tells a nice story in his book Prometheus Rising:

William James, father of American psychology, tells of meeting an old lady who told him the Earth rested on the back of a huge turtle.

“But, my dear lady,” Professor James asked, as politely as possible, “what holds up the turtle?”
“Ah,” she said, “that’s easy. He is standing on the back of another turtle.”
“Oh, I see,” said Professor James, still being polite. “But would you be so good as to tell me what holds up the second turtle?”
“It’s no use, Professor,” said the old lady, realizing he was trying to lead her into a logical trap. “It’s turtles-turtles-turtles, all the way!”

Another story is a comment from my advisor about putting my real research plans in some proposal (he supports these plans):

The most likely reaction from reviewers will be something like this: “Nice start, some decent papers, PhD looks good. And then he got crazy.”

I feel like screaming “Labels, labels, labels, all the way!” when facing rigid schemas of what a scientist “is” or what an artist “is” etc. It’s a hard task by itself to integrate multiple passions and multiple interests into a coherent structure. I don’t need another set of issues because of labels people attach to seemingly creative professions. But limiting myself only to topics consistent with the image of an online scientist became even more frustrating. Therefore expect that this blog (or any other venue I choose to express myself) is going to become a lot more diverse in topics and form.

Open Science: a step towards Open Innovation

July 2, 2009 Pawel Szczesny 5 comments

Open Innovation is a catchy phrase, but I don’t think we are as close to it as many people claim. Innocentive, InnovationXchange or NineSigma operate in a very small market, and this market does not seem to grow as fast as we would wish. Innocentive posted some statistics as of 2nd of June, 2009, so given these numbers and the number of open challenges, it’s safe to assume that as of today a total of around 1000 challenges have been posted and ca. half of them awarded. If you compare those numbers with the almost 200 000 patents issued by the US Patent Office alone in 2006, it gives a clear picture of the size of the market that open innovation crowdsourcing companies (edit: as Jean-Claude points out in the FriendFeed comment, Innocentive and the other two companies mentioned earlier are rather crowdsourcing, not “open innovation” companies) are operating in. There are plenty of reasons why OI has not yet become mainstream (too many to list), and for that to happen, there are two important steps that we need to make first.

Open Science must become mainstream

I’ve been advocating Open Science for some time and I’ve been following Open Science luminaries for much, much longer. At some point it hit me that Open Science in its fullest form is not an issue that scientists can truly solve by themselves. Open Science crosses the domain of Science – it’s an issue for Science, Politics and Business. We should experiment with various ways research is done, collaborate openly, attempt to invent new business models to fund science and spread the “open” meme as much as we can. However, the real deal will be made between people in power from these three domains. Why is it necessary to achieve that before we can fully innovate in the open? Because in this step we will sort out all the problems we have today with intellectual property and technology transfer (both not efficient enough by today’s standards). I cannot envision that happening in any other domain – we are paid to collaborate and test ideas. This community is able to hit every major obstacle to “open” in a very short time. And once we have these obstacles removed, there’s a next step:

Working models of Open Science should be tested outside of Science

In other words, I postulate that whatever solutions work in the domain of Science should be tested outside of it, in other domains. Not vice versa. Principles of Open Source software did not prove to be useful in open drug development (see Joerg’s post on the topic). Crowdsourcing will not advance quantum physics. Not all aspects of collective intelligence work in Science. We simply need to invent working solutions within the domain first, and then test them in other domains, such as art or engineering. This step will provide another set of protocols, changes and adjustments that will allow seekers and solvers (to use Innocentive’s nomenclature) to work efficiently together across every domain.

Open Innovation is not a single step

I may be proved wrong by some genius who will solve Open Innovation issues in a single brilliant step, but so far I believe that we need more than one to achieve this goal. And it is important to recognize that Open Science is a great opportunity to come closer to it. The sooner we realize it, the better.


Visual analysis is not only about seeing

I’ve just stumbled across this short video on the work of Turkish artist Esref Armagan, born blind, who nonetheless paints and draws. I will let you draw your own conclusions – mine are briefly expressed in the title of this post.

Hat tip Mayer Spivack.

Update: if you cannot see the embedded video, here’s a link.

All 2.0 – an attempt to connect disciplines

June 28, 2009 Pawel Szczesny 2 comments

Last year I bought the domain name AllTwoPointZero.com. Initially I had an idea to launch a huge portal around the “2.0” meme – essentially tracking changes in communication methods across various areas. I wanted to quit science and start a consulting career helping people to communicate more efficiently (new channels and tools, efficient visual communication, etc.). However, a market for such services in Poland is nonexistent, and I wasn’t in the mood to relocate, so I turned to other opportunities (and as a result, I stayed in science). Nevertheless, I still had the domain but no clear idea what to use it for.

So, with only a little time left, the next option I took was a tracker/aggregator. In theory, once done, it wouldn’t need much maintenance. There are quite a lot of services for this purpose out there, but they didn’t necessarily allow for certain things I wanted to have, so I had to code my own script. As I didn’t have much time, the resulting site is a little rough (it cannot compete with the wonderful sites Euan is coding, such as the recently released preview of Streamosphere). However, you should get an idea of what I’m aiming for. Currently it tracks blog posts and conversations in the areas of Science 2.0, Health 2.0 and Culture 2.0 (with Enterprise and Government to follow). Because within these types I sort all entries by date, I had to remove some bloggers from the “Key People” list, as their high-speed blogging did not allow others to appear in the box at all. :)

At this stage, the set of sources is far from perfect – outside of science, conversations seem to be highly homogeneous. When I improve the sources (maybe I will use Twitter and custom FriendFeed searches), I plan to add some kind of visual summary of the tracked conversations to see if I can find some patterns that will let me establish a connection between disciplines. Let’s see…

While I was collecting links, I found one interesting thing: you can find people interested in these three areas both at FriendFeed and at Twine. However, it seems that only scientists are actively talking with each other on these services – where are other groups storing their discussions?

Categories: bioinformatics