Bug tracking systems in science

18 Apr

I’m not going to describe the painful process of correcting entries in biological databases or errors in publications when one is not the author; we all know how difficult and unrewarding it is. All major databases contain wrong entries: I see misannotated (or nonexistent) genes in GenBank, artificial domains in Pfam and poorly solved structures in PDB. It’s even worse in publications, where across the whole spectrum of journals I see errors which in theory shouldn’t slip through peer review (this includes such prominent publishers as NPG).

One of the best ideas I have heard for addressing this issue is to build a bug tracking system (I would like to give credit to the author, but I cannot find the source; wasn’t it one of the biobloggers?). It’s simple and efficient. Something is wrong? File a bug report. The report would link to the original entry, would be available for aggregation (for example, to track a report author’s activity), and could possibly be closed by somebody other than the database maintainers or authors if the report turns out to be wrong. Because it would be external to all databases, maybe it could grow to provide “community corrected” versions of these databases?
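
To make the idea concrete, here is a minimal sketch in Python of what such an external tracker might store. Everything in it is an assumption made for illustration: the names `BugReport` and `Tracker`, the field names, and the `example.org` URLs are hypothetical and do not describe any existing service.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class BugReport:
    """One report against an external database entry or publication."""
    entry_url: str                   # link to the original entry (hypothetical URL)
    reporter: str                    # who filed the report
    description: str                 # what is believed to be wrong
    status: str = "open"             # "open" or "closed"
    closed_by: Optional[str] = None  # who closed it, if anyone


class Tracker:
    """In-memory tracker that lives outside the databases it covers."""

    def __init__(self):
        self.reports = []

    def file(self, entry_url, reporter, description):
        """Record a new report and return it."""
        report = BugReport(entry_url, reporter, description)
        self.reports.append(report)
        return report

    def close(self, report, closed_by):
        """Close a report; the closer need not be a maintainer or author."""
        report.status = "closed"
        report.closed_by = closed_by

    def open_reports_for(self, entry_url):
        """All still-open reports that link to the given entry."""
        return [
            r for r in self.reports
            if r.entry_url == entry_url and r.status == "open"
        ]

    def activity_by_reporter(self):
        """Aggregate: number of reports filed per person."""
        return Counter(r.reporter for r in self.reports)
```

With something like this, `tracker.open_reports_for(url)` would be the red flag shown next to an entry, and `tracker.activity_by_reporter()` would be the aggregation of each reporter's activity. A real system would of course need persistent storage and some form of identity for reporters, which this sketch leaves out.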

What do you think? How useful could such a system be?

Posted by Pawel Szczesny on April 18, 2008 in Comments, Community, Software

Tags: bioinformatics, bug tracking, NPG, science

10 responses to “Bug tracking systems in science”

nsaunders

April 18, 2008 at 11:06

I think it’s an excellent idea, would be extremely useful and should be relatively simple to implement.

Database providers can be rather protective of their databases and sensitive to criticism, in my experience. We need more honesty in admitting to errors – they’re not necessarily anyone’s “fault”; most are a result of automated processes.
Pawel Szczesny

April 18, 2008 at 11:25

Thanks Neil. It was so simple I wasn’t sure if it made sense at all.
What I thought is that even if it’s clearly somebody’s fault (for example, they didn’t research a topic), I wouldn’t make a big deal of it as long as there’s a bug tracking system. When a report is filed, it acts as a red flag to readers/users. Whether the authors/developers correct the original is their choice. I believe most of them will, but even if they don’t, errors will not propagate as easily as they do now.
Another thing is that such a system would in theory allow tracking of scientists’ micro-contributions, which is something the bioblogosphere has been talking about for a couple of years.
Mike

April 18, 2008 at 12:00

I agree as well.
Lighthouse or Get Satisfaction could be useful for this type of thing.
Why build something new when many open source projects successfully use existing services?
Andrew Perry

April 18, 2008 at 13:12

Let me be the third to say: Great idea. It evokes the way the Ubuntu Launchpad bug tracker works, tracking bugs in software written by others. Sometimes those other software projects, like Gnome for instance, have their own bug tracker, and Launchpad uses methods (automatic or manual) to forward those bugs upstream to Gnome if they apply. It would actually be sensible if databases like Pfam and UniProt ran their own bug trackers for this purpose.

As you have mentioned, this could be an interesting method to highlight ‘bugs’ (aka errors) in peer reviewed publications and stimulate discussion. Like bug trackers for software, these publication ‘bugs’ could be marked with different priorities, and like software bug trackers, it’s up to anyone who cares to participate to discuss the problem and come to a consensus on the severity, and go about notifying the journal/database responsible (including marking the bug WON’T_FIX if the author officially refuses to fix it). Sometimes these discussions may reveal that there are simply alternate interpretations of the results, which is valuable but probably not considered a bug.

Maybe the extra attention, with the problems identified out in the open, could shame some authors or publishers into correcting serious errors or even retracting the odd paper, rather than just hoping no one will notice their mistake. Think of it as like ‘full disclosure’ for bugs in published science 🙂
Mr. Gunn

April 21, 2008 at 16:22

This would also make it easy to issue the corrections, particularly if the manuscript were maintained in a version control system such as CVS. This may be asking too much, however.

About the errors that should have been caught by peer review, I can share two perspectives, that of an author and that of a reviewer. The author often doesn’t bother to go over their manuscript for grammatical errors, feeling that it’s the editor’s job. Editors get the obvious stuff, but are just as prone to common errors as anyone else.

The reviewers often read for content only, feeling that pointing out grammatical errors is unimportant and not their job.

I don’t know from personal experience, but I’d imagine that this leaves editors somewhat overworked and causes them to miss the occasional mislabeled figure and so on.

It would be nice if readers could submit a “patch”. Simple bug reporting would also work, but since this would certainly cause a sharp uptick in errata, the problem of how to implement and track these corrections needs to be addressed, since the volume would probably be too much for the existing system of simply posting editorial notes or links to errata.

People reading the paper version wouldn’t be well served by a “release early and often” philosophy, certainly!
Marcin Cieslik

April 21, 2008 at 23:09

Yeah, a central place to bash all those who published software with bugs and didn’t care afterwards :/
Roland Krause

April 23, 2008 at 22:49

Nice try, but the Old Ones fear corrections like, ah, whatever the Old Ones fear. The chance of getting something like this established in our waking years is fairly slim, given that the current publishing process assumes that papers are finalized and are never to be revisited.
A more useful approach would be to appeal to scientists to host and update data on their own websites. After all, those errors go around in the community and it would be much better to deal with them actively. This can be done easily and is already done by those who care.
Pawel Szczesny

May 3, 2008 at 20:05

Thank you all for the comments.

William, I don’t mind small mistakes in papers. If these don’t reverse the meaning of the text, I wouldn’t bother to track them. I’m interested in finding fundamental errors: wrong assumptions, misinterpretation of data and such. I believe tracking would be easy. It would take nothing more than a Greasemonkey script that would inform the reader of an NCBI page that there are some bugs reported for a particular paper. The rest is on the reader’s side.

Marcin, I didn’t mean to bash anybody, but at least to raise a flag on errors for the community to be aware of. One example is HMMER, a brilliant and widely used package, but it has a bug: under certain conditions it misses domains in the analyzed sequence. If the community were aware of that, I believe the community would fix it. See for example cd-hit: currently it is hosted by Bioinformatics.org, not its author.

Roland, I agree it wouldn’t be easy to push this idea forward, but on the other hand I don’t believe in scientists updating data themselves. It simply doesn’t happen often enough: people change jobs, move between continents, and I see no chance that they will keep their previous work up to date. That includes me and my servers, so I’m also to blame. It’s a complex issue (you can easily get money for a new service, but hardly any to maintain an old one), so I don’t think a bug tracking system would solve it completely. Maybe a small change in the funding system (reserving money for maintenance of data and services by ourselves) would be a better idea.
Sean Eddy

May 6, 2008 at 15:56

Here’s a simpler idea. If you see a bug in something (such as Pfam or HMMER), just email the author.

For example, if you know of a bug in HMMER, could you please just report it to me? Likewise if you know of an “artificial domain” (whatever that means) in Pfam. Thanks.
Pawel Szczesny

May 7, 2008 at 16:42

Thank you Prof. Eddy for visiting my blog.

I’ve had enough bad experiences with submitting bugs in someone else’s software or database that I don’t bother to do it at all. Some people are responsive and helpful, some don’t care, but I cannot know that in advance. After a few “bad” times I’m not encouraged to point out errors in somebody’s work via direct contact; it can simply be a waste of time. Pointing out an error in a publication is even worse: the resistance is enormous.

I will happily send you an email concerning HMMER. I didn’t think it was still under development, given the pace of its releases.