Why be a good bioinformatician?

Here is some “advice” on how NOT to be a bioinformatician (i.e. how to make bad software for biology). This makes me ask the question: “Why be a bioinformatician?”

Much of the advice in here makes me think that a lot of “bioinformaticists” don’t really have a good reason for doing what they do. I have to say that I’ve seen a lot of bad biology-focused software. I’ve even heard respected biologists declare that the entire field of bioinformatics is worthless (or at least that the work published in bioinformatics-focused journals is worthless).

So what is a bioinformaticist trying to achieve?

One approach to bioinformatics is to create software that addresses one’s own research interest. The funny thing is, these typically are not the programs that are published in bioinformatics journals — they are published in biology journals. When I look at the software tools that have been most useful to me, they are not made by people I consider bioinformaticists — they are made by biologists, who are programming computers as a tool to solve problems that they are interested in. Even when these scientists are trained in statistics and CS, they are still tightly connected to a particular biological community and they are designing software that answers research questions that this community cares about. This often allows them to answer questions that nobody has been able to answer before.

The other approach to bioinformatics is to build a tool that others will use. This seems to be the focus of the linked SCFBM article.

All too often, these software/algorithm development projects aim only to produce incremental improvements in existing methods (e.g. making them more accurate, faster, or more user-friendly). These typically don’t lead anywhere, and I don’t consider them appropriate academic projects: this type of optimization should be performed within teams that are interested in some sort of mass-production and have real accountability for the performance of their software (e.g. at commercial firms). Publishing this type of work is an invitation for BS.

There is still space for applying serious CS to improving bioinformatic tools, but such efforts should focus on radically different approaches to the analysis, so that they enable order-of-magnitude improvements in the efficiency of the algorithm.

This same problem of misguided motivation is seen in the plethora of web services that have emerged during the mass-sequencing era. I have been very frustrated by these, since the vast majority of them simply waste my time by promising things that they cannot deliver. Many of them are not maintained — which makes perfect sense given their limited utility to begin with.

If you are going to make a software tool “for biologists”, you need to ask yourself whether it will be useful enough to be worth making properly and maintaining. If your service is very narrowly focused, are you going to bother maintaining it just to serve one user per month? Are biologists going to bother discovering your service if it nearly duplicates an existing service that they are already familiar with (e.g. NCBI)? Will they ever hear about it if it provides a single narrowly focused service? Does the service actually provide useful information, or does it simply make predictions that a biologist will need to test anyway if the prediction really matters?

So before trying to figure out how to properly develop bioinformatics software, figure out why you want to make these tools at all.
