Archive for research

An even lower opinion of Research Gate

Previously, I griped about the business practices of Research Gate and swore that I would not participate in their community.

Like many online communities, RG sends out plenty of spam, but this is particularly annoying spam: the “from” line shows the names of colleagues on Research Gate who have attempted to create links to me on RG. In contrast, LinkedIn clearly identifies itself when sending updates and connection requests. Since I have a LinkedIn account (which lets me control the frequency of these emails), those emails aren’t even spam.

So, if you participate in Research Gate, that company will use your name to spam your colleagues. Classy.

Why be a good bioinformatician?

Here is some “advice” on how NOT to be a bioinformatician (i.e. how to make bad software for biology). This makes me ask the question: “Why be a bioinformatician?”

Much of the advice in here makes me think that a lot of “bioinformaticists” don’t really have a good reason for doing what they do. I have to say that I’ve seen a lot of bad biology-focused software. I’ve even heard respected biologists declare that the entire field of bioinformatics is worthless (at least, the stuff published in bioinformatics-focused journals is worthless).

So what is a bioinformaticist trying to achieve?

One approach to bioinformatics is to create software that addresses one’s own research interests. The funny thing is, these typically are not the programs that are published in bioinformatics journals; they are published in biology journals. When I look at the software tools that have been most useful to me, they are not made by people I consider bioinformaticists. They are made by biologists, who are programming computers as a tool to solve problems that they are interested in. Even when these scientists are trained in statistics and CS, they are still tightly connected to a particular biological community and they are designing software that answers research questions that this community cares about. This often allows them to answer questions that nobody has been able to answer before.

The other approach to bioinformatics is to build a tool that others will use. This seems to be the focus of the linked SCFBM article.

All too often, these software/algorithm development projects aim only to produce incremental improvements in existing methods (e.g. making them more accurate or faster or user-friendly). These typically don’t lead anywhere, and I don’t consider these to be appropriate academic projects — this type of optimization should be performed within teams that are interested in some sort of mass-production and have real accountability for the performance of their software (e.g. at commercial firms). Publishing this type of work is an invitation for BS.

There is still space for applying serious CS to improving bioinformatic tools, but these should focus on radically different approaches to the analysis, so that they enable order-of-magnitude improvements in the efficiency of the algorithm.

This same problem of misguided motivation is seen in the plethora of web services that have emerged during the mass-sequencing era. I have been very frustrated by these, since the vast majority of them simply waste my time by promising things that they cannot deliver. Many of them are not maintained — which makes perfect sense given their limited utility to begin with.

If you are going to make a software tool “for biologists”, you need to ask yourself whether it will be useful enough to be worth building properly and maintaining. If your service is very narrowly focused, are you going to bother maintaining it just to serve one user per month? Are biologists going to bother discovering your service if it nearly duplicates an existing service that they are already familiar with (e.g. NCBI)? Will they ever hear about it if it provides a single narrowly focused service? Does the service actually provide useful information, or does it simply make predictions that a biologist will need to test anyway if the prediction really matters?

So before trying to figure out how to properly develop bioinformatics software, figure out why you want to make these tools at all.

Why I deleted my ResearchGate account

Several months ago, I was excited to discover ResearchGate, an online community for scientists. I was initially attracted by the discussion boards, which included a lot of useful technical feedback. I set up an account, and proceeded to use the service occasionally and share my expertise. The service was not terribly useful to me, but it seemed to be growing and improving, so I was happy to play along. A couple of months ago, I noticed that I could not see anything on the site without first logging in.

I have finally decided to delete the account. Here’s what I told them:

I was originally attracted to Research Gate due to the discussions. Like any other professional/technical discussion board (e.g. StackOverflow), I expect public discussions to be truly public — not controlled by the service. I am very disappointed that Research Gate has placed a virtual wall around its content.

This is a deal breaker for me. I will not contribute content to any service that tries to take control of that content.

Too many companies are trying to make a buck by gaining control over our social interactions. This is sick, and ResearchGate does not offer nearly enough benefits to keep me on board through this process. I hope they will change their business model and recognize the users and content creators as true “members”, not just a commodity to be fed into a pipeline. If not, good riddance.

Some publishers are exploiting the scientific Open Access movement — let’s crush them.

Rosie Redfield (the bulldog* of British Columbia) has been investigating the unscrupulous business practices of Apple Academic Press (AAP), and how this might be harming authors who publish in Open Access (OA) journals. The gist of the story is that AAP is republishing these scientific papers as book chapters, then selling the book for over $100. By all accounts, these compilations appear just like any other academic book, with the implication being that the chapters are original content written specifically for each book. This does not violate copyright, because the OA license allows republishing as long as attribution is provided (though it’s unclear to me that proper attribution is being given). However, this could cause a number of problems for authors. The ones that bother me most are the disruption of the citation system and the implication that the authors approved of the content in the compilation, including potentially misleading changes to the titles of “their” chapters.

Rosie has been bringing this issue to the attention of authors, and pushing OA publishers to be more proactive about addressing the problem of exploitative book publishers. Right now, the focus seems to be on refining the OA licenses and disclaimers, so that authors don’t find that they inadvertently gave up more control than they intended. I would like to see two additional types of responses: consumer education, and punishment of AAP.

1) Most of the problems that I’m concerned with arise from book buyers not being aware that the content of the book had been published elsewhere. If the publishers had been up-front about the fact that they collected previously published sources, we would not have any problem with proper citation or with the excessive price of the books. As Rosie wrote, the first step is to identify publishers who use these deceptive practices, and that’s not easy. After that, we need to find a way to get the word out.

Luckily, some people are in a position to address both of these issues in a rather straightforward manner: Google and Amazon. Both of these companies have PDFs of parts of the book that Rosie used as an example: Epigenetics, Environment, and Genes (Amazon, Google). Google surely has the ability to compare the text against works that have been published online, and notify the consumer that the original work is available elsewhere. It seems that the book excerpts were provided by the publisher as an advertisement for the book, and if Amazon and Google don’t want to do these background checks, then they are facilitating the publisher’s fraud. Still, even if these companies don’t want to take responsibility for this (and don’t live up to their promise to “organize all the world’s information”), the rest of us can still leave reviews on the webpages, which others may read. I left comments on both the Amazon and Google pages. Google also provides links to other websites that sell this book. Oddly enough, the Amazon page does not display my negative review, even though I was informed that it “went live”.

University libraries are among the biggest consumers of these books. It’s part of their job to ensure that they are stocking their shelves with high-quality, useful books. I would consider a book like Epigenetics, Environment, and Genes to be a waste of money and I hope that my school’s librarians would be smart enough to avoid buying it. To help them out, I dropped them a little note through their online comment form, asking them to beware of publications coming from AAP. I hope it helps. If they become aware of this problem, maybe they will establish some system for validating that their books contain valuable contributions, and sharing their evaluations with other libraries.

2) It’s not enough to defend ourselves (as consumers) against these individual cases of fraudulent publishing. If we’re going to solve this problem (and take the pressure off of OA publishers), we need to discourage any publisher from pursuing these deceptive sales strategies. They should lose money and have their reputations damaged. The primary way to reduce their profits is through the above “consumer education” approach. Everyone profiting from this fraud deserves to be called out on it — from CRC Press to the editors of the individual books.

The other offensive response is to sue the publisher. I am not a lawyer, nor have I been directly harmed, so there’s not much for me to say here. However, since Rosie has been focusing on copyright law, I think this needs to be addressed. To me, it looks like these publishers have committed fraud, and I suspect that they could be successfully sued, if not in the USA, then in some other country. The Creative Commons (CC) publishing license is only tangential to this issue (unless the authors were unaware that their work could be republished in an overpriced book). Most of the anger at AAP seems to be over their fraudulent representation of the book chapters as original content that was contributed by the listed authors. The CC license allows work to be republished without the permission of the author, so shutting down AAP is not as simple as demonstrating that they never received permission to republish the work. However, there should be laws that address these specific injustices. Copyright restrictions are too broad to be used as a weapon to prevent fraud. The OA publishers clearly have an incentive to shut down AAP and discourage anyone else from following their business model, so maybe they are the best people to organize this response. As for me, I don’t have any pull in these institutions, so I will just try to increase awareness of this problem and the possible solutions.

This is not a problem with the Open Access publishing model; it is nothing more than unscrupulous people trying to make a buck by exploiting naive consumers in a rapidly changing market. These people should be dealt with, and we shouldn’t let them disrupt the development of Open Access publishing.

*In case it is not clear, I mean only respect with the nickname “bulldog”. I have admired Rosie’s tenacity and intellectual strength since I started research on bacterial genetics. More than once, she has challenged high-profile claims of other scientists, clearly listed the weaknesses of their arguments, and then made sure that everyone else was aware of these weaknesses. Based on her numerous blog postings, it looks like this energy is now being directed at AAP, and I trust that this problem is on the way to being solved.


How to spot a fraud

From The Scientist:

In a detailed final report about the fraud committed by Dutch researcher Diederik Stapel, three separate investigative panels have heaped further criticism onto the field of social psychology in general. The investigators found that “from the bottom to the top there was a general neglect of fundamental scientific standards and methodological requirements”—a situation that allowed Stapel’s fraud to continue for years.

I’ve placed that report on my reading list, both for the analysis of institutional failings and for the statistical methods that were used to identify fraud.

eLife is now active

The new biology journal eLife is now active (though the inaugural edition is not officially out). All I can say is that if someone is going to successfully reinvent scientific publishing, it’s these guys.

Involving the Howard Hughes Medical Institute, the Max Planck Society, the Wellcome Trust, and over 200 of the world’s most talented biomedical scientists….

At eLife, our goal is to accelerate scientific advancement by promoting modes of communication whereby new results are made available quickly, openly, and in a way that helps others to build upon them. We will make data more accessible, more useable. We’ll aim to create a broader audience for important discoveries, and we’ll work to trace the impact of individual contributions – on individual fields of study, on science, and on society as a whole.


Aphids got their color from fungus?

A neat story of horizontal gene transfer from the Moran lab, showing what happened to the carotenoid biosynthesis genes after aphids acquired them from a fungus.

Diversification of genes for carotenoid biosyn… [Mol Biol Evol. 2012] – PubMed – NCBI.

So why would the aphid germline have access to fungal genes? Maybe I’ll have to read more…

This is what happens when you dismiss recombination in bacteria

Here is the abstract from PubMed. Right now, I have no comment except to say that this does not change my previously published opinions about the importance of recombination in the evolution of E. coli. More later.

Evidence of non-random mutation rates suggests an evolutionary risk management strategy.

Martincorena I, Seshasayee AS, Luscombe NM.

Nature. 2012 May 3;485(7396):95-8.


A central tenet in evolutionary theory is that mutations occur randomly with respect to their value to an organism; selection then governs whether they are fixed in a population. This principle has been challenged by long-standing theoretical models predicting that selection could modulate the rate of mutation itself. However, our understanding of how the mutation rate varies between different sites within a genome has been hindered by technical difficulties in measuring it. Here we present a study that overcomes previous limitations by combining phylogenetic and population genetic techniques. Upon comparing 34 Escherichia coli genomes, we observe that the neutral mutation rate varies by more than an order of magnitude across 2,659 genes, with mutational hot and cold spots spanning several kilobases. Importantly, the variation is not random: we detect a lower rate in highly expressed genes and in those undergoing stronger purifying selection. Our observations suggest that the mutation rate has been evolutionarily optimized to reduce the risk of deleterious mutations. Current knowledge of factors influencing the mutation rate—including transcription-coupled repair and context-dependent mutagenesis—do not explain these observations, indicating that additional mechanisms must be involved. The findings have important implications for our understanding of evolution and the control of mutations.