Mock reports

From the world of PoliSci comes this discussion about the preregistration of studies and mock reports. It's on my ever-lengthening “to read” list. My impression of this strategy (without having read the articles) is that research can be more informative if we openly specify our theory and predictions before collecting the data that will test it. This avoids both the bias toward statistically significant results and the implicit post-hoc nature of current scientific publication practices.




How to spot a fraud

From The Scientist:

In a detailed final report about the fraud committed by Dutch researcher Diederik Stapel, three separate investigative panels have heaped further criticism onto the field of social psychology in general. The investigators found that “from the bottom to the top there was a general neglect of fundamental scientific standards and methodological requirements”—a situation that allowed Stapel’s fraud to continue for years.

I've placed that report on my reading list, both for its analysis of institutional failings and for the statistical methods that were used to identify the fraud.


eLife is now active

The new biology journal eLife is now active (though the inaugural edition is not officially out). All I can say is that if someone is going to successfully reinvent scientific publishing, it’s these guys.

It involves the Howard Hughes Medical Institute, the Max Planck Society, the Wellcome Trust, and over 200 of the world's most talented biomedical scientists. In their words:

At eLife, our goal is to accelerate scientific advancement by promoting modes of communication whereby new results are made available quickly, openly, and in a way that helps others to build upon them. We will make data more accessible, more useable. We’ll aim to create a broader audience for important discoveries, and we’ll work to trace the impact of individual contributions – on individual fields of study, on science, and on society as a whole.



Is anything truly random? RANDOM.ORG

I just learned of this web service, “RANDOM.ORG – True Random Number Service”, via a Python module.

It's clever, but I have to wonder about this distinction between “true randomness” and pseudorandomness. I understand the non-randomness of pseudorandom algorithms; I'm just not sure I buy that a natural process can be truly random. I don't know whether they are relying on the complexity of the process or on quantum theory.

Either way, I think I'd prefer a pseudorandom algorithm on my own machine over a supposedly random value sent to me over the network. Even if my intention is to have a neutral arbiter in some game of chance, I don't see the benefit of “true” randomness from some public server over pseudorandomness generated locally.
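For the "neutral arbiter" case, local generation really is a one-liner. The minimal Python sketch below (standard library only) contrasts the deterministic Mersenne Twister in `random.Random` with `random.SystemRandom`, which draws bytes from the operating system's entropy source via `os.urandom` and needs no network round-trip. The `roll_die` helper is hypothetical, included only for illustration.

```python
import random

def roll_die(rng: random.Random, sides: int = 6) -> int:
    """Hypothetical helper: return a uniform integer in [1, sides]."""
    return rng.randint(1, sides)

# Deterministic pseudorandom generator: the same seed gives the same sequence.
seeded = random.Random(42)

# SystemRandom ignores seeding and pulls bytes from the OS (os.urandom),
# so its output cannot be replayed; everything happens on the local machine.
system = random.SystemRandom()

if __name__ == "__main__":
    print("seeded rolls:", [roll_die(seeded) for _ in range(5)])
    print("system rolls:", [roll_die(system) for _ in range(5)])
```

The seeded generator is the right choice when you want to be able to replay a sequence; `SystemRandom` is the better fit when nobody, including you, should be able to predict or reproduce the roll.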


Update: There's a good discussion of this issue on the SuperUser site. I love the StackExchange Network. Two important points stand out:

1) Pseudo-random number generators can become more random by constantly incorporating additional external information into the system. I assume this is what is doing.

2) For some purposes, pseudo-random numbers are more appropriate than truly random numbers. For instance, a stochastic simulation requires frequent bug-hunting, which would be nearly impossible if its “random” actions were not generated purely by the internal state of the system.
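Point 2 is easy to see in practice. Here is a minimal Python sketch of a toy stochastic simulation (a one-dimensional random walk, chosen purely for illustration): because every "random" step comes from a generator seeded with a fixed value, rerunning with the same seed replays exactly the same trajectory, so a bug can be reproduced on demand. The `simulate_walk` function and its parameters are hypothetical.

```python
import random

def simulate_walk(steps: int, seed: int) -> list[int]:
    """Toy 1-D random walk; all randomness comes from a private seeded RNG."""
    rng = random.Random(seed)  # private generator: no hidden global state
    position = 0
    path = [position]
    for _ in range(steps):
        position += rng.choice((-1, 1))  # step left or right
        path.append(position)
    return path

if __name__ == "__main__":
    run_a = simulate_walk(1000, seed=2012)
    run_b = simulate_walk(1000, seed=2012)
    print(run_a == run_b)  # True: same seed, identical trajectory
```

Using a private `random.Random(seed)` instance rather than the module-level functions keeps other code from drawing numbers from the same stream, which would silently change the replay.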


Houston Community College in Qatar

Ever since I heard of Qatar's “Education City” (when Carnegie Mellon University joined), I had been under the impression that Qatar was turning to foreign institutions to educate the country's elite. It turns out that they are also turning to American institutions to provide broader access to post-secondary education. Houston Community College in Qatar is currently hiring biology professors (no, I'm not applying).

It seems that the college has been offering classes for two years now, and there have been some turnarounds for it. I hope things go better in the future. It looks like the world is changing: I hadn't thought that Qatar would care about community college education (not that I really know anything about Qatar), and I hadn't thought that a Texas community college would represent part of American academia in an Arab country.


Resources for aspiring genomicists

Here is a brain-dump targeted at an incoming graduate student:

At some point, you should take a look at the following resources. They are very useful for any genomic analysis:

Reference Sequences from the National Center for Biotechnology Information (i.e. NCBI’s RefSeq)

Bacterial sequences

X. fastidiosa Temecula1

The most readable file is the GenBank “.gbk” file: good for humans, bad for computers.

This file (README) describes the sequence formats
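The sequence itself is typically also provided in FASTA format, which is about as computer-friendly as a format gets. Below is a minimal standard-library Python sketch of a FASTA reader, assuming you have downloaded a FASTA file (for example a nucleotide ".fna" file; the filename in the usage comment is hypothetical). `read_fasta` and `gc_content` are illustrative helpers, not NCBI tools.

```python
import io
from typing import Iterator, TextIO

def read_fasta(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header = None
    chunks: list[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)  # emit the previous record
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines may be wrapped
    if header is not None:
        yield header, "".join(chunks)  # emit the final record

def gc_content(seq: str) -> float:
    """Fraction of G and C bases (case-insensitive); 0.0 for an empty sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0

if __name__ == "__main__":
    # With a real (hypothetically named) file:
    #   with open("genome.fna") as fh:
    #       for name, seq in read_fasta(fh): ...
    demo = io.StringIO(">seq1 demo\nATGC\nGGCC\n>seq2\nAATT\n")
    for name, seq in read_fasta(demo):
        print(name, len(seq), round(gc_content(seq), 2))
```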

NCBI also has a convenient portal for all sequenced species, e.g. X. fastidiosa

Here’s a good system for getting information about genomes:

If you are going to do anything yourself, you should be familiar with BLAST (the best bioinformatic software ever made). There's the website for searching the general database:

…and the stand-alone package (BLAST+), which will be useful for looking at any new sequences we have:
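With BLAST+ installed, a typical local search is two commands: build a database with `makeblastdb`, then query it with `blastn`. The Python sketch below assumes BLAST+ is on your PATH and uses hypothetical filenames (`genome.fna`, `new_contigs.fna`) and an illustrative e-value cutoff; it builds the command lines, runs them only when `run_blast` is called, and parses the tab-separated `-outfmt 6` output, whose twelve default columns are listed in the code.

```python
import subprocess

# Default column order for BLAST+ tabular output (-outfmt 6).
OUTFMT6_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]
INT_FIELDS = {"length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send"}
FLOAT_FIELDS = {"pident", "evalue", "bitscore"}

def blast_commands(subject_fasta: str, query_fasta: str,
                   db_name: str, out_tsv: str) -> list[list[str]]:
    """Return the makeblastdb and blastn command lines (filenames are hypothetical)."""
    return [
        ["makeblastdb", "-in", subject_fasta, "-dbtype", "nucl", "-out", db_name],
        ["blastn", "-query", query_fasta, "-db", db_name,
         "-evalue", "1e-5", "-outfmt", "6", "-out", out_tsv],
    ]

def parse_outfmt6(lines):
    """Yield one dict per hit, converting the numeric columns."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blanks and comment lines
        hit = dict(zip(OUTFMT6_FIELDS, line.rstrip("\n").split("\t")))
        for key in INT_FIELDS:
            hit[key] = int(hit[key])
        for key in FLOAT_FIELDS:
            hit[key] = float(hit[key])
        yield hit

def run_blast(subject_fasta, query_fasta, db_name="genome_db", out_tsv="hits.tsv"):
    """Run both commands (requires BLAST+ installed) and return the parsed hits."""
    for cmd in blast_commands(subject_fasta, query_fasta, db_name, out_tsv):
        subprocess.run(cmd, check=True)  # raise if BLAST+ reports an error
    with open(out_tsv) as fh:
        return list(parse_outfmt6(fh))

if __name__ == "__main__":
    # Print the equivalent command lines to type into a terminal.
    for cmd in blast_commands("genome.fna", "new_contigs.fna", "genome_db", "hits.tsv"):
        print(" ".join(cmd))
```

Printing the commands rather than running them by default makes it easy to paste the same two lines into a terminal, which is where most BLAST+ use happens anyway.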

Finally, useful GUI software for genome analysis:



Biologists, please learn to use the command line

The other day, a young microbiologist and I were discussing the skill-set that was necessary for him to do his research. He indicated that he didn’t expect to ever need any software that had to be called from the command line (of course, he didn’t know the term “command line”). I quickly laid that idea to rest.

This attitude is common, yet frustrating. The computer is a central tool in modern biology, yet many biologists are happy to have only the most superficial familiarity with it. They act as though everything they need will be provided in a neat (and affordable) little software package with an intuitive graphical user interface (GUI). It won't. GUIs almost always cripple the underlying analytical software, and they introduce a whole new layer of bugs and complexity. They are often harder to describe than simple command-line interfaces, and they are less standardized. All the time a biologist spends learning the ins and outs of some arbitrary GUI for a single commodity analysis could be spent learning the standard command-line interfaces used by the most powerful and cutting-edge (and often free) software out there.

So here's my plea (and advice) to biologists: learn to use the command line. I'm not saying that you should learn to program*. Just the command line.

How to use the command line

On a Mac or in Linux it's called “the terminal”. In many Linux desktops you can just right-click to select it from the menu. In Windows it's called the “command window”: hold Shift while right-clicking to select it from the menu. If you don't know what to do once you have the window open, try this:


Hit Ctrl-C when you want to end. To see the options, type “ping -h”.

A new world awaits.

* If you want to do an analysis of any complexity, it would help to learn a scripting language (e.g. Python), and probably regular expressions too.
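As a taste of why regular expressions are worth the effort, here is a short Python sketch that finds every occurrence of a DNA motif in a sequence, including motifs written with IUPAC ambiguity codes (R = A/G, Y = C/T, N = any base, and so on). The example sequence and the helper names `motif_to_regex` and `find_motif` are made up for illustration.

```python
import re

# IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def motif_to_regex(motif: str) -> re.Pattern:
    """Translate an IUPAC motif into a compiled pattern that also finds overlapping hits."""
    body = "".join(IUPAC[base] for base in motif.upper())
    return re.compile(f"(?=({body}))")  # zero-width lookahead allows overlaps

def find_motif(seq: str, motif: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched text) for every occurrence of motif in seq."""
    pattern = motif_to_regex(motif)
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq.upper())]

if __name__ == "__main__":
    seq = "ttGAATTCagGGATCCaaGAATTC"  # made-up example sequence
    print(find_motif(seq, "GAATTC"))  # EcoRI site
    print(find_motif(seq, "GGATCC"))  # BamHI site
    print(find_motif(seq, "RGATCY"))  # degenerate motif
```

The lookahead trick matters for sequence work: a plain `re.finditer` skips matches that overlap a previous hit, which would silently undercount repetitive motifs.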


Aphids got their color from fungus?

A neat story of horizontal gene transfer from the Moran lab, showing what happened to the carotenoid biosynthesis genes after aphids acquired them from fungi.

Diversification of genes for carotenoid biosyn… [Mol Biol Evol. 2012] – PubMed – NCBI.

So why would the aphid germline have access to fungal genes? Maybe I’ll have to read more…


This is what happens when you dismiss recombination in bacteria

Here is the abstract from PubMed. Right now, I have no comment except to say that this does not change my previously published opinions about the importance of recombination in the evolution of E. coli. More later.

Evidence of non-random mutation rates suggests an evolutionary risk management strategy.

Martincorena I, Seshasayee AS, Luscombe NM.

Nature. 2012 May 3;485(7396):95-8.


A central tenet in evolutionary theory is that mutations occur randomly with respect to their value to an organism; selection then governs whether they are fixed in a population. This principle has been challenged by long-standing theoretical models predicting that selection could modulate the rate of mutation itself. However, our understanding of how the mutation rate varies between different sites within a genome has been hindered by technical difficulties in measuring it. Here we present a study that overcomes previous limitations by combining phylogenetic and population genetic techniques. Upon comparing 34 Escherichia coli genomes, we observe that the neutral mutation rate varies by more than an order of magnitude across 2,659 genes, with mutational hot and cold spots spanning several kilobases. Importantly, the variation is not random: we detect a lower rate in highly expressed genes and in those undergoing stronger purifying selection. Our observations suggest that the mutation rate has been evolutionarily optimized to reduce the risk of deleterious mutations. Current knowledge of factors influencing the mutation rate—including transcription-coupled repair and context-dependent mutagenesis—do not explain these observations, indicating that additional mechanisms must be involved. The findings have important implications for our understanding of evolution and the control of mutations.


No news from Oxford Nanopore

Back in February, Oxford Nanopore announced that they would have their new technology in labs within 10 months. Six months later, there have been no press releases telling us when and how labs will be able to get their hands on these devices. Indeed, there is no sign that the technology has matured or that anyone is planning to use it. At the time of their big announcement, cynics speculated that its only purpose was to attract the interest of venture capitalists. Funnily enough, the only press release since February has announced a successful round of fundraising. I guess I should stop holding my breath.

I made an inquiry at their sales department yesterday. Let’s see if they get back to me.

