Thought for the day…

In describing a new HIV evolution paper in Science, Dan Graur (aka “Judge Starling”) writes:

“The only thing “novel” about the analysis was the use of a Bayesian method for phylogeographic inference. Interestingly, as in all examples of its use that I have seen so far, the method tells you nothing you do not know or cannot infer by much simpler means.”

I haven’t looked at the paper in question, but I have noticed this common infatuation with pointlessly (even recklessly) complex statistical methods.



Poisonous Pennies

As if there weren’t enough reasons to abandon the penny in the USA, I just learned of a new reason to ban the penny from my home — they are poisonous. The problem is that when pennies are swallowed (by an infant, for instance), the zinc innards can be dissolved by the stomach acid. This can lead to two medical issues: the dissolved zinc can create an ulcer, and the copper shell can cut through internal organs. This issue was brought to my attention by the first-hand story of a health-care provider whose patient died from internal lacerations caused by a penny. I cannot find anything online about incidents like this, so it is apparently extremely rare. The problem of zinc poisoning is better documented. There’s a chance that my source was confused about the mechanism by which the penny killed the child, but given his proximity to the case and the details he provided, I’m pretty confident that he knew what he was talking about.

The risks may be pretty low, but given that pennies are nothing more than clutter in my house anyway*, I’ve decided to ban them from my house. I’ve been leaving them behind in stores, but I should just start telling cashiers to keep them (consider it a tip, if you’d like). One of the local stores has a policy of rounding to the nearest nickel, which I wish more stores would adopt. Frankly, I’d be happy if they rounded to the nearest dime or quarter.
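For the curious, rounding to the nearest nickel is easy to sketch in code. This is only an illustration, not any store's actual policy: the function name `round_to_nickel` is made up, and it assumes the total is already expressed in whole cents.

```python
def round_to_nickel(total_cents: int) -> int:
    """Round a total (in whole cents) to the nearest multiple of 5 cents."""
    # Adding 2 before the floor division sends remainders of 1-2 down
    # and remainders of 3-4 up, which is ordinary nearest-nickel rounding.
    return 5 * ((total_cents + 2) // 5)


if __name__ == "__main__":
    for cents in (101, 102, 103, 104, 105):
        print(f"{cents} cents -> {round_to_nickel(cents)} cents")
```

Over many purchases the ups and downs should roughly cancel out, which is presumably why a store can afford such a policy.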

Having completed my rant, let’s see a demonstration of the chemical reaction — you can dissolve the zinc core of a penny at home if you’d like:

I did this using salt and vinegar, as suggested here.
Another neat trick is to plate the penny with zinc, and then turn that plating into brass:

p.s. My web-search for pro/con penny arguments came across an essay at Forbes called Don’t You Dare Eliminate the Penny, as one of the first hits. The argument is so ridiculous that it actually strengthened my confidence that the penny should be eliminated, on the grounds that penny advocates have absolutely no basis for their position. However, I was able to find some better arguments by going to Wikipedia, though even some of these sound like complete BS. As is usual with debates, Wikipedia has a decent distillation of the major arguments.

*pennies are worth 5 seconds of minimum-wage work. It’s hardly worth taking the time to pick one up from the floor if you have a full-time job.
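The arithmetic behind that footnote, as a rough sketch. It assumes the US federal minimum wage of $7.25 per hour; a higher state minimum makes a penny worth even less time. The function name is just for illustration.

```python
def seconds_per_penny(hourly_wage_dollars: float) -> float:
    """Seconds of work needed to earn one cent at the given hourly wage."""
    wage_per_second = hourly_wage_dollars / 3600  # 3600 seconds in an hour
    return 0.01 / wage_per_second


if __name__ == "__main__":
    # Assumed wage: the US federal minimum of $7.25 per hour.
    print(f"{seconds_per_penny(7.25):.2f} seconds per penny")
```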

**p.p.s. This post gave me a few good ideas for educational science projects…once my kids are old enough to not put the pennies in their mouths. So maybe pennies have value after all!


An even lower opinion of Research Gate

Previously, I griped about the business practices of Research Gate and swore that I would not participate in their community.

Like many online communities, RG sends out plenty of spam, but this is particularly annoying spam — the “from” line shows the names of colleagues on Research Gate who have attempted to create links to me on RG. In contrast, LinkedIn clearly identifies itself when sending updates and connection requests. Since I have a LinkedIn account (which controls the frequency of these emails), those emails aren’t even spam.

So, if you participate in Research Gate, that company will use your name to spam your colleagues. Classy.


Kudos to Dropbox for supporting Linux

There are many reasons to use Linux, but the main drawback is a lack of support for third-party software. When I set up my new Fedora Desktop, I was caught off guard by the fact that my usual cloud storage service does not provide sync software for Linux (there is a hack, but it’s not really a solution) — I hadn’t realized how dependent I had become on it to back up my work and synchronize my computers. UC Berkeley contracts with both Box and Google to provide cloud storage, but neither of them comes through in this situation.

However, my free Dropbox account has come to the rescue! The Dropbox client had one minor glitch, and otherwise works perfectly. Now I’m shifting to using it for my day-to-day backup, and limiting my use of Box and Drive to sharing large files.

Thanks Dropbox!


Why be a good bioinformatician?

Here is some “advice” on how NOT to be a bioinformatician (i.e. how to make bad software for biology). This makes me ask the question: “Why be a bioinformatician?”

Much of the advice in here makes me think that a lot of “bioinformaticists” don’t really have a good reason for doing what they do. I have to say that I’ve seen a lot of bad biology-focused software. I’ve even heard respected biologists declare that the entire field of bioinformatics is worthless (at least, the stuff published in bioinformatic-focused journals is worthless).

So what is a bioinformaticist trying to achieve?

One approach to bioinformatics is to create software that addresses one’s own research interest. The funny thing is, these typically are not the programs that are published in bioinformatics journals — they are published in biology journals. When I look at the software tools that have been most useful to me, they are not made by people I consider bioinformaticists — they are made by biologists, who are programming computers as a tool to solve problems that they are interested in. Even when these scientists are trained in statistics and CS, they are still tightly connected to a particular biological community and they are designing software that answers research questions that this community cares about. This often allows them to answer questions that nobody has been able to answer before.

The other approach to bioinformatics is to build a tool that others will use. This seems to be the focus of the linked SCFBM article.

All too often, these software/algorithm development projects aim only to produce incremental improvements in existing methods (e.g. making them more accurate or faster or user-friendly). These typically don’t lead anywhere, and I don’t consider these to be appropriate academic projects — this type of optimization should be performed within teams that are interested in some sort of mass-production and have real accountability for the performance of their software (e.g. at commercial firms). Publishing this type of work is an invitation for BS.

There is still space for applying serious CS to improving bioinformatic tools, but these should focus on radically different approaches to the analysis, so that they enable order-of-magnitude improvements in the efficiency of the algorithm.

This same problem of misguided motivation is seen in the plethora of web services that have emerged during the mass-sequencing era. I have been very frustrated by these, since the vast majority of them simply waste my time by promising things that they cannot deliver. Many of them are not maintained — which makes perfect sense given their limited utility to begin with.

If you are going to make a software tool “for biologists”, you need to ask yourself whether it will be useful enough to be worth making properly and maintaining. If your service is very narrowly focused, are you going to bother maintaining it just to serve the one user per month? Are biologists going to bother discovering your service if it nearly duplicates an existing service that they are already familiar with (e.g. NCBI)? Will they ever hear about it if it provides a single narrowly focused service? Does the service actually provide useful information, or does it simply make predictions that a biologist will need to test anyway if the prediction really matters?

So before trying to figure out how to properly develop bioinformatics software, figure out why you want to make these tools at all.


Agent Based Modelling with Repast

A student brought my attention to Repast when he did some modelling with me this past summer. This is an agent-based modelling (ABM) platform, primarily for Java (though there are also tools for other languages). It is apparently based on an earlier system called “Swarm”, which I’ve heard is slightly more powerful and slightly more difficult to use. Since I was already familiar with Java and Eclipse (of which Repast Simphony is a derivative), we decided to give Repast a shot. In addition to the core agent-manipulation libraries, Repast has powerful visualization tools, active developers, and a decent-sized user community.

The main difficulty in using Repast is the dearth of documentation. This post is meant to help on that front by collecting links to the materials that I have found useful (as much for my own use as anyone else’s). But before getting into that, I’ll provide a little context for why Repast is interesting and perhaps why its documentation is so difficult. Everything I write is based on Repast Simphony 2.1, which is built on top of Eclipse Kepler (Build id: 20130614-0229).

One appealing feature of Repast is that it provides interfaces that can hide the inner workings from the user, allowing researchers with different levels of programming skills to access the tools. At the simplest end is ReLogo, which is similar to the very accessible NetLogo (apparently both derived from something called Logo). I played with this briefly, but got stumped on how to do simple arithmetic in this language. Rather than learn another language, I transferred over to the Java side of Repast, which promised greater power anyway. Even here, the core of the modelling engine is still hidden from me, which became an issue when I wanted to access the scheduling mechanism. It seems that one consequence of developing these different levels of accessibility is that the user community is split into three groups, each of which requires separate documentation.

Since I want to learn about the advanced features of Repast, the obvious place to go is to the developers, who are active on the Repast-interest mailing list. However, I don’t want to bother them with questions that they’ve answered a million times, so I should first search the list archives (as suggested on the sign-up page). But the archive page does not include a search feature (really?). So I do a Google search, and find that someone else asked for a solution on the mailing list, and was told to try searching on Nabble. This is way too meta. (To reiterate, I did a Google search to ask how to search the Repast-interest archives, which directed me to a Nabble page containing an old discussion from Repast-interest, where the answer was that we should use Nabble.)

Another good resource is the large collection of demonstration models. The downside is that there is no quick way to find the model that demonstrates the technique that you are interested in. In my experience, the StupidModel series of models shows the most sophisticated methods.

The Repast GUI provides a powerful interface to the models, but makes it difficult to just open up the source code and track the logic. For instance, the main class for all GUI models is “RepastMain”, which is quite a terse and cryptic launcher. As I understand things, Repast launches the GUI (which appears to be encoded at a lower level than Java), which then gets its instructions from a collection of XML files that are associated with your project. For batch runs, the main class is called BatchMain (but the header indicates that this is deprecated in favor of RepastBatchMain).

After poking around the web a bit, it seems that the classes to use to get at the core of the model are RepastEssentials and RunEnvironment.

With all that searching for answers, I wondered whether StackOverflow has information about Repast. They don’t seem to have anything specific to Repast, but they do have some discussions of Agent Based Modelling where they discuss Repast in the general context of ABM.

Other links I found helpful/interesting:

Stackoverflow discussion of ABM approaches…and another.

Wikipedia comparison of ABM tools

Repast self-study guide (links to tons of resources)


It is time for America to switch to Celsius

One of the most tedious parts of science education is familiarizing students with the International System of Units (a.k.a. the “metric system”). We would realize substantial gains if we were to incorporate any aspect of that system into the standard American language, but too many people just don’t see any reason to exert any effort towards learning a new system. While efforts to introduce the system have floundered (such as converting from miles to kilometers), I think American society has finally reached the point where conversion to the Celsius system is achievable.

To achieve this, we can take advantage of two big changes of the past generation:

1) A large minority of Americans now attend college, particularly at large public universities.

2) Digital display screens are now ubiquitous.

If you combine these with the continued, if not growing, importance of the natural sciences in our economy, then we have the right conditions for a change.

The strategy that I devised (over a cup of coffee) goes like this:

The core of this push would be the science departments at higher education institutions. Technology companies may also be interested in contributing. On university campuses, digital thermometer displays could be placed at popular locations, both indoors and outdoors. In addition to displaying the current temperature in Celsius, they could also display forecasts. The presence of forecasts would give students incentives to learn how to interpret the Celsius system. Given that many university buildings already have large flat-panel displays gratuitously distributed around the campus, the initial cost of implementing this system should be minimal. Even installing new specialized thermometer displays should be a reasonable cost for universities. Emphasis would start with science departments, then expand across the university, then into community colleges, secondary and primary schools.

Technology (especially software) companies could incorporate Celsius thermometers into their products. Any retailer or other manager of public space could install Celsius thermometer displays.

I think that converting our temperature-measuring system will be substantially easier than converting other systems, such as distance- or weight-measuring. The primary reason is that temperature is typically of public interest (e.g. a weather forecast), whereas measurements of distance or weight are typically dependent on an individual’s idiosyncratic needs at a particular moment. The public nature of most temperature measurements makes it easier to offer an unsolicited measurement to others. A further advantage is that the Celsius scale incorporates intuitive reference points (water’s freezing and boiling points).
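Those reference points also make the conversion easy to sanity-check. Here is a minimal sketch of the standard conversion formulas; the function names are made up for illustration, and a display like the ones described above could use something similar to show both scales during a transition period.

```python
def fahrenheit_to_celsius(temp_f: float) -> float:
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32) * 5 / 9


def celsius_to_fahrenheit(temp_c: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit."""
    return temp_c * 9 / 5 + 32


if __name__ == "__main__":
    # Water's freezing and boiling points, the intuitive anchors of the scale.
    for temp_f in (32, 212):
        print(f"{temp_f} F = {fahrenheit_to_celsius(temp_f):.0f} C")
```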

I’m excited about this… let’s see if anything will come of it (as if I have the time or clout to organize a campaign like this).


Why I deleted my ResearchGate account

Several months ago, I was excited to discover ResearchGate, an online community for scientists. I was initially attracted by the discussion boards, which included a lot of useful technical feedback. I set up an account, and proceeded to use the service occasionally and share my expertise. The service was not terribly useful to me, but it seemed to be growing and improving, so I was happy to play along. A couple of months ago, I noticed that I could not see anything on the site without first logging in.

I have finally decided to delete the account. Here’s what I told them:

I was originally attracted to Research Gate due to the discussions. Like any other professional/technical discussion board (e.g. StackOverflow), I expect public discussions to be truly public — not controlled by the service. I am very disappointed that Research Gate has placed a virtual wall around its content.

This is a deal breaker for me. I will not contribute content to any service that tries to take control of that content.

Too many companies are trying to make a buck by gaining control over our social interactions. This is sick, and ResearchGate does not offer nearly enough benefits to keep me on board through this process. I hope they will change their business model and recognize the users and content creators as true “members”, not just a commodity to be fed into a pipeline. If not, good riddance.


Some publishers are exploiting the scientific Open Access movement — let’s crush them.

Rosie Redfield (the bulldog* of British Columbia) has been investigating the unscrupulous business practices of Apple Academic Press (AAP), and how this might be harming authors who publish in Open Access (OA) journals. The gist of the story is that AAP is republishing these scientific papers as book chapters, then selling the book for over $100. By all accounts, these compilations appear just like any other academic book, with the implication being that the chapters are original content written specifically for each book. This does not violate copyright, because the OA license allows republishing as long as attribution is provided (though it’s unclear to me that proper attribution is being given). However, this could cause a number of problems for authors, of which I am most bothered by the disruption of the citation system and the implication that the authors approved of the content in the compilation — including potentially misleading changes to the titles of “their” chapters.

Rosie has been bringing this issue to the attention of authors, and pushing OA publishers to be more proactive about addressing the problem of exploitative book publishers. Right now, the focus seems to be on refining the OA licenses and disclaimers, so that authors don’t find that they inadvertently gave up more control than they intended. I would like to see two additional types of responses: consumer education, and punishment of AAP.

1) Most of the problems that I’m concerned with arise from book buyers not being aware that the content of the book had been published elsewhere. If the publishers had been up-front about the fact that they collected previously published sources, we would not have any problem with proper citation or with the excessive price of the books. As Rosie wrote, the first step is to identify publishers who use these deceptive practices, and that’s not easy. After that, we need to find a way to get the word out.

Luckily, some people are in a position to address both of these issues in a rather straightforward manner: Google and Amazon. Both of these companies have PDFs of parts of the book that Rosie used as an example: Epigenetics, Environment, and Genes (Amazon, Google). Google surely has the ability to compare the text against works that have been published online, and notify the consumer that the original work is available elsewhere. It seems that the book excerpts were provided by the publisher as an advertisement for the book, and if Amazon and Google don’t want to do these background checks, then they are facilitating the publisher’s fraud. Still, even if these companies don’t want to take responsibility for this (and don’t live up to their promise to “organize all the world’s information”), the rest of us can still leave reviews on the webpages, which others may read. I left comments on both the Amazon and Google pages. Google also provides links to other websites that sell this book. Oddly enough, the Amazon page does not display my negative review, even though I was informed that it “went live”.

University libraries are among the biggest consumers of these books. It’s part of their job to ensure that they are stocking their shelves with high quality, useful books. I would consider a book like Epigenetics, Environment, and Genes to be a waste of money and I hope that my school’s librarians would be smart enough to avoid buying it. To help them out, I dropped them a little note through their online comment form, asking them to beware of publications coming from AAP. I hope it helps. If they become aware of this problem, maybe they will establish some system for validating that their books contain valuable contributions, and sharing their evaluations with other libraries.

2) It’s not enough to defend ourselves (as consumers) against these individual cases of fraudulent publishing. If we’re going to solve this problem (and take the pressure off of OA publishers), we need to discourage any publisher from pursuing these deceptive sales strategies. They should lose money and have their reputations damaged. The primary way to reduce their profits is through the above “consumer education” approach. Everyone profiting from this fraud deserves to be called out on it — from CRC Press to the editors of the individual books.

The other offensive response is to sue the publisher. I am not a lawyer, nor have I been directly harmed, so there’s not much for me to say here. However, since Rosie has been focusing on copyright law, I think this needs to be addressed. To me, it looks like these publishers have committed fraud, and I suspect that they could be successfully sued, if not in the USA, then in some other country. The Creative Commons (CC) publishing license is only tangential to this issue (unless the authors were unaware that their work could be republished in an overpriced book). Most of the anger at AAP seems to be over their fraudulent representation of the book chapters as original content that was contributed by the listed authors. The CC license allows work to be republished without the permission of the author, so shutting down AAP is not as simple as demonstrating that they never received permission to republish the work. However, there should be laws that address these specific injustices. Copyright restrictions are too broad to be used as a weapon to prevent fraud. The OA publishers clearly have an incentive to shut down AAP and discourage anyone else from following their business model, so maybe they are the best people to organize this response. As for me, I don’t have any pull in these institutions, so I will just try to increase awareness of this problem and the possible solutions.

This is not a problem with the Open Access publishing model — this is nothing more than unscrupulous people trying to make a buck by exploiting naive consumers in a rapidly changing market. These people should be handled, and we shouldn’t let it disrupt the development of Open Access publishing.

*In case it is not clear, I mean only respect with the nickname “bulldog”. I have admired Rosie’s tenacity and intellectual strength since I started research on bacterial genetics. More than once, she has challenged high-profile claims of other scientists, clearly listed the weaknesses of their arguments, and then made sure that everyone else was aware of these weaknesses. Based on her numerous blog postings, it looks like this energy is now being directed at AAP, and I trust that this problem is on the way to being solved.



Software Carpentry workshops

Software Carpentry seeks to train biologists in the basics of software design. Unfortunately, I was not able to attend when one was held at UC Berkeley, but I suspect that this is exactly what is needed for most biologists.

