NCBI to help with Rocky Mountain Genomics HackCon, June 17 – 21, 2019

The NCBI will participate in a one-day conference on June 18, 2019 and a hackathon, June 19-21, 2019, as part of Rocky Mountain Genomics HackCon 2019 at the BioFrontiers Institute in Boulder, Colorado. The conference will feature technical speakers in precision medicine, metagenomics, and advanced RNA-Seq analysis, as well as an exhibitor and poster … Continue reading

NCBI to assist in Southern California genomics hackathon in January

From January 10-12, 2018, the NCBI will help with a bioinformatics hackathon in Southern California hosted by San Diego State University. The hackathon will focus on advanced bioinformatics analysis of next generation sequencing data, proteomics, and metadata. This event is … Continue reading

Flaws in prediction of presence of "beneficial" microbes from sequence

Made a Storify that may be of interest

Genetics and microbiome of cattle methane production: 2017 PLOS Genetics Research Prize Winning Research

The PLOS Genetics Editors-in-Chief and Senior Editors would like to congratulate: Rainer Roehe, Richard J. Dewhurst, Carol-Anne Duthie, John A. Rooke, Nest McKain, Dave W. Ross, Jimmy J. Hyslop, Anthony Waterhouse,

NCBI’s First Hackathon: Advanced Bioinformatic Analysis of Next-Gen Sequencing Data

This blog post is geared toward genomics professionals. From January 5th-7th, 2015, NCBI, in conjunction with the NIH Office of Data Science, held a genomics hackathon, where genomics professionals gathered to write useful, efficient pipelines for people new to genomics. … Continue reading

Guest post from Rachid Ounit on CLARK: Fast and Accurate Classification of Metagenomic and Genomic Sequences

Recently I received an email from Rachid Ounit pointing me to a new open access paper he had on a metagenomics analysis tool called CLARK.  I asked him if he would be willing to write a guest post about it and, well, he did.  Here it is:

CLARK: Accurate metagenomic analysis of a million reads in 20 seconds or less…

At the University of California, Riverside, we have developed a new lightweight algorithm that accurately classifies metagenomic samples while using fewer computational resources than other classifiers (e.g., Kraken).  While CLARK and Kraken have comparable accuracy, CLARK is significantly faster (cf. Fig. a) and uses less RAM and disk space (cf. Fig. b-c). In default mode and single-threaded, CLARK’s classification speed is higher than 3 million short reads per minute (cf. Fig. a), and it also scales better in multithreading (cf. Fig. d). Like Kraken, CLARK uses k-mers (short DNA words of length k) to solve the classification problem. However, while Kraken and other k-mer-based classifiers consider the whole taxonomy tree and must resolve k-mers that match genomes from different taxa (by using the concept of “lowest common ancestor” from MEGAN), CLARK instead considers taxa defined at a single taxonomic rank (e.g., species or genus) and, during preprocessing, discards any k-mer that is found in more than one taxon. In other words, CLARK exploits the specificities of each taxon (against all others) to populate its light and efficient data structure. It uses a customized dictionary of k-mers in which each k-mer is associated with at most one taxon, which results in fast k-mer queries. Each read is then assigned to the taxon that has the highest number of k-mer matches with it. Since these matches are discriminative, CLARK assignments are highly accurate. We also show that the choice of the value of k is critical for optimal performance, and that long k-mers (e.g., 31-mers) are not necessarily the best choice for accurate identification.  For example, high confidence assignments using 20-mers from real metagenomes show strong consistency with several published and independent results.
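To make the idea in the paragraph above concrete, here is a minimal Python sketch of classification with discriminative k-mers. It is an illustration only, not CLARK's actual implementation: the function names (`kmers`, `build_discriminative_index`, `classify`) and the toy sequences are hypothetical, and the sketch assumes exact forward-strand matches, ignores reverse complements and confidence scores, and breaks ties arbitrarily.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every overlapping k-mer of seq (forward strand only)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_discriminative_index(references, k):
    """Map each k-mer to a taxon, keeping only k-mers seen in exactly one taxon.

    `references` is a dict {taxon_name: [sequence, ...]}; all taxa are assumed
    to sit at the same taxonomic rank (e.g. species).
    """
    owners = defaultdict(set)
    for taxon, sequences in references.items():
        for seq in sequences:
            for kmer in kmers(seq, k):
                owners[kmer].add(taxon)
    # Discard any k-mer shared by two or more taxa.
    return {kmer: next(iter(taxa)) for kmer, taxa in owners.items() if len(taxa) == 1}


def classify(read, index, k):
    """Return the taxon with the most discriminative k-mer hits, or None."""
    hits = defaultdict(int)
    for kmer in kmers(read, k):
        taxon = index.get(kmer)
        if taxon is not None:
            hits[taxon] += 1
    if not hits:
        return None  # no discriminative k-mer matched: read stays unassigned
    return max(hits, key=hits.get)  # ties go to the taxon that was hit first


# Toy data (made up for illustration; real databases use whole genomes).
references = {
    "taxon_A": ["ACGTACGTAA"],
    "taxon_B": ["TTGCACGTCC"],
}
index = build_discriminative_index(references, k=4)
print(classify("CGTACG", index, k=4))  # prints taxon_A
```

In this toy example the 4-mer ACGT occurs in both references, so it is dropped from the index and a read consisting only of ACGT would remain unassigned. The value of k is passed explicitly because, as noted above, it strongly affects the results.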

Finally, CLARK can be used for detecting contamination in draft reference genomes or, in genomics, chimeras in sequenced BACs. We are currently investigating new techniques for improving the sensitivity and the speed of the tool, and we plan to release a new version later this year. We are also extending the tool for comparative genomics/metagenomics purposes. A “RAM-light” version of CLARK for your 4 GB RAM laptop is also available. CLARK is user-friendly (i.e., easy to use; it does not require a strong background in programming/bioinformatics) and self-contained (i.e., it does not depend on any external software tool). The latest version of CLARK (v1.1.2) contains several features to analyze your results and is freely available under the GNU GPL license (for more details, please visit CLARK’s webpage). Experimental results and algorithm details can be found in the BMC Genomics manuscript.

Performance of Kraken (v0.10.4-beta) and CLARK (v1.0) for the classification of a metagenome sample of 10,000 reads (average read length 92 bp). a) Classification speed (in 10^3 reads per minute) in default mode. b) RAM usage (in GB) for the classification. c) Disk space (in GB) required for the database (bacterial genomes from NCBI/RefSeq). d) Classification speed (in 10^3 reads per minute) using 1, 2, 4 and 8 threads.

Do preprints count for anything? Not according to eLife & G3 & some authors ..

Well, just got pointed to this paper: Metagenomic chromosome conformation capture (meta3C) unveils the diversity of chromosome organization in microorganisms | eLife by Martial Marbouty, Axel Cournac, Jean-François Flot, Hervé Marie-Nelly, Julien Mozziconacci, Romain Koszul.  Seems potentially really interesting.

It is similar in concept and in many aspects to a paper we published in PeerJ earlier in the year (see Beitel et al., 2014): Beitel CW, Froenicke L, Lang JM, Korf IF, Michelmore RW, Eisen JA, Darling AE. (2014) Strain- and plasmid-level deconvolution of a synthetic metagenome by sequencing proximity ligation products. PeerJ 2:e415.

Yet despite the similarities to our paper and to another paper that was formally published around the time of ours, this new paper does not mention these other pieces of work anywhere in the introduction as having any type of "prior work" relevance.  Instead, they wait until late in their discussion:
Taking advantage of chromatin conformation capture data to address genomic questions is a dynamic field: while this paper was under review, two studies were released that also aimed at exploiting the physical contacts between DNA molecules to deconvolve genomes from controlled mixes of microorganisms (Beitel et al., 2014; Burton et al., 2014).
Clearly, what they are trying to do here is to claim that since their paper was submitted before these other two (including ours) were published, they should get some sort of "priority" for their work.  Let's look at that in more detail.  Their paper was received May 9, 2014.  Our paper was published online May 27 and the other related paper by Burton et al. was published online May 22.  In general, if a paper on the same topic as yours comes out just after you submit, while your paper is still in review, the normal thing to be asked to do is to rewrite your paper to deal with the fact that you were, in essence, scooped.  But that does not really appear to be the case here.  They are treating this in a way as "oh look, some new papers came out at the last minute and we have commented on them."  The last minute would be, in this case, 6 months before this new paper was accepted.  Seems like a long time to treat this as "ooh - a new paper came up that we will add a few comments about".

But - one could quibble about the ethics and policies of dealing with papers that were published after one submitted one's own paper.  From my experience, I have always had to do major rewrites to deal with such papers.  But maybe eLife has different policies.  Who knows.  But that is where things get really annoying here.  This is because it was May 27 when our FINAL paper came out online at PeerJ. However, the preprint of the paper was published on February 27, more than two months before their paper was even submitted.  So does this mean that the authors of this new paper do not believe that preprints exist?  It is pretty clear on the web site for our paper that there is a preprint that was published earlier.  Given what they were working on - something directly related to what our preprint/paper was about - one would assume they would have seen it with a few simple Google searches.  Or a reviewer might have pointed them to it.  Maybe not.  I do not know.  But either way, our preprint was published long before their paper was submitted and therefore I believe they should have discussed it in more detail.

Is this a sign that some people believe preprints are nothing more than rumors?  I hope not.  Preprints are a great way to share research prior to the delays that can happen in peer review.  And in my opinion, preprints should count as prior research and be cited as such.  I note - the Burton group in their paper in G3 also did not reference our preprint in what I consider to be a reasonable manner.  They add some comments in their acknowledgements:
While this manuscript was in preparation, a preprint describing a related method appeared in PeerJ PrePrints (Beitel et al. 2014a). Note added in proof: this preprint was subsequently published (Beitel et al. 2014b). 
Given that our preprint was published before their paper was submitted too, I believe that they also should have made more reference to it in their paper.   But again, I can only guess that both the Burton and the Marbouty group just do not see preprints as being respectable scientific objects.  That is a bad precedent to set and I think the wrong one too.  And it is a shame.  A preprint is a publication.  No - it is not peer reviewed.  But that does not mean it should not be considered part of the scientific literature in some way.  I note - this new paper from the Marbouty group seems really interesting.  Not sure I want to dig into it any deeper if they are going to play games with the timing of submission vs. published "papers" as part of how they are positioning themselves to be viewed as doing something novel.