Nature Publishing Group continues to deceive about #OpenAccess to genome papers

I was reminded today about the wonderful history of Nature in its claim that it would make all papers reporting a new genome sequence freely and openly available. I wrote about how this was, well, not the truth, in 2012: The Tree of Life: Hey Nature Publishing Group - When are you going to live up to your promises about "free" genome papers? #opengate #aaaaaarrgh. And today I decided to recheck this.

So I searched for "Genome sequence" on the Nature site.

And, well, I found a doozy of an example of a paper that is supposed to be openly available but is not: "Initial sequencing and analysis of the human genome."



That's right.  The "public" human genome paper is not freely or openly available.  It is $4.99 to rent or $20 to purchase.  Is this Nature's way of saying "We think the Lander et al. paper did not actually report on a genome?" and that the Venter paper truly won the race?  I don't think so.  I think this is a way of Nature saying "How can we make money off of our past papers? Which one gets a lot of looks? What? It is freely available? Change that." or something like that.

Want to bet they will say this is a mistake?  Want to bet they will not refund anybody's money who paid for this?

Here is a simple solution for everyone out there.  Do not trust Nature Publishing Group to make something available even if they say they will.


UPDATE 9/25 1 PM

But wait - there is more.  The Plasmodium genome paper, which I wrote about in 2012 as not being available and which Nature promised to fix, is again behind a paywall.










And more



UPDATE 10/16/17

Nature has apologized, said they fixed the issue, and said they will refund any money people spent to buy these articles.



How Open Are You? Part 1: Metrics to Measure Openness and Free Availability of Publications

For many, many years I have been raising a key question in relation to open access publishing: how can we measure how open someone's publications are?  Ideally we would have a way of measuring this in some sort of index.  A few years ago I looked around and asked around and did not find anything out there of obvious direct relevance to what I wanted, so I started mapping out ways to do this.

When Aaron Swartz died I started drafting some ideas on this topic.  Here is what I wrote (in January 2013) but never posted:


With the death of Aaron Swartz on Friday there has been much talk of people posting their articles online (a short term solution) and moving more towards openaccess publishing (a long term solution).  One key component of the move to more openaccess publishing will be assessing people on just how good a job they are doing of sharing their academic work.

I have looked around the interwebs to see if there is some existing metric for this and I could not find one.  So I have decided to develop one - which I call the Swartz Openness Index (SOI).


Let A = # of objects being assessed (could be publications, data sets, software, or all of these together). 
Let B = # of objects that are released to the commons with a broad, open license. 
A simple (and simplistic) metric could be simply 
OI = B / A

This is a decent start but misses out on the degree of openness of different objects. So a more useful metric might be the one below.
A and B as above. 
Let C = # of objects available free of charge but not openly 
OI = ( B + (C/D) ) / A  
where D is the "penalty" for making material in C not openly available
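
As a minimal sketch, the two versions of the index above can be written as one Python function. The function name, the argument names, and the default penalty of 2 are assumptions for illustration only; the draft does not fix a value for D.

```python
def openness_index(total, fully_open, free_only=0, penalty=2.0):
    """Return OI = (B + C/D) / A.

    total      -- A, number of objects assessed
    fully_open -- B, objects released with a broad, open license
    free_only  -- C, objects free of charge but not openly licensed
    penalty    -- D, divisor applied to C (hypothetical default of 2)
    """
    if total <= 0:
        raise ValueError("A must be positive")
    if penalty <= 0:
        raise ValueError("D must be positive")
    if fully_open < 0 or free_only < 0 or fully_open + free_only > total:
        raise ValueError("B and C must be non-negative and sum to at most A")
    return (fully_open + free_only / penalty) / total


# With C = 0 this reduces to the simple OI = B / A.
simple = openness_index(total=10, fully_open=6)
# With D = 2, each free-but-not-open object counts for half an open one.
weighted = openness_index(total=10, fully_open=6, free_only=2, penalty=2.0)
print(simple, weighted)
```

A larger D punishes free-but-not-open material more heavily; D = 1 would treat it the same as fully open material.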

This still seems not detailed enough.  A more detailed approach might be to weight diverse aspects of the openness of the objects.  Consider for example the "Open Access Spectrum."  This has divided objects (publications in this case) into six categories in terms of potential openness: reader rights, reuse rights, copyrights, author posting rights, automatic posting, and machine readability.  And each of these is given different categories that assess the level of openness.  Seems like a useful parsing in ways.  Alas, since bizarrely the OAS is released under a somewhat restrictive CC BY-NC-ND  license I cannot technically make derivatives of it.  So I will not.  Mostly because I am pissed at PLoS and SPARC for releasing something in this way.  Inane.

But I can make my own openness spectrum.


And then I stopped writing because I was so pissed off at PLOS and SPARC for making something like this and then restricting its use.  I had a heated discussion with people from PLOS and SPARC about this but I am not sure if they updated their policy.  Regardless, the concept of an Openness Index of some kind fell out of my head after this buzzkill.  And it only just now came back to me. (Though I note - I did not find the Draft post I made until AFTER I wrote the rest of this post below ... ).


To get some measure of openness in publications maybe a simple metric would be useful.  Something like the following
  • P = # of publications
  • A = # of fully open access papers
  • OI = Openness index
A simple OI would be
  • OI = 100 * A/P
However, one might want to account for relative levels of openness in this metric.  For example
  • AR = # of papers with an open but somewhat restricted license
  • F = # of papers that are freely available but not with an open license
  • C = some measure of how cheap the non freely available papers are
And so on.

Given that I am not into library science myself and not really familiar with playing around with this type of data I thought a much simpler metric would be to just go to Pubmed (which of course works only for publications in the arenas covered by Pubmed).

From Pubmed one can pull out some simple data. 
  • # of publications (for a person or Institution)
  • # of those publications in PubMed Central (a measure of free availability)
Thus one could easily measure the "Pubmed Central" index as

PMCI = 100 * (# publications in PMC / # of publications in Pubmed)
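
A small sketch of that calculation in Python. The function name `pmci` is made up for illustration; the three pairs of counts are copied from the table of authors in this post, and the rounding to one decimal place is an assumption about presentation.

```python
def pmci(in_pmc, in_pubmed):
    """Return 100 * (# publications in PMC / # publications in Pubmed)."""
    if in_pubmed <= 0:
        raise ValueError("need at least one publication in Pubmed")
    return 100 * in_pmc / in_pubmed


# (in PMC, in Pubmed) counts copied from this post's table of authors.
counts = {
    "Eisen MB": (76, 104),
    "Venter JC": (53, 237),
    "Langille MG": (10, 14),
}

# Round to one decimal place for display.
scores = {name: round(pmci(*pair), 1) for name, pair in counts.items()}
for name, score in scores.items():
    print(f"{name}: {score}")
```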

Some examples of the PMCI for various authors including some bigger names in my field, and some people I have worked with.

Name            PMC/Pubmed   PMCI
Eisen JA        224/269      83.2
Eisen MB        76/104       73.1
Collins FS      192/521      36.8
Lander ES       160/377      42.4
Lipman DJ       58/73        79.4
Nussinov R      170/462      36.7
Mardis E        127/187      67.9
Colwell RR      237/435      54.5
Varmus H        165/408      40.4
Brown PO        164/234      70.1
Darling AE      20/27        74.0
Coop G          23/39        59.0
Salzberg SL     107/162      61.7
Venter JC       53/237       22.4
Ward NL         24/58        41.4
Fraser CM       78/262       29.8
Quackenbush J   95/225       42.2
Ghedin E        47/82        57.3
Langille MG     10/14        71.4




And so on.  Obviously this is of limited value / accuracy in many ways.  Many papers are freely available but not in Pubmed Central.  Many papers are not covered by Pubmed or Pubmed Central.  Times change, so some measure of recent publications might be better than measuring all publications.  Author identification is challenging (until systems like ORCID get more use).  And so on.

Another thing one can do with Pubmed is to identify papers with free full text available somewhere (not just in PMC).  This can be useful for cases where material is not put into PMC for some reason.  And then with a similar search one can narrow this to just the last five years.  As openaccess has become more common maybe some people have shifted to it more and more over time (I have -- so this search should give me a better index).
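
One way to sketch these searches in Python is to build the search terms and the corresponding count-only query URLs, without actually sending them. The field tags and filters used here (`[au]`, `pubmed pmc[sb]`, `free full text[sb]`, `"last 5 years"[dp]`) and the NCBI E-utilities `esearch` endpoint are assumptions about how such a search could be expressed, not something this post specifies, so check them against the PubMed help pages before relying on them. The helper names are hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint: NCBI E-utilities ESearch.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_terms(author):
    """Return assumed Pubmed search terms for one author name."""
    base = f"{author}[au]"
    recent = f'{base} AND "last 5 years"[dp]'
    return {
        "all": base,                                       # every paper
        "pmc": f"{base} AND pubmed pmc[sb]",               # in PMC
        "free_all": f"{base} AND free full text[sb]",      # free anywhere
        "recent": recent,                                  # last five years
        "free_recent": f"{recent} AND free full text[sb]",
    }


def esearch_url(term):
    """Build a count-only ESearch URL for a term (no request is sent)."""
    params = {"db": "pubmed", "term": term, "rettype": "count"}
    return ESEARCH + "?" + urlencode(params)


terms = build_terms("Eisen JA")
for label, term in terms.items():
    print(label, esearch_url(term))
```

Dividing the count for "free_recent" by the count for "recent" (times 100) would give the five-year figure, and "free_all" over "all" the all-time one. Author names are ambiguous, so any such counts need a sanity check by eye.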

Let's call the % of publications with free full text somewhere the "Free Index" or FI.  Here are the values for the same authors.

Name            PMC/Pubmed   PMCI    Free/Pubmed (5 years)   FI-5    Free (all)   FI-ALL
Eisen JA        224/269      83.2    178/180                 98.9    237          88.1
Eisen MB        76/104       73.1    32/34                   94.1    83           79.8
Collins FS      192/521      36.8    104/128                 81.3    263          50.5
Lander ES       160/377      42.4    78/104                  75.0    200          53.1
Lipman DJ       58/73        79.4    20/22                   90.9    59           80.8
Mardis E        127/187      67.9    90/115                  78.3    135          72.2
Colwell RR      237/435      54.5    31/63                   49.2    258          59.3
Varmus H        165/408      40.4    21/28                   75.0    206          50.5
Brown PO        164/234      70.1    20/21                   95.2    185          79.0
Darling AE      20/27        74.0    18/21                   85.7    21           77.8
Coop G          23/39        59.0    16/20                   80.0    28           71.8
Salzberg SL     107/162      61.7    54/58                   93.1    128          79.0
Venter JC       53/237       22.4    20/33                   60.6    85           35.9
Ward NL         24/58        41.4    18/27                   66.6    30           51.7
Fraser CM       78/262       29.8    9/13                    69.2    109          41.6
Quackenbush J   95/225       42.2    54/75                   72.0    131          58.2
Ghedin E        47/82        57.3    30/36                   83.3    56           68.3
Langille MG     10/14        71.4    11/13                   84.6    11           78.6


Very happy to see that I score very well for the last five years. 180 papers in Pubmed.  178 of them with free full text somewhere that Pubmed recognizes. The large number of publications comes mostly from genome reports in the open access journals Standards in Genomic Sciences and Genome Announcements.  But most of my non genome report papers are also freely available.

I think in general it would be very useful to have measures of the degree of openness.  And such metrics should take into account sharing of other material like data, methods, etc.  In a way this could be a form of the altmetric calculations going on.

But before going any further I decided to look again into what has been done in this area. When I first thought of doing this a few years ago I searched and asked around and did not see much of anything.  (Although I do remember someone out there - maybe Carl Bergstrom - saying there were some metrics that might be relevant - but can't figure out who / what this information in the back of my head is).

So I decided to do some searching anew.  And lo and behold there was something directly relevant. There is a paper in the Journal of Librarianship and Scholarly Communication called: The Accessibility Quotient: A New Measure of Open Access.  By Mathew A. Willmott, Katharine H. Dunn, and Ellen Finnie Duranceau from MIT.

Full Citation: Willmott, MA, Dunn, KH, Duranceau, EF. (2012). The Accessibility Quotient: A New Measure of Open Access. Journal of Librarianship and Scholarly Communication 1(1):eP1025. http://dx.doi.org/10.7710/2162-3309.1025

Here is the abstract:

Abstract
INTRODUCTION The Accessibility Quotient (AQ), a new measure for assisting authors and librarians in assessing and characterizing the degree of accessibility for a group of papers, is proposed and described. The AQ offers a concise measure that assesses the accessibility of peer-reviewed research produced by an individual or group, by incorporating data on open availability to readers worldwide, the degree of financial barrier to access, and journal quality. The paper reports on the context for developing this measure, how the AQ is calculated, how it can be used in faculty outreach, and why it is a useful lens to use in assessing progress towards more open access to research.
METHODS Journal articles published in 2009 and 2010 by faculty members from one department in each of MIT’s five schools were examined. The AQ was calculated using economist Ted Bergstrom’s Relative Price Index to assess affordability and quality, and data from SHERPA/RoMEO to assess the right to share the peer-reviewed version of an article.
RESULTS The results show that 2009 and 2010 publications by the Media Lab and Physics have the potential to be more open than those of Sloan (Management), Mechanical Engineering, and Linguistics & Philosophy.
DISCUSSION Appropriate interpretation and applications of the AQ are discussed and some limitations of the measure are examined, with suggestions for future studies which may improve the accuracy and relevance of the AQ.
CONCLUSION The AQ offers a concise assessment of accessibility for authors, departments, disciplines, or universities who wish to characterize or understand the degree of access to their research output, capturing additional dimensions of accessibility that matter to faculty.

I completely love it.  After all, it is directly related to what I have been thinking about and, well, they actually did some systematic analysis of their metrics.  I hope more things like this come out and are readily available for anyone to calculate.  Just how open someone is could be yet another metric used to evaluate them ...

And then I did a little more searching and found the following which also seem directly relevant

So - it is good to see various people working on such metrics.  And I hope there are more and more.

Anyway - I know this is a bit incomplete but I simply do not have time right now to turn this into a full study or paper and I wanted to get these ideas out there.  I hope someone finds them useful ...

A good thing: More and more biology papers showing up in arXiv

Good to see some more papers in microbiology & genomics and related topics going to the preprint server arXiv.

If you are interested in population and evolutionary genetics a good place to keep up with papers on this topic in arXiv is Haldane's Sieve.  The good folks there in essence make a separate post about each paper of interest and then people can comment there on the papers, since the commenting functions at arXiv are, well, challenged.

In areas related to this blog, here are some recent papers in arXiv:
Am hoping more and more biologists start depositing papers in arXiv.  My brother has started doing it for all papers in his lab so I guess that means I should too.  And so should everyone else ...



The Xtranormal Case for OpenPub (from Jarrett Byrnes of OpenPub and NCEAS)

From Jarrett Byrnes as part of the OpenPub working group at NCEAS

Ten simple ways to share PDFs of your papers #PDFtribute


There is a spreading surge of PDF sharing going on in relation to a tribute to Aaron Swartz who died a few days ago.  For more on Aaron and tributes to him see the collection I am making here: The Tree of Life: RIP: Aaron Swartz.  For more on the PDF sharing see this CNET story for example: Researchers honor Swartz's memory with PDF protest and http://pdftribute.net.

I should say, sharing your PDFs is not necessarily enough (the license on the PDF may affect what people can do with them if they feel constrained to follow the law).  It is also critical to think about the level of openness of a paper, but I will save most of the comments on that for another time. What I wanted to do here is point out various ways to share PDFs for people who don't know how ...

UPDATE 1/14: See follow up post 10 things you can do to REALLY support #OpenAccess #PDFTribute

Ten simple ways to share PDFs of your papers.

1. Publish your paper in a fully #openaccess journal (so called GOLD OpenAccess).

Such journals immediately post your paper online for all to see and frequently also post your paper in various formats to repositories like Pubmed Central.  For a list of such journals see the "Directory of Open Access Journals".  In my opinion, this is the best, and, well, really only viable long term option.  This is what I do for papers from my lab.

2. Publish your paper in a non #openaccess journal that has the option of selecting / paying for #openaccess on a case by case basis. 

Many journals that are not fully #openaccess have the option of paying extra to have your paper be published in an #openaccess manner and then the journal handles not only posting the paper on their site but also frequently depositing in a repository of their or your choosing.  UPDATE: Note - in many cases the licenses used by journals for such one-off "open" publishing are not fully open, despite what some of the journals claim so proceed with caution (see PLOS Biology: Why Full Open Access Matters for example).

3. Publish in a non #openaccess journal that releases papers to a repository after a delay.

Many journals put papers behind a paywall initially but then "free" them up in some way after a set period of delay.  For example, a large number in biomedicine will deposit papers to Pubmed Central and also make them freely available on their website after 6 months.  Frequently, as with #2 above, the licenses associated with such release of papers are not fully open, but this is a way to have your papers be at least accessible to others after a period of time.

4. Deposit your paper in a preprint server before you submit it for publication.  

For more on preprint servers see
Examples of commonly used preprint servers include

5. Self-archive your PDF in a repository (so called GREEN OpenAccess).

Various repositories out there exist for posting one's papers.  They work in essence like a preprint server, though some people use them more for posting papers after they have been published so I am listing them separately here.  More detail on self-archiving can be found here.  A good source of information about repositories is the Registry of Open Access Repositories.  Also the Directory of Open Access Repositories.  Another good source is SPARC. Also see here.

One repository commonly used in biomedicine is Pubmed Central.  Alas one is only allowed to post papers there by oneself if the work in the paper was funded by an NIH grant.

Another approach is to use arXiv as a repository where you can post things even after they are published.

Another growing venue for self-archiving is an institutional repository.  As many universities expand their commitment to open access, university repositories are becoming a source of more and more publications.  Check to see if your institution has a repository and use it.

UPDATE: Note, just depositing your paper in a repository or preprint server does not necessarily mean your paper is open access.  Look in detail at the license and copyright policies of the archives you are considering before using them.

6. Self post your PDFs to a website you control.

If you do not have a personal website and/or do not know how to post a paper to your website, well, you should learn more about this.  A few simple ways to quickly post a PDF for others to get access to include

Create a new blog / website with a system that allows posting PDFs.  There are many many options for this.  One is Posterous.  Another is Wordpress.Com.  There are certainly a million other ways.  Upload a PDF to Google Docs and then share the Google Doc link.  Post to Dropbox and share the link there.  Etc. etc. etc.  I ended up using Wordpress.Com to create my lab page and to post all my PDFs.

7. Post your PDFs to an online reference collection.

Many systems now exist for collecting and collating and sharing reference collections online.  They include CiteULike, Zotero, and Mendeley.  I particularly like Mendeley right now in part because it makes it very easy to share PDFs privately or publicly.  I for example have posted all my own papers on Mendeley as well as papers of my father's (for more on this see The Tree of Life: Freeing My Father's Publications and Free Science, One Paper at a Time | Wired Science | Wired.com).

8. Create an academic profile page and post PDFs there.

Many systems now exist for creating a personal Academic profile of sorts.  One example is Academia.Edu. I have created a page there (Jonathan Eisen | University of California, Davis - Academia.edu), although I confess I have not been updating it much.

9. Post to Slideshare.

Though many people end up only posting slideshows to Slideshare, and I use it for that purpose, I have posted many of my papers there as well. See for example:

  

10. Post to "Data" archives.

There is a large and growing collection of places to post "Data" to share it with others.  Some of these sites also allow posting of papers.  For example, I have posted multiple papers to Figshare, a great data sharing site that can be used to post and share just about anything (for example - here is my PhD thesis there).

11. Ask a Librarian. (Yes it goes to 11)

Probably the best way to figure out how to better share your PDFs if the options above don't work for you (or even if they do) is to talk to a librarian.  They are the most knowledgeable people in regard to methods and systems and other issues for sharing academic work.


Some related posts from The Tree of Life


Other ideas? Please post in comments ...


RIP: Aaron Swartz (collection of news stories, articles, etc)

Aaron Swartz from the AWL

Compiling links to stories, posts, information about Aaron Swartz and his untimely death. RIP Aaron.

About Aaron
News and Posts about his death
More from 1/14
More from 1/17-22
Storifies about Aaron Swartz
PDF upload tribute