How Open Are You? Part 1: Metrics to Measure Openness and Free Availability of Publications

For many years I have been raising a key question in relation to open access publishing: how can we measure how open someone's publications are?  Ideally we would have a way of measuring this in some sort of index.  A few years ago I looked around and asked around and did not find anything out there of obvious direct relevance to what I wanted, so I started mapping out ways to do this.

When Aaron Swartz died I started drafting some ideas on this topic.  Here is what I wrote (in January 2013) but never posted:


With the death of Aaron Swartz on Friday there has been much talk of people posting their articles online (a short term solution) and moving more towards openaccess publishing (a long term solution).  One key component of the move to more openaccess publishing will be assessing people on just how good a job they are doing of sharing their academic work.

I have looked around the interwebs to see if there is some existing metric for this and I could not find one.  So I have decided to develop one - which I call the Swartz Openness Index (SOI).


Let A = # of objects being assessed (could be publications, data sets, software, or all of these together). 
Let B = # of objects that are released to the commons with a broad, open license. 
A simple (and simplistic) metric could be simply 
OI = B / A

This is a decent start but misses out on the degree of openness of different objects. So a more useful metric might be the one below.
A and B as above. 
Let C = # of objects available free of charge but not openly 
OI = ( B + (C/D) ) / A  
where D is the "penalty" for making material in C not openly available
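As a rough illustration of the formula above, here is a minimal Python sketch.  The function name, the default penalty of D = 2, and the example counts are all assumptions made for illustration; the draft does not specify a value for D.

```python
def openness_index(total, open_count, free_count=0, penalty=2.0):
    """Compute OI = (B + C/D) / A using the symbols defined above.

    total      -> A, number of objects assessed
    open_count -> B, objects released under a broad, open license
    free_count -> C, objects free of charge but not openly licensed
    penalty    -> D, divisor applied to C (the default of 2 is an assumption)
    """
    if total <= 0:
        raise ValueError("A must be positive")
    if penalty < 1:
        raise ValueError("D should be >= 1 so free-only objects never outscore open ones")
    if open_count + free_count > total:
        raise ValueError("B + C cannot exceed A")
    return (open_count + free_count / penalty) / total


# Hypothetical author: 10 objects, 6 openly licensed, 2 free but not open.
print(openness_index(10, 6))           # simple B / A
print(openness_index(10, 6, 2, 2.0))   # with C = 2 and a penalty of D = 2
```

With free_count left at 0 this reduces to the simple B / A version, so one function covers both forms.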

This still seems not detailed enough.  A more detailed approach might be to weight diverse aspects of the openness of the objects.  Consider for example the "Open Access Spectrum."  This has divided objects (publications in this case) into six categories in terms of potential openness: reader rights, reuse rights, copyrights, author posting rights, automatic posting, and machine readability.  And each of these is given different categories that assess the level of openness.  Seems like a useful parsing in ways.  Alas, since bizarrely the OAS is released under a somewhat restrictive CC BY-NC-ND  license I cannot technically make derivatives of it.  So I will not.  Mostly because I am pissed at PLoS and SPARC for releasing something in this way.  Inane.

But I can make my own openness spectrum.


And then I stopped writing because I was so pissed off at PLOS and SPARC for making something like this and then restricting its use.  I had a heated discussion with people from PLOS and SPARC about this but I am not sure if they updated their policy.  Regardless, the concept of an Openness Index of some kind fell out of my head after this buzzkill.  And it only just now came back to me. (Though I note - I did not find the Draft post I made until AFTER I wrote the rest of this post below ... ).


To get some measure of openness in publications maybe a simple metric would be useful.  Something like the following
  • P = # of publications
  • A = # of fully open access papers
  • OI = Openness index
A simple OI would be
  • OI = 100 * A/P
However, one might want to account for relative levels of openness in this metric.  For example
  • AR = # of papers with an open but somewhat restricted license
  • F = # of papers that are freely available but not with an open license
  • C = some measure of how cheap the non freely available papers are
And so on.

Given that I am not into library science myself and not really familiar with playing around with this type of data I thought a much simpler metric would be to just go to Pubmed (which of course works only for publications in the arenas covered by Pubmed).

From Pubmed one can pull out some simple data. 
  • # of publications (for a person or Institution)
  • # of those publications in PubMed Central (a measure of free availability)
Thus one could easily measure the "Pubmed Central" index as

PMCI = 100 * (# publications in PMC / # of publications in Pubmed)
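The calculation is simple enough to sketch in a few lines of Python.  The function name is made up for illustration, and the two pairs of counts are taken from the table in this post.

```python
def pmci(in_pmc, in_pubmed):
    """Percentage of an author's Pubmed-indexed publications that are in PMC."""
    if in_pubmed <= 0:
        raise ValueError("need at least one publication in Pubmed")
    if not 0 <= in_pmc <= in_pubmed:
        raise ValueError("PMC count must be between 0 and the Pubmed count")
    return 100 * in_pmc / in_pubmed


# (publications in PMC, publications in Pubmed) for two authors in the table
counts = {"Eisen MB": (76, 104), "Venter JC": (53, 237)}
for name, (in_pmc, in_pubmed) in counts.items():
    print(f"{name}: {pmci(in_pmc, in_pubmed):.1f}")
```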

Some examples of the PMCI for various authors including some bigger names in my field, and some people I have worked with.

Name            #s (PMC/Pubmed)   PMCI
Eisen JA        224/269           83.2
Eisen MB        76/104            73.1
Collins FS      192/521           36.8
Lander ES       160/377           42.4
Lipman DJ       58/73             79.4
Nussinov R      170/462           36.7
Mardis E        127/187           67.9
Colwell RR      237/435           54.5
Varmus H        165/408           40.4
Brown PO        164/234           70.1
Darling AE      20/27             74.0
Coop G          23/39             59.0
Salzberg SL     107/162           61.7
Venter JC       53/237            22.4
Ward NL         24/58             41.4
Fraser CM       78/262            29.8
Quackenbush J   95/225            42.2
Ghedin E        47/82             57.3
Langille MG     10/14             71.4




And so on.  Obviously this is of limited value / accuracy in many ways.  Many papers are freely available but not in Pubmed Central.  Many papers are not covered by Pubmed or Pubmed Central.  Times change, so some measure of recent publications might be better than measuring all publications.  Author identification is challenging (until systems like ORCID get more use).  And so on.

Another thing one can do with Pubmed is to identify papers with free full text available somewhere (not just in PMC).  This can be useful for cases where material is not put into PMC for some reason.  And then with a similar search one can narrow this to just the last five years.  As openaccess has become more common maybe some people have shifted to it more and more over time (I have -- so this search should give me a better index).
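For anyone who wants to script these searches rather than click through the website, here is a hedged sketch that builds query URLs for NCBI's E-utilities `esearch` endpoint.  The function names are made up for illustration.  The search terms (`free full text[sb]`, `pubmed pmc[sb]`, and `"last 5 years"[dp]`) are the Pubmed filters as I understand them, so check them against the current Pubmed help before relying on the counts.  The sketch only builds the URL and does the arithmetic; fetching the URL and reading the count out of the JSON response is left out.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def count_url(author, subset=None, last_years=None):
    """Build an esearch URL whose response reports a hit count for an author.

    subset     -> optional Pubmed filter such as "free full text[sb]" or
                  "pubmed pmc[sb]" (assumed syntax, verify before use)
    last_years -> optional integer, restricts to the last N years by date
    """
    term = f"{author}[Author]"
    if subset:
        term += f" AND {subset}"
    if last_years is not None:
        term += f' AND "last {last_years} years"[dp]'
    # retmax=0 asks for the count only, without a list of record IDs
    params = {"db": "pubmed", "term": term, "retmode": "json", "retmax": 0}
    return f"{ESEARCH}?{urlencode(params)}"


def free_index(free_count, total_count):
    """Percentage of publications with free full text available somewhere."""
    if total_count <= 0:
        raise ValueError("need at least one publication")
    return 100 * free_count / total_count


print(count_url("Eisen JA", last_years=5))
print(count_url("Eisen JA", subset="free full text[sb]", last_years=5))
# Using the five-year counts reported in this post: 178 free out of 180
print(f"{free_index(178, 180):.1f}")
```

Author name searches like this are only as good as the name disambiguation, so counts for common names will include other people.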

Let's call the % of publications with free full text somewhere the "Free Index" or FI.  Here are the values for the same authors.

Name            PMC/Pubmed   PMCI   Free/Pubmed (5 years)   FI-5   Free (all)   FI-ALL
Eisen JA        224/269      83.2   178/180                 98.9   237          88.1
Eisen MB        76/104       73.1   32/34                   94.1   83           79.8
Collins FS      192/521      36.8   104/128                 81.3   263          50.5
Lander ES       160/377      42.4   78/104                  75.0   200          53.1
Lipman DJ       58/73        79.4   20/22                   90.9   59           80.8
Mardis E        127/187      67.9   90/115                  78.3   135          72.2
Colwell RR      237/435      54.5   31/63                   49.2   258          59.3
Varmus H        165/408      40.4   21/28                   75.0   206          50.5
Brown PO        164/234      70.1   20/21                   95.2   185          79.0
Darling AE      20/27        74.0   18/21                   85.7   21           77.8
Coop G          23/39        59.0   16/20                   80.0   28           71.8
Salzberg SL     107/162      61.7   54/58                   93.1   128          79.0
Venter JC       53/237       22.4   20/33                   60.6   85           35.9
Ward NL         24/58        41.4   18/27                   66.6   30           51.7
Fraser CM       78/262       29.8   9/13                    69.2   109          41.6
Quackenbush J   95/225       42.2   54/75                   72.0   131          58.2
Ghedin E        47/82        57.3   30/36                   83.3   56           68.3
Langille MG     10/14        71.4   11/13                   84.6   11           78.6


Very happy to see that I score very well for the last five years. 180 papers in Pubmed.  178 of them with free full text somewhere that Pubmed recognizes. The large number of publications comes mostly from genome reports in the open access journals Standards in Genomic Sciences and Genome Announcements.  But most of my non genome report papers are also freely available.

I think in general it would be very useful to have measures of the degree of openness.  And such metrics should take into account sharing of other material like data, methods, etc.  In a way this could be a form of the altmetric calculations going on.

But before going any further I decided to look again into what has been done in this area. When I first thought of doing this a few years ago I searched and asked around and did not see much of anything.  (Although I do remember someone out there - maybe Carl Bergstrom - saying there were some metrics that might be relevant - but can't figure out who / what this information in the back of my head is).

So I decided to do some searching anew.  And lo and behold there was something directly relevant. There is a paper in the Journal of Librarianship and Scholarly Communication called: The Accessibility Quotient: A New Measure of Open Access.  By Mathew A. Willmott, Katharine H. Dunn, and Ellen Finnie Duranceau from MIT.

Full Citation: Willmott, MA, Dunn, KH, Duranceau, EF. (2012). The Accessibility Quotient: A New Measure of Open Access. Journal of Librarianship and Scholarly Communication 1(1):eP1025. http://dx.doi.org/10.7710/2162-3309.1025

Here is the abstract:

Abstract
INTRODUCTION The Accessibility Quotient (AQ), a new measure for assisting authors and librarians in assessing and characterizing the degree of accessibility for a group of papers, is proposed and described. The AQ offers a concise measure that assesses the accessibility of peer-reviewed research produced by an individual or group, by incorporating data on open availability to readers worldwide, the degree of financial barrier to access, and journal quality. The paper reports on the context for developing this measure, how the AQ is calculated, how it can be used in faculty outreach, and why it is a useful lens to use in assessing progress towards more open access to research.
METHODS Journal articles published in 2009 and 2010 by faculty members from one department in each of MIT’s five schools were examined. The AQ was calculated using economist Ted Bergstrom’s Relative Price Index to assess affordability and quality, and data from SHERPA/RoMEO to assess the right to share the peer-reviewed version of an article.
RESULTS The results show that 2009 and 2010 publications by the Media Lab and Physics have the potential to be more open than those of Sloan (Management), Mechanical Engineering, and Linguistics & Philosophy.
DISCUSSION Appropriate interpretation and applications of the AQ are discussed and some limitations of the measure are examined, with suggestions for future studies which may improve the accuracy and relevance of the AQ.
CONCLUSION The AQ offers a concise assessment of accessibility for authors, departments, disciplines, or universities who wish to characterize or understand the degree of access to their research output, capturing additional dimensions of accessibility that matter to faculty.

I completely love it.  After all, it is directly related to what I have been thinking about and, well, they actually did some systematic analysis of their metrics.  I hope more things like this come out and are readily available for anyone to calculate.  Just how open someone is could be yet another metric used to evaluate them ...

And then I did a little more searching and found the following which also seem directly relevant

So - it is good to see various people working on such metrics.  And I hope there are more and more.

Anyway - I know this is a bit incomplete but I simply do not have time right now to turn this into a full study or paper and I wanted to get these ideas out there.  I hope someone finds them useful ...

The CFAA Reform “Ask Me Anything”

A long list of characters and organizations involved in the fight to reform the outdated and overly-broad Computer Fraud & Abuse Act (CFAA) – the law that was used to prosecute Aaron Swartz prior to his suicide and that criminalizes violating user agreements – just did an “Ask Me Anything” on Reddit. It is an interesting read:

I think there’s a growing constituency that opposes over-criminalization. CEI, Heritage, TechFreedom, Right On Crime, etc are helping to legitimize that position among conservative and libertarian Republicans, on spending and state authority grounds. The majority of Dems on House Judiciary, the committee of jurisdiction here, are in the orbit of the Progressive and/or Black Caucus, and are sympathetic to concerns about over-crim, prison industrial complex, etc. We need people who are working to legitimize those concerns/frames to keep up there work and have increased success as they do so, so we can discuss issues like the CFAA through those frames and have it resonate. – David Adam Segal, Executive Director of Demand Progress

There is an irony to people asking for fewer emails kind of spamming the thread, which will make you want to skip about 50% of the comments. Most take the form of “I really support your cause, but not enough to deal with a couple of extra emails each week.” Full credit to the reformers, as I did not see a single reply from them like, “Oh, you support us that much? THANKS!”


A day to think, to pause, to ponder

Today is not an easy day for me.

I pause today to think about a person in my life.  A person who was dedicated to science and discovery and improving the human condition.  A person who was idealistic and sensitive and also had some mental health issues.  A person who was pushed over the edge by an overly aggressive, misguided investigation.  A person who became lost in some sort of downward spiral triggered by this investigation.  A person who then took their own life and in one moment created a catastrophic ripple in the world around them.

This person was not Aaron Swartz, though I am thinking of him today too. The person I refer to was my father.  On this day, February 7, 1987, my father Howard J. Eisen took his own life.  I was a freshman in college then.  Enjoying life on my own at Harvard.  Exploring the world of new friends, academic pursuits, and the usual college antics.  And then it all exploded.  The details are a bit of a blur and most are not really important for what I write about here.  But suffice it to say I was devastated.

I flew home to Maryland with my brother and slowly the details emerged.  My father was a researcher at the NIH.  A paper was being prepared for publication by a post doc who worked for a colleague / boss of my father and who my father also worked with.  My father was apparently asked to look at the paper and some "discrepancies" were noted and my father helped launch an investigation into the work.  The NIH panel that was brought in to investigate the work of this post doc was very aggressive - very unpleasant - and even though no accusations of wrong doing were made against my father - the style and tone of the investigation pushed him over the edge.  And he could not dig himself out.  Some people knew he was having trouble with the whole incident but others (e.g., myself) were not in the loop at all.  I knew nothing.  Perhaps people thought I had enough going on as a freshman in college or perhaps it just never came up.  But all I knew was discovered after finding out my father had died, by taking his own life, on February 7, 1987.

Losing my father at the age of 18 was devastating.  Still is.  The fact that he killed himself made it even worse of course.  There were even news stories for a while about it - in the Washington Post, and New York Times, and the Associated Press and Nature and such.  Some of the stories helped in a way because they did not accuse my father of any wrong doing.  For example the Washington Post reported

"Dr. Howard J. Eisen, a respected scientist at the National Institutes of Health, committed suicide at his Bethesda home last week while under pressure from an investigation he helped initiate of alleged scientific fraud by a coworker. 
The suicide has shocked the NIH community and outraged some scientists there, who think that the stress of the investigation triggered Eisen's death. They view it as a case of the system making a responsible scientist suffer even though he acted aggressively to uncover possible dishonesty in his laboratory. Eisen's friends and family acknowledged that his personality-he was intensely idealistic and unusually sensitive-made him vulnerable."
And the Nature article, by Joe Palca, reported "NIH made no allegations against Eisen." Did these make me feel better?  I suppose.  But of course, not really.  Suicide is brutal for those left behind (and I am sure for those who commit it).  I have never recovered.  But I note - the life and death of my father, and the story of the investigation, have shaped my life.  It is why, when I went to graduate school, my #1 criterion for choosing a PhD advisor was that they were a good, kind person.  After struggling with some of the people I worked with I found such a person in Phil Hanawalt and, really, never wanted to leave his lab.  I see so many examples of scientists and MDs and administrators abusing their positions of power and finding someone who does not do any such things can sometimes be a challenge.

The story behind my father's death is also why, a few years ago, when I realized my father's publications were not freely and openly available, I got so angry.  My father had, in a way, died over his research.  And for it to not be available pained me to no end.  When David Dobbs wrote a story about my quest to Free my Father's publications I felt some peace that I had done something in his name.  And when I finally made them all available a week later, I was truly happy.

The story behind my father's death is also why, when people have pointed out to me that I have been a bit over the top in critiquing others, I back off.  And I have tried to get others on the web and in my arena to be much more careful about avoiding personal attacks (e.g., see here).

I also note that the story behind my father's death is why the death of Aaron Swartz hit home so hard to me.  I knew Aaron a tiny bit (having met at SciFoo many years ago) but not in any deep way.  I read the stories about his JSTOR download and even wrote about it a little bit.  But I was not aware of the demented, aggressive prosecution of him and when I read about his death I was devastated.  The story reminded me a great deal of my father.  I wrote about Swartz and about the follow up PDFTribute movement (here and here) but it felt a bit awkward since I did not know quite how to discuss my own personal feelings about this story.  So I said nothing.  But now, in tribute to my father, I am trying to not ignore the facts around his death.  They are a part of his life and a part of why I am the way I am.  So I write this post.  And I call for others out there to remember - life is fragile.  Be careful with your words and your actions.  No - one cannot blame everyone - or anyone really - for complex things like suicide.  But we can all do a little bit to improve how we treat others.  And on this day, when I am 44, the same age as my father was when he died, that is what I think about.

My father, Howard J. Eisen

UPDATE 2/8: See my brother's nearly simultaneously written post about this topic (which we did not discuss - typical - here).

Storification of responses

Ten simple ways to share PDFs of your papers #PDFtribute


There is a spreading surge of PDF sharing going on in relation to a tribute to Aaron Swartz who died a few days ago.  For more on Aaron and tributes to him see the collection I am making here: The Tree of Life: RIP: Aaron Swartz.  For more on the PDF sharing see this CNET story for example: Researchers honor Swartz's memory with PDF protest and http://pdftribute.net.

I should say, sharing your PDFs is not necessarily enough (the license on the PDF may affect what people can do with them if they feel constrained to follow the law).  It is also critical to think about the level of openness of a paper, but I will save most of the comments on that for another time. What I wanted to do here is point out various ways to share PDFs for people who don't know how ...

UPDATE 1/14: See follow up post 10 things you can do to REALLY support #OpenAccess #PDFTribute

Ten simple ways to share PDFs of your papers.

1. Publish your paper in a fully #openaccess journal (so called GOLD OpenAccess).

Such journals immediately post your paper online for all to see and frequently also post your paper in various formats to repositories like Pubmed Central.  For a list of such journals see the "Directory of Open Access Journals".  In my opinion, this is the best, and, well, really only viable long term option.  This is what I do for papers from my lab.

2. Publish your paper in a non #openaccess journal that has the option of selecting / paying for #openaccess on a case by case basis.

Many journals that are not fully #openaccess have the option of paying extra to have your paper be published in an #openaccess manner and then the journal handles not only posting the paper on their site but also frequently depositing in a repository of their or your choosing.  UPDATE: Note - in many cases the licenses used by journals for such one-off "open" publishing are not fully open, despite what some of the journals claim so proceed with caution (see PLOS Biology: Why Full Open Access Matters for example).

3. Publish in a non #openaccess journal that releases papers to a repository after a delay.

Many journals put papers behind a paywall initially but then "free" them up in some way after a set period of delay.  For example a large number in biomedicine will deposit papers to Pubmed Central and also make them freely available on their website after 6 months.  Frequently, as with #2 above, the licenses associated with such release of papers are not fully open, but this is a way to have your papers be at least accessible to others after a period of time.

4. Deposit your paper in a preprint server before you submit it for publication.  

For more on preprint servers see
Examples of commonly used preprint servers include

5. Self-archive your PDF in a repository (so called GREEN OpenAccess).

Various repositories exist for posting one's papers.  They work in essence like a preprint server, though some people use them more for posting papers after they have been published, so I am listing them separately here.  More detail on self-archiving can be found here.  A good source of information about repositories is the Registry of Open Access Repositories.  Also the Directory of Open Access Repositories.  Another good source is SPARC. Also see here.

One repository commonly used in biomedicine is Pubmed Central.  Alas one is only allowed to post papers there by oneself if the work in the paper was funded by an NIH grant.

Another approach is to use arXiv as a repository where you can post things even after they are published.

Another growing venue for self-archiving is an institutional repository.  As many universities expand their commitment to open access, university repositories are becoming a source of more and more publications.  Check to see if your institution has a repository and use it.

UPDATE: Note, just depositing your paper in a repository or preprint server does not necessarily mean your paper is open access.  Look in detail at the license and copyright policies of the archives you are considering before using them.

6. Self post your PDFs to a website you control.

If you do not have a personal website and/or do not know how to post a paper to your website, well, you should learn more about this.  A few simple ways to quickly post a PDF for others to get access to include

Create a new blog / website with a system that allows posting PDFs.  There are many many options for this.  One is Posterous.  Another is Wordpress.Com.  There are certainly a million other ways.  Upload a PDF to Google Docs and then share the Google Doc link.  Post to Dropbox and share the link there.  Etc. etc. etc.  I ended up using Wordpress.Com to create my lab page and to post all my PDFs.

7. Post your PDFs to an online reference collection.

Many systems now exist for collecting and collating and sharing reference collections online.  They include CiteULike, Zotero, and Mendeley.  I particularly like Mendeley right now in part because it makes it very easy to share PDFs privately or publicly.  I for example have posted all my own papers on Mendeley as well as papers of my father's (for more on this see The Tree of Life: Freeing My Father's Publications and Free Science, One Paper at a Time | Wired Science | Wired.com).

8. Create an academic profile page and post PDFs there.

Many systems now exist for creating a personal academic profile of sorts.  One example is Academia.Edu.  I have created a page there (Jonathan Eisen | University of California, Davis - Academia.edu), although I confess I have not been updating it much.

9. Post to Slideshare.

Though many people end up only posting slideshows to Slideshare, and I use it for that purpose, I have posted many of my papers there as well. See for example:

  

10. Post to "Data" archives.

There is a large growing collection of places to post "Data" to share it with others.  Some of these sites also allow posting of papers.  For example, I have posted multiple papers to Figshare, a great data sharing site that can be used to post and share just about anything. I have also used Figshare for this (for example - here is my PhD thesis there).

11. Ask a Librarian. (Yes it goes to 11)

Probably the best way to figure out how to better share your PDFs if the options above don't work for you (or even if they do) is to talk to a librarian.  They are the most knowledgeable people in regard to methods and systems and other issues for sharing academic work.


Some related posts from The Tree of Life


Other ideas? Please post in comments ...


RIP: Aaron Swartz (collection of news stories, articles, etc)

Aaron Swartz from the AWL

Compiling links to stories, posts, information about Aaron Swartz and his untimely death. RIP Aaron.

About Aaron
News and Posts about his death
More from 1/14
More from 1/17-22
Storifies about Aaron Swartz
PDF upload tribute