Wikipedia views and every line of Billy Joel’s “We Didn’t Start the Fire”

In the biggest crossover event of the century, Tom Lum used the Wikipedia API to chart the number of views for every reference in Billy Joel’s We Didn’t Start the Fire. Yes. [via @waxpancake]
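
For anyone curious how such counts can be pulled, here is a minimal sketch that builds requests against the public Wikimedia pageviews REST endpoint. It is not necessarily how Tom Lum did it: the helper names, the date range, and the three sample references are assumptions for illustration, and no network call is made.

```python
from urllib.parse import quote

# Public Wikimedia REST endpoint for per-article pageviews.
BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def pageviews_url(title, start, end, project="en.wikipedia", granularity="monthly"):
    """Build the request URL for one article's pageview counts.

    start and end are YYYYMMDD strings. The function name and defaults
    are illustrative choices, not part of the original project.
    """
    # Article titles use underscores for spaces and must be URL-encoded.
    article = quote(title.replace(" ", "_"), safe="")
    return f"{BASE}/{project}/all-access/user/{article}/{granularity}/{start}/{end}"


def total_views(response_json):
    """Sum the 'views' field across the items of a decoded API response."""
    return sum(item["views"] for item in response_json.get("items", []))


# Hypothetical sample of references from the song, not the full list.
references = ["Harry S. Truman", "Doris Day", "Joe DiMaggio"]
urls = [pageviews_url(r, "20190101", "20191231") for r in references]
```

Fetching each URL with any HTTP client and passing the decoded JSON to `total_views` would give one number per reference to chart.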

Map of the most popular people replacing the cities they lived in

For The Pudding, Matt Daniels and Russell Goldenberg used Wikipedia pageviews to replace city names with each city’s most popular resident:

Person/city associations were based on the thousands of “People from X city” pages on Wikipedia. The top person from each city was determined by using median pageviews (with a minimum of 1 year of traffic). We chose to include multiple occurrences for a single person because there is both no way to determine which is more accurate and people can “be from” multiple places.

So you end up with LeBron James for Akron, Barack Obama for Chicago, etc.
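
The selection rule described above can be sketched in a few lines. This is only an illustration of "highest median pageviews, with at least a year of traffic". The data layout, the 365-day reading of "1 year", the placeholder names, and all the numbers are assumptions, not The Pudding's actual code or data.

```python
from statistics import median

MIN_DAYS = 365  # "minimum of 1 year of traffic", read here as 365 daily counts


def top_person_per_city(people_by_city, daily_views):
    """Return {city: person with the highest median daily pageviews}.

    People with fewer than MIN_DAYS of recorded traffic are skipped.
    A person listed under several cities can win in each of them.
    """
    top = {}
    for city, people in people_by_city.items():
        eligible = {
            person: median(daily_views[person])
            for person in people
            if len(daily_views.get(person, [])) >= MIN_DAYS
        }
        if eligible:
            top[city] = max(eligible, key=eligible.get)
    return top


# Made-up example data; "Person A" and "Person B" are placeholders.
people_by_city = {
    "Akron": ["LeBron James", "Person A"],
    "Chicago": ["Barack Obama", "Person B"],
}
daily_views = {
    "LeBron James": [900] * 365,
    "Person A": [5000] * 30,  # too little history, so excluded
    "Barack Obama": [1200] * 400,
    "Person B": [300] * 400,
}
result = top_person_per_city(people_by_city, daily_views)
```

Using the median rather than the mean keeps a single news-driven spike from deciding the winner.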


See also the (non-data-driven) USA song map, which inspired this one. My favorite in this map genre is the series from R. Luke DuBois, who used online dating profiles to replace city names with the most unique personal qualities.

Algorithms to fix underrepresentation on Wikipedia

Wikipedia is human-edited, so naturally there are biases towards certain groups of people. Primer, an artificial intelligence startup, is working on a system that looks for people who should have an article. It’s called Quicksilver.

We trained Quicksilver’s models on 30,000 English Wikipedia articles about scientists, their Wikidata entries, and over 3 million sentences from news documents describing them and their work. Then we fed in the names and affiliations of 200,000 authors of scientific papers.

In the morning we found 40,000 people missing from Wikipedia who have a similar distribution of news coverage as those who do have articles. Quicksilver doubled the number of scientists potentially eligible for a Wikipedia article overnight.

Then, after it finds people, it generates sample articles to get things started.

Data to identify Wikipedia rabbit holes

New data dump from the Wikimedia Foundation:

The Wikimedia Foundation’s Analytics team is releasing a monthly clickstream dataset. The dataset represents—in aggregate—how readers reach a Wikipedia article and navigate to the next. Previously published as a static release, this dataset is now available as a series of monthly data dumps for English, Russian, German, Spanish, and Japanese Wikipedias.
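
To give a sense of how a rabbit hole could be traced from this data, here is a rough sketch. It assumes each row of the dump has four tab-separated fields (previous page, current page, link type, count), and it follows the most-clicked internal link from page to page. The function names and the sample rows with their counts are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def load_links(tsv_file):
    """Read clickstream rows into {source: {target: clicks}}.

    Only rows typed "link" (article-to-article clicks) are kept.
    """
    links = defaultdict(dict)
    reader = csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for prev, curr, kind, n in reader:
        if kind == "link":
            links[prev][curr] = int(n)
    return links


def rabbit_hole(links, start, max_hops=5):
    """Follow the most-clicked outgoing link until a dead end or a loop."""
    path = [start]
    while len(path) <= max_hops:
        outgoing = links.get(path[-1])
        if not outgoing:
            break
        best = max(outgoing, key=outgoing.get)
        if best in path:  # stop before revisiting a page
            break
        path.append(best)
    return path


# Made-up sample rows in the assumed format.
sample = io.StringIO(
    "Cat\tFelidae\tlink\t50\n"
    "Cat\tDog\tlink\t20\n"
    "Felidae\tCarnivora\tlink\t30\n"
    "other-search\tCat\texternal\t999\n"
    "Carnivora\tCat\tlink\t10\n"
)
path = rabbit_hole(load_links(sample), "Cat")
```

With a real monthly file, the same functions would take an opened dump in place of the `io.StringIO` sample.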


How different languages represent van Gogh

Christian Laesser takes an abstract look at how different languages represent Vincent van Gogh through various Wikipedia pages.

The visualization explores how different languages present Van Gogh’s work and life by images. Inspired by Geolinguistic Contrasts in Wikipedia. The viz tries to show different narrative strategies by showing the image type, origin date and authorship. You can reveal the connections between languages by hovering the images.

I’m not quite convinced this helps with understanding, but I appreciate the experimentation.

Wikipedia Activism and Diversity in Science

There’s no getting around it. A lot of scientists are white men, and it’s always been that way. But it’s never been the whole picture. Getting a better picture of scientists whose work or lives

Past and future predictions of when the world will end

Wikipedia has a list of predicted dates for when apocalypse strikes, because of course it does. For kicks and giggles, Jeff Fletcher put the dates on a timeline. The horizontal position of each dot represents the predicted date. The vertical position doesn't mean so much, other than there are a lot of dates around that time.

Luckily, we got past the most recent September 1, 2015 prediction and the grip of ones before that. Phew. Next up: 2020.

A timeline of history


“I wish there was a timeline browser for all the historical events documented on Wikipedia, from the Big Bang up to present,” you thought to yourself. Well look no more. Histography, a final project by Matan Stauber at the Bezalel Academy of Arts and Design, is an interactive timeline that lets you sift through events and eras. It's updated with new events on the daily.

Each dot represents an event, and the horizontal axis represents its place in time. Categories in the left sidebar let you quickly filter to literature, war, inventions, etc. A scrollbar on the bottom highlights specific sections of time, such as the Stone Age, Renaissance, and Industrial Age.

When you filter, the dots that don't match roll away as if you were working with a table of marbles, further reinforced by the sound of colliding balls.

As with many things Wikipedia data-related, this only accounts for things on Wikipedia and not all things that ever happened in the history of the universe. So naturally, there are more recorded events as you move up to the present.

But with this in mind, this is a fun one to poke at. I want one of those interactive tables with this piece running on it. It'd be the ultimate coffee table book.


A couple of weeks ago I unintentionally set off a bit of a firestorm regarding Wikipedia, Elsevier and open access. I was scanning my Twitter feed, as one does, and came upon a link to an Elsevier press release:

Elsevier access donations help Wikipedia editors improve science articles: With free access to ScienceDirect, top editors can ensure that science read by the public is accurate

I read the rest of it, and found that Elsevier and Wikipedia (through the Wikipedia Library Access Program) had struck a deal whereby 45 top (i.e. highly active) Wikipedia editors would get free access to Elsevier’s database of science papers – Science Direct – for a year, thereby “improving the encyclopedia and bringing the best quality information to the public.”

I have some substantive issues with this arrangement, as I will detail below. But what really stuck in my craw was the way that several members of the Wikipedia Library were used not just to highlight the benefits of the deal to Wikipedia and its users, but to serve as mouthpieces for misleading Elsevier PR, such as this:

Elsevier publishes some of the best science scholarship in the world, and our globally located volunteers often seek out that access but don’t have access to research libraries. Elsevier is helping us bridge that gap!

It was painful to hear people from Wikipedia suggesting that Elsevier is coming to the rescue of people who don’t have access to the scientific literature! In reality, Elsevier is one of the primary reasons they don’t have access, having fought open access tooth and nail for two decades and spent millions of dollars to lobby against almost any act anywhere that would improve public access to science. And yet here was Wikipedia – a group that IS one of the great heroes of the access revolution – publicly praising Elsevier for providing access to 0.0000006% of the world’s population.
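
That percentage can be checked with quick arithmetic. The sketch below assumes a world population of roughly 7.3 billion, which is an assumed round figure and not a number from the press release.

```python
EDITORS_WITH_ACCESS = 45   # from the press release
WORLD_POPULATION = 7.3e9   # assumed round figure, not from the source

# Share of the world's population covered by the deal, as a percentage.
share_percent = EDITORS_WITH_ACCESS / WORLD_POPULATION * 100
print(f"{share_percent:.7f}%")  # prints 0.0000006%
```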

Furthermore, I found the whole idea that this is a “donation” ridiculous. Elsevier is giving away something that costs them nothing to provide – they just have to create 45 accounts. It’s extremely unlikely that the Wikipedia editors in question were potential subscribers to Elsevier journals or that they would pay to access individual articles. So no revenue was lost. And in exchange for giving away nothing, Elsevier almost certainly increases the number of links from Wikipedia to their papers – something of significant value to them.

I was fairly astonished to see this, and, being somewhat short-tempered, I fired off a series of tweets:

These tweets struck a bit of a nerve, and the reaction, at least temporarily, seemed to pit #openaccess advocates against Wikipedians – as highlighted in a story by Glyn Moody. I in no way meant to do this. It would be hard to find two groups whose goals are more aligned.

So I want to reiterate something I said over and over as these tweets turned into a kind of mini-controversy. In saying I thought that making this deal with Elsevier was a bad idea, I was not in any way trying to criticize Wikipedia or the people who make it work. I love Wikipedia. As a kid who spent hours and hours reading an old encyclopedia my grandparents gave me, I think that Wikipedia is one of the greatest creations of the Internet Age. Its editors and contributors, as well as Jimmy Wales and the many others who made it a reality, are absolute, unvarnished heroes.

In no way do I question the commitment of Wikipedia to open access. I just think they made a mistake here, and I worry a bit about the impact this kind of deal will have on Wikipedia. But it is a concern born of true love for the institution.

So with that in mind, let me delve into this a bit more deeply.

First of all, I understand completely why Wikipedia would make this kind of deal. The mission of Wikimedia is to “empower and engage people around the world to collect and develop educational content under a free license or in the public domain, and to disseminate it effectively and globally” [1]. But there is a major challenge to building an accurate and fully-referenced open encyclopedia: much of the source material they need to do this is either not online or is behind paywalls. It’s clear that Wikipedia sees opening source material as the long-term solution to this problem. But in the meantime they feel compelled to ensure that the people who build Wikipedia have a way around paywalls when they are doing so. It’s not all that conceptually different from a university library that works to provide access to paywalled sources to its scholars.

So the question to me isn’t whether Wikipedia should make any deals with publishers. The question is should they have made this deal with this publisher. And just like I have strongly disagreed with deals universities (including my own) routinely make to provide campus access to Elsevier journals, I do not think this deal is good for Wikipedia or the public.

Here are my concerns:

This deal will prolong the life of the paywalled business model

If the only effect of this deal was to provide editors with access, I would hold my nose and support Wikipedia’s efforts to work around the current insane scholarly publishing system. But I don’t think this is the only effect of the deal. In several ways this deal strengthens Elsevier’s subscription publishing business, and strengthening this business is clearly bad for Wikipedia and its mission.

How does it strengthen Elsevier’s business? First, it provides them with good PR – allowing them to pretend that they support openness, something that serves to at least partially blunt the increasingly bad PR their subscription journal publishing business has incurred in recent years. Second, it provides them with revenue. This deal will increase the number of links in Wikipedia to Elsevier papers, and links on Wikipedia are clearly of great value to Elsevier – they can monetize them in multiple ways: a) by advertising on the landing pages, b) by collecting one-time fees from people without accounts who want to view an article, and, most significantly, c) by increasing traffic to their journals from users with access, which they cite to justify increased payments from universities and other institutions.

Finally, and most significantly, the deal mitigates some of the direct negative consequences of publishing paywalled journals and publishing in paywalled journals. One of the consequences of papers appearing in paywalled journals is that they are less likely to be cited and otherwise used on the Internet and beyond. And, as open resources like Wikipedia grow and grow in importance, this will become more true. This is a potentially powerful force for driving people to publish in a more open way, and, if anything, supporters of openness should be working to amplify this effect. But this deal does the opposite – it significantly dilutes the negative impacts of publishing in Elsevier’s paywalled journals, and thereby almost certainly will help prolong the life of the paywalled journal business model.

I realize that not making this deal would weaken Wikipedia in the short-run. But I am certain it would strengthen it in the long-run by quickening the arrival of a truly open scientific literature, and I think we are all in this for the long-run.

Wikipedia got too little from Elsevier

Even if you accept that this kind of deal has to be made, I think it’s a bad deal. Elsevier got great PR, significant tangible financial benefits, and several clear intangible benefits. In exchange for this, they’ve given away almost nothing. To me this was a missed opportunity related to the framing of this as a “donation”. If you’re asking for a donation, you don’t make demands. But it seems like Wikipedia was in a good position to ask for something that would benefit its readers in a much bigger way, such as Elsevier letting everyone through their paywall when following links from Wikipedia.

I obviously can’t guarantee Elsevier would have agreed to this, and maybe Wikipedia tried to negotiate for more, but it does strike me that Wikipedia undervalued itself with this arrangement.

Will this affect how articles are linked from Wikipedia?

One of the many things I love about Wikipedia is that there is a clear bias in favor of sources that are available for free online to everyone. This is obviously part philosophical – people who put the most time into building Wikipedia are true believers in openness and almost certainly are biased in favor of providing open sources whenever possible. But some of this is also practical. Almost by definition, if you cannot access a source, you are unlikely to cite it (and should not). You can see this effect clearly in academic scientists, who have only a weak bias towards citing open sources because they have access to most papers and don’t think about access when choosing what to cite. I don’t question the commitment of Wikipedians to openness. There are plenty of cases where people cite freely available versions of papers (e.g. preprints) instead of official paywalled versions. I just worry that easy access to paywalled papers will increase the number of times the paywalled version is cited in lieu of others (like free copies in PubMed Central). Obviously, there are ways to mitigate this – bots that check citations and add open ones. But it warrants watching.

And I’m not in any way suggesting that people should systematically reject citing paywalled sources. Sometimes information is fungible – there are many sources that one could cite for a particular fact – but this is obviously not always the case. Clearly for Wikipedia to be successful in the current environment, it has to be based on, and cite, a lot of paywalled sources.

Science journal articles are not like books

Several people have made the comparison between book citations and journal articles. But there are crucial differences. First, there is a real viable alternative to paywalled journals right now, and I would argue that it is in Wikipedia’s interest to support that alternative by not making things too easy for paywalled journals. Unfortunately, the same is not true for books, even academic ones. But even with the generally poor accessibility of books, I wonder if Wikipedians would support a deal with Amazon in which prolific editors got Kindles with free access to all Amazon e-books in exchange for providing links to Amazon when the books were cited (this was suggested by someone on Twitter but I can’t find the link)? I doubt it, yet to me this is almost exactly analogous to this Elsevier deal. In any case, the main point is that the situation with books is really bad, but that isn’t a good reason not to make the situation for journal articles better.

Wikipedia rocks

All that said, I hope this issue is behind us. It was painful to see myself being portrayed as a critic of Wikipedia. I am not. I could not love Wikipedia more than I do. I use it every day. It is one of the best advertisements for openness out there, and I can even see an argument that says that if deals with the devil make Wikipedia better, then this benefits openness far more than it hurts it. So let’s just leave it at that. I’ve enjoyed all the conversation about this issue, and I look forward to doing anything I can to make Wikipedia better and better in the future.

Promoting a positive psychology self-help book with a Wikipedia entry

This edition of Mind the Brain continues an odd and fascinating story of an aggressive promotion of a positive psychology self-help book. In this chapter, I tell how the promotion is being aided by the author’s son creating a laudatory …

The post Promoting a positive psychology self-help book with a Wikipedia entry appeared first on PLOS Blogs Network.