Monday, April 14, 2008

Who edits Wikipedia?

In January 2007, Jimmy Wales said "there is a tight set of volunteers around 650 people who do nearly half of all the editing work" at the (English) Wikipedia. Where do those numbers come from, and what do we actually know about editors?

Unfortunately, the comprehensive page of statistics for the English Wikipedia has nothing beyond October 2006. But we do know that at the end of October 2006, 152,000 registered editors had done 10 or more edits since registering. Of these, only 8,000 were new in October to the "10+" club; the other 144,000 had reached that (cumulative) level prior to October.

The October 2006 statistics also show a few things about editing activity that month: 43,000 registered editors did more than 5 edits; of these, 4,300 editors did more than 100 edits that month. In an interview published in December 2005, Wales referred to the second group as "very active editors".

Then there is the matter of not all edits being equal. A research paper using data through late 2006 found (page 5) that "editors with the top 0.1% of edits (about 4,200 users) have contributed over 40% of Wikipedia’s value", where "value" is measured by persistent words in Wikipedia articles, weighting these by the number of times that each specific article was viewed. This measurement explicitly treats the fighting of vandalism as not contributing any value; it also ignores edits to anything other than articles themselves ("mainspace"). Still, it serves to point out that some edits are more valuable than others.

We do know other things about edits: a sampling of article edits in late 2007 found there were about 130,000 such edits per day (that's about 90 per minute), with roughly half by registered editors other than admins and bots, about a third by anonymous (IP) editors, 8 percent by bots, and 7 percent by admins. (For the last, that doesn't necessarily mean administrative actions; some if not most admins find time to do regular editing as well as admin work.)
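As a rough check on those proportions, here is a small Python sketch that turns the quoted shares into approximate daily counts. The shares are the post's own rounded figures ("roughly half", "about a third"), so they are not expected to add up to exactly 100 percent, and the variable names are just illustrative.

```python
# Back-of-the-envelope split of the late-2007 sample of article edits.
EDITS_PER_DAY = 130_000          # approximate figure quoted in the post
MINUTES_PER_DAY = 24 * 60

# Shares as quoted in the post. "Roughly half" and "about a third" are
# approximations, so these values need not sum to exactly 1.
shares = {
    "registered (non-admin, non-bot)": 0.50,
    "anonymous (IP)": 1 / 3,
    "bots": 0.08,
    "admins": 0.07,
}

edits_per_minute = EDITS_PER_DAY / MINUTES_PER_DAY
daily_edits = {group: round(EDITS_PER_DAY * share) for group, share in shares.items()}

print(f"edits per minute: {edits_per_minute:.0f}")
for group, count in daily_edits.items():
    print(f"{group}: ~{count:,} edits/day")
```

On these assumptions, 130,000 edits per day works out to roughly 90 per minute, and the anonymous share comes to a bit over 43,000 edits per day.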

We also have some data on vandalism; in particular, we know that the rate of vandalism is increasing. The research paper mentioned previously (see particularly its figure 7) found that an exponentially increasing percentage of page views in the period starting in late 2002 were of "damaged" articles. The good news here is threefold: first, the authors estimated that 42% of these damaged versions had no more than one subsequent view (as likely as not by the person who reverted the damaging edit); second, that the probability was well under 1% that any given page view would be of a damaged article (so visible vandalism is relatively rare); and third, that the introduction of vandalism-fighting bots in 2006 significantly slowed (but did not stop) the increase in the percentage of page views that were of damaged articles.

A second source showing increasing vandalism is that sampling of article edits, done in late 2007. It found that the revert rate (edits that were reverted, plus edits that did the revert) had risen significantly: 6% of all article edits in 2004, 11% in 2005, 14% in 2006, and 16% in 2007 (through October). The last figure means that in 2007, about one in twelve edits was vandalism, and another one in twelve edits was to revert that vandalism: limited gratification for vandals, given the speed with which most vandalism is reverted, and a tremendous waste of time for editors. At the extreme, this can result in a situation described in March 2008 by one of the editors active in the Professional Wrestling WikiProject: "Most of our work is focused on reverting vandalism, as most active wrestlers are targets of daily IP and new-user vandalism."
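The "one in twelve" figure follows from splitting the revert rate in half. The Python sketch below spells that out, under two simplifying assumptions that the post makes implicitly: each reverted edit is matched by exactly one reverting edit, and every reverted edit counts as vandalism. The function name is hypothetical.

```python
# Revert rate by year, as quoted in the post (2007 is through October).
revert_rate_by_year = {2004: 0.06, 2005: 0.11, 2006: 0.14, 2007: 0.16}


def vandalism_share(revert_rate):
    """Share of all edits that were reverted.

    Assumes the revert rate splits evenly between the edits that were
    reverted and the edits that did the reverting.
    """
    return revert_rate / 2


for year, rate in revert_rate_by_year.items():
    share = vandalism_share(rate)
    print(f"{year}: {share:.1%} of edits reverted, about 1 in {1 / share:.1f}")
```

For 2007 this gives 8% of edits, or about one in 12.5, which the post rounds to one in twelve.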

Not surprisingly, the late 2007 sampling of edits shows that anonymous IP editors are the most likely to be reverted, by far: about a fifth of their edits are reverted, compared to 5% for registered editors and 1% for bots. To put that into perspective: in 2006 and 2007, the number of article edits by anonymous IP editors ranged from 30,000 to 50,000 edits per day, with registered editors, as a group (excluding bots and admins), doing roughly 50% more edits per day than IP editors.
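Combining those two sets of numbers gives a rough sense of how many edits get reverted each day. The Python sketch below is only an estimate: it assumes the late-2007 revert fractions apply across the whole 2006 to 2007 range of daily volumes, and that the 5% figure for registered editors applies to the non-admin, non-bot group. Neither assumption is stated in the post, and the helper name is hypothetical.

```python
# Daily article edits by anonymous IP editors in 2006-2007 (low, high).
IP_EDITS_PER_DAY = (30_000, 50_000)
# Registered editors (excluding bots and admins) did roughly 50% more.
REGISTERED_MULTIPLIER = 1.5

# Fractions of edits reverted, from the late-2007 sample.
IP_REVERT_FRACTION = 0.20          # "about a fifth"
REGISTERED_REVERT_FRACTION = 0.05  # assumed to apply to the non-admin group


def reverted_range(edit_range, fraction):
    """Apply a revert fraction to a (low, high) range of daily edits."""
    low, high = edit_range
    return round(low * fraction), round(high * fraction)


registered_edits_per_day = tuple(
    round(n * REGISTERED_MULTIPLIER) for n in IP_EDITS_PER_DAY
)

ip_reverted = reverted_range(IP_EDITS_PER_DAY, IP_REVERT_FRACTION)
registered_reverted = reverted_range(
    registered_edits_per_day, REGISTERED_REVERT_FRACTION
)

print(f"IP edits reverted per day: {ip_reverted[0]:,} to {ip_reverted[1]:,}")
print(
    "Registered edits reverted per day: "
    f"{registered_reverted[0]:,} to {registered_reverted[1]:,}"
)
```

On these assumptions, IP editors account for well over twice as many reverted edits per day as registered editors, despite making fewer edits overall.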

To wrap up this post (there are more statistics available, but that's for another time), one of the curiosities of the English Wikipedia is the huge number of registered editors, of whom about two-thirds have never done even a single edit. New user registration has on occasion been in excess of 10,000 per day, and the total number of such registrations is in excess of 7 million (there are almost 6.9 million active registered accounts as of April 14, 2008, but a newly registered editor will get a userid higher than 7,000,000, meaning that over a hundred thousand accounts have been effectively deleted, for example through name usurpation).

Given the registration figures, one thought that occurs to me every now and again is that if the English Wikipedia charged five dollars (U.S.) for a new account, as does, then even if registered accounts dropped by 90%, to only 600 or so a day, the resulting revenue would still pay for more than 20% of the Wikimedia Foundation's current budget. And there certainly would be much less vandalism by newly registered editors; $5 isn't much, but it's an effective barrier for someone who knows that a half-dozen or so vandalizing edits will result in an indefinite block of further editing.
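The revenue arithmetic is easy to check. The Python sketch below uses only the figures in the paragraph above (600 paid registrations a day at five dollars each); the post does not state the Foundation's budget, so the last line only shows the largest budget for which the "more than 20%" claim would hold.

```python
# Hypothetical paid-registration scenario from the post.
PAID_REGISTRATIONS_PER_DAY = 600
FEE_USD = 5
DAYS_PER_YEAR = 365

annual_revenue_usd = PAID_REGISTRATIONS_PER_DAY * FEE_USD * DAYS_PER_YEAR

# The post says this would cover "more than 20%" of the budget. The budget
# itself is not given, so compute the ceiling implied by that claim.
BUDGET_SHARE_PERCENT = 20
implied_budget_ceiling_usd = annual_revenue_usd * 100 / BUDGET_SHARE_PERCENT

print(f"annual revenue: ${annual_revenue_usd:,}")
print(f"claim holds for budgets below: ${implied_budget_ceiling_usd:,.0f}")
```

That is a little under $1.1 million a year, which covers more than a fifth of any annual budget below roughly $5.5 million.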
