Who Does What
Collaboration Patterns on Wikipedia and Their Impact on Article Quality
The English Wikipedia contains almost four million articles and is used by almost 16 million users. Some 250,000 new accounts are created every month, and about 300,000 editors have edited Wikipedia more than 10 times. Roughly the same number, 300,000 editors, edit Wikipedia every month; of these, about 50,000 perform more than five edits and 5,000 perform more than 100. By November 24, 2011, a total of 500 million edits had been made to the English Wikipedia.
While all of these contributions result in an extremely large number of articles, they are not all of uniformly high quality. In fact, Wikipedia now grades the quality of each article using a scale ranging from Featured (the best) to C (the lowest grade). In between these two extremes are A, Good, and B articles, in decreasing order of quality.
The authors’ research used data mining techniques to understand how high quality articles are generated. Instead of relying on the aggregate statistics that previous research has used (the number of unique contributors to an article or the number of edits made to it), they approached the question from another perspective: they collected a large sample of articles of differing quality and extracted every single contribution made to each article from its edit history.
Much of this was unstructured data that had to be transformed into a clean dataset. Based on this, they found that people play seven distinct roles on an article. For example, “starters” initiate an article, while others add references and links to an article (“content justifiers”). Yet others check articles to remove incorrect information (“cleaners”). Moreover, there are specific ways people in these roles implicitly collaborate with each other, and these collaboration patterns can help predict whether an article is likely to be ranked as high quality (i.e., Featured or Good).
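The idea of assigning a role to each editor from their edit history can be sketched as follows. This is a minimal illustration, not the paper’s method: the action vocabulary (`create_article`, `add_reference`, etc.) and the threshold rules are hypothetical stand-ins for the richer taxonomy and data-mining procedure the authors used.

```python
from collections import Counter

def classify_role(actions):
    """Assign a role label from a list of edit actions by one editor.

    `actions` uses a simplified, hypothetical vocabulary such as
    "create_article", "add_reference", "add_link", "add_content",
    and "delete_content"; the thresholds are purely illustrative.
    """
    counts = Counter(actions)
    total = len(actions)
    share = lambda a: counts[a] / total if total else 0.0

    if counts["create_article"]:
        return "starter"                    # initiated the article
    if share("add_reference") + share("add_link") > 0.5:
        return "content justifier"          # mostly adds references/links
    if share("delete_content") > 0.5:
        return "cleaner"                    # mostly removes incorrect content
    return "all-round contributor"          # a balanced mix of actions
```

For instance, an editor whose history is mostly `add_reference` and `add_link` actions would be labeled a “content justifier,” while one with a balanced mix of actions falls through to “all-round contributor.”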
For example, articles with a specific mix of “all-round contributors,” “starters,” and “content justifiers” attain the highest quality. This research is important because it helps develop mechanisms to motivate people to contribute in different ways. It also gives Wikipedia a way to analyze contributions and to automate the quality assessment of articles based on contribution patterns. Currently, quality assessment is a manual and very arduous process given the large volume of articles. Such an analysis of patterns can also be used to gently nudge contributors to transition gradually from being merely “starters” to being “all-round contributors,” which will help raise the quality of articles. A study like this, based on automatically tracked interaction data, also provides impetus for the development of new theories to characterize and explain the implicit collaborations enabled by today’s social media and crowdsourcing technologies.
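Predicting quality from the role mix can be sketched as a two-step pipeline: turn an article’s contributor roles into a proportion vector, then apply a classifier to it. The threshold rule below is an illustrative assumption, not the authors’ fitted model, which applied data-mining classifiers to the full set of role proportions.

```python
from collections import Counter

def role_mix(roles):
    """Proportion of each role among one article's contributors."""
    counts = Counter(roles)
    n = len(roles)
    return {role: count / n for role, count in counts.items()}

def likely_high_quality(mix, min_all_round=0.4, min_justifiers=0.2):
    """Toy rule: enough all-round contributors and content justifiers.

    The role names echo the paper; the thresholds are invented for
    illustration only.
    """
    return (mix.get("all-round contributor", 0.0) >= min_all_round
            and mix.get("content justifier", 0.0) >= min_justifiers)
```

A real system would replace the hand-set thresholds with a model trained on articles whose grades are already known, but the feature representation (the role-proportion vector) is the same.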
Sudha Ram is the McClelland Professor of MIS at the Eller College of Management, University of Arizona. Jun Liu is a recent doctoral graduate from the Eller MIS program.
Published in ACM Transactions on Management Information Systems, 2011.