Research talk:Newsletter/2015/April

See also the talk page of the Signpost version

Re: Misalignment Between Supply and Demand of Quality Content in Peer Production Communities

I did not read the article when this review was published, but I have now managed to, and I kept some notes which I may as well post here.

The article has major issues, mostly related to its use of WikiProject assessments. In particular, the dataset is restricted to assessed articles, so it is impossible to tell how representative it is, except for the Russian Wikipedia, where it's clearly insufficient (the authors improve their usage of assessments in cscw2015-improvementprojects.pdf). They also fail to recognise that the requirements for each class differ depending on the topic (e.g. fungi vs. countries). Assessing an article is in itself "spent effort" (sometimes a way to claim ownership), which is not considered.
The limitations of "all pageviews are equal" are also understated: the authors assume that visitors of high-demand articles want a "featured article" thing, i.e. almost a book on the topic, while maybe they are just using the article (e.g. on a country) as a portal/index to reach specific articles or information.
The interpretation of the collected data is dubious. The demand classes are arbitrary (determined by how many articles there are in the corresponding assessment class). 50% of the high demand is due to a temporary surge. A high-demand article with "good article" status is considered "insufficient quality", while by definition GA means the quality is sufficient. It would be more reasonable to say we're failing users of high-demand articles if they are C class or less, and in general perhaps if the quality class is less than half of the demand class; this would give ~50k "insufficient quality" articles instead of the ~700k found by the authors.
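The alternative threshold above can be sketched in a few lines. This is only an illustration under assumptions that are not in the paper or in these notes: the classes are ranked Stub < Start < C < B < GA < A < FA on a 1 to 7 scale, demand classes use the same labels, and "less than half" is read as a comparison of those ranks. The function name `insufficient_quality` is hypothetical.

```python
# Assumed ordinal scale for assessment classes, lowest to highest.
CLASSES = ["Stub", "Start", "C", "B", "GA", "A", "FA"]
RANK = {name: i + 1 for i, name in enumerate(CLASSES)}


def insufficient_quality(quality: str, demand: str) -> bool:
    """Flag an article only if its quality rank is below half its demand rank.

    This is one possible reading of the proposed rule, not the authors' measure.
    """
    return RANK[quality] < RANK[demand] / 2


# Under this reading, the top demand class (rank 7) flags ranks 1 to 3,
# i.e. "C class or less", while a GA or B article is not flagged.
print(insufficient_quality("GA", "FA"))
print(insufficient_quality("C", "FA"))
```

With a seven-step scale the general "half" rule and the "C class or less" rule for the highest demand class coincide, which is why a single comparison is enough here.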
The authors also incorrectly assume that quality-demand misalignment is a consequence of a suboptimal distribution of effort (editors focusing on low-demand topics and ignoring high-demand topics), while it can very well be the opposite. For instance, a surge in interest for a topic could make it high demand and attract many recentist contributions, making it then fall to C class due to the criterion "contains much irrelevant material", i.e. for the reason opposite to what the authors speculate. Moreover, this is "extra effort" which ultimately results in "insufficient quality" a month later when the trivia is no longer interesting; hence "insufficient quality" assessed now doesn't tell us anything about whether the demand was met X months ago.
Pursuing a better alignment, as measured by the authors, would not improve the project. For instance, it would be easy to improve the numbers presented by expelling low-demand articles from featured articles, because this would reduce the size of the "high demand" class and hence the number of articles in said class which have "insufficient quality". However, that wouldn't improve quality at all. Hence it's not clear what the authors are measuring.
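A toy sketch of the effect described above, under loud assumptions: only two classes are used (featured vs. everything else), the top demand class is made as large as the number of featured articles, and any non-featured article in that class counts as "insufficient quality". The article names, pageview numbers and the helper `count_flagged` are invented for illustration and are not taken from the paper.

```python
def count_flagged(articles):
    """Count non-featured articles that fall in the top demand class.

    articles: list of (name, pageviews, is_featured) tuples.
    The top demand class has as many slots as there are featured articles.
    """
    n_featured = sum(1 for _, _, is_fa in articles if is_fa)
    by_demand = sorted(articles, key=lambda a: a[1], reverse=True)
    top_demand = by_demand[:n_featured]
    return sum(1 for _, _, is_fa in top_demand if not is_fa)


# Invented data: three featured articles, two of them with very low demand.
before = [
    ("a", 1000, False),
    ("b", 900, False),
    ("c", 800, True),
    ("f", 700, False),
    ("d", 10, True),
    ("e", 5, True),
]

# Strip featured status from the low-demand articles "d" and "e";
# no article's actual content changes.
after = [(name, views, is_fa and views >= 100) for name, views, is_fa in before]

print(count_flagged(before), count_flagged(after))
```

In this toy setup the flagged count drops after the demotion only because the top demand class shrinks, not because any high-demand article got better.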

--Nemo 12:40, 26 September 2015 (UTC)