
Research:New page reviewer impact analysis/Number of new page patrollers

From Meta, a Wikimedia project coordination wiki

How has the implementation of the New Page Reviewer Right impacted the number of people doing new page reviews?

The new page reviewer right restricted the ability to review new pages to users who have been granted the right. Here we use a simple metric, the number of users performing new page patrol each month, to see how participation changed around the implementation of the user right in November 2016. The workflow used to obtain this metric is described below.

Getting data


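The number of reviews performed by each user in each month was obtained by running the SQL query below on Quarry. Each result row corresponds to one user in one month, so the number of rows for a month gives the number of users doing page reviews in that month.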

USE enwiki_p;
SELECT
    EXTRACT(YEAR FROM DATE_FORMAT(log_timestamp, '%Y%m%d%H%i%s')) AS `year`,
    EXTRACT(MONTH FROM DATE_FORMAT(log_timestamp, '%Y%m%d%H%i%s')) AS `month`,
    log_user,
    COUNT(*) AS reviews_performed
FROM logging_logindex
WHERE log_type = 'pagetriage-curation'
    AND log_timestamp BETWEEN 20151101000000 AND 20170801000000
GROUP BY `year`, `month`, log_user
ORDER BY `year` ASC, `month` ASC;
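The grouping performed by the query can be sketched in plain Python. This is only an illustration of the aggregation, not part of the original workflow: the rows in `sample_log` are made up, and `reviews_per_user_month` is a hypothetical helper name.

```python
from collections import Counter
from datetime import datetime

# Hypothetical log rows: (log_timestamp, log_user). All values are made up.
sample_log = [
    ("20161030120000", 101),
    ("20161101080000", 101),
    ("20161115093000", 101),
    ("20161120170000", 202),
    ("20161201000500", 202),
]


def reviews_per_user_month(rows):
    """Count rows per (year, month, user), like the GROUP BY in the query."""
    counts = Counter()
    for timestamp, user in rows:
        # MediaWiki log timestamps use the YYYYMMDDHHMMSS layout
        ts = datetime.strptime(timestamp, "%Y%m%d%H%M%S")
        counts[(ts.year, ts.month, user)] += 1
    # Sort by year, month and user (the query itself only orders by year and month)
    return sorted(counts.items())


result = reviews_per_user_month(sample_log)
for (year, month, user), reviews_performed in result:
    print(year, month, user, reviews_performed)
```

Each printed line corresponds to one row of the query result: one user in one month, with the number of curation log entries for that user.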

Parsing dataset


The downloaded dataset was then parsed with the Python script below, which writes the per-month figures as wikitable rows and generates the graph:

dataset parsing
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

review_usersset = 'quarry-20824-users-doing-page-reviews-run196795.tsv'
col = 'log_user'
df = pd.read_csv(review_usersset, delimiter='\t')
# get total years to iterate on
years = df['year'].unique()
review_users = np.array([])
avg_reviews = np.array([])
for y in years:
    df_tmp = df[df['year'] == y]
    # Get unique months in the year
    months = df_tmp['month'].unique()
    for m in months:
        # Each row is one user in one month, so the row count is the number of reviewers
        df_month = df_tmp[df_tmp['month'] == m]
        # Add per month review users to the array
        review_users = np.append(review_users, df_month[col].count())
        # Add per month average reviews to the array
        avg_reviews = np.append(avg_reviews, df_month['reviews_performed'].mean())

# Generate year-months for x-axis (first day of each month, starting at the query's first month)
months = pd.date_range('2015-11-01', periods=review_users.shape[0], freq='MS')
# Write the per-month figures as wikitable rows
with open('reviewers_parser.wiki', 'w') as f:
    for i, m in enumerate(months):
        f.write('|-\n|{:%Y-%m}\n|{}\n|{}\n'.format(m, review_users[i], avg_reviews[i]))

plt.plot(months, review_users, label="users doing review")
plt.plot(months, avg_reviews, label="mean reviews per user that month")
plt.ylabel('Number of reviewers / mean reviews per reviewer')
plt.xlabel('Months')
# Mark the month in which the user right was implemented
npp_date = pd.Timestamp('2016-11-01')
plt.axvline(npp_date, color='b', linestyle='dashed', linewidth=2, label="NPP right implementation")
plt.text(npp_date, plt.gca().get_ylim()[1] + 5, 'NPP user right implementation', ha='center', va='center')
plt.legend()
plt.show()
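
The per-month aggregation in the script can also be written as a single pandas `groupby`. The following is a minimal sketch rather than the script that produced the published figures: it runs on a small made-up sample, and assumes only that the column names match those returned by the query.

```python
import io

import pandas as pd

# Made-up sample in the same tab-separated layout as the Quarry export
sample_tsv = (
    "year\tmonth\tlog_user\treviews_performed\n"
    "2016\t10\t101\t40\n"
    "2016\t10\t202\t60\n"
    "2016\t11\t101\t30\n"
    "2016\t11\t202\t10\n"
    "2016\t11\t303\t20\n"
)
sample = pd.read_csv(io.StringIO(sample_tsv), delimiter="\t")

# One row per user per month, so the number of distinct users in a month
# is the number of reviewers in that month
monthly = sample.groupby(["year", "month"]).agg(
    reviewers=("log_user", "nunique"),
    avg_reviews=("reviews_performed", "mean"),
)
print(monthly)
```

Grouping on both `year` and `month` avoids the nested loops and keeps the reviewer count as an integer.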


Results


The number of users doing New Page Patrol decreased over the period studied, as shown by the plot.

Number of users performing new page patrol

Some useful observations can be made:

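  • The number of users performing new page patrol has decreased, from between 204 and 241 per month before November 2016 to between 151 and 179 per month from December 2016 onward.
  • The number of users doing NPP dropped sharply just after the NPP right was implemented, from 276 in November 2016 to 176 in December 2016.
  • The average number of reviews per user in each month fluctuated between about 50 and 107 before November 2016, then rose to between about 91 and 137 from December 2016 onward. This suggests that users holding the New Page Patrol right have each been doing more reviews than reviewers did before.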


Dataset

Year-Month    Number of reviewers    Average reviews per user
2015-11       232.0                  88.84913793103448
2015-12       216.0                  87.39814814814815
2016-01       224.0                  93.00446428571429
2016-02       205.0                  89.82926829268293
2016-03       204.0                  99.6029411764706
2016-04       220.0                  107.11363636363636
2016-05       236.0                  95.02542372881356
2016-06       212.0                  51.320754716981135
2016-07       205.0                  63.765853658536585
2016-08       241.0                  50.024896265560166
2016-09       240.0                  57.40833333333333
2016-10       225.0                  98.72444444444444
2016-11       276.0                  56.97463768115942
2016-12       176.0                  90.57954545454545
2017-01       151.0                  111.51655629139073
2017-02       179.0                  121.50837988826815
2017-03       155.0                  109.14838709677419
2017-04       151.0                  137.20529801324503
2017-05       171.0                  126.39766081871345
2017-06       162.0                  121.44444444444444
2017-07       170.0                  126.55882352941177
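
The change in the number of reviewers can be quantified directly from the table. The sketch below copies the monthly reviewer counts from the table and compares the months before and after the rollout. Excluding November 2016 itself from both groups is an assumption made here for illustration, not something the analysis above specifies.

```python
from statistics import mean

# Monthly reviewer counts copied from the table above
reviewers = {
    "2015-11": 232, "2015-12": 216, "2016-01": 224, "2016-02": 205,
    "2016-03": 204, "2016-04": 220, "2016-05": 236, "2016-06": 212,
    "2016-07": 205, "2016-08": 241, "2016-09": 240, "2016-10": 225,
    "2016-11": 276, "2016-12": 176, "2017-01": 151, "2017-02": 179,
    "2017-03": 155, "2017-04": 151, "2017-05": 171, "2017-06": 162,
    "2017-07": 170,
}

ROLLOUT = "2016-11"  # month of the user right implementation (left out of both groups)

# "YYYY-MM" strings compare in chronological order
before = [count for month, count in reviewers.items() if month < ROLLOUT]
after = [count for month, count in reviewers.items() if month > ROLLOUT]

mean_before = mean(before)
mean_after = mean(after)
relative_change = (mean_after - mean_before) / mean_before

print(f"mean reviewers per month before rollout: {mean_before:.1f}")
print(f"mean reviewers per month after rollout:  {mean_after:.1f}")
print(f"relative change: {relative_change:.1%}")
```

On these figures the monthly average falls from about 221.7 reviewers before the rollout to about 164.4 after it, a drop of roughly 26%.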