
Research talk:Autoconfirmed article creation trial/Work log/2017-06-30

From Meta, a Wikimedia project coordination wiki

Friday, June 30, 2017


My goal for today is to determine how long the trial will need to run in order to get statistically significant results.

A trial should run for a whole number of weeks because editing activity follows a weekly cycle (workweek vs. weekend). We'll need to look at all newcomers when measuring outcomes because we don't know who *would have* created articles during the trial period.

Assumptions:

  • We'll want to look at measures of productivity and retention
  • The most recent couple of weeks will represent the trial period well

My plan is to work backwards from the standard metrics dashboard (https://analytics.wikimedia.org/dashboards/standard-metrics/#projects=enwiki/metrics=(Beta)%20Monthly%20New%20Editors) to get some baseline rates. Then, using these baseline rates, I'll do a power analysis based on substantial changes in those rates.

Oh wait. It looks like that won't work because the dashboard's data ends in 2016.

OK, time to run some queries. After a bunch of digging, I found some stats that the Analytics team has prepared.

hive (wmf)> SELECT metric, value FROM mediawiki_metrics WHERE wiki_db = "enwiki" and dt = "2017-04-01" AND metric LIKE "monthly%";
OK
metric	value
monthly_new_editors	47620
monthly_new_registered_users	149449
monthly_surviving_new_editors	1626


April has 30 days, so per week we have roughly 149449/(30/7) = 34871 new_registered_users, 47620/(30/7) = 11111 new_editors, and 1626/(30/7) = 379 surviving_new_editors.
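The monthly-to-weekly conversion as a short Python sketch, using the three counts from the query output and assuming every week of April looks like the average week:

```python
# Monthly enwiki counts for April 2017 from the mediawiki_metrics query above.
monthly = {
    "new_registered_users": 149449,
    "new_editors": 47620,
    "surviving_new_editors": 1626,
}
weeks_in_april = 30 / 7  # April has 30 days, i.e. about 4.29 weeks

# Scale each monthly count down to an average week and round to whole users.
weekly = {metric: round(count / weeks_in_april) for metric, count in monthly.items()}
print(weekly)
```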

OK. So that gives me a useful baseline. It looks like roughly 11111/34871 = 31.9% of registered users will make an edit and 379/11111 (equivalently, 1626/47620) = 3.4% of editors will stick around.
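A sketch of the two baseline ratios in Python, using the weekly counts above. Both parts of each ratio have to come from the same period: the ratio is identical whether you use monthly or weekly counts, but dividing the monthly 1626 by the weekly 11111 would overstate survival by a factor of 30/7 (about 4.3).

```python
# Weekly April 2017 counts from the conversion above.
new_registered_users = 34871
new_editors = 11111
surviving_new_editors = 379

edit_rate = new_editors / new_registered_users       # registrations that go on to edit
survival_rate = surviving_new_editors / new_editors  # new editors who survive
print(f"edit rate: {edit_rate:.1%}, survival rate: {survival_rate:.1%}")
```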


Even before I start on the power analysis, I'm pretty sure we're going to see significant differences for a 1% change with this number of observations. Next, I'll plot these values on a fancy chi-squared (χ²) graph. --Halfak (WMF) (talk) 18:54, 30 June 2017 (UTC)
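One way to check that hunch, sketched in Python under assumptions that are mine rather than the log's: treat "1% change" as one percentage point, compare a baseline week against a trial week with the same number of newcomers, and plug the expected counts into a Pearson chi-squared test on the 2×2 table (no continuity correction). The helper names are hypothetical.

```python
import math

def chi2_p_value_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) on [[a, b], [c, d]].
    Returns (statistic, p_value) with 1 degree of freedom."""
    n = a + b + c + d
    statistic = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return statistic, math.erfc(math.sqrt(statistic / 2))

def expected_p_value(baseline_rate, change, n_per_group):
    """Expected p-value for a baseline week vs. a trial week of n_per_group
    newcomers each, if the rate shifts by `change` (assumed absolute, e.g. 0.01)."""
    base_yes = baseline_rate * n_per_group
    trial_yes = (baseline_rate + change) * n_per_group
    _, p_value = chi2_p_value_2x2(base_yes, n_per_group - base_yes,
                                  trial_yes, n_per_group - trial_yes)
    return p_value

# Weekly April 2017 counts: 34871 registrations, 11111 new editors, 379 survivors.
edit_rate_p = expected_p_value(11111 / 34871, 0.01, 34871)
survival_p = expected_p_value(379 / 11111, 0.01, 11111)
print(f"edit rate: p = {edit_rate_p:.2g}, survival rate: p = {survival_p:.2g}")
```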


OK! Time for power analysis plots.

Survival rate power analysis. P values are plotted for a power analysis of the baseline survival rate (surviving new editors/new editors) in English Wikipedia for three change thresholds (1%, 2%, 3%). Vertical lines represent the number of new editors per week from April 2017.
Edit rate power analysis. P values are plotted for a power analysis of the baseline new editor rate (new editors/new registered users) in English Wikipedia for three change thresholds (1%, 2%, 3%). Vertical lines represent the number of new registered users per week from April 2017.
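The data behind curves like these can be approximated by computing the expected p-value over a grid of group sizes. This Python sketch assumes the thresholds are absolute (percentage-point) changes and uses a pooled two-proportion z-test, which gives the same p-value as a 1-degree-of-freedom chi-squared test on the 2×2 table. The grid of sample sizes is hypothetical, apart from the two April 2017 weekly counts.

```python
import math

def expected_p_value(baseline_rate, change, n_per_group):
    """Two-sided p-value of a pooled two-proportion z-test (same as a 1-df
    chi-squared test) for two groups of n_per_group at the expected rates."""
    pooled = baseline_rate + change / 2  # equal group sizes
    std_err = math.sqrt(pooled * (1 - pooled) * 2 / n_per_group)
    z = abs(change) / std_err
    return math.erfc(z / math.sqrt(2))

THRESHOLDS = [0.01, 0.02, 0.03]  # assumed absolute (percentage-point) changes
# Hypothetical grid; 11111 and 34871 are the weekly April 2017 counts.
SAMPLE_SIZES = [1000, 2500, 5000, 11111, 20000, 34871]

def p_value_table(baseline_rate):
    """Map each threshold to its expected p-values across SAMPLE_SIZES."""
    return {change: [expected_p_value(baseline_rate, change, n) for n in SAMPLE_SIZES]
            for change in THRESHOLDS}

for name, rate in [("edit rate", 11111 / 34871), ("survival rate", 379 / 11111)]:
    print(name)
    for change, p_values in p_value_table(rate).items():
        print(f"  {change:+.0%}: " + "  ".join(f"{p:.1e}" for p in p_values))
```

Plotting each row against SAMPLE_SIZES on a log-scaled p-value axis gives one curve per threshold.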

These plots show that, if the trial lasts one week, we should expect enough observations to see a significant difference if the survival rate or edit rate increases or decreases by 1%. If we want to run a *controlled* experiment, we'll need two weeks' worth of observations, since splitting newcomers between trial and control conditions halves the observations in each group. If we want to be *absolutely sure*, we could run the trial for two weeks and expect to get vanishingly small p-values. --Halfak (WMF) (talk) 21:07, 30 June 2017 (UTC)
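A complementary check, swapping in the standard normal-approximation sample-size formula for comparing two proportions instead of reading p-value plots. With an assumed α = 0.05 and 80% power, it estimates how many newcomers each group needs and rounds the trial up to whole weeks, doubling the time when a concurrent control group takes half of each week's newcomers. The function names are hypothetical, and "1%" is again assumed to mean one percentage point.

```python
import math
from statistics import NormalDist

def required_n_per_group(p1, p2, alpha=0.05, power=0.8):
    """Observations per group needed to detect a change from p1 to p2 with a
    two-sided two-proportion z-test (normal approximation, equal group sizes)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    spread = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
              + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2)))
    return math.ceil(spread ** 2 / (p1 - p2) ** 2)

def trial_weeks(n_per_group, weekly_count, controlled=False):
    """Whole weeks needed to collect n_per_group observations per group.
    With a concurrent control, each week's newcomers are split in half."""
    per_group_per_week = weekly_count / 2 if controlled else weekly_count
    return math.ceil(n_per_group / per_group_per_week)

# Weekly April 2017 baselines; "1%" assumed to be one percentage point.
cases = {
    "edit rate": (11111 / 34871, 34871),    # new editors / new registered users
    "survival rate": (379 / 11111, 11111),  # surviving new editors / new editors
}
for name, (baseline, weekly_count) in cases.items():
    n = required_n_per_group(baseline, baseline + 0.01)
    print(f"{name}: {n} per group, "
          f"{trial_weeks(n, weekly_count)} week(s) uncontrolled, "
          f"{trial_weeks(n, weekly_count, controlled=True)} week(s) controlled")
```

With these inputs, both rates come out at one week without a control group and two weeks with one, in line with the reading of the plots.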