Research:Onboarding new Wikipedians/Rollout

Contact

Wikimedia Foundation

This page documents a completed research project.

On February 11th, 2014, Extension:GettingStarted was deployed on 29 wikis. It was later extended to the current 30 wikis, including all of the top 10 Wikipedias by pageviews.

The purpose of this study is to measure the scale at which GettingStarted operates (e.g. how many newcomers on Wikimedia Projects received a GettingStarted intervention?) and to get a sense for the impact that the new features have on newcomer behavior.

Research questions

RQ 1: How is GettingStarted being used?

How is GettingStarted being used?
1. How many newly registered users saw each CTA?
2. How many of those editors edit -- through GS or otherwise?
Are GettingStarted edits reverted more often than non-GettingStarted edits?

RQ 2: How has GettingStarted affected newcomer activation and productivity?

How did the proportion of new editors (editor activation) change after GettingStarted was deployed?
How did the proportion of productive new editors (editor productivity) change after GettingStarted was deployed?

Methods

Code repository: https://github.com/halfak/Measuring-the-impact-of-GettingStarted

Deployment wikis

Based on the configuration and the Server admin log, we can determine when GettingStarted was deployed on each wiki.

GettingStarted deployments

wiki	GettingStarted deployment	Suggestions deployed
astwiki	2014-02-11 18:13:00	NA
bswiki	2014-02-11 18:13:00	NA
cawiki	2014-02-11 18:13:00	NA
dawiki	2014-02-11 18:13:00	NA
dewiki	2014-02-11 18:13:00	NA
elwiki	2014-02-11 18:13:00	NA
enwiki	2014-02-11 18:13:00	2014-02-11 18:13:00
eswiki	2014-02-11 18:13:00	2014-02-11 18:13:00
fawiki	2014-02-11 18:13:00	NA
frwiki	2014-02-11 18:13:00	NA
fowiki	2014-02-11 18:13:00	NA
glwiki	2014-02-11 18:13:00	NA
hewiki	2014-02-11 18:13:00	NA
huwiki	2014-02-11 18:13:00	NA
iswiki	2014-02-11 18:13:00	NA
itwiki	2014-02-11 18:13:00	NA
kowiki	2014-02-11 18:13:00	NA
lbwiki	2014-02-11 18:13:00	NA
mkwiki	2014-02-11 18:13:00	NA
mlwiki	2014-02-11 18:13:00	NA
nlwiki	2014-02-11 18:13:00	NA
plwiki	2014-02-11 18:13:00	NA
ptwiki	2014-02-11 18:13:00	NA
ruwiki	2014-02-11 18:13:00	NA
simplewiki	2014-02-11 18:13:00	NA
svwiki	2014-02-11 18:13:00	2014-02-27 00:18:00
viwiki	2014-02-11 18:13:00	NA
ukwiki	2014-02-11 18:13:00	2014-02-11 18:13:00
zhwiki	2014-02-11 18:13:00	2014-02-11 18:13:00
jawiki	2014-02-27 00:18:00	NA

Measuring usage

In order to measure the usage of GettingStarted, we observe and compare the number of newly registered users across Wikimedia projects with the number of users with a recorded impression of GettingStarted (see Schema:GettingStartedRedirectImpression). We also observe the number of edits made via GettingStarted through the application of a change tag: "gettingstarted edit".

Assuming a natural experiment

In order to address RQ 2, we'll be assuming that a natural experiment took place immediately after GettingStarted was deployed. We take advantage of this by comparing metrics of new user activation and productivity before and after deployment. Since the only way to take advantage of GettingStarted's functionality is to be served a CTA immediately after registering an account, there shouldn't be substantial concern about measuring those editors who registered immediately before GettingStarted's deployment.

As opposed to controlled experiments, natural experiments have the potential for confounds to affect inference about causation. A trend that was taking place in a wiki independently of the deployment of GettingStarted will look like an effect of GettingStarted in the analysis. Thus, it's important to consider this potential issue when viewing the results.

Sample periods

Power analysis. The p-value of a $\chi^{2}$ test is plotted by number of observations for differing levels of baseline and change in proportion. A horizontal line is plotted at p=0.05 and a vertical line is plotted at the number of observations to be sampled.

In order to compare new editor fitness before and after deployment, we sampled newly registered users from the two weeks immediately before and after the deployment dates. Figure #Natural experiment sample periods depicts these sample periods visually.

Natural experiment sample periods. A conceptual diagram depicts the sample periods before and after deployment of mw:Extension:GettingStarted.

In order to determine how many observations would need to be sampled, we performed a power analysis for several baseline rates and expected changes. Figure #Power analysis plots the p-value of a Chi-squared test for various levels of baselines and changes. We set the minimum number of observations at 500 since that was the smallest number of observations that would still let us identify significance for large effects. We define "large effects" as twice the observed effect in English Wikipedia for GettingStarted (which ranged from 1.5-3% depending on the metric^[1], so we settled on 5%). 16 wikis had at least 500 newly registered users in the sample periods (es, fr, zh, ru, de, pt, it, fa, nl, pl, vi, sv, uk, ko, hu, he, el). We set the maximum number of observations at 2000 since most changes would appear to be significant at that number of observations and setting an upper bound reduces the processing time necessary.
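The kind of calculation behind such a power analysis can be sketched as follows. This is a minimal illustration, not the code from the project repository: it assumes equal sample sizes in the before and after periods, observed rates that exactly match the baseline and the baseline plus the change, and a Pearson Chi-squared test without continuity correction. The function names and the baseline rates in the loop are hypothetical.

```python
import math


def chi2_p_value(successes_a, n_a, successes_b, n_b):
    """P-value of a Pearson chi-squared test (1 degree of freedom,
    no continuity correction) on a 2x2 table of successes/failures."""
    table = [
        [successes_a, n_a - successes_a],
        [successes_b, n_b - successes_b],
    ]
    row_totals = [n_a, n_b]
    col_totals = [table[0][0] + table[1][0], table[0][1] + table[1][1]]
    total = n_a + n_b
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            if expected == 0:
                return 1.0  # degenerate table: no variation to test
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def expected_p_value(baseline, change, n):
    """P-value we would see if the before rate equals `baseline` and the
    after rate equals `baseline + change`, with n users per period."""
    before = round(baseline * n)
    after = round((baseline + change) * n)
    return chi2_p_value(before, n, after, n)


# Hypothetical baseline rates; 0.05 is the "large effect" from the text.
for baseline in (0.10, 0.25, 0.50):
    for n in (500, 2000):
        p = expected_p_value(baseline, 0.05, n)
        print(f"baseline={baseline:.2f} n={n:4d} p={p:.4f}")
```

Plotting `expected_p_value` over a range of `n` for each baseline and change would give curves of the same general shape as those described in Figure #Power analysis.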

Comparison

Boolean measures

New editor rate (new editors / newly registered users)
Productive new editor rate (productive new editors / newly registered users)
Returning new editor rate (returning new editors / newly registered users)

Differences in proportions between before and after periods are identified using a Chi-squared test.

Scale measures

Revisions in 24h
Productive edits in 24h
Edit sessions in first week
Time spent editing in first week

Differences in expected values between before and after periods are identified using a t-test on log-transformed values.
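As a rough illustration of the scale-measure comparison, the sketch below computes the difference in log means between two periods with a 95% confidence interval. It rests on assumptions that the text does not confirm: values are transformed with log(x + 1) so that zero counts are defined, the standard error is the unpooled (Welch-style) one, and a normal critical value stands in for the t distribution, which is a close approximation only for samples of several hundred users or more. The function name and the sample values are hypothetical.

```python
import math
from statistics import NormalDist, mean, variance


def log_mean_difference(before, after, confidence=0.95):
    """Return (difference, lower, upper) for mean(log(after + 1)) minus
    mean(log(before + 1)), using a normal approximation for the interval."""
    log_before = [math.log1p(x) for x in before]  # assumption: log(x + 1)
    log_after = [math.log1p(x) for x in after]
    diff = mean(log_after) - mean(log_before)
    # Unpooled standard error of the difference between two sample means.
    se = math.sqrt(
        variance(log_before) / len(log_before)
        + variance(log_after) / len(log_after)
    )
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return diff, diff - z * se, diff + z * se


# Hypothetical per-user edit counts in the first 24 hours.
before_edits = [0, 0, 1, 0, 3, 0, 2, 0, 0, 5]
after_edits = [0, 1, 1, 0, 4, 0, 2, 1, 0, 6]
diff, lower, upper = log_mean_difference(before_edits, after_edits)
print(f"difference in log means: {diff:.3f} (95% CI {lower:.3f} to {upper:.3f})")
```

An interval that excludes zero corresponds to a significant difference at the chosen confidence level.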

Results

RQ 1: How is GettingStarted being used?

What proportion of users saw/used a GettingStarted CTA?

Group funnel proportions. A proportional funnel is shown for the flow from newly registered users on all projects to wikis with GettingStarted installed (30 wikis) to making edits with GS.

In order to get a sense for what proportion of newly registered users were affected by the deployment of GettingStarted, we ran a set of queries to count the number of newly registered users we saw across all Wikimedia projects and tracked their activities as they navigated the various funnels that GettingStarted provides. Figure #Group funnel proportions displays the proportion and raw counts of users who made it to each step in the funnel.

Who saw GettingStarted's CTA? The GettingStarted experience is currently only available for desktop users. (TODO: link to design docs for GS like experience on mobile) Of the 336,310 newly registered users who registered during our 30 day period after deployment, 273,169 (81.23%) registered through the desktop interface. 218,968 of these desktop users registered on one of the 30 wikis where GettingStarted was deployed. 143,627 of the desktop users who registered on GettingStarted wikis saw a GettingStarted CTA. In other words:

42.7% of newly registered users across all projects had the opportunity to take advantage of GettingStarted.
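The funnel arithmetic above can be reproduced directly from the reported counts. The sketch below uses only the four counts given in the text; the step labels and the function name are illustrative.

```python
# Counts as reported in the text for the 30 day period after deployment.
funnel = [
    ("newly registered users (all projects)", 336_310),
    ("registered through the desktop interface", 273_169),
    ("registered on a GettingStarted wiki", 218_968),
    ("saw a GettingStarted CTA", 143_627),
]


def funnel_shares(steps):
    """Return (label, count, share of first step, share of previous step)."""
    total = steps[0][1]
    rows = []
    previous = total
    for label, count in steps:
        rows.append((label, count, count / total, count / previous))
        previous = count
    return rows


for label, count, of_total, of_previous in funnel_shares(funnel):
    print(f"{label}: {count:,} "
          f"({of_total:.2%} of all, {of_previous:.2%} of previous step)")
```

The share of all newly registered users in the last row is the 42.7% figure quoted above.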

Which CTAs did they see? Of these users who saw a change to their post-registration experience, the plurality (46.49%) saw the CTA that only asked them if they would like to see suggested tasks for them to perform (see Suggest only CTA). Most often, the "Edit this page" option was not available because the redirect page was a protected article (54.55%) or a page in the Project namespace. The next most common CTA was the combined "Edit this page or Find easy tasks" (see Edit & Suggest CTA). 39.6% of users who saw any CTA saw this one. Finally, 13.91% saw the CTA with only the option to "Edit this page" (see Edit only CTA). These users were predominantly on wikis that lacked suggested tasks (98.9%).

Reverts of GettingStarted edits

Comparison of revert rates.

One of our concerns with tagging edits "via Getting Started edit suggestions" was that it might draw additional attention from Wikipedians and encourage extra scrutiny of edits made through GettingStarted. If GS tagged edits are receiving extra scrutiny, then we'd expect the rate of reverts for these edits to be higher. To check this hypothesis, we gathered all of the 1st edits performed by newcomers who registered during our 30 day period and detected which revisions were reverted within 48 hours.

Figure #Comparison of revert rates plots the difference between the revert rate of 1st edits not made through GettingStarted and the revert rate of 1st edits made through GettingStarted. Note that in all but a few cases, the 95% confidence interval's error bars cross the zero line. This means that there's no significant difference between the revert rate for GettingStarted and non-GettingStarted edits on those wikis. However, three wikis did see significant differences: viwiki and cawiki saw higher revert rates for GS edits, and enwiki saw lower revert rates for GS edits.
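To make the "error bars cross the zero line" criterion concrete, here is a minimal sketch of a difference in revert rates with a 95% confidence interval. It assumes a simple Wald interval for the difference of two proportions and defines the difference as the GettingStarted rate minus the non-GettingStarted rate; the text does not state which interval or sign convention was used. The function names and all counts are hypothetical.

```python
import math
from statistics import NormalDist


def rate_difference_ci(reverted_other, total_other, reverted_gs, total_gs,
                       confidence=0.95):
    """Return (difference, lower, upper) for the GS revert rate minus the
    non-GS revert rate, using a Wald interval (an assumption)."""
    p_other = reverted_other / total_other
    p_gs = reverted_gs / total_gs
    diff = p_gs - p_other
    se = math.sqrt(
        p_other * (1 - p_other) / total_other
        + p_gs * (1 - p_gs) / total_gs
    )
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return diff, diff - z * se, diff + z * se


def is_significant(lower, upper):
    """The difference is significant when the interval excludes zero."""
    return lower > 0 or upper < 0


# Hypothetical counts of reverted 1st edits on a single wiki.
diff, lower, upper = rate_difference_ci(300, 1000, 40, 200)
print(f"difference: {diff:+.3f} (95% CI {lower:+.3f} to {upper:+.3f}), "
      f"significant: {is_significant(lower, upper)}")
```

With few GettingStarted edits on a wiki the interval becomes wide, which is one reason most wikis would show no significant difference.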

It's important to note that, with such a high number of tests at a 95% confidence level, we should expect 1-2 wikis to report a Type I error. With this in mind, the significant differences observed for viwiki and cawiki should be taken with a grain of salt. However, with English Wikipedia, we had such a large number of observations that the result is clearly significant. It appears that, on English Wikipedia, GettingStarted edits are reverted significantly less often than non-GettingStarted edits.

RQ 2: How has GettingStarted affected newcomer activation and productivity?

In order to look for evidence of changes in the activation and productivity due to the introduction of GettingStarted, we used an array of metrics to measure newcomer performance before and after the deployment of GettingStarted.

The figures below plot the difference between metrics before and after the deployment. When the plotted value is above zero, an increase in the metric was observed. Overall, the results fail to demonstrate a clear difference between the before and after states of these wikis.

While some wikis show significant differences under some metrics, this type of statistical error is expected to happen with 95% confidence intervals in about 1 in 20 tests. Here, we see 10 instances of significant results out of 112 tests:

Dewiki showed a significant drop in the rate of new editors.
Plwiki showed a significant increase in the rate of returning new editors.
Eswiki, Itwiki and Plwiki showed a significant increase in the number of productive edits newcomers performed in their first day.
Plwiki saw a significant increase in the number of newcomer edit sessions while Frwiki saw a significant decrease.
Plwiki and Ukwiki saw a significant increase in the amount of time spent editing while Frwiki saw a significant decrease.
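The multiple-comparisons reasoning can be sketched numerically. The example below computes how many significant results we would expect by chance alone among 112 tests at the 0.05 level, and the probability of seeing at least 10. It assumes the tests are independent, which is a simplification: the metrics are computed on the same users and are likely correlated. The function names are illustrative.

```python
import math


def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of Type I errors if no real effect exists."""
    return n_tests * alpha


def prob_at_least(k, n_tests, alpha=0.05):
    """Probability of k or more false positives among n_tests
    independent tests (binomial upper tail)."""
    return sum(
        math.comb(n_tests, i) * alpha ** i * (1 - alpha) ** (n_tests - i)
        for i in range(k, n_tests + 1)
    )


n_tests, observed = 112, 10
print(f"expected by chance: {expected_false_positives(n_tests):.1f}")
print(f"P(at least {observed} by chance): {prob_at_least(observed, n_tests):.3f}")
```

Because of the independence assumption, the tail probability should be read as a rough guide rather than a formal test.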

Given the lack of a clear trend across wikis and the lack of an obvious correlation between the availability of suggested tasks in the user experience and performance outcomes, it's not clear from these results that GettingStarted is having a measurable effect in the short term. Future work may reduce noise and potential confounds by running a controlled experiment on these wikis.

Boolean measures

Difference in new editor rates (ns0). The difference in the proportions of new editors (main NS only) before and after the deployment of mw:Extension:GettingStarted is plotted with 95% CI error bars.

Difference in productive editor rates. The difference in the proportions of productive new editors before and after the deployment of mw:Extension:GettingStarted is plotted with 95% CI error bars.

Difference in returning editor rates. The difference in the proportions of returning editors before and after the deployment of mw:Extension:GettingStarted is plotted with 95% CI error bars.

Scale measures

Difference in 24h article edits. The difference in the log mean article revisions saved in newcomers' first day before and after the deployment of mw:Extension:GettingStarted is plotted with 95% CI error bars.

Difference in 24h productive edits. The difference in the log mean productive edits saved in newcomers' first day before and after the deployment of mw:Extension:GettingStarted is plotted with 95% CI error bars.

Difference in edit sessions (first week). The difference in the log mean number of edit sessions in newcomers' first week before and after the deployment of mw:Extension:GettingStarted is plotted with 95% CI error bars.

Difference in time spent editing (first week). The difference in the log mean approximate time spent editing in newcomers' first week before and after the deployment of mw:Extension:GettingStarted is plotted with 95% CI error bars.

References

[1] Research:OB4