Research talk:Onboarding new Wikipedians/Rollout/Work log/2014-03-20
Add topicThursday, March 20th
[edit]First thing first, I need to move my analysis from yesterday to the main page. Then I need to pick up the analysis of the first month of GettingStarted. My primary goal with the first 30 days analysis is to figure out the scale at which GettingStarted operates. It will be very similar to the two weeks funnel analysis that I performed earlier, but I'll focus on keeping it simple and generating cross-wiki comparisons. OK. To main document. --Halfak (WMF) (talk) 14:04, 20 March 2014 (UTC)
I just added a bunch of content and fleshed out some of the methods section. Also this figure explaining the natural experiment:
--Halfak (WMF) (talk) 19:06, 20 March 2014 (UTC)
Now it's time to think about what will help us best understand how many newcomers GS is affecting.
I want to know:
- How many newly registered users showed up on all projects in the month post-deployment?
- How of those users registered via desktop?
- How many of those users registered on a GS wiki? (This will be tricky because jawiki came late)
- How many of those users saw a CTA and which one?
- No CTA
- Suggest only
- Edit only
- Edit or suggest
- How many of those users made an article edit in the first 24h
First things first. How many users registered cross-projects? We can use Schema:ServerSideAccountCreation to figure that out.
SET @deployment = "20140211181300"; SELECT COUNT(*) FROM ( SELECT DISTINCT wiki, event_userId as user_id FROM ServerSideAccountCreation_5487345 WHERE event_isSelfMade AND timestamp BETWEEN @deployment AND DATE_FORMAT(DATE_ADD(@deployment, INTERVAL 30 DAY), "%Y%m%d%H%i%S") ) AS uniques +----------+ | COUNT(*) | +----------+ | 336310 | +----------+ 1 row in set (1 min 11.51 sec)
OK. 336,310 newly registered users cross-project.
> SELECT -> not mobile as desktop, -> COUNT(*) -> FROM ( -> SELECT -> wiki, -> event_userId as user_id, -> event_displayMobile as mobile -> FROM ServerSideAccountCreation_5487345 -> WHERE -> event_isSelfMade AND -> timestamp BETWEEN -> @deployment AND -> DATE_FORMAT(DATE_ADD(@deployment, INTERVAL 30 DAY), "%Y%m%d%H%i%S") -> GROUP BY 1,2 -> ) AS uniques -> GROUP BY 1; +---------+----------+ | desktop | COUNT(*) | +---------+----------+ | 0 | 63141 | | 1 | 273169 | +---------+----------+ 2 rows in set (27.24 sec)
273169/336310 = 0.8123 or 81.23% desktop user registrations.
> SELECT -> wiki IN ("astwiki", "bswiki", "cawiki", "dawiki", "dewiki", "elwiki", -> "enwiki", "eswiki", "fawiki", "frwiki", "fowiki", "glwiki", -> "hewiki", "huwiki", "iswiki", "itwiki", "kowiki", "lbwiki", -> "mkwiki", "mlwiki", "nlwiki", "plwiki", "ptwiki", "ruwiki", -> "simplewiki", "svwiki", "viwiki", "ukwiki", "zhwiki", "jawiki") AS gs_wiki, -> COUNT(*) -> FROM ( -> SELECT DISTINCT -> wiki, -> event_userId as user_id -> FROM ServerSideAccountCreation_5487345 -> WHERE -> event_isSelfMade AND -> NOT event_displayMobile AND -> timestamp BETWEEN -> @deployment AND -> DATE_FORMAT(DATE_ADD(@deployment, INTERVAL 30 DAY), "%Y%m%d%H%i%S") -> ) AS uniques -> GROUP BY 1; +---------+----------+ | gs_wiki | COUNT(*) | +---------+----------+ | 0 | 54201 | | 1 | 218968 | +---------+----------+ 2 rows in set (20.88 sec)
That's 218968/273169 = 0.8016 or 80.16% of desktop users registered on a GettingStarted wiki.
Now, I'm going to assume that users saw only one version of the CTA. This isn't necessarily true. I'm also going to let jawiki's missing registrations due to the two week delay slip, which will be OK just so long as we don't try to draw proportions across this point in the funnel.
CREATE TEMPORARY TABLE staging.month_gs_potentials SELECT ssac.wiki, ssac.event_userId AS user_id FROM ServerSideAccountCreation_5487345 ssac WHERE event_isSelfMade AND NOT event_displayMobile AND ssac.timestamp BETWEEN @deployment AND DATE_FORMAT(DATE_ADD(@deployment, INTERVAL 30 DAY), "%Y%m%d%H%i%S") AND ssac.wiki IN ("astwiki", "bswiki", "cawiki", "dawiki", "dewiki", "elwiki", "enwiki", "eswiki", "fawiki", "frwiki", "fowiki", "glwiki", "hewiki", "huwiki", "iswiki", "itwiki", "kowiki", "lbwiki", "mkwiki", "mlwiki", "nlwiki", "plwiki", "ptwiki", "ruwiki", "simplewiki", "svwiki", "viwiki", "ukwiki", "zhwiki", "jawiki") GROUP BY 1,2; CREATE UNIQUE INDEX wiki_user ON staging.month_gs_potentials (wiki, user_id); CREATE TEMPORARY TABLE staging.month_gs_impressions SELECT wiki, event_userId AS user_id, event_ctaType AS cta_type FROM GettingStartedRedirectImpression_7355552 WHERE timestamp BETWEEN @deployment AND DATE_FORMAT(DATE_ADD(@deployment, INTERVAL 30 DAY), "%Y%m%d%H%i%S") GROUP BY 1,2; CREATE UNIQUE INDEX wiki_user ON staging.month_gs_impressions (wiki, user_id); +---------------------------+----------+ | cta_type | COUNT(*) | +---------------------------+----------+ | NULL | 75341 | | edit current | 19982 | | edit current or suggested | 56878 | | suggested | 66767 | +---------------------------+----------+ 4 rows in set (2.00 sec)
- No CTA: 75341/218968 = 0.3441 or 34.41%
- Just edit: 19982/218968 = 0.0913 or 9.13%
- Just suggest: 66767/218968 = 0.3049 or 30.49%
- Both edit & suggest: 56878/218968 = 0.2598 or 25.98%
That's all I have time for today. --Halfak (WMF) (talk) 21:06, 20 March 2014 (UTC)