Research talk:Anonymous editor acquisition/Signup CTA experiment/Work log/2014-05-29
Add topicThursday, May 29th
[edit]OK. So the analysis I ran [Research_talk:Anonymous_editor_acquisition/Signup_CTA_experiment/Work_log/2014-05-28|yesterday]] has some substantial issues. For example, I was counting the number of anons who showed up before the experiment even began. In order to make a good comparison between the conditions, I need to limit my observations to token'd users who had an opportunity to be involved in the experiment. Luckily, there are a few events that were only active during the experiment and one that specifically represents the beginning of the flow of users through the experiment funnel: Schema:SignupExpPageLinkClick
So, I think that all I need to do is run the last query and limit the set of users to those with at least one page link click.
SELECT
wiki,
bucket,
IF(first_user_id IS NULL,
"pure anon",
IF(first_user_registration IS NULL OR first_user_registration <= 20140502000000,
"old user",
IF(first_user_registration <= 20140519180800,
"tracked user",
"experiment user"))) AS editor_class,
COUNT(*) AS tokens
FROM staging.token_info
WHERE link_clicks > 0
GROUP BY 1,2,3;
... bah. Hold the press. I'm not getting any link clicks for users in the control condition. This is wrong.
> SELECT -> wiki, -> IF(ORD(RIGHT(event_token, 1)) <= ORD("J"), "pre-edit", IF(ORD(RIGHT(event_token, 1)) <= ORD("d"), "post-edit", "control")) AS bucket, -> CONCAT("link click ", event_link) AS event, -> count(*) -> FROM -> log.SignupExpPageLinkClick_8101692 -> WHERE wiki IN ("enwiki", "dewiki", "itwiki", "frwiki") -> AND event_token IS NOT NULL -> GROUP BY 1,2,3; +--------+-----------+---------------------------+----------+ | wiki | bucket | event | count(*) | +--------+-----------+---------------------------+----------+ | dewiki | control | link click create account | 5252 | | dewiki | control | link click edit page | 16729 | | dewiki | control | link click edit section | 37069 | | dewiki | post-edit | link click create account | 4674 | | dewiki | post-edit | link click edit page | 14299 | | dewiki | post-edit | link click edit section | 33477 | | dewiki | pre-edit | link click create account | 4779 | | dewiki | pre-edit | link click edit page | 13423 | | dewiki | pre-edit | link click edit section | 32620 | | enwiki | control | link click create account | 35746 | | enwiki | control | link click edit page | 111620 | | enwiki | control | link click edit section | 175675 | | enwiki | post-edit | link click create account | 33119 | | enwiki | post-edit | link click edit page | 100428 | | enwiki | post-edit | link click edit section | 155819 | | enwiki | pre-edit | link click create account | 32515 | | enwiki | pre-edit | link click edit page | 107158 | | enwiki | pre-edit | link click edit section | 162989 | | frwiki | control | link click create account | 3804 | | frwiki | control | link click edit page | 17112 | | frwiki | control | link click edit section | 25838 | | frwiki | post-edit | link click create account | 3636 | | frwiki | post-edit | link click edit page | 15940 | | frwiki | post-edit | link click edit section | 25340 | | frwiki | pre-edit | link click create account | 3571 | | frwiki | pre-edit | link click edit page | 17418 | | frwiki | pre-edit | link click edit section | 25642 | | itwiki | control | link click create account | 1282 | | itwiki | control | link click edit page | 10943 | | itwiki | control | link click edit section | 21152 | | itwiki | post-edit | link click create account | 1235 | | itwiki | post-edit | link click edit page | 9824 | | itwiki | post-edit | link click edit section | 18759 | | itwiki | pre-edit | link click create account | 1235 | | itwiki | pre-edit | link click edit page | 10798 | | itwiki | pre-edit | link click edit section | 20608 | +--------+-----------+---------------------------+----------+ 36 rows in set (7.18 sec)
There they are. The problem must be in my stats query.
> SELECT -> wiki, -> IF(ORD(RIGHT(token, 1)) <= ORD("J"), "pre-edit", IF(ORD(RIGHT(token, 1)) <= ORD("d"), "post-edit", "control")) AS bucket, -> link_clicks > 0, -> count(*) -> FROM -> token_stats -> GROUP BY 1,2,3; +--------+-----------+-----------------+----------+ | wiki | bucket | link_clicks > 0 | count(*) | +--------+-----------+-----------------+----------+ | dewiki | control | 0 | 67700 | | dewiki | control | 1 | 8 | | dewiki | post-edit | 0 | 61399 | | dewiki | post-edit | 1 | 134 | | dewiki | pre-edit | 0 | 60402 | | dewiki | pre-edit | 1 | 2255 | | enwiki | control | 0 | 348731 | | enwiki | control | 1 | 21 | | enwiki | post-edit | 0 | 317645 | | enwiki | post-edit | 1 | 910 | | enwiki | pre-edit | 0 | 309677 | | enwiki | pre-edit | 1 | 18641 | | frwiki | control | 0 | 52340 | | frwiki | control | 1 | 3 | | frwiki | post-edit | 0 | 47222 | | frwiki | post-edit | 1 | 202 | | frwiki | pre-edit | 0 | 46622 | | frwiki | pre-edit | 1 | 2363 | | itwiki | control | 0 | 30968 | | itwiki | control | 1 | 1 | | itwiki | post-edit | 0 | 27938 | | itwiki | post-edit | 1 | 154 | | itwiki | pre-edit | 0 | 28832 | | itwiki | pre-edit | 1 | 1156 | +--------+-----------+-----------------+----------+ 24 rows in set (2.87 sec)
Sure enough. I'm not getting them. Why? Maybe I'm just not counting right. Let's check total events.
> SELECT -> wiki, -> IF(ORD(RIGHT(token, 1)) <= ORD("J"), "pre-edit", IF(ORD(RIGHT(token, 1)) <= ORD("d"), "post-edit", "control")) AS bucket, -> total_events > 0, -> count(*) -> FROM -> token_stats -> GROUP BY 1,2,3; +--------+-----------+------------------+----------+ | wiki | bucket | total_events > 0 | count(*) | +--------+-----------+------------------+----------+ | dewiki | control | 0 | 67575 | | dewiki | control | 1 | 133 | | dewiki | post-edit | 0 | 61413 | | dewiki | post-edit | 1 | 120 | | dewiki | pre-edit | 0 | 23827 | | dewiki | pre-edit | 1 | 38830 | | enwiki | control | 0 | 348337 | | enwiki | control | 1 | 415 | | enwiki | post-edit | 0 | 318202 | | enwiki | post-edit | 1 | 353 | | enwiki | pre-edit | 0 | 133217 | | enwiki | pre-edit | 1 | 195101 | | frwiki | control | 0 | 52251 | | frwiki | control | 1 | 92 | | frwiki | post-edit | 0 | 47355 | | frwiki | post-edit | 1 | 69 | | frwiki | pre-edit | 0 | 19302 | | frwiki | pre-edit | 1 | 29683 | | itwiki | control | 0 | 30907 | | itwiki | control | 1 | 62 | | itwiki | post-edit | 0 | 28044 | | itwiki | post-edit | 1 | 48 | | itwiki | pre-edit | 0 | 9998 | | itwiki | pre-edit | 1 | 19990 | +--------+-----------+------------------+----------+ 24 rows in set (2.16 sec)
Ha! Now that's even worse. Not a single one of these should be false (0).
I still don't know why this isn't working. I ran the following and events were counted as expected.
SELECT
wiki,
token,
MIN(timestamp) AS first_event,
SUM(event LIKE "CTA impression%") AS cta_impressions,
SUM(event = "CTA impression pre-edit") AS pre_cta_impressions,
SUM(event = "CTA impression post-edit") AS post_cta_impressions,
SUM(event LIKE "link click%") AS link_clicks,
SUM(event LIKE "link click edit%") AS edit_link_clicks,
SUM(event LIKE "link click create%") AS registration_link_clicks,
COUNT(*) AS total_events
FROM (
SELECT
wiki,
event_token AS token,
timestamp,
CONCAT("CTA impression ", event_cta) AS event
FROM
log.SignupExpCTAImpression_8101716
WHERE wiki IN ("enwiki", "dewiki", "itwiki", "frwiki")
AND event_token IS NOT NULL
UNION ALL
SELECT
wiki,
event_token AS token,
timestamp,
CONCAT("link click ", event_link) AS event
FROM
log.SignupExpPageLinkClick_8101692
WHERE wiki IN ("enwiki", "dewiki", "itwiki", "frwiki")
AND event_token IS NOT NULL
) AS token_events
GROUP BY 1,2
LIMIT 10
OK. Current theory is that I missed a column when specifying the table and MySQL didn't raise any warnings about too many/few columns when loading the data in. I just changed the query so that it will load the token_stats directly from the query into a new table. Let's see how that goes. --Halfak (WMF) (talk) 00:45, 30 May 2014 (UTC)
Got it! Woo!
> SELECT -> wiki, -> IF(ORD(RIGHT(token, 1)) <= ORD("J"), "pre-edit", IF(ORD(RIGHT(token, 1)) <= ORD("d"), "post-edit", "control")) AS bucket, -> total_events > 0, -> count(*) -> FROM -> token_stats -> GROUP BY 1,2,3; +--------+-----------+------------------+----------+ | wiki | bucket | total_events > 0 | count(*) | +--------+-----------+------------------+----------+ | dewiki | control | 1 | 67747 | | dewiki | post-edit | 1 | 61578 | | dewiki | pre-edit | 1 | 62690 | | enwiki | control | 1 | 349371 | | enwiki | post-edit | 1 | 319176 | | enwiki | pre-edit | 1 | 328935 | | frwiki | control | 1 | 52375 | | frwiki | post-edit | 1 | 47466 | | frwiki | pre-edit | 1 | 49023 | | itwiki | control | 1 | 30989 | | itwiki | post-edit | 1 | 28111 | | itwiki | pre-edit | 1 | 30007 | +--------+-----------+------------------+----------+ 12 rows in set (4.63 sec)
OK. Back to the thing I was originally going to do -- limit my counts to tokens that showed up during the experiment.
SELECT
wiki,
bucket,
IF(first_user_id IS NULL,
"pure anon",
IF(first_user_registration IS NULL OR first_user_registration <= 20140502000000,
"old user",
IF(first_user_registration <= 20140519180800,
"tracked user",
"experiment user"))) AS editor_class,
COUNT(*) AS tokens
FROM staging.token_info
WHERE link_clicks > 0
GROUP BY 1,2,3;
Generating an intermediary table just got a lot slower. I'll just let the table loading query run for a bit and come back if it isn't too late. --Halfak (WMF) (talk) 01:23, 30 May 2014 (UTC)