Jump to content

Research talk:Active editor spike 2015/Work log/2015-04-08

Add topic
From Meta, a Wikimedia project coordination wiki

Wednesday, April 8, 2015

[edit]

Today, I'm going to look at registration rates and activation rates broken down by mobile/desktop. To do this, I'm going to make use of Schema:PageContentSaveComplete and Schema:ServerSideAccountCreation -- both of which record the mobile/desktop source.

Registrations

[edit]
> select left(timestamp, 6) AS month, count(*) FROM ServerSideAccountCreation_5487345 GROUP BY 1;
+--------+----------+
| month  | count(*) |
+--------+----------+
| 201306 |   154651 |
| 201307 |   306863 |
| 201308 |   341139 |
| 201309 |   357668 |
| 201310 |   356127 |
| 201311 |   350107 |
| 201312 |   340671 |
| 201401 |   359553 |
| 201402 |   317216 |
| 201403 |   323158 |
| 201404 |   335925 |
| 201405 |   363648 |
| 201406 |   326076 |
| 201407 |   421697 |
| 201408 |   716665 |
| 201409 |   543066 |
| 201410 |   536955 |
| 201411 |   525470 |
| 201412 |   499713 |
| 201501 |   532291 |
| 201502 |   494341 |
| 201503 |   527792 |
| 201504 |   134090 |
+--------+----------+
23 rows in set (7.27 sec)

It looks like we have registrations going back to July, 2013.

> select left(timestamp, 6) AS month, count(*) FROM ServerSideAccountCreation_5233795 GROUP BY 1;
+--------+----------+
| month  | count(*) |
+--------+----------+
| 201302 |     2468 |
| 201303 |   332050 |
| 201304 |   218731 |
| 201305 |   293814 |
| 201306 |   126206 |
+--------+----------+
5 rows in set (0.77 sec)

We can go back to March, 2013 with the previous version of the schema. It looks like the change only affected a description.

OK. So now to gather monthly registration rates per wiki.

SELECT
        wiki,
        month,
        mobile,
        SUM(registrations)
FROM (
        (SELECT
                wiki,
                LEFT(timestamp, 6) AS month,
                event_displayMobile AS mobile,
                COUNT(DISTINCT event_userId) AS registrations
                FROM ServerSideAccountCreation_5487345
        WHERE event_isSelfMade = 1
        GROUP BY wiki, month, mobile)
        UNION ALL
        (SELECT
                wiki,
                LEFT(timestamp, 6) AS month,
                event_displayMobile AS mobile,
                COUNT(DISTINCT event_userId) AS registrations
        FROM ServerSideAccountCreation_5233795
        WHERE event_isSelfMade = 1
        GROUP BY wiki, month, mobile)
) AS wiki_month_registration_counts
WHERE month BETWEEN "201303" AND "201503"
GROUP BY wiki, month, mobile;

OK... That's running. Now to gather the edit data.

Edits

[edit]
> select left(timestamp, 6) AS month, count(*) FROM PageContentSaveComplete_5588433 GROUP BY 1;
+--------+----------+
| month  | count(*) |
+--------+----------+
| 201311 |   654534 |
| 201312 | 23699979 |
| 201401 | 29744876 |
| 201402 | 23836212 |
| 201403 | 24008427 |
| 201404 | 22866635 |
| 201405 | 28359010 |
| 201406 | 24691518 |
| 201407 | 26508595 |
| 201408 | 25410517 |
| 201409 | 23319162 |
| 201410 | 26306705 |
| 201411 | 26241917 |
| 201412 | 27645244 |
| 201501 | 26267808 |
| 201502 | 23990825 |
| 201503 | 31280861 |
| 201504 |  6764302 |
+--------+----------+
18 rows in set (6 min 19.56 sec

What? Did we really only enable PageContentSaveComplete at the end of 2013? I thought it had predated ServerSideAccountCreation. Looks like the revision history disagrees. Oh well. Mobile hasn't been around that long anyway.

Time to write the query. --Halfak (WMF) (talk) 16:26, 8 April 2015 (UTC)Reply


OK Query is:

SELECT
        wiki,
        LEFT(timestamp, 6) AS month,
        rev_user AS user_id,
        SUM(event_isMobile) AS mobile_edits,
        SUM(!event_isMobile) AS desktop_edits
FROM revision
INNER JOIN log.PageContentSaveComplete_5588433 ON
      event_revisionId = rev_id AND
      wiki = DATABASE()
WHERE
      rev_timestamp BETWEEN "201312" AND "201504" AND
      timestamp	BETWEEN	"201312" AND "201504";

Since the user_id field isn't in the schema, I need to join against each wiki's revision table.  :( This is a good excuse to fix up the utility that enables that. --Halfak (WMF) (talk) 17:10, 8 April 2015 (UTC)Reply

https://github.com/halfak/Multiquery Done

Now to run that query against the whole set of DBs. --Halfak (WMF) (talk) 19:56, 8 April 2015 (UTC)Reply