Jump to content

Research talk:The Co-op pilot study/Work log/2015-04-11

Add topic
From Meta, a Wikimedia project coordination wiki

Saturday, April 11, 2015

[edit]

OK. So it looks like Soni raised some concerns that my query wasn't matching the real user list.

> SELECT user_registration FROM user
    -> WHERE user_name IN 
    ->   ("BrillLyle", "Christopher2625649908", "David Tornheim", "Komchi", 
    ->     "Negative24", "Quinto Simmaco", "Rotideypoc41352");
+-------------------+
| user_registration |
+-------------------+
| 20130426183842    |
| 20150211074359    |
| 20080715021343    |
| 20141019043349    |
| 20140425230847    |
| 20150228091223    |
| 20090313034539    |
+-------------------+
7 rows in set (0.12 sec)

It looks like all of these users registered before the Co-op started, so I think that excluding them from analysis makes sense. It's good to note that a substantial proportion of old learners somehow found the co-op though.

Next up, I'd like to get a record of all of the invites that HostBot sent to newcomers.

> SELECT COUNT(*) FROM revision WHERE rev_user_text = "HostBot" AND rev_comment LIKE "%you are invited to the Co-op!%";
+----------+
| COUNT(*) |
+----------+
|      907 |
+----------+
1 row in set (0.30 sec)

It looks like we should get 907 records. --EpochFail (talk) 16:45, 11 April 2015 (UTC)Reply

OK. Here's my query to generate the table.

SQL query
CREATE TEMPORARY TABLE staging.coop_invite_post
SELECT
  user_id,
  rev_timestamp AS invite_posted
FROM revision
INNER JOIN page ON rev_page = page_id
INNER JOIN user ON REPLACE(page_title, "_", " ") = user_name
WHERE rev_user_text = "HostBot" AND
      rev_comment LIKE "%you are invited to the Co-op!%";

CREATE TEMPORARY TABLE staging.coop_invite_post2
SELECT * FROM staging.coop_invite_post;

CREATE TABLE IF NOT EXISTS staging.coop_invite (
  user_id INT,
  invite_posted VARBINARY(14),
  edits_before INT,
  main_edits_before INT,
  PRIMARY KEY(user_id)
);
TRUNCATE TABLE staging.coop_invite;
INSERT INTO staging.coop_invite
SELECT
  user_id,
  invite_posted,
  SUM(edits_before),
  SUM(main_edits_before)
FROM (
  (SELECT
    user_id,
    invite_posted,
    COUNT(rev_id) AS edits_before,
    SUM(page_namespace = 0) AS main_edits_before
  FROM staging.coop_invite_post
  LEFT JOIN revision ON
    user_id = rev_user AND
    rev_timestamp < invite_posted
  LEFT JOIN page ON rev_page = page_id
  GROUP BY user_id)
  UNION ALL
  (SELECT
    user_id,
    invite_posted,
    COUNT(ar_rev_id) AS edits_before,
    SUM(ar_namespace = 0) AS main_edits_before
  FROM staging.coop_invite_post2
  LEFT JOIN archive ON
    ar_user = user_id AND
    ar_timestamp < invite_posted
  GROUP BY user_id)
) AS edit_counts
GROUP BY user_id;

SELECT COUNT(*), NOW() FROM staging.coop_invite;

Now to plot some of this. --EpochFail (talk) 18:28, 11 April 2015 (UTC)Reply


Figures!

The number of Co-op invites sent to newcomers per day is plotted.
Invites per day. The number of Co-op invites sent to newcomers per day is plotted.
The number of new Co-op learner profiles created per day is plotted.
New learners per day. The number of new Co-op learner profiles created per day is plotted.
The density of time between user registration and receiving an invitation to the Co-op is plotted
Registration → Invite. The density of time between user registration and receiving an invitation to the Co-op is plotted
The density of time between when a newcomer is invited to the Co-op and when they create their profile is plotted.
Invite → Profile creation. The density of time between when a newcomer is invited to the Co-op and when they create their profile is plotted.

Observations

[edit]
  1. It looks like a lot of editors are receiving co-op invites within an hour of registration because they are matching the 10 edit threshold.
  2. If you're going to create a co-op profile, you'll probably do it within 24 hours of receiving an invite, but you might also wait a week.