Research talk:The Co-op pilot study/Work log/2015-04-04
Add topicSaturday, April 4, 2015
[edit]Today, I want to look at the learners who have signed up at the co-op and take some basic measurements of their behavior.
It looks like I can use en:Category:Co-op_learner to gather a set of learners who have completed a profile page. --EpochFail (talk) 17:26, 4 April 2015 (UTC)
OK... well that required more cleanup than expected, but I think I'm effectively gathering all of the legitimate learners. I found 39.
SQL
|
---|
CREATE TEMPORARY TABLE staging.learner_profile_names
SELECT
page_id,
CONCAT(UCASE(LEFT(CONVERT(profile_title USING utf8), 1)),
REPLACE(SUBSTRING(profile_title, 2), "_", " ")) AS profile_name
FROM (
SELECT
page_id,
SUBSTRING_INDEX(page_title, "/", -1) as profile_title
FROM categorylinks
INNER JOIN page ON page_id = cl_from
WHERE
cl_to = "Co-op_learner" AND
page_namespace = 4 AND
page_title LIKE "Co-op/%"
) AS profile_titles;
CREATE TEMPORARY TABLE staging.learner_profile_names2
SELECT * FROM staging.learner_profile_names;
SELECT * FROM (
SELECT
profile_name,
user_id,
user_name,
user_registration,
"profile name" AS method
FROM staging.learner_profile_names
INNER JOIN user ON user_name = profile_name
UNION ALL
SELECT
profile_name,
user_id,
user_name,
user_registration,
"first edit" AS method
FROM (
SELECT page_id, min(rev_id) AS first_rev_id, profile_name
FROM staging.learner_profile_names2
LEFT JOIN user ON profile_name = user_name
INNER JOIN revision ON page_id = rev_page
WHERE user_name IS NULL
GROUP BY page_id
) AS missing_profile_pages
INNER JOIN revision ON first_rev_id = rev_id
INNER JOIN user ON rev_user = user_id
) AS learners
WHERE
profile_name NOT IN ("Learner profile", "Learner preload") AND
user_name NOT IN ("Jtmorgan", "Jmorgan (WMF)") AND
user_registration >= "20150304";
|
+--------------------+----------+--------------------+-------------------+--------------+ | profile_name | user_id | user_name | user_registration | method | +--------------------+----------+--------------------+-------------------+--------------+ | MColton3 | 24324556 | MColton3 | 20150305051617 | profile name | | LLRungegordon | 24316702 | LLRungegordon | 20150304132432 | profile name | | Maximus2929 | 24334881 | Maximus2929 | 20150306035009 | profile name | | NXKMartialArts | 24368037 | NXKMartialArts | 20150309100430 | profile name | | Blue Gnome Tree | 24374357 | Blue Gnome Tree | 20150309213439 | profile name | | MLODROB | 24362877 | MLODROB | 20150308204407 | profile name | | IngridBetancourt | 24391736 | IngridBetancourt | 20150311123359 | profile name | | Saundra monterroso | 24393463 | Saundra monterroso | 20150311154134 | profile name | | Jackheart314 | 24376996 | Jackheart314 | 20150310035227 | profile name | | Bossofbossess | 24393171 | Bossofbossess | 20150311151112 | profile name | | Md.Saiful Azad | 24400513 | Md.Saiful Azad | 20150312080114 | profile name | | Gfrederic | 24393246 | Gfrederic | 20150311151925 | profile name | | Koopa24 | 24404941 | Koopa24 | 20150312163816 | profile name | | Tfan101 | 24408313 | Tfan101 | 20150312232356 | profile name | | Acad1989 | 24418291 | Acad1989 | 20150313223248 | profile name | | Jnav7 | 24417269 | Jnav7 | 20150313200414 | profile name | | EditorRazu | 24393601 | EditorRazu | 20150311155638 | profile name | | Harvey1257 | 24408217 | Harvey1257 | 20150312230948 | profile name | | Anand kotwal | 24503064 | Anand kotwal | 20150322045411 | profile name | | Shiteshsachan | 24495812 | Shiteshsachan | 20150321110632 | profile name | | Kat.pambid | 24512909 | Kat.pambid | 20150323033330 | profile name | | JamusDoore | 24528575 | JamusDoore | 20150324130911 | profile name | | Brotothextrem | 24510647 | Brotothextrem | 20150322213109 | profile name | | Pmahon628 | 24332327 | Pmahon628 | 20150305212118 | profile name | | Mothusi Masibi | 24539122 | Mothusi Masibi | 20150325131032 | profile name | | Newrunner769 | 24544211 | Newrunner769 | 20150325211804 | profile name | | Morg4kele | 24502509 | Morg4kele | 20150322031751 | profile name | | Lauraejpress | 24563560 | Lauraejpress | 20150327175117 | profile name | | Vhernandez20 | 24586977 | Vhernandez20 | 20150330030700 | profile name | | Daniellecart1 | 24560324 | Daniellecart1 | 20150327112233 | profile name | | Bodb Derg | 24590083 | Bodb Derg | 20150330111337 | profile name | | Grokjtrip | 24602657 | Grokjtrip | 20150331142032 | profile name | | CityofGlendale2015 | 24615102 | CityofGlendale2015 | 20150401172529 | profile name | | Jachterberg9 | 24629277 | Jachterberg9 | 20150403121404 | profile name | | Swasthikprabhu | 24621495 | Swasthikprabhu | 20150402141802 | profile name | | SAVIOR59 | 24368925 | Savior59 | 20150309120420 | first edit | | Eugene Issac | 24380164 | Eugene issac | 20150310115721 | first edit | | Uzi Oz | 24555588 | UziOz | 20150326222509 | first edit | | Robco311 | 24613350 | BeeCeePhoto | 20150401141350 | first edit | +--------------------+----------+--------------------+-------------------+--------------+ 39 rows in set (0.05 sec)
Registration date
[edit]I wonder if there were bursts within which learners joined the coop.
> SELECT LEFT(user_registration, 8) AS date, COUNT(*) FROM staging.coop_learner GROUP BY 1; +----------+----------+ | date | COUNT(*) | +----------+----------+ | 20150304 | 1 | | 20150305 | 2 | | 20150306 | 1 | | 20150308 | 1 | | 20150309 | 3 | | 20150310 | 2 | | 20150311 | 5 | | 20150312 | 4 | | 20150313 | 2 | | 20150321 | 1 | | 20150322 | 3 | | 20150323 | 1 | | 20150324 | 1 | | 20150325 | 2 | | 20150326 | 1 | | 20150327 | 2 | | 20150330 | 2 | | 20150331 | 1 | | 20150401 | 2 | | 20150402 | 1 | | 20150403 | 1 | +----------+----------+ 21 rows in set (0.00 sec)
It looks like the co-op got a steady stream of 1-2 learners per day except for a time period between March 13th and March 22nd. What happened there? --EpochFail (talk) 17:35, 4 April 2015 (UTC)
Time to profile creation
[edit]OK. So I want to include the profile page creation timestamp as well, so I'm going to update the table generating query I have above.
SQL
|
---|
CREATE TABLE IF NOT EXISTS staging.learner_profile (
profile_page_id INT,
profile_name VARBINARY(255),
profile_creation VARBINARY(14),
user_id INT,
user_name VARBINARY(255),
user_registration VARBINARY(14),
method VARCHAR(50)
);
TRUNCATE TABLE staging.learner_profile;
INSERT INTO staging.learner_profile
SELECT
page_id,
profile_name,
min(rev_timestamp) as profile_creation,
user_id,
user_name,
user_registration,
method
FROM (
SELECT
page_id,
profile_name,
user_id,
user_name,
user_registration,
"profile name" AS method
FROM staging.learner_profile_names
INNER JOIN user ON user_name = profile_name
UNION ALL
SELECT
page_id,
profile_name,
user_id,
user_name,
user_registration,
"first edit" AS method
FROM (
SELECT page_id, min(rev_id) AS first_rev_id, profile_name
FROM staging.learner_profile_names2
LEFT JOIN user ON profile_name = user_name
INNER JOIN revision ON page_id = rev_page
WHERE user_name IS NULL
GROUP BY page_id
) AS missing_profile_pages
INNER JOIN revision ON first_rev_id = rev_id
INNER JOIN user ON rev_user = user_id
) AS learners
INNER JOIN revision ON rev_page = page_id
WHERE
profile_name NOT IN ("Learner profile", "Learner preload") AND
user_name NOT IN ("Jtmorgan", "Jmorgan (WMF)") AND
user_registration >= "20150304"
GROUP BY page_id;
|
Now to ask the last question, but about profile creation -- not registration.
> SELECT LEFT(profile_creation, 8) AS date, COUNT(*) FROM staging.learner_profile GROUP BY 1; +----------+----------+ | date | COUNT(*) | +----------+----------+ | 20150306 | 2 | | 20150308 | 1 | | 20150310 | 3 | | 20150311 | 2 | | 20150312 | 2 | | 20150313 | 4 | | 20150314 | 2 | | 20150315 | 2 | | 20150316 | 1 | | 20150317 | 1 | | 20150323 | 2 | | 20150324 | 3 | | 20150326 | 3 | | 20150328 | 1 | | 20150329 | 1 | | 20150330 | 2 | | 20150331 | 3 | | 20150402 | 2 | | 20150403 | 2 | +----------+----------+ 19 rows in set (0.00 sec)
OK. Basically the same answer. I wonder how long between registration and profile creation.
> SELECT DATEDIFF(profile_creation, user_registration) AS days, COUNT(*) AS learners FROM staging.learner_profile GROUP BY 1; +------+----------+ | days | learners | +------+----------+ | 0 | 5 | | 1 | 14 | | 2 | 12 | | 3 | 2 | | 4 | 2 | | 5 | 2 | | 6 | 1 | | 21 | 1 | +------+----------+ 8 rows in set (0.01 sec)
Looks like most profile creations happen 1-2 days after registration. I wonder about those profile creations that happened less than a day after registration. HostBot shouldn't allow for that.
> SELECT user_name, TIMEDIFF(profile_creation, user_registration) FROM staging.learner_profile WHERE DATEDIFF(profile_creation, user_registration) < 1; +------------------+-----------------------------------------------+ | user_name | TIMEDIFF(profile_creation, user_registration) | +------------------+-----------------------------------------------+ | IngridBetancourt | 06:19:30.000000 | | JamusDoore | 04:41:57.000000 | | Vhernandez20 | 16:45:47.000000 | | Grokjtrip | 03:07:46.000000 | | Jachterberg9 | 05:08:58.000000 | +------------------+-----------------------------------------------+ 5 rows in set (0.00 sec)
It looks like en:User:Jachterberg9 got a post from HostBot [1] at 2015-04-03 17:11:25 (UTC) --EpochFail (talk) 18:00, 4 April 2015 (UTC)