Research talk:Automated classification of draft quality/Work log/2016-09-28
Add topicAppearance
Latest comment: 8 years ago by EpochFail in topic Wednesday, September 28, 2016
Wednesday, September 28, 2016
[edit]OK looks like my last queries have failed, so I'm going to run them directly on labsDB in PAWS.
https://quarry.wmflabs.org/query/12795 -- Surviving articles https://quarry.wmflabs.org/query/12796 -- Deleted articles
--EpochFail (talk) 17:26, 28 September 2016 (UTC)
Here's what I've got:
SELECT
page_title,
rev_id AS creation_rev_id,
rev_timestamp AS creation_timestamp,
FALSE AS archived,
"OK" AS draft_quality
FROM revision
INNER JOIN page ON
rev_page = page_id
WHERE
rev_timestamp BETWEEN "20150827" AND "20160827" AND
rev_parent_id = 0 AND
page_namespace = 0
UNION ALL
SELECT
ar_title,
ar_rev_id AS creation_rev_id,
ar_timestamp AS creation_timestamp,
True AS archived,
IF(log_comment REGEXP "WP:CSD#G3\\|", "vandalism",
IF(log_comment REGEXP "WP:CSD#G10\\|", "attack",
IF(log_comment REGEXP "WP:CSD#G11\\|", "spam",
IF(log_comment REGEXP "WP:CSD#A11\\|", "hoax", "OK")))) AS draft_quality
FROM archive
LEFT JOIN logging speedy_delete ON
log_namespace = ar_namespace AND
log_title = ar_title AND
log_type = "delete" AND
log_action = "delete" AND
log_comment LIKE "[[WP:CSD#%" AND
log_comment REGEXP "WP:CSD#(G3|G10|G11|A11)\\|" AND
log_timestamp > ar_timestamp
WHERE
ar_timestamp BETWEEN "20150827" AND "20160827" AND
log_timestamp BETWEEN "20150827" AND "20160827" AND
ar_parent_id = 0 AND
ar_namespace = 0;
It should theoretically run on labsDB, but I think it might have more of a chance to finish, so I'll be running on both labs and analytics databases. --EpochFail (talk) 17:34, 28 September 2016 (UTC)
Well, that's still running. So I guess I'll just let it go overnight and see what we've got in the morning. --EpochFail (talk) 22:33, 28 September 2016 (UTC)