Research talk:Wikipedia article creation/Work log/Friday, November 15th
Add topicAppearance
Friday, November 15th
[edit]I don't have much time to hack on this today. I've been waiting for one of my queries to finish. This is the one that gathers all of the pages that *have not* been deleted along with the rev_id of their first edit. Instead of waiting on that to finish, I decided to split up the work.
First, I'll get all of the page stats from the revision table:
CREATE TABLE halfak.nov13_page_revision_stats
SELECT
rev_page AS page_id,
MIN(rev_id) AS first_rev_id,
COUNT(*) AS revisions,
MIN(rev_timestamp) AS first_edit,
MAX(rev_timestamp) AS last_edit
FROM revision
WHERE rev_timestamp < "20131105000000"
GROUP BY 1
Then I'll join it with page to get non-archived pages with stats:
SELECT
page_id,
page_namespace,
page_title,
revisions,
first_edit,
last_edit,
False AS archived,
first_rev_id
FROM halfak.page_revision_stats
INNER JOIN page USING (page_id)
WHERE rev_timestamp < "20131105000000"
GROUP BY 1,2,3;
I've kicked the query off so, hopefully, it will be done by the end of the day.