Research talk:Quality dynamics of English Wikipedia/Work log/2017-02-17
Add topicFriday, February 17, 2017
[edit]Hey folks. Today I'm going to generate some monthly datasets for a few WikiProjects
- WikiProject Chicago
- WikiProject Michigan
- WikiProject Oregon
So, my past queries assumed that only one template was used. This isn't true. There are many redirected template names that can be included on talk pages. For example, Military history has a lot.
> select page.page_title from pagelinks INNER JOIN page ON page_id = pl_from WHERE pl_namespace = 10 and pl_title = "WikiProject_Military_history" AND pl_from_namespace = 10 AND page.page_is_redirect; +------------------------------+ | page_title | +------------------------------+ | WikiProject_Military_History | | WPCAS | | WikiProject_Colditz | | MILHIST | | Milhist | | WP_Military_History | | WikiProject_MILHIST | | WikiProject_War | | WP_Military_history | | WPMILHIST | | WPMilhist | | MilHist | | Mil_Hist | | WPMH | | WP_Mil | | Military_history | | WPMIL | +------------------------------+ 17 rows in set (0.00 sec)
So in order to account for that, I'm going to change the querying strategy to pull in all of these templates too.
SELECT
monthly_aq.timestamp AS month,
SUM(weighted_sum) AS weighted_sum,
SUM(LOG(weighted_sum)) AS weighted_log_sum,
SUM(prediction = "Stub") AS stub_n,
SUM(prediction = "Start") AS start_n,
SUM(prediction = "C") AS c_n,
SUM(prediction = "B") AS b_n,
SUM(prediction = "GA") AS ga_n,
SUM(prediction = "FA") AS fa_n,
COUNT(*) AS n
FROM page AS talk
INNER JOIN page AS article ON
talk.page_title = article.page_title AND
article.page_namespace = 0
INNER JOIN templatelinks ON
tl_from = talk.page_id
INNER JOIN staging.enwiki_monthly_aq AS monthly_aq ON
article.page_id = monthly_aq.page_id
WHERE
talk.page_namespace = 1 AND
tl_namespace = 10 AND
(
tl_title = @WP_TEMPLATE OR
tl_title IN (
SELECT page.page_title
FROM pagelinks
INNER JOIN page ON page_id = pl_from
WHERE
pl_namespace = 10 AND
pl_title = @WP_TEMPLATE AND
pl_from_namespace = 10 AND
page.page_is_redirect
)
)
GROUP BY month;
The hard part is that I need to manually do a bit of work to find the right template for a specific WikiProject. My strategy right now is just to go to a page that would *obviously* be classified by the target WikiProject, find out what template is used and check to see if it redirects me to a "main" template and then use that.
E.g.
- en:Chicago → en:Template:WikiProject Chicago
- en:World War II → en:Template:WikiProject Military history
- en:Oregon → en:Template:Wikiproject Oregon → en:Template:WikiProject Oregon (Notice the capitalization change)
Alright. Now I'm generating some datasets. --EpochFail (talk) 16:48, 17 February 2017 (UTC)