Research talk:Quality dynamics of English Wikipedia/Work log/2016-12-19
Add topicAppearance
Latest comment: 8 years ago by EpochFail in topic Monday, December 19, 2016
Monday, December 19, 2016
[edit]Hey folks. I did some work over the weekend to include the quality dynamics of Visual Arts because I attended a relevant hackathon. The results of that analysis are here: https://paws.wmflabs.org/paws/user/EpochFail/notebooks/article-quality/Comparing%20quality%20of%20WP%20Visual%20Arts%20%26%20Woman%20Scientists%20to%20the%20rest%20of%20Wikipedia.ipynb
I've been working with the database rather than writing the data aggregation scripts and I've figured out how to make aggregation work very quickly for smaller cross-sections (individual Wiki Projects). Here's my query for gathering content from WikiProject Visual Arts.
SELECT
monthly_aq.timestamp AS month,
IF(cl_to IS NOT NULL, SUBSTRING_INDEX(cl_to, "-", 1), "Unlabeled") AS importance,
SUM(weighted_sum) AS weighted_sum,
SUM(LOG(weighted_sum)) AS weighted_log_sum,
SUM(prediction = "Stub") AS stub_n,
SUM(prediction = "Start") AS start_n,
SUM(prediction = "C") AS c_n,
SUM(prediction = "B") AS b_n,
SUM(prediction = "GA") AS ga_n,
SUM(prediction = "FA") AS fa_n
FROM page AS talk
INNER JOIN page AS article ON
talk.page_title = article.page_title AND
article.page_namespace = 0
INNER JOIN templatelinks ON
tl_from = talk.page_id
INNER JOIN staging.enwiki_monthly_aq AS monthly_aq ON
article.page_id = monthly_aq.page_id
LEFT JOIN categorylinks ON
talk.page_id = cl_from AND
cl_to IN (
"Unknown-importance_Version_1.0_articles",
"Low-importance_Version_1.0_articles",
"Mid-importance_Version_1.0_articles",
"High-importance_Version_1.0_articles",
"Top-importance_Version_1.0_articles",
"NA-importance_Version_1.0_articles"
)
WHERE
talk.page_namespace = 1 AND
tl_namespace = 10 AND
tl_title = "WikiProject_Visual_arts"
GROUP BY month, importance;