Jump to content

Research talk:Quality dynamics of English Wikipedia/Work log/2016-12-19

Add topic
From Meta, a Wikimedia project coordination wiki

Monday, December 19, 2016[edit]

Hey folks. I did some work over the weekend to include the quality dynamics of Visual Arts because I attended a relevant hackathon. The results of that analysis are here: https://paws.wmflabs.org/paws/user/EpochFail/notebooks/article-quality/Comparing%20quality%20of%20WP%20Visual%20Arts%20%26%20Woman%20Scientists%20to%20the%20rest%20of%20Wikipedia.ipynb

I've been working with the database rather than writing the data aggregation scripts and I've figured out how to make aggregation work very quickly for smaller cross-sections (individual Wiki Projects). Here's my query for gathering content from WikiProject Visual Arts.

SELECT
  monthly_aq.timestamp AS month,
  IF(cl_to IS NOT NULL, SUBSTRING_INDEX(cl_to, "-", 1), "Unlabeled") AS importance,
  SUM(weighted_sum) AS weighted_sum,
  SUM(LOG(weighted_sum)) AS weighted_log_sum,
  SUM(prediction = "Stub") AS stub_n,
  SUM(prediction = "Start") AS start_n,
  SUM(prediction = "C") AS c_n,
  SUM(prediction = "B") AS b_n,
  SUM(prediction = "GA") AS ga_n,
  SUM(prediction = "FA") AS fa_n
FROM page AS talk
INNER JOIN page AS article ON
  talk.page_title = article.page_title AND
  article.page_namespace = 0
INNER JOIN templatelinks ON
  tl_from = talk.page_id
INNER JOIN staging.enwiki_monthly_aq AS monthly_aq ON
  article.page_id = monthly_aq.page_id
LEFT JOIN categorylinks ON
  talk.page_id = cl_from AND
  cl_to IN (
    "Unknown-importance_Version_1.0_articles",
    "Low-importance_Version_1.0_articles",
    "Mid-importance_Version_1.0_articles",
    "High-importance_Version_1.0_articles",
    "Top-importance_Version_1.0_articles",
    "NA-importance_Version_1.0_articles"
  )
WHERE
  talk.page_namespace = 1 AND
  tl_namespace = 10 AND
  tl_title = "WikiProject_Visual_arts"
GROUP BY month, importance;

--EpochFail (talk) 20:28, 19 December 2016 (UTC)Reply