Research talk:Quality dynamics of English Wikipedia/Work log/2016-11-19
Saturday, November 19, 2016
Today, I'm mostly just recording some work that I did on 2016-11-13. I was at the OCDX workshop for GROUP 2016 and I wanted to do something cool with the Monthly Wikipedia article quality predictions dataset, so I suggested we compare quality trends in Wikipedia with the trends in quality of some interesting topics in Wikipedia. After thinking about what would be most interesting, I chose to dig into articles covered by WikiProject Women Scientists.
I ran https://quarry.wmflabs.org/query/14033 to get all of the pages in WikiProject Women Scientists. Then I merged this subset of pages with the article quality predictions and generated some statistics. I also generated the same statistics across all of English Wikipedia.
- All wiki
  - Code: https://github.com/OCDX/article-quality/blob/master/monthly_wiki_quality.ipynb
  - Dataset: https://github.com/OCDX/article-quality/blob/master/enwiki.monthly_wiki_quality.tsv
- Just WikiProject Women Scientists
  - Code: https://github.com/OCDX/article-quality/blob/master/monthly_women_scientist_quality.ipynb
  - Dataset: https://github.com/OCDX/article-quality/blob/master/enwiki.monthly_women_scientist_quality.tsv
All of the work came together in this notebook: https://github.com/OCDX/article-quality/blob/master/Comparing%20quality%20of%20WP%20Women%20Scientists%20to%20the%20rest%20of%20Wikipedia.ipynb
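For anyone who wants the gist without opening the notebooks, here is a rough sketch of the merge-and-summarize step described above. It is not the code from the notebooks: the column names (`page_id`, `month`, `weighted_sum`), the helper `monthly_mean_quality`, and the sample rows are all made up for illustration, and mean quality per month is just one plausible statistic. The idea is to filter the monthly predictions down to the WikiProject's page IDs and compute the same per-month summary for the subset and for the whole wiki.

```python
import csv
import io
from collections import defaultdict


def monthly_mean_quality(rows, page_ids=None):
    """Return {month: mean weighted_sum}.

    If page_ids is given, only rows for those pages are counted.
    Column names are assumptions, not the real dataset schema.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        if page_ids is not None and int(row["page_id"]) not in page_ids:
            continue
        totals[row["month"]] += float(row["weighted_sum"])
        counts[row["month"]] += 1
    return {month: totals[month] / counts[month] for month in sorted(totals)}


# Made-up sample standing in for the monthly predictions TSV.
SAMPLE_TSV = (
    "page_id\tmonth\tweighted_sum\n"
    "1\t2016-01\t1.0\n"
    "2\t2016-01\t3.0\n"
    "3\t2016-01\t2.0\n"
    "1\t2016-02\t2.0\n"
    "2\t2016-02\t4.0\n"
    "3\t2016-02\t3.0\n"
)
rows = list(csv.DictReader(io.StringIO(SAMPLE_TSV), delimiter="\t"))

# Made-up page IDs standing in for the WikiProject page list from Quarry.
project_page_ids = {1, 3}

all_wiki = monthly_mean_quality(rows)
project = monthly_mean_quality(rows, project_page_ids)

for month in all_wiki:
    print(month, all_wiki[month], project[month])
```

With the real files, `rows` would come from reading the predictions TSV and `project_page_ids` from the Quarry result, after adjusting the column names to whatever the files actually use.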
I was going to summarize the results, but I've run out of time for today. Next steps are to work on an importance measure and to put the infrastructure together to run the analysis across a bunch of different topic cross-sections. --EpochFail (talk) 20:12, 19 November 2016 (UTC)