Grants talk:IdeaLab/Wikipedia Metrics for Institutions
Merged proposal
I made a similar proposal and I am merging my content here and deleting that proposal.
Grants:IdeaLab/Audience metrics for small groups of articles |
---|
Organizations will not commit their resources to developing Wikimedia projects unless they have proof that Wikimedia projects help them meet their own goals. One very common goal is "dissemination of content", and when the dissemination is to happen online, Wikimedia projects do this for a greater audience and at lower cost than all other options. However, many organizations fail to recognize that Wikimedia projects can be used in this way, because the analytics reports done on Wikimedia projects as a whole are difficult to put into the context of any one entity's focused interest. If there were some way to show an analytics report that was relevant to a particular organization, then that organization could more easily recognize that Wikimedia projects reach its audience, and it would be more persuaded to contribute.
A tool should be created which does the following:
|
Blue Rasberry (talk) 16:16, 25 November 2014 (UTC)
I created another version of this page to outline a tool that might be of use to GLAM institutions. Grants talk:IdeaLab/Wikipedia Metrics for Institutions/GLAM. OR drohowa (talk) 18:53, 18 December 2014 (UTC)
Technical design
A tool is designed which has input fields including the following:
- accepts the URL of a Wikipedia page
- accepts a start date
- accepts an end date
Given these three inputs, the tool checks the existing pageview counting tool at http://stats.grok.se and returns a single number, which is the sum of all pageviews of all Wikipedia articles listed in the submitted URL.
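The three inputs and the single-number output can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the proposed tool itself: the function names `title_from_url` and `total_pageviews` are hypothetical, the sample counts are made up for illustration, and the daily counts are assumed to arrive as a mapping from `YYYY-MM-DD` strings to integers (the shape that the script posted further down this page reads from stats.grok.se). Fetching the list of linked articles and their counts is left out.

```python
from datetime import date
from urllib.parse import unquote, urlparse


def title_from_url(url):
    """Return the page title from a /wiki/ style URL, with underscores as spaces."""
    path = urlparse(url).path
    prefix = "/wiki/"
    if not path.startswith(prefix):
        raise ValueError("not a /wiki/ page URL: %s" % url)
    return unquote(path[len(prefix):]).replace("_", " ")


def total_pageviews(daily_views_by_article, start, end):
    """Sum every article's daily counts that fall within [start, end]."""
    if start > end:
        raise ValueError("start date is after end date")
    first, last = start.isoformat(), end.isoformat()
    total = 0
    for daily_views in daily_views_by_article.values():
        for day, views in daily_views.items():
            # ISO dates compare correctly as plain strings
            if first <= day <= last:
                total += views
    return total


# Made-up counts for two hypothetical articles, for illustration only
sample = {
    "Article A": {"2014-04-30": 5, "2014-05-01": 10, "2014-07-31": 20, "2014-08-01": 7},
    "Article B": {"2014-06-15": 3},
}
print(title_from_url("https://en.wikipedia.org/wiki/Palliative_care"))
print(total_pageviews(sample, date(2014, 5, 1), date(2014, 7, 31)))
```

Only the counts dated inside the requested range contribute to the sum, so the April 30 and August 1 entries in the sample are ignored.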
Disclaimer
Some specific organizations are named in the examples below. These organizations know nothing about this proposal and are not involved in it. They have a marginal relationship to Wikipedia in that they contribute content to the en:Choosing Wisely health campaign, which the non-profit organization en:Consumer Reports shares on Wikipedia through me. Blue Rasberry (talk) 18:00, 25 November 2014 (UTC)
Example
The en:American College of Emergency Physicians is curious about how many people use Wikipedia to get information about emergency medicine, which is their field of expertise. They add some information to some Wikipedia articles, and now would like traffic reports to measure how many people they can reach if they have their experts develop health articles on Wikipedia. After putting information in the articles, they list the articles they have developed at en:Wikipedia:Choosing Wisely/American College of Emergency Physicians watchlist, and they set the time range as May 1 2014 - July 31 2014. In return, they get a single number, which is the sum of pageview counts for each article in that list for each of the three months.
Here is an example of input and output:
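Please give a URL to be processed
- https://en.wikipedia.org/wiki/Wikipedia:Choosing_Wisely/American_College_of_Emergency_Physicians_watchlist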
Please give a date from which to start the pageview count
- May 1 2014
Please give the last day of the pageview count
- July 31 2014
In return, the tool would give the following output:
The total number of pageviews received by the Wikipedia articles which were linked in https://en.wikipedia.org/wiki/Wikipedia:Choosing_Wisely/American_College_of_Emergency_Physicians_watchlist during the range of May 1 2014 - July 31 2014 was 523,350.
Output variation - spreadsheet instead of single value
The minimal accepted output for this tool is a single number, which is a sum of pageviews.
A more useful output would be data which could be put into a spreadsheet and which showed monthly pageview counts for each article. Using the example above, here are some values for the emergency medicine articles in a useful format.
article name | May 2014 | June 2014 | July 2014 | total |
---|---|---|---|---|
Foley catheter | 13,114 | 11,988 | 11,990 | 37,092 |
Palliative care | 60,077 | 54,262 | 49,617 | 163,956 |
Abscess | 48,768 | 39,840 | 39,979 | 128,587 |
Oral rehydration therapy | 18,786 | 16,730 | 17,892 | 53,408 |
Fluid replacement | 5,250 | 4,473 | 4,118 | 13,841 |
Intravenous therapy | 46,942 | 42,349 | 37,175 | 126,466 |
grand total | | | | 523,350 |
A file should be exportable such that it can be viewed in Google Docs.
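One way to produce such a file is plain CSV, which spreadsheet programs including Google's and Excel can import. The sketch below is only an illustration: `build_report_csv` is a hypothetical name, and the monthly figures are copied from the table above rather than fetched from any service.

```python
import csv
import io

MONTHS = ["May 2014", "June 2014", "July 2014"]

# Monthly pageview counts copied from the example table above
MONTHLY_VIEWS = {
    "Foley catheter": [13114, 11988, 11990],
    "Palliative care": [60077, 54262, 49617],
    "Abscess": [48768, 39840, 39979],
    "Oral rehydration therapy": [18786, 16730, 17892],
    "Fluid replacement": [5250, 4473, 4118],
    "Intravenous therapy": [46942, 42349, 37175],
}


def build_report_csv(monthly_views, months):
    """Return CSV text with one row per article, a total column and a grand total row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["article name"] + months + ["total"])
    grand_total = 0
    for article, counts in monthly_views.items():
        row_total = sum(counts)
        grand_total += row_total
        writer.writerow([article] + counts + [row_total])
    # Leave the month columns empty on the grand total row
    writer.writerow(["grand total"] + [""] * len(months) + [grand_total])
    return buf.getvalue()


report = build_report_csv(MONTHLY_VIEWS, MONTHS)
print(report)
```

The last row of the output carries the same grand total as the single-number report, 523350.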
Further variation
Consider the spreadsheet listed above. Suppose that instead of one set of articles to be examined, there were multiple sets. In this case, the tool would be configured such that it would compile multiple reports from a single URL.
For example, suppose the tool is given a single URL, and the page at that URL links to multiple lists of articles, perhaps one each for the American College of Emergency Physicians, the American Congress of Obstetricians and Gynecologists, and the American College of Cardiology. In this case, the tool should go one level deeper: instead of counting the pages linked directly from the given page, it should follow each of those links and return metrics for the articles listed on the linked pages.
In this case, the tool would return multiple instances of the spreadsheets shown above, with one spreadsheet per link. The reason behind this is that multiple organizations may want reports for articles within their field of interest, and it should be possible to collect all of these reports with one action.
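The extra level of link-following might look like the sketch below. Everything named here is hypothetical: `reports_by_list` is an illustrative function, `get_links` and `get_total_views` stand in for whatever the real tool would use to read a page's links and an article's pageview total, and the page names "Index", "List 1" and "List 2" are invented. The view totals reuse three-month figures from the spreadsheet example, but their grouping into lists is made up.

```python
def reports_by_list(index_page, get_links, get_total_views):
    """Build one report per list page linked from index_page.

    get_links(page) returns the titles linked from a page;
    get_total_views(article) returns that article's pageview total.
    """
    reports = {}
    for list_page in get_links(index_page):
        # Second level: the articles linked from each list page
        reports[list_page] = {
            article: get_total_views(article) for article in get_links(list_page)
        }
    return reports


# Stand-in data so the sketch runs without network access
FAKE_LINKS = {
    "Index": ["List 1", "List 2"],
    "List 1": ["Abscess", "Foley catheter"],
    "List 2": ["Palliative care"],
}
FAKE_VIEWS = {"Abscess": 128587, "Foley catheter": 37092, "Palliative care": 163956}

reports = reports_by_list(
    "Index",
    lambda page: FAKE_LINKS.get(page, []),
    lambda article: FAKE_VIEWS.get(article, 0),
)
for list_page, articles in reports.items():
    print(list_page, sum(articles.values()))
```

Passing the two lookups in as functions keeps the traversal separate from how links and counts are actually obtained.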
Explanation of utility
Many organizations which do online educational outreach desire metrics which enable them to do impact evaluation. Commonly examined metrics include pageviews of an organization's own website, their Facebook Likes, their Twitter retweets and impressions, and their number of followers on various social media platforms. Currently, no analogous metric is available to export from Wikimedia projects. Wikipedia pageviews are the closest available counterpart to these commonly used metrics, so a tool which presents them will enable organizations to compare the utility of supporting Wikipedia with the utility of investing in any other communication channel.
Scale of this
A typical use case is that a user would want a three-month report on 10-50 articles. An extremely active user may want a yearly report for 300 Wikipedia articles. The heaviest anticipated use would be a report on all articles in a Wikipedia category, which could mean a request for a year's pageview metrics from 5,000-10,000 articles.
For the sake of this trial, if a scheme could be managed to deliver a three-month pageview report for up to 30 Wikipedia articles in one instance of running the tool, then that would be a success. This amount of use should also meet the needs of 99% of anticipated users of this tool.
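To get a rough feel for these scales, the sketch below counts how many requests a report would need, assuming one request per article per calendar month, which is how the script posted further down this page queries stats.grok.se. The function names are hypothetical and the year 2014 is used only as an example range.

```python
from datetime import date


def months_spanned(begin, end):
    """Number of calendar months touched by the range [begin, end]."""
    return (end.year * 12 + end.month) - (begin.year * 12 + begin.month) + 1


def requests_needed(n_articles, begin, end):
    """Requests required, assuming one request per article per calendar month."""
    return n_articles * months_spanned(begin, end)


# Trial target: three months, 30 articles
print(requests_needed(30, date(2014, 5, 1), date(2014, 7, 31)))
# Very active user: one year, 300 articles
print(requests_needed(300, date(2014, 1, 1), date(2014, 12, 31)))
# Category-sized report: one year, 10,000 articles
print(requests_needed(10000, date(2014, 1, 1), date(2014, 12, 31)))
```

Under that assumption the trial target needs 90 requests, while a year-long report on 10,000 articles needs 120,000.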
Over Specified?
```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# Import stats.grok.se view count
# License: Public Domain
import requests
from datetime import date, timedelta

# The data lags behind, so don't set to the current date
begin = date(2014, 5, 1)
end = date(2014, 6, 1) + timedelta(days=-1)
pages = "Foley catheter|Palliative care|Abscess|Oral rehydration therapy|Fluid replacement|Intravenous therapy".split('|')

# Get data
pageviews = {}
for page_title in pages:
    pageviews[page_title] = {}
    # XXX Python's date functions are insufficient
    # Count months as year*12 + (month - 1) so the range covers begin..end inclusive
    for i in range(begin.year*12 + begin.month - 1, end.year*12 + end.month):
        req = requests.get("http://stats.grok.se/json/%s/%s/%s" % (
            'en',
            "%04d%02d" % (i//12, i%12+1),
            page_title,
        ))
        response_json = req.json()
        pageviews[page_title].update(response_json['daily_views'])

    # Sum the page views that fall inside the requested date range
    page_hits = 0
    for viewdate, views in pageviews[page_title].items():
        if begin.strftime("%Y-%m-%d") <= viewdate <= end.strftime("%Y-%m-%d"):
            page_hits += views
            # Print daily view count in TSV for Excel's Pivot tables
            # print("%s\t%s\t%s" % (page_title, viewdate, views))

    # Output the results
    print("%s: %s" % (page_title, page_hits))
```
It's pretty basic and doesn't take long to run. You can modify it to import into Excel's pivot tables to graph weekend effects. Also, in the time since this was proposed, Vipul Naik has built the tool: http://wikipediaviews.org/ --Dispenser (talk) 03:44, 8 December 2014 (UTC)
- Dispenser Fascinating! Let me play with this. Blue Rasberry (talk) 20:58, 8 December 2014 (UTC)