Talk:Learning and Evaluation/Global metrics
Thought on quality vs. bytes added
Regarding "6. # of bytes added to Wikimedia projects": we are doing a project where subject matter experts review articles and make suggestions for improvement. Improving the quality of an article may well require removing bytes, so under this measurement the project would be one with negative impact. I would rather have a measurement "# of bytes changed on Wikimedia projects", in order to give credit for the removal of bad information. We still have the problem that removing and then adding new material in the same edit may add up to zero bytes added but a huge increase in quality. Jan Ainali (WMSE) (talk) 05:55, 1 August 2014 (UTC)
- +1 A Wikipedia article isn't finished when there is nothing more to add, but when there is nothing more to delete. An encyclopedia article should be brief and dense; if this metric is used, we encourage people to inflate the text, making it harder for readers to find information quickly and easily.--Pavel Richter (WMDE) (talk) 09:03, 1 August 2014 (UTC)
- Thanks for these comments. Both great points. We are working with the Analytics team on standardizing these metrics with WMF's internal metric definitions, so I'm sure this is one whose definition will be updated. A concern I immediately have with total bytes changed is, of course, easy gamification of the metric (e.g. deletions followed by restorations). But I'll talk to Analytics about a way around this, because I obviously agree with your point that editing involves adding, modifying, and eliminating. Jwild (talk) 13:47, 4 August 2014 (UTC)
- The Bytes Added metric in Wikimetrics actually has four submetrics:
- net_sum: bytes added minus bytes removed
- absolute_sum: bytes added plus bytes removed
- positive_only_sum: bytes added
- negative_only_sum: bytes removed
- In the program reporting completed so far, the positive_only_sum has been used for bytes added. Here, the proposed "# of bytes changed" would be equivalent to the absolute_sum (bytes added plus bytes removed) and would be an easy extension beyond the positive_only_sum (bytes added) if desired for additional reporting and context. The positive_only_sum does not suffer from the problem that removing and then adding new material in the same edit may add up to zero bytes added; that would occur only if the net_sum (bytes added minus bytes removed) submetric were used instead. JAnstee (WMF) (talk) 15:26, 7 August 2014 (UTC)
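(For concreteness, here is a minimal sketch in Python of the submetric arithmetic described above. The deltas are made up, the sign convention for negative_only_sum is an assumption, and Wikimetrics' actual implementation works from revision data rather than a bare list.)

```python
# The four "Bytes Added" submetrics, computed from per-edit size deltas
# (new revision size minus previous revision size). Made-up example data.
edit_deltas = [1200, -450, 30, 0]  # the final 0 could be an in-place rewrite

positive_only_sum = sum(d for d in edit_deltas if d > 0)  # bytes added: 1230
negative_only_sum = sum(d for d in edit_deltas if d < 0)  # bytes removed: -450 (negative sum assumed)
net_sum = positive_only_sum + negative_only_sum           # added minus removed: 780
absolute_sum = positive_only_sum - negative_only_sum      # added plus removed: 1680

# Note: an edit that removes and re-adds the same amount of text nets to a
# delta near zero, so size-delta metrics see little of such rewriting work.
print(net_sum, absolute_sum, positive_only_sum, negative_only_sum)
```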
- Support for Jan and Pavel. But in any case – “added” or “changed” – the number of bytes can only be an ancillary criterion. --Packa (talk) 19:26, 5 August 2014 (UTC)
Thought on number of images/media added to Wikimedia articles/pages
We here at Wikimedia Deutschland support projects (such as the "Festivalsommer", where volunteers travel to music festivals and take pictures of the performing artists) whose goal is to add as many free, high-quality images to Wikimedia Commons as possible. These projects do not see Wikimedia Commons merely as the media archive for the Wikipedias, but as a Wikimedia project in its own right, providing free media for everyone.
If this metric is used in this way, we will no longer be able to support such projects. Instead, we would encourage and support projects only if they aim to provide pictures for Wikipedia.--Pavel Richter (WMDE) (talk) 09:11, 1 August 2014 (UTC)
- Thanks for the comments, Pavel! The major underlying point you bring up is a grave concern I have with any standardized metrics: these should absolutely, 100% not be the only measures of projects, and we know that many projects can be successful outside the scope of these metrics. That said, these are the most common metrics that can be used across grants, which is why we want to collect them as the lowest baseline. Projects such as the "Festivalsommer" you describe may be a bit outside the scope of these metrics, and a careful explanation with alternative metrics of success could be appropriate. What you would want to show is that the financial and time investment you are making in paying for people to go to concerts actually results in free, high-quality images that see some sort of usage. If you can't find a way to tell that story, perhaps it is a project to consider eliminating (I don't know - I have never seen the data behind it). Jwild (talk) 13:44, 4 August 2014 (UTC)
- There are so many things to take into consideration. Numbers are definitely an easy but, in my opinion, pretty poor way of measuring our impact. What do projects like the "Festivalsommer" (and probably many other funded content projects) bring beyond bytes and pictures? Do they help with editor engagement/retention (maybe existing editors who keep working because they've been able to illustrate/consolidate existing articles)? Do they allow for brand recognition (getting accredited here, there and somewhere else opens doors for more, and sometimes more important, events/institutions), which in turn brings more content to the projects? There are so many long-term "metrics" we need to look at that involve not simple numbers but much more time and attention. But yes, we've got to start somewhere. On the subject of Commons, it'd be interesting, for example, to try to track re-use outside Commons of pictures that are *not* in Wikipedia/sister projects. I know this is a tiny set of data, but it's trying to look at Commons differently. Re-use and citation of Wikimedia content might be a metric? In any case it's a tough call, and whatever metrics we end up using, in full or in part, for our funding decisions need to be wrapped in openness about what "impact" really means in very different contexts. notafish }<';> 20:43, 4 August 2014 (UTC)
- notafish: on the topic of image reuse & citation: this is indeed a tough question. See on this topic:
- Kousha, Kayvan; Thelwall, Mike; Rezaie, Somayeh (2010-09-01). "Can the impact of scholarly images be assessed online? An exploratory study using image identification technology". Journal of the American Society for Information Science and Technology 61 (9): 1734–1744. ISSN 1532-2890. doi:10.1002/asi.21370. Retrieved 2014-06-05.
- Anecdotal data is gathered on Wikimedia Commons in c:Category:Commons as a media source.
- Jean-Fred (talk) 17:38, 25 August 2014 (UTC)
Updates to the metrics
Hello everyone - I updated some of the metrics yesterday based on the feedback received from groups/individuals during Wikimania, as well as following a conversation with the director of analytics at WMF. We are working on getting our metrics aligned (across engineering projects and grantmaking) and also want to be sure we have the tools in place to enable these things to be as easily measured as possible! Just wanted to have this for the record :) Jwild (talk) 12:10, 11 August 2014 (UTC)
Comments
Hi Jesse
I know this is work in progress, but I'm unsure of the meaning of the opening "problem":
"Our [intentional] design for self-evaluation results in limited ability to consolidate/sum the inputs, outputs, and outcomes of grantee-funded projects."
Just in terms of translatability, I first read "results" as the head noun: "Our [intentional] design for self-evaluation results". And "our" and "grantees" are unclear in relation to subject/actor in the sentence. And I think translators might have a task in accurately expressing a few of the word-binaries (consolidate/sum; outputs/outcomes). Let me guess: "outputs" are direct, tangible, more short-term results, while "outcomes" are longer-term, broader benefits?
A broader question I have about evaluation texts on Meta is their opening framing in terms of problem-then-solution. Sometimes this works; sometimes it doesn't. A disadvantage is that "problem" can funnel writers and readers into a negative frame, right at the outset. Engaging readers can sometimes be helped by turning a negative into a positive. Um ... thinking:
Problem: It can be challenging to design self-evaluation in ways that allow grantees to capture the inputs and outcomes of funded projects and turn them into useful data.
That's just an off-the-cuff guess; but is it useful? Tony (talk) 03:10, 21 August 2014 (UTC)
- Hi! Thanks for the thoughts. I agree that the problem/solution framework isn't always the best, but neither is any other model. The huge benefit of this framework, as I have seen thus far, is that it provides consistency across learning/evaluation and also helps frame things more like an experiment: you are attempting to solve a particular problem, so you need to build in specific observation points/metrics. As for this particular problem statement (around global metrics): I will definitely take a second look at how we might articulate it better. One small piece will be to link to the definitions of those words (we have a glossary).
The one missing piece about your specific suggestion is the fact that we (grantmaking) DID (past tense) design a grants program based on open-ended self-evaluation, and it has left big gaps in our collective ability to evaluate. We're going to be revamping this page and the messaging behind it this week, so I'll be thinking about how to best frame this particular set of work and be sure to link to definitions. Thanks again! Jwild (talk) 16:24, 25 August 2014 (UTC)
- Jesse, thanks for your explanation. Further thoughts on the global metrics overleaf: very numbers-based ... almost exclusively. And where quality is mentioned (e.g. number of articles improved), it's tangential and runs into the same old problem: hard to specify benchmarks (improved in what ways, by how much?). While these global metrics are intended only as a base, does it send a message to the programmatic community that quality isn't fundamental? Tony (talk) 10:03, 26 August 2014 (UTC)
- It's a good point, and I really hope it doesn't send that message. We are going to be very careful in our communication around this that these are only base numbers for aggregation purposes, and we will provide suggestions for other types of metrics to look at based on goals. Some of these will be quality indicators. One of the things we talked about as a grantmaking team yesterday was the need to crack some of the quality questions, and I hope this will be an area of development over the course of this next fiscal year. Ideally, these required metrics could change once we have more standardized, systematic ways of tracking things like readership or quality. Thanks again! Jwild (talk) 16:25, 26 August 2014 (UTC)
- Jessie, I've just run through it and copy-edited superficially. I see in various places references to context and the fact that the global metrics are not the only thing. What would open up the parallel need for quality (where it's possible—and it may not always be) is a few examples. I think a few gentle nudges are in order over the next year or so to shift the culture a little towards quality. Certainly some of the LPs are looking in that direction. Tony (talk) 07:59, 1 September 2014 (UTC)
- Hi Tony, I am not sure we understand "context" in the same way here. You seem to say (and I agree with you) that we should not forget quality and that these metrics are geared towards a more quantitative approach. And you seem to think that our need for context might also be a way to push towards quality. However, when we speak about context within the framework of Global Metrics, what we're trying to say is: don't stop at giving numbers all by themselves, but give us context so that these numbers make sense. For example, a net result of 2 new editors might be interpreted in many different ways. If they are the result of an event that drew 5 people, they might be a much better number than the result of an event that drew 500 people. Then again, if this event with 500 people resulted in 2 new editors on a project with only 10 active editors, the number can be interpreted as a greater success than if it drew 2 new editors on a project where there are already hundreds of active people, etc. What we're trying to say here is that there are no good or bad numbers in and of themselves, and that giving context is the only way we can assess those numbers and conclude success or failure of a program. That is, I think, a necessary first step in using Global Metrics in an intelligent way. I am happy to add those examples to make this clearer if it is indeed what you are asking for.
- Quality (of the content) is a whole different ball game for which, in my opinion, we still need to find (better) measures altogether. Some will argue we can't; others more versed in evaluation will argue we can; but we still have a long way ahead of us in developing those measures. I think the message we want to convey here is that this is a first step, a common ground sorely needed to assess our impact across projects, countries and programs. My hope is that we will be working on those metrics that could measure quality as soon as possible. notafish }<';> 20:21, 1 September 2014 (UTC)
- Notafish, I agree with what you say. Yes, quality is hard (both for grantmaking and L and E, and for programmatic managers); this is probably why we've emphasised the numerical thus far. I guess it's hard because it's inherently more subjective and/or complex. Number of images uploaded is easy. The number (or proportion) of those that make it to featured status is also numerical, and a quality determinant, because it outsources quality enforcement to community judges in their normal function, complete with criteria. These are both towards the end-point of the programmatic pipeline. At the start and middle lies the potential to help volunteers upskill and aspire to quality (and know what it is): this is harder because it can involve things like writing criteria, providing (translated) access to the right learning patterns, where possible and appropriate providing online personal assistance and encouragement, and finally, judging and reflecting on quality in community discourse. It's these possibilities that lay open the potential for gradual cultural change—and I mention photography only as one example.
One of the hardest things for program managers is to see examples, to know what kind of things can be done, and to be able to witness other projects that have done it. Perhaps it's wrong to insert examples in the global metrics, but I put to you that working out how to make quality more achievable in the programmatic communities might be a priority. Tony (talk) 02:22, 2 October 2014 (UTC)
“Number of articles added or improved on Wikimedia projects”
Hi,
I am not sure “articles” is the best word here, as it does not appear to be shared by all “Wikimedia projects” (although it is, indeed, used by some, such as Wikivoyage or Wikispecies): Wiktionary uses “entries”, Wikidata goes for “items”, Wikisource employs “texts”, and Wikiversity seems to use “learning resources”. What are your thoughts on this?
Cheers, Jean-Fred (talk) 10:03, 1 October 2014 (UTC)
- Good point. Although I think having "article" in the title is useful for clarity, as long as within the definition and any documentation we make it clear that we're talking about the primary content pages for a given project. Otherwise we have to use some abstract term (like "primary content pages"). Jmorgan (WMF) (talk) 17:59, 2 October 2014 (UTC)
- Sure. But I think it is important to make it clear whether « secondary content pages », such as the “List” namespace on the Spanish-language Wikipedia, would be encompassed by this definition − indeed, in this case, pictures from WLM would be counted on fr.wp (because lists are in the Main NS) but not on es.wp (because lists are in the Anexo NS, e.g. w:es:Anexo:Bienes_de_interés_cultural_de_Cantabria). Jean-Fred (talk) 13:40, 3 October 2014 (UTC)
“Number of new images/media added to Wikimedia articles/pages”
Hi,
If I understand correctly, this metric is about the count of distinct images used, and not the count of usages? I’d be curious to know the reasoning behind choosing one and not the other.
Cheers, Jean-Fred (talk) 21:10, 1 October 2014 (UTC)
- Ping? :) Jean-Fred (talk) 14:35, 25 October 2014 (UTC)
- Hi Jean-Fred, I am really sorry I missed your message. The metric you mention counts distinct images used, rather than total image uses. While broader image usage is a target of some program leaders for particular content, the primary question is whether an image is used at all. This is analogous to how our current page-view counts (duplicate counts) are useless without data on how many unique visitors those hits represent. It makes sense to look at the number of productive images in more cases than at how productive a single image is. In other cases, the quality of an image may also be measured by its reach: this is a possible metric for anyone who has that goal, and it may extend well beyond Wikimedia uses in some cases too. Remember: no one is limited to the global metrics. You can include your own metrics and indicators of progress or success when reporting as well. Thanks! MCruz (WMF) (talk) 20:43, 27 October 2014 (UTC)
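(To illustrate the distinction MCruz describes, a minimal sketch in Python with hypothetical file and page names; this is not the actual reporting tooling.)

```python
# Distinct images used (the global metric) versus total image uses.
# File and page names are hypothetical.
usages = [
    ("File:Band_at_festival.jpg", "en.wikipedia.org/wiki/Some_Band"),
    ("File:Band_at_festival.jpg", "de.wikipedia.org/wiki/Some_Band"),
    ("File:Stage_lights.jpg", "en.wikipedia.org/wiki/Stage_lighting"),
]

total_uses = len(usages)                                    # duplicate count: 3
distinct_images_used = len({image for image, _ in usages})  # unique count: 2

print(total_uses, distinct_images_used)
```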
Objection to retained active editor survival metric of 5 edits per month
I object to the retained active editor survival metric of 5 edits per month (as I understand its meaning), as it is not realistic and sets up nearly every project for failure on this metric. It doesn't account for the reality of the ebbs and flows of life and work. Under this metric I don't count as having survived, because I made only two edits over the course of six weeks in July/August when I was on vacation. I'm by no means the most active editor, but I did make about ~500 edits on en.wiki last year, and another couple hundred on the other projects, including co-organizing the Art+Feminism editathons. I should count as an active retained editor! I can give you examples of others in this category if needed. I am okay with 60 per year, but 5 per month is not a viable expectation from my perspective and experience, and only sets up projects to fail when it comes to reporting outcomes.--Theredproject (talk) 14:58, 28 October 2014 (UTC)
- Hi Theredproject, thanks for reaching out and sharing your views! We value this a lot, as we strive to put forward a comprehensive evaluation practice that anyone can feel comfortable with. We understand your concerns with this metric; as far as this parameter goes, we incorporated the definition provided by the Research team (please see «Metrics standardization», and more specifically, this section). As you can see, we added «60 edits a year», which, as you point out, is not the simple sum of monthly edits. It might interest you that this problem is already addressed on the discussion page, and the team is still working on this metric. Maybe Dario can give more feedback on this point. Thanks again for pointing this out, and please bear in mind that there is more to evaluation than global metrics: please check out the Evaluation portal to find more resources to build your own metrics and measure against your program's goals. Thanks! MCruz (WMF) (talk) 17:57, 28 October 2014 (UTC)
- Hi Theredproject. I'm largely responsible for the linked metrics pages. First, let me apologize for the steamrolling effect that large-scale metrics tend to have on individual examples. Until we can put a webcam behind your computer (ahhH!) or electrodes in your brain (eek!), we're stuck with some very limited, coarse measures of the engagement and retention of Wikipedia editors. In this case, we take advantage of the temporal rhythms of edit events, and there are times at which that strategy falls flat. While I feel this is a necessary compromise, it does end up being kind of lame sometimes.
- Now, I'll be the first to admit that 5 edits as a metric for "survival" is not a very good one. However, I should state that we (the Research team) never recommend its use for measuring the retention rate of new editors. Instead, we have developed more reasonable measures (e.g. surviving new editor). In this case, we recommend that the presence of a single edit during the "survival period" be considered sufficient evidence of retention. However, as you might imagine, this metric falls prey to the same criticisms as any other large-scale metric. E.g. if you make an edit just before and just after a survival period, this metric will still assume that you are inactive. To make sure that the decisions we make about the duration of the survival period do not bias our interpretation of the measurement, we perform sensitivity analyses to check that minor changes don't result in major fluctuations.
- In general, I'd encourage you and anyone who considers a metric to keep in mind the nature of what is intended to be measured. I think you've done a fine job of that here, and I encourage you to continue by addressing this in any report based on the metric. Too often we let the metrics do the thinking for us. It's those situations that tend to turn into the myopic pursuit of making the numbers go up at the cost of the original goal. --Halfak (WMF) (talk) 00:21, 20 November 2014 (UTC)
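(A minimal sketch in Python of the "at least one edit during the survival period" rule discussed above; the offset and window lengths are assumed values, not the Research team's canonical definition.)

```python
from datetime import datetime, timedelta

def survived(registration, edit_times, offset_days=30, window_days=30):
    """Return True if at least one edit falls inside the survival window
    [registration + offset, registration + offset + window). The offset
    and window lengths here are assumptions, not canonical values."""
    start = registration + timedelta(days=offset_days)
    end = start + timedelta(days=window_days)
    return any(start <= t < end for t in edit_times)

# The edge case raised above: edits just before and just after the window
# still count as "not surviving", which is why sensitivity analyses vary
# the window boundaries.
registration = datetime(2014, 7, 1)
edits = [datetime(2014, 7, 30), datetime(2014, 9, 2)]
print(survived(registration, edits))  # False
```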
- Hi Halfak (WMF), thank you for this thorough and nuanced response. I will look further at this, and do my best not to let the metric do the thinking for me!--Theredproject (talk) 02:48, 20 November 2014 (UTC)
My problem with this system
Easy to fake.
Based on numbers, not quality.
I'm afraid I've seen this tendency before in the WMF's approach; this seems like a continuation and amplification of it.
Quality is hard to measure, but it's worth trying, through finer-grained advice/criteria and examples.
I believe the system as it stands is a step backwards, not forwards.
Tony (talk) 02:00, 15 March 2015 (UTC)
- It's also instructive to see how the "Key questions" section asks all the wrong questions. --Nemo 12:59, 12 September 2015 (UTC)
Metric 4b
[edit]Hi,
In 13170286 @JAnstee (WMF): “disambiguated” Metric 4 into 4a (existing) and 4b, “Number of new images/media uploaded to Wikimedia Commons (Optional)”.
Questions:
- The second column reads “Number of images uploaded to Commons and added to Wikipedia articles or other Wikiproject pages” (the same as above, before the edit), which does not quite sound right − I understand the metric as “All media uploaded to Wikimedia Commons, regardless of their usage”. Please clarify the intended meaning.
- This actually sounds like metric 4b is a subset of the formal definition of #5, “Number of articles added or improved on Wikimedia projects”. Indeed: 1/ it applies to « Wikimedia projects », a definition which certainly includes Wikimedia Commons; 2/ it has been clarified above by @Jmorgan (WMF): that what is meant is « the primary content pages for a given project ». Since it can reasonably be argued that media are the “primary content pages” of Wikimedia Commons, it logically follows that Wikimedia Commons files should be included in it. Please clarify if this is not the case, and if so why.
- Please explain both the rationale and the decision process behind this change. Is it based on feedback from grantees? A request from funding committees? Wikimania discussions?
- I am not aware of any announcement of this change (whether on wiki, mailing-list or social media). Please point me to any such announcement in case I missed it, or please make such an announcement in appropriate channels. Given the reach of Global Metrics in our movement, I think any change to it should be widely advertised.
(I guess that in another life I would have argued that such changes should come with “discussions” rather than “announcements”, but ah well.)
Jean-Fred (talk) 20:13, 25 August 2015 (UTC)
- Forgot one: these metrics are headed with “2014-2015”. My understanding is then that we are well into the intended period (indeed quite near the end of it), and still changes are being made. It would be advisable to do proper versioning on such things. Jean-Fred (talk) 20:50, 25 August 2015 (UTC)
- I'm no longer involved in the Global Metrics project, but I agree with Jean-Frédéric that # of images uploaded to Commons is/should be a valid metric, regardless of how/where those images are used. The original intent of specifying images used was to promote integration of content across projects: images are valuable in and of themselves, but in many cases they are more valuable if they are used in articles. I think both metrics should be acceptable. Jmorgan (WMF) (talk) 20:41, 25 August 2015 (UTC)
- Hello, Jean-Fred, thanks for asking about these points for further clarification. We are working to disambiguate this very point to achieve proper standardization of the metrics. Importantly, the intended global metric 4 set last year is what is now labeled 4a, with the required data point being the number of images used on Wikimedia in the main namespace (i.e., namespace 0). While this count may sometimes be the same as metric 5, the number of articles improved, it is not always: (1) a single article could be improved by more than one image, and (2) a single image could be used to improve more than one article. 4b was added for disambiguation because many grantees, rather than reporting the requested global metric (4a), have been reporting the number of images uploaded. While that is an important metric, it was not the requested metric, as it was already commonly reported before the global metrics initiative; for this reason, we clarify here that it is different from 4a. As for 4a and which project pages count: the count should be based on any use outside of Commons, whereas the count for 4b would be the total pages created on Commons. It is my understanding that FDC grantees were messaged directly about this issue before we made the change to the page. Importantly, the required global metrics did not change; we have only clarified that 4b is a different, optional metric from the required 4a, which has been misreported in a number of grantee reports. Because this is not an actual change to the required global metrics, no further discussion seemed warranted beyond the grants officers' messaging and the edit summary. Thanks again for asking these clarifying questions; please let me know if I missed any points or if any fuzziness remains. JAnstee (WMF) (talk) 07:27, 10 September 2015 (UTC)
- Forgot to mention one last clarification. While I noted that metric 4a is based on namespace 0, I failed to address your comment:
- "This actually sounds like this metric 4b is a subset of the formal definition of #5, “Number of articles added or improved on Wikimedia projects”. Indeed: 1/ it applies to « Wikimedia projects », definition which certainly includes Wikimedia Commons ; 2/ it has been clarified above by @Jmorgan (WMF): that what is meant is « the primary content pages for a given project ». Since it can reasonably be argued that media are the “primary content pages” of Wikimedia Commons, it logically follows that Wikimedia Commons files should be included in it. Please clarify if it is not the case, and if so why."
- With regard to pages improved, 4b is a subset of pages created; but on Commons, the pages created for images are in namespace 6 (the File namespace), and often additional ns0 pages are also improved by adding images to collections in galleries and other mainspace pages. So it is important to recognize that distinction between 4b and 5 as well. JAnstee (WMF) (talk) 15:53, 10 September 2015 (UTC)
- I don't understand what "optional" means. Are any of the other metrics "mandatory", and what does that mean? If something is mandatory, it should be the original 4, i.e. the current 4b, which is not only more traditional but also more stable. The number of uploaded images can vary a bit (e.g. because of mistagging or deletions), but not much, while the number of usages tends to grow indefinitely for a given set of images. Which raises the question: how many months after the upload should the usages be counted? And is there a prediction model to say that if there are x usages in month m then there will be y usages in month m+n and z usages "eventually"? We're supposedly interested in whether the images are useful in the long term, but WMF seems to want to produce assessments in the short term; I don't understand how the two are reconciled. --Nemo 12:50, 12 September 2015 (UTC)
- Hello Nemo, thank you for your further inquiries. I will do my best to be clear in my responses. Importantly, yes, the global metrics are required for WMF grantees. The original metric 4 is in fact the current 4a, not 4b. Unfortunately, various grantees who have begun reporting the global metrics have made errors in reporting the required metric; some have mistaken the newly outlined 4b for the required metric. For this reason we wish to disambiguate by more clearly defining and separating out 4b (as "optional"), since many seem to want to report it, and it is often important, but it is not in fact the required global metric. While 4b has been a more commonly reported metric, it in fact feeds into the count for metric 5; metric 4 was intended to get at media use, rather than pages created. The global metric for unique images used was thus intended to extend reporting across grantees beyond what was uploaded to include what is used. As for the follow-up window, reporting periods for grantees are based on their grant period and reporting deadline, and it is up to each grantee to set targets appropriate for their reporting timeline. You are correct that, unfortunately, with grant periods that do not extend across multiple years, short-term outcomes are the farthest outcomes that can be required in grantee reporting; however, on occasion, our Learning and Evaluation team does work to extract and expand evaluative inquiry based on existing grantee reports to see how the data fare at later follow-up points, as we have done in the 2013 and 2015 program evaluation report projects. JAnstee (WMF) (talk) 13:37, 12 September 2015 (UTC)
- We could go on nitpicking for ages about what that "and" meant in the original text; let's just say that number 4 was changed. Can you clarify where in the text it is said that these metrics are mandatory, where I could read more about their being mandatory, and how grantees agreed to their being mandatory? Thanks. Nemo 13:51, 12 September 2015 (UTC)
- I also don't understand whether 4a is supposed to count used images or usages of the images. I think usages are more interesting, because if an image is used on many wikis, that's better; on the other hand, it usually takes time for a useful image to be adopted in the many languages where it would be useful. Nemo 12:50, 12 September 2015 (UTC)
- 4a explicitly calls for the count of unique images used rather than the duplicative count of image uses, as defined in the example and learning pattern instructions for that metric. Of course, the number of wikis in which images are used, and the number of uses, may still be reported in addition to the global metrics; however, only the count of distinct images used is required. Still, we encourage program leaders to track and report the metrics best suited to telling their program story, and emphasize that the global metrics may be only a basic starting point depending on a program's theory of change and impacts. Importantly, Magnus's GLAMorous tool provides all of these metrics in a single report, by category. The output includes the number of uploads to Commons, the number of uses, the projects in which the images are in use, and the proportion and number of distinct uploads used in any wiki's namespace 0/mainspace (i.e., global metric 4a, formerly metric 4), making it relatively easy to include more than just the required metric, depending on program leaders' desire to track and report those additional metrics. I hope that I have cleared up any remaining confusion; please do let me know if you need further clarification as we continue to work on clear instructions to stabilize the global metrics. JAnstee (WMF) (talk) 13:37, 12 September 2015 (UTC)
- I suggest reconsidering. Usages are more interesting. Nemo 13:51, 12 September 2015 (UTC)
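(A minimal sketch in Python of how the GLAMorous-style report fields described above relate, computed locally over made-up usage records; this is not a call to the actual tool.)

```python
# GLAMorous-style summary fields, aggregated from hypothetical usage
# records of (file, wiki, namespace).
uploads = {"File:A.jpg", "File:B.jpg", "File:C.jpg"}  # uploaded to Commons (cf. 4b)
usages = [
    ("File:A.jpg", "en.wikipedia", 0),
    ("File:A.jpg", "fr.wikipedia", 0),
    ("File:B.jpg", "en.wikipedia", 2),  # a userspace use: not mainspace
]

ns0_files = {f for f, _, ns in usages if ns == 0}
report = {
    "uploads": len(uploads),                           # 3
    "total_uses": len(usages),                         # 3
    "projects": len({wiki for _, wiki, _ in usages}),  # 2
    "distinct_used_ns0": len(ns0_files),               # global metric 4a: 1
    "proportion_used": len(ns0_files) / len(uploads),  # 1/3
}
print(report)
```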
Duplication of Category:Standardized metric
I don't understand why pages like Grants:Learning patterns/Number of newly registered users are being created, which duplicate the content of Category:Standardized metric. The instructions contained on that page should be merged into Research:Newly registered user and the page redirected. The same for all the other standard metrics. --Nemo 12:55, 12 September 2015 (UTC)
Communication and persons involved
To be clear, does the "number of individuals involved" exclude things such as the number of Twitter/Weibo followers obtained? The page still contains an image featuring such a misleading number.
In general, however, what's the objective of this metric? I suppose the idea is to assess how much we are spreading the ideals of free knowledge, i.e. the humus on which our projects grow and our end objective. When holding a conference, what's more important: the number of persons attending, or the number of persons who learn what has been said e.g. from the media? Most people would say the latter, I think. --Nemo 13:40, 12 September 2015 (UTC)
- Hello again Nemo - another good clarification. No, the number involved should not include passive audiences of messages or social media. Involvement should cover those actively engaging in a project, program, or event, rather than follower counts. This is not to say that follower counts and passive audience reach are unimportant; they are a critical part of the way volunteers are connected to opportunities. However, they are not the global metric requested here. Of course, beyond the output metrics of involvement, the outcome of learning would be an additional metric that is also important to many who seek impact through participant involvement. However, we have not set a requirement for that at this point in time; currently the global metrics include only the output measures of participation. JAnstee (WMF) (talk) 08:09, 13 September 2015 (UTC)
Actual content usage
Some metrics here seem tailored to the inadequate state of our tools, rather than to our objectives. That might be understandable, but tools are also evolving. For instance, when we explain WLM to public officers to convince them to give us their support, they care very little about how many images are uploaded or how many articles are covered; but they are very interested in hearing mediacounts statistics. It's useful to know not only how frequently files are loaded, but also how many people bother to see larger versions or the originals and how much usage the files receive from outside Wikimedia wikis (suggesting they're possibly going viral). --Nemo 13:47, 12 September 2015 (UTC)
- To some extent the global metrics have been limited to those metrics which are most easily accessible to movement leaders. However, as the analytics team continues to make progress, we expect grantees and movement leaders to extend their local measurement practice to include the metrics most relevant to their program storytelling; for instance, page views are a critical metric for GLAM, for which the analytics team is currently working to develop an API. Still, other highly desirable metrics remain somewhat out of reach, such as global media use outside of Wikimedia. Importantly, the global metrics are only a core set of basic measures that help us understand the reach of programmatic work; measurement should be customized locally to include not only these global metrics but also the most relevant and accessible local measures, as movement leaders develop methodology for capturing things like learning, attitudinal change, and other longer-reaching outcomes. JAnstee (WMF) (talk) 08:15, 13 September 2015 (UTC)
Please help with global metrics at Grants:IEG/Wikipedia likes Galactic Exploration for Posterity 2015
Dear fellow editors, I am working on a project to send Wikimedia into outer space. I wrote what I think is a good list of global metrics on the project page at Grants:IEG/Wikipedia likes Galactic Exploration for Posterity 2015, but I need help. The project page said to go to the talk page for help. Please edit a better and more appropriate list of global metrics for the space project. Thank you for your time and attention in this matter. I appreciate any help in getting this project off the ground. Geraldshields11 (talk) 14:47, 15 October 2015 (UTC)
Thoughts at Grants talk:Learning patterns/Calculating global metrics
I was looking at Grants:Learning patterns/Calculating global metrics and wondering a lot about what I was reading there. I posted some thoughts to Grants talk:Learning patterns/Calculating global metrics. Blue Rasberry (talk) 20:41, 25 November 2015 (UTC)
- See details of Global Metrics history and use as outlined in response to a related comment at the learning pattern talk page. JAnstee (WMF) (talk) 04:21, 4 December 2015 (UTC)
Discontinuation of Global Metrics
@MCruz (WMF) and JAnstee (WMF): Since Global Metrics are no longer required under Community Resources' new grant programs, I'd like to simplify this page for archival purposes (keeping the metric definitions and resources, but moving the "history", "what they are/are not", etc. sections to another sub-page), and add a clear message about their discontinuation and replacement with grant metrics. I believe I did the latter part on the old page in the Grants namespace, but that's disappeared. Let me know if this is all okay. -- Shouston (WMF) (talk) 23:48, 16 October 2017 (UTC)