As Wikimedia program leaders and evaluators work together toward more systematic measurement and evaluation strategies, we face a common challenge to have both systematic quantitative data and qualitative information. By doing so, we can establish successful practices in outreach programs, and understand the depth and variety of Wikimedia programs across our vast Wikimedia landscape. To meet this challenge, we are working to combine efforts and knowledge for two key purposes:
- To generalize program knowledge and design patterns of Wikimedia programs, and
- To deepen our understanding of Wikimedia programs and projects and how they impact our communities.
Program leaders and evaluators often ask whether quantitative methods and measures are preferred over qualitative ones, or whether qualitative outcomes can be given value in the same way as quantitative outcomes.
A good evaluation requires numbers and stories—there is no meaning of one without the other. How those stories and numbers are collected can, and will, vary greatly, as each program leader seeks to understand their project or program story. Methods and measures for program evaluation should be designed to track how your project or program is doing, what it intends to do, and how well the expected outcomes are reached. Whether those methods and measures are qualitative or quantitative will vary depending on your interests, your measurement point, and your evaluation resources, but should, no matter what, be useful in telling the story of what you did and whether or not your work was a success.
What about the “Quantitative vs. Qualitative” debate?
The divide between quantitative and qualitative is a debate that is theoretical in origin, one between positivism and idealism. The former holds assumptions and beliefs in a single unchanging truth; the latter holds to constructivism and a belief in truth as socially constructed, a collection of varied interpretations.
[Image: An ECG gives a reading of the heart's electrical activity.]

It is also a phenomenological divide between physical and non-physical phenomena: between one system of thought that accepts only phenomena which can be systematically sensed through basic human sensory perception (e.g., sneezing, blinking) or scientific instrumentation (e.g., temperature, heart rate, brain activity), and another that contemplates phenomena which cannot be sensed through basic sensory perception but are still thought to exist (e.g., love, fear, happiness).

Often, it is also a methodological divide between deduction and induction: quantitative approaches are used to test and systematically measure the occurrence of a phenomenon, while qualitative methods are used to explore and understand a phenomenon.
While the debate between quantitative and qualitative has been at the heart of the philosophy of science, today's research more often includes both quantitative and qualitative components, using mixed methods and measures to address complex social phenomena that cannot be directly observed.
Often, by triangulating quantitative and qualitative measures, today's social researchers identify approximate measures, or "proxies", to pinpoint and measure a phenomenon of interest. Triangulated measures, qualitative and quantitative together, can tell a better story of outcomes than either can alone. Take, for example, the phenomenon of volunteer editing behavior:
"edit count", "bytes added/removed", "page views" | + | "article subjects", "categories", "quality" ratings | = | |
Evaluating in a mixed-methods world
In practice, quantitative and qualitative are more often two sides of the same measurement coin: related and nearly inseparable. Triangulation and mixed methods are often thought to be removed from the theoretical debates in the background, and far from the root philosophical challenge: the positivist paradigm and its belief in a single reality and a single, fixed truth. When we try to differentiate them, it looks something like this:
Phenomena | Measures | Methodology | |
Quantitative |
Qualitative |
All quantitative measures are based on qualitative judgments
In quantitative measures the judgment just takes place in advance, in anticipation of possible responses, and often with direction for numeric assignment, rather than post-hoc after the data are collected. Numbers do not mean anything without assigning a description. Whether it is a question about physical count data, or about an attitude, we must create the meaning of numbers in measurement.
- With count data, such as "edit count" or "bytes added", we can end up with a story piece that tells us precisely how many times our student cohort hit the save button (i.e., edit count) and how much content was added (i.e., bytes added), but says nothing about the qualities of those contributions or how participants worked to make them.
For this reason, some may opt for such metrics, consider them a more rigorous assessment of editing behavior, and choose to measure respondents' edit counts directly through the WMFLabs tool Wikimetrics, for instance. Still, what "counts" as an edit is qualitatively defined in the tool's parameters: each time an editor hits the "save" button, provided that the edit is "productive", meaning that it is not reverted or deleted [1]. Further, the metric "edit count" does not directly correlate to the amount of text or other data contributed by an edit, or to the time devoted to editing before that save button is hit.
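To make that qualitative definition concrete, here is a minimal sketch of how a "productive" edit count might be computed from a list of edit records. The `Edit` record, its field names, and the sample data are hypothetical and for illustration only. This is not the Wikimetrics implementation, and counting only positive byte changes as "bytes added" is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Edit:
    """Hypothetical edit record; field names are illustrative assumptions."""
    editor: str
    bytes_changed: int
    reverted: bool = False
    deleted: bool = False


def is_productive(edit: Edit) -> bool:
    # The qualitative judgment lives here: an edit "counts" only if it
    # was neither reverted nor deleted.
    return not edit.reverted and not edit.deleted


def productive_edit_count(edits: list[Edit], editor: str) -> int:
    """Number of saves by `editor` that were not reverted or deleted."""
    return sum(1 for e in edits if e.editor == editor and is_productive(e))


def bytes_added(edits: list[Edit], editor: str) -> int:
    """Sum of positive byte changes over the editor's productive edits."""
    return sum(
        e.bytes_changed
        for e in edits
        if e.editor == editor and is_productive(e) and e.bytes_changed > 0
    )


# Invented sample data: two editors with the same edit count
# but very different amounts of content added.
edits = [
    Edit("A", 12),
    Edit("A", 3500),
    Edit("A", 40, reverted=True),  # excluded by the "productive" rule
    Edit("B", 5),
    Edit("B", 20),
]

for name in ("A", "B"):
    print(name, productive_edit_count(edits, name), bytes_added(edits, name))
```

Changing `is_productive` changes every count downstream, which is the sense in which the number rests on a qualitative judgment made in advance.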
However, the metrics are very useful for telling a piece of a program story, for instance:
To get a deeper understanding of what took place, we may want to ask participants to self-report their experience editing any Wikimedia project to see if the editing behavior measured on the target project (Wikipedia) tells the whole story of editing behavior.
In presenting a self-report question to learn how often a student edited during a course, we have several choices and routes as well:
- We may ask for students to respond with a direct average estimate: On average, how often did you edit WP during your time in this course? and leave it open ended, which could lead to some giving a number for daily, weekly, or monthly editing sessions and a lot of post-coding of the data into a consistent interval scale.
- We may define the interval: On average, how often did you edit WP each week during your time in this course? and also leave it open ended, reducing variability in responses, essentially [2] controlling the response interval, and needing only minimal post-coding to capture the full range of responses.
- If we already knew the range of responses might be limited, or if we had a target performance level, we could further control responding and reduce variability, by defining count data in ordinal response categories which are assigned a response numeral (1) through (7). Each of these methods will produce slightly different results and lead to different steps and burden in data cleaning and analysis, depending on how we attach meaning.
How often did you edit Wikipedia, or other Wikimedia projects, during your time in this course? | (1) Less than once a month (2) 1-3 times a month (3) 4-5 times a month (about once a week) (4) 2-3 times a week (5) 4-5 times a week (6) 6-7 times a week (about once a day) (7) More than once a day |
Defining count data in ordinal response categories that are assigned a response numeral (1) through (7).
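As a rough sketch of how ordinal response categories like these might be coded for analysis, the snippet below maps each response label to its numeral and summarizes a handful of responses. The labels are English renderings of the seven categories, and the sample responses are invented for illustration.

```python
from collections import Counter
from statistics import median

# Response labels in scale order; list position + 1 is the response numeral.
LABELS = [
    "Less than once a month",
    "1-3 times a month",
    "4-5 times a month",
    "2-3 times a week",
    "4-5 times a week",
    "6-7 times a week",
    "More than once a day",
]
LABEL_TO_CODE = {label: code for code, label in enumerate(LABELS, start=1)}


def code_responses(responses: list[str]) -> list[int]:
    """Convert response labels to their ordinal numerals (1 through 7)."""
    return [LABEL_TO_CODE[r] for r in responses]


# Hypothetical survey responses from five students.
responses = [
    "4-5 times a month",
    "2-3 times a week",
    "2-3 times a week",
    "6-7 times a week",
    "1-3 times a month",
]

codes = code_responses(responses)
frequencies = Counter(codes)

print("codes:", codes)
print("median category:", median(codes))
print("frequencies:", dict(sorted(frequencies.items())))
```

Because the categories are ordinal, the sketch reports a median and a frequency table rather than a mean. Treating the numerals as an interval scale would be a further assumption about how respondents read the labels.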
By assigning numeric meaning to quantify behavior we end up with a different story piece, for instance:
On the other hand, we might have a qualitative outcome of interest such as a feeling or attitude about editing. While behaviors are observable and we can easily define and count them, feelings and attitudes cannot so easily be tracked. Similar to the quantitative categorical scale shared above, we can assign numbers to mean different levels of applicability of a sensed state of being which cannot be otherwise observed.[3]
From this, we could end up with another small story piece, for instance:
While we can also observe editing behavior directly to assess whether a person is able to edit by checking how much they edited, we would be making an assumption that performance is equal to experienced preparedness in order to tie the measure back to feeling prepared. Instead, it makes more sense to ask for qualitative experience data. Depending on the measurement point, one approach may make more sense than the other.
Ideally, the best story would contain multiple descriptive parts: the quantitative observation of online editing behavior, the qualitative description of editors' reported preparedness, and the descriptive information about what that editing behavior worked to improve. Then we could say something more like:
All qualitative measures can be coded and analyzed quantitatively
Conversely, any simple qualitative coding table can be easily converted to quantitative numeric coding and analyzed quantitatively to reveal how qualitative themes relate to one another or how similar or dissimilar interviewee responses were to one another.
As seen in the qualitative and quantitative data tables below, a simple qualitative coding table can be easily converted to quantitative data through simple binary coding, "0" for not observed and "1" for observed, or into count data using the sum of codeable observations.
Once converted, we can run basic correlational analyses to examine how qualitative themes relate to one another, or how similar or dissimilar interviewees' responses are to one another.
For instance, here we see that all interviewees made mention of their motivation, but were less likely to touch on the learning support theme. Those who discussed their skills did not discuss learning support and vice-versa. Further, there was 100% positive correlation of interviewees 2 and 4 as well as 1 and 5, and 100% negative correlation between interviewee 2 and 3.
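The conversion and correlation steps described above can be sketched as follows. The coded interviews here are invented for illustration and do not reproduce the tables referenced in the text. The theme names and the helper functions `binary_matrix`, `theme_column`, and `pearson` are likewise hypothetical.

```python
from math import sqrt

THEMES = ["motivation", "skills", "learning_support"]

# Hypothetical qualitative coding: themes observed in each interview.
coded = {
    1: {"motivation", "learning_support"},
    2: {"motivation", "skills"},
    3: {"motivation", "learning_support"},
    4: {"motivation", "skills"},
    5: {"motivation", "learning_support"},
}


def binary_matrix(coded, themes):
    """Binary coding: 1 if the theme was observed, 0 if not."""
    return {
        person: [1 if theme in observed else 0 for theme in themes]
        for person, observed in coded.items()
    }


def theme_column(matrix, themes, theme):
    """One theme's 0/1 codes across all interviewees, in interviewee order."""
    idx = themes.index(theme)
    return [matrix[person][idx] for person in sorted(matrix)]


def pearson(x, y):
    """Pearson correlation; returns None if either series has no variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)


matrix = binary_matrix(coded, THEMES)
skills = theme_column(matrix, THEMES, "skills")
support = theme_column(matrix, THEMES, "learning_support")
motivation = theme_column(matrix, THEMES, "motivation")

print("skills vs learning_support:", pearson(skills, support))
print("motivation vs skills:", pearson(motivation, skills))
print("interviewee 2 vs 4:", pearson(matrix[2], matrix[4]))
```

In this invented data every interviewee mentions motivation, so that theme has no variance and its correlation with any other theme is undefined. The sketch returns `None` in that case rather than a number.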
In the end, there is a difference between qualitative and quantitative methods and measures and what they produce in terms of knowing. For the most part, qualitative methods help us to understand more deeply, while quantitative methods allow us to systematically check that understanding across contexts. However, the divide is not so great: they are more dimensions than dichotomy, and more friends than enemies. In Wikimedia, we know context matters. In the work we do, it is likely best for all program evaluations to consider triangulating quantitative and qualitative measures and using a mixed-methods approach as we explore program implementation in each new context together.
Program Leader Next Steps:
Trying to choose the best measures for your Wikimedia project or program?
Check out Measures for Evaluation, a helpful matrix of common outcomes by program goal that we use to map measures and tools.
Have you collected data to evaluate a Wikimedia program?
Round II Voluntary Reporting is open: we are looking for data. Last week, the Program Evaluation and Design team launched the second round of the Foundation's voluntary program reporting. We invite all program leaders and evaluators to take part in the most epic data collection and analysis of Wikimedia programs we have done so far. This year we will examine more than ten different programs:
- Editing Workshops
- On-wiki writing contests
- Editathons
- Wikipedia Education Program
- Conferences
- Wikimedian in Residence
- GLAM content donations
- Wiki Loves Monuments
- Hackathons
- Wiki Loves Earth, Wiki Takes, Wiki Expeditions, and other photo upload events
Did you lead or evaluate any of these programs September 2013 through August 2014? If so, we need your data! For the full announcement visit our portal news pages.
Reporting is voluntary, but the more people do it, the better the representation of programs we can build to help us understand the depth and impact of programs across different contexts. This voluntary reporting allows us to come together and generate a bird's eye view of programs so that we can examine further what works best to meet our shared goals for Wikimedia and, together, grow the AWESOME in Wikimedia programs!
- ↑ Case in point: within Wikimetrics, this will be changing to give users a count that also includes edits made to pages that have since been deleted.
- ↑ Of course, in addition to assigning specific meaning to the numbers in the responses, we also make assumptions about respondents' understanding of the terms "edit", "Wikipedia", "Wikimedia projects", and "course". We assume that the participant reads and understands both the question and the response options, and that they respond meaningfully and accurately.
- ↑ In such a case, we again make an assumption that respondents consistently understand the terms "edit", "Wikipedia", and the response labels, as well as the intended interval nature of the response scale. Alternatively, we can try to observe the editing behavior directly to assess whether a person is able to edit; however, we would be making an assumption that performance is equal to experienced preparedness in order to tie the measure back to feeling prepared.
