Wiki Education Foundation/Wikipedia Fellows pilot evaluation

Dr. Jenn Brandt presented about her experience as a Wikipedia Fellow at the National Women's Studies Association Regional Meeting in March 2018.

The English Wikipedia is a resource people use every day to better understand the world. In a time when terms like "alternative facts" and "fake news" have become shorthand for a wide range of political, educational, and epistemological challenges to public knowledge, it's crucial that we ensure the quality of our most popular source of information. Wikipedia's content has exceeded expectations, but due to the project's volunteer nature, quality is much better in some areas than others. Many of the most important topics are underdeveloped, uneven, based on outdated research, poorly explained, skewed toward a particular perspective, or simply neglected. If we want the general public to understand academic ideas or develop policies and behaviors based on science, we need academic scholars to contribute to Wikipedia.

In the past four years, the Wiki Education Foundation (Wiki Education) has signed formal partnership agreements with academic associations to improve Wikipedia in their topic area. Several of our association partners have indicated an interest in teaching their own members how to contribute, bringing rigorous academic expertise to articles important to their discipline. Academic associations have previously tried teaching members via Wikipedia edit-a-thons at conferences or online. Wiki Education saw an opportunity to tap into this energy and bring experts to Wikipedia. We believe, however, that it takes more time to understand Wikipedia than an edit-a-thon affords, so we developed a pilot program running from January–March 2018 that would attempt to mirror our successes in our Classroom Program, where we've developed a model to successfully enable more than 15,000 college and university students in the United States and Canada to contribute to Wikipedia each year.

We called this new pilot program "Wikipedia Fellows". In the Wikipedia Fellows pilot, we asked three association partners to spread the word among their members that they could learn how to edit Wikipedia by participating in a 3-month program with Wiki Education staff. We sought to find out whether the nine academic experts who joined a program on behalf of their member organizations could learn how to expand or create at least two Wikipedia articles relevant to their areas of expertise.

What follows is an in-depth evaluation report of what we did, what worked well, what didn't work as well, our key learnings, and our plans for this program's future. We welcome questions on the talk page.

Theory of change

Wiki Education has proven it's possible to teach university students to contribute to Wikipedia, improving its breadth of coverage and its quality. But students are early in their careers as scholars and thus limited in their experience and expertise, meaning high-traffic articles core to their discipline aren't usually appropriate for students to tackle. We believe that academic scholars, who have dedicated their lives to studying particular subject areas, have the training, experience, and understanding to contribute specialized, complex, comprehensive information about their subjects to Wikipedia. They are well equipped to evaluate the quality and coverage of content on Wikipedia such that they can identify and address content gaps, bias, and sourcing that less experienced contributors may miss. We have connections with academic associations who have partnered with us over the last four years, and we believe we can leverage those relationships to recruit their members, inspire them to consider Wikipedia a vital platform for public knowledge, and teach them how to improve the public's most popular knowledge source. If successful, the Wikipedia Fellows program would facilitate subject matter experts adding high-quality knowledge to Wikipedia about their topics of study.

Key questions

We sought to explore and answer these key questions:

  • Can we leverage relationships with associations?
  • How do subject matter experts specifically impact article quality?
  • Can we sustain three months of regular meetings and avoid dropping out?
  • Can we retain subject matter experts; will they remain active?
  • Is this pilot worth continuing?

Preparation phase

Research stage

The basic idea behind Wikipedia Fellows, helping subject-matter experts contribute to Wikipedia, is something Wiki Education staff, and many in the Wikimedia community, have discussed at length both internally and externally. We knew from conversations with Wikipedians, academics, and associations that there is interest in such a project, but similar efforts have not consistently shown a clear positive impact on Wikipedia content. We have experience working with academics in other contexts, and a great deal of experience training new Wikipedians via the Classroom Program. What we have not done is provide sustained training and support to academics. Academics have different needs and abilities than students, and working with the group directly, rather than supporting an instructor who in turn works with the group, is a departure from our regular activities and thus required additional research. At this stage in our organization's development, we believe our support infrastructure and processes are robust enough to be adapted for engagement with academics. This way, we would not only train them to understand Wikipedia, but also support and motivate them to make substantial contributions to Wikipedia in their fields of study.

We knew going in that it would make sense to use our existing systems and practices as a starting point, but to explore ways to adapt them for this unique project. To that end, we looked into other Wikipedia training programs that extend beyond on-wiki documentation and slides. For example, Pete Forsyth has a project called "Writing Wikipedia Articles" (formerly WIKISOO/School of Open), an online course open to anyone who wants to be trained. It involves self-guided learning, providing useful documentation, videos, and other resources for those interested, and in select instances of the program it has involved online video meetings. Another example is Lane Rasberry's training for the Cochrane Global Ageing initiative. Whereas Pete's program ran for 6 weeks, Lane condensed the training into 4 weeks. Lane met with the participants in a video call five times over that period. Each meeting included 20 minutes of presentation and 40 minutes of demonstrations and discussion, and each ended with a homework assignment. It's worth pointing out that the Cochrane Global Ageing project was one of the more recent inspirations for Wikipedia Fellows. We also incorporated findings from the psychology Summer Seminar we ran in 2015, another project focused on working with academics.

There are several other training projects, but most are geared toward new Wikipedians generally and/or have as their basis the Wikipedia Education Program materials and other resources that Wiki Education staff were directly involved with, and which our current infrastructure already incorporates or improves upon. Other training projects are strictly online or textual. The Wikimedia Foundation and others have produced videos that provide instruction, but which are geared toward a more general audience and not appropriate for incorporating into this program.

The Wikipedia community has long understood the potential for subject-matter experts to be valuable contributors to the encyclopedia. Thousands of articles are tagged as "needing expert attention" and community members have written on the topic of getting experts or academics more involved. Within these writings are valuable insights about how the community views experts and advice for setting expectations or adapting to the Wikipedia style of writing and interaction. Related are the projects which seek to involve experts in ways other than editing, given the challenges academics often face when starting to contribute to Wikipedia. Wiki Education staff presented at WikiConference North America and Wikimania on the subject of alternative forms of expert engagement, and informative discussions with community members from these presentations informed aspects of Wikipedia Fellows.

Identifying partners

We invited three of our most active partners, the National Women's Studies Association, the American Sociological Association, and the Midwest Political Science Association, to participate in the Wikipedia Fellows pilot. These associations vary in membership size and discipline, giving us the opportunity to test whether membership size affects the level of interest and applicants for such a program. We have close relationships with the staff at these associations, and we believed they would trust Wiki Education to pilot a new program that could benefit their members and mission.

All three partners jumped at the opportunity to participate in the Wikipedia Fellows pilot, so we collaborated to develop communication strategies and establish criteria and requirements for Wikipedia Fellows. The participating partners saw value in training their members to make Wikipedia more accurate, as they understand that Wikipedia is where the public learns about topics related to their discipline. They believed members would communicate valuable knowledge about the discipline to the public, advancing their organizations' missions to educate the world and build understanding of their fields' relevance. Partners were excited that the pilot was interdisciplinary, as their members would work closely with scholars from other backgrounds, helping them emerge from academic silos.

Recruiting Wikipedia Fellows

Our academic association partners believed their members would join a program like Wikipedia Fellows as service to the profession. One partner recommended targeting department chairs to become Wikipedia Fellows, as they are already in a service role and are thinking about projects to bring back to their departments. Institutions of higher education often require service work for tenure, so instructors are evaluated on the work they put toward advancing the field. Thus, some partners included in the call for applications that they would write a letter for the Wikipedia Fellow's portfolio. We encouraged partners to consider offering Fellows honoraria, travel scholarships to their conferences, or conference fee waivers. Partners were amenable to the idea, but most said they needed more time to be able to offer it. We hope such incentives can be built into future Fellows cohorts.

Partners recommended our curriculum require no more than three working hours per week, as instructors spend roughly that much time lecturing for a single course each week. We developed an application form using Google Forms, and our partners distributed the request for applications through social media channels, in newsletters to their members, and via email. The application asked members to explain why they should be selected as one of three Wikipedia Fellows per association.

Key Learnings: Preparation
  • Academic associations see value in training their members to improve Wikipedia articles related to the discipline.
  • There is a demand for a structured program that teaches academic scholars how to contribute to Wikipedia.
  • One partner shared their own data about academia's deadline-driven culture with us: 95% of conference submissions come through within 12 hours of a deadline. This helped us understand the need for artificial deadlines, reminders (for applicants), and internal timelines with built-in deadline extensions.
  • Partners appreciated that Wiki Education already had extensive experience working with university faculty going into this pilot.
  • Partners were not uniformly able to commit to providing an honorarium, travel funding, or a conference fee waiver up front, but may be able to arrange such a small award given more advance planning.
  • Partners varied slightly in their opinions regarding the amount of time we should request of Fellows. All of them found 3 hours to be reasonable, and one partner opined that 5 hours would be an unrealistic commitment.

Selection phase

By the application deadline, we had received 87 applications: 23 from the American Sociological Association (ASA), 17 from the Midwest Political Science Association (MPSA), and 47 from the National Women's Studies Association (NWSA).

Call for applications

We developed a boilerplate call for applications, which we then tailored to each partner. The call for applications emphasized the importance of Wikipedia as a source of public knowledge, explained that there are significant opportunities for members to make a difference, and offered details about the pilot: it is interdisciplinary, will run for 3 months from January to March, does not require Wikipedia experience, requires a commitment of 3 hours per week, includes training and support by Wiki Education staff, involves making substantial contributions to Wikipedia, and ends with each participant writing a blog post.

The application asked for basic biographical details, information about applicants' experience with Wikipedia, examples of topic areas they would contribute to, why they were interested in participating, and how much time they estimated they could commit each week. Although the call for applications asked for a commitment of 3 hours per week, we used an open question about available time, rather than a simple yes/no question, to get a better sense of applicants' availability.

The NWSA announcement went out first and was the most widely publicized. The MPSA announcement went out next, and the ASA announcement went out late due to communication mishaps. ASA announced only via social media, leaving members merely a week to apply. For the sake of time, ASA representatives were not involved in the screening process, as the call went out just before winter break. We knew beforehand that the number of applications from each association would likely be affected by the prominence of our partnership within the association and the size of its member base, but given these circumstances we should also consider the effects of variation in how prominently the calls for applications were publicized, when they were announced, and how much time members had to apply.

Selecting Wikipedia Fellows

Each partner agreed that Wiki Education would directly receive applications, review applicants, and recommend program participants, with the association having the opportunity to approve our recommendations. As we began receiving applications, we realized interest in the program exceeded our expectations; we did not expect so many applications to come in so quickly. NWSA applications were the first wave to arrive, and at the time we were not sure whether that interest would be matched or even exceeded by the other associations (ASA in particular has a very large member base). As applications continued to come in, we emailed all applicants to reaffirm the expectation of at least 3 hours per week and the importance of actively contributing to Wikipedia (in other words, to please not participate if they did not intend to contribute actively or were unsure of their ability to commit the time). We explained there were already many more applications than available spaces. Only one person withdrew following this email, which may indicate that the call for applications had already done a sufficient job of explaining the importance of active contribution and time commitment.

For each association's applicants, we read through their responses to ensure they understood what they were signing up for; in a small number of instances, for example, applicants viewed the program as directly related to the Classroom Program. We tried to gauge their enthusiasm for the project based on their written answers and to get a sense of their motivations and interests. We looked for people interested in contributing as well as learning, and for topic areas that aligned with our goal of improving high-traffic, core articles on Wikipedia. In some cases, applicants' stated motivations suggested a perspective likely to come into conflict with Wikipedia's policies and guidelines. Editing Wikipedia can be a powerful form of activism and engagement, but it can be very difficult to reconcile certain agenda-based editing goals with policies like "Neutral Point of View." The question about time commitment helped us consider whether 3 hours would be a stretch for an applicant or whether they were prepared to commit even more.

Using the Google Sheet of responses to the form, we sorted the applicants by a color scheme roughly equivalent to a 1-5 scale based on the above factors: red (least qualified), orange, yellow, blue, and green (most qualified/most desirable candidates). We then combined the "green" and "blue" candidates, without colors, into separate association-specific spreadsheets to share with our partners: 9 from the ASA applicant pool, 7 from the MPSA applicant pool, and 15 from the NWSA applicant pool.

NWSA was first to issue their call for applications, and thus first to meet with us to review applications. They had done their own evaluations and determined whose membership statuses were current, and together we narrowed the field to three. We met with MPSA next, following the same process. Because the ASA announcement went out so late, just before winter break, it wasn't practical to involve them in the decision-making process. We asked for permission to simply move ahead with the candidates we assessed to be most qualified, and they were amenable.

Recommendations for future selection

At the time of our initial review, we tried not to pay too much attention to applicants' titles, and thus did not prioritize faculty over graduate students. On reflection, this may have been a mistake. There are differences in how a grad student and a faculty member might approach this program, including the background they bring, how their time in the academy affects their adaptation to Wikipedia's style of writing and collaboration, the types of commitments they are likely to have, and the amount of time they can reliably commit. As a pilot, it was important to gauge whether training academics, faculty in particular, to contribute to Wikipedia was a practical model for us to include in our programs, and including grad students could complicate our attempt to answer that question. It was at the partner review stage that we introduced the distinction between faculty and grad students as something to factor into the decision-making process. Our partners involved in these decisions, at NWSA and MPSA, agreed that it was something we should keep in mind. Importantly, a significant number of highly qualified grad students applied to participate, and we were glad to keep their application information on file for the future. This mostly affected MPSA, whose applicant pool had a particularly high ratio of grad students to faculty. In the end, we included one grad student in the pilot cohort. Moving forward, we should clarify whom the program is open to and vary our approach to learn more about the potential for including different types of participants.

Key Learnings: Selection
  • Be clear about who the position is open to (specifically, should grad students apply? should staff/non-faculty?). We do not have sufficient data to say that one type of contributor was more successful than another at this stage, however, and we should think about varying cohorts by participant type/career level in the future.
  • Emphasize the extent to which contribution to Wikipedia is expected.
  • To avoid people applying when they don't have time in the coming months, make it clear that this is not the only opportunity — that if they don't apply to this one, there will be another chance.
  • It would be better to have more lead time before the calls for applications are announced, to ensure they go out at roughly the same time and give potential applicants the same amount of time to apply.

Curriculum

Part of the way we could justify running this pilot is that it takes advantage of our existing support infrastructure. We have years of experience supporting new Wikipedians as they learn to contribute via our Classroom Program, and we have a robust suite of processes, tools, and materials in place to ensure their success. Because there was no guarantee the pilot would continue, it was not practical to devote significant funds to developing new training or support features. The starting point was therefore our existing systems and best practices, with the understanding that they would need to be adapted for this particular use case.

As explained above, we searched for other sorts of training systems that involve live meetings and non-student participants, to look at how others have done this. We also have what we learned from the psychology Summer Seminar, when most participants did not follow through with their contributions.

The full timeline is available here. Each week had a meeting agenda, which included project-related items as well as discussion prompts. Each week also had a task/assignment and a milestone that broke the process into a series of steps. Milestones and weekly tasks were important to keep Fellows on track, providing actionable tasks and a sense of accountability. Several weeks included trainings and additional resources; the idea was to have Fellows complete the trainings outside of meetings and then discuss them when we came together. This seemed effective, and the notes staff took during and after these meetings contain a great deal of useful feedback and reflection, with specific examples of what worked and what did not, which we will use to adapt the curriculum for future cohorts.

Meetings were supplemented by Slack and, to a lesser extent, on-wiki communication. Several of the Fellows seemed to enjoy using the Slack channel, but a couple never really used it. Slack was not, however, a formal part of the curriculum beyond asking that they sign up for it at the outset.

Key Learnings: Curriculum
  • The milestones and weekly tasks on the timeline we created helped keep Fellows on track.
  • We took copious notes on minor changes we should make to the curriculum for future cohorts.

Meetings

Meetings were a core part of the Wikipedia Fellows process. In the beginning, the whole cohort met together each week. When scheduling challenges became apparent, we split the cohort into two groups based largely on availability. Those groups then met weekly on Tuesdays and Wednesdays. Meetings were initially planned to be 90 minutes each for the first half of the pilot and 60 minutes for the second half. To simplify the scheduling process, and to increase the amount of time Fellows could spend actively editing, we changed this to 60 minutes for the entire pilot.

Scheduling and meeting size

The Tuesday meeting group of the Wikipedia Fellows pilot cohort.

Prior to the first meeting, which took place the second week of January, we sent a Doodle poll to participants to schedule it. The meeting was scheduled during the only time all nine participants were available between 7:00 AM PT and 3:00 PM PT. At the time, multiple Fellows commented that, since the semester had not yet started, their availability would be more limited thereafter.

We sent another Doodle poll for the second week, which proved much more difficult and complicated. Based on feedback from Wiki Education staff on the east coast, available times for the poll initially ended at 1:00 PM PT. Based on feedback from a Fellow on Alaska Time, start times began at 8:00 AM PT. When this did not yield any shared availability, we extended the end times to allow for meetings as late as 3:00 PM PT. By the time everyone had responded to this updated poll, a couple of people had emailed to say that their availability had changed since completing it. After emailing individuals to try to find a shared meeting time, it was necessary to schedule that week's meeting without being certain everyone could attend, and to split the cohort into two meetings moving forward.

Based largely on availability, but also striving for interdisciplinarity, we created Tuesday and Wednesday meetings, with the understanding that people should try to come to their designated meeting but could switch occasionally if necessary.

Around the third meeting, one of the ASA Fellows had to withdraw from the pilot. As it was early enough in the process, we brought on another of the most qualified applicants from ASA, who was able to catch up quickly and become an active participant in the pilot. Due to his own scheduling constraints, he was unable to take the withdrawn Fellow's place in the Wednesday meetings.

The scheduling difficulties were time-consuming for our staff and confusing, and potentially stressful, for Fellows. In the future, we should address scheduling at the outset, for example by dividing a cohort into availability-based groups from the start or by including preset days and times in the call for applications, limiting applicants to those who know they can attend.

Interdisciplinarity and peer reviews

When one ASA Fellow dropped from the pilot and was replaced by a Fellow who could not attend the same meeting, the Tuesday meeting then included all three ASA Fellows. Although it is not certain that this is the cause, the Tuesday meetings tended toward sociological topics. Having all three of the ASA Fellows together meant the Tuesday meeting could include discussions of ASA-specific topics, which came up from time to time. This speaks to how well members of the same association can collaborate and feed off each other's enthusiasm, but it limits opportunities for interdisciplinary collaboration.

The interdisciplinary component of the pilot was, in general, more limited than we had originally intended. Part of this may be due to the split into two meetings, but it may also indicate opportunities for structured activities that foster interdisciplinary collaboration, for example introducing a "buddy system" or small cross-disciplinary groups based on common interests early in the process. Another possibility is to better integrate peer review into the program structure.

The timeline originally included two formal peer reviews, similar to what we suggest for student editors. When the time came for the first peer review, most did not take place in their designated week, and by the end of the pilot not everybody had received a review. Fellows said the technical process for doing the review through the Dashboard was unclear, with some people working in sandboxes, some working in mainspace, and differently formatted contributions mixed with content taken from existing articles. In some cases they were unsure what sort of feedback they should be providing, being new to Wikipedia's best practices themselves. We could have provided clearer instructions than those in the timeline by going over the technical elements of the process during our meetings; we did spend time doing just that in one meeting, but only after the peer review milestone had come and gone.

At the same time, several Fellows took it upon themselves to leave comments on other people's articles, on article talk pages, on user talk pages, and in Slack. In fact, there were more informal reviews and comments than formal peer reviews. Given the time constraints of the pilot, having asked participants to contribute only 3 hours each week, and given the emphasis on contributions, the formal peer review element simply took too much time and added too much confusion to justify a relatively small benefit, so the second formal peer review was removed. Unlike most students, academics are very familiar with how to do an effective peer review, and the structure we use with students may be both unnecessary and a hindrance to academics. We should consider other ways to implement interdisciplinary review opportunities that feel more natural to participants.

Meeting structure

Wiki Education staff hosted weekly video meetings with Wikipedia Fellows. We used Zoom teleconferencing software to host the meetings, which seemed well suited for this purpose. None of the Fellows reported technical difficulties with Zoom, and it made it simple to record each meeting so it could be made available to participants who could not attend or who wanted to review the discussion.

The meetings were useful in establishing friendly relationships and trust between the Fellows and Wiki Education staff. They served to discuss the Fellows' contributions, answer questions about editing, bring up issues relating to the Wikipedia community, explain Wikipedia policies and guidelines, provide additional information or context about topics covered in the trainings, share ideas for topic improvement and article evaluations, and to discuss social, political, and philosophical aspects of Wikipedia.

Toward the beginning of the pilot, opening discussions focused largely on the Fellows' explorations of their topic areas on Wikipedia, with Fellows sharing observations and criticisms of content they came across. These were often lively and engaging discussions, with several interesting anecdotes and analyses. They played a smaller role toward the end, except when Fellows were selecting a second article, but were consistently helpful in generating discussion, bringing multiple people with overlapping interests into conversation, and surfacing some of the issues one comes across in Wikipedia content. In some cases, Fellows noticed interesting talk page discussions about the content, which led to useful conversations about the relationship between talk page discussion and collaboration on the article, and about the policy basis of those discussions. The Fellows were generally highly engaged in these discussions, which likely speaks to their passion for their fields of study and for how the public understands them.

Once they started actively editing, discussions of their experiences contributing played a larger role. We looked at the contributions before each meeting in order to predict who would have comments and to see whether there were particular contributions or interactions we wanted to ask them to talk more about. These discussions were often helpful both individually and for the group. In the early stages, Fellows shared what they were learning about the way Wikipedia works with a sense of discovery and wonder. Other times, however, there was not much to talk about, particularly in later weeks. We did not omit this discussion entirely in any week, since some Fellows did research and drafting off-wiki, so it was not safe to assume that a lack of activity on the Dashboard meant no activity. There were meetings, however, when only one or two, or even none, of the Fellows had made contributions in the preceding week. This is to be expected in a small group of academics, whose availability may come in waves over the course of the year or term. While we were typically able to get a conversation going, the format of this part of the meetings made for some slow conversations and periods of silence until someone took the initiative to jump in. Some of this time could have been used more effectively; there are situations when it's important to accept silence pending discussion, but if it happens too often, there's something we could improve in our curriculum and discussion prompts.

Most weeks' agendas included one or more discussion prompts. When meetings went deep into conversation about the Fellows' contributions, discoveries, and interests, we did not always get to these prompts, but they were useful when those discussions began to subside. When discussions did not take off, the prompts were departures into more abstract Wikipedia topics and did not do much to stimulate conversation about the Fellows' own activities. One approach may be to include discussion prompts in the weekly task, so Fellows consider them beforehand and/or come to the meeting with a particular idea or example. If the prompts were better integrated into Fellows' activities, transitions between topics would be smoother and silences fewer. They could then be a more prominent and reliable part of the meeting agenda, rather than treated as optional items that other conversations take priority over.

Sometimes a subset of Fellows was clearly more talkative and active in discussions. Many times this served to stimulate conversation that involved everyone, but we should be careful to avoid a situation in which some people aren't able to participate to the same extent, making them feel less connected to the overall program. This includes not just less talkative or less assertive participants, but also people with poor internet connections or limited fluency in spoken English. We will need to think about ways to involve all participants on a consistent basis.

Cohort size and managing sub-cohorts

The first two meetings were attended by 9 and 8 Fellows, respectively. It is hard to take away clear lessons from the size of those meetings given there were only two, but it seemed like a manageable size at the time, with many people getting involved in discussion and little interruption. As mentioned above, however, the Fellows were necessarily split into two meetings starting in week 3. One meeting had five people and the other had four. When everyone was present, the group with five people seemed like an appropriate size. Four-person meetings could still be highly engaging, but when meetings had three or fewer participants, discussion was noticeably slower and meetings generally less productive. Given our limited data, it is unclear to what extent the quality of the meetings was due to size, the general talkativeness of the participants, and/or the busyness/availability of the participants (for example, a meeting in which three of four participants had a busy week and could not substantially contribute to Wikipedia may be less successful than a meeting with only three people in which all three had been highly engaged). Due to differences in the groups themselves and in the week-to-week activities of each group's participants, the two meetings were often quite different. In a couple of instances, when one group had not done much editing in the preceding week, we ended the meeting early to use that time for editing rather than trying to stimulate conversation about other things. In the future, we may want to experiment with either doing more to ensure different meetings of the same cohort go at the same pace or considering each meeting time to be a different cohort.

Key Learnings: Meetings
  • Scheduling should be sorted out in advance to the extent possible, and possibly even built into the call for applications. With participants spanning four time zones and the pilot beginning at the start of a term, scheduling was a major obstacle that took considerable staff time to sort out, complicating the pilot's flow in early weeks.
  • For a cohort of 4-5 people, an hour seemed like an appropriate meeting length. During the slower weeks, less time would have worked, too. Additional time was rarely needed, although most meetings did take up the full hour.
  • Meetings with fewer than 4 people were often slower and quieter than meetings with 5 or more.
  • Meetings were much more productive when Fellows had been actively contributing in the preceding week.
  • Relatively open-format discussions work well with small groups, and seemed to build camaraderie and cohesiveness to the group. We cannot determine their effectiveness with larger groups, and this unstructured format may not be as successful if participants are less engaged.
  • In an interdisciplinary cohort, meetings comprising largely members of a single association may lean disproportionately toward discussions relevant to that discipline. Disciplinary balance is important in the make-up of the meeting participants.
  • Academics understand peer review differently than students, and we need to adapt the style of peer review we use with students to ensure it's relevant rather than having Fellows try to comment on early-stage drafts for the sake of understanding wiki markup. That time in this case would have been better spent working on their own articles, with peer review left more informal.
  • Discussions of Fellows' article evaluations and topic exploration were often lively, with anecdotes springboarding us into key policy-related or community-related topics.
  • The curriculum had a logical flow to it which helped conversation grow as the program continued. Participants got to know each other better and became more comfortable sharing their experiences.
  • Given that this was a small cohort, the chemistry between the individuals had a big influence on the conversations, and when one or two people didn't show up, conversation could get slow and plodding. A larger group would allow flexibility for one or two participants to miss a meeting without the other participants suffering for it.
  • More use could be made of discussion prompts/talking points when discussion slowed down. This is especially relevant in the meetings when Fellows had not been very active in the preceding week and thus had fewer questions.
  • We should be mindful to avoid using Wikipedia abbreviations and other jargon, both in meetings and other communication platforms.
  • One participating partner recommended 6–8 people as the maximum number of participants to include in a conference call or online meeting when your goal is to have a highly interactive meeting. If you can accomplish your goals with more of the audience remaining passive, the group can get much bigger. In that case, the cohort may not know and talk to each other, so you risk losing this personal touch, but you can scale your impact by involving a larger number of participants.
  • In hindsight, having a staff note-taker during the meetings was helpful. It is not realistic for the meeting facilitator to take meticulous notes beyond select bullet points and reflections, so it is worth determining up front who will take notes.
  • Several Fellows commented on how important it was to hear from staff about their early editing experiences, in order to explain that there is no one "right" way to become a Wikipedian.
  • It is important to emphasize "be bold" early, and to reassure Fellows regularly that they cannot "break" Wikipedia.
  • Fellows had a couple of minor disagreements with other Wikipedians. In both cases, it was ultimately productive, but part of the challenge in one case had to do with the fact that the article being edited was already a Good Article. It may be a good idea to provide best practices specific to editing already-highlighted content (which would not be among the articles that most need expert attention anyway).
  • We should have potential articles to work on at the outset, based on participants' interests and/or the theme. These articles are potential projects for participants, but more importantly can serve as basis for early edits/evaluations (they will be topics that clearly need work). This eliminates the time needed to settle on a single article while still learning about Wikipedia and exploring the topic area.

Wiki Education staff roles

Educational Partnerships Manager Jami Mathewson participated in the Wikipedia Fellows pilot by recruiting academic associations to take part. She helped build out the program's structure and represented the partners' needs for this program. During the program, she joined approximately one third of the meetings to learn about the curriculum and the progress Wikipedia Fellows were making in learning about Wikipedia. She believed that joining a cohort would help her share that experience with academic associations and make a compelling case for future partners to participate.

Community Engagement Manager Ryan McGrady worked with Jami prior to launch to develop calls for applications for our partners, drafted the application, worked with partners through the announcement phase, reviewed applications, met with partners to finalize review, communicated with applicants during and after the selection process, and worked with Jami and our Communications Associate on our announcement and other communication materials. He set up the timeline and overall structure of the milestone-based curriculum on the Dashboard, drawing from our existing timeline and with feedback from other staff members. After launch, Ryan scheduled and led the weekly meetings, facilitating discussions and offering advice and answers to questions along with other Wiki Education staff.

Wikipedia Content Experts Ian Ramjohn and Shalor Toncray participated in sessions as experts on everything Wikipedia. They provided suggestions, clarifications, and supported the Fellows on-wiki during the program. They articulated Wikipedia policy, provided specific examples of articles, and acted as a bridge from the program to the Wikipedia community. They fielded questions over email, during meetings, and on Slack. They were able to provide tailored advice about content contributions in sandboxes and on talk pages, making for a much smoother editing experience.

Program Manager Will Kent shadowed the pilot manager, Ryan, throughout the program. Ryan is taking a leave of absence for six months in 2018, and Will is scheduled to run the next batch of Wikipedia Fellows cohorts. He contributed to discussions, helped answer questions, and communicated over Slack with Fellows. He attended almost every Fellows meeting (as many as scheduling would allow) to inform program changes and growth during the next round of cohorts.

Outcomes

Quantitative impact on Wikipedia

Numeric run-down

Simply looking at words added, two Fellows stood out as making the most contributions.

At the end of the pilot, Fellows had contributed to 64 articles, including 2 new articles. They added about 29,100 words. The articles they made significant improvements to received about 1 million pageviews, and all of the articles they improved together received 3 million pageviews. Individual numbers of words added ranged from 464 to 7,946. The mean number of words added per Fellow was 3,233, and the median was 2,233. The difference between mean and median reflects the fact that the two most active contributors (in terms of words added — some important activities such as removing problematic content and rewriting existing content do not add as much to this figure) contributed 54.3% of all of the words. Some Fellows made a small number of large edits, while others made a large number of small-to-medium edits.
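As a quick, illustrative check of these figures (a sketch in Python, not part of the original analysis), the snippet below recomputes the total, mean, median, and top-two share from the per-Fellow word counts reported in the subsections that follow.

# Illustrative check of the summary statistics above, using the per-Fellow
# word counts reported later in this section.
words_added = [7865, 2170, 2233,   # ASA: Anahita, Ramirez, Zopf
               1281, 464, 1195,    # MPSA: Cravens, Gottlieb, Kalaf-Hughes
               7946, 3148, 2798]   # NWSA: Brandt, Gohr, Velazquez

total = sum(words_added)                             # 29,100 words
mean = total / len(words_added)                      # about 3,233 words per Fellow
median = sorted(words_added)[len(words_added) // 2]  # 2,233 words (middle of 9 values)
top_two_share = (7946 + 7865) / total                # about 54.3% from the two most active
print(total, round(mean), median, f"{top_two_share:.1%}")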

We had hypothesized based on past projects like this that some Fellows would make either minor or no edits; it's rare that an entire cohort of people stays active enough through a three-month program to significantly improve at least one article. We were pleasantly surprised that all of the Fellows added at least several hundred words to at least one article; the average words per article across the cohort was 650, which is right around the same average for our Classroom Program student editors. There was a wide variation in number of articles edited and amount of content added, but we were impressed that all nine made at least moderate contributions.

From ASA:

  • Sine Anahita contributed 7,865 words to 37 articles. Her most substantial contribution was the creation of the article about verbal interruptions, which covers the phenomenon and patterns of interruption related to gender, social status, race/ethnicity, culture, and political orientation. Sine made substantial contributions to the articles about social science, race, and multiple gender-related topics.
  • Michael Ramirez contributed 2,170 words to 4 articles. Michael concentrated his efforts on the article about masculinity, improving its quality in a number of ways, adding and developing multiple sections. He also added a section about social construction to the article about adulthood.
  • Bradley Zopf contributed 2,233 words to 3 articles. Bradley made non-minor improvements to each of the three articles: race and ethnicity in the United States, Arab immigration to the United States, and definitions of whiteness in the United States.

From MPSA:

  • R.G. Cravens contributed 1,281 words to 1 article. R.G. focused his energies on the article about LGBT conservatism in the United States, improving its sourcing and developing its history section.
  • Madeline Gottlieb contributed 464 words to 1 article. Madeline concentrated on improving the mineral rights article, removing problematic content and improving information and sourcing in multiple sections.
  • Nicole Kalaf-Hughes contributed 1,195 words to 2 articles. Her primary focus was on the article about procedures of the United States House of Representatives, creating a new section about speaking on the floor.

From NWSA:

  • Jenn Brandt contributed 7,946 words to 8 articles. Most of her efforts were in an overhaul of the Margaret Atwood article and major improvements to the article about women's studies.
  • Michelle Gohr contributed 3,148 words to 10 articles. Her focus was the creation of an article on the body horror genre, but she also improved the sourcing or prose of a variety of other articles.
  • Maria Velazquez contributed 2,798 words to 4 articles. Her primary contributions were to the article about black science fiction and author Jewelle Gomez.

The number of articles edited ranged from 1 to 37. Part of the call for applications asked that Fellows be prepared to make substantial contributions to 2 articles over the course of the pilot. While all participants made substantial improvements to at least 1 article, only 5 of the 9 participants made substantial improvements to 2 articles. Approaches for the future that may increase the likelihood that participants will contribute to more than one article include:

  • spending less time in the exploratory phase in the beginning, getting into contributions more quickly
  • elongating the total length of the program
  • scheduling the program around less busy times for participants (this includes academic schedules as well as not overlapping with major conferences, since participants may have associated obligations)
  • selecting articles beforehand and recruiting participants based on interest in improving those topics

Structural completeness measurement with ORES

At Wiki Education, we've tried various methods of quantifying quality improvement of articles. To date, none of our successful efforts has been scalable: they require human assessment of each article before and after a program participant worked on it. This method produces good results, but is extremely resource-heavy, both on the staff end, to recruit volunteer Wikipedians and subject matter experts to hand-review the quality of articles at both stages and then analyze the results, and on the volunteer end, to actually do the assessment. Thus, we've moved forward organizationally with the idea of measuring "structural completeness" as an automated stand-in for quality, using ORES's "wp10" score, which assesses at points in time how much a specific Wikipedia article resembles the typical structural features of a mature Wikipedia article. ORES calculates a score from 1 to 100 based on features like the amount of prose and the number of wikilinks to other articles, references, images, headers, and templates, along with a few other basic features. While ORES is not a true stand-in for quality, because it doesn't measure the quality of the sources or the coverage of the topic, it is to our knowledge the closest scalable quality metric available to date in the Wikimedia movement. Wiki Education has set a 10-point ORES score improvement as a general metric for quality improvement of an article. We've integrated it into our Dashboard platform so anyone can see both the development over time of an individual participant-edited article's ORES score and a cohort-level view of all articles edited by that group.
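To make the measurement concrete, here is a minimal sketch, in Python, of how one might query ORES for a revision's "wp10" prediction and fold the class probabilities into a single number on a 100-point scale. The endpoint, response shape, and class weights are assumptions for illustration (based on the public ORES v3 API as it existed at the time); this is not the Dashboard's actual code, and the Dashboard's exact weighting may differ.

# Sketch: query ORES's "wp10" article quality model for one revision and
# convert its class probabilities into a single structural completeness number.
# The class weights are illustrative assumptions, not the Dashboard's formula.
import requests

ORES_URL = "https://ores.wikimedia.org/v3/scores/enwiki"
CLASS_WEIGHTS = {"Stub": 0, "Start": 20, "C": 40, "B": 60, "GA": 80, "FA": 100}

def structural_completeness(rev_id: int) -> float:
    """Return a weighted structural completeness score for one revision ID."""
    resp = requests.get(ORES_URL, params={"models": "wp10", "revids": rev_id})
    resp.raise_for_status()
    probs = resp.json()["enwiki"]["scores"][str(rev_id)]["wp10"]["score"]["probability"]
    return sum(CLASS_WEIGHTS[label] * p for label, p in probs.items())

# Compare a revision from before a Fellow's work with one from after
# (the revision IDs here are placeholders):
improvement = structural_completeness(830000001) - structural_completeness(820000001)
print(f"Change in structural completeness: {improvement:+.1f} points")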

A density chart depicting articles improved by at least 10 ORES points by Wiki Education's Wikipedia Fellows pilot program in spring 2018.

At the outset of the pilot, we defined a goal for the Fellows pilot cohort of the improvement of 12 articles by at least 10 points on the ORES scale. We fell short of this goal, with only 8 articles having that 10-point improvement. We think there are several factors at play here:

  • Fellows selected articles that were already in the middle of the ORES rating. The average ORES score of articles Fellows improved by 10 points was 41.7. In comparison, the average ORES score of articles students in our Classroom Program improved by 10 points during the same term was only 24.9, demonstrating that student editors (correctly) select lower quality articles to begin with; Fellows are choosing mid-rated articles, which are harder to show significant improvement to on the ORES scale.
  • While we had anticipated that Fellows would each choose two articles to substantively improve, and some did, we found that other Fellows made moderate improvements to a lot of articles. Fellows added more than 1,000 bytes (a threshold we use for moderate improvements) to 19 articles.
  • Much of the work Fellows did to add in specific perspectives of their disciplines doesn't show up well on rankings like ORES; it gets more at subject matter expertise and quality of coverage rather than anything a machine can pick up.

While we would like to see Fellows contribute more content in future cohorts, and we will test whether we can encourage this through curriculum changes as outlined in the previous section, we will need to do future evaluations of whether this ORES measurement is a good fit for the Fellows program. We don't want to encourage Fellows away from adding a particular discipline's expertise in a short section of a longer article, for example, just because that won't show up in the ORES ratings. Edits like that are still an important value-add to Wikipedia that only subject matter experts like Wikipedia Fellows can bring to Wikipedia content, and we don't want to lose sight of that as an important outcome for the program.

Qualitative impact on Wikipedia

General quality assessment

As subject-matter experts, Fellows were able to easily recognize gaps in Wikipedia's coverage of topics, had the knowledge to fill those gaps, and knew the relevant literature well enough to support their additions with high-quality sources representing mainstream academic thinking in their discipline. For example, when Sine Anahita read through the sociology article, she was able to identify the lack of information about the sociology of gender and fill that gap. Similarly, Anahita was able to draw upon her knowledge of the sociology of race to add an overview of the sociological view of race to the race (human categorization) article, and Michael Ramirez was able to introduce the "social construction of masculinity" into the masculinity article. The ability to identify important but non-obvious gaps is a skill that allows subject-matter experts to improve the quality of articles that seemed otherwise complete.

Articles that span an entire field of study can be especially difficult to write well because they require both breadth and depth of knowledge of the field. Far too often, articles like this end up giving disproportionate attention to the many small details and controversies that catch the attention of Wikipedia editors, while still lacking a broad overview of the topic. Because she was able to approach it from an expert perspective, Jenn Brandt made substantial improvements to women's studies by expanding the article to add missing details, trimming or removing areas that had excessive detail, and reworking it in general to more appropriately define and contextualize the field of study. When Nicole Kalaf-Hughes looked at the Procedures of the United States House of Representatives, she found an article that lacked a crucial component of what goes on in the House, and was able to add a section on speaking from the Floor of the House. In a similar vein, Maria Velazquez's additions to Black science fiction, R.G. Cravens' additions to LGBT conservatism in the United States, and Bradley Zopf's additions to Arab immigration to the United States, Definitions of whiteness in the United States, and Race and ethnicity in the United States were all examples of areas where Fellows were able to identify gaps and fill them using high-quality sources.

As academics, the Fellows were skilled at using the scholarly literature even when topics fell outside their primary areas of interest. When Sine Anahita discovered that there was no article on interruption, she was able to draw on the scholarly literature to create Interruption (speech), while Michelle Gohr was able to do something similar with body horror. Jenn Brandt's improvements to the Margaret Atwood article were sufficient to warrant a Good Article nomination, and after the conclusion of the pilot, she shepherded the article through a lengthy and detailed review to promotion. For many Wikipedians, the back and forth of the Good Article and Featured Article review processes involves a fairly steep learning curve, but academics experienced in the peer review process come to Wikipedia with that skill in place.

Impact on program participants

Bradley Zopf and Sine Anahita, two of the pilot members, made a point to meet up in person at a conference they were both attending while the pilot was happening.

The nine Wikipedia Fellows created a strong community with each other. Two participating ASA members met up in person at a conference, which is a powerful connection to make from an online program. Ultimately, we do not know what kind of future collaboration Fellows will have with each other, but there was a lot of enthusiasm around collaborating on posters, research papers, or contacting their university's press office about this experience. The three ASA Fellows are collaborating in June 2018 to publish an article about their experience for the American Sociological Association's member magazine, Footnotes.

Several Fellows commented that they felt that building community was a significant part of their experience. During later meetings, there were conversations about collaborating in a research capacity about this experience or about Wikipedia in general.

While our retention efforts for our Classroom Program focus on retaining instructors, we hoped that we would be able to retain Wikipedia Fellows as editors. It is too soon to draw conclusions about whether we will retain any past the formal end of the cohort. The exit survey indicated that all participants want to continue to contribute to Wikipedia. This has been a common sentiment among our Classroom Program student editors, most edit-a-thon participants, and participants in other programs across the Wikimedia movement: people express the interest and desire to keep editing Wikipedia, but that often doesn't materialize into sustained editing. We are eager to look back in September (the six-month mark from the end of the program) to see if any of the nine Fellows are still editing Wikipedia.

Impact on educational partnerships

In the exit survey, several Fellows cited the partnerships as integral to their Fellows experience. They found out about the program through their association, felt the program carried more weight because of the association partnership, and/or felt compelled to follow through with the program because of it.

Anecdotally, the program seems to be a success with partner associations. MPSA, NWSA, and ASA have already signed on for future Fellows cohorts. It is our hope that, as this program grows, it can play a significant role in professional development, personal growth, or even certification someday.

Survey results

[edit]
With a 100% response rate to our survey, Fellows uniformly found the program improved their understanding of Wikipedia.

All nine participants provided feedback on the 33 questions in our survey. Everyone was enthusiastic about the program. In addition to exceeding expectations, the program made almost everyone in the cohort feel more comfortable editing Wikipedia, creating articles, engaging with the Wikipedia community, and contributing regularly. Most felt confident about teaching Wikipedia to others, and almost every participant felt the program helped them better explain Wikipedia to students and colleagues. They agreed that contributing to Wikipedia helps ensure the public has access to accurate information, helps address systemic bias, and helps us learn more about how one of the most popular websites works.

Regarding the management of the program, almost the whole group found the Zoom meetings helpful. More than anything else, the meetings provided a space to learn, ask questions, and build community with the other participants. When asked about the trainings, participants unanimously agreed that they were essential and well-crafted. Participants felt positively about the Dashboard timeline, although opinions varied on pacing, assignment construction, and setting expectations. The other communication tool, Slack, received mixed reviews from this cohort: some participants appreciated the immediacy of replies, while others found it one chat platform too many, redundant with tools they already used. Overall, there was a lot of enthusiasm about communication because of how quickly it allowed participants to build community.

We asked several questions about the timing of the program. All participants responded that they spent more time on the program than the advertised three hours. There was some agreement that listing a range, for example three to six hours, would be more accurate. Some recommended keeping the stated requirement at three hours but advising participants that it will probably take more. Participants had conflicting recommendations for the length of the program: half wanted it to be longer, while the other half cited the difficulty of scheduling around the semester.

All participants said they want to continue to edit Wikipedia.

Conclusion

[edit]

Did we answer our key questions?

[edit]

Can we leverage relationships with associations?

  • As evidenced by our exit survey, almost all Fellows found the association partnerships valuable. These partnerships introduced them to the program, incentivized them to apply, and will, we hope, act as a driver moving forward.

How do subject matter experts specifically impact article quality?

  • We saw expert influence in the quality of sources, improvements to related articles, the restructuring of articles, clarity of writing, updated article information, and the identification of related topics, missing articles, and opportunities for new articles.

Can we sustain three months of regular meetings without participants dropping out?

  • Yes. Fellows missed at least one meeting on average, but everyone was able to contribute, and only one Fellow dropped out due to scheduling (and was replaced early in the program).

Can we retain subject matter experts; will they remain active?

  • The exit survey indicated that all Fellows want to remain active. We will check in on this again in September 2018.

Is this pilot worth continuing?

  • Yes. Associations have expressed interest, participants have found merit, there are variables to test, and we see potential to scale and broaden the impact of a program like this.

Adapting the pilot

[edit]

We are interested in testing multiple variables for this program relating to the structure and approach to individual instances of the program, as well as to our ability to scale the program while remaining effective in teaching experts how to contribute high quality content to Wikimedia projects. In future cohorts, we are interested in experimenting with the following:

  • Size of cohort, using the meeting size from the pilot as a baseline/minimum
  • Timing of cohort, based both on Fellows' availability and interest and Wiki Education staff availability with relation to our other programs
  • Length of cohort, from two to five months
  • Recruiting from a single partner association's membership vs. recruiting from multiple partners' memberships vs. wider call for participation
  • Participant-led article selection, as in the pilot, vs. thematic cohorts based on particular topics, topic areas, Wikimedia projects, etc.
  • Curriculum design, emphasizing different elements of the process such as Wikipedia policy, hands-on article development, etc., changing the order of milestones/tasks, incorporating different approaches to peer review
  • Mentorship models in connection with past program participants
  • Face-to-face video meetings vs. one-to-many meeting style, which would be essential for large cohorts
  • Program Manager differences (Wikipedian vs. non-Wikipedian)
  • Cross-program participation, making connections between Wikipedia Fellows and the Classroom Program
  • Approaches to how we work with partners on an organizational level, e.g. collaborative write-ups about the program, conference-related events, responsibilities of Fellows to the association, etc.
  • Approaches to facilitating Fellows' ability to claim participation in the program as recognized academic service, or providing certificates or other forms of "credit" that may help Fellows' CVs
  • Content Expert roles. Larger cohorts in particular would greatly benefit from at least a part-time Content Expert who can give them direct attention. A list of suggested articles would help direct Fellows to areas in sore need of improvement and would help resolve potential issues with larger cohorts, which may work in groups, since tracking participants and giving feedback on article choices becomes harder as numbers grow.

After this three-month pilot, we believe academic experts are highly interested in learning how to contribute to Wikipedia. With coaching about how Wikipedia works and how it differs from their past experiences, they can contribute high-quality, complex knowledge to Wikipedia. Thus, we believe this pilot was successful in proving we can run a program to train academic scholars in bringing their expertise to Wikipedia.

Closing thoughts

[edit]

The Wikipedia Fellows program has merit for several reasons. First, it brings in editors who are knowledgeable about their topic areas and have access to sources the average editor may not, making them better equipped to interpret the sourcing correctly. The program has potential for growth across academia, and it could also branch out in new directions: libraries, museums, cultural institutions, non-profits, government institutions, and beyond.

Because participants often teach entry-level courses, they may have a better grasp of writing for a lay audience, which Wikipedia prefers to overly technical and complicated prose. A Fellow may therefore already be predisposed to interpreting source material and writing in a format similar to Wikipedia's. One common view of Wikipedia is that it is edited by people without credentials, access to resources, or topic awareness; this program puts people with all of those things in the editor's seat.

Based on survey results and blog posts, the Fellows are more likely to see Wikipedia in a favorable light, and thus more likely to recommend that others use Wikipedia in their daily lives and to show them how it can be used responsibly in an educational environment. The Fellows expressed interest in publishing material about their experiences with Wikipedia and with the Wikipedia Fellows pilot. The program also brought participants together in the real world; that lasting collaborative effect, crossing from the virtual to the in-person, is a tangible impact most programs lack.

Fellows become aware of Wikipedia's policies and guidelines, coming to understand the way Wikipedia, as a tertiary source, depends on existing publishing models. With that awareness, they may be more likely to publish materials about topics that need better coverage on Wikipedia.

This cohort of Fellows demonstrated several points from our theory of change. Non-Wikipedians were able to make deliberate contributions to Wikipedia articles in their areas of expertise. They all had a positive experience, as indicated by their wish to continue contributing after the program's end. What has not yet been proven is how this program can scale to increase its impact, remain sustainable, and engage a broader set of participants.

We will be offering four new cohorts, starting in June 2018, based on the following themes:

  • Midterms (Large number of Fellows)
  • Midterms (Small number of Fellows)
  • General, multi-discipline
  • Communicating Science

This pilot indicates the program is full of potential. Not only does it improve the quality of popular, important articles, but it also has the ability to inspire widespread participation throughout academia. This bridge from academia to Wikipedia will help draw more attention to articles that are incomplete, missing key perspectives, erroneous, or nonexistent. We are eager to realize the potential of this program, with the aim of bringing new editors into the Wikipedia community, improving content, and producing knowledge readers find reliable, urgent, and essential.