Research:Wikipedia Administrator Recruitment, Retention, and Attrition

From Meta, a Wikimedia project coordination wiki
Tracked in Phabricator:
Task T368791

This page documents a research project in progress.
Information may be incomplete and change as the project progresses.
Please contact the project lead before formally citing or reusing results from this page.


As part of our contributions to the Wikimedia Foundation's 2024-2025 Annual Plan, the research team and collaborators are working to study the recruitment, retention, and attrition patterns among long-tenure community members in official moderation and administration roles.

Updates

  • 4 September: We've published our wiki selection criteria, along with a candidate shortlist summary table.
  • 20 August: We have updated this page to include our approach to privacy for this study. We have also added some preliminary work comparing different Wikipedias' administrator candidacy requirements.
  • 2 August: We've officially kicked off the project. Thanks for checking this page out. If this topic relates to you and your work, please see the opportunities to become involved at our talk page. We are currently working on the annotated bibliography, definition of key terms, and some other planning activities.

Hypothesis Text

If we study the recruitment, retention, and attrition patterns among long-tenure community members in official moderation and administration roles, and understand the factors affecting these phenomena (the ‘why’ behind the trends), we will better understand the extent, nature, and variability of the phenomenon across projects. This will in turn enable us to identify opportunities for better interventions and support aimed at producing a robust multi-generational framework for editors.

Background & Goals

Editing rates have been lower since 2008, yet only recently have we turned our attention to administrators as a distinct body of editors. English Wikipedia in particular has been concerned about administrator attrition for a long time, having published multiple Signpost articles about it. At the same time, we know that administrators are vital to the healthy operation of any wiki. While we are learning more about what administrators do, we still face a substantial knowledge gap about how administrators join, and why they stay in or leave administrative activities. This research may also have crossover with work on general retention rates for long-tenure editors, and on changes in activity patterns over the length of an editor’s tenure.

Thus, the goal of this project is to investigate the recruitment, retention, and attrition patterns among long-tenure community members in official administration roles, and understand the factors affecting these phenomena. By doing so, we will better understand the extent, nature, and variability of the phenomenon across projects. This will in turn enable us to identify opportunities for better interventions and support aimed at producing a robust multi-generational framework for editors.

We first want to understand the current state of affairs around administrator arrivals and departures. This includes identifying the metrics we have available to serve as either direct or indirect measures of retention, attrition, and relevant activities, as well as understanding what we might consider as baselines for different projects. We expect these trends to vary across language versions of Wikipedia, but this is unconfirmed. Second, we want to understand the reasons behind the trends we observe: for example, the reasons for attrition, what motivates administrators to stay, and what motivates editors to pursue administrator status, among other related questions.

Approach

Preliminary work

We will begin with a very focused (targeted and contained) literature review, resulting in an annotated bibliography, to ground the work in the most relevant literature. This work will support alignment around a number of key terms that will be central to the topic of the project. For some of these key terms and concepts, a commonly agreed upon definition may exist, and for others we may need to propose one for the purposes of keeping work moving forward. This initial desk research will be used to identify potentially productive lines of inquiry for subsequent quantitative and qualitative inquiry.

Quantitative metric analysis

For the quantitative work we first need to identify and justify the selection of relevant metrics; some may be direct and others may serve as proxies. We have generated an initial list of sample questions to guide this inquiry (see 'guiding questions').

Qualitative data collection and analysis

We are planning two components of our qualitative work, which will investigate reasons for attrition and retention, and challenges and opportunities around recruitment. First, a survey utilizing contrastive sampling will help cast a wide net for the purposes of data collection. Second, we will conduct in-depth interviews that may contain some contextual inquiry components as relevant. These will allow us to more deeply understand the trends and patterns observed in survey responses. See the 'guiding questions' section below for more detail on the topics we aim to investigate.

Privacy policy

See also the Wikimedia Foundation data publication guidelines and data retention guidelines.

In keeping with our other human subjects research, particularly our qualitative work on this topic, we want to ensure the privacy of all participants in the study. Survey data collection and retention will be governed by their own specific privacy policies, detailing what data is collected, and the ways in which that data will be used.

All interviews will also be accompanied by a privacy statement. Interview participants will be anonymized by default, and direct quotations or close paraphrases from interviews will be sent to the participants in advance for their review. Interview participants may choose to be de-anonymized, but we will not do this without the explicit written consent of the participant.

Participants who have been contacted for this study may, at any point, choose to opt out of the study. We will honor requests to delete personal data to the best of our ability. Note that if the collected data has already been de-identified or aggregated, we may not be able to delete personal data.

Wiki selection

For this project, we have opted to take a comparative approach, selecting a target group of language versions (wikis). Our primary criteria for selection are wiki activity and number of monthly active administrators. Based on prior research, very small wikis have significantly different moderation pressures, which we infer would also affect administrator push/pull factors. Since the genesis of this project came from issues expressed by members of the largest wikis, we believe it is appropriate to also focus on higher-activity wikis for this project.

Our shortlist cutoff point is 1000 weekly non-bot edits, and more than 20 monthly active admins, as captured by the Wiki Comparison data. We will also restrict our scope to Wikipedias only. The resulting list of 21 Wikipedia editions is presented below, along with additional characteristics of relevance for contrastive purposes. From this list of 21, we examined a number of factors to arrive at our final list of five wikis for survey sampling and interview recruitment. These are:

  • English Wikipedia
  • French Wikipedia
  • Spanish Wikipedia
  • Russian Wikipedia
  • Indonesian Wikipedia
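
The shortlist cutoff described above can be sketched as a simple filter. This is a minimal illustration only: the `WikiStats` record, its field names, and the sample figures are hypothetical placeholders rather than values from the Wiki Comparison data. The thresholds are read literally from the text, and treating the edit threshold as inclusive and the admin threshold as exclusive is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WikiStats:
    """Hypothetical per-wiki record; field names are illustrative."""
    name: str
    is_wikipedia: bool
    weekly_non_bot_edits: int
    monthly_active_admins: int


def shortlist(wikis, min_weekly_edits=1000, min_active_admins=20):
    """Return names of Wikipedias that meet both activity thresholds.

    Assumption: the edit threshold is inclusive (>=) and the admin
    threshold is exclusive (>), following the wording of the text.
    """
    return [
        w.name
        for w in wikis
        if w.is_wikipedia
        and w.weekly_non_bot_edits >= min_weekly_edits
        and w.monthly_active_admins > min_active_admins
    ]


# Made-up sample records, for illustration only.
sample = [
    WikiStats("wiki_a", True, 5000, 45),   # passes both thresholds
    WikiStats("wiki_b", True, 800, 30),    # too few weekly edits
    WikiStats("wiki_c", True, 2500, 2),    # too few active admins
    WikiStats("wiki_d", False, 9000, 60),  # not a Wikipedia
]

print(shortlist(sample))
```

With these made-up records, only `wiki_a` passes the filter.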

The factors we examined included administrator candidacy requirements, RFA success rates, monthly pageviews, monthly active administrators, median administrator actions, total sysops over time (2018 to present), the ratio of monthly active administrators to sysops, analyses from the risk observatory, geographic distribution, and some basic consideration of external factors and influences on projects, such as government censorship of Wikipedia.

To highlight a few of our considerations:

  • We deliberately include wikis with both short (≤6 months: English, Russian, Indonesian) and longer (Spanish, French) account age requirements for administrators.
  • This list includes wikis with a range of edit requirements for administrator candidacy, from the most extreme case of English (10k), to the moderate requirements of French (3k), to either low (100 for Russian, 500 for Indonesian) or none stated (Spanish).
  • We also include a few wikis for which there are other explicit candidacy requirements, such as English (member of the extended-confirmed user group).
  • We have a relatively wide geographic distribution of the administrators who participate in these wikis, and examples of languages that have both wide geographic distribution (e.g., across many countries: English, Spanish) and more narrow distribution (spoken more primarily in one country or region).
  • Through the inclusion of Russian, we have at least one example of a wiki for which there are meaningful external factors that may affect administrators, such as the role of censorship of Wikipedia (currently partial).
  • Examining the ratio of monthly active admins (MAA) to members of the sysop group, the two lowest are English and Chinese, and the two highest are Portuguese and Russian. A high ratio suggests that there are users conducting “admin actions” who are not part of the sysop group. This list includes wikis that fall into both the high and low ratio groups through the inclusion of English and Russian.
  • Looking at RFA success rates for 2013-2023, the two highest are French and Dutch, whereas the two lowest are Chinese and Spanish. As such, we’ve ensured we include one from each of these groups, French and Spanish.
  • The selected wikis exhibit a range of median actions per admin, from English on the low end with a median of <20, to Spanish on the high end with >300.

Note that the metrics "monthly active admins" and "median actions per admin" currently may include bot activity in addition to non-bot account activity. This means they may count actions taken by bots that can block users, or move or protect pages.
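
The ratio comparison in the considerations above can be reproduced from the figures in the candidate shortlist summary table. The sketch below assumes the ratio is monthly active admins divided by the size of the admin user group; the function and variable names are illustrative.

```python
# (admin user group size, monthly active admins), copied from the
# candidate shortlist summary table for four of the listed wikis.
counts = {
    "en": (911, 398),
    "zh": (67, 27),
    "pt": (52, 70),
    "ru": (74, 87),
}


def maa_to_sysop_ratio(group_size, monthly_active_admins):
    """Monthly active admins per member of the sysop group."""
    return monthly_active_admins / group_size


ratios = {
    wiki: round(maa_to_sysop_ratio(size, maa), 2)
    for wiki, (size, maa) in counts.items()
}

# Print from lowest to highest ratio.
for wiki, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(wiki, ratio)
```

A ratio above 1 means more accounts performed admin actions in the month than there are members of the sysop group. Given the note above, part of that excess may reflect bot activity.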

Limitations

Because we use a survey, we require larger populations in order to produce representative results. Smaller wikis tend to have very few administrators (the median number of monthly active admins across all projects is 2), and surveys with so few participants cannot produce generalizable results. Beyond generalizability, such small numbers of admins also mean that the data would be extremely granular, which increases the risks associated with publishing it. As a result, the shortlist of candidates and our final selected wikis for the survey and interviews are skewed towards languages with European origins (though they are spoken across many countries and regions) and languages with large speaker populations; we are aware of this limitation.

Guiding questions

The following subsections contain what we consider an initial starting point for the questions that will guide inquiry and development of research materials and instruments. These are offered as a sample set of questions that may evolve as we continue to learn about the topic of this investigation.

Quantitative inquiry

Administrator activity across projects:

  • What is the general state of admin presence (including both inflow and outflow) on the different language versions of Wikipedia (henceforth 'wikis'/'projects' for shorthand)?
    • How have these trends changed, or not, in the past x months?
    • For which, if any, wikis do we observe significantly stable or increasing numbers of administrators? For which, if any, do we observe significantly decreasing numbers?
    • Which projects, if any, appear to have long-term stagnant recruitment? Do we have any indicators that this is undesirable?
  • How many admins are arriving and departing on a regular basis? What is the average tenure distribution?
  • Are individuals active as admins on multiple projects, and how does activity on multiple projects appear in the trends? (e.g., is there a meaningfully large number of individuals who may appear as having left one project, when in fact they’ve switched their project focus in terms of admin activities?)

Foundational questions may include:

  • What is the best metric, or set of metrics, to use for determining departure, dormancy, or disengagement?
  • What is the best metric (or set of metrics as a sort of proxy) for capturing the activity of admins?
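
As one possible starting point for the foundational questions above, dormancy could be operationalized as the absence of logged admin actions within a fixed window. The sketch below is an assumption for illustration, not a metric the project has adopted: the 90-day window, the function name, and the sample accounts and dates are all hypothetical.

```python
from datetime import date, timedelta


def is_dormant(action_dates, as_of, window_days=90):
    """Return True if no admin action falls within the window ending at as_of.

    Assumption: `action_dates` holds the dates of an account's logged
    admin actions; an account with no actions counts as dormant.
    """
    cutoff = as_of - timedelta(days=window_days)
    return not any(cutoff < d <= as_of for d in action_dates)


# Hypothetical accounts and dates, for illustration only.
as_of = date(2024, 9, 1)
log = {
    "admin_a": [date(2024, 8, 20), date(2024, 5, 2)],
    "admin_b": [date(2024, 3, 15)],
    "admin_c": [],
}

dormant = sorted(name for name, dates in log.items() if is_dormant(dates, as_of))
print(dormant)
```

Varying `window_days` is one way to see how sensitive a departure count is to the chosen definition.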

Admin productivity and other activity (aspirational topics for inclusion):

  • What’s the relationship between key productive bots or other tools on projects and the administrators who operate them?
  • What else relevant can we measure regarding the impact of admin departures? For example, impacts on the general queue of admin work that falls upon remaining admins as a result.
  • Is there any way to identify metrics that could predict (either in terms of correlation or causation) likelihood of departure?
  • What else relevant can we measure about the activity of these individuals, particularly in terms of impact on projects?

Qualitative inquiry

Attrition (core question) - Why do (some) people in community administration roles leave the projects? Conversely, why do others stay?

  • Who is impacted, and how, when administrators leave?
  • How do people leave? What are alternatives that people do aside from “leaving”? (e.g., changing focus, role, or primary activities but remaining engaged on the project(s)?)
  • What happens to bots and other volunteer-maintained tools associated with individual administrators when they leave projects?

Pull factors

  • What activities do admins engage in when leaving admin work on Wikipedia?
  • What motivates them to direct their activity elsewhere?
  • What are alternative pathways that individuals choose over administratorship on Wikimedia projects? (e.g., economic market factors, amongst others)

Push factors

  • What are factors that drive admins away from staying engaged in the projects?
  • What are on-project phenomena that discourage and/or disengage administrators?

Recruitment and retention. What motivates people to become administrators in the first place?

  • What are the conditions necessary to support administrator function? (device access, availability of free time, expertise, availability of internet, etc.)
  • (Social importance of adminship) What values are important, what motivations keep people within the role, what factors influence retention?
  • What prevents people from being as productive as an admin as they would like to be? (e.g., technical, social, and other challenges)

Timeline

We've organized the work for this project into five basic phases (Phase 0 through Phase 4), presented below with some key activities and tentative date ranges.

  • Phase 0: Scope, organize, guiding work (22 July - 15 August)
    • Finalize research brief, including project roles, phases, and timeline
    • Brief annotated bibliography and definition of key terms
  • Phase 1: Preparation (8 August - 27 September)
    • Quantitative work: identification of metrics
    • Qualitative data collection pre-work: distribution plan, recruitment plans, interview recruitment, acquisition of appropriate privacy statements and participant release forms
    • Survey development: initial and final sampling plan, initial and final draft of survey instrument, localization preparation
    • Interview development: discussion guide development and localization preparation
  • Phase 2: Data collection (27 September - 11 November)
    • Quantitative work: metric analysis
    • Survey work: survey goes live, ongoing recruitment of respondents and analysis of incoming data
    • Interview work: conduct interviews, ongoing recruitment of participants
  • Phase 3: Analysis (1 November - 6 December)
    • Analysis and synthesis of data, triangulate multiple data sources, prepare reporting and recommendations (may include both policy and product recommendations, amongst others based on results)
  • Phase 4: Reporting and socialization of results (6 - 20 December)
    • Finalize report, presentation and discussion with stakeholders

Results

TBD

Resources

Annotated bibliography

As part of this project, we conducted a brief review of literature related to the topic of Wikipedia administration recruitment, retention, and attrition. These sources included independent research as well as WMF-sponsored research and metrics. For each entry, we included a very brief summary of the source, along with some accompanying commentary as related to the current project. The goal of this work was to ground the investigating group in a survey of past work; it may also serve as a resource for others studying similar topics. This resource can be accessed here.

Candidate shortlist summary table

Short-list of candidate wikis (final candidates marked with *)

| Wiki | Admin user group size | Monthly active admins | Median actions per admin (2023) | RFA success rate (2013-2023) | Censorship of Wikipedia | Candidacy: account age (months) | Candidacy: main namespace edits | Candidacy: other factors/notes |
|---|---|---|---|---|---|---|---|---|
| English (en) * | 911 | 398 | 19 | 50% | | 6 | 10000 | Candidate needs email; part of extended-confirmed user group |
| Spanish (es) * | 60 | 44 | 314 | 28% | | 12 | None stated | Candidate must have email |
| German (de) | 186 | 125 | 178 | 57% | | 24 | 1000 | |
| Japanese (ja) | 41 | 30 | 149 | — | | 4 | 500 | |
| French (fr) * | 155 | 88 | 102 | 71% | | 12 | 3000 | |
| Russian (ru) * | 74 | 87 | 169 | 37% | Current partial block; previously blocked content | 3 | 100 | |
| Chinese (zh) | 67 | 27 | 13 | 24% | China currently blocking all content; COI editing by state; prosecuted editors | 12 | 3000 | Membership in patroller or rollbacker user groups; not blocked in last year |
| Italian (it) | 120 | 109 | 302 | — | | None stated | None stated | Member of autopatrolled group |
| Portuguese (pt) | 52 | 70 | 58 | — | | 12 | 300 | |
| Persian (fa) | 35 | 31 | — | 36% | Current partial block in Iran; previously blocked content | 6 | 1000 | |
| Polish (pl) | 100 | 72 | 128 | 51% | | 3 | 500 | Candidate must have email |
| Indonesian (id) * | 44 | 30 | 127 | 60% | | 3 | 500 | Has email, has user page over 500 bytes, no blocks in past 6 months |
| Dutch (nl) | 34 | 31 | 111 | 68% | | 6 | 1000 | Candidate must have email |
| Ukrainian (uk) | 47 | 38 | 150 | 50% | | 6 | 2000 | At least 200 additional edits in service namespace (Help, Template, Module, Wikipedia) |
| Hebrew (he) | 29 | 30 | 970 | — | | 9 | 2000 | |
| Czech (cs) | 31 | 25 | 183 | — | | 6 | 250 | Candidate must have email |
| Swedish (sv) | 66 | 51 | 128 | — | | None stated | None stated | |
| Finnish (fi) | 33 | 20 | 34 | — | | “A few” months | None stated | |
| Norwegian Bokmål (no) | 44 | 32 | 106 | — | | 4 | 1000 | Has email and user page |
| Catalan (ca) | 29 | 20 | 174 | — | | 9 | 1000 | Reqs were explicitly removed, but remain “as guideline” |

"—" denotes metrics for which data was not obtained.
