Research:Test External AI Models for Integration into the Wikimedia Ecosystem
This page documents a research project in progress.
Information may be incomplete and change as the project progresses.
Please contact the project lead before formally citing or reusing results from this page.
As part of our contributions to WMF's 2024-2025 Annual Plan, Research and collaborators are working to identify which AI and ML technologies are ready for WMF to start testing (at the feature, product, ... levels), among the sea of models that are already out there and continue to be made available.
Hypothesis Text
Q1 Hypothesis
If we gather use cases from product and feature engineering managers around the use of AI in Wikimedia services for readers and contributors, we can determine if we should test and evaluate existing AI models for integration into product features, and if yes, generate a list of candidate models to test.
Methods and Tasks
Define and prioritize existing use-cases for AI integration into products
See T370134
Use-case definition
- Gather existing documented use cases from Product teams, based on past conversations, and draft an initial list grouped by type of task, intended audience, and impact.
- Conduct a set of interviews/conversations with 7 product leaders about their views on product needs for AI. These conversations surfaced 3 new use-cases: OCR, image vandalism detection, and talk page translation. Most of the previously gathered use-cases were confirmed and refined based on the feedback. We also collected early indications of high- and low-priority use-cases.
- Survey product managers for further input. We asked 13 product managers to review the current list of use-cases and indicate the top-priority, low-priority, and missing ones. The ranked use-cases mostly match the indications from the initial conversations with leaders. The top-ranked include: edit-check-related use-cases, such as automatically assigning categories to articles (which can be useful beyond edit checks) and detecting policy violations; structured and mobile-friendly tasks, such as automatic article outlines and worklist generation; and automated image tagging and descriptions, which are also highly ranked.
Results after this stage are here
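The survey ranking described above can be reproduced with a simple tally: each product manager's top-priority flags add to a use-case's score and low-priority flags subtract from it. The sketch below uses hypothetical responses and use-case names for illustration only; the actual survey data and aggregation method are not specified on this page.

```python
from collections import Counter

# Hypothetical survey responses: each PM flags use-cases as top- or low-priority.
responses = [
    {"top": ["Policy violations", "Article outlines"], "low": ["Text to speech"]},
    {"top": ["Policy violations", "Image tagging"], "low": ["Edit summaries"]},
    {"top": ["Article outlines", "Image tagging"], "low": ["Text to speech"]},
]

def rank_use_cases(responses):
    """Net score = (# top-priority votes) - (# low-priority votes)."""
    score = Counter()
    for r in responses:
        for uc in r["top"]:
            score[uc] += 1
        for uc in r["low"]:
            score[uc] -= 1
    return score.most_common()  # sorted by net score, descending

for use_case, net in rank_use_cases(responses):
    print(f"{use_case}: {net:+d}")
```

A net score like this is one plausible way to turn per-respondent flags into a single ranked list while letting strong disagreement (many low-priority flags) pull a use-case down.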
Use-case prioritization
After reviewing responses from the above process, and after the model selection phase is completed, we rank and select use-cases based on the following criteria:
- Priority signaled: was this use-case mentioned during conversations with Product Leadership or in the PM survey as a top-priority use-case?
- AI Strategy Alignment: is the use-case aligned with WMF's new AI Strategy?
- Model availability: have we identified, during the model selection phase, existing externally developed models that can be applied to this use-case (based on the criteria of Effectiveness, Multilinguality, Infrastructure, and Openness)?
- Data availability: do we have enough labeled data to test with? If not, what would it take to put it together, e.g. through crowdsourcing or manual evaluation?
- Measurability: can we, in practice, estimate the effectiveness of existing models on the proposed use-cases using quantitative indicators?
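The five criteria above can be combined into a simple scoring rubric. The page does not specify how criteria are weighted or scored, so the sketch below assumes equal weights and a 0/1/2 scale (none / partial / strong) per criterion; the use-case names and scores are illustrative, not actual project results.

```python
from dataclasses import dataclass

CRITERIA = ("priority_signal", "strategy_alignment", "model_availability",
            "data_availability", "measurability")

@dataclass
class UseCase:
    name: str
    scores: dict  # criterion -> 0/1/2 (none / partial / strong)

    def total(self, weights=None):
        """Weighted sum across all criteria; equal weights by default."""
        weights = weights or {c: 1.0 for c in CRITERIA}
        return sum(weights[c] * self.scores.get(c, 0) for c in CRITERIA)

# Hypothetical scores for illustration only.
cases = [
    UseCase("Policy violation detection",
            {"priority_signal": 2, "strategy_alignment": 2,
             "model_availability": 1, "data_availability": 1, "measurability": 2}),
    UseCase("Talk page translation",
            {"priority_signal": 1, "strategy_alignment": 2,
             "model_availability": 2, "data_availability": 1, "measurability": 1}),
]

for uc in sorted(cases, key=lambda u: u.total(), reverse=True):
    print(uc.name, uc.total())
```

Passing a non-uniform `weights` dict would let the team emphasize, say, strategy alignment over data availability without changing the per-criterion scores.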
Define a set of criteria to identify existing models to test, and select candidate models for use-cases
We review literature on existing AI models to find good matches for each use-case defined above, based on specific criteria.
Criteria for selecting models to be tested
- Effectiveness: does the model have a chance to be effective for this task (based on Research and what we know)? Has the model been used for this task or similar tasks before?
- Multilinguality: has the model been designed to work with the languages required by the use case? In general, does the model handle languages other than English by design (intentionally trained/tested on multiple languages)?
- Infrastructure: is it realistic to host the model in our infrastructure (LiftWing)? Have similar models been hosted before? Is there a plan B for hosting the model outside LiftWing?
- Openness: is the model open-sourced (in some form, e.g. available through HuggingFace), and could we potentially use it in production? Is there public documentation about the model's architecture and training data?
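The four criteria above amount to a screening checklist applied to each candidate model. A minimal sketch, treating each criterion as a boolean for simplicity (the actual assessment is of course more nuanced, and the model names below are hypothetical):

```python
from dataclasses import dataclass

@dataclass
class CandidateModel:
    name: str
    effective_for_task: bool    # Effectiveness: prior evidence on this or a similar task
    multilingual: bool          # Multilinguality: intentionally trained/tested beyond English
    hostable_on_liftwing: bool  # Infrastructure: realistic to serve on LiftWing (or a plan B)
    open_weights: bool          # Openness: weights and documentation publicly available

    def passes_screen(self) -> bool:
        """A candidate advances only if it clears all four criteria."""
        return all([self.effective_for_task, self.multilingual,
                    self.hostable_on_liftwing, self.open_weights])

# Hypothetical candidates for illustration only.
candidates = [
    CandidateModel("model-a", True, True, True, True),
    CandidateModel("model-b", True, False, True, True),  # fails Multilinguality
]

shortlist = [m.name for m in candidates if m.passes_screen()]
print(shortlist)
```

An all-pass screen like this is deliberately strict; in practice a team might instead score criteria and set a threshold, so a strong model with a hosting plan B is not discarded outright.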
Define a protocol for external model evaluation
Test models on WMF infrastructure
[edit]Timeline
[Q1 24-25] Tasks 1 and 2
[Q2 24-25] Tasks 3 and 4
Results
TODO: Add initial results for each task when ready
Provisional List of Defined Product-AI Use-Cases
Macro-Category | Use-Case | Audience | What could this help our movement learn/achieve? | Impact I | Impact II |
Structured/Edit Tasks | Detect grammar / typos / misspellings: detect errors in text and propose ways to correct them | Contributors | | Support Newcomers | Automate Patrolling |
Structured/Edit Tasks | Detect valid categories for Wikipedia articles: given an article, recommend the top X categories that the article could be tagged with | Contributors | | Support Newcomers | Address Knowledge Gaps |
Structured/Edit Tasks | Detect policy violations, e.g. WP:NPOV, WP:NOR | Contributors | Moderators can see "non-neutral language" in an article highlighted automatically; edit checks for newcomers; editors could accept suggested corrections to edits based on policies and norms | Improve Content Integrity | Support Newcomers |
T&S/Moderator Tools | Talk page tone detection: detect negative sentiment and harassment in talk page conversations | Readers and Contributors | Functionaries can see talk page tone and manner issues in contributor stats; editors receive constructive feedback on their tone/manner in talk page discussions; a reader can see if an article has an unusual debate profile on its talk pages | Automate Workflows | Improve Content Integrity |
Structured/Edit Tasks | Source verification: verify that the text in an article is supported by the source specified in its inline citation | Contributors | Use LLMs to find new or better sources for claims on Wikipedia | Improve Content Integrity | Automate Workflows |
T&S/Moderator Tools | Talk page summaries: generate summaries of talk pages highlighting the main points of discussion and the final consensus | Contributors | New editors can generate a summary of talk page dialog before joining the discussion; moderators can generate a summary of a discussion | Automate Patrolling | Support Newcomers |
Structured/Edit Tasks | Automatic article outlines: generate a structure of sections and subsections for a new article | Contributors | Editors can automatically generate outlines for articles they want to write | Automate Workflows | |
Reader Tools | Article summaries: summarize the content of an article in a few sentences | Readers | Readers can browse summaries of articles related to the article they're on; the platform provides an article summary API for first- or third-party use | Improve Content Discovery | Retain New Readers |
Structured/Edit Tasks | Wikipedia text generation from sources: generate sentences or paragraphs for a Wikipedia article based on existing reliable sources | Contributors | Achieve new content; inspire new and existing editors who prefer draft suggestions to editing from scratch; use GenAI to generate suggestions for Wikipedia articles based on given source content | Automate Workflows | Support Newcomers |
Reader Tools | Text to speech: audio format for the encyclopedic content | Readers | Readers can access articles or content in audio format; this could be just pronunciation or full article audio | Accessibility | |
Structured/Edit Tasks | Automated image metadata tagging: tag images on Wikipedia and Commons with relevant Wikidata items | Readers and Contributors | Commons users can search using intuitive keywords and find images that have been tagged in arcane ways; editors can browse and easily add images related to the topic they're editing; plus semi-automated image description generation for structured tasks | Improve Search | Automate Workflows |
Reader Tools | Automated Q/A generation from Wikipedia articles: generate questions and answers that can help navigate the content of a Wikipedia article | Readers | Use AI to auto-generate quizzes on articles; readers can see all the questions the article they're reading has answers to | Improve Content Discovery | Retain New Readers |
WikiSource | Optical Character Recognition (OCR) system: digitizing documents requires an OCR system that works for all the languages we support | Contributors | Knowledge-processing tools like OCR help volunteers digitize documents for projects like Wikisource; such tools are not easy to find for low-resource languages, and providing the right tools helps volunteers contribute more and save time | Automate Workflows | Address Knowledge Gaps |
T&S/Moderator Tools | Image vandalism detection: detect images that are maliciously added to articles | Contributors | Patrollers can visualize images that appear to be out of context or misplaced in Wikipedia articles | Automate Workflows | Improve Content Integrity |
T&S/Moderator Tools | Automated worklist generation: generate lists of articles that are relevant to an editor and that need improvement | Contributors | Editors have an automatically generated list of articles they've contributed to that need additional work | Automate Workflows | |
T&S/Moderator Tools | Edit summaries: given an edit, generate a meaningful summary of what happened in the edit (and why) | Contributors | Editors can automatically get an edit summary generated from their edits; patrollers can see a summary of a user's recent edits | Automate Workflows | Automate Patrolling |
Reader Tools | Automated reading list generation: recommend relevant articles to read based on current reader interest | Readers | A reader can access a list of "the next 5 things you might want to read, based on what you've already read this session" | Improve Content Discovery | Retain New Readers |
Structured/Edit Tasks | Suggest templates for a given editor: retrieve relevant templates for a Wikipedia article | Contributors | Help us learn whether we can effectively and reliably suggest templates for users who want to insert a template on a page; measured as: when a user views "suggested templates" they insert a suggested template 20% of the time | Support Newcomers | Automate Workflows |
Structured/Edit Tasks | Policy discovery during editing: retrieve policies that are relevant to the current edit activities | Contributors | Editors can easily find documentation on policies and norms; relevant policies are automatically surfaced to editors in the edit workflow; new editors can ask questions to get help with editing | Support Newcomers | Automate Workflows |
Wishlist | Talk page translations: see translations of messages written in a different language | Readers and Contributors | Discussion venues support multilinguality; this can be helpful for ambassadors posting messages in various wikis, Meta-Wiki discussions, Community Wishlist discussions, strategy discussions, and so on | Community inclusivity | Multilingual discussions and communication |
Reader Tools | Improve natural language search: improve search so that people can ask questions in natural language | Readers | Readers can ask questions in natural language to retrieve answers and articles; can pull out factoids from deep in articles; can navigate directly to relevant anchor links in articles | Improve Search | |
Structured/Edit Tasks | De-orphaning articles: suggest articles related to orphans | Contributors | | Improve Content Discovery | Support Newcomers |