User:Trokhymovych/drafts/Multilingual revert risk model card

Model card
This page is an on-wiki machine learning model card.
	A model card is a document about a machine learning model that seeks to answer basic questions about the model.
Model Information Hub
Model creator(s)	Mykola Trokhymovych, Muniza Aslam, Ai-Jou Chou, and Diego Saez-Trumper
Model owner(s)	Diego Saez-Trumper
Code	training and inference
Uses PII	No
In production?	No
	This model uses revision content and metadata to predict the risk of a revision being reverted.

This model card page currently has a draft status. It is a piece of model documentation that is in the process of being written. Once the model card is completed, this template should be removed.

How can we help editors to identify revisions that need to be “patrolled”?

The goal of this model is to detect revisions that might be reverted, regardless of whether they were made in good faith or with the intention of causing damage. Wikipedia has a group of dedicated volunteer editors, known as patrollers, who work to ensure the accuracy and integrity of the information on the site. These patrollers review and edit articles, monitor for vandalism, and enforce community guidelines. Their work is not easy: they have to keep up with the fast pace and language diversity of Wikipedia, where on average around 16 pages are edited per second in 250+ languages^[1]. The aim of this model is to help patrollers quickly identify potential problems, prioritize their work, and revert damaging edits when needed.

This model is deployed on LiftWing and is currently available for internal use only. It can be used to detect revisions that might need to be reverted.

Motivation

Knowledge Integrity is one of the strategic programs of Wikimedia Research. Its goals are to identify and address threats to content on Wikipedia, increase the capabilities of patrollers, and provide mechanisms for assessing the reliability of sources^[2]. The main goal of this project is to create a new generation of patrolling models that improve on the accuracy, fairness, and maintainability of ORES^[3], the previous state of the art.

The current model is able to work on almost any Wikipedia article in any of the 47 chosen languages: ['ka', 'lv', 'ta', 'ur', 'eo', 'lt', 'sl', 'hy', 'hr', 'sk', 'eu', 'et', 'ms', 'az', 'da', 'bg', 'sr', 'ro', 'el', 'th', 'bn', 'no', 'hi', 'ca', 'hu', 'ko', 'fi', 'vi', 'uz', 'sv', 'cs', 'he', 'id', 'tr', 'uk', 'nl', 'pl', 'ar', 'fa', 'it', 'zh', 'ru', 'es', 'ja', 'de', 'fr', 'en']

Users and uses

Use this model for

estimating the revert risk of a Wikipedia article revision

Don't use this model for

making predictions on language editions of Wikipedia that are not in the listed 47 languages or other Wiki projects (Wiktionary, Wikinews, Wikidata, etc.)
making predictions on the revisions that are created by bots
making predictions on the revisions that create a new article
using the model as a stand-alone tool (without a human patroller in the loop)

Current uses

Ethical considerations, caveats, and recommendations

Model

The model combines content features extracted with a fine-tuned multilingual language model (mBERT^[4]), features based on mwedittypes^[5], and user and page metadata. It follows the paradigm of one generalized model for all covered languages, currently the 47 most frequently edited language editions of Wikipedia. The system includes the following steps:

1. Text feature preparation:

Process wikitext and compare with parent revision
Extract mwedittypes-based features
Extract texts that were added, removed, and changed

2. Masked Language Model (MLM) feature extraction:

Pass each of the texts that were added, removed, or changed to the pre-trained classification model
Apply mean and max pooling to the list of scores of each signal to extract the final unified feature set

3. Final Classification

Combine all extracted features with user and revision metadata
Pass the features to the final classifier
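
The pooling in step 2 and the feature combination in step 3 can be sketched as follows. This is a minimal illustration, not the project's code: the function names `pool_scores` and `build_feature_row`, the signal names, the metadata keys, and the zero default for an empty list of scores are all assumptions made for the example.

```python
from statistics import mean

# Signal names follow the texts listed in step 1; the exact names are assumed.
SIGNALS = ("inserts", "removes", "changes")


def pool_scores(scores, default=0.0):
    """Collapse a variable-length list of per-text scores into mean and max."""
    if not scores:
        # Assumption: a revision with no text for a signal gets a neutral default.
        return {"mean": default, "max": default}
    return {"mean": mean(scores), "max": max(scores)}


def build_feature_row(signal_scores, metadata):
    """Merge pooled scores for every signal with user and revision metadata."""
    row = dict(metadata)
    for signal in SIGNALS:
        pooled = pool_scores(signal_scores.get(signal, []))
        row[f"{signal}_mean"] = pooled["mean"]
        row[f"{signal}_max"] = pooled["max"]
    return row


# Hypothetical classifier scores for the texts of one revision.
example_scores = {"inserts": [0.2, 0.8], "removes": [0.5], "changes": []}
example_metadata = {"user_is_anonymous": True, "page_age_days": 120}
row = build_feature_row(example_scores, example_metadata)
```

Pooling gives every revision a feature vector of the same length no matter how many text fragments were added, removed, or changed, which is what a tabular classifier needs as input.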

Performance

Implementation

The presented model is a multistage solution. It consists of fine-tuned masked language models (mBERT) for feature extraction and a final classifier (CatBoost) that outputs the probability of a revision being reverted, based on the extracted features.

Model architecture

mBERT model tuning (four models: title, changes, inserts, and removes):

Learning rate: 2e-5
Weight Decay: 0.01
Epochs: 5
Maximum input length: 512
Number of encoder attention layers: 12
Number of decoder attention layers: 12
Number of attention heads: 12
Length of encoder embedding: 768

CatBoost:

Iterations: 5000
Learning Rate: 0.01
Loss: Logloss
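
For reference, the training hyperparameters listed above can be collected into plain configuration dictionaries. This is a sketch rather than the project's actual training configuration. The dictionary names are invented for the example. The key names are chosen to match keyword arguments of Hugging Face `TrainingArguments` (`learning_rate`, `weight_decay`, `num_train_epochs`) and of `catboost.CatBoostClassifier` (`iterations`, `learning_rate`, `loss_function`), on the assumption that the libraries were used in this way. The `truncation` flag is also an assumption.

```python
# Fine-tuning settings shared by the four mBERT models.
# Key names mirror Hugging Face TrainingArguments (assumed usage).
MBERT_FINE_TUNING = {
    "learning_rate": 2e-5,
    "weight_decay": 0.01,
    "num_train_epochs": 5,
}

# Tokenizer-side input limit. "truncation" is an assumption, not from the card.
MBERT_TOKENIZATION = {"max_length": 512, "truncation": True}

# One fine-tuned model per text signal, as listed in the card.
MBERT_SIGNALS = ("title", "changes", "inserts", "removes")

# Final classifier settings; key names mirror CatBoostClassifier arguments.
CATBOOST_PARAMS = {
    "iterations": 5000,
    "learning_rate": 0.01,
    "loss_function": "Logloss",
}
```

With the libraries installed, dictionaries like these could be unpacked as keyword arguments, for example `CatBoostClassifier(**CATBOOST_PARAMS)`.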

Output schema

{
  lang: <language code string>,
  rev_id: <revision_id string>,
  score: {
    prediction: <boolean decision result>,
    probability: {
      true: <probability of being reverted>,
      false: <probability of NOT being reverted>
    }
  }
}
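
A client consuming this schema might validate and flatten the response before using it. The sketch below assumes the response has already been decoded from JSON into a Python dictionary, so the `true` and `false` keys arrive as strings. The `RevertRiskScore` class, the `parse_response` function, and the check that the two probabilities sum to 1 are assumptions for illustration, not part of the service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RevertRiskScore:
    lang: str
    rev_id: str
    prediction: bool
    probability_true: float
    probability_false: float


def parse_response(payload):
    """Validate a decoded JSON response and flatten it into a dataclass."""
    score = payload["score"]
    probability = score["probability"]
    p_true = float(probability["true"])
    p_false = float(probability["false"])
    if not (0.0 <= p_true <= 1.0 and 0.0 <= p_false <= 1.0):
        raise ValueError("probabilities must lie in [0, 1]")
    # Assumption: the two probabilities are complementary.
    if abs(p_true + p_false - 1.0) > 1e-6:
        raise ValueError("probabilities must sum to 1")
    return RevertRiskScore(
        lang=str(payload["lang"]),
        rev_id=str(payload["rev_id"]),
        prediction=bool(score["prediction"]),
        probability_true=p_true,
        probability_false=p_false,
    )


# Hypothetical payload shaped like the schema above; the values are invented.
example = {
    "lang": "en",
    "rev_id": "12345",
    "score": {"prediction": False, "probability": {"true": 0.1, "false": 0.9}},
}
parsed = parse_response(example)
```

Failing early on a malformed payload keeps a broken response from being shown to a patroller as a valid score.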

Example input and output

Input

GET ......

Output

{
  lang: "ru",
  rev_id: 123855516,
  score: {
    prediction: true,
    probability: {
      true: 0.9392203688621521,
      false: 0.0607796311378479
    }
  }
}

Data

The model was trained on a dataset collected from two tables in the Wikimedia Data Lake: MediaWiki History and Wikitext History. We used the 2022-07 snapshot, with an observation period from 2022-01-01 to 2022-07-01 (6 months) for training and the following week for testing. We also filtered out revisions related to edit wars and revisions created by bots.
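
The time-based split and filtering described above can be sketched as follows. The record fields (`timestamp`, `is_bot`, `in_edit_war`) and the function name `assign_split` are hypothetical, and treating 2022-07-01 as the first day of the test week is an assumption, since the card does not say whether the boundary date is inclusive.

```python
from datetime import date, timedelta

TRAIN_START = date(2022, 1, 1)
TRAIN_END = date(2022, 7, 1)  # assumed exclusive upper bound of the training period
TEST_END = TRAIN_END + timedelta(days=7)  # the following week


def assign_split(revision):
    """Return "train", "test", or None for a single revision record."""
    # Assumption: bot and edit-war flags are precomputed on the record.
    if revision["is_bot"] or revision["in_edit_war"]:
        return None
    day = revision["timestamp"]
    if TRAIN_START <= day < TRAIN_END:
        return "train"
    if TRAIN_END <= day < TEST_END:
        return "test"
    return None


# Hypothetical revision records.
march_edit = {"timestamp": date(2022, 3, 15), "is_bot": False, "in_edit_war": False}
july_edit = {"timestamp": date(2022, 7, 3), "is_bot": False, "in_edit_war": False}
bot_edit = {"timestamp": date(2022, 3, 15), "is_bot": True, "in_edit_war": False}
splits = [assign_split(r) for r in (march_edit, july_edit, bot_edit)]
```

Splitting by time rather than at random means the test week is strictly later than all training data, which is closer to how the model would be used on new revisions.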

Data Pipeline

The data was collected using the Wikimedia Data Lake and the Wikimedia Analytics cluster.

For each language, we collected revision data. We then merged the wikitext data and extracted the required features from the content using user-defined functions (UDFs). The data collection pipeline for one language can be found in the data collection script.

Training data

Data period: 6 months
Number of revisions: 8,586,362
Share of edits by IP users: 0.17
Revert rate: 0.08
Random sample of up to 300,000 revisions per language
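
The per-language cap in the last item can be illustrated with a small in-memory sketch. The actual pipeline ran on the Wikimedia Analytics cluster, so this is not the project's code: the function name `sample_per_language`, the record fields, and the fixed seed are assumptions for the example.

```python
import random
from collections import defaultdict

MAX_PER_LANGUAGE = 300_000


def sample_per_language(revisions, cap=MAX_PER_LANGUAGE, seed=0):
    """Keep at most `cap` randomly chosen revisions for each language."""
    by_language = defaultdict(list)
    for revision in revisions:
        by_language[revision["lang"]].append(revision)

    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sampled = []
    for lang in sorted(by_language):
        group = by_language[lang]
        if len(group) > cap:
            # Languages above the cap are downsampled without replacement.
            group = rng.sample(group, cap)
        sampled.extend(group)
    return sampled


# Toy data: five English and two Latvian revisions, capped at three each.
toy = [{"lang": "en", "rev_id": i} for i in range(5)]
toy += [{"lang": "lv", "rev_id": i} for i in range(2)]
subset = sample_per_language(toy, cap=3)
```

Capping each language keeps the largest editions from dominating the training set, while smaller editions contribute all of their revisions.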

Test data

Data period: 1 week
Number of revisions: 1,079,265
Share of edits by IP users: 0.19
Revert rate: 0.07

Licenses

Code: Apache 2.0 License
Model: Apache 2.0 License

Citation

Cite this model as: ... to be added soon.

References

[1] https://stats.wikimedia.org/
[2] Zia, Leila and Johnson, Isaac and Mansurov, Bahodir and Morgan, Jonathan and Redi, Miriam and Saez-Trumper, Diego and Taraborelli, Dario. 2019. Knowledge Integrity. https://doi.org/10.6084/m9.figshare.7704626
[3] https://www.mediawiki.org/wiki/ORES
[4] https://huggingface.co/bert-base-multilingual-cased
[5] https://github.com/geohci/edit-types
