Machine learning models/Production/Wikidata item quality
Model card | |
---|---|
This page is an on-wiki machine learning model card. | |
Model creator(s) | Aaron Halfaker (User:EpochFail) and Amir Sarabadani |
Model owner(s) | WMF Machine Learning Team (ml@wikimediafoundation.org) |
Model interface | ORES homepage |
Code | ORES GitHub, ORES training data, and ORES model binaries |
Uses PII | No |
In production? | Yes |
Which projects? | Wikidata |
This model uses data about a revision to predict the likelihood that the item is of a certain quality level. | |
Motivation
This model card describes a model for predicting the quality of Wikidata items. It uses structural features extracted from the item to label Wikidata items with a probability score for each item quality class.
Wikidata items range in quality from rich, well-illustrated, fully-referenced items that fully cover their topic and are easy to read to stubs that define the topic of the item but do not offer much more information. It is very useful to be able to reliably distinguish between these extremes and the various stages of quality along this spectrum. Wikidata editors have developed rich rubrics for how to evaluate the quality of Wikidata items and are constantly assessing item quality to assist in coordinating work on the wikis. Editors use these quality scores to evaluate and prioritize their work. Researchers use these quality scores to understand content dynamics. Developers use these quality scores as filters when building recommender systems or other tools.
Wikidata is always changing, which makes it time-consuming (and largely impossible) for editors to keep these quality assessments complete and up-to-date. An automatic quality model can help fill these gaps by evaluating the quality for items that are unassessed or have changed substantially since they were last assessed. In doing so, it can provide researchers and tool developers with more consistent data and even potentially help editors identify items that would benefit from a human assessment.
Users and uses
Use this model for:
- high-level analyses of item quality trends
- filtering / ranking items in tools – e.g. only show low-quality items in a recommender system
- identifying potential ways to improve items – e.g. using the lowest-value feature from the model as a recommendation
Do not use this model for:
- projects outside of Wikidata
- namespaces outside of 0, disambiguation pages, and redirects
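The filtering use case listed above can be sketched in a few lines. This is a minimal illustration, not part of ORES: the helper name `low_quality_items`, the item IDs, and the probabilities are all made up, and it assumes that quality classes run from A (highest) to E (lowest) and that each score has the `prediction` / `probability` shape the model returns.

```python
# Hypothetical helper (not part of ORES): keep only items predicted low quality.
# Assumption: classes run from A (highest quality) to E (lowest quality).
LOW_QUALITY = frozenset({"D", "E"})


def low_quality_items(scores, low_labels=LOW_QUALITY):
    """Return IDs of items predicted low quality, most confident first."""
    ranked = []
    for item_id, score in scores.items():
        if score["prediction"] in low_labels:
            # Total probability mass the model puts on the low-quality classes.
            low_mass = sum(score["probability"][label] for label in low_labels)
            ranked.append((low_mass, item_id))
    ranked.sort(reverse=True)
    return [item_id for _, item_id in ranked]


# Made-up scores for illustration only.
example_scores = {
    "Q1": {"prediction": "A",
           "probability": {"A": 0.9, "B": 0.05, "C": 0.03, "D": 0.01, "E": 0.01}},
    "Q2": {"prediction": "E",
           "probability": {"A": 0.01, "B": 0.01, "C": 0.03, "D": 0.15, "E": 0.8}},
    "Q3": {"prediction": "D",
           "probability": {"A": 0.02, "B": 0.08, "C": 0.3, "D": 0.5, "E": 0.1}},
}

print(low_quality_items(example_scores))
```

A recommender could feed the returned IDs to editors as candidates for improvement; which classes count as "low quality" is a choice for the tool, not something the model prescribes.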
This model is part of ORES and is generally accessible via the ORES API. It is used for high-level analysis of Wikidata, platform research, and other on-wiki tasks.
Example API call: https://ores.wikimedia.org/v3/scores/wikidatawiki/1907686315/itemquality
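The example call follows the pattern `/v3/scores/<wiki>/<revision ID>/<model>`. A minimal sketch of building and requesting such a URL from Python, using only the standard library, might look like the following. The function names and the User-Agent string are illustrative assumptions, and the request only succeeds if the endpoint is reachable.

```python
import json
import urllib.request

ORES_BASE = "https://ores.wikimedia.org/v3/scores"


def score_url(revision_id, wiki="wikidatawiki", model="itemquality"):
    """Build the scoring URL for one revision, following the example call."""
    return f"{ORES_BASE}/{wiki}/{revision_id}/{model}"


def fetch_score(revision_id, timeout=10):
    """Request a score over the network and return the decoded JSON body."""
    request = urllib.request.Request(
        score_url(revision_id),
        # Hypothetical User-Agent; replace with one that identifies your tool.
        headers={"User-Agent": "itemquality-example/0.1"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Only builds the URL; call fetch_score(1907686315) to perform the request.
print(score_url(1907686315))
```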
Ethical considerations, caveats, and recommendations
- The source data for this model is several years old, so data drift may skew current outputs relative to the training data.
- The model does not currently take into account the quality of the specific writing, so a detailed item with many fake statements may register as high quality. It does take structure into account, though, so a long item would be penalized if it were poorly referenced.
- Different wikis have different labeling schemes, so do not use this model in conjunction with other models to conduct an interwiki analysis.
Model
Performance
Test data confusion matrix (rows are true labels; columns ~A to ~E are predicted labels):
Label | n | ~A | ~B | ~C | ~D | ~E |
---|---|---|---|---|---|---|
A | 895 | 761 | 80 | 53 | 1 | 0 |
B | 786 | 92 | 443 | 231 | 20 | 0 |
C | 2295 | 37 | 132 | 1993 | 128 | 5 |
D | 1992 | 0 | 8 | 117 | 1724 | 143 |
E | 3002 | 0 | 0 | 4 | 82 | 2916 |
Test data sample rates:
Statistic | A | B | C | D | E |
---|---|---|---|---|---|
sample | 0.1 | 0.088 | 0.256 | 0.222 | 0.335 |
Test data performance:
Statistic | A | B | C | D | E |
---|---|---|---|---|---|
match_rate | 0.099 | 0.074 | 0.267 | 0.218 | 0.342 |
filter_rate | 0.901 | 0.926 | 0.733 | 0.782 | 0.658 |
recall | 0.85 | 0.564 | 0.868 | 0.865 | 0.971 |
precision | 0.855 | 0.668 | 0.831 | 0.882 | 0.952 |
f1 | 0.853 | 0.611 | 0.849 | 0.874 | 0.961 |
accuracy | 0.971 | 0.937 | 0.921 | 0.944 | 0.974 |
fpr | 0.016 | 0.027 | 0.061 | 0.033 | 0.025 |
roc_auc | 0.981 | 0.951 | 0.967 | 0.972 | 0.989 |
pr_auc | 0.901 | 0.706 | 0.911 | 0.916 | 0.987 |
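Most of the per-class statistics above can be recomputed from the confusion matrix by treating each class one-vs-rest. The sketch below assumes the usual definitions (match_rate as the share of items predicted as the class, filter_rate as its complement, fpr as false positives over all true negatives plus false positives); the function name `per_class_stats` is illustrative. roc_auc and pr_auc are not included because they depend on the underlying probability scores, not only on the matrix.

```python
LABELS = ["A", "B", "C", "D", "E"]

# Rows are true labels, columns are predicted labels (test data confusion matrix).
CONFUSION = [
    [761, 80, 53, 1, 0],
    [92, 443, 231, 20, 0],
    [37, 132, 1993, 128, 5],
    [0, 8, 117, 1724, 143],
    [0, 0, 4, 82, 2916],
]


def per_class_stats(matrix, labels):
    """Compute one-vs-rest statistics for each label of a confusion matrix."""
    total = sum(sum(row) for row in matrix)
    stats = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        actual = sum(matrix[i])                    # items truly in this class
        predicted = sum(row[i] for row in matrix)  # items predicted as this class
        fp = predicted - tp
        fn = actual - tp
        tn = total - tp - fp - fn
        precision = tp / predicted
        recall = tp / actual
        stats[label] = {
            "match_rate": predicted / total,
            "filter_rate": 1 - predicted / total,
            "recall": recall,
            "precision": precision,
            "f1": 2 * precision * recall / (precision + recall),
            "accuracy": (tp + tn) / total,
            "fpr": fp / (fp + tn),
        }
    return stats


stats = per_class_stats(CONFUSION, LABELS)
for label, values in stats.items():
    print(label, {name: round(value, 3) for name, value in values.items()})
```

Rounded to three decimals, these values agree with the table, which also makes the weak spot visible: class B has the lowest recall because 231 of its 786 test items are predicted as C.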
Implementation
Model architecture:
{
    "type": "GradientBoosting",
    "params": {
        "scale": true,
        "center": true,
        "labels": [
            "A",
            "B",
            "C",
            "D",
            "E"
        ],
        "multilabel": false,
        "population_rates": null,
        "ccp_alpha": 0.0,
        "criterion": "friedman_mse",
        "init": null,
        "learning_rate": 0.01,
        "loss": "deviance",
        "max_depth": 5,
        "max_features": "log2",
        "max_leaf_nodes": null,
        "min_impurity_decrease": 0.0,
        "min_impurity_split": null,
        "min_samples_leaf": 1,
        "min_samples_split": 2,
        "min_weight_fraction_leaf": 0.0,
        "n_estimators": 500,
        "n_iter_no_change": null,
        "presort": "deprecated",
        "random_state": null,
        "subsample": 1.0,
        "tol": 0.0001,
        "validation_fraction": 0.1,
        "verbose": 0,
        "warm_start": false,
        "label_weights": null
    }
}
Output schema:
{
    "title": "Scikit learn-based classifier score with probability",
    "type": "object",
    "properties": {
        "prediction": {
            "description": "The most likely label predicted by the estimator",
            "type": "string"
        },
        "probability": {
            "description": "A mapping of probabilities onto each of the potential output labels",
            "type": "object",
            "properties": {
                "A": {
                    "type": "number"
                },
                "B": {
                    "type": "number"
                },
                "C": {
                    "type": "number"
                },
                "D": {
                    "type": "number"
                },
                "E": {
                    "type": "number"
                }
            }
        }
    }
}
Input:
https://ores.wikimedia.org/v3/scores/wikidatawiki/1907686315/itemquality
Output:
{
    "wikidatawiki": {
        "models": {
            "itemquality": {
                "version": "0.5.0"
            }
        },
        "scores": {
            "1907686315": {
                "itemquality": {
                    "score": {
                        "prediction": "A",
                        "probability": {
                            "A": 0.9572882580680742,
                            "B": 0.026314884273835548,
                            "C": 0.012222062518243197,
                            "D": 0.0026337155710863883,
                            "E": 0.0015410795687606218
                        }
                    }
                }
            }
        }
    }
}
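The score sits several levels deep in the response: wiki, then `scores`, then the revision ID as a string, then the model name. A short sketch of pulling it out and sanity-checking it, using the example output above as input, is shown here. The helper name `extract_score` is an assumption for illustration.

```python
import math

# Response body copied from the example output above.
response = {
    "wikidatawiki": {
        "models": {"itemquality": {"version": "0.5.0"}},
        "scores": {
            "1907686315": {
                "itemquality": {
                    "score": {
                        "prediction": "A",
                        "probability": {
                            "A": 0.9572882580680742,
                            "B": 0.026314884273835548,
                            "C": 0.012222062518243197,
                            "D": 0.0026337155710863883,
                            "E": 0.0015410795687606218,
                        },
                    }
                }
            }
        },
    }
}


def extract_score(body, revision_id, wiki="wikidatawiki", model="itemquality"):
    """Return the score object for one revision (revision IDs are string keys)."""
    return body[wiki]["scores"][str(revision_id)][model]["score"]


score = extract_score(response, 1907686315)
probabilities = score["probability"]

# The predicted label should be the class with the highest probability,
# and the class probabilities should sum to (approximately) 1.
most_likely = max(probabilities, key=probabilities.get)
total = sum(probabilities.values())
print(score["prediction"], most_likely, math.isclose(total, 1.0, abs_tol=1e-9))
```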
Data
Licenses
- Code: MIT license
- Model: MIT license
Citation
Cite this model card as:
@misc{
Triedman_Bazira_2023_Wikidata_item_quality,
title={ Wikidata item quality model card },
author={ Triedman, Harold and Bazira, Kevin },
year={ 2023 },
url={ https://meta.wikimedia.org/wiki/Machine_learning_models/Production/Wikidata_item_quality }
}