
Machine learning models/Production/Armenian Wikipedia article topic

From Meta, a Wikimedia project coordination wiki


Model card
This page is an on-wiki machine learning model card.
A model card is a document about a machine learning model that seeks to answer basic questions about the model.
Model Information Hub
Model creator(s): Aaron Halfaker (User:EpochFail) and Amir Sarabadani
Model owner(s): WMF Machine Learning Team (ml@wikimediafoundation.org)
Model interface: ORES homepage
Code: drafttopic GitHub, ORES training data, and ORES model binaries
Uses PII: No
In production? Yes
Which projects? Armenian Wikipedia
This model uses article text to predict the likelihood that the article belongs to a set of topics.


Motivation

How can we predict what general topic an article is in? Answering this question is useful for various analyses of Wikipedia dynamics. However, it is difficult to group a very diverse range of Wikipedia articles into coherent, consistent topics manually.

This model, part of the ORES suite of models, analyzes an article to predict its likelihood of belonging to a set of topics. Similar models (though not necessarily with the same performance level or topics) are deployed across about a dozen other projects. There is also a language-agnostic article topic model.

This model may be useful for high-level analyses of Wikipedia dynamics (pageviews, article quality, edit trends) and filtering articles.

Users and uses

Use this model for
  • high-level analyses of Wikipedia dynamics such as pageviews, article quality, or edit trends, e.g. how do pageview dynamics differ between the physics and biology categories?
  • filtering to relevant articles, e.g. keeping only articles in the music category.
Don't use this model for
  • definitively establishing what topic an article pertains to
  • automated editing of articles or topics without a human in the loop
Current uses

This model is a part of ORES, and generally accessible via API. It is used for high-level analysis of Wikipedia, platform research, and other on-wiki tasks.

Example API call:
https://ores.wikimedia.org/v3/scores/hywiki/8658862/articletopic
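
The example call above follows a wiki / revision ID / model pattern. A minimal Python sketch of building and requesting such a URL is shown below. The helper names `score_url` and `fetch_score` are hypothetical and not part of ORES, and the sketch assumes the endpoint is still reachable and returns JSON.

```python
import json
from urllib.request import urlopen

# Base path taken from the example API call in this model card.
ORES_BASE = "https://ores.wikimedia.org/v3/scores"


def score_url(wiki: str, rev_id: int, model: str = "articletopic") -> str:
    """Build a scoring URL of the form <base>/<wiki>/<revision id>/<model>."""
    return f"{ORES_BASE}/{wiki}/{rev_id}/{model}"


def fetch_score(wiki: str, rev_id: int, model: str = "articletopic",
                timeout: float = 10.0) -> dict:
    """Request a score and decode the JSON body (performs a network call)."""
    with urlopen(score_url(wiki, rev_id, model), timeout=timeout) as resp:
        return json.load(resp)


# Building the URL needs no network access.
print(score_url("hywiki", 8658862))
```

Calling `fetch_score("hywiki", 8658862)` would issue the request itself. Keeping URL construction separate makes it easy to check the address before sending anything.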

Ethical considerations, caveats, and recommendations

  • This model was trained on data that is now several years old (from mid-2020). Underlying data drift may skew model outputs.
  • This model uses word2vec as a training feature. Word2vec, like other natural language embeddings, encodes the linguistic biases of underlying datasets — along the lines of gender, race, ethnicity, religion etc. Since Wikipedia has known biases in its text, this model may encode and at times reproduce those biases.
  • This model has highly variable performance across different topics — consult the test statistics below to get a sense of inter-topic performance.

Model

Performance

Test data confusion matrix:

Label n True positive False negative False positive True negative
Culture.Biography.Biography* 15686 14429 1257 647 39218
Culture.Biography.Women 4518 3350 1168 476 50557
Culture.Food and drink 1662 1335 327 105 53784
Culture.Internet culture 2507 2098 409 144 52900
Culture.Linguistics 1693 1244 449 66 53792
Culture.Literature 5298 4042 1256 486 49767
Culture.Media.Books 1758 1483 275 100 53693
Culture.Media.Entertainment 2153 1049 1104 175 53223
Culture.Media.Films 3018 2658 360 122 52411
Culture.Media.Media* 13233 11667 1566 1138 41180
Culture.Media.Music 3428 2921 507 239 51884
Culture.Media.Radio 233 170 63 28 55290
Culture.Media.Software 2201 2065 136 162 53188
Culture.Media.Television 2078 1685 393 102 53371
Culture.Media.Video games 766 728 38 16 54769
Culture.Performing arts 1734 1203 531 121 53696
Culture.Philosophy and religion 3901 2179 1722 366 51284
Culture.Sports 3288 2775 513 95 52168
Culture.Visual arts.Architecture 2452 1945 507 239 52860
Culture.Visual arts.Comics and Anime 1071 909 162 44 54436
Culture.Visual arts.Fashion 820 610 210 45 54686
Culture.Visual arts.Visual arts* 5439 4133 1306 482 49630
Geography.Geographical 4862 3583 1279 608 50081
Geography.Regions.Africa.Africa* 3298 2034 1264 186 52067
Geography.Regions.Africa.Central Africa 401 261 140 32 55118
Geography.Regions.Africa.Eastern Africa 256 188 68 25 55270
Geography.Regions.Africa.Northern Africa 1335 896 439 93 54123
Geography.Regions.Africa.Southern Africa 605 466 139 14 54932
Geography.Regions.Africa.Western Africa 140 99 41 39 55372
Geography.Regions.Americas.Central America 1256 585 671 51 54244
Geography.Regions.Americas.North America 5910 4274 1636 778 48863
Geography.Regions.Americas.South America 1669 1311 358 62 53820
Geography.Regions.Asia.Asia* 12149 9899 2250 1078 42324
Geography.Regions.Asia.Central Asia 1088 718 370 68 54395
Geography.Regions.Asia.East Asia 2652 2201 451 100 52799
Geography.Regions.Asia.North Asia 2562 1910 652 333 52656
Geography.Regions.Asia.South Asia 1924 1380 544 65 53562
Geography.Regions.Asia.Southeast Asia 1456 774 682 55 54040
Geography.Regions.Asia.West Asia 4499 3564 935 369 50683
Geography.Regions.Europe.Eastern Europe 4478 3397 1081 396 50677
Geography.Regions.Europe.Europe* 16951 13906 3045 1930 36670
Geography.Regions.Europe.Northern Europe 4245 3158 1087 315 50991
Geography.Regions.Europe.Southern Europe 4409 3414 995 289 50853
Geography.Regions.Europe.Western Europe 4936 3955 981 396 50219
Geography.Regions.Oceania 1674 1025 649 72 53805
History and Society.Business and economics 3475 2362 1113 297 51779
History and Society.Education 1893 1199 694 125 53533
History and Society.History 5858 3535 2323 743 48950
History and Society.Military and warfare 5200 3672 1528 626 49725
History and Society.Politics and government 4691 2637 2054 481 50379
History and Society.Society 7986 4395 3591 744 46821
History and Society.Transportation 2514 2168 346 85 52952
STEM.Biology 3086 2423 663 156 52309
STEM.Chemistry 1651 1400 251 130 53770
STEM.Computing 2282 1963 319 168 53101
STEM.Earth and environment 1861 1308 553 120 53570
STEM.Engineering 2812 2190 622 200 52539
STEM.Libraries & Information 503 391 112 37 55011
STEM.Mathematics 592 492 100 30 54929
STEM.Medicine & Health 2084 1592 492 140 53327
STEM.Physics 1667 1315 352 142 53742
STEM.STEM* 16786 15011 1775 848 37917
STEM.Space 2077 1950 127 39 53435
STEM.Technology 4483 3448 1035 498 50570
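
As a worked example of reading one row, the sketch below recomputes raw rates for Culture.Biography.Biography*. It assumes that n counts the test articles carrying the label, so that the first two counts (14429 caught, 1257 missed) sum to n, and that the remaining two counts (647 and 39218) are articles without the label. The function name `raw_rates` is hypothetical.

```python
def raw_rates(caught: int, missed: int, false_alarms: int, true_negatives: int) -> dict:
    """Compute unweighted rates from one row of confusion-matrix counts."""
    total = caught + missed + false_alarms + true_negatives
    return {
        # Share of labelled articles that the model found.
        "recall": caught / (caught + missed),
        # Share of the model's positive calls that were correct, in the sample.
        "precision": caught / (caught + false_alarms),
        # Share of all test articles classified correctly, in the sample.
        "accuracy": (caught + true_negatives) / total,
    }


# Counts for Culture.Biography.Biography* from the table above.
rates = raw_rates(caught=14429, missed=1257, false_alarms=647, true_negatives=39218)
print(round(rates["recall"], 3))  # about 0.92
```

Raw precision and accuracy computed this way describe the test sample only. They need not equal the values in the performance table, because the sample rates of the topics differ from their population rates.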

Test data sample rates:

Label Sample rate Population rate
Culture.Biography.Biography* 0.282 0.123
Culture.Biography.Women 0.081 0.015
Culture.Food and drink 0.03 0.002
Culture.Internet culture 0.045 0.003
Culture.Linguistics 0.03 0.007
Culture.Literature 0.095 0.015
Culture.Media.Books 0.032 0.004
Culture.Media.Entertainment 0.039 0.004
Culture.Media.Films 0.054 0.011
Culture.Media.Media* 0.238 0.058
Culture.Media.Music 0.062 0.024
Culture.Media.Radio 0.004 0.002
Culture.Media.Software 0.04 0.001
Culture.Media.Television 0.037 0.009
Culture.Media.Video games 0.014 0.003
Culture.Performing arts 0.031 0.003
Culture.Philosophy and religion 0.07 0.011
Culture.Sports 0.059 0.071
Culture.Visual arts.Architecture 0.044 0.011
Culture.Visual arts.Comics and Anime 0.019 0.002
Culture.Visual arts.Fashion 0.015 0.001
Culture.Visual arts.Visual arts* 0.098 0.018
Geography.Geographical 0.088 0.024
Geography.Regions.Africa.Africa* 0.059 0.008
Geography.Regions.Africa.Central Africa 0.007 0.001
Geography.Regions.Africa.Eastern Africa 0.005 0
Geography.Regions.Africa.Northern Africa 0.024 0.001
Geography.Regions.Africa.Southern Africa 0.011 0.001
Geography.Regions.Africa.Western Africa 0.003 0.001
Geography.Regions.Americas.Central America 0.023 0.003
Geography.Regions.Americas.North America 0.106 0.064
Geography.Regions.Americas.South America 0.03 0.006
Geography.Regions.Asia.Asia* 0.219 0.045
Geography.Regions.Asia.Central Asia 0.02 0.001
Geography.Regions.Asia.East Asia 0.048 0.011
Geography.Regions.Asia.North Asia 0.046 0.001
Geography.Regions.Asia.South Asia 0.035 0.015
Geography.Regions.Asia.Southeast Asia 0.026 0.006
Geography.Regions.Asia.West Asia 0.081 0.011
Geography.Regions.Europe.Eastern Europe 0.081 0.013
Geography.Regions.Europe.Europe* 0.305 0.076
Geography.Regions.Europe.Northern Europe 0.076 0.031
Geography.Regions.Europe.Southern Europe 0.079 0.013
Geography.Regions.Europe.Western Europe 0.089 0.019
Geography.Regions.Oceania 0.03 0.015
History and Society.Business and economics 0.063 0.01
History and Society.Education 0.034 0.007
History and Society.History 0.105 0.011
History and Society.Military and warfare 0.094 0.014
History and Society.Politics and government 0.084 0.028
History and Society.Society 0.144 0.013
History and Society.Transportation 0.045 0.015
STEM.Biology 0.056 0.034
STEM.Chemistry 0.03 0.002
STEM.Computing 0.041 0.003
STEM.Earth and environment 0.034 0.005
STEM.Engineering 0.051 0.005
STEM.Libraries & Information 0.009 0.001
STEM.Mathematics 0.011 0
STEM.Medicine & Health 0.038 0.006
STEM.Physics 0.03 0.001
STEM.STEM* 0.302 0.069
STEM.Space 0.037 0.006
STEM.Technology 0.081 0.005
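
The gap between sample and population rates affects how precision should be read. The sketch below illustrates one way to re-weight sample counts to population rates. Treating the reported statistics as re-weighted in this way is an assumption, not something the model card states, and the function name is hypothetical. For Culture.Biography.Biography* it uses 14429 correctly flagged articles and 647 flagged articles that do not carry the label, both taken from the confusion matrix.

```python
def population_scaled_precision(true_pos: int, false_pos: int,
                                sample_rate: float, population_rate: float) -> float:
    """Precision after re-weighting sample counts to match population rates."""
    # Labelled articles are over-represented in the sample, so shrink them.
    pos_weight = population_rate / sample_rate
    # Unlabelled articles are under-represented, so grow them.
    neg_weight = (1 - population_rate) / (1 - sample_rate)
    scaled_tp = true_pos * pos_weight
    scaled_fp = false_pos * neg_weight
    return scaled_tp / (scaled_tp + scaled_fp)


# Culture.Biography.Biography*: sample rate 0.282, population rate 0.123.
precision = population_scaled_precision(14429, 647, 0.282, 0.123)
print(round(precision, 3))  # about 0.888
```

For this row the result agrees with the precision listed in the performance table. For rare topics the rounded population rates (for example 0.001) are too coarse to reproduce the listed values closely.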

Test data performance:

Label Match rate Filter rate Recall Precision f1 Accuracy ROC AUC PR AUC
Culture.Biography.Biography* 0.127 0.873 0.92 0.888 0.904 0.976 0.983 0.956
Culture.Biography.Women 0.02 0.98 0.741 0.54 0.625 0.987 0.981 0.627
Culture.Food and drink 0.004 0.996 0.803 0.504 0.62 0.998 0.984 0.697
Culture.Internet culture 0.006 0.994 0.837 0.52 0.641 0.997 0.985 0.714
Culture.Linguistics 0.007 0.993 0.735 0.816 0.773 0.997 0.978 0.781
Culture.Literature 0.021 0.979 0.763 0.554 0.642 0.987 0.976 0.696
Culture.Media.Books 0.005 0.995 0.844 0.647 0.732 0.998 0.987 0.762
Culture.Media.Entertainment 0.005 0.995 0.487 0.348 0.406 0.995 0.969 0.332
Culture.Media.Films 0.012 0.988 0.881 0.801 0.839 0.996 0.987 0.897
Culture.Media.Media* 0.077 0.923 0.882 0.671 0.762 0.968 0.98 0.854
Culture.Media.Music 0.025 0.975 0.852 0.82 0.836 0.992 0.986 0.894
Culture.Media.Radio 0.002 0.998 0.73 0.757 0.743 0.999 0.916 0.533
Culture.Media.Software 0.004 0.996 0.938 0.291 0.444 0.997 0.988 0.496
Culture.Media.Television 0.009 0.991 0.811 0.791 0.801 0.996 0.983 0.807
Culture.Media.Video games 0.003 0.997 0.95 0.895 0.922 1 0.985 0.945
Culture.Performing arts 0.004 0.996 0.694 0.472 0.562 0.997 0.979 0.52
Culture.Philosophy and religion 0.013 0.987 0.559 0.459 0.504 0.988 0.955 0.506
Culture.Sports 0.062 0.938 0.844 0.973 0.904 0.987 0.979 0.948
Culture.Visual arts.Architecture 0.013 0.987 0.793 0.653 0.716 0.993 0.986 0.729
Culture.Visual arts.Comics and Anime 0.003 0.997 0.849 0.698 0.766 0.999 0.986 0.802
Culture.Visual arts.Fashion 0.001 0.999 0.744 0.423 0.539 0.999 0.982 0.439
Culture.Visual arts.Visual arts* 0.023 0.977 0.76 0.595 0.668 0.986 0.977 0.673
Geography.Geographical 0.029 0.971 0.737 0.597 0.66 0.982 0.975 0.704
Geography.Regions.Africa.Africa* 0.008 0.992 0.617 0.577 0.596 0.993 0.969 0.552
Geography.Regions.Africa.Central Africa 0.001 0.999 0.651 0.415 0.507 0.999 0.976 0.331
Geography.Regions.Africa.Eastern Africa 0.001 0.999 0.734 0.425 0.538 0.999 0.944 0.311
Geography.Regions.Africa.Northern Africa 0.003 0.997 0.671 0.325 0.438 0.998 0.979 0.359
Geography.Regions.Africa.Southern Africa 0.001 0.999 0.77 0.781 0.775 0.999 0.975 0.618
Geography.Regions.Africa.Western Africa 0.001 0.999 0.707 0.407 0.517 0.999 0.882 0.24
Geography.Regions.Americas.Central America 0.002 0.998 0.466 0.621 0.532 0.997 0.956 0.414
Geography.Regions.Americas.North America 0.061 0.939 0.723 0.76 0.741 0.968 0.972 0.801
Geography.Regions.Americas.South America 0.006 0.994 0.786 0.812 0.799 0.998 0.981 0.841
Geography.Regions.Asia.Asia* 0.061 0.939 0.815 0.61 0.698 0.968 0.971 0.76
Geography.Regions.Asia.Central Asia 0.002 0.998 0.66 0.314 0.426 0.998 0.976 0.358
Geography.Regions.Asia.East Asia 0.011 0.989 0.83 0.835 0.833 0.996 0.985 0.856
Geography.Regions.Asia.North Asia 0.007 0.993 0.746 0.099 0.175 0.993 0.984 0.18
Geography.Regions.Asia.South Asia 0.012 0.988 0.717 0.901 0.799 0.995 0.977 0.84
Geography.Regions.Asia.Southeast Asia 0.004 0.996 0.532 0.76 0.625 0.996 0.962 0.576
Geography.Regions.Asia.West Asia 0.016 0.984 0.792 0.547 0.647 0.991 0.982 0.676
Geography.Regions.Europe.Eastern Europe 0.017 0.983 0.759 0.56 0.644 0.989 0.979 0.677
Geography.Regions.Europe.Europe* 0.109 0.891 0.82 0.575 0.676 0.94 0.961 0.768
Geography.Regions.Europe.Northern Europe 0.029 0.971 0.744 0.793 0.767 0.986 0.978 0.83
Geography.Regions.Europe.Southern Europe 0.016 0.984 0.774 0.644 0.703 0.991 0.979 0.741
Geography.Regions.Europe.Western Europe 0.023 0.977 0.801 0.667 0.728 0.989 0.981 0.798
Geography.Regions.Oceania 0.011 0.989 0.612 0.876 0.721 0.993 0.97 0.743
History and Society.Business and economics 0.013 0.987 0.68 0.548 0.607 0.991 0.971 0.624
History and Society.Education 0.007 0.993 0.633 0.668 0.65 0.995 0.973 0.603
History and Society.History 0.021 0.979 0.603 0.307 0.407 0.981 0.953 0.393
History and Society.Military and warfare 0.022 0.978 0.706 0.448 0.548 0.984 0.972 0.577
History and Society.Politics and government 0.025 0.975 0.562 0.633 0.595 0.978 0.957 0.63
History and Society.Society 0.022 0.978 0.55 0.31 0.397 0.979 0.935 0.386
History and Society.Transportation 0.015 0.985 0.862 0.892 0.877 0.996 0.985 0.903
STEM.Biology 0.029 0.971 0.785 0.902 0.839 0.99 0.984 0.903
STEM.Chemistry 0.004 0.996 0.848 0.354 0.5 0.997 0.989 0.571
STEM.Computing 0.005 0.995 0.86 0.424 0.568 0.996 0.987 0.579
STEM.Earth and environment 0.005 0.995 0.703 0.589 0.641 0.996 0.974 0.616
STEM.Engineering 0.008 0.992 0.779 0.519 0.623 0.995 0.982 0.659
STEM.Libraries & Information 0.001 0.999 0.777 0.418 0.544 0.999 0.964 0.422
STEM.Mathematics 0.001 0.999 0.831 0.388 0.529 0.999 0.978 0.456
STEM.Medicine & Health 0.007 0.993 0.764 0.653 0.704 0.996 0.981 0.686
STEM.Physics 0.003 0.997 0.789 0.203 0.323 0.997 0.986 0.278
STEM.STEM* 0.082 0.918 0.894 0.752 0.817 0.972 0.978 0.901
STEM.Space 0.006 0.994 0.939 0.886 0.912 0.999 0.993 0.946
STEM.Technology 0.014 0.986 0.769 0.289 0.42 0.989 0.978 0.514
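
Two columns in the table above can be derived from others. The sketch below uses the standard F1 definition (harmonic mean of precision and recall) and treats the filter rate as the complement of the match rate, which is what the listed values suggest. The function names are hypothetical.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def filter_rate(match_rate: float) -> float:
    """Share of articles not matched, taken as 1 minus the match rate."""
    return 1 - match_rate


# Culture.Biography.Biography*: precision 0.888, recall 0.92, match rate 0.127.
print(round(f1_score(0.888, 0.92), 3))  # about 0.904
print(round(filter_rate(0.127), 3))     # about 0.873
```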

Implementation

Model architecture
{
    "type": "GradientBoosting",
    "params": {
        "scale": false,
        "center": false,
        "labels": [
            "Culture.Biography.Biography*",
            "Culture.Biography.Women",
            "Culture.Food and drink",
            "Culture.Internet culture",
            "Culture.Linguistics",
            "Culture.Literature",
            "Culture.Media.Books",
            "Culture.Media.Entertainment",
            "Culture.Media.Films",
            "Culture.Media.Media*",
            "Culture.Media.Music",
            "Culture.Media.Radio",
            "Culture.Media.Software",
            "Culture.Media.Television",
            "Culture.Media.Video games",
            "Culture.Performing arts",
            "Culture.Philosophy and religion",
            "Culture.Sports",
            "Culture.Visual arts.Architecture",
            "Culture.Visual arts.Comics and Anime",
            "Culture.Visual arts.Fashion",
            "Culture.Visual arts.Visual arts*",
            "Geography.Geographical",
            "Geography.Regions.Africa.Africa*",
            "Geography.Regions.Africa.Central Africa",
            "Geography.Regions.Africa.Eastern Africa",
            "Geography.Regions.Africa.Northern Africa",
            "Geography.Regions.Africa.Southern Africa",
            "Geography.Regions.Africa.Western Africa",
            "Geography.Regions.Americas.Central America",
            "Geography.Regions.Americas.North America",
            "Geography.Regions.Americas.South America",
            "Geography.Regions.Asia.Asia*",
            "Geography.Regions.Asia.Central Asia",
            "Geography.Regions.Asia.East Asia",
            "Geography.Regions.Asia.North Asia",
            "Geography.Regions.Asia.South Asia",
            "Geography.Regions.Asia.Southeast Asia",
            "Geography.Regions.Asia.West Asia",
            "Geography.Regions.Europe.Eastern Europe",
            "Geography.Regions.Europe.Europe*",
            "Geography.Regions.Europe.Northern Europe",
            "Geography.Regions.Europe.Southern Europe",
            "Geography.Regions.Europe.Western Europe",
            "Geography.Regions.Oceania",
            "History and Society.Business and economics",
            "History and Society.Education",
            "History and Society.History",
            "History and Society.Military and warfare",
            "History and Society.Politics and government",
            "History and Society.Society",
            "History and Society.Transportation",
            "STEM.Biology",
            "STEM.Chemistry",
            "STEM.Computing",
            "STEM.Earth and environment",
            "STEM.Engineering",
            "STEM.Libraries & Information",
            "STEM.Mathematics",
            "STEM.Medicine & Health",
            "STEM.Physics",
            "STEM.STEM*",
            "STEM.Space",
            "STEM.Technology"
        ],
        "multilabel": true,
        "population_rates": null,
        "ccp_alpha": 0.0,
        "criterion": "friedman_mse",
        "init": null,
        "learning_rate": 0.1,
        "loss": "deviance",
        "max_depth": 5,
        "max_features": "log2",
        "max_leaf_nodes": null,
        "min_impurity_decrease": 0.0,
        "min_impurity_split": null,
        "min_samples_leaf": 1,
        "min_samples_split": 2,
        "min_weight_fraction_leaf": 0.0,
        "n_estimators": 150,
        "n_iter_no_change": null,
        "presort": "deprecated",
        "random_state": null,
        "subsample": 1.0,
        "tol": 0.0001,
        "validation_fraction": 0.1,
        "verbose": 0,
        "warm_start": false,
        "label_weights": {}
    }
}
Output schema
{
    "title": "Scikit learn-based classifier score with probability",
    "type": "object",
    "properties": {
        "prediction": {
            "description": "The most likely labels predicted by the estimator",
            "type": "array",
            "items": {
                "type": "string"
            }
        },
        "probability": {
            "description": "A mapping of probabilities onto each of the potential output labels",
            "type": "object",
            "properties": {
                "Culture.Biography.Biography*": {
                    "type": "number"
                },
                "Culture.Biography.Women": {
                    "type": "number"
                },
                "Culture.Food and drink": {
                    "type": "number"
                },
                "Culture.Internet culture": {
                    "type": "number"
                },
                "Culture.Linguistics": {
                    "type": "number"
                },
                "Culture.Literature": {
                    "type": "number"
                },
                "Culture.Media.Books": {
                    "type": "number"
                },
                "Culture.Media.Entertainment": {
                    "type": "number"
                },
                "Culture.Media.Films": {
                    "type": "number"
                },
                "Culture.Media.Media*": {
                    "type": "number"
                },
                "Culture.Media.Music": {
                    "type": "number"
                },
                "Culture.Media.Radio": {
                    "type": "number"
                },
                "Culture.Media.Software": {
                    "type": "number"
                },
                "Culture.Media.Television": {
                    "type": "number"
                },
                "Culture.Media.Video games": {
                    "type": "number"
                },
                "Culture.Performing arts": {
                    "type": "number"
                },
                "Culture.Philosophy and religion": {
                    "type": "number"
                },
                "Culture.Sports": {
                    "type": "number"
                },
                "Culture.Visual arts.Architecture": {
                    "type": "number"
                },
                "Culture.Visual arts.Comics and Anime": {
                    "type": "number"
                },
                "Culture.Visual arts.Fashion": {
                    "type": "number"
                },
                "Culture.Visual arts.Visual arts*": {
                    "type": "number"
                },
                "Geography.Geographical": {
                    "type": "number"
                },
                "Geography.Regions.Africa.Africa*": {
                    "type": "number"
                },
                "Geography.Regions.Africa.Central Africa": {
                    "type": "number"
                },
                "Geography.Regions.Africa.Eastern Africa": {
                    "type": "number"
                },
                "Geography.Regions.Africa.Northern Africa": {
                    "type": "number"
                },
                "Geography.Regions.Africa.Southern Africa": {
                    "type": "number"
                },
                "Geography.Regions.Africa.Western Africa": {
                    "type": "number"
                },
                "Geography.Regions.Americas.Central America": {
                    "type": "number"
                },
                "Geography.Regions.Americas.North America": {
                    "type": "number"
                },
                "Geography.Regions.Americas.South America": {
                    "type": "number"
                },
                "Geography.Regions.Asia.Asia*": {
                    "type": "number"
                },
                "Geography.Regions.Asia.Central Asia": {
                    "type": "number"
                },
                "Geography.Regions.Asia.East Asia": {
                    "type": "number"
                },
                "Geography.Regions.Asia.North Asia": {
                    "type": "number"
                },
                "Geography.Regions.Asia.South Asia": {
                    "type": "number"
                },
                "Geography.Regions.Asia.Southeast Asia": {
                    "type": "number"
                },
                "Geography.Regions.Asia.West Asia": {
                    "type": "number"
                },
                "Geography.Regions.Europe.Eastern Europe": {
                    "type": "number"
                },
                "Geography.Regions.Europe.Europe*": {
                    "type": "number"
                },
                "Geography.Regions.Europe.Northern Europe": {
                    "type": "number"
                },
                "Geography.Regions.Europe.Southern Europe": {
                    "type": "number"
                },
                "Geography.Regions.Europe.Western Europe": {
                    "type": "number"
                },
                "Geography.Regions.Oceania": {
                    "type": "number"
                },
                "History and Society.Business and economics": {
                    "type": "number"
                },
                "History and Society.Education": {
                    "type": "number"
                },
                "History and Society.History": {
                    "type": "number"
                },
                "History and Society.Military and warfare": {
                    "type": "number"
                },
                "History and Society.Politics and government": {
                    "type": "number"
                },
                "History and Society.Society": {
                    "type": "number"
                },
                "History and Society.Transportation": {
                    "type": "number"
                },
                "STEM.Biology": {
                    "type": "number"
                },
                "STEM.Chemistry": {
                    "type": "number"
                },
                "STEM.Computing": {
                    "type": "number"
                },
                "STEM.Earth and environment": {
                    "type": "number"
                },
                "STEM.Engineering": {
                    "type": "number"
                },
                "STEM.Libraries & Information": {
                    "type": "number"
                },
                "STEM.Mathematics": {
                    "type": "number"
                },
                "STEM.Medicine & Health": {
                    "type": "number"
                },
                "STEM.Physics": {
                    "type": "number"
                },
                "STEM.STEM*": {
                    "type": "number"
                },
                "STEM.Space": {
                    "type": "number"
                },
                "STEM.Technology": {
                    "type": "number"
                }
            }
        }
    }
}
Example input and output
Input:
https://ores.wikimedia.org/v3/scores/hywiki/8658862/articletopic

Output:

{
    "hywiki": {
        "models": {
            "articletopic": {
                "version": "1.4.0"
            }
        },
        "scores": {
            "8658862": {
                "articletopic": {
                    "score": {
                        "prediction": [
                            "Geography.Regions.Asia.Asia*",
                            "Geography.Regions.Asia.South Asia"
                        ],
                        "probability": {
                            "Culture.Biography.Biography*": 0.02422244388067302,
                            "Culture.Biography.Women": 0.004590189485908635,
                            "Culture.Food and drink": 0.000593925273532745,
                            "Culture.Internet culture": 0.0012627822764779287,
                            "Culture.Linguistics": 0.003555844037516116,
                            "Culture.Literature": 0.005544604789740323,
                            "Culture.Media.Books": 0.001158495969631809,
                            "Culture.Media.Entertainment": 0.0018323597994658433,
                            "Culture.Media.Films": 0.00033560605162936974,
                            "Culture.Media.Media*": 0.0045343813847354905,
                            "Culture.Media.Music": 0.0004843346548800057,
                            "Culture.Media.Radio": 1.392574877538206e-05,
                            "Culture.Media.Software": 0.00025844073978464595,
                            "Culture.Media.Television": 0.00034419434308919776,
                            "Culture.Media.Video games": 4.280533183432128e-05,
                            "Culture.Performing arts": 0.00063179156072444,
                            "Culture.Philosophy and religion": 0.3711155137903629,
                            "Culture.Sports": 0.0036244519038555697,
                            "Culture.Visual arts.Architecture": 0.0028136699436777427,
                            "Culture.Visual arts.Comics and Anime": 0.0001390074106899022,
                            "Culture.Visual arts.Fashion": 0.0007178288823863084,
                            "Culture.Visual arts.Visual arts*": 0.005269038325102219,
                            "Geography.Geographical": 0.2524351124002908,
                            "Geography.Regions.Africa.Africa*": 0.012126621454447192,
                            "Geography.Regions.Africa.Central Africa": 5.914727944463523e-05,
                            "Geography.Regions.Africa.Eastern Africa": 0.0009634788615136871,
                            "Geography.Regions.Africa.Northern Africa": 0.0014164304776163943,
                            "Geography.Regions.Africa.Southern Africa": 0.0002584714601394708,
                            "Geography.Regions.Africa.Western Africa": 9.301139644499533e-06,
                            "Geography.Regions.Americas.Central America": 0.004175898142649278,
                            "Geography.Regions.Americas.North America": 0.009485683925882019,
                            "Geography.Regions.Americas.South America": 0.000401632042878274,
                            "Geography.Regions.Asia.Asia*": 0.9910897147526467,
                            "Geography.Regions.Asia.Central Asia": 0.004504171246660892,
                            "Geography.Regions.Asia.East Asia": 0.01792040515979525,
                            "Geography.Regions.Asia.North Asia": 0.0011675076943681829,
                            "Geography.Regions.Asia.South Asia": 0.998360899663115,
                            "Geography.Regions.Asia.Southeast Asia": 0.09796124031125475,
                            "Geography.Regions.Asia.West Asia": 0.022082349127207978,
                            "Geography.Regions.Europe.Eastern Europe": 0.007670121398609532,
                            "Geography.Regions.Europe.Europe*": 0.017986477795028887,
                            "Geography.Regions.Europe.Northern Europe": 0.006012507157508805,
                            "Geography.Regions.Europe.Southern Europe": 0.00604176949302703,
                            "Geography.Regions.Europe.Western Europe": 0.002840008039240512,
                            "Geography.Regions.Oceania": 0.005502290390648899,
                            "History and Society.Business and economics": 0.028731862343375242,
                            "History and Society.Education": 0.008949469019074941,
                            "History and Society.History": 0.1739217023221256,
                            "History and Society.Military and warfare": 0.03266142222137065,
                            "History and Society.Politics and government": 0.1491903434201552,
                            "History and Society.Society": 0.07254632815381397,
                            "History and Society.Transportation": 0.0003174916188457565,
                            "STEM.Biology": 0.004020816180798944,
                            "STEM.Chemistry": 0.005676249499468716,
                            "STEM.Computing": 0.0017880809490195585,
                            "STEM.Earth and environment": 0.005972258034398003,
                            "STEM.Engineering": 0.003907983077647486,
                            "STEM.Libraries & Information": 0.0005256470127341649,
                            "STEM.Mathematics": 0.00015714078387499814,
                            "STEM.Medicine & Health": 0.005778276394043048,
                            "STEM.Physics": 0.0004302543273463213,
                            "STEM.STEM*": 0.09532948612026247,
                            "STEM.Space": 0.0001287338555615804,
                            "STEM.Technology": 0.00731848551345249
                        }
                    }
                }
            }
        }
    }
}
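
A response of this shape can be unpacked with a few dictionary lookups. The sketch below lists every topic whose probability meets a cutoff. The helper name `labels_above` is hypothetical, and the default cutoff of 0.5 is an assumption. It is consistent with the example, where the two predicted labels are the only ones with a probability above 0.5, but the model card does not state the rule used.

```python
def labels_above(response: dict, wiki: str, rev_id: int,
                 model: str = "articletopic", threshold: float = 0.5) -> list:
    """Return the topic labels whose probability is at least `threshold`."""
    # Revision IDs are string keys in the JSON response.
    score = response[wiki]["scores"][str(rev_id)][model]["score"]
    return sorted(label for label, prob in score["probability"].items()
                  if prob >= threshold)


# Trimmed copy of the example output, keeping four of the probabilities.
example = {
    "hywiki": {
        "scores": {
            "8658862": {
                "articletopic": {
                    "score": {
                        "prediction": [
                            "Geography.Regions.Asia.Asia*",
                            "Geography.Regions.Asia.South Asia",
                        ],
                        "probability": {
                            "Culture.Philosophy and religion": 0.3711155137903629,
                            "Geography.Geographical": 0.2524351124002908,
                            "Geography.Regions.Asia.Asia*": 0.9910897147526467,
                            "Geography.Regions.Asia.South Asia": 0.998360899663115,
                        },
                    }
                }
            }
        }
    }
}

print(labels_above(example, "hywiki", 8658862))
```

Lowering `threshold` widens the set of returned topics, which may suit filtering tasks where recall matters more than precision.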

Data

Data pipeline
Training data was fetched for a set of revision IDs. Various pieces of information about each revision were then extracted using automated processes, and the revision text was fed into word2vec to get an article embedding. Finally, labels were derived from the mid-level WikiProject categories that the article is associated with.
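
The model card does not say how word vectors are combined into a single article embedding. One common approach, shown here purely as an assumption, is to average the vectors of the words that have one. The function name and the toy vectors below are hypothetical and are not taken from the drafttopic code.

```python
def article_embedding(tokens: list, vectors: dict, dim: int) -> list:
    """Average the word vectors of known tokens into one fixed-length vector."""
    known = [vectors[tok] for tok in tokens if tok in vectors]
    if not known:
        # No token has a vector: fall back to the zero vector.
        return [0.0] * dim
    # zip(*known) walks the vectors dimension by dimension.
    return [sum(column) / len(known) for column in zip(*known)]


# Toy 2-dimensional vectors, for illustration only.
toy_vectors = {"river": [1.0, 0.0], "valley": [3.0, 2.0]}
print(article_embedding(["river", "valley", "unknownword"], toy_vectors, dim=2))
```

Tokens without a vector are skipped rather than counted as zeros, so rare words do not pull the average toward the origin.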
Training data
Training data was automatically and randomly separated from test data during training using the drafttopic git repository (which trains both drafttopic and articletopic models).
Test data
Test data was automatically and randomly split off from the training data using the drafttopic git repository (which trains both drafttopic and articletopic models). The model then makes predictions on the test data, which are compared to the underlying ground truth to calculate performance statistics.

Licenses

Citation

Cite this model card as:

@misc{
  Triedman_Bazira_2023_Armenian_Wikipedia_article_topic,
  title={ Armenian Wikipedia article topic model card },
  author={ Triedman, Harold and Bazira, Kevin },
  year={ 2023 },
  url={ https://meta.wikimedia.org/wiki/Machine_learning_models/Production/Armenian_Wikipedia_article_topic }
}