Machine learning models/Production/gogologo

Model card
This page is an on-wiki machine learning model card.
	A model card is a document about a machine learning model that seeks to answer basic questions about the model.
Model Information Hub
Model creator(s)	MFossati_(WMF)
Model owner(s)	WMF Structured Content team
Model interface	Commons API
Code	GitLab
Uses PII	No
In production?	Yes
Which projects?	Commons
	Given an image file on Commons, detect whether it's a logo.

Wikimedia Commons is a multimedia repository of publicly usable files, which must be released under a free license. Hence, files that are subject to copyright are candidates for deletion. Understanding copyright is complex, and misunderstandings can lead to infringement. Following an analysis of deletion requests, we observed that a significant share of media is deleted due to copyright violations, for reasons ranging from freedom of panorama to threshold of originality. Non-free logo images stand out as the second most frequent reason for deletion and are a fairly unambiguous target that suits a machine learning task.

Content on Commons is usually curated by the community, with specialized contributors who patrol new uploads and delete inappropriate files. We argue that automatic approaches to detecting problematic media can alleviate moderators' burden. Therefore, we trained an image classifier on available Commons images to predict whether a given input image is a logo. The model is publicly available through a Commons API endpoint and should be used to distinguish logo images from non-logo ones at a high level. It is not suitable for drilling down into specific classes of graphic images: for instance, coats of arms, glyphs, chemical structures, or flags are likely to be classified as logos.

Motivation

Copyrighted logo images on Wikimedia Commons are the second most frequent reason for media deletion, according to an analysis of deletion requests. This model detects them and aims to facilitate content moderation through automatic identification of problematic media.

Users and uses

Use this model for

distinguishing logo images from non-logo ones.

Don't use this model for

fine-grained classification of graphics like diagrams or road signs.

Current uses

Monthly datasets of logo uploads, announced on the Commons Administrators' noticeboard, e.g., November 2024.

Ethical considerations, caveats, and recommendations

The model offers a high-level distinction between graphic images (typically logos) and photographic ones. It is not suitable for drilling down into specific classes of graphic images: for instance, coats of arms, glyphs, chemical structures, or flags are likely to be classified as logos.

Model

Performance

Test dataset: available Commons images
Number of test samples: 47,976, half belonging to commons:Category:Logos, half random
Accuracy: 96.9%
AUC precision/recall: 98.8%
AUC ROC: 99%
Loss: 10.2

Metrics definitions

Accuracy
Area under the curve (AUC), computed separately for each class and then averaged across classes; see also en:Receiver operating characteristic#ROC curves beyond binary classification
- AUC precision/recall
- AUC ROC
Loss: the model's loss function, i.e., categorical cross-entropy
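
To make the accuracy and loss definitions concrete, here is a minimal plain-Python sketch for one-hot labels over the two classes. The helper names and the sample scores are illustrative assumptions and are not taken from the model's code.

```python
import math

def categorical_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean categorical cross-entropy over a batch of one-hot labels."""
    total = 0.0
    for truth, probs in zip(y_true, y_pred):
        # Clip probabilities to avoid log(0).
        total -= sum(
            t * math.log(min(max(p, eps), 1.0 - eps))
            for t, p in zip(truth, probs)
        )
    return total / len(y_true)

def accuracy(y_true, y_pred):
    """Fraction of samples whose highest-scoring class matches the label."""
    hits = sum(
        truth.index(max(truth)) == probs.index(max(probs))
        for truth, probs in zip(y_true, y_pred)
    )
    return hits / len(y_true)

# Class order is (out_of_domain, logo).
labels = [[0, 1], [1, 0], [0, 1]]
scores = [[0.0022, 0.9978], [0.9, 0.1], [0.6, 0.4]]  # made-up scores

print(accuracy(labels, scores))
print(categorical_cross_entropy(labels, scores))
```

The third sample is a logo scored below 0.5, so it counts as a miss for accuracy and contributes the largest term to the loss.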

Implementation

Model architecture

Image classifier with an EfficientNetV2 backbone pre-trained on the ImageNet classification task (i.e., efficientnetv2_b0_imagenet preset from [1]). Fine-tuned on available Commons images.

# Layers & their parameters
Input = 0
EfficientNetV2 backbone = 5,919,312
Global average pooling 2D = 0
Predictions = 2,562

# Parameters
Total = 17,644,408 (67.31 MB)
Trainable = 5,861,266 (22.36 MB)
Non-trainable = 60,608 (236.75 KB)
Optimizer = 11,722,534 (44.72 MB)

# Dataset
Validation split = 0.2
Image size = (224, 224)
Batch size = 64

# Data augmentation
Contrast factor = 0.11
Rotation factor = 0.16
Translation factor = 0.084

# Model
Classes = 2
Epochs = 25
Optimizer = Adam
Learning rate = 1e-2
Loss = categorical cross-entropy
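
The parameter counts listed above can be cross-checked with simple arithmetic. The sketch below assumes that the backbone yields a 1,280-dimensional feature vector after pooling, and that Adam stores two moment estimates per trainable parameter plus two scalar variables. Both assumptions are inferred from the numbers and are not stated in the model's code.

```python
# Figures copied from the listing above.
BACKBONE = 5_919_312
PREDICTIONS = 2_562
TRAINABLE = 5_861_266
NON_TRAINABLE = 60_608
OPTIMIZER = 11_722_534
TOTAL = 17_644_408

# Assumption: the backbone outputs 1,280 features after global average pooling.
FEATURES = 1_280
CLASSES = 2

# Dense prediction head: one weight per (feature, class) pair plus one bias per class.
head = FEATURES * CLASSES + CLASSES

# Model weights: sum of all layers, which splits into trainable and non-trainable.
model_weights = BACKBONE + PREDICTIONS

# Assumption: Adam keeps two moment slots per trainable parameter,
# plus two scalar variables (e.g., an iteration counter and the learning rate).
adam_state = 2 * TRAINABLE + 2

print(head == PREDICTIONS)
print(model_weights == TRAINABLE + NON_TRAINABLE)
print(adam_state == OPTIMIZER)
print(model_weights + OPTIMIZER == TOTAL)
```

Under these assumptions all four checks hold, which also explains why the total is roughly three times the size of the model weights: the optimizer state is included.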

Output schema

{
  "filename": <Commons file name>,
  "target": "logo",
  "prediction": <logo probability score (0,1)>,
  "out_of_domain": <non-logo probability score (0,1)>
}

Example input and output

Input:

$ curl 'https://commons.wikimedia.org/w/api.php?action=mediadetection&format=json&formatversion=2&filename=Kanion_Co.png'

Output:

{
  "predictions": [
    {
      "filename": "Kanion_Co.png",
      "target": "logo",
      "prediction": 0.9978,
      "out_of_domain": 0.0022
    }
  ]
}
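
A client can build the same request and interpret the response in a few lines of Python. The sketch below only uses the endpoint, parameters, and fields shown in the example above. The function names and the 0.5 decision threshold are illustrative assumptions, and the live HTTP call is left commented out so that the snippet runs offline.

```python
import json
from urllib.parse import urlencode

API_URL = "https://commons.wikimedia.org/w/api.php"

def build_url(filename):
    """Build the logo detection request URL for a Commons file name."""
    params = {
        "action": "mediadetection",
        "format": "json",
        "formatversion": 2,
        "filename": filename,
    }
    return f"{API_URL}?{urlencode(params)}"

def is_logo(response_text, threshold=0.5):
    """Map each file name to True when its logo score reaches the threshold.

    The threshold is a hypothetical choice, not a documented default.
    """
    payload = json.loads(response_text)
    return {
        item["filename"]: item["prediction"] >= threshold
        for item in payload["predictions"]
    }

# Response copied from the example above.
sample = """
{"predictions": [{"filename": "Kanion_Co.png", "target": "logo",
                  "prediction": 0.9978, "out_of_domain": 0.0022}]}
"""

print(build_url("Kanion_Co.png"))
print(is_logo(sample))

# To query the live endpoint instead (requires network access):
# from urllib.request import urlopen
# with urlopen(build_url("Kanion_Co.png")) as resp:
#     print(is_logo(resp.read().decode("utf-8")))
```

A stricter threshold trades recall for precision; the right value depends on how the predictions are reviewed downstream.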

Data

Data pipeline

Download a dataset of Commons image thumbnails from the API:
- one half belongs to commons:Category:Logos and its sub-categories, as returned by this PetScan query
- the other half is a random sample of available images
Build the training & validation sets:

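import keras

# INPUT_DIR is the dataset root, with one sub-directory per class
# ('out_of_domain' and 'logo') holding the downloaded thumbnails.
# class_names fixes the label order: index 0 = out_of_domain, index 1 = logo.
train, val = keras.utils.image_dataset_from_directory(
    INPUT_DIR,
    label_mode='categorical',  # one-hot labels for categorical cross-entropy
    class_names=('out_of_domain', 'logo'),
    batch_size=64, image_size=(224, 224),
    # subset='both' returns the (training, validation) pair;
    # the fixed seed makes the split reproducible.
    seed=1984, validation_split=0.2, subset='both',
)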
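
keras.utils.image_dataset_from_directory infers labels from sub-directory names, so the downloaded thumbnails need to sit in one folder per class under INPUT_DIR. The helper below is a hypothetical sketch of that layout step using only the standard library. The function name, file names, and byte contents are made up for illustration and are not taken from the project's code.

```python
import tempfile
from pathlib import Path

CLASS_NAMES = ("out_of_domain", "logo")

def save_thumbnail(input_dir, class_name, filename, data):
    """Write thumbnail bytes to <input_dir>/<class_name>/<filename>."""
    if class_name not in CLASS_NAMES:
        raise ValueError(f"unknown class: {class_name}")
    target_dir = Path(input_dir) / class_name
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / filename
    target.write_bytes(data)
    return target

# Demo in a temporary directory, with placeholder bytes instead of real images.
input_dir = tempfile.mkdtemp()
save_thumbnail(input_dir, "logo", "example_logo.png", b"placeholder")
save_thumbnail(input_dir, "out_of_domain", "example_photo.jpg", b"placeholder")

layout = sorted(
    p.relative_to(input_dir).as_posix()
    for p in Path(input_dir).rglob("*")
    if p.is_file()
)
print(layout)
```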

Augment the training set:

import keras
import tensorflow as tf

def augment(image, augmentation_layers):
    # Apply each augmentation layer in sequence.
    for layer in augmentation_layers:
        image = layer(image)
    return image

# Random contrast, flip, rotation, and translation, all with a fixed seed.
augmentation_layers = [
    keras.layers.RandomContrast(0.11, seed=1984),
    keras.layers.RandomFlip(seed=1984),
    keras.layers.RandomRotation(0.16, seed=1984),
    keras.layers.RandomTranslation(
        height_factor=0.084,
        width_factor=0.084,
        seed=1984,
    ),
]
# Augment images only; labels pass through unchanged.
train = train.map(
    lambda img, label: (
        augment(img, augmentation_layers),
        label,
    ),
    num_parallel_calls=tf.data.AUTOTUNE,
)
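
To get a feel for the augmentation factors, they can be converted into concrete ranges. The sketch below relies on the Keras conventions that a rotation factor is a fraction of a full turn, a translation factor is a fraction of the image height or width, and a contrast factor f scales contrast by a value drawn from [1 - f, 1 + f]. The variable names are illustrative.

```python
IMAGE_SIZE = (224, 224)
CONTRAST_FACTOR = 0.11
ROTATION_FACTOR = 0.16
TRANSLATION_FACTOR = 0.084

# Rotation: the factor is a fraction of a full turn (360 degrees).
max_rotation_deg = ROTATION_FACTOR * 360

# Translation: the factor is a fraction of the image height or width.
max_shift_px = TRANSLATION_FACTOR * IMAGE_SIZE[0]

# Contrast: the scaling value is drawn from [1 - factor, 1 + factor].
contrast_range = (1 - CONTRAST_FACTOR, 1 + CONTRAST_FACTOR)

print(f"rotation: +/- {max_rotation_deg:.1f} degrees")
print(f"translation: +/- {max_shift_px:.1f} pixels")
print(f"contrast scale: {contrast_range[0]:.2f} to {contrast_range[1]:.2f}")
```

With these settings an image can be rotated by up to about 58 degrees and shifted by up to about 19 pixels in each direction.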

Training data

24 k samples, half logos, half out of domain.

Test data

48 k samples, half logos, half out of domain.

Licenses

Code: GNU General Public License v3.0
Model: Creative Commons CC0 1.0