Machine learning models/Production/gogologo
Model card | |
---|---|
This page is an on-wiki machine learning model card. | |
A model card is a document about a machine learning model that seeks to answer basic questions about the model. | |
Model Information Hub | |
Model creator(s) | MFossati_(WMF) |
Model owner(s) | WMF Structured Content team |
Model interface | Commons API |
Code | GitLab |
Uses PII | No |
In production? | Yes |
Which projects? | Commons |
Given an image file on Commons, detect whether it's a logo. | |
Wikimedia Commons is a multimedia repository of publicly usable files, which must be released under a free license.
Hence, files that are subject to copyright are candidates for deletion.
Copyright is complex, and misunderstanding it can lead to infringement.
Following an analysis of deletion requests, we observed that a significant number of media files are deleted due to copyright violations, for reasons typically ranging from freedom of panorama to threshold of originality.
Non-free logo images both stand out as the second most frequent reason for deletion and represent a fairly unambiguous target that fits a machine learning task.
Content on Commons is usually curated by the community, with specialized contributors who patrol new uploads and delete inappropriate files. We argue that automatically detecting problematic media can alleviate the moderators' burden. Therefore, we trained an image classifier on available Commons images to predict whether a given input image is a logo. The model is publicly available through a Commons API endpoint and should be used to distinguish logo images from non-logo ones at a high level. It is not suitable for drilling down into specific classes of graphic images: for instance, coats of arms, glyphs, chemical structures, or flags are likely to be classified as logos.
Motivation
Copyrighted logo images on Wikimedia Commons are the second most frequent reason for media deletion, according to an analysis of deletion requests. This model detects them and aims to facilitate content moderation through automatic identification of problematic media.
Users and uses
Ethical considerations, caveats, and recommendations
The model offers a high-level distinction between graphic images (typically logos) and photographic ones. It is not suitable for drilling down into specific classes of graphic images: for instance, coats of arms, glyphs, chemical structures, or flags are likely to be classified as logos.
Model
Performance
- Test dataset: available Commons images
- Number of test samples: 47,976 (half belonging to commons:Category:Logos, half random)
- accuracy: 96.9
- AUC precision/recall: 98.8
- AUC ROC: 99
- loss: 10.2
Metrics definitions
- Accuracy
- Area under the curve (AUC), computed separately for each class and then averaged across classes, see also en:Receiver operating characteristic#ROC curves beyond binary classification
  - AUC precision/recall
  - AUC ROC
- Loss: the model's loss function, i.e., categorical cross-entropy
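As a rough illustration of these definitions, the sketch below computes accuracy, categorical cross-entropy, and a class-averaged ROC AUC for a two-class problem in plain Python. The function names and the toy scores are made up for illustration and are not taken from the model's code base. The AUC uses the rank-based (Mann-Whitney) formulation, not any specific library's implementation, and the precision/recall AUC is left out for brevity.

```python
import math

def accuracy(y_true, y_prob):
    """Fraction of samples whose highest-probability class matches the label."""
    hits = sum(
        1 for label, probs in zip(y_true, y_prob)
        if max(range(len(probs)), key=probs.__getitem__) == label
    )
    return hits / len(y_true)

def categorical_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean negative log-probability assigned to the true class."""
    return -sum(
        math.log(max(probs[label], eps)) for label, probs in zip(y_true, y_prob)
    ) / len(y_true)

def roc_auc(binary_labels, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for l, s in zip(binary_labels, scores) if l == 1]
    neg = [s for l, s in zip(binary_labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def macro_roc_auc(y_true, y_prob, n_classes=2):
    """One-vs-rest ROC AUC per class, then the unweighted mean across classes."""
    aucs = []
    for c in range(n_classes):
        labels = [1 if y == c else 0 for y in y_true]
        scores = [p[c] for p in y_prob]
        aucs.append(roc_auc(labels, scores))
    return sum(aucs) / n_classes

# Toy data: class 0 = out_of_domain, class 1 = logo (hypothetical scores)
y_true = [1, 1, 0, 0]
y_prob = [[0.1, 0.9], [0.6, 0.4], [0.8, 0.2], [0.3, 0.7]]

print("accuracy:", accuracy(y_true, y_prob))
print("loss:", categorical_cross_entropy(y_true, y_prob))
print("AUC ROC:", macro_roc_auc(y_true, y_prob))
```

With two classes whose scores sum to 1, the per-class AUC values coincide, so the averaging step only matters for problems with more classes.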
Implementation
Image classifier with an EfficientNetV2 backbone pre-trained on the ImageNet classification task (i.e., the efficientnetv2_b0_imagenet preset from [1]), fine-tuned on available Commons images.
# Layers & their parameters
Input = 0
EfficientNetV2 backbone = 5,919,312
Global average pooling 2D = 0
Predictions = 2,562
# Parameters
Total = 17,644,408 (67.31 MB)
Trainable = 5,861,266 (22.36 MB)
Non-trainable = 60,608 (236.75 KB)
Optimizer = 11,722,534 (44.72 MB)
# Dataset
Validation split = 0.2
Image size = (224, 224)
Batch size = 64
# Data augmentation
Contrast factor = 0.11
Rotation factor = 0.16
Translation factor = 0.084
# Model
Classes = 2
Epochs = 25
Optimizer = Adam
Learning rate = 1e-2
Loss = categorical cross-entropy
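The parameter figures in this listing are internally consistent, which the following arithmetic sketch checks. Two points are assumptions, not statements from the source: that the prediction layer is a dense layer over a 1,280-dimensional pooled feature vector (inferred from 2,562 = 1,280 × 2 + 2), and that the optimizer figure corresponds to Adam keeping two moment estimates per trainable parameter plus two scalar state variables. Sizes assume 4-byte (float32) values.

```python
# Figures copied from the listing
BACKBONE = 5_919_312
PREDICTIONS = 2_562
TOTAL = 17_644_408
TRAINABLE = 5_861_266
NON_TRAINABLE = 60_608
OPTIMIZER = 11_722_534
CLASSES = 2

# Assumption: pooled feature size feeding the dense prediction layer
FEATURES = 1_280
BYTES_PER_PARAM = 4  # assumption: float32

def dense_params(inputs, outputs):
    """Weights plus biases of a fully connected layer."""
    return inputs * outputs + outputs

def adam_state(trainable, scalars=2):
    """Assumed Adam bookkeeping: two moments per parameter plus scalar state."""
    return 2 * trainable + scalars

def mebibytes(n_params):
    """Size in units of 1024**2 bytes, which the listing labels MB."""
    return n_params * BYTES_PER_PARAM / 1024 ** 2

model_params = BACKBONE + PREDICTIONS
checks = {
    "prediction head": dense_params(FEATURES, CLASSES) == PREDICTIONS,
    "trainable + non-trainable": TRAINABLE + NON_TRAINABLE == model_params,
    "optimizer state": adam_state(TRAINABLE) == OPTIMIZER,
    "total": model_params + OPTIMIZER == TOTAL,
}
for name, ok in checks.items():
    print(f"{name}: {'consistent' if ok else 'mismatch'}")
print(f"total size: {mebibytes(TOTAL):.2f} MB")
```

Read this way, the reported total covers the model weights together with the optimizer state, which is why it is roughly three times the size of the trainable weights alone.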
Each item in the API response's predictions list has the following fields:
{
  "filename": <Commons file name>,
  "target": "logo",
  "prediction": <logo probability score (0,1)>,
  "out_of_domain": <non-logo probability score (0,1)>
}
Input:
$ curl 'https://commons.wikimedia.org/w/api.php?action=mediadetection&format=json&formatversion=2&filename=Kanion_Co.png'
Output:
{
  "predictions": [
    {
      "filename": "Kanion_Co.png",
      "target": "logo",
      "prediction": 0.9978,
      "out_of_domain": 0.0022
    }
  ]
}
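A minimal client sketch for this endpoint, using only the Python standard library. The helper names, the User-Agent string, and the 0.5 decision threshold are illustrative assumptions and not part of the API; the URL parameters and response fields are the ones shown in the example.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://commons.wikimedia.org/w/api.php"

def build_url(filename):
    """Build the request URL for one Commons file name."""
    params = {
        "action": "mediadetection",
        "format": "json",
        "formatversion": 2,
        "filename": filename,
    }
    return f"{API_URL}?{urllib.parse.urlencode(params)}"

def parse_predictions(payload, threshold=0.5):
    """Map each file name to (logo score, is_logo) from a decoded JSON response.

    The threshold is a hypothetical choice; tune it for the use case.
    """
    return {
        item["filename"]: (item["prediction"], item["prediction"] >= threshold)
        for item in payload["predictions"]
    }

def detect_logo(filename, timeout=10):
    """Query the live endpoint (requires network access)."""
    # Assumption: a descriptive User-Agent, which Wikimedia APIs generally expect
    request = urllib.request.Request(
        build_url(filename), headers={"User-Agent": "logo-detection-example/0.1"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_predictions(json.load(response))

# Offline check against the example output
sample = json.loads(
    '{"predictions": [{"filename": "Kanion_Co.png", "target": "logo",'
    ' "prediction": 0.9978, "out_of_domain": 0.0022}]}'
)
result = parse_predictions(sample)
print(result)
```

Keeping URL construction and response parsing separate from the network call makes both easy to test without contacting the API.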
Data
- Download a dataset of Commons image thumbnails from the API:
  - one half belongs to commons:Category:Logos and its sub-categories, as returned by this PetScan query
  - the other half is a random sample of available images
- Build the training & validation sets:
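import keras

# INPUT_DIR is the directory of downloaded thumbnails, with one sub-directory per class
train, val = keras.utils.image_dataset_from_directory(
    INPUT_DIR,
    label_mode='categorical',  # one-hot labels, matching the categorical cross-entropy loss
    class_names=('out_of_domain', 'logo'),
    batch_size=64, image_size=(224, 224),
    seed=1984, validation_split=0.2, subset='both',  # returns (training, validation)
)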
- Augment the training set:
import keras
import tensorflow as tf


def augment(image, augmentation_layers):
    # Apply each augmentation layer in sequence
    for layer in augmentation_layers:
        image = layer(image)
    return image


# Random transformations, seeded for reproducibility
augmentation_layers = [
    keras.layers.RandomContrast(0.11, seed=1984),
    keras.layers.RandomFlip(seed=1984),
    keras.layers.RandomRotation(0.16, seed=1984),
    keras.layers.RandomTranslation(
        height_factor=0.084,
        width_factor=0.084,
        seed=1984,
    ),
]

# Augment images only; labels pass through unchanged
train = train.map(
    lambda img, label: (
        augment(img, augmentation_layers),
        label,
    ),
    num_parallel_calls=tf.data.AUTOTUNE,
)
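To make the augmentation factors more concrete, the sketch below converts them into the ranges they imply for a 224 × 224 input. It relies on the conventions described in the Keras documentation (rotation factor as a fraction of a full turn, translation factor as a fraction of image size, contrast factor as a symmetric range around 1), so treat the numbers as a reading of those conventions and not as measurements from the training run. The function name is hypothetical.

```python
IMAGE_SIZE = 224  # pixels, from the dataset settings

def augmentation_ranges(contrast=0.11, rotation=0.16, translation=0.084,
                        size=IMAGE_SIZE):
    """Translate Keras-style augmentation factors into human-readable ranges."""
    max_angle = rotation * 360      # fraction of a full turn, in degrees
    max_shift = translation * size  # fraction of height/width, in pixels
    return {
        "contrast_scale": (1 - contrast, 1 + contrast),
        "rotation_degrees": (-max_angle, max_angle),
        "translation_pixels": (-max_shift, max_shift),
    }

ranges = augmentation_ranges()
for name, (low, high) in ranges.items():
    print(f"{name}: {low:.2f} to {high:.2f}")
```

Under these conventions, each training image can be rotated by up to roughly 58 degrees in either direction and shifted by up to about 19 pixels along each axis.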
Licenses
- Code: GNU General Public License v3.0
- Model: Creative Commons CC0 1.0