AI Sauna/AI for Wikimedia Commons

From Meta, a Wikimedia project coordination wiki

AI for Wikimedia Commons

Description

Experiment with existing AI models to classify, categorize, describe, and title Commons images, and to support Wiki Loves Monuments (WLM).

The team

What were the roles of each?

Created by: Ferdinando Traversa, Jens B., User:TuukkaH, User:Tohaomg, User:Piracalamina

Results

Our method

We started by exploring existing AI models to see whether they are suitable for our goals.

One example is using an AI model to evaluate whether an image is suitable for WLM (e.g. it shows no fingers, no people, etc.). In Italy, the panorama view section always receives many images that are not panorama views, so AI-based screening could help there.
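One way to frame such a screening step is as a short list of yes/no questions put to a vision-language model. The sketch below is only an illustration: the `ask` callable is a hypothetical stand-in for whichever model is used, and the questions in `CHECKS` are assumptions, not an official WLM rule set.

```python
from typing import Callable, Dict, List

# Illustrative yes/no checks (assumed wording, not official WLM rules).
# Each question maps to the answer a suitable image should get.
CHECKS: Dict[str, bool] = {
    "Are any people clearly visible in this picture?": False,
    "Is a finger visible in this picture?": False,
    "Is this picture a panorama view?": True,
}


def is_yes(answer: str) -> bool:
    """Read a free-text answer as yes/no; anything not starting with 'yes' counts as no."""
    return answer.strip().lower().startswith("yes")


def failed_checks(
    image_path: str,
    ask: Callable[[str, str], str],  # hypothetical: (image_path, question) -> answer text
    checks: Dict[str, bool] = CHECKS,
) -> List[str]:
    """Return the questions whose answer differs from the expected one."""
    return [
        question
        for question, expected in checks.items()
        if is_yes(ask(image_path, question)) != expected
    ]
```

An image with an empty result would pass the screening; anything else could be flagged for a human reviewer rather than rejected automatically.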

Category suggestions

Each entry lists the picture, the question asked, the model, its answer, and whether the answer was good enough.

Picture: Archivo:Palacio Santos, interior 25.jpg
  • Question: What 10 Wikimedia Commons Categories can I use for this picture?
  • Model: blip-2
  • Answer: building, ceiling, floor, tile, tiled, tiled floor, tiled, tiled floor, tiled, tiled floor, tile
  • Good enough? No

Picture: four old files from c:Category:Photographs_of_Donostia-San_Sebastián_from_the_Foto_Car_Collection
  • Question: Asked for captioning
  • Model: blip-2
  • Answer: Captions are very generic and not context-aware. The model can't access Commons images directly; they must be downloaded and then uploaded.
  • Good enough? The captioning is not bad, but also not useful.

Picture: four old files from c:Category:Photographs_of_Donostia-San_Sebastián_from_the_Foto_Car_Collection
  • Question: Asked for image descriptions, then asked for relevant Wikipedia articles for each description
  • Model: llava-13-b
  • Answer: The description is very good and could be useful for describing the images, but it lacks cultural references. The model can't access Commons images directly; they must be downloaded and then uploaded.
  • Good enough? The image description is interesting; with small tweaks it could become a valid description. Some of the suggested articles don't exist, but interestingly they could (and even should) exist.

Picture: File:Statue-de-la-liberte-new-york.jpg
  • Question: Asked for the Wikidata item of the image (Statue of Liberty)
  • Model: blip-2
  • Answer: "The image features the Statue of Liberty, a famous landmark in New York City, United States. The Wikidata item for the Statue of Liberty is Q1120."
  • Good enough? It recognizes a very evident element, but Q1120 is the Nice model, not the Statue of Liberty.

Picture: a photo of Barack Obama, then four images from the Wikipedia pages of four very well-known Basque personalities
  • Question: Asked to recognize the people
  • Model: blip-2
  • Answer: Barack Obama was easily recognized. All the others were recognized as another, unrelated person.
  • Good enough? What is the limit to know someone?

Picture: File:Club Uruguay, Montevideo, Uruguay.JPG
  • Question: Which of the following Wikimedia Commons categories would you suggest for this picture: Arches in Uruguay,Balconies in Uruguay,Balustrades in Uruguay,Bartizans in Uruguay,Bathtubs in Uruguay,Battlements in Uruguay,Dome interiors in Uruguay,Coffered ceilings in Uruguay
  • Model: llava-13-b
  • Answer: Based on the image, I would suggest the following Wikimedia Commons categories: Arches in Uruguay, Doors in Uruguay, and Wooden doors in Uruguay. The image features a large wooden door with an arched top, which is a notable architectural element in the scene.
  • Good enough? Yes! It can be problematic that it makes up categories, but these categories make sense. We should avoid returning a parent category of a more accurate category that is also suggested. (We can't suggest both Doors in Uruguay and Wooden doors in Uruguay.)

Picture: File:Club Uruguay, Montevideo, Uruguay.JPG
  • Question: Which of the following Wikimedia Commons categories would you suggest for this picture: Arches in Uruguay,Balconies in Uruguay,Balustrades in Uruguay,Bartizans in Uruguay,Bathtubs in Uruguay,Battlements in Uruguay,Dome interiors in Uruguay,Coffered ceilings in Uruguay
  • Model: blip-2
  • Answer: Bartizans in Uruguay
  • Good enough? Not really the most correct, but it sticks to the prompt.

Picture: File:Alberobello_-_Chiesa_del_Carmine_-_2023-09-23_11-11-22_001.jpg
  • Question: In which of these category do this picture belong?
      Alberobello train station
      Buildings in Alberobello
      Città dei Trulli (folk group)
      Night in Alberobello
      Streets in Alberobello
      Urban squares in Alberobello
  • Model: blip-2
  • Answer: Streets in Alberobello
  • Good enough? Very good

Picture: File:Barletta_-_Basilica_del_Santo_Sepolcro_-_2023-09-04_17-39-07_004.jpg
  • Question: In which of these categories do this picture belong?
      Architecture in Barletta
      Barletta in art
      Art in Barletta
      Culture of Barletta
      Disfida di Barletta
      Economy of Barletta
      Events in Barletta
      History of Barletta
      Landscapes of Barletta
      Maps of Barletta
      Palazzo Gentile (Barletta)
      People of Barletta
      Porto di Barletta
      Roman Catholic Archdiocese of Trani-Barletta-Bisceglie
      Streets in Barletta
      Teleferica di Barletta
      Barletta train station
  • Model: blip-2
  • Answer: Architecture in Barletta
  • Good enough? Very good

Picture: File:Barletta_-_Basilica_del_Santo_Sepolcro_-_2023-09-04_17-39-07_004.jpg
  • Question: In which of these categories do this picture belong?
      Architecture in Barletta
      Churches in Barletta
      Barletta in art
      Art in Barletta
      Culture of Barletta
      Disfida di Barletta
      Economy of Barletta
      Events in Barletta
      History of Barletta
      Landscapes of Barletta
      Maps of Barletta
      Palazzo Gentile (Barletta)
      People of Barletta
      Porto di Barletta
      Roman Catholic Archdiocese of Trani-Barletta-Bisceglie
      Streets in Barletta
      Teleferica di Barletta
      Barletta train station
  • Model: blip-2
  • Answer: Churches in Barletta
  • Good enough? Very good. It was able to identify that it is a church.

Picture: (not specified)
  • Question: In which of these categories do this picture belong?
      Agenzia nazionale per le nuove tecnologie, l'energia e lo sviluppo economico sostenibile (6 C, 2 F)
      Ancient Roman architecture in Rome
      Art Déco architecture in Rome
      Art Nouveau architecture in Rome
      Baroque architecture in Rome
      Byzantine architecture in Rome
      Eclectic architecture in Rome
      Gothic architecture in Rome
      Gothic Revival architecture in Rome
      Mannerist architecture in Rome
      Neoclassical architecture in Rome
      Paleochristian architecture in Rome
      Rationalist architecture in Rome
      Renaissance architecture in Rome
      Romanesque architecture in Rome
      Romanesque Revival architecture in Rome
      Cityscapes of Rome
      Culture of Rome
      Economy of Rome
      Films set in Rome
      Films shot in Rome
      Geography of Rome
      Government of Rome
      Historic Centre of Rome, the Properties of the Holy See in that City Enjoying Extraterritorial Rights and San Paolo Fuori le Mura (5 C, 11 F)
      History of Rome
      Infrastructure in Rome
      Inscriptions in Rome
      Nature of Rome
      Objects in Rome
      Panoramics in Rome
      People of Rome
      Public services in Rome
      Honorary citizens of Rome
      Satellite pictures of Rome
      Science in Rome
      Things named after Rome
      Twin towns of Rome
      Villages in Rome
      Archivio di Stato (Rome)
      Art of Rome
      Books about Rome
      Books published in Rome
      Bookshops in Rome
      Catacombs of Rome
      Cinema of Rome
      Cinemas in Rome
      Cuisine of Rome
      Devotion in Rome
      Education in Rome
      Festivals in Rome
      Folkstudio
      GBR (television channel)
      Gruppo dei Romanisti
      Guilds of Rome
      Homosexuality in Rome
      Klimt 1918
      Libraries in Rome
      Media of Rome
      Museums in Rome
      Music of Rome
      Playgrounds in Rome
      Polish Institutes in Rome
      Politics in Rome
      Printers from Rome
      Published in Rome
      Rome Prize
      Signs in Rome
      Spoken Rome
      Sports in Rome
      Symbols of Rome
      The Lust Syndicate
      Theatres in Rome
      Culture of the Vatican City
      École française de Rome
  • Model: blip-2
  • Answer: Ancient Roman architecture in Rome
  • Good enough? Spot on!
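
The Club Uruguay result above raises the point that a parent category should not be returned together with one of its more specific subcategories, and the first blip-2 answer shows that suggestions can repeat. A minimal post-processing sketch follows, assuming the parent relationships are already available as a plain dictionary; the `parents` mapping and both function names are hypothetical, not part of any tool we used.

```python
from typing import Dict, Iterable, List, Set


def ancestors(category: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Collect every direct and indirect parent of a category."""
    seen: Set[str] = set()
    stack = [category]
    while stack:
        for parent in parents.get(stack.pop(), set()):
            if parent not in seen:  # also guards against cycles in the category graph
                seen.add(parent)
                stack.append(parent)
    return seen


def most_specific(suggested: Iterable[str], parents: Dict[str, Set[str]]) -> List[str]:
    """Deduplicate suggestions and drop any that is an ancestor of another suggestion."""
    unique = list(dict.fromkeys(suggested))  # keeps first-seen order
    redundant: Set[str] = set()
    for category in unique:
        redundant |= ancestors(category, parents)
    return [category for category in unique if category not in redundant]
```

With an illustrative mapping in which "Wooden doors in Uruguay" has the parent "Doors in Uruguay", the suggestions "Arches in Uruguay", "Doors in Uruguay", "Wooden doors in Uruguay" would be reduced to the first and the last.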

Ideas

  1. Call the API that gives us possible Commons categories from a coordinate (talk to Edward)
  2. Ask the llava-1.6 model: Which of the following Wikimedia Commons categories would you suggest for this picture: [categories]
  3. Check if the suggested categories exist
  4. Fetch each category's subcategories
  5. Ask llava-1.6 (?) to pick from the subcategories
  6. Repeat?
  7. Return the most specific categories
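
The steps above can be sketched as a small loop. This is a hedged outline rather than a finished tool: `ask_model` and `fetch_subcategories` are hypothetical callables standing in for the vision-language model and for a Commons category lookup, and `max_depth` is an assumed safety limit for the "Repeat?" step. Answers that are not in the offered list are discarded, which covers step 3 as long as the offered candidates come from Commons itself.

```python
from typing import Callable, List, Sequence

# Hypothetical interfaces (assumptions for this sketch):
AskModel = Callable[[str, Sequence[str]], List[str]]  # (image, candidates) -> chosen names
FetchSubcats = Callable[[str], List[str]]             # category -> its subcategories


def refine_categories(
    image: str,
    candidates: Sequence[str],
    ask_model: AskModel,
    fetch_subcategories: FetchSubcats,
    max_depth: int = 3,
) -> List[str]:
    """Descend the category tree, keeping the deepest categories the model accepts."""
    # Step 2 and 3: ask once, keep only names that were actually offered.
    current = [c for c in ask_model(image, candidates) if c in candidates]
    for _ in range(max_depth):  # step 6: repeat, but not forever
        refined: List[str] = []
        changed = False
        for category in current:
            subcats = fetch_subcategories(category)  # step 4
            picks = [c for c in ask_model(image, subcats) if c in subcats] if subcats else []
            if picks:  # step 5: the model found something more specific
                refined.extend(picks)
                changed = True
            else:
                refined.append(category)  # nothing more specific fits; keep it
        current = list(dict.fromkeys(refined))  # deduplicate, keep order
        if not changed:
            break
    return current  # step 7
```

In practice, `fetch_subcategories` could be backed by the MediaWiki Action API (`list=categorymembers` with `cmtype=subcat`), and the initial candidates by the coordinate-based lookup from step 1.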

Resources we used

What resources did you use? Were they useful?

Conclusion

What did you learn?

  • Diffusing/cleaning up categories with AI seems promising. A specifically trained model could yield even better and more precise results
  • We should not expect the model to know about Wikimedia Commons; instead, we should give it a set of categories to choose from
  • We should run our non-proprietary models on our own servers
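
To illustrate the second point, the sketch below builds the prompt we used from a list of candidate categories and then keeps only those candidates that actually appear in the model's free-text answer. The function names are made up for this example, and the matching is a simple case-insensitive substring test, which is an assumption that may need refining.

```python
from typing import List, Sequence

# Prompt wording taken from our experiments; candidates are joined with commas.
PROMPT = (
    "Which of the following Wikimedia Commons categories would you suggest "
    "for this picture: {options}"
)


def build_prompt(candidates: Sequence[str]) -> str:
    """Build the question that offers the model a fixed set of categories."""
    return PROMPT.format(options=",".join(candidates))


def pick_known(answer: str, candidates: Sequence[str]) -> List[str]:
    """Return the candidates mentioned in a free-text answer, in candidate order."""
    lowered = answer.lower()
    return [c for c in candidates if c.lower() in lowered]
```

Applied to the llava-13-b answer for the Club Uruguay picture, this would keep "Arches in Uruguay" and silently drop the made-up "Doors in Uruguay" and "Wooden doors in Uruguay". One known limitation: a candidate whose name is contained in a longer category name would also match.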

What next

Do you wish to continue exploring this? What was not covered? What did you get curious about?

Try Apple's AIM with training specific to our categories

Links, images, documentation

Upload at least one image to Wikimedia Commons for the image of the page banner.