
User:HDothiduc (WMF)/GLAM data and media partnerships workflow

From Meta, a Wikimedia project coordination wiki
GLAM

Galleries • Libraries • Archives • Museums
Email the team: glam(_AT_)wikimedia.org

This page summarizes the basic steps in data and media partnerships (for Wikidata and structured data on Wikimedia Commons) between Wikimedians and cultural institutions. It is meant to give a general overview of the workflow and to point to the most frequently used tools.

[Graphic and table should have same direction]


Introduction

Step in workflow

💡Tips

🛠Tools

Negotiations between a GLAM partner and Wikimedia community members
  • Both sides can get to know each other by starting with smaller activities (e.g. an edit-a-thon or internal Wikimedia course).
  • Agreements about the co-operation can be made explicit in a Memorandum of Understanding. (Guide on how to create a MoU)

 

0 - Source data and media files provided
Data and media files are made available for Wikimedia Commons and/or Wikidata.
  • Website scraping/ingest tools (if the data is available online but the partner can't produce data exports from its database)
    • Tabula - open source tool to extract tables from PDF files
    • PAWS - Python notebook environment hosted on Wikimedia Cloud Services (formerly Tool Labs) that can be used to transfer records from an institution's API
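As a rough illustration of transferring records from an institution's API in a notebook such as PAWS, the sketch below pages through an API and flattens the records into CSV. The response shape (a `records` list per page), the field names and the `fake_fetch_page` stand-in are all assumptions made for the example; a real institution's API will differ, and the stand-in would be replaced by an actual HTTP call.

```python
import csv
import io
from typing import Callable, Dict, Iterable, Iterator, List


def iter_records(fetch_page: Callable[[int], Dict], max_pages: int = 100) -> Iterator[Dict]:
    """Yield records page by page until a page comes back empty."""
    for page in range(1, max_pages + 1):
        payload = fetch_page(page)
        records = payload.get("records", [])  # assumed response key
        if not records:
            break
        yield from records


def records_to_csv(records: Iterable[Dict], fields: List[str]) -> str:
    """Flatten records into CSV text, leaving missing fields empty."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow({field: record.get(field, "") for field in fields})
    return buffer.getvalue()


def fake_fetch_page(page: int) -> Dict:
    """Hypothetical stand-in for an HTTP request to the institution's API."""
    pages = {
        1: {"records": [{"id": "A1", "title": "Vase", "creator": "Unknown"}]},
        2: {"records": [{"id": "A2", "title": "Portrait"}]},
    }
    return pages.get(page, {"records": []})


csv_text = records_to_csv(iter_records(fake_fetch_page), ["id", "title", "creator"])
print(csv_text)
```

Passing the page fetcher in as a function keeps the paging logic testable without network access.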

 

Pre Upload

Step in workflow

💡Tips

🛠Tools

1 - Clarify copyright status
Make sure that copyright of the data and media files is compatible with Wikimedia projects.

 

If permissions and licenses for copyrighted media files aren't published in a public place, make sure the permissions are confirmed by e-mail to OTRS (since renamed the Volunteer Response Team, VRT), the system the Wikimedia projects use to manage and archive e-mail conversations. (Licensing images: when do I contact OTRS?)

 

2 - Prepare data
Clean up the data to be consistent and compatible with Wikimedia Commons and/or Wikidata.
  • Look at similar media or data items on Wikimedia Commons or Wikidata for inspiration on how to model the data.
  • Wikidata's WikiProjects – the 'groups' where volunteers work together on common interests – often have recommendations on data modelling for specific subjects.

 

  • Spreadsheet software - allows non-programmers to run checks against existing Wikimedia content
  • OpenRefine (formerly Google Refine) - popular tool for advanced data cleaning, transformation and matching against Wikidata content. Its homepage includes video tutorials and a guide on how to use version 3.0 and higher for Wikidata manipulation and uploading.
  • PAWS and Pywikibot - for those with some programming experience, these allow large-scale querying and advanced actions.
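For those who prefer scripting over a spreadsheet, a minimal cleanup pass might look like the sketch below. It collapses stray whitespace and converts dates to ISO 8601, leaving anything it cannot parse for manual review. The column names and the assumption that source dates are written as DD/MM/YYYY are hypothetical; check the partner's actual export before relying on either.

```python
import re
from datetime import datetime
from typing import Optional


def clean_text(value: str) -> str:
    """Collapse runs of whitespace (including line breaks) and trim the ends."""
    return re.sub(r"\s+", " ", value).strip()


def to_iso_date(value: str, source_format: str = "%d/%m/%Y") -> Optional[str]:
    """Return an ISO 8601 date, or None if the value does not match the assumed format."""
    try:
        return datetime.strptime(clean_text(value), source_format).date().isoformat()
    except ValueError:
        return None


# Illustrative rows only; not taken from any real collection.
rows = [
    {"title": "  Still life \n with fruit ", "date": "03/12/1885"},
    {"title": "Harbour view", "date": "ca. 1900"},
]

cleaned = [
    {
        "title": clean_text(row["title"]),
        "date": to_iso_date(row["date"]),
        "date_raw": row["date"],  # keep the original value for manual review
    }
    for row in rows
]

needs_review = [row for row in cleaned if row["date"] is None]
```

Keeping the raw value next to the parsed one makes it easier to handle approximate dates such as "ca. 1900" by hand instead of silently dropping them.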

 

3 - Check what is already on Wikimedia projects
Always check which data and media items are already present on Wikidata and Wikimedia Commons.
  • Volunteers have often already autonomously uploaded quite a few images from GLAM collections.
  • Wikidata will probably already contain quite a few data items about creative works, people and topics related to specific GLAM collections.
  • On Wikimedia Commons, it is considered good practice to upload new (higher-quality) media files. Don't overwrite existing files.
  • On Wikidata, duplicate items must be avoided and merged when they are discovered. It is OK (and even highly recommended) to add extra sources and statements to existing items though.
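One way to check whether a file is already on Wikimedia Commons is to compare SHA-1 hashes. The sketch below computes the hash of a file's bytes and builds a MediaWiki API request using `list=allimages` with the `aisha1` parameter. The helper names are made up for this example, and the response shape used in `existing_titles` is an assumption to verify against the API documentation.

```python
import hashlib
from urllib.parse import urlencode

COMMONS_API = "https://commons.wikimedia.org/w/api.php"


def sha1_of_bytes(data: bytes) -> str:
    """Hex SHA-1 digest of the file content."""
    return hashlib.sha1(data).hexdigest()


def duplicate_check_url(sha1_hex: str) -> str:
    """URL asking Commons for files whose content has this SHA-1 hash."""
    params = {
        "action": "query",
        "list": "allimages",
        "aisha1": sha1_hex,
        "format": "json",
    }
    return COMMONS_API + "?" + urlencode(params)


def existing_titles(api_response: dict) -> list:
    """Extract file titles from a response (assumed shape: query.allimages[].title)."""
    images = api_response.get("query", {}).get("allimages", [])
    return [image["title"] for image in images]


url = duplicate_check_url(sha1_of_bytes(b"example file content"))
```

A hash match only finds byte-identical files. A lower-resolution or re-encoded copy of the same image will not match, so browsing the relevant Commons categories remains necessary.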

 

Reconciliation is the step where data items from a source dataset are matched with their corresponding Wikidata items.
  • Be thorough during this step. Creating many duplicate Wikidata items must be avoided, as these cause a lot of cleanup work for the Wikidata community!
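When the source dataset carries an external identifier that Wikidata also stores, reconciliation can start with a lookup by that identifier. The sketch below builds a SPARQL query for the Wikidata Query Service and splits the results into matched and unmatched identifiers. P214 (VIAF ID) is used only as an example property, and the identifiers and item IDs in the sample bindings are invented for illustration.

```python
from typing import Dict, List, Tuple


def build_lookup_query(property_id: str, identifiers: List[str]) -> str:
    """SPARQL query returning items that hold any of the given identifier values."""
    def quote(value: str) -> str:
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'

    values = " ".join(quote(identifier) for identifier in identifiers)
    return (
        "SELECT ?item ?id WHERE {\n"
        "  VALUES ?id { " + values + " }\n"
        "  ?item wdt:" + property_id + " ?id .\n"
        "}"
    )


def split_matches(identifiers: List[str], bindings: List[Dict]) -> Tuple[Dict[str, List[str]], List[str]]:
    """Group matched item IDs per identifier and list identifiers with no match."""
    matches: Dict[str, List[str]] = {}
    for binding in bindings:
        qid = binding["item"]["value"].rsplit("/", 1)[-1]
        matches.setdefault(binding["id"]["value"], []).append(qid)
    unmatched = [identifier for identifier in identifiers if identifier not in matches]
    return matches, unmatched


ids = ["ID-1", "ID-2", "ID-3"]  # invented identifiers
query = build_lookup_query("P214", ids)

# Invented bindings in the shape of a SPARQL JSON result (results.bindings).
sample_bindings = [
    {"item": {"value": "http://www.wikidata.org/entity/Q1001"}, "id": {"value": "ID-1"}},
    {"item": {"value": "http://www.wikidata.org/entity/Q1002"}, "id": {"value": "ID-1"}},
    {"item": {"value": "http://www.wikidata.org/entity/Q1003"}, "id": {"value": "ID-2"}},
]
matches, unmatched = split_matches(ids, sample_bindings)
```

An identifier that maps to more than one item (here `ID-1`) points to possible duplicates already on Wikidata. Unmatched identifiers still need a name-based check, for example in OpenRefine, before any new item is created.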

 

 

Upload

Step in workflow

💡Tips

🛠Tools

5 - Upload
Upload the new data items and/or media files to Wikidata and/or Commons.
  • Start with small test batches to check for structural errors.
  • Upload in manageable batches. Don't make your batches too large (hundreds rather than thousands) – correcting mistakes in thousands of data items or files at once is not fun.
  • Occasionally check uploads during the process, to prevent errors from propagating.
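The batching advice above can be sketched as a small planning helper: one small test batch first, then batches of a few hundred. The sizes of 10 and 200 are illustrative defaults, not official limits.

```python
from typing import Iterator, List, Sequence


def batches(items: Sequence, size: int) -> Iterator[List]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def upload_plan(items: Sequence, test_size: int = 10, batch_size: int = 200) -> List[List]:
    """A small test batch first, then the rest in manageable batches."""
    plan = [list(items[:test_size])]
    plan.extend(batches(items[test_size:], batch_size))
    return [batch for batch in plan if batch]


records = ["record-{}".format(n) for n in range(450)]
plan = upload_plan(records)
print([len(batch) for batch in plan])
```

Reviewing the result of each batch on-wiki before starting the next one keeps a structural error confined to a few hundred items at most.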
Wikimedia Commons:
  • Upload Wizard for simple uploads of up to 50 files. Offers no options for refined metadata.
  • Pattypan, a user-friendly batch upload tool that works with spreadsheets and that allows for refined details in metadata.
  • GLAMwiki Toolset, an advanced upload tool for XML feeds of large file batches. Requires days of lead time and a request for permission to use the tool.

Wikidata:

  • QuickStatements, create or update Wikidata items using tab-delimited or CSV files
  • OpenRefine (3.0+), a tool with powerful upload functionality for Wikidata
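As an illustration of the tab-delimited input QuickStatements accepts, the sketch below generates version 1 commands for a new item: `CREATE`, followed by `LAST` lines for the label, the description and the statements. The helper name and sample values are hypothetical. P31 (instance of), Q3305213 (painting) and P217 (inventory number) are used only as examples, and labels containing double quotes are not handled here.

```python
from typing import List, Tuple


def quickstatements_for_new_item(
    label: str,
    description: str,
    statements: List[Tuple[str, str]],
    lang: str = "en",
) -> str:
    """Build QuickStatements V1 commands (tab-separated) that create one new item."""
    lines = [
        "CREATE",
        'LAST\tL{}\t"{}"'.format(lang, label),
        'LAST\tD{}\t"{}"'.format(lang, description),
    ]
    for prop, value in statements:
        # Item values are bare Q-ids; string values must arrive already quoted.
        lines.append("LAST\t{}\t{}".format(prop, value))
    return "\n".join(lines)


commands = quickstatements_for_new_item(
    label="Harbour view",
    description="painting",
    statements=[("P31", "Q3305213"), ("P217", '"INV-001"')],
)
print(commands)
```

Commands like these should only be generated for records that reconciliation has confirmed are not yet on Wikidata.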

For both:

 

Post Upload

Step in workflow

💡Tips

🛠Tools

6 - Corrections
Fix mistakes and omissions that were made during the upload.
  • Mistakes happen! Take responsibility for them, and make sure to correct and improve your own uploads.
Wikimedia Commons:
  • Cat-a-lot, a gadget on Wikimedia Commons to help with categorizing images by pointing and clicking. Activate in your Commons user preferences.
  • VisualFileChange.js, a gadget on Wikimedia Commons that allows you to do batch edits to groups of media files
  • AutoWikiBrowser, a semi-automated editor

Wikidata:

  • QuickStatements, create or update Wikidata items using tab-delimited or CSV files
  • OpenRefine (3.0+), a tool with powerful upload functionality for Wikidata
  • EditGroups allows you to 'undo' faulty batch edits that were performed with QuickStatements or OpenRefine
  • PetScan, the advanced search and query tool for Wikimedia projects, also has (limited) editing functionalities for Wikidata items.

 

7 - Enrich
Work with Wikimedia communities to enhance and enrich the data and media.

Improvements can include:

  • More precise metadata (e.g. what are the places, objects, people depicted in a media file?)
  • More references
  • Translations of metadata

 

 

8 - Re-use
Encourage use of the media and data in Wikimedia projects and beyond.
  • Campaigns can help a lot: Wikipedia article writing contests, photography events...
  • Think beyond Wikipedia; perhaps the media or data can be re-used on other platforms too.

 

 

Impact

Step in workflow

💡Tips

🛠Tools

9 - Evaluate and report
Evaluate the impact of the media files and/or data by measuring improvements and (re-)use.

Measurable aspects may include:

  • (Number of) people who worked on the data and media
  • Types of enrichment
  • Inclusions in Wikimedia projects
  • Pageviews of pages where data/media is used
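For pageviews, the Wikimedia REST API exposes per-article counts. The sketch below builds a request URL for one article and sums the `views` field of a response. The function names are made up for this example, the article title and sample numbers are purely illustrative, and the response shape (`items[].views`) should be checked against the API documentation.

```python
from urllib.parse import quote

PAGEVIEWS_BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def pageviews_url(project: str, article: str, start: str, end: str,
                  granularity: str = "monthly") -> str:
    """URL for user pageviews of one article; dates are given as YYYYMMDD."""
    title = quote(article.replace(" ", "_"), safe="")
    return "{}/{}/all-access/user/{}/{}/{}/{}".format(
        PAGEVIEWS_BASE, project, title, granularity, start, end
    )


def total_views(response: dict) -> int:
    """Sum the views over all returned periods (assumed shape: items[].views)."""
    return sum(item["views"] for item in response.get("items", []))


url = pageviews_url("en.wikipedia", "Night Watch", "20240101", "20240331")

# Made-up numbers in the assumed response shape, for illustration only.
sample_response = {"items": [{"views": 120}, {"views": 95}, {"views": 143}]}
views = total_views(sample_response)
```

To report on a whole upload, the same request would be repeated for every page that uses the partner's media, which is what tools such as BaGLAMa automate per Commons category.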

 

Wikimedia Commons:
  • GLAMorous shows how often media files from a Commons category (or uploaded by a Commons user) are used in other Wikimedia projects
  • BaGLAMa shows Wikimedia page views over time, for specific categories of media files on Wikimedia Commons. Get in touch with its maintainer, Magnus Manske, who can add your own category/ies.
  • GLAMorgan shows Wikimedia page views for a specific Wikimedia Commons category for a specific month.
  • Fae's GLAM Dashboard, a set of templates that show interesting data about a Commons category, including the most edited files and the most active volunteers who have contributed to them.

Wikidata:

 


Potential template for illustrations with labels:

This could be more versatile for translations. In this version, people only have to upload a PNG with a cut-out background.

Short label



This is a rather long label


Source: Data and media partnerships workflow