GLAM CSI/User story – Image reconciliation uploader
Persona: Casey, art museum collections manager
- Background: Casey has been working as a collections manager at a museum of fine art for the last year. The museum has recently released high-resolution versions of their open access collection to the public. Casey knows that the museum uploaded a mix of low and high resolution images to Wikimedia Commons many years ago, and wants to supplement what Commons has now.
- Goals: Provide greater public access to higher-resolution images of their fine art collections, with better metadata to make them discoverable.
- Skills: Knowledgeable about working with image metadata using Python and other tools such as OpenRefine. They have uploaded files using the Upload Wizard, but don't have experience with bulk upload tools.
- Challenges: Limited technical support for large-scale digital projects, navigating the complexities of copyright permissions for artists' works, engaging a broader audience beyond the local community.
User Story: Consolidating and supplementing images on Wikimedia Commons
As Casey, the art museum collections manager...
I want to supplement existing Wikimedia Commons uploads with new high-resolution images and metadata from our collections
So that the collections on Commons are more complete and discoverable for re-use within Wikimedia projects, and for the rest of the world
User Scenario: Image reconciliation and uploading
| Step | Narrative | Notes |
|---|---|---|
| 1 | Casey finds a Wikimedia Commons category of files from their museum's collection, uploaded by their institution over the years, and wants to supplement it with new or updated files. | |
| 2 | Casey needs to find out which images are currently on Commons and compare that with what is now available from the institution. | |
| 3 | Casey creates a dump of Commons filenames and relevant metadata (see the Pywikibot sketch after this table). | |
| 4 | Casey compares the institution's content with what is on Commons to eliminate duplicates where possible, resulting in a working list of files. | |
| 5 | Casey then uploads the relevant files to Wikimedia Commons in the proper categories, knowing that there could be duplicates of the same image but that additional processes (bot cleanup) can help resolve any duplication later. Metadata is written to Structured Data on Commons. | |
| 6 | As a post-process, Casey enhances the media files with Structured Data on Commons and resolves any remaining duplication. | |
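A minimal Pywikibot sketch of step 3, assuming the institution's uploads sit in a single Commons category (the category name and output filename below are placeholders): it walks the File: pages in the category and writes each filename with its dimensions, byte size and SHA-1 checksum to a CSV that can then be compared against the institution's own records.

```python
import csv
import pywikibot

# Placeholder category name; substitute the institution's actual Commons category.
CATEGORY = 'Category:Collections of Example Museum of Fine Art'

site = pywikibot.Site('commons', 'commons')
category = pywikibot.Category(site, CATEGORY)

with open('commons_dump.csv', 'w', newline='', encoding='utf-8') as fh:
    writer = csv.writer(fh)
    writer.writerow(['title', 'width', 'height', 'size_bytes', 'sha1'])
    # namespaces=6 restricts the category walk to File: pages.
    for page in category.members(namespaces=6):
        filepage = pywikibot.FilePage(page)
        info = filepage.latest_file_info  # metadata of the current file version
        writer.writerow([filepage.title(), info.width, info.height, info.size, info.sha1])
```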
User Journey: Engaging with the Wikimedia Community and Beyond
| Phase | Narrative | Challenges | Tools and links |
|---|---|---|---|
| Preparation | Casey identifies a collection of objects and images their institution is interested in contributing to and enhancing on Commons. | | Spreadsheet or database tool |
| Permissions | The institution ensures rights are cleared for uploading images from their collection. | The Wikimedia Commons permissions system is rather complex if the uploader does not own the copyright. | Commons:Volunteer_Response_Team requires an email to the VRTS system. |
| Discovery | Casey finds a Wikimedia Commons category of files from their museum's collection, uploaded by their institution over the years, and wants to supplement it with new or updated files. The category also contains some files uploaded by volunteers, making things a bit more complex. | Uploads performed over time by multiple parties may be inconsistent. | PetScan, Pywikibot |
| Normalization | Casey needs to find out which images are currently on Commons and compare that with what is available from the institution. They use various tools to explore and consolidate files and clean up the Commons category tree. | Data and category cleanup is required before proceeding. | Cat-a-lot |
| Metadata evaluation | Casey inspects the metadata of the Commons files to see whether the description fields contain any unique identifiers, such as a URL pointing to the original file or object page, or an accession or catalogue number, so that each file can be matched exactly against the institution's records (see the template-parsing sketch after this table). | Metadata in Commons templates (e.g. Artwork) is "semi-structured" and not easy to work with, requiring coding solutions. | Pywikibot, PAWS |
| | | Metadata in Structured Data on Commons is still not mature, and the query service (WCQS) is hard to use in a scripting/bot environment. | Wikimedia Commons Query Service (WCQS) or OpenRefine |
| Comparison | Casey creates a dump of Commons filenames and relevant metadata – unique identifiers, basic resolution, file size – and compares them to the files available from the institution, checking whether they are the same or different (see the checksum sketch after this table). | Matching files by checksum is imperfect, so comparing basic resolution and file size is also needed. | Google Sheets and/or OpenRefine, or Pywikibot |
| Task list generation | Casey can eliminate duplicates if the object number, file size and resolution are the same. Otherwise, a list of files to be uploaded is generated, with relevant metadata. | The generated upload list may contain duplicates of previous uploads. | Google Sheets, Python |
| Upload | Casey bulk uploads the files to Wikimedia Commons in the proper categories, knowing that there could be duplicates of the same image, but in different resolutions (see the upload sketch after this table). | Bulk upload options for Commons vary, depending on the complexity of the metadata. | Pattypan, url2Commons, Flickypedia, Pywikibot or OpenRefine |
| | | A bot cleanup procedure is run to help resolve any duplicated uploads. | Pywikibot |
| | | More individual categorization is done with Cat-a-lot, though it runs slower than in the past due to API limits. | Cat-a-lot gadget |
| | | Relevant data is written to Structured Data on Commons (see the SDC sketch after this table). | QuickStatements/PetScan for SDC, Pywikibot, or OpenRefine |
| Feedback and Follow-up | Casey monitors the usage of uploaded images, gathers feedback from the Wikimedia community, and assesses metrics from relevant tools. | Requires Commons- and GLAM-specific metrics tools. | GLAMorgan |
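For the metadata evaluation phase, the "semi-structured" wikitext of templates such as {{Artwork}} can be parsed with mwparserfromhell, which works in PAWS or any Pywikibot environment. A sketch, assuming the identifier lives in the template's |accession number= parameter, which is not guaranteed for every upload; the file title is only an example.

```python
import mwparserfromhell
import pywikibot

site = pywikibot.Site('commons', 'commons')

def accession_number(filepage):
    """Return the |accession number= value from an {{Artwork}} template, if any."""
    wikicode = mwparserfromhell.parse(filepage.get())
    for template in wikicode.filter_templates():
        if template.name.matches('Artwork') and template.has('accession number'):
            return str(template.get('accession number').value).strip()
    return None

# Example file title; replace with titles from the category dump above.
print(accession_number(pywikibot.FilePage(site, 'File:Example painting.jpg')))
```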
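For the comparison phase, byte-identical duplicates can be detected before uploading by checking a local file's SHA-1 checksum against Commons, here via Pywikibot's allimages listing (the local path is a placeholder). Because a re-derived or higher-resolution file will have a different checksum, resolution and file size still need to be compared separately, as the journey notes.

```python
import hashlib
import pywikibot

site = pywikibot.Site('commons', 'commons')

def sha1_of(path):
    """SHA-1 hex digest of a local file, read in 1 MiB chunks."""
    digest = hashlib.sha1()
    with open(path, 'rb') as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b''):
            digest.update(chunk)
    return digest.hexdigest()

def commons_duplicates(path):
    """Titles of Commons files that are byte-identical to the local file."""
    return [image.title() for image in site.allimages(sha1=sha1_of(path))]

# Placeholder path; an empty list means no exact duplicate exists on Commons.
print(commons_duplicates('exports/painting_0001.tif'))
```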
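For the upload phase, Pywikibot is one of the listed options. A sketch assuming a working list in a CSV with hypothetical columns local_path, commons_title and wikitext, where the wikitext column holds the prepared {{Artwork}} description, license and categories.

```python
import csv
import pywikibot

site = pywikibot.Site('commons', 'commons')

# upload_list.csv is an assumed file with columns: local_path, commons_title, wikitext
with open('upload_list.csv', newline='', encoding='utf-8') as fh:
    for row in csv.DictReader(fh):
        filepage = pywikibot.FilePage(site, row['commons_title'])
        if filepage.exists():
            # Skip titles already taken; duplicates are handled in post-processing.
            continue
        filepage.upload(
            row['local_path'],
            comment='Batch upload of high-resolution open access images',
            text=row['wikitext'],   # description page: {{Artwork}}, license, categories
            ignore_warnings=False,  # stop on exact-duplicate and similar warnings
        )
```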
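For writing Structured Data on Commons, QuickStatements, OpenRefine and Pywikibot are the listed options. Where a script is preferred, one possibility (a sketch, not a recommendation) is posting Wikibase JSON to the wbeditentity API through Pywikibot; the file title is a placeholder, and P180 ("depicts") with Q3305213 ("painting") is only an example statement.

```python
import json
import pywikibot

site = pywikibot.Site('commons', 'commons')

def add_depicts(filepage, qid, summary):
    """Add a 'depicts' (P180) statement to a file's MediaInfo entity via wbeditentity."""
    mid = f'M{filepage.pageid}'  # MediaInfo IDs are 'M' followed by the file's page ID
    payload = {
        'claims': [{
            'mainsnak': {
                'snaktype': 'value',
                'property': 'P180',
                'datavalue': {
                    'type': 'wikibase-entityid',
                    'value': {'entity-type': 'item', 'id': qid},
                },
            },
            'type': 'statement',
            'rank': 'normal',
        }]
    }
    request = site.simple_request(
        action='wbeditentity',
        id=mid,
        data=json.dumps(payload),
        token=site.tokens['csrf'],
        summary=summary,
    )
    request.submit()

# Placeholder file title; Q3305213 is the Wikidata item for "painting".
add_depicts(pywikibot.FilePage(site, 'File:Example painting.jpg'),
            'Q3305213', 'Adding depicts statement for batch upload')
```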