
Structured data for GLAM-Wiki/Prepare data

From Meta, a Wikimedia project coordination wiki

Select the data and media files you want to contribute, and prepare them to be compatible with Wikimedia Commons and Wikidata.


Clean up the data to be consistent and compatible with Wikimedia Commons and/or Wikidata.

Tools:

  • Spreadsheet software - allows non-programmers to run checks against existing Wikimedia content
  • OpenRefine (formerly Google Refine) - popular tool for advanced data cleaning, transformation and matching against Wikidata content. Its homepage includes video tutorials and a guide on how to use version 3.0 and higher for Wikidata manipulation and uploading.
  • PAWS and Pywikibot - for those with some programming experience; they allow large-scale querying and advanced actions.
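
For those comfortable with a little Python (for example in a PAWS notebook), the kind of cleanup described above can be sketched as follows. This is a minimal illustration, not a recommended workflow: the column names (identifier, title, date), the sample rows and the list of accepted date formats are all assumptions made for the example, and a real export will need its own rules.

```python
import csv
import io
import re
from datetime import datetime

# Hypothetical date formats an institution's export might contain.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d %B %Y")


def normalize_date(value):
    """Return an ISO 8601 date (YYYY-MM-DD), or None if the format is not recognized."""
    value = value.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean_rows(rows):
    """Collapse stray whitespace, normalize dates and drop rows that repeat an identifier."""
    seen = set()
    cleaned = []
    for row in rows:
        # Collapse runs of whitespace and trim both ends of every cell.
        row = {key: re.sub(r"\s+", " ", value or "").strip() for key, value in row.items()}
        if row["identifier"] in seen:
            continue  # keep only the first row for each identifier
        seen.add(row["identifier"])
        # Leave unrecognized dates untouched so they can be reviewed by hand.
        row["date"] = normalize_date(row["date"]) or row["date"]
        cleaned.append(row)
    return cleaned


# Invented sample data standing in for an institution's export.
SAMPLE = """identifier,title,date
A-001,  Portrait of a   woman ,12/03/1887
A-002,Harbour view,4 May 1901
A-001,Portrait of a woman,1887-03-12
"""

rows = clean_rows(csv.DictReader(io.StringIO(SAMPLE)))
for row in rows:
    print(row)
```

Dates that match none of the listed formats are deliberately left as they are rather than guessed at, so that ambiguous values can be checked manually before anything is uploaded.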


Website scraping/ingest tools (if the data is available online but the partner can't produce data exports from its database):

  • Tabula - open source tool to extract tables from PDF files
  • PAWS - Python notebook environment (Jupyter) hosted by Wikimedia Cloud Services that can be used to transfer records from an institution's API
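
As a rough illustration of the last point, records returned by an API as JSON can be flattened into a table that spreadsheet software or OpenRefine can open. The sketch below is only an assumption about what such data might look like: the field names (id, title, creator, image) and the sample response are invented, and the network request itself is left out so that the example stays self-contained.

```python
import csv
import io
import json

# Hypothetical API response; a real institution's API will use its own field names.
RESPONSE = """[
  {"id": "A-001", "title": "Portrait of a woman",
   "creator": {"name": "Unknown"}, "image": {"url": "https://example.org/a-001.jpg"}},
  {"id": "A-002", "title": "Harbour view", "creator": null}
]"""

FIELDNAMES = ["identifier", "title", "creator", "image_url"]


def flatten(record):
    """Turn one nested record into a flat row, using empty strings for missing values."""
    creator = record.get("creator") or {}
    image = record.get("image") or {}
    return {
        "identifier": record.get("id", ""),
        "title": record.get("title", ""),
        "creator": creator.get("name", ""),
        "image_url": image.get("url", ""),
    }


def records_to_csv(records):
    """Write the flattened records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow(flatten(record))
    return buffer.getvalue()


csv_text = records_to_csv(json.loads(RESPONSE))
print(csv_text)
```

Missing or null fields become empty cells instead of raising an error, which keeps incomplete records visible for later cleanup.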
