AvoinGLAM/Reconciliation

From Meta, a Wikimedia project coordination wiki

Reconciliation

This page is a project page for the explorations of reconciliation user interfaces at AvoinGLAM.

The problem

When a user wants to import data into Wikidata or other Wikibases, including Structured Data on Wikimedia Commons, the most tedious task is finding out whether that data already exists. Users rely on several tools and processes for this, but there is clearly a need for better options.

Current options

OpenRefine + Reconciliation API

The Reconciliation API is used almost exclusively through OpenRefine, which makes users turn to OpenRefine for the whole process. The advantage of OpenRefine is that it covers the whole workflow: importing data into the tool, data wrangling, reconciliation, and finally importing or updating the data in the destination. It also has a stable user base and community. The disadvantages are the steep learning curve, the inflexibility in catering for Wikimedia-specific needs, and the shortcomings of the reconciliation interfaces, which can no longer handle the richness of Wikimedia data.
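To make the API side of this more concrete, here is a minimal sketch of how a batch of reconciliation queries could be prepared outside OpenRefine. It assumes the query batch format of the Reconciliation Service API (a JSON object keyed `q0`, `q1`, ... sent as a form field named `queries`). The function names are hypothetical, and the code only builds the request body; it does not contact any service.

```python
import json
from urllib.parse import urlencode


def build_reconciliation_queries(names, type_qid=None, limit=3):
    """Build a query batch keyed q0, q1, ... (hypothetical helper).

    Each entry holds the text to match, an optional type constraint
    (for example "Q5" for humans on Wikidata) and a candidate limit.
    """
    queries = {}
    for i, name in enumerate(names):
        query = {"query": name, "limit": limit}
        if type_qid is not None:
            query["type"] = type_qid
        queries[f"q{i}"] = query
    return queries


def encode_form_body(queries):
    """Serialise the batch as a form-encoded body with one 'queries' field."""
    return urlencode({"queries": json.dumps(queries)})


if __name__ == "__main__":
    batch = build_reconciliation_queries(["Tove Jansson", "Alvar Aalto"], type_qid="Q5")
    print(encode_form_body(batch))
```

The resulting body would be sent with a POST request to whichever reconciliation endpoint the tool is configured for; the choice of endpoint is left open here.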

Google Sheets + Wikipedia and Wikidata tools

The Google Sheets reconciliation tool brings reconciliation to users of everyday tools. The advantage is that it can easily be used by anyone who is accustomed to using Excel or Google Sheets. The biggest disadvantage is that development has halted in favor of other options. The tools were initially created to search Wikipedia and find the related Wikidata items, so they are less powerful when Wikidata itself is the primary target.

Mix'n'Match

More

Considerations

Sustainability

  • Community maintaining and fostering the tool(s).
  • Use of openly available APIs and technologies to allow anyone to contribute to the tool or make their own version of it.
  • Secure funding is possible from diverse sources.
  • Support multiple configurations to avoid the pitfalls of monopolies and governance challenges.

Ease of use

Contributing to Wikimedia projects should be easier for users less familiar with the intricacies of the Wikimedia environment.

  • Using reconciliation with familiar tools that are preferably free and hosted online.
  • Preconfigured choices, so that users do not need to know all the details under the hood.

User interfaces

Most of the tools try to manage reconciliation in tiny textual dialogs, while the information itself is multifaceted and multimodal. The user interfaces should make bold use of screen real estate and take advantage of the affordances of the rich, linked data available through the Wikimedia projects.

Harness the benefits of open source

  • Tools developed can be made to serve multiple purposes and user bases.
  • Several different technologies can be combined in lightweight tools. Search can use the reconciliation API, the MediaWiki API, SPARQL etc., or combinations of these. Different tools can be linked to work with a reconciliation tool, for example by plugging a reconciliation interface into Google Sheets or OpenRefine.
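As an illustration of how lightweight such a combination can be, the sketch below builds search requests for two of the backends mentioned: the MediaWiki Action API (`wbsearchentities` on Wikidata) and the Wikidata Query Service (SPARQL). The helper names and the exact-label SPARQL query are assumptions made for this example, not part of any existing tool, and no request is actually sent.

```python
from urllib.parse import urlencode

ACTION_API = "https://www.wikidata.org/w/api.php"
SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"


def action_api_search_url(term, language="en", limit=5):
    """URL for a wbsearchentities label/alias search (hypothetical helper)."""
    params = {
        "action": "wbsearchentities",
        "search": term,
        "language": language,
        "type": "item",
        "limit": limit,
        "format": "json",
    }
    return ACTION_API + "?" + urlencode(params)


def sparql_label_query(label, language="en", limit=5):
    """SPARQL query for items whose label matches exactly (illustrative only)."""
    # Escape backslashes and double quotes so the label is a valid string literal.
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    return (
        f'SELECT ?item WHERE {{ ?item rdfs:label "{escaped}"@{language} }} '
        f"LIMIT {limit}"
    )


def sparql_url(query):
    """URL for running a query on the SPARQL endpoint with JSON results."""
    return SPARQL_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


if __name__ == "__main__":
    print(action_api_search_url("Tove Jansson"))
    print(sparql_url(sparql_label_query("Tove Jansson")))
```

A tool could send the same search term to both URLs and show the candidates side by side; which backends to query is a configuration choice.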

AvoinGLAM reconciliation initiative

AvoinGLAM has experimented with reconciliation interfaces with the following design principles.

  • The tool would be full-screen and use the screen real estate to display data in several ways: Wikimedia project pages and content based on the Wikidata ID, external web pages based on authority IDs from Wikidata or the dataset, and a map with coordinates from the dataset or the Wikidata candidate items. Further configurations can be explored.
  • It would sit on top of different tools for data manipulation, for example OpenRefine, Google Sheets, Open Data Editor or other spreadsheet apps, and keep the exchange of data between the environments to a minimum.
  • It could combine the results from different matching methods and APIs, such as Wikimedia Reconciliation API, MediaWiki Action API, REST API, SPARQL, SDC, image hashes, SQL (for categories or links, for example).
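The last principle, combining results from several matching methods, could look roughly like the sketch below. It is not the prototype's implementation: the `Candidate` class, the source names and the ranking rule (candidates proposed by more sources first, then by best score) are all assumptions chosen for illustration. Scores from different services are not necessarily comparable, so a real tool would likely need to normalise them.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """A candidate item merged from one or more sources (hypothetical type)."""
    qid: str
    best_score: float = 0.0
    sources: set = field(default_factory=set)


def merge_candidates(results_by_source):
    """Merge per-source (qid, score) lists into one ranked candidate list.

    results_by_source maps a source name to a list of (qid, score) pairs.
    Candidates found by more sources rank first; ties are broken by the
    best score seen, then by QID to keep the order stable.
    """
    merged = {}
    for source, results in results_by_source.items():
        for qid, score in results:
            candidate = merged.setdefault(qid, Candidate(qid))
            candidate.best_score = max(candidate.best_score, score)
            candidate.sources.add(source)
    return sorted(
        merged.values(),
        key=lambda c: (-len(c.sources), -c.best_score, c.qid),
    )


if __name__ == "__main__":
    # Made-up QIDs and scores, purely to show the call shape.
    ranked = merge_candidates({
        "reconciliation": [("Q1", 95.0), ("Q2", 40.0)],
        "action_api": [("Q1", 80.0), ("Q3", 90.0)],
    })
    for candidate in ranked:
        print(candidate.qid, candidate.best_score, sorted(candidate.sources))
```

Keeping track of which sources proposed each candidate also lets the interface show the user why an item was suggested.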

User interface sketches

Prototype

The prototype is an exploration: a working proof of concept with a few functionalities. With it, we seek collaborators to develop it further and integrate it with a few tools. The project will be part of our broader contributions to the cultural commons, aiming to work more collaboratively across institutional barriers to support the stewardship of cultural resources online.