Structured data for GLAM-Wiki/Roundtripping/KMB

From Meta, a Wikimedia project coordination wiki

Metadata roundtripping: Wikidata - Wikicommons - Runestone pictures

by Magnus Sälgö (user: salgo60, Twitter: salgo60, ORCID: 0000-0003-2568-267X)
GitHub: salgo60/Litteraturbanken_wd_runes, start: 3 Mar 2021
This paper was presented at the 2021 LD4 Conference on Linked Data in Libraries (https://ld42021.sched.com/); see the video at https://www.youtube.com/watch?v=GeDXzInR_mA

This project connects Wikidata (Q2013), Wikimedia Commons (Q565), RAÄ, the Swedish National Heritage Board (Q631844), Uppsala University (Q185246) and the Swedish Literature Bank (Q10567910). It uses Wikidata, Wikicommons and Structured Data on Commons, which runs on Wikibase and implements semantic interoperability with Wikidata, easily accessible with SPARQL federation.

Background


A new project, Everlasting Runes (Q105378723), has been created by the Swedish National Heritage Board (Q631844) and Uppsala University (Q185246); see https://app.raa.se/open/runor and urn:nbn:se:uu:diva-354747, “Everlasting Runes”: A Research Platform and Linked Data Service for Runic Research - link grant.

Crowdsourcing: starting with low-hanging fruit where we can easily trust each other

Runestone sign about DR 279
Example of how a book published in 1958 gives a runic inscription its unique persistent identifier, Vg 92, but also references the numbering scheme of the book Bautil, published in 1750: B 936. Wikidata stores this relation in Västergötlands runic inscription 92 (Q105707871), and in Wikicommons we have semantic interoperability with Wikidata: to find pictures depicting Q105707871, search for haswbstatement:P180=Q105707871

One problem we see with crowdsourcing is that domain experts do not always trust what a crowdsourcing community delivers. Runic inscriptions are a never-ending interest of Swedish researchers. I guess the "rocket scientists of the 17th century" started this work; they identified the "identification problem" early and understood that persistent unique identifiers are needed to avoid metadata debt, something the GLAM sector still does not always deliver in 2021 ;-) (see the sad example from Europeana and the artist "Carl Larsson"). Scanned books from the 1700s about runic inscriptions are now available online with those persistent unique identifiers, as are 100-year-old books with another numbering system, which in Wikidata is Scandinavian Runic-text Database ID (P1261). It is therefore rather easy for a crowdsourcing community to identify which runic inscriptions a book talks about, as the experts have already added the unique persistent number to the object. This means fewer trust issues between crowdsourcing people and domain experts, and in this case the photo itself can confirm the metadata about what the picture depicts.

Runic inscriptions and structured data on Commons are a good example of how Wikipedia can add value with crowdsourcing. Taking pictures of runestones is a rather easy task that needs no major domain skills, as a runic inscription often has a sign next to it 😉 with the unique persistent number (see the example sign for DR 279). In old scanned books, runic inscriptions are often mentioned by the unique numbering scheme, which is easy to "translate" to the Wikidata Q-number using a property like Scandinavian Runic-text Database ID (P1261).
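The "translation" from a printed identifier to a Q-number, and from a Q-number to pictures, can be sketched in Python. This is a minimal sketch, not the project's own code: the function names are hypothetical, and it assumes that the P1261 value in Wikidata is stored in the same form as on the sign (for example "Vg 92"). It only builds the query and search strings; nothing is sent over the network.

```python
from urllib.parse import urlencode

P_DEPICTS = "P180"     # depicts
P_RUNIC_ID = "P1261"   # Scandinavian Runic-text Database ID


def qid_lookup_query(signum: str) -> str:
    """SPARQL for the Wikidata query service: which item carries this ID?

    Assumes the ID is stored exactly as printed, e.g. "Vg 92".
    """
    # Escape backslashes and quotes so the ID stays a valid SPARQL literal.
    escaped = signum.replace("\\", "\\\\").replace('"', '\\"')
    return f'SELECT ?item WHERE {{ ?item wdt:{P_RUNIC_ID} "{escaped}" . }}'


def depicts_search(qid: str) -> str:
    """Commons search keyword for files whose structured data depicts qid."""
    return f"haswbstatement:{P_DEPICTS}={qid}"


def commons_search_url(qid: str) -> str:
    """Commons search URL for the same keyword (URL-encoded)."""
    return "https://commons.wikimedia.org/w/index.php?" + urlencode(
        {"search": depicts_search(qid)}
    )


if __name__ == "__main__":
    print(qid_lookup_query("Vg 92"))
    print(depicts_search("Q105707871"))
    print(commons_search_url("Q105707871"))
```

The two steps are kept separate on purpose: the lookup runs against Wikidata, while the search keyword is evaluated by Wikicommons.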

What has been done


The SDC project (video), Structured Data on Commons, has been implemented and has its own Wikibase. This opens up new possibilities for data roundtripping with pictures: a single SPARQL query can get information from both Wikidata and pictures in Wikimedia Commons. Access to machine-readable data about WD objects, and about the objects in Wikicommons that depict those WD objects, will be a big step towards more advanced data roundtripping. It also makes it possible to trace the original source of a picture in an easy, machine-readable way, and to compare whether the depicts (P180) statement in Wikicommons describes the same THING as an external source states the picture depicts.

Uploading pictures to Wikicommons and adding depicts in SDC may be a way for a community like Wikipedia to start adding value for domain specialists, by letting them decide whether the pictures add value for them or not. Long term, maybe we can use the knowledge of domain specialists who confirm that a picture they hold, and that we also store in Wikicommons, depicts what we state in Wikicommons. Maybe we need signed statements in Wikicommons (see EPIC) so we can see that the depicts statement in Wikicommons is what the uploading institution stated.

The result is that with a SPARQL query (8725 pictures / same on a map) we can easily fetch Wikicommons pictures of runestones together with the RAÄ id for each runestone in the application Everlasting Runes (Q105378723), and also see whether a picture was originally uploaded from RAÄ.
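A federated query of this kind could look roughly like the sketch below, written as a Python helper that only builds the query text. It is an illustration under assumptions, not the query linked above: it assumes the query is sent to the Commons query service (WCQS), that the `wdt:` prefix there addresses the structured data on the file, and that "connected to RAÄ" can be approximated by the item having a K-samsök/Swedish Open Cultural Heritage URI (P1260). The function name is hypothetical.

```python
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def runestone_pictures_query(limit: int = 100) -> str:
    """Build a SPARQL query joining Commons files with Wikidata runestones."""
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    lines = [
        "SELECT ?file ?runestone ?signum ?raa WHERE {",
        "  # Wikidata side: items with a runic ID and an RAA/K-samsok URI",
        f"  SERVICE <{WDQS_ENDPOINT}> {{",
        "    ?runestone wdt:P1261 ?signum ;",
        "               wdt:P1260 ?raa .",
        "  }",
        "  # Commons side: files whose structured data depicts that item",
        "  ?file wdt:P180 ?runestone .",
        "}",
        f"LIMIT {limit}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(runestone_pictures_query(10))
```

The `SERVICE` block is what makes this a federated query: the runestone items and their identifiers come from Wikidata, while the depicts (P180) statement is read from the files in Wikicommons.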


What I have done

  1. Created some more Wikidata objects for runestones
  2. Wikicommons: started adding depicts to pictures of runestones in SDC - about 8200 pictures
  3. Wikicommons: started moving the KMBid information to SDC - about 34 600 pictures (not just runestones)
  4. Created a SPARQL query that finds all runestones in Wikidata that are connected to RAÄ, and all the pictures in Wikicommons that depict any of those runestones in SDC
    1. Pictures / Map - 8200 pictures
    2. Pictures of runestones that are from KMB/RAÄ / on a map - 4162 pictures
      1. Pictures of runestones with source of file (P7482) = original creation by uploader (Q66458942) on a map - 2727 pictures
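The narrowing steps in the list above amount to adding one more condition on the file. A minimal sketch, assuming the same setup as a federated WCQS query and using P1260 as a stand-in for "connected to RAÄ"; the helper name is hypothetical and the counts it would return are not reproduced here.

```python
import re
from typing import Optional

P_SOURCE_OF_FILE = "P7482"   # source of file
Q_OWN_WORK = "Q66458942"     # original creation by uploader


def picture_count_query(source_qid: Optional[str] = None) -> str:
    """Count runestone pictures, optionally restricted by source of file."""
    if source_qid is not None and not re.fullmatch(r"Q[1-9]\d*", source_qid):
        raise ValueError(f"not a Q-number: {source_qid!r}")
    lines = [
        "SELECT (COUNT(DISTINCT ?file) AS ?pictures) WHERE {",
        "  SERVICE <https://query.wikidata.org/sparql> {",
        "    ?runestone wdt:P1260 ?raa .",
        "  }",
        "  ?file wdt:P180 ?runestone .",
    ]
    if source_qid is not None:
        # Extra condition on the file itself, e.g. "own work by uploader".
        lines.append(f"  ?file wdt:{P_SOURCE_OF_FILE} wd:{source_qid} .")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(picture_count_query())            # all runestone pictures
    print(picture_count_query(Q_OWN_WORK))  # only uploader's own work
```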


Issues

  1. SPARQL using WCQS is only updated once a week and is in beta; see SPARQL in the shadow of Structured Data on Commons
  2. It is more difficult than expected to find "all" Wikicommons pictures depicting a Rune - issues/6
  3. SDC is new and I guess most pictures will not have depicts; I guess we need better support for adding depicts in Wikicommons...
  4. Today we have no dialogue with RAÄ/Uppsala about how we describe a picture, e.g. that a picture is taken from behind - issues/12 (maybe not necessary)
  5. Some changes need to be made to how I have added data to SDC; see the Kanban board and the last status report (Swedish)

Next step

Metadata problem in Europeana: runestone S_FBM_photo_2M16_S_0096_107_87 lacks a reference, by identifier, to earlier works such as the book from 1750. The persistent unique identifier Bautil 980 = Wikidata runic inscription Vg 130 (Q29576301), and in Wikicommons we have semantic interoperability with Wikidata: to find pictures depicting Q29576301, search for haswbstatement:P180=Q29576301

Smaller steps


Outside the scope

  1. New Wikidata property: a WD property proposal for a dedicated "Everlasting runes" property has been written, as I think we lose functionality when one aggregator (Wikidata) links to another aggregator, K-samsök/Swedish Open Cultural Heritage URI (P1260). That is outside the scope of this initiative, and I feel it is more in the interest of institutions outside Wikidata to decide how and if they will use Wikidata for runestones and how they would like to link to "Everlasting runes".
  2. en:Wikipedia/sv:Wikipedia: the added Wikidata objects/properties will not be used to link to the Swedish Literature Bank or Everlasting runes from articles in en:Wikipedia or sv:Wikipedia
  3. No "Wikidata driven" templates using the added data will be developed in en:Wikipedia or sv:Wikipedia

RAÄ benefits with Dataroundtripping


TBA ?!??!


Swedish Literature Bank (Q10567910) is a Swedish project scanning all Swedish books from the 19th century. They have also started to create a literature map (litteraturkartan.se) with places related to literature.


Swedish Literature Bank benefits with Dataroundtripping


TBA Swedish Literature Bank ?!??!

Next step Dataroundtripping Swedish Literature Bank <-> Wikidata - position in a book and location on a map


I would like to see the map application (litteraturkartan.se) and Wikidata start sharing information.

Challenges I see

Misc


GITHUB


Entity Schema


To get better quality in Wikidata we have also developed a schema for runestones, EntitySchema:E290. My wish is that we will also use schemas in Wikicommons for runestone pictures.

Runestones and unique persistent identifiers since 1750


One lesson learned is that runestone research has been done in Sweden since before 1700. As early as 1750 the book Bautil (Q10427451) was published with its own numbering scheme, Bautil 1-1173; see book and map (work in progress). In 1900 a multi-volume catalogue, en:Sveriges_runinskrifter, was published using the numbering scheme we find in the Wikidata property Scandinavian Runic-text Database ID (P1261). In this catalogue we can see that they reference the older numbering scheme from Bautil and many more, i.e. it feels like "early Linked data" ;-) All those persistent identifiers make it rather easy for me, as a non domain specialist, to connect pages in those books to Wikidata objects about the runestones.
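How such cross-references between numbering schemes behave like linked data can be shown with a small lookup table. This is only an illustrative sketch: the class and function names are hypothetical, the table holds just the two inscriptions mentioned on this page, and a real mapping would be read from Wikidata rather than typed in.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class RunicInscription:
    signum: str      # identifier used in Sveriges runinskrifter, e.g. "Vg 92"
    bautil_no: int   # number in Bautil (1750), range 1-1173
    qid: str         # Wikidata item


# Only the two examples from this page; not a complete list.
KNOWN = (
    RunicInscription("Vg 92", 936, "Q105707871"),
    RunicInscription("Vg 130", 980, "Q29576301"),
)


def parse_bautil(ref: str) -> int:
    """Parse references such as "B 936" or "Bautil 980" into the number."""
    match = re.fullmatch(r"\s*(?:B|Bautil)\s*(\d+)\s*", ref)
    if match is None:
        raise ValueError(f"not a Bautil reference: {ref!r}")
    number = int(match.group(1))
    if not 1 <= number <= 1173:
        raise ValueError(f"Bautil numbers run from 1 to 1173, got {number}")
    return number


def by_bautil(ref: str) -> RunicInscription:
    """Follow an old Bautil reference to the modern identifier and Q-number."""
    number = parse_bautil(ref)
    for inscription in KNOWN:
        if inscription.bautil_no == number:
            return inscription
    raise KeyError(ref)


if __name__ == "__main__":
    print(by_bautil("B 936"))
    print(by_bautil("Bautil 980"))
```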


Lesson learned

  • Persistent identifiers, which the runestone domain has had since before 1750, make information easier to access and understand, also for a non domain expert like me; compare with me trying to help FactGrid
  • If people as early as 1750 could use a numbering scheme and unique persistent identifiers, why can't we in 2021 start sharing pictures with unique persistent identifiers from all sources and, like the book from 1958, reference other sources using those persistent unique identifiers? I.e. when we upload a picture, also store its history so that we can "backtrack its original source" and find out whether any of the "previous locations" of the picture has added useful, trusted metadata after the picture was uploaded...
    • Recommended reading DOI: 10.5334/dsj-2019-054 "Proper Attribution for Curation and Maintenance of Research Collections: Metadata Recommendations of the RDA/TDWG Working Group"

Notebook examples - SPARQL


More reading
