Wikidata For Wikimedia Projects/Research/Statement Signals

From Meta, a Wikimedia project coordination wiki

Statement Signals: Measuring Wikidata Usage on other Wikis

Throughout Q2 and Q3, we worked with a data analytics consultant to investigate, research and produce a report on potential metrics that can be used for measuring Wikidata usage across Wikimedia projects.
Wikidata is widely used in many different ways, but the types and frequency of that usage are poorly documented and difficult to measure.

Read the full report on Wikimedia Commons or click on the PDF thumbnail below.

Goals


  • Gain insights from trace data on Wikimedia servers of Wikidata usage across other wikis.

  • Establish a series of initial definitions for potential metrics of Wikidata usage on other wikis.

  • Develop code and notebooks to output, visualise and analyse these newly-defined metrics.

The scope was limited to Wikidata statements used on content pages.
Other Wikidata uses (such as sitelinks and lists) have not been included but are suggested for potential future projects.


Background Summary

Despite Wikidata's widespread use in other Wikimedia projects, the details and scale of its usage are neither well documented nor thoroughly studied. This is partly due to the technical complexity of Wikidata and the code integrations needed to make it function in the other projects. Our data analytics consultant has sought to equip us with quantitative data to establish a series of baseline metrics that can be used to better evaluate Wikidata usage.

This research is in part exploratory, as the few past efforts to measure Wikidata usage have been limited and have yielded few reliable metrics.

This research aims to lay a foundation against which future projects and work can be evaluated, in order to track the increasing usage of Wikidata in the other Wikimedia projects.

The attached report includes findings of the current measurable prevalence and proliferation of Wikidata in Wikipedia and other Wikimedia project pages. Section 12 in particular outlines the recommended steps for improving the ability to measure Wikidata usage.


Main Takeaways

Broadly, the report found the following:

  • Widespread but uneven Wikidata usage:

Wikidata statements are heavily used on content pages across multiple Wiki-projects, but the distribution is uneven.

  • Differences by project type:

Wikidata usage varies greatly between projects (Wikipedia, Wikisource, Wikivoyage etc.), likely due to differences in purpose, community practices and policy, available resources, and technical understanding.

  • Variation within project types:

The range of types of Wikidata usage within a project type varies greatly between instances (for example, Catalan Wikipedia uses Wikidata very widely compared to many other language Wikipedias).

  • Growth trend:

Usage of Wikidata statements is generally increasing, based on recent data (though only covering four months).

  • The results generally support the hypothesis that variation in Wikidata usage is mainly due to differences in communities and page content, particularly across:
    • Project type: Encyclopedia information vs. dictionary vs. travel listing (noted above).
    • Namespace: Article/content page vs. discussion or documentation page.
    • Community: Community and organisational policy, linguistic and cultural differences.

  • Additional notes: The {{#statements}} parser function sees relatively low use compared to {{#property}}, and property labels are rarely used instead of property IDs.
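
As an illustration of the two signals in the note above, the following is a minimal sketch of how calls to `{{#property}}` and `{{#statements}}` might be tallied in a page's wikitext, split by whether the first argument is a property ID (such as `P569`) or a property label. It is not the method used in the report; the function name `count_parser_calls`, the regular expressions and the sample wikitext are all assumptions made for this example. A plain regular expression like this will miss nested templates and any Wikidata access that happens inside Lua modules.

```python
import re
from collections import Counter

# Matches calls such as {{#property:P569}} or {{#statements:P570|from=Q937}}.
# Group 1 is the parser function name, group 2 is its first argument.
# (Hypothetical pattern for illustration; it does not handle nested templates.)
PARSER_CALL = re.compile(
    r"\{\{\s*#(statements|property)\s*:\s*([^|}]+)", re.IGNORECASE
)
PROPERTY_ID = re.compile(r"P\d+", re.IGNORECASE)


def count_parser_calls(wikitext: str) -> Counter:
    """Count parser function calls, keyed by (function, 'id' or 'label')."""
    counts = Counter()
    for function, argument in PARSER_CALL.findall(wikitext):
        # A first argument like "P569" is a property ID; anything else is
        # treated here as a property label.
        kind = "id" if PROPERTY_ID.fullmatch(argument.strip()) else "label"
        counts[(function.lower(), kind)] += 1
    return counts


# Made-up sample wikitext, not taken from any real page.
sample = (
    "Born {{#property:P569}} in {{#property:P19}}. "
    "Died {{#statements:P570|from=Q937}}; "
    "occupation: {{#property:occupation}}."
)
counts = count_parser_calls(sample)
for key, value in sorted(counts.items()):
    print(key, value)
```

Running the same count over many pages and grouping by wiki or namespace would give a rough per-project comparison, under the limitations noted above.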

An expanded and more comprehensive version of the takeaways is available in the report below:

This exploratory project offers an initial proposal for metrics about Wikidata usage on other wikis, to support Wikimedia Deutschland’s Wikidata for Wikimedia Projects Team.

Glossary

For a full glossary of terms mentioned here, please see page 65 of the full report.

  • Trace data: server data that has a human origin, and can be used as a record of activity.
  • Statements: provide structured data about an entity. They describe specific facts or claims about that entity, along with additional context or supporting information like references.
  • Snaks: the building blocks of a statement. A statement comprises one or more snaks:
    • Main snak: The core fact or claim, such as a property-value pair.
      • Example: The entity Albert Einstein (Q937) has a Date of Birth (P569) property with a value (14 March 1879).
    • Qualifiers (optional): Extra snaks that provide context for the claim.
    • References (optional): Lists of snaks that give sources for the claim.
  • Content pages: A Wikimedia project page (excluding user-interface elements), usually but not exclusively in the main (or article) namespace, containing information intended for general public consumption.
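
The relationship between statements and snaks defined in this glossary can be sketched as a small data structure. This is a simplified illustration, not the Wikibase data model or its JSON format: the class names `Snak` and `Statement`, their fields, and the ISO date string used as the value are assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snak:
    """A single property-value pair (hypothetical, simplified form)."""
    property_id: str
    value: str


@dataclass
class Statement:
    """A claim about an entity: one main snak plus optional context."""
    entity_id: str
    main_snak: Snak
    # Optional extra snaks that provide context for the claim.
    qualifiers: list[Snak] = field(default_factory=list)
    # Optional references; each reference is itself a list of snaks.
    references: list[list[Snak]] = field(default_factory=list)

    def snak_count(self) -> int:
        """Total number of snaks: main snak, qualifiers and reference snaks."""
        reference_snaks = sum(len(reference) for reference in self.references)
        return 1 + len(self.qualifiers) + reference_snaks


# The glossary example: Albert Einstein (Q937), Date of Birth (P569).
birth = Statement(
    entity_id="Q937",
    main_snak=Snak(property_id="P569", value="1879-03-14"),
)
print(birth.main_snak.property_id, birth.snak_count())
```

Qualifiers and references are left empty here because the glossary example only gives the main snak.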


List of Statement Signals


To see a complete list of statement signals, click the PDF link below:

External contribution

Andrew Russell Green (User:AndyRussG, formerly User:AGreen (WMF)) collaborated with the team to produce this report.