
Wikidata For Wikimedia Projects/Research

From Meta, a Wikimedia project coordination wiki

State of Wikidata usage on other Wikis


Our UX designer and researcher, Elisha Cohen, has created a comprehensive foundational research plan; some early high-level findings are presented below. This review gave us a basic understanding of how Wikidata integrations work and started to identify gaps and areas for potential improvement. The information gathered establishes a knowledge base that we can expand and improve. We can use it to onboard future team members, to find redundant or duplicate documentation that should be updated for the community, and to reveal further gaps or areas of opportunity.


Primary Goals



1: Gain and document a comprehensive, shared understanding of the current state of Wikidata integrations in the other Wikimedia projects and how the data flow works.

2: Learn about editor motivations, existing workflows, and problems encountered while using Wikidata’s data on Wikipedia and the sibling projects.

Supporting Goals

  • Create a foundation and knowledge base the team will build on for the future
  • Arm ourselves with the data needed to make project-wide scoping decisions and to choose target user groups
  • Identify first, small projects for onboarding developers


Click on this thumbnail to open the PDF of the interim progress report on the state of Wikidata usage.

High-Level Findings



These are some early insights into how Wikimedians use Wikidata on the sibling projects.

The progress report is available on Commons, or you can click the thumbnail on the right of this page.

There are three main categories of Wikidata usage for Wikimedia projects:

  1. Sitelinks (slide 17)
    Wikidata items link directly to the corresponding content on Wikimedia projects.
    The Wikidata item page (Q#) acts as a central repository, facilitating language selection and platform switching (e.g., from Wikipedia to Wikisource or Wikivoyage).
  2. Embedding Statements (slide 18)
    Data on Wikidata is structured and language-neutral, allowing reuse across different language versions.
    Centralized data reduces redundancy; updates in one location automatically reflect everywhere it is used.
    Data is primarily invoked through Modules and Templates using parser functions and Lua functions, commonly for Authority Control, Infoboxes, Coordinates, and References.
    Wikidata also needs to track where its data is being used, so that cached pages can be refreshed and the displayed data updated when it changes.
    These changes must also be shown to editors, particularly for anti-vandalism work and in watchlist entries.
  3. Supporting editors (slide 21)
    Wikidata helps in article creation by generating worklists and assessing notability through the number of existing sitelinks.
    Tools such as MBabel can generate rudimentary 'stub' articles from Wikidata facts using natural-language generation, aiding editors in article creation.
    When Wikidata data is combined with content from other Wikimedia projects and visualised, it can reveal knowledge gaps and help inspire editors.
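
The three categories above can be illustrated with a small Python sketch. It is not how MediaWiki templates or Lua modules actually access Wikidata; it only assumes that an item is available as a dictionary shaped roughly like Wikidata's entity JSON (with `sitelinks` and `claims` keys). The sample items, the IDs (`Q_A`, `P_EXAMPLE`), and the helper names `sitelink_title`, `statement_values`, and `worklist` are all hypothetical. The rank handling (prefer "preferred" statements, otherwise fall back to "normal", never use "deprecated") is a simplified reading of how ranked statements are usually selected.

```python
# Hypothetical, heavily trimmed sample shaped like Wikidata entity JSON.
# All IDs (Q_A, P_EXAMPLE, ...) and values are made up for illustration.
def _statement(value, rank="normal"):
    return {
        "rank": rank,
        "mainsnak": {
            "snaktype": "value",
            "datavalue": {"type": "string", "value": value},
        },
    }


def _links(*sites):
    return {site: {"site": site, "title": f"Example on {site}"} for site in sites}


SAMPLE = {
    "Q_A": {
        "sitelinks": _links("enwiki", "dewiki", "frwiki"),
        "claims": {
            "P_EXAMPLE": [
                _statement("older value"),
                _statement("current value", rank="preferred"),
                _statement("wrong value", rank="deprecated"),
            ]
        },
    },
    "Q_B": {"sitelinks": _links("dewiki", "frwiki", "eswiki", "jawiki"), "claims": {}},
    "Q_C": {"sitelinks": _links("dewiki"), "claims": {}},
    "Q_D": {"sitelinks": _links("frwiki", "eswiki"), "claims": {}},
}


def sitelink_title(entity, site):
    """Category 1: return the page title an item links to on one site, or None."""
    link = entity.get("sitelinks", {}).get(site)
    return link["title"] if link else None


def statement_values(entity, property_id):
    """Category 2: return the values a template might embed for one property.

    Preferred-rank statements win; otherwise normal-rank statements are used.
    Deprecated statements are never returned.
    """
    statements = entity.get("claims", {}).get(property_id, [])
    preferred = [s for s in statements if s.get("rank") == "preferred"]
    candidates = preferred or [s for s in statements if s.get("rank") == "normal"]
    return [
        s["mainsnak"]["datavalue"]["value"]
        for s in candidates
        if s["mainsnak"].get("snaktype") == "value"
    ]


def worklist(entities, target_site, min_sitelinks=2):
    """Category 3: list items missing on target_site, most sitelinks first.

    The sitelink count is used here as a rough notability signal.
    """
    rows = [
        (entity_id, len(entity.get("sitelinks", {})))
        for entity_id, entity in entities.items()
        if target_site not in entity.get("sitelinks", {})
        and len(entity.get("sitelinks", {})) >= min_sitelinks
    ]
    return sorted(rows, key=lambda row: (-row[1], row[0]))
```

With the sample data, `worklist(SAMPLE, "enwiki")` lists the two items that have no English Wikipedia article but at least two sitelinks elsewhere, and `statement_values(SAMPLE["Q_A"], "P_EXAMPLE")` returns only the preferred-rank value.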


Additional Frameworks for understanding Wikidata integration


Data Flow: Editor workflows typically revolve around data flow, in which Wikidata's data is linked to or displayed on the sibling projects while remaining editable on Wikidata. Changes made on Wikidata propagate automatically to wherever the data is displayed, and a log of changes is tracked and presented in a readable format for editors, patrollers, and anti-vandalism roles.
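
As a rough illustration of this flow, the following Python sketch models usage tracking as a simple lookup from item to pages. It is a toy model under stated assumptions, not a description of how the Wikibase software implements tracking: the `UsageTracker` class, its method names, and the log line format are all hypothetical. The leading 'D' in each log line only mimics the marker that recent changes and watchlists use for Wikidata entries.

```python
from collections import defaultdict


class UsageTracker:
    """Toy model: remember which pages display data from which item."""

    def __init__(self):
        self._pages_by_entity = defaultdict(set)
        self.change_log = []

    def record_usage(self, page, entity_id):
        """Note that `page` displays data from `entity_id`."""
        self._pages_by_entity[entity_id].add(page)

    def entity_changed(self, entity_id, summary):
        """Return the pages whose cached copy is stale and log the change.

        One log line is written per affected page, so that editors and
        patrollers watching that page can see the Wikidata edit.
        """
        affected = sorted(self._pages_by_entity.get(entity_id, ()))
        for page in affected:
            self.change_log.append(f"D {page}: {entity_id} {summary}")
        return affected
```

For example, after `record_usage("Article A", "Q_A")`, a call to `entity_changed("Q_A", "statement changed")` returns `["Article A"]` as the list of pages to refresh and appends one 'D' line to `change_log`. An item that no page uses produces an empty list and no log entries.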

Namespaces: Certain types of use, such as embedding statements and visualisations, occur in the Main or Content namespaces. Uses in other, non-main namespaces include list generation. Currently, statements are inserted primarily through Templates, but in the future this may also be done via Wikifunctions.

Content insertion: How is the inserted content visually represented on the page? It can be displayed directly (as with embedded statements and visualisations) or integrated into the UI, such as the links in the language switcher or short descriptions in the search results. Sometimes it is not visible at all, as with tracking Categories.

Marking: Visible indicators are often used to show that content comes from Wikidata. Templates can include graphical indicators such as a pencil symbol (a shortcut link to edit the Wikidata statement), a miniature Wikidata logo, or an 'edit' link. In recent changes and watchlists, Wikidata entries are consistently marked with a 'D'. However, not all inserted content is marked: sitelinks and Short Descriptions are not.

Workflow complexity and frequency: Barriers to entry for editors using Wikidata on other projects vary, which affects how often they engage with it. Different uses require different levels of technical expertise. Sitelinks are usually the first interaction editors have with Wikidata (often without realizing that it is Wikidata they are interacting with). Intermediate uses vary widely across the Wikimedia projects but usually do not require specialized technical knowledge. Advanced workflows are the rarest because of their high barriers to entry: they require niche technical knowledge and familiarity with languages such as Lua and SPARQL.