Wikidata For Wikimedia Projects/Projects/cache createSchemaElement

From Meta, a Wikimedia project coordination wiki
Tracked in Phabricator:
Task T352019

Improve article loading time by caching SkinAfterBottomScriptsHandler::createSchemaElement

Background

Every Wikipedia article page contains a JSON-LD block in its HTML source. JSON-LD structures linked data in article pages so that search engines and third-party consumers can use it more easily.
Currently, building the JSON-LD schema executes two database queries, one for each of the functions described below: getFirstRevision and TermLookup::getDescription().

The Problem

These function calls contribute significantly to the request time, with building the JSON-LD schema taking ~5% of the total request time when loading an article page.

The Solution

Function 1: Cache getFirstRevision value for Linked Data Schema

The Wikibase hook handler SkinAfterBottomScriptsHandler::createSchema runs an expensive function, getFirstRevision, every time a wiki article is loaded. This accounts for approximately 2.5% of a page's total load time on every page read.

The function determines the first (oldest) revision of the article by retrieving all revisions, sorting them by date and ID, and selecting the oldest.

Caching this value means the expensive function is called less often, which reduces server load. The cached value will be stored in the parser cache output.
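
The idea can be illustrated with a minimal Python sketch. It is not the Wikibase implementation (which is PHP); the names Revision, PageCache, get_first_revision and first_revision_timestamp are hypothetical, and the in-memory list stands in for the revision table. It only shows the pattern described above: find the oldest revision by date and ID, then store the result alongside the page's cached output so later reads skip the lookup.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Revision:
    rev_id: int
    timestamp: str  # "YYYYMMDDHHMMSS" format, which sorts correctly as text


@dataclass
class PageCache:
    """Hypothetical stand-in for the cached parser output of one page."""
    data: dict = field(default_factory=dict)
    expensive_lookups: int = 0  # counts how often the slow path ran


def get_first_revision(revisions):
    """The expensive step: order by (timestamp, rev_id) and take the oldest."""
    return min(revisions, key=lambda r: (r.timestamp, r.rev_id))


def first_revision_timestamp(cache, revisions):
    """Return the cached timestamp, computing it only on a cache miss."""
    if "first_revision_timestamp" not in cache.data:
        cache.expensive_lookups += 1
        oldest = get_first_revision(revisions)
        cache.data["first_revision_timestamp"] = oldest.timestamp
    return cache.data["first_revision_timestamp"]


# Made-up revisions for illustration only.
revisions = [
    Revision(rev_id=30, timestamp="20240301120000"),
    Revision(rev_id=10, timestamp="20230115080000"),
    Revision(rev_id=20, timestamp="20230115080000"),  # same time, higher ID
]
cache = PageCache()
first_read = first_revision_timestamp(cache, revisions)   # slow path
second_read = first_revision_timestamp(cache, revisions)  # served from cache
```

In this sketch the second read never touches the revision list, which is the saving the caching is meant to provide on repeated page views.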

Development

Development on this task started December 2024.
After deployment, some edge cases were discovered: a page move, a page preview, or the creation of a new page could lead to an error because the first revision timestamp had not been stored yet. A patch is being prepared in task T383657.

Deployment

The patch was added to MediaWiki version MW-1.44.0-wmf.12 and was deployed to all wiki groups on January 16, 2025.

The patch for the edge cases mentioned in task T383657 will be merged into MW-1.44.0-wmf.18 and deployed to all wikis by February 27, 2025.


Function 2: Cache getDescription value for Linked Data Schema

The function TermLookup::getDescription() is invoked whenever a wiki page is read. It queries and returns the Wikibase entity (Wikidata item) ID, the description, and the language code. These values rarely change, so requesting them every time a wiki page is loaded is usually unnecessary.

Caching this information significantly reduces how often the function is invoked, and with it the load on the Wikimedia servers.

The cached value will be invalidated when any of the following occurs:

  • An edit is made to the client page (the page/article from which the function is invoked)
  • An edit is made to the Wikidata item description in the user language or the fallback language (English)
  • 30 days pass without either of the above happening
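
The three criteria above can be expressed as a single staleness check. The following Python sketch is only an illustration under stated assumptions: CachedDescription, is_stale and their parameters are hypothetical names, the entity ID and description are placeholder values, and the check runs at read time, whereas the deployed code may instead purge entries when an edit happens.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

MAX_AGE = timedelta(days=30)  # expiry from the third criterion
FALLBACK_LANGUAGE = "en"      # English fallback from the second criterion


@dataclass
class CachedDescription:
    entity_id: str
    description: str
    language: str
    cached_at: datetime


def is_stale(entry, now, user_language, page_edited_at, description_edited_at):
    """Decide whether a cached description must be refreshed.

    description_edited_at maps a language code to the time of the last
    edit to the item's description in that language (assumed structure).
    """
    # 1. The client page was edited after the value was cached.
    if page_edited_at is not None and page_edited_at > entry.cached_at:
        return True
    # 2. The description changed in the user language or the fallback language.
    for lang in (user_language, FALLBACK_LANGUAGE):
        edited = description_edited_at.get(lang)
        if edited is not None and edited > entry.cached_at:
            return True
    # 3. Neither happened, but the entry is at least 30 days old.
    return now - entry.cached_at >= MAX_AGE


# Placeholder values for illustration only.
entry = CachedDescription("Q1", "example description", "de",
                          cached_at=datetime(2025, 1, 1))
fresh = is_stale(entry, datetime(2025, 1, 10), "de", None, {})
after_page_edit = is_stale(entry, datetime(2025, 1, 10), "de",
                           datetime(2025, 1, 5), {})
other_language = is_stale(entry, datetime(2025, 1, 10), "de", None,
                          {"fr": datetime(2025, 1, 5)})
fallback_edit = is_stale(entry, datetime(2025, 1, 10), "de", None,
                         {"en": datetime(2025, 1, 5)})
expired = is_stale(entry, datetime(2025, 1, 31), "de", None, {})
```

Note that an edit to the description in an unrelated language (French, in this sketch) leaves the entry valid, because only the user language and the English fallback are consulted.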

Development

Development on this task started December 2024.

Deployment

The patch was merged into MediaWiki version MW-1.44.0-wmf.15 and will be deployed to all wiki groups on February 6, 2025.