Wikimedia CH/Grant apply/Documentation for 2025 Wikidata Graph Split
Infodata
- Name of the project: Documentation for 2025 Wikidata Graph Split
- Amount requested: 5000 CHF
- Type of grantee: Group
- Name of the contact: Lane Rasberry, user:bluerasberry
- Contact: lanerasberry [at] gmail.com
In case of questions, please write to grant [at] wikimedia.ch
The problem and the context
What is the problem you're trying to solve?
The Wikimedia Foundation is splitting the Wikidata SPARQL endpoint in 2025. Users of Wikidata:WikiCite tools and processes will be especially affected, and many user functions will break unless they are updated. The split is phased: some changes take effect around March 2025, and the split will be complete in January 2026. This project seeks to document the tools and processes which must be updated to remain usable. WikiCite metrics indicate that tens of thousands of unique annual users, hundreds of Wikimedia editors, and dozens of institutional partners use and develop affected WikiCite content, and would use documentation from this project to understand and react to the changes.
As background, Wikidata is undergoing major changes as the Wikimedia Foundation splits its data into two graphs. This is documented in Wikidata:SPARQL query service/WDQS graph split and in a May 2024 article in The Signpost. One graph will hold the data of the d:WikiCite project, and the other will hold everything else in Wikidata. The general effect is that all queries which seek to match Wikidata items for scholarly publications with any other Wikidata content will break unless rewritten. WikiCite is one of the most developed Wikidata projects, so this split is certain to affect thousands of users. A pilot split begins in March 2025, the transitional period ends after December 2025, and after that point everyone will need to use the two separate Wikidata graphs to access this content.
None of this is easy to explain to casual users. Additionally, the Wikimedia Foundation is planning major technical upgrades to Wikidata in 2027, and if Wikidata and WikiCite stakeholders are to give community feedback to guide those upgrades, then it is necessary to better describe the problems, possible solutions, and the various user communities which develop WikiCite content.
What is your solution to this problem (please explain the context and the solution)?
We will produce documentation which will support Wikidata users in understanding the graph split and identifying projects related to the graph split where they can contribute.
Project goals
- Establish and maintain an FAQ on the graph split
- Note what Wikidata activities change, and what tools, workflows, and users are affected
- Convert existing meeting documentation on the graph split into a narrative on Meta-Wiki
- Report the results of the Scholia user survey 2024 to give insight into the WikiCite user community and to guide a future survey
- Convert Scholia hackathon outcomes (October 2024, November 2024) into documentation associated with the overall WikiCite project
- Tell the story of the proposals for planned Wikidata developments, and why they matter
- Benchmark potential Blazegraph replacements (QLever, Virtuoso, any others)
- Load data dumps into a graph engine, then run federated queries with any number of SPARQL endpoints
- Re-assess what content is desirable and reasonable for the Wikidata community to curate in Wikidata
- Rewrite Wikidata queries for federation
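As an illustration of the last goal, here is a minimal sketch of what rewriting a query for federation can look like. It assumes the standard SPARQL 1.1 `SERVICE` clause. The function name `federate`, its parameters, and the example endpoint URL are assumptions made for this sketch and should be checked against Wikidata:SPARQL query service/WDQS graph split before use. The example joins people with an ORCID iD (P496), assumed here to live in the main graph, with works that list them as author (P50), assumed here to live in the scholarly graph.

```python
# Hypothetical helper: wraps the part of a query that needs the scholarly
# graph in a SPARQL 1.1 SERVICE clause, so the query can be sent to the
# main-graph endpoint and still reach scholarly article items.

PREFIXES = (
    "PREFIX wd: <http://www.wikidata.org/entity/>\n"
    "PREFIX wdt: <http://www.wikidata.org/prop/direct/>\n"
)

# Assumption: endpoint URL for the scholarly graph; verify before use.
SCHOLARLY_ENDPOINT = "https://query-scholarly.wikidata.org/sparql"


def federate(main_pattern: str, remote_pattern: str,
             remote_endpoint: str, select: str = "*", limit: int = 10) -> str:
    """Build a query whose remote_pattern is evaluated on remote_endpoint."""
    return (
        PREFIXES
        + f"SELECT {select} WHERE {{\n"
        + f"  {main_pattern}\n"
        + f"  SERVICE <{remote_endpoint}> {{\n"
        + f"    {remote_pattern}\n"
        + "  }\n"
        + "}\n"
        + f"LIMIT {limit}"
    )


# Before the split, both patterns could sit in one WHERE block.
# After the split, the article pattern is delegated to the scholarly graph.
query = federate(
    main_pattern="?author wdt:P31 wd:Q5 ; wdt:P496 ?orcid .",
    remote_pattern="?article wdt:P50 ?author .",
    remote_endpoint=SCHOLARLY_ENDPOINT,
    select="?author ?orcid ?article",
)
print(query)
```

The helper only builds the query text; sending it to an endpoint, and deciding which patterns belong to which graph, are left to the tool being updated.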
Project impact
How will you know if you have met your goals?
Do you have any goals or metrics around participation or content?
Project plan
Activities
Budget
Community engagement