User:Beetstra/Lockdown Immutable data on WikiData

From Meta, a Wikimedia project coordination wiki

Lockdown immutable data on WikiData

  • Problem: WikiData contains a lot of data that is never going to change and which is heavily re-used on client wikis. E.g. the boiling point of water is 100 °C (373.15 K). That is never going to change (it will only be determined more precisely). There is, even for a wiki, absolutely no reason for that data to be edited (and for some data: ever again). WikiData has a lot of such data. Any change to that data amounts to vandalism, as it changes correct data into something that is incorrect.
  • Who would benefit: Editors on all Wikipedias
  • Proposed solution: Create a 'reviewer' right on WikiData which enables trusted editors to lock data which has been verified as being correct and immutable. After that the data can only be unlocked or changed by administrators.
Downsides

Data that is locked down cannot be changed anymore, which may be felt as 'unwiki-like'.

  • More comments:
  • Phabricator tickets:

Discussion

Some clarification: on en.wikipedia we have had a drive to get identifiers for chemical compounds correct. The number of available ones varied at that time from 1 to about 10 (now it is likely 15 or even 20). After running through numerous pages I guess that in 5-10% of the articles there was an inconsistency between the identifiers (that means: at least one of the identifiers was likely not correct with respect to the others). If we put it conservatively at 5%, with an average of 5 identifiers per page, then at least 1% of the identifiers are 'wrong'. Now you can go through them and get them all correct, but every single time someone changes one identifier on a page, several people are going to check again whether the change is in line with all the other identifiers (knowing that maybe this was still a wrong one, or a wrongly checked one which may have needed the correction). Getting them all correct is an almost futile operation, as the amount of work that goes into checking whether someone 'made it wrong' or 'corrected a wrong value' is massive.

This is the problem for all data points, but it is probably best illustrated with numerical data: we know that the boiling point of water is 100 °C and that anyone who changes it to -10 °C or 257 °C is turning it into a wrong number. For more obscure chemicals (say acetone, not that obscure) the boiling point is much less well known. If someone makes that 83 °C ... (yes, even if it had a reference before, you have to check the reference).

Yes, it is correct that if data gets sufficiently used on wikis there will be sufficient editors seeing changes (though for obscure material that may be only 1 or 2 people per wiki), but all of them will still have to repeatedly check every change (including checking the references again and again). That is all unneeded: the boiling point of water is not changing significantly anytime soon. --Dirk Beetstra T C (en: U, T) 10:13, 30 January 2020 (UTC)
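
The "at least 1%" estimate in the clarification above can be checked with a short calculation. This is only a sketch of that arithmetic: it takes the figures from the comment (5% of articles inconsistent, 5 identifiers per article on average), and it assumes that every inconsistent article contains at least one wrong identifier. The function name is made up for illustration.

```python
def min_wrong_identifier_share(inconsistent_article_share, identifiers_per_article):
    """Lower bound on the share of wrong identifiers across all articles.

    Assumption: each inconsistent article holds at least one wrong
    identifier, so at least 1 out of its identifiers is wrong.
    """
    return inconsistent_article_share * 1 / identifiers_per_article


# Figures taken from the comment: 5% of articles, 5 identifiers per article.
share = min_wrong_identifier_share(0.05, 5)
print(f"At least {share:.1%} of identifiers are wrong")
```

With the less conservative 10% figure from the same comment, the lower bound doubles to 2%.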