User:EpochFail/Journal/Annotations system
- This document summarizes requirements for an object/event annotation system for MediaWiki to support analytics (and logging).
Use cases
- Track the agents/tools that make revisions
- Bots
- Tools
- In wiki: twinkle, popups, etc.
- Extra-wiki: huggle, awb
- Track operations made by editors using an experimental interface in MediaWiki
- AFTv5
- Experiment - Link - Form
- AFTv5
Linking annotations with wiki-objects
Annotations should be general enough to store arbitrary data about in-wiki objects as well as abstract events (e.g. a login) that have no corresponding row in a table.
Lessons from NoSQL (e.g. MongoDB)
MongoDB has an interesting feature called a Database Reference (see the manual entry). A database reference allows a foreign key to a document (table row) in any collection to be stored in a field and looked up automatically by the db system.
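As a rough illustration of how a DBRef-style pointer could link annotations to arbitrary wiki objects, here is a minimal Python sketch. MongoDB represents a database reference as a small document with `$ref` (collection name) and `$id` fields, plus an optional `$db`; the `ObjectRef` class, the `TABLES` dictionary, and the sample rows below are hypothetical stand-ins, not part of MediaWiki or MongoDB. Note that in MongoDB the lookup is usually performed by driver helpers or application code, so the `resolve` method here plays that role.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional

# Hypothetical in-memory "tables", keyed by primary key.
# Table names and rows are illustrative only.
TABLES: Dict[str, Dict[int, Dict[str, Any]]] = {
    "revision": {101: {"rev_id": 101, "rev_user": 7}},
    "page": {12: {"page_id": 12, "page_title": "Example"}},
}


@dataclass(frozen=True)
class ObjectRef:
    """A DBRef-like pointer: which table, and which row in it."""

    table: str
    id: Optional[int] = None  # None for abstract events with no backing row

    def resolve(self) -> Optional[Dict[str, Any]]:
        """Return the referenced row, or None if there is nothing to look up."""
        if self.id is None:
            return None
        return TABLES.get(self.table, {}).get(self.id)


# One annotation on a revision, one on an abstract login event.
rev_note = {"ref": ObjectRef("revision", 101), "type": "tool", "value": "huggle"}
login_note = {"ref": ObjectRef("login"), "type": "interface", "value": "experimental"}
```

Because the reference carries the table name alongside the key, a single annotation store could point at revisions, pages, or events with no row at all, at the cost of losing database-enforced foreign-key constraints.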
Strategies
Key-value annotations
See the example table creation for revision annotations:
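CREATE TABLE revision_annotation (
    rev_id INT UNSIGNED,  -- revision being annotated
    type VARBINARY(255),  -- annotation key
    value ???,            -- column type undecided; see the candidates that follow
    KEY(rev_id, type),    -- find annotations of a given type on a revision
    KEY(type)             -- find all annotations of a given type
);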
The type of value is left as ??? because no available datatype would store every potential kind of value efficiently. Candidates:
- VARBINARY(255): Limited to 255 bytes. Relatively efficient since size is variable. Inefficient for numbers. The size limit could encourage frugal use, but could also lead to bugs caused by data truncation.
- MEDIUMBLOB/LONGBLOB: Virtually unlimited for reasonable amounts of data. Relatively efficient since size is variable. Inefficient for numbers.
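The trade-offs in the candidates above can be made concrete with a small Python sketch. It compares the bytes needed to hold a number as text (as it would sit in a binary or blob column) against a fixed-width 4-byte integer, and simulates what a 255-byte limit does to a longer value. The sample number, the 300-byte payload, and the truncating behaviour are assumptions for illustration; whether a database truncates or rejects an oversized value depends on its configuration.

```python
import struct

sample_number = 1234567890

# A number stored in a binary/blob column is held as its text form.
as_text = str(sample_number).encode("ascii")
# The same number as a fixed-width 4-byte unsigned integer.
as_int = struct.pack("<I", sample_number)

print(len(as_text), len(as_int))  # 10 4

# Simulate a 255-byte column that truncates instead of rejecting the value.
LIMIT = 255
long_value = b"x" * 300
stored = long_value[:LIMIT]
print(len(stored), stored == long_value)  # 255 False
```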
Preferably, annotation values will have a standardized data structure (e.g. JSON) to allow for relatively complex/related data to be stored in the annotation itself.
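A minimal sketch of JSON-encoded annotation values, assuming the value column is a blob holding UTF-8 JSON. It uses Python's built-in `sqlite3` module as a stand-in for the MySQL-style table, so the column types differ from the DDL shown earlier; the helper names `annotate` and `annotations` and the sample values are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE revision_annotation (
        rev_id INTEGER,
        type   TEXT,
        value  BLOB   -- assumption: UTF-8 encoded JSON
    )
    """
)
# Indexes mirroring KEY(rev_id, type) and KEY(type).
conn.execute("CREATE INDEX ra_rev_type ON revision_annotation (rev_id, type)")
conn.execute("CREATE INDEX ra_type ON revision_annotation (type)")


def annotate(rev_id, type_, value):
    """Store one annotation; value may be any JSON-serializable object."""
    conn.execute(
        "INSERT INTO revision_annotation (rev_id, type, value) VALUES (?, ?, ?)",
        (rev_id, type_, json.dumps(value).encode("utf-8")),
    )


def annotations(rev_id, type_):
    """Return the decoded values of all annotations of one type on a revision."""
    rows = conn.execute(
        "SELECT value FROM revision_annotation WHERE rev_id = ? AND type = ?",
        (rev_id, type_),
    )
    return [json.loads(row[0].decode("utf-8")) for row in rows]


# Sample values are illustrative only.
annotate(101, "tool", {"name": "huggle", "kind": "extra-wiki"})
annotate(101, "experiment", {"name": "AFTv5"})
print(annotations(101, "tool"))
```

Encoding the value as JSON keeps the schema fixed while letting each annotation type carry its own structure; the cost is that the database cannot index or validate fields inside the value without extra support.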