WikiCite 2016/Report/Group 3/Notes
Workgroup session notes
Goal
- Model and store citation instances
Topics
- Use cases
- What are we working on?
- Discussion of semantic typing
- Data about the citation instance
- Where is the citation
- Are citations separable?
- Yes-- question of cost, maintenance
Tasks
- Other things to do
- Useful metadata it needs
- What page the citation is on?
- if citations are separable items, what can we store?
- New feature coming -- attachments to wiki pages. (eg, categories)
- Versioning updates
- Changes to attachments change the page
- Wiki text social constraints
- javascript bot for: "wikitext unique identifier -> human readable text"
- Using Zotero data for consistency checks
- Need for querying, doable system for storing citations that can be queried
- Revisit: are citations, and can they be, fully separable items?
- Point: fully separable citations are useful for analysis
- BUT separable citations are much more brittle, and have problems in some encodings
- How to anchor, repetitive elements,
- Middle ground: citations can exist as separable, but still include placement mark in wiki page.
- Cross wiki things, cross pages things
- Possibly very simple 3-element model
- Source cited
- Cited bib data
- Anchor at this location
- Discussion of the standard (or "type") to which the bib data conforms
- -> normalized into bib data
- citation styles -- how the citation instance gets rendered in human-readable form on the Wikipedia page.
- probably on a per-page basis, or as a user preference
- discussion on rendering time vs user choice on citation styles
- is citation style per citation? template
- optional additions
- fragment, page/chapter/anchor ?
- sentiment ??
- timestamp created_at modified_at
- versions
- citations point to the source; they do not reference a specific version of the bibliographic data
- multiple copies of citation instances in a single page
- related closely to...
- do citation instances have a global ID
- hash (sensitive ID system) or UUID (very insensitive)
- "are page versions mutable?"
- perhaps just hash the snippet text around the citation, and hash that into the unique identifier
- ... answers "does the citation identifier change?"
- spectrum:
- always changes on every version ... never changes UUID -> both ends are near useless
- good ids are some combination of data elements and hashing in between
- suggestion:
- surrogate-ID(UUID) __ smart hash of slowly changing stuff __ full hash that changes each time
- middle point "smart hash" is difficult - determine when a citation changes
- point: UUIDs are really useful, we want them.
- required for diffs on two citations
- citation ranges - lorem ipsum: [4][5][6]
- co-citations?
- they would have the same anchor
- but then they wouldn't preserve order. (hard)
- POINT: very valuable for understanding how knowledge connects, human-curated connections by snippet
- moves the analysis away from re-processing pages
- discussion: how does an anchor work?
- some meaningful expression of where the citation lives
- discussion of use cases of co-citation analysis
- result: anchors need to be smart, and will assert that citations share the same location
- what does the "source cited" thing look like?
- just a URI?
- discussion: not all bib data in wikidata
- community norms for notability apply to data in wikidata (potential problem)
- organic growth of use of wikidata - all entries manually curated
- library base as an alternative, but no curation
- what do we allow if it's not a URI?
- free text allowed?
- fallback to current wikitext? consensus: no, problematic. costly.
- citoid fields?
- use bibliographic fields from wikidata
- consensus:
- use a URI, but if not
- use a unique external identifier, if not
- use the same bibliographic fields as defined in the wikidata models
- for legacy citations:
- do nothing, or
- keep the wiki text
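The consensus above amounts to a fallback chain for describing the cited source. A minimal sketch of that ordering follows; the `SourceDescription` class, its field names, and the `resolve_target` function are hypothetical and were not part of the session.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SourceDescription:
    """Hypothetical container for whatever is known about a cited source."""
    uri: Optional[str] = None
    external_id: Optional[str] = None          # assumed: e.g. a DOI or ISBN string
    bib_fields: dict = field(default_factory=dict)
    legacy_wikitext: Optional[str] = None


def resolve_target(src: SourceDescription) -> tuple[str, object]:
    """Return (kind, value) following the fallback order from the notes."""
    if src.uri:
        return ("uri", src.uri)
    if src.external_id:
        return ("external_id", src.external_id)
    if src.bib_fields:
        # Same bibliographic fields as defined in the Wikidata models.
        return ("bib_fields", dict(src.bib_fields))
    # Legacy citations: keep the wikitext untouched (or do nothing).
    return ("legacy_wikitext", src.legacy_wikitext)
```

A source that carries both a URI and an external identifier resolves to the URI, since that is the first choice in the chain.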
Still to be discussed
- references, notes, further reading categories covered.
Day 2
- EN wikipedia journal article citation template: https://en.wikipedia.org/wiki/Template:Cite_journal
- Wiki citational tools: https://en.wikipedia.org/wiki/Help:Citation_tools
- Guideline for citing sources: https://en.wikipedia.org/wiki/Wikipedia:Citing_sources
- Examples of JSON citation objects: http://api.richcitations.org/
Data Structure
- UUID #UUID identifies the citation object. Generated on demand. Do we need this in wikitext?
- Citation target (URI) #bibliographic database object /or/ wikitext (with templates/parameters)
- Target Anchor #chapter, page, etc. This is optional, but can only come from the wikitext
- Citation origin (URI) #Wikipedia page specific revision
- Origin anchor #point or range. some kind of offset
- We discussed storing a structured representation of the citation target locally, as an "attachment" (MCR). This seems complicated and confusing though.
- Anchors: preferably machine-actionable; possibly non-actionable text; fails gracefully
- Hypothesis: Robust anchors for electronic documents: https://hypothes.is/about/
- much work done on how to mark an anchor location in a web page
- coordinated with Open Annotation
Implementation mechanism: citation object -> template -> wiki text -> HTML
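The five elements listed under Data Structure could be held in a small record type. The sketch below is only one possible shape: the class name, field names, and the example URIs are assumptions for illustration, and the UUID is generated when the object is created, as one reading of "generated on demand".

```python
import json
import uuid
from dataclasses import dataclass, asdict, field
from typing import Optional


@dataclass
class CitationInstance:
    target: str                            # citation target (URI)
    origin: str                            # citation origin (URI of a specific page revision)
    target_anchor: Optional[str] = None    # chapter, page, etc. (optional)
    origin_anchor: Optional[dict] = None   # point or range, some kind of offset
    id: str = field(default_factory=lambda: str(uuid.uuid4()))

    def to_json(self) -> str:
        """Serialize the citation object, e.g. for a data API."""
        return json.dumps(asdict(self), sort_keys=True)


# Illustrative values only; Q5678 is reused from the examples in these notes.
example = CitationInstance(
    target="https://www.wikidata.org/entity/Q5678",
    origin="https://en.wikipedia.org/w/index.php?title=Capybara&oldid=271977127",
    target_anchor="page 15-17",
    origin_anchor={"start": 120, "end": 180},
)
print(example.to_json())
```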
Storage
Citation instances are defined and maintained as part of the wikitext. The complexity of this is minimized by maintaining the bibliographic data separately. A machine readable representation of the citation instances is extracted, and made available along with the article text (perhaps using Multi-Content-Revisions).
Editing
- How will this be edited? How does migration happen?
- You will have three or four out of five in Wikitext:
- UUID (magic word _UUID_ substed by PST. Optional?)
- citation target (URI. Wikidata item? Free-form wikitext? This needs to be explicit)
- target anchor (chapter, page, etc. This is optional, but can only come from the wikitext)
- Display style - different templates as mechanism for how to do "citation styles"
- discussion:
- Not part of "actual" citation instance
- Specified in wikitext for rendering
- Could be per-page, or per-user, instead of per-citation.
- Models / use case
- how the existing could be migrated
- new citations being created
- prevention of re-entering bibliographic data
- form based editing
- what we want to use stand-alone citations for
- citation recommendation: you cited A, maybe you want B,C,D
- You read A and B, you want to read C
- show us all the citations from Oxford in 2005-2015 that cover physics pages
- name someone in an article, list the most cited works for that person
- most frequently cited works from the topic of this page
- publishers - article level metrics, how are the articles being used on different pages
- which wikipedia users are creating the same citations repeatedly
- dependency tree for knowledge: fact X in article is supported by page Y, Z.. .etc. network analysis
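Several of the use cases above, such as "you cited A, maybe you want B, C, D", reduce to counting which targets are cited together on the same page. The following is a rough sketch under the assumption that standalone citations are available as `(page, target)` pairs; the function names and the sample data are invented for illustration.

```python
from collections import Counter
from itertools import combinations


def co_citation_counts(citations):
    """Count how often two targets are cited on the same page.

    citations: iterable of (page, target) pairs.
    """
    targets_by_page = {}
    for page, target in citations:
        targets_by_page.setdefault(page, set()).add(target)
    pair_counts = Counter()
    for targets in targets_by_page.values():
        for a, b in combinations(sorted(targets), 2):
            pair_counts[(a, b)] += 1
    return pair_counts


def recommend(cited, citations, limit=3):
    """Suggest targets most often co-cited with `cited`."""
    scores = Counter()
    for (a, b), n in co_citation_counts(citations).items():
        if a == cited:
            scores[b] += n
        elif b == cited:
            scores[a] += n
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [target for target, _ in ranked][:limit]


sample = [("P1", "A"), ("P1", "B"), ("P1", "C"),
          ("P2", "A"), ("P2", "B"),
          ("P3", "A"), ("P3", "D")]
print(recommend("A", sample))
```

This kind of query works on the extracted citation instances alone, which is the point made earlier about moving the analysis away from re-processing pages.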
Issues
- display style / human readability
- template / json / function
- use cases
- relationship to wikitext
- attachments
- roll-out community / partnerships
- make documentation & elucidate motivation/rationale
- Relationship to Recent Changes changes
- UUIDs and versioning
- Consistent editing interface
Use cases
- Migrating existing data
- create new enhanced citations
- what you use standalone citations for
- bibliographic use
Resources
- anchor format - http://hypothes.is
- open annotation model
Identity of Citations
- use case: unchanging ID
- use case: IDs that change when citation changes
- where does the UUID come from, how does it persist? when a page changes, we don't want the UUID to change
- easier if it's a hash
- if UUID doesn't change then research is easier
- If origin anchor changes, it's the same citation
- Citation target stays the same
- What defines a citation, use and generation of UUID
- if an origin anchor changes, is it the same citation? (depends on use case)
- if a citation target changes, is it the same citation? (no, it's different - not same uuid)
- UUID can be injected in the wikitext, so the "same" citation can be tracked across revisions:
{{cite:SomeId:123495093802}} becomes {{cite:SomeId:123495093802 | _uuid_ }}
_uuid_ gets replaced in the text at save time (pre-save transform, PST), similar to how ~~~~ is replaced by the user's name
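The placeholder substitution described above could look roughly like the sketch below. It is not MediaWiki's actual pre-save transform: the `{{cite:...| _uuid_ }}` syntax is the hypothetical one from these notes, and the regular expression and function name are assumptions.

```python
import re
import uuid

# Matches the hypothetical `_uuid_` placeholder inside a {{cite:...}} call.
PLACEHOLDER = re.compile(r"(\{\{cite:[^{}|]*\|\s*)_uuid_(\s*\}\})")


def substitute_uuids(wikitext: str, new_id=lambda: str(uuid.uuid4())) -> str:
    """Replace each `_uuid_` placeholder with a freshly generated identifier.

    Citations that already carry an identifier are left untouched, so the
    "same" citation keeps its ID across later saves.
    """
    return PLACEHOLDER.sub(lambda m: m.group(1) + new_id() + m.group(2), wikitext)


before = "Some claim.{{cite:SomeId:123495093802 | _uuid_ }}"
after = substitute_uuids(before)
print(after)
# Saving again changes nothing, because no placeholder is left.
assert substitute_uuids(after) == after
```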
How to store and reference bibliographic data
- Referencing in Wikipedia via a reference rather than an item in Wikidata solves both the scalability and notability concerns in Wikidata.
- Split between citation target and citation target anchor is arbitrary.
Two cases:
- citations to sources that exist as Wikidata items (e.g. books, prominent articles)
- citations to sources that won't become Wikidata items (e.g. article x, from newspaper y, date z)
Idea: wikidata statements have sources that could be referenced. A wikipedia author might express:
- "this text expresses a notion that is modeled by that wikidata statement, so show the source references that wikidata has to support this statement".
{{cite-item:Q5678|page=15-17|style=Harvard}} <--- citing a source that is described by an item
{{cite-statement:Q42|A47C4EA228B|style=Harvard}} <--- citing a wikidata statement, e.g. "Water / Melting-Point / 0°C", means "recycling" the references for this statement.
{{cite-statement}}
is equivalent to the way references should be shown along with values pulled from wikidata.
- Long discussion on how this could introduce the idea of citing 'facts' (e.g. item: Helium, statement: atomic number, value: 2, sources: list of publications... )
- Difficult to model (e.g. statements changing over time, versioning statements instead of full items) and prone to loops (e.g. sources that become items)
- This mechanism adds a level of indirection: text -> source becomes text -> statement -> source.
- We might allow citing a specific revision of the statement:
{{cite-statement:Q42|A47C4EA228B|rev=346528745|style=Harvard}}
- We might allow re-using only a specific reference (by id):
{{cite-statement:Q42|A47C4EA228B|ref-id=6A4EE82334C|style=Harvard}}
- We might allow re-using citations of all statements about a property:
{{cite-statement:Q42|P31|style=Harvard}}
- Citation instances are a relationship between a citing resource (the citation origin) and the cited resource (the citation target)
- Citation instances can be modeled as follows:
- ID, origin + anchor, target (bib data record) + anchor
- target points to the current version, not a specific version, so updates to the bib data are reflected in any new rendering
- it would be nice if the origin anchor would specify what section of text is covered by a citation, but this is uncommon, and should not be required.
- it would be nice for the target anchor to be as specific as possible
- Citation references are managed as part of wikitext (for now). The complexity is greatly reduced by being able to reference bibliographical data, instead of specifying it inline.
- The citation style is defined locally in wikitext, e.g. by specifying a template name or parameter.
- A machine-readable representation (e.g. as JSON) of citation instances is made available via a data API for every version of every page. This is achieved via a parser function or Lua library.
- ...use cases...
- Citations can reference bib data as:
- wikidata items (e.g. a book like The Origin of Species). Chapter, page, etc can be supplied as local anchor information.
- "recycling" the references attached to a wikidata statement (e.g. "water boils at 100°C" is supported by [1][2][3]...)
- Alternatively, citations may contain bibliographical data directly, as part of the wikitext; we continue to use existing mechanisms like templates to structure them
- Modelling all cited sources as separate wikidata items is impractical (maintenance overhead, community capacity, database scaling)
So:
- three ways to specify a citation: inline, item, or statement.
- citations managed in wikitext, available as JSON
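To illustrate "managed in wikitext, available as JSON", the sketch below extracts the two reference-based forms proposed in these notes (`cite-item` and `cite-statement`) from a page and emits JSON records. The syntax was only a proposal, so the regular expression, the record keys, and the function name are all assumptions; inline template citations are not handled.

```python
import json
import re

# Hypothetical syntax from the notes: {{cite-item:...}} and {{cite-statement:...}}
CITE = re.compile(r"\{\{cite-(item|statement):([^{}]*)\}\}")


def extract_citations(wikitext: str) -> list[dict]:
    """Return one record per cite-item / cite-statement call in the text."""
    found = []
    for m in CITE.finditer(wikitext):
        kind, body = m.group(1), m.group(2)
        positional, named = [], {}
        for part in body.split("|"):
            key, sep, value = part.partition("=")
            if sep:
                named[key.strip()] = value.strip()
            else:
                positional.append(part.strip())
        record = {"kind": kind, "item": positional[0], "offset": m.start()}
        if kind == "statement" and len(positional) > 1:
            record["statement"] = positional[1]
        # Display style is kept apart: it is not part of the citation instance.
        record["style"] = named.pop("style", None)
        record["params"] = named
        found.append(record)
    return found


page = ("Text.{{cite-item:Q5678|page=15-17|style=Harvard}} "
        "More.{{cite-statement:Q42|A47C4EA228B|rev=346528745|style=Harvard}}")
print(json.dumps(extract_citations(page), indent=2))
```

The character offset stands in for an origin anchor here; a robust anchor would need more than a raw offset.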
Several details remain undecided:
- How to track citation identity across page revisions (inject a UUID into the wikitext?)
- How to specify anchors that are robust against editing?
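One option raised on the first day for the identity question was a three-part identifier: a surrogate UUID, a "smart hash" of slowly changing data, and a full hash that changes on every edit. The sketch below assumes, purely for illustration, that the slowly changing part is the target plus its anchor and that the fast-changing part also covers the snippet of text around the citation; the function names and the 12-character truncation are likewise assumptions.

```python
import hashlib
import uuid


def _digest(*parts: str) -> str:
    """Short SHA-256 digest over the given parts, separated by NUL bytes."""
    h = hashlib.sha256()
    for p in parts:
        h.update(p.encode("utf-8"))
        h.update(b"\x00")
    return h.hexdigest()[:12]


def citation_id(surrogate: str, target: str, target_anchor: str, snippet: str) -> str:
    """Build surrogate-ID __ smart hash __ full hash."""
    smart = _digest(target, target_anchor)           # changes when the cited thing changes
    full = _digest(target, target_anchor, snippet)   # changes on any nearby text edit
    return f"{surrogate}__{smart}__{full}"


sid = str(uuid.uuid4())
v1 = citation_id(sid, "Q5678", "15-17", "The capybara is a rodent.")
v2 = citation_id(sid, "Q5678", "15-17", "The capybara is a large rodent.")
# Same surrogate and smart hash, different full hash.
print(v1.split("__")[:2] == v2.split("__")[:2], v1 == v2)
```

Comparing the three parts separately shows whether a revision touched only the surrounding text or the citation itself.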
Resources:
- Hypothesis: Robust anchors for electronic documents: https://hypothes.is/about/
- Citation Ontology: http://www.sparontologies.net/ontologies/cito/source.html
- Open Annotation Ontology: http://www.openannotation.org/spec/core/
- Rich Citations (structured JSON citation objects): http://alpha.richcitations.org/
- Comment: a Citation origin ought to be a specific edit to an article, not just the latest version, nor the entire history: https://en.wikipedia.org/w/index.php?title=Capybara&diff=271977127&oldid=271777948 This is what ties an assertion to the citation which putatively supports it.
- Report draft here: