User:Jeblad/statements

From Meta, a Wikimedia project coordination wiki

Assume that you want to identify a value or values among the items and statements that exist at the repo, in a fashion similar to w:Xpath. The whole tree would be flat and consist only of "nodes". References to items would be flattened out. If you want to reference Norway as a root node you would write /Q20 or /Norway. The first form will always work, but the second can fail. If you write /Q20/P36 or /Norway/capital you get "Oslo". The forms can be mixed, so /Q20/capital and /Norway/P36 mean the same, although any path that uses a label might fail due to disambiguation problems. If a lookup on "Norway" fails due to multiple hits, it might still disambiguate correctly at the next level, as a music group called "Norway" might not have a "capital".
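The lookup described above can be sketched roughly as follows. This is a minimal illustration and not an actual repo API: the dictionaries, the function names, and every ID other than Q20 and P36 are assumptions made for the example.

```python
# Toy repository. Q20 (Norway) and P36 (capital) come from the text above;
# the remaining IDs are placeholders chosen for this sketch.
ITEMS = {
    "Q20": {"label": "Norway", "claims": {"P36": ["Q585"]}},
    "Q585": {"label": "Oslo", "claims": {}},
    "Q90000001": {"label": "Norway", "claims": {}},  # hypothetical music group
}
PROPERTIES = {"P36": "capital"}


def item_candidates(step):
    """Return item IDs matching a step given as an ID or as a label."""
    if step in ITEMS:
        return [step]
    return [qid for qid, item in ITEMS.items() if item["label"] == step]


def property_candidates(step):
    """Return property IDs matching a step given as an ID or as a label."""
    if step in PROPERTIES:
        return [step]
    return [pid for pid, label in PROPERTIES.items() if label == step]


def resolve(path):
    """Resolve a two-step path such as /Norway/capital to value labels."""
    root, prop = path.strip("/").split("/")
    hits = []
    for qid in item_candidates(root):
        for pid in property_candidates(prop):
            # A candidate without this property contributes nothing, which
            # is how an ambiguous label can be resolved at the next level.
            for target in ITEMS[qid]["claims"].get(pid, []):
                hits.append(ITEMS[target]["label"])
    return hits


print(resolve("/Norway/capital"))  # ['Oslo']
```

The label "Norway" matches two items here, but only one of them has a capital, so the mixed and label-only forms still yield a single value.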

What is selected by the dot is the same as in normal node traversal: it simply selects the current node. The double dot is less obvious, as there might be several parent nodes. One interpretation is that the parent is all parents, that is, all incoming links. Another interpretation is that the parent is the parent in the currently traversed parent-child relation. Yet another interpretation is that a parent relation does not exist outside the current document.
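The three readings of the double dot can be contrasted in a small sketch. The edge list, the function names, and the second edge in particular are invented for illustration, and the third function is only one possible reading of "does not exist outside the current document".

```python
# Edges are (subject, property, object). The second edge is invented
# purely to give "Oslo" more than one incoming link.
EDGES = [
    ("Norway", "capital", "Oslo"),
    ("Some band", "origin", "Oslo"),
]


def all_parents(node):
    """Reading 1: the parent is every node with a link into `node`."""
    return sorted({subj for subj, _, obj in EDGES if obj == node})


def traversed_parent(trail):
    """Reading 2: the parent is the node we came from on this traversal.

    `trail` lists the nodes visited so far, e.g. ["Norway", "Oslo"].
    """
    return trail[-2] if len(trail) >= 2 else None


def document_parent(trail, document_root):
    """Reading 3 (assumed): no parent is visible above the current document."""
    if trail[-1] == document_root:
        return None
    return traversed_parent(trail)


print(all_parents("Oslo"))                          # ['Norway', 'Some band']
print(traversed_parent(["Norway", "Oslo"]))         # Norway
print(document_parent(["Norway", "Oslo"], "Oslo"))  # None
```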

An attribute is interesting because there are very few normal attributes in RDF, but also because it is completely valid to swap between attributes and values in XML if the element content is a single entry and it contains no children. That means the content can be materialized and put into an attribute. In fact, all elements that can be materialized in an unambiguous way can be put into an attribute. That means //capital and //@capital are the same, but the latter is a materialized version. What constitutes a materialized version is an open question: should a materialized version of an item consist only of the label, for example, or is that @label? It can be argued that only properties (predicates) can be attributes, and then only properties holding simple values.
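The condition for turning an element into an attribute can be written down as a small check. This is a sketch under the assumption that an element is represented as a dict with a list of values and a dict of children; that representation and the name as_attribute are hypothetical.

```python
def as_attribute(element):
    """Return the materialized value, or None if it would be ambiguous.

    An element qualifies only when it holds a single entry and has no
    children, which is the condition stated in the text above.
    """
    if len(element["values"]) == 1 and not element["children"]:
        return element["values"][0]
    return None


capital = {"values": ["Oslo"], "children": {}}
subdivisions = {"values": ["Oppland", "Oslo"], "children": {}}

print(as_attribute(capital))       # Oslo
print(as_attribute(subdivisions))  # None
```

Under this check //@capital would yield the plain string, while a multi-valued property has no attribute form.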

If a path is defined like /Norway//Oslo, then all nodes named "Oslo" within the "Norway" node will be selected. Nodes further away that also contain "Oslo" will not be selected. This is important, as extending the selection outside the current node would be quite a heavy operation. Note in particular that /Norway//Oppland/population will work, as "Oppland" exists within "Norway".
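A bounded descendant search of this kind might look like the sketch below. The nested-dict tree, the "Elsewhere" node, and the population figure are placeholders invented for the example, not data from the repo.

```python
import re

# Placeholder tree: each key is a node name, each value its children.
TREE = {
    "Norway": {
        "capital": {"Oslo": {}},
        "P150": {"Oppland": {"population": 1000}, "Oslo": {}},
    },
    "Elsewhere": {"partner": {"Oslo": {}}},
}


def descendants(node, name):
    """Yield every node called `name` anywhere below `node`."""
    for key, child in node.items():
        if key == name:
            yield child
        if isinstance(child, dict):
            yield from descendants(child, name)


def select(tree, path):
    """Evaluate a path built from '/' (child) and '//' (descendant) steps."""
    steps = re.findall(r"(//|/)([^/]+)", path)
    current = [tree]
    for axis, name in steps:
        found = []
        for node in current:
            if not isinstance(node, dict):
                continue  # leaf values have no children to search
            if axis == "/":
                if name in node:
                    found.append(node[name])
            else:
                found.extend(descendants(node, name))
        current = found
    return current


print(len(select(TREE, "/Norway//Oslo")))           # 2
print(select(TREE, "/Norway//Oppland/population"))  # [1000]
```

The "Oslo" under "Elsewhere" is never visited, because the descendant step only walks the subtree already selected by /Norway.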

If a path does not extend to the root, it will be rewritten to contain the root. If you use capital in the context of "Norway", the path will be rewritten as /Norway/capital. This is important, as the rewritten path can be cached and reused. A variation of this is to cache only paths that contain the root. The rationale is that only some of the extracted values can reasonably be reused. Likewise, entries that only extend from the current item should be cached only for the duration of a single page. A variation could be to use a server-side cache for the current item's extracts, and use memcached for extracts extending all the way from the root.

Selections might end up with several items. Unless made available to Lua, they will be materialized in the simplest possible form. If a path with a predicate such as /Norway/P150[population>250000] is used, it will give the result "Østfold", "Akershus", "Oslo", "Buskerud", "Rogaland", "Hordaland", "Møre og Romsdal", and "Sør-Trøndelag". This implies that selected elements are printed out as a list of simple materializations.
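A predicate step and the resulting list materialization could be sketched like this. The data layout and function names are assumptions, only three entries are included, and the population numbers are round placeholders rather than actual statistics.

```python
import operator
import re

OPS = {">": operator.gt, "<": operator.lt, "=": operator.eq}

# Placeholder data: labels from the text, population figures made up.
DATA = {
    "Norway": {
        "P150": [
            {"label": "Oslo", "population": 600000},
            {"label": "Oppland", "population": 190000},
            {"label": "Rogaland", "population": 440000},
        ]
    }
}


def parse_step(step):
    """Split 'P150[population>250000]' into a name and an optional predicate."""
    match = re.fullmatch(r"(\w+)\[(\w+)([<>=])(\d+)\]", step)
    if not match:
        return step, None
    name, field, op, number = match.groups()
    return name, (field, OPS[op], int(number))


def select(data, path):
    """Evaluate a two-step path and materialize the hits as a label list."""
    root, step = path.strip("/").split("/")
    name, predicate = parse_step(step)
    nodes = data[root][name]
    if predicate:
        field, compare, limit = predicate
        nodes = [n for n in nodes if field in n and compare(n[field], limit)]
    # Simplest materialization: the label of each selected node.
    return ", ".join(n["label"] for n in nodes)


print(select(DATA, "/Norway/P150[population>250000]"))  # Oslo, Rogaland
```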