Talk:Abstract Wikipedia/Object creation requirements
Before the first step
Uncertain Z4/Type
The contributor who wants a new Z2/object might not be sure what kind of object it should be, so defining its Z4/Type should not be a necessary first step. That implies there is a class of defective Z2/objects (call them "drafts") and the question is: where should these reside within the repository? I think I prefer not to have a separate namespace, but that implies that Z2s have state. A "draft" can be progressively elaborated. Some elaborations may be more controlled, directed or guided than others. In many (most? all?) cases, these elaborations are comparable to changes to a "live" object (adding a type to a draft being comparable to changing a live object's type, for example). One crucial difference is that a draft is not guaranteed to be valid before the change and, indeed, it might become less valid as a result of the change (one might remove an inappropriate type without specifying a replacement type, for example).--GrounderUK (talk) 13:12, 9 December 2020 (UTC)
Do you have an example of when the contributor wouldn't be sure about the type? This seems so fundamental that it is hard to imagine not knowing it. Also, if you don't know about the type yet, you can't add any further keys either. I am not sure about the use case. Can you maybe sketch out a user story? That would be helpful.
There is the idea of a Z99/Quote, a type that basically just quotes whatever you put into it, and doesn't further validate its content beyond well-formedness. Maybe that is sufficient? I guess we would need to see that play out once the prototypes are a bit more advanced. --DVrandecic (WMF) (talk) 20:46, 11 December 2020 (UTC)
- @DVrandecic (WMF): Thanks, Denny. I'm mostly thinking of functions in a pipeline, where a particular function consumes the evaluation of some other function and/or will have its evaluation consumed by some other function. As an example, suppose I want to compute an harmonic of a musical note. Ultimately, I want the name of the note, the number of complete octaves between the fundamental and the required harmonic, and the number of cents by which the named harmonic is sharp or flat. For example, in English text I could say "the third harmonic of B [ B (Q726738) ] is a couple of cents below the F# in the next octave" or "the fifth harmonic of B is nearly 14 cents below the E in the third octave". So, we start with the name of a note which may or may not specify which octave it's in (e.g. "B" or "C4" [ C4 (Q32700582) ]) and an ordinal that is effectively a positive integer. As it happens, we only need the integer for the core calculation, but should the evaluation return a result just in cents or in octaves, semitones (half-steps) and (residual) cents (or semitones and cents, or...)? To some extent, that depends on how the "musical interval" function is (or should be) defined. (For the curious, the result in cents is 1200*(log(n)/log(2)), where n is the harmonic.) Perhaps this is slightly contrived, but is there a type for "the name of a note which may or may not specify which octave it's in"? Or do we split off the octave into an integer (remembering that omitting it is different from specifying 0)? And do we recognise "octaves, semitones and (residual positive or negative) cents" as a single type? What about "simple musical interval" in semitones and cents, where semitones is an integer between 0 and 11 (inclusive) and cents is a decimal between (but not including) -100 and +100? Most importantly, for a new contributor who often punches [log] [÷] [2] [log] [×] [1200] into their calculator, how much do they need to learn about types before they can contribute their function (which is not restricted to integers)?--GrounderUK (talk) 17:02, 12 December 2020 (UTC)
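To make the arithmetic concrete, here is a minimal sketch of the core calculation described above (Python; the function name and the decision to report the nearest named note are illustrative assumptions, not part of any proposed Wikifunctions type):

```python
import math

def harmonic_interval(n: int) -> tuple[int, int, float]:
    """Split the interval of the n-th harmonic above a fundamental
    into octaves, semitones (half-steps) and residual cents.

    The core calculation is the one given above:
    cents = 1200 * (log(n) / log(2)), i.e. 1200 * log2(n).
    """
    if n < 1:
        raise ValueError("harmonic number must be a positive integer")
    total_cents = 1200 * math.log2(n)
    octaves, remainder = divmod(total_cents, 1200)
    semitones, cents = divmod(remainder, 100)
    if cents > 50:          # report against the *nearest* named note,
        semitones += 1      # so the residual can be flat (negative)
        cents -= 100        # as well as sharp (positive)
    if semitones == 12:
        octaves += 1
        semitones = 0
    return int(octaves), int(semitones), cents

# Third harmonic: 1 octave + 7 semitones + ~2 cents.
print(harmonic_interval(3))   # (1, 7, 1.955...)
# Fifth harmonic: 2 octaves + 4 semitones - ~13.7 cents.
print(harmonic_interval(5))   # (2, 4, -13.686...)
```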
- I understand, but in all these cases you would be creating a function, so you know you have to create a function. You may not be sure about the types of the arguments of your function, and even change that later, but in the end, you are always creating a function here - or do I misunderstand? --DVrandecic (WMF) (talk) 22:24, 17 December 2020 (UTC)
- @DVrandecic (WMF): I'm sure I'm misunderstanding something... Yes, I know my Z1 is a Z2 that's a Z8, but I don't know what my Z8/Function's Z1K1/type is unless I know all the Z8K2/argument types and its Z8K1/return type. It's the same with my "musical note" type; I know it's a string but I don't know what type of string it is (one where en:"B" = de:"H" = fr:"si" etc).--GrounderUK (talk) 23:29, 17 December 2020 (UTC)
- Ah, yes, you're right. That's something I haven't formulated anywhere. Yes, I am imagining some mystical magical UI that would need to know very early what type or generic type you want to use, but that would allow you to change the parameters for the generic type at any point. So yes, I have been unclear. And I am not sure how to get there. But if you know it is a function or a list, you can then change the arguments of the function or the type of the list later on, without having to redo the whole thing. I hope that works out. But yes, I never explicated that. Thanks for making this clearer! --DVrandecic (WMF) (talk) 15:22, 18 December 2020 (UTC)
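For what it's worth, conventional type systems express the same idea with generic types whose parameters are filled in later; a rough analogy in Python type hints (purely illustrative, not a claim about the eventual UI):

```python
from typing import Callable, TypeVar

A = TypeVar("A")  # argument type, still open
R = TypeVar("R")  # return type, still open

# Committing early only to "this is a function": the generic type is
# known, its parameters are not yet.
PartiallyTypedFunction = Callable[[A], R]

# Later, the parameters can be filled in (or changed) without redoing
# the whole thing -- e.g. once a "musical note" type settles down:
HarmonicFunction = Callable[[str, int], str]  # (note name, harmonic) -> note name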
- That makes sense, thanks. I have a few thoughts of the mystical magical type, which I hope to share before the end of the year (nothing major, as far as I can tell).--GrounderUK (talk) 16:32, 18 December 2020 (UTC)
- Isn't it the same case that already exists in Wikidata, where we can create an entity and give several of its properties without necessarily saying which type(s) it is an instance of? As for functions, we may not be sure which type we'll use, or functions would in fact just be a type by themselves if they are seen as "constructors" (most of these would map from many Wikidata items).
- Note that the decision made in Wikidata to make a distinction between instances and types is very bad; it complicates everything, when ALL concepts (including "instances") can be seen as types (frequently we don't know whether there is a single instance, or whether there will never be one in the future). Instantiation and derivation are in fact completely synonymous in a database of knowledge, and this separation is a legacy artefact from C++ and early object-oriented programming concepts. C++ has failed in many aspects and has become a horror to program in, and it is very poor at representing knowledge compared to other OO languages that allow all objects to be used as if they were types (i.e. instantiation and derivation are the same; there's just a new factory method/function to call, and those languages have the concept of "prototypes" to ease their implementation, e.g. JavaScript/ECMAScript, Lua, C#...). This makes those languages much more flexible and more easily extensible, without sacrificing isolation, security, or performance (reduction of dependencies is a domain where C++ has largely failed: the dependency graphs rapidly become extremely tall, wide, and very tricky).
- If we want to represent human knowledge and allow enough flexibility for producing texts in human languages, we should stop the dichotomy between classes and objects (or types and instances), which is only useful in very restricted domains that have NO place for development or extension, and where everything is frozen from the start (basically it only works for building theories with axiomatic models; it has nothing in common with real human culture, and it does not even work for almost any scientific area of nature, where knowledge is always limited in scope and precision: physics, biology, chemistry, medicine, and even large branches of mathematics related to differential analysis, infinitesimal values, non-enumerable sets, fractional dimensions, probabilities and fuzzy logic).
- The same limitations also apply to all human sciences, including linguistics: we have to put limits on the too-strict dichotomic model dividing types and instances, and the too-hierarchical model for derivation. In fact, each time we instantiate/derive a new type, the new entity has its own behaviour, which it can override, including producing errors when trying to use its "inherited" properties (when they were in fact overridden). The infinite-recursion model implied by the strict hierarchical model is false, because the separation of instances or derived types is almost never strict and some "distinctive" properties can be partly fuzzy; if you derive and re-derive many times, the fuzzy margins will grow to the point that the fuzzy score is 100% of the value, meaning that the value no longer has any meaning and is the same as "unknown" (like all other instances). So the hierarchic tree with separate branches tends to have branches gluing together, and "type inference" is no longer possible (100% error margin on the inferred values or behaviours).
- Unfortunately, Wikidata and the current model of functions still attempt to use the old pure hierarchic model of C++. I can already affirm today that this approach will completely fail at representing real human knowledge, and will fail even more miserably at producing meaningful text in a human language! It will just produce a "heuristic" based on fuzzy statistics, but overinterpreted and therefore producing nonsense, because we'll never measure the error margins. We need fuzzy logic and non-hierarchical data models allowing exceptions, so we must admit that all objects should also be classes and their own types: the relation between a "parent" type and its instance is that it shares "some" properties within some margins of probability.
- One good sign of this is the incredibly exploding number of unsolved problems in Wikidata's data model abstraction, which is full of contradictions (its current inference model supposes infinitely transitive assertions, which is false in almost all cases in the human sciences, geopolitical topics, biology... as long as we don't include fuzzy logic and a way to properly compose assertions with compound probabilistic margins). This severely limits the usability of Wikidata for making useful inferences and correctly eliminating "dead branches" where the margins of error are too large to provide any useful response. The same will be true of Wikifunctions if we want to infer texts in human languages (even when using Wikidata's new lexicographic data, which is in fact an oversimplified representation of what is more accurately represented in Wiktionary, with additional notes, notably on domains of application and linguistic register, and historic data, because usages also evolve over time and are highly contextual in their interpretation; that's why Wiktionary also includes dated quotations from identifiable authors/sources, as well as many side notes making the context of use explicit. These are to be interpreted by humans, but they are extremely difficult to represent formally without fuzzy logic, i.e. estimations of accuracy for all lemmas with mapped translations and all the necessary variants and derived terms. As well, linguistic grammars are full of exceptions in every language, and questions of style play an important role if we want to be clear and sufficiently consistent; we also need to take into account contextual terminologies defined by a specific domain, sometimes by specific authors or specific documents, and sometimes influenced by current events in the news media and by fashions). It's very hard to be purely "factual" (even if facts are sourced, Wikipedia also needs to keep its NPOV and plurality of authors, and a way to balance them: not everything is inferable by rules; these are always the result of mutual agreements, at some limited time, between some contributors. Other factors also play a role, such as criteria of notability, which can influence and change the terminology used over time).
- So I advocate an approach that will later enable the implementation of fuzzy logic, rather than implementations based purely on binary inferences: even the evaluator should be able to handle "fuzzy values", with an estimator of errors and some heuristic to evaluate the most promising branches and eliminate dead branches from which it is useless to continue exploring the data or composing functions to find a correct solution (which may not need to be "optimal", just sufficiently accurate to be meaningful). As well, the evaluation may be stopped when reaching a sufficiently high threshold of accuracy, which may be increased later (if it is increased, the evaluation can be resumed but will take more time to converge to a good enough solution, or to a small set of solutions that can be compared and evaluated by humans to select the "best"-looking one; a system of ratings/votes may help, or the generated solution could be kept in a cache along with a human-written solution). Basically, in this way the Abstract Wikipedia would become a hint to help translate articles more easily, or to evaluate whether proposed translations are reasonably "trustable". It could also become a helper to detect some kinds of abuses or errors (notably for NPOV, systemic biases, lack of coverage of some topics...): it could be used by admins and reviewers to improve the efficiency of their work on the most critical areas, using techniques like recursive Bayesian estimation (used now by most antispam filters, but refined for use in AI and robotics, and more recently for handling "big data" with "machine learning" processes). verdy_p (talk) 05:50, 21 December 2020 (UTC)
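As an illustration only: an evaluator of the kind sketched above might thread a confidence estimate through every composition and prune branches that fall below a threshold. A toy sketch (Python; all names are invented for this example):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FuzzyValue:
    """A value paired with a confidence estimate in [0.0, 1.0]."""
    value: object
    confidence: float

def compose(f, a: FuzzyValue, b: FuzzyValue, *, threshold: float = 0.5) -> FuzzyValue:
    """Apply f to two fuzzy values, multiplying their confidences.

    If the combined confidence falls below the threshold, the branch is
    treated as "dead" and pruned rather than explored further.
    """
    confidence = a.confidence * b.confidence
    if confidence < threshold:
        return FuzzyValue(None, 0.0)  # dead branch: same as "unknown"
    return FuzzyValue(f(a.value, b.value), confidence)

# Two compositions at 90% confidence each leave ~81%; margins only grow.
x = FuzzyValue(3, 0.9)
y = FuzzyValue(4, 0.9)
print(compose(lambda p, q: p + q, x, y))  # FuzzyValue(value=7, confidence=0.81)
```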
Z4/Type redefinition through additional validation
- @DVrandecic (WMF): I can't comment in detail on what @Verdy p: says, but I agree that Z4/Types of object will not naturally form a strict hierarchy. I suppose that the equivalent of adding a property to a Wikidata item is adding a validation check to a Wikifunctions Z2/Persistent object (through its Z4/Type). So the broader type "positive integer" becomes the narrower type "non-zero positive integer" by the addition of the validation check "not zero". In practice, that means switching the object's type. But to the contributor, it is likely to feel more intuitive to add the value zero to the set of invalid values. Perhaps this calls for some mystical UI magic. Supposing for the moment that each validation check is implemented through a single function and corresponds to a specific Z5/Error, it should be possible to identify appropriate available types through the validation checks that their validator functions enforce. And it should also be possible to identify "adjacent" types that enforce the largest number of identical checks or the smallest number of additional checks (or, more generally, where the intersection between the sets of possible values is maximized; which is how the option to exclude zero might be made apparent).
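By way of illustration, a type defined purely by the set of checks its validator enforces makes both the narrowing and the "adjacency" measurable. A rough sketch (Python; the names are invented for this example, not drawn from the Wikifunctions model):

```python
from typing import Callable

# Each named check corresponds to one validator function (and, per the
# supposition above, one potential Z5/Error).
Check = Callable[[int], bool]

class SimpleType:
    """A type defined purely by the set of checks its validator enforces."""
    def __init__(self, name: str, checks: set[str], registry: dict[str, Check]):
        self.name = name
        self.checks = checks
        self.registry = registry

    def validate(self, value) -> bool:
        return all(self.registry[c](value) for c in self.checks)

    def distance(self, other: "SimpleType") -> int:
        """Adjacency: how many checks differ between the two types."""
        return len(self.checks ^ other.checks)

checks: dict[str, Check] = {
    "is_integer": lambda v: isinstance(v, int),
    "not_negative": lambda v: v >= 0,
    "not_zero": lambda v: v != 0,
}

positive_integer = SimpleType("positive integer", {"is_integer", "not_negative"}, checks)
# "Adding the value zero to the set of invalid values" = adding one check:
nonzero_positive_integer = SimpleType(
    "non-zero positive integer", {"is_integer", "not_negative", "not_zero"}, checks
)

print(positive_integer.validate(0))                           # True
print(nonzero_positive_integer.validate(0))                   # False
print(positive_integer.distance(nonzero_positive_integer))    # 1 -> "adjacent"
```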
Wikifunctions and Wikidata
[@DVrandecic (WMF): I think the first two-thirds of this is relevant to the Abstract Wikipedia/Representation of languages question.]--GrounderUK (talk) 15:49, 13 March 2021 (UTC)
- For more natural "encyclopedic" types, we need to explore the interaction between Wikifunctions and Wikidata. If we query Wikidata through a function whose evaluation is a type (such as items that are an instance of pitch class (Q1760309)) we create the possibility of introducing side-effects into Wikifunctions through changes in Wikidata. That suggests we need to isolate the query from the population of the type enumeration by, for example, defining a constructor function for the type (the constructor would be non-functional but a function's use of the constructed type would not render it non-functional). This should also allow (but not require) the import of labels. And a type constructor would not automatically produce a Z2/Persistent object; when required, this would be done by the UI or by a generic constructor function. This allows contributors to identify a similar existing type and resolve any differences. It's likely that some individual Wikidata properties will underpin more than one Z4/Type, but the property (or conjunction of properties) will often be a useful way to discover relevant types, just as types will be a useful way to discover relevant objects. For example, a "population density" function could be applied to any Wikidata item with values for a population (P1082) property and an area (P2046) property (without considering what the item is an instance of).--GrounderUK (talk) 16:43, 29 December 2020 (UTC)
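A minimal sketch of the "population density" example (Python; the property IDs are the real P1082 and P2046, but the claims-dictionary interface is an assumption standing in for whatever bridge to Wikidata eventually exists):

```python
def population_density(item_claims: dict[str, float]):
    """People per square kilometre for any item with values for
    population (P1082) and area (P2046), regardless of what the
    item is an instance of."""
    population = item_claims.get("P1082")
    area = item_claims.get("P2046")
    if population is None or area is None or area == 0:
        return None  # the item does not satisfy the property "shape"
    return population / area

# A hypothetical item with a population of 500,000 on 250 km^2:
print(population_density({"P1082": 500_000, "P2046": 250.0}))  # 2000.0
```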
- Yes, you are right. I was hoping to defer that question to well after the launch (it's like the second major milestone after the launch, the first being "enable calls in other wikis"). There's plenty of questions regarding how to integrate with Wikidata, in particular with Wikidata's extremely flexible data model (which is now coming back to bite me after I defended it for a decade), and also how to deal with caching and changes in Wikidata. Any thoughts on that are welcome, and I am happy to read them!
- If I understand your suggestion right, it is basically to create a kind of shape, either just based on a Wikidata item that acts as a type, or on a set of properties that are required in order to allow for a calculation to happen. I think connecting that with the EntitySchemas in Wikidata might be a good idea, and leveraging those. DVrandecic (WMF) (talk) 22:41, 25 June 2021 (UTC)
- @DVrandecic (WMF): I am not sure about a Wikidata Item that acts as a type. There are Property-to-Item statements that have this effect, often with intent, but I see that as incidental. But, yes, a set of (one or more) properties, each somewhat qualified, either separately or in conjunction, serves to identify that a Wikidata Item is a valid member of the set of such Items that might be included in the enumeration of some corresponding Z4/Type. I think this can always be expressed as the where-clause in a SPARQL query, so the member of the set (an instance of the type) would be a row in the query results (which are the enumeration of the type). (This does not mean that a type constructor would necessarily be a SPARQL query.)
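For example (an illustrative sketch only, assuming direct access to the public query service; nothing here implies the constructor would actually be implemented this way), the enumeration of a "pitch class" type could be materialized from such a where-clause:

```python
import requests  # assumption: querying the public WDQS endpoint directly

WDQS = "https://query.wikidata.org/sparql"

# The where-clause is the membership condition; each result row is one
# instance of the type, and the full result set is its enumeration.
PITCH_CLASS_QUERY = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q1760309 .   # instance of: pitch class (Q1760309)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""

def enumerate_type(query: str) -> list:
    """Materialize the enumeration of a Wikidata-backed type."""
    response = requests.get(WDQS, params={"query": query, "format": "json"})
    response.raise_for_status()
    return response.json()["results"]["bindings"]
```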
- A change to a Wikidata Item can have the logical effect of adding it to or removing it from any number of such enumerations. If these were cached, the cache needs to be refreshed, sooner or later. But working through the implications on cached evaluation results within Wikifunctions is complex. The corresponding Z4/Type can help identify the scope of possible impact, of course. Beyond that, it depends how we characterize the use of the type within the function. It seems to me, though, that this is just a special case of the use of evaluations within evaluations (the enumeration of the Type being an evaluation used in the evaluations of some functions). So, a function that ranks places in a region by population density will be affected by changes to the value of a place's population, the value of its area and/or the enumeration of the set of places identified as being within that region and their populations and areas. In contrast, a function that returns a single place's population density, which uses the same Z4/Type for the place argument, will only be affected by changes to the value of the place's population and the value of its area (arguably, even if the place would somehow no longer appear within the enumeration of the type because, for example, its population is too small or out of date).--GrounderUK (talk) 13:12, 2 July 2021 (UTC)
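One way to picture the scope-of-impact question: record, for each cached evaluation, exactly which inputs it consumed (property values and/or type enumerations), and invalidate only along those edges. A toy sketch (Python; the key formats and the Q64 example values are illustrative assumptions):

```python
from collections import defaultdict

# Map each input (a property value or a type enumeration) to the cached
# evaluations that consumed it.
dependents: dict[str, set] = defaultdict(set)
cache: dict[str, object] = {}

def record(evaluation: str, inputs: list, result: object) -> None:
    cache[evaluation] = result
    for i in inputs:
        dependents[i].add(evaluation)

def invalidate(changed_input: str) -> None:
    """Evict exactly the evaluations that consumed the changed input."""
    for evaluation in dependents.pop(changed_input, set()):
        cache.pop(evaluation, None)

# The ranking consumed the region's enumeration; the single lookup did not.
record("rank_by_density(region)", ["enum:region", "P1082:Q64", "P2046:Q64"], [...])
record("density(Q64)", ["P1082:Q64", "P2046:Q64"], 4090.0)

invalidate("enum:region")       # evicts only the ranking
print("density(Q64)" in cache)  # True: unaffected by the enumeration change
```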
- @GrounderUK: Oh, I like the idea of populating a type via a SPARQL query. That would be much simpler than what I said. And when they are used as an enum, we could just refer to the QID for a specific value. Clever. I like that. Something we should keep in mind for Milestone 2. -- DVrandecic (WMF) (talk) 21:30, 3 September 2021 (UTC)
Discovery of Z2/Persistent objects
- Reliable discovery of existing objects also lies behind "contributor wants to create an object." How does the contributor know that their desired object does not already exist? The need for an object does not emerge spontaneously; there must be some context for the new object, and the precise context may imply certain constraints that should apply to the new object. In any event, it is likely that the new object resembles some existing object, and the second can be a model for the first. But perhaps there is an object that would be a better model? Other objects with the same or a similar type ("adjacent", as above) can be identified by the mystical UI. (I suggest that there should be a "model object" for every kind of object. Where a Z4/Type is used by more than one object, the most helpful examples of its usage might be indicated.) Whichever object is chosen as the model, it can then be "de-constructed" by the inverse function to its constructor and amended within the UI (or exported from Wikifunctions as monolingually labelized text).--GrounderUK (talk) 16:05, 30 December 2020 (UTC)
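A minimal sketch of the constructor/"de-constructor" round trip (Python; the object layout and function names are invented for illustration):

```python
def construct(note: str, octave=None) -> dict:
    """Toy constructor for a 'musical note' object."""
    return {"type": "musical note", "note": note, "octave": octave}

def deconstruct(obj: dict) -> dict:
    """Inverse of construct(): recover the arguments from the object,
    so a model object can be amended field by field and rebuilt."""
    return {"note": obj["note"], "octave": obj["octave"]}

model = construct("C", 4)            # an existing "model object"
arguments = deconstruct(model)       # de-construct it...
arguments["note"] = "B"              # ...amend it within the UI...
new_object = construct(**arguments)  # ...and construct the new object
print(new_object)  # {'type': 'musical note', 'note': 'B', 'octave': 4}
```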
- Oh, that's a great idea! I like it. Have showcase objects for each type. Do you think that should be formalized, or would the documentation be good enough? DVrandecic (WMF) (talk) 22:43, 25 June 2021 (UTC)