Talk:Abstract Wikipedia/Early mockups


Name of functions


@Denny and DVrandecic (WMF): In these mockups, you assume a function has a unique name. Are labels intended to be unique? In Wikidata, property labels are only unique per language; it is possible that "foo" means one thing in English and another in German. --GZWDer (talk) 03:55, 21 July 2020 (UTC)

@GZWDer: Thanks for asking! Here I assume labels are unique per type and per language. I first assumed they should be unique across the whole language, but that led to a few really ugly labels, and it seems sufficient to relax that to uniqueness per type and language. That is similar to Wikidata items, which are unique per label and description within a language. --DVrandecic (WMF) (talk) 16:18, 21 July 2020 (UTC)
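To make the relaxed rule concrete, here is a minimal Python sketch of the uniqueness constraint described above; all labels, types, and ZIDs in it are hypothetical:

    # Sketch: labels collide only when the same string names the same
    # type in the same language. All names and ZIDs are made up.
    registry = {}  # (label, type, language) -> ZID

    def register_label(label, ztype, language, zid):
        key = (label, ztype, language)
        if key in registry:
            raise ValueError(f"{label!r} already names a {ztype} in {language}")
        registry[key] = zid

    register_label("head", "function", "en", "Z501")  # head of a list
    register_label("head", "type", "en", "Z502")      # same label, different type: allowed
    register_label("Kopf", "function", "de", "Z501")  # same function, German label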

Namespaces?

Most programming languages do have namespaces. Will Wikifunctions have namespaces for the functions?
Wikidata's unique names for properties come with a property creation process that keeps users from freely creating new properties, protecting the namespace from congestion by poorly chosen names. What's the idea for protecting the main namespace of Wikifunctions from congestion/bad naming? What are the ideas for renaming functions? ChristianKl 01:21, 16 January 2021 (UTC)
Remember that functions in Wikifunctions will be indexed by their Z-ID (similar to Q-IDs in Wikidata), so they will be unique from the start. The "names", however, will be like labels (with possible aliases) in Wikidata, and they may also be translated. It's not clear whether we need another kind of object for grouping related functions into a meta-object (with its own Z-ID and its own translatable set of labels and aliases for each language): adding a namespace may be a good idea, or it could just be a "property" (like in Wikidata), and in some cases the same "Z-function" object could have multiple occurrences of this property, allowing it to be linked to several "Z-namespaces". The "canonical" name of a Z-function will not necessarily be well defined; it will vary from user to user, depending on their human language of choice.
Conversion from "plain-text" source code to the canonical JSON form using ZIDs will need its own parsers (to support an actual programming syntax); some tools may not need a parser at all, and the plain-text form of an algorithm is not necessarily unique (e.g. the implementation could be programmed graphically with building blocks, or with auto-collapsible web forms and click-and-drag UI components). At runtime, the canonical JSON will only reference ZIDs; the actual source code implementing basic functions may be written in various programming languages, each with its own internal namespaces if needed, or none if programmed in C. Binding the "low-level" implementation to a ZID and to sets of labels/aliases in Wikifunctions should be independent of the internal names used in the implementation (remember: there could be multiple implementations: internal to the Wikilambda runtime, external on a server, or client-side in JavaScript...).
So collisions of "names" are definitely not an issue; in fact they will always happen as soon as we want these names to be accessible to all users (not just programmers who understand the "technical English" jargon, often specific to each programming language). ZIDs will be what allows us to link names clearly and unambiguously; names will just be a human convenience and do not need to be completely unique (see the similar rationale for names of elements and properties in Wikidata, which works even without any "namespace": uniqueness requires the tuple (label, description, BCP 47 language code) to be unique, so collisions are avoided by language and description without creating a real "namespace" whose members would be enumerable; for enumerations, defining properties is required, which is why I think "namespaces" for Wikifunctions should also use properties, as for Wikidata elements). verdy_p (talk) 19:30, 16 January 2021 (UTC)
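As a concrete illustration of the ZID-versus-name point, a small Python sketch (the ZIDs and labels are invented; the "Z1K1"/"Z7K1" keys follow the function-call convention shown in the mockups): the canonical form stores only ZIDs, and per-language labels are resolved on top of it.

    # The canonical form references only ZIDs; names are per-language labels.
    canonical_call = {
        "Z1K1": "Z7",          # this object is a function call
        "Z7K1": "Z10001",      # the function, referenced by ZID only
        "Z10001K1": "Z10002",  # an argument, also a ZID reference
    }

    labels = {
        "Z10001": {"en": "concatenate", "fr": "concaténer"},
        "Z10002": {"en": "greeting", "de": "Begrüßung"},
    }

    def display_name(zid, language, fallback="en"):
        # Resolve a user-facing name; the stored canonical form never changes.
        names = labels.get(zid, {})
        return names.get(language) or names.get(fallback) or zid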

Storage of composition?


@Denny and DVrandecic (WMF): How is an individual function in a composition stored?

  • Referred by ZID - this makes it not easily portable to other instances (e.g. importing a function from a test Wikilambda will break)
  • Referred by name - 1. a function does not have a single name, it has one per language; 2. reference by name is prone to vandalism; 3. this requires individual implementations to have unique names
  • Other ideas: referred by GUID, referred by hash

--GZWDer (talk) 04:35, 22 July 2020 (UTC)

@GZWDer: By ZID. Cross-wiki importing between multiple installations is not envisioned. I'm not sure about cross-wiki calling ({{#lambda:…}}), but I suppose we'd have to rely on ZIDs there, too, to avoid too much complexity/magic. Jdforrester (WMF) (talk) 09:16, 22 July 2020 (UTC)
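For illustration, here is a sketch (hypothetical ZIDs) of what "by ZID" means for a stored composition: every function reference and nested call is a ZID, so labels can be renamed or translated without breaking the composition.

    # A composition as nested function calls; every reference is a ZID.
    composition = {
        "Z1K1": "Z7",
        "Z7K1": "Z20001",          # outer function, say "uppercase"
        "Z20001K1": {
            "Z1K1": "Z7",
            "Z7K1": "Z20002",      # inner function, say "first word"
            "Z20002K1": "some input text",
        },
    }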
@GZWDer: Regarding different instances of Wikilambda, we'll figure that out later. My first hunch is to rely on a namespacing mechanism. --DVrandecic (WMF) (talk) 14:14, 22 July 2020 (UTC)
@Denny, DVrandecic (WMF), and Jdforrester (WMF): Note: what I said is that if we have a test Wikilambda, it will be a pain to move content (i.e. function definitions and implementations) between Wikilambda and the test Wikilambda. By comparison, templates and modules can be easily transwikied, or even copied directly. --GZWDer (talk) 14:17, 22 July 2020 (UTC)
@GZWDer: thanks for clarifying! Yes, that's a good point. We'll probably need something like "bootstrap" content with fixed ZIDs for testing purposes. --DVrandecic (WMF) (talk) 14:22, 22 July 2020 (UTC)
@Denny, DVrandecic (WMF), and Jdforrester (WMF): This is not enough, especially when new features are introduced and there is some content in both projects. --GZWDer (talk) 16:33, 22 July 2020 (UTC)
@Denny, DVrandecic (WMF), and Jdforrester (WMF): Referencing functions by ZID is a good solution; but for the testing framework, or for regular performance evaluations, we'll need to be able to select implementations specifically. Even the evaluator may have to choose between several implementations (depending on the availability of servers, or of local services offered by the client, e.g. JavaScript support in its browser). So we need a way to specify and expose the capabilities of evaluators, and their performance, load, and availability (and, if we have several candidates, a way to choose an alternate implementation). The same implementation could also run in a container over a farm of servers (internal or external), and containers could perhaps be mirrored and deployed transparently, possibly temporarily on demand, on servers that have enough spare resources for some time or that are already connected online (notably web browsers of individual users, e.g. if they support frameworks like Node.js and have enough storage, including for caching their results). People may also want to deploy this in a locally installable container service running separately from their browser (so independently of their current navigation), thereby contributing to a P2P computing grid. For that we'd just need a discovery service, possibly backed by a DHT like the one used by BitTorrent, which could also be used to mirror and distribute the workload modules for deployable functions, or collections of related functions (e.g. those featuring some "topic" tags) of particular interest, which would be stored more permanently or cached for longer as long as storage is available.
As well, besides the core function evaluators, we'll need caches. Caches are also distributable (e.g. using Memcached, possibly deployable in a container as one core "function", featuring database capabilities for database-like storage, or file/data storage, or simply transparent HTTP proxies serving classic web queries). Managing cacheability (as in HTTP, HTTPS, HTTP/2...) would require a set of metadata and should use existing web standards. But if we accept external evaluators or caches, there will be interesting privacy issues that we MUST solve according to legal requirements and restrictions: required connection logs, session cookies, client metadata present in web requests. Those hosting these external services must have a privacy policy and a contact point, and these may need to be evaluated; some of them may need to be blocked or blacklisted when necessary, including to respect legal orders to no longer relay data with them without more explicit authorization. Otherwise, there's a risk that those offering these external evaluators could be overwhelmingly bad advertisers, or spies, or criminals trying to abuse the system to distribute their spew, including with the help of caches offered by others on the distributed network.
So basically a solid set of trustable caches should first be built, using conventional contracts with the Foundation, requiring them to expose some usage and performance metrics, as well as to honor cancellation requests for malicious content or spam that would severely damage the reliability and stability of the results we get from the functions. Ideally, the solution would then be to implement caches using verifiable digital signatures created by trusted evaluators, associating a request with its response; the evaluators themselves should also be integrable with a digitally signed contract (just like the blockchains used for cryptocurrencies), so that we can easily detect false results generated by a bad evaluator (malicious, damaged, or with bugs discovered later), then instruct caches to discard its results, forcing clients to request another evaluation on the network.
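A speculative Python sketch of that signed-cache idea, using an HMAC to stand in for a real public-key signature (the evaluator IDs and keys are invented):

    import hashlib
    import hmac

    # Keys of evaluators the cache currently trusts; revoking a bad
    # evaluator means deleting its key, which invalidates its entries.
    EVALUATOR_KEYS = {"evaluator-A": b"secret shared with verifiers"}

    def sign_result(evaluator_id, request, response):
        payload = hashlib.sha256(
            request.encode() + b"\x00" + response.encode()).digest()
        return hmac.new(EVALUATOR_KEYS[evaluator_id], payload,
                        hashlib.sha256).hexdigest()

    def verify_cached(evaluator_id, request, response, signature):
        # A client can check a cached (request, response) pair without
        # re-running the evaluation.
        if evaluator_id not in EVALUATOR_KEYS:
            return False  # evaluator revoked: discard the cached result
        expected = sign_result(evaluator_id, request, response)
        return hmac.compare_digest(expected, signature)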
Multiple implementations will also be needed to fix bugs. As well, if a complete evaluation takes too many resources and no result can be found immediately, it would be helpful to have partial evaluation (with different results, less complete or with higher margins of error; implementations should then be able to provide a "score" in that case). Function compositors would still be able to return a useful result even if it's less precise (e.g. if a translation is not fully available and some fallbacks have to be used, or if the result is returned only in abstract form but is still usable via another translation function, on demand by users if they really need it and don't want to use external translation services like Google Translate to understand the results correctly).
Translations of results will often not be available, and external translations will be needed. But users will want to specify their own language preferences, including fallbacks, or may just use another, slower implementation that returns the translated results directly in their language of choice: uniqueness of results for the same request should then not be absolutely required at all times. So in most cases we'll use Wikifunctions and let it use the best implementations it has: its known evaluators if available, or its caches.
Caches will also be needed for evaluating functions that take very long to compute, such as statistical reports requiring aggregation over large databases and complex evaluations of large datasets (which need to be downloaded, or are only available through many partial requests, possibly on multiple servers or via different networks with different performance). As well, statistics or quality assessments frequently need to be updated regularly to show progress; not all computation may be done by functions directly, some being done by human groups in a community project (consider how CLDR works: it is large, very slow to update, requires multiple steps including vetting, and it is never complete, not even in English). And for most human knowledge, a result is only good according to the current state of knowledge; it is just a current view of what is best for now. We could collect quality assessments with references to facts and sources, but these are also partial, possibly oriented by biases discovered very late (solving these biases will require patience, in order to find the relevant people to expose their problems, or those that were not represented and were ignored by "trusted sources"). So we need partial results, and a way to measure their progress (or regression) over time in terms of quality, relevancy, or coverage. These can be used to create synthetic results that evolve over time. Caches will progressively be cleaned and reanalyzed, but should be able to return their existing data even when they know that another result is pending but not yet available, or not yet evaluated/vetted. As well, translations are generally never unique for the same source or abstract text and the same target language (there are synonyms, or styles preferable for some audiences).
So multiple implementations will be present; we need ways to identify them, ways to discover them, and ways to measure their performance, reliability/security/coverage/completeness/freshness, and community-given vetting scores. verdy_p (talk) 16:20, 18 June 2021 (UTC)
As usual, you have a lot of really good points here, and a number of good sketches for solutions. Thanks! I do think you are in many cases like three or four steps ahead of the problems that we currently have. I agree that - if all goes really well! - we will eventually have these problems to solve, and I am very much looking forward to having them (because it means we are successful!).
There are two kinds of implications from your post: the ones that have immediate relevance, and the ones that will hopefully be relevant in the medium-term future, once we have something like a distributed peer-to-peer network of compute nodes sharing resources (goodness, how cool will that be!) Even though I might be easily nerd-sniped into the second kind, I'll skip them today and will focus on the ones with immediate implications.
First: being able to select an implementation to run and having all the information necessary to make a good selection are two separate concerns. Yes, we will make it possible to select the implementation (basically, in the Z7K1, when you select the function, you can either just refer to a Z8, or you can copy a Z8 and modify the list of implementations in Z8K4 to only have the implementations you choose - which could be a single one).
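In code, that selection mechanism might look like the following sketch (hypothetical ZIDs; Z7K1 and Z8K4 as described above):

    # Pin a call to one implementation by embedding a copy of the Z8
    # whose implementation list (Z8K4) is narrowed; ZIDs are hypothetical.
    def select_implementation(function_z8, implementation_zid):
        pinned = dict(function_z8)
        pinned["Z8K4"] = [implementation_zid]
        return pinned

    call = {
        "Z1K1": "Z7",
        "Z7K1": select_implementation(
            {"Z1K1": "Z8", "Z8K4": ["Z30001", "Z30002"]},
            "Z30002",  # e.g. the implementation under test
        ),
    }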
Second: but how to select the right implementation if you don't know the capabilities of the endpoints? For now we will use the simplest solution: there is only one orchestrator, and the orchestrator has hard-coded knowledge of all the evaluators it can access. Yes, I know that is not a long-term solution, but it turns this question from the first kind (requires an immediate answer) into the second kind (requires a solution eventually, once we are successful enough).
Third: now there is the most difficult question - do any of the answers to the questions that we will face in the future require certain decisions at this point? Or is the current plan malleable enough to allow for the future answers to be accommodated? The answer is that I hope it is, but I don't know. Looking through your thoughts and the possible futures sketched out here, it looks like they all should be workable with organic extensions to the system as we plan it now. Obviously, no, it won't be possible to do these things with the system we are building now without changes, but there seems to be a pretty simple path forward for any of those ideas to be implemented within the framework we are building, without requiring major changes to the system.
Obviously, I could be wrong in that assessment. If you (or anyone else) think I am, I would be super thankful if the concrete issues were pointed out. Until then, I'll keep reading your ideas and keep them in mind as inspiration for the future, when cryptographically signed function evaluations and proofs are exchanged between zero-trust nodes.
(I am sure you are aware that there is a decade or two of research in that area, and that we should be just stealing ideas in order to build our system - at least that's my intention.)
Honestly, thank you for the inspirational future and problems you paint here. I really appreciate how optimistic you are about the future of Wikifunctions. I honestly hope that we are building the right foundation to justify that optimism. --DVrandecic (WMF) (talk) 22:21, 25 June 2021 (UTC)

Relation of Constructor and Renderer


I am assuming that the Constructor is basically like a template - you fill it in with the values (Wikidata IDs, dates, numbers) in a particular context. So the Abstract WikiText for the first example might look something like {{Z272377|Q311440|Q1253|Q81066|1 Jan 2017}} or maybe more like {{Z272377|K1:Q311440|K2:Q1253|K3:Q81066|K4:2017-01-01}} to handle optional keys etc. Hopefully there would be a UI to fill in the form so you don't have to look up those IDs separately. So then the English (or any other language) renderer takes that template and produces the text as you illustrate. Would it make sense to call the filled-in template an object of type "Succession (Z272377)"? And then the renderer just takes that object as its value, rather than specifying each of the keys separately? Anyway, thanks for these mockups, they make a lot of sense to me generally! ArthurPSmith (talk) 14:58, 22 July 2020 (UTC)

@ArthurPSmith: Yes, the Constructor represents the base data elements necessary for construction of natural language, whereas the Renderer is the implementation that transforms that data to produce the natural language. In the web interface mockup, I think you make a good point that the data entry for the Renderer ought to make clear that the thing on which it's operating is a data structure of a particular Constructor type. The subheading in the Renderer tries to indicate something to this effect, but I'm wondering whether you think it might be helpful to lay that out more clearly in the UI (e.g., nesting the function arguments under the expected type string of the constructor)? I haven't spoken at length with teammates, but I think we expect a fair amount of autosuggest-type input boxes to simplify things for users, yes! The shorthand template syntax you provided may be a familiar way to do it, although in the context of a particular content wiki it may make more sense to support localized renderer names (which would be enumerated in the wiki of functions) - but actually, were you sharing that syntax thinking about the wiki of functions, the content wikis, or both? Different examples of this nature have been shown in different documents, but I was curious about your take. Suffice it to say there's much to be worked out and we will want to examine a number of approaches - this is slideware and all that - but really good questions you raised! --ABaso (WMF) (talk) 12:54, 24 July 2020 (UTC)
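To picture the split being discussed, a small Python sketch (the key names are invented; the IDs echo the Z272377 Succession example above): the renderer takes the whole constructor object as its single value.

    # A filled-in Constructor is plain data; a Renderer turns it into text.
    # Key names are hypothetical; a real renderer would first resolve the
    # Q-IDs to labels in the target language.
    succession = {
        "type": "Succession",   # Z272377 in the mockup
        "successor": "Q311440",
        "office": "Q1253",
        "predecessor": "Q81066",
        "date": "2017-01-01",
    }

    def render_succession_en(c):
        return (f"On {c['date']}, {c['successor']} succeeded "
                f"{c['predecessor']} as {c['office']}.")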
@ABaso (WMF): Ah, so I'm guessing the slideware here wasn't just produced by Denny himself then? Thanks for getting these clear examples out! On your questions... Do the renderers need to be called explicitly? If they are, I guess what I was thinking is that they would be called with the constructor as the explicit argument, not the components of the constructor. On the other hand, if they are simply implicit - given a particular constructor and a language, you use the renderer for that automatically - then I guess we don't really have to worry about what it looks like when called. I was thinking about the content wikis specifically for the template-style syntax. That is, I'm imagining an "abstractwiki" page for a particular topic that has a series of such template-constructors that might, in abstractwikitext format, look like {{headercontent|...}}{{infobox|...}}{{image|...}}{{firstsection|...}}{{secondsection|...}}...{{references}}{{external_links}} or perhaps something more granular, with constructors for each paragraph or sentence. Then for a particular language this content would be turned into readable text by renderers for each of those constructors. An alternate approach (and one that might be helpful in the shorter term, when we don't have so many constructors or renderers) might be for each language wiki to allow insertion of an "abstracttext" section that looks like a series of these constructors, rendered for that language. The problem with that approach, though, is that you don't have a single abstracttext entry for the topic, but a possibly different one in every language that chooses to do it. ArthurPSmith (talk) 14:33, 24 July 2020 (UTC)
@ArthurPSmith: briefly, Denny did create the slideware, although James, Nick (I think?), and I took a quick look around the time of the posting. I'll reply back on the rest of your comment, maybe in a business day or two. Thanks!
@ArthurPSmith: so in the UI for end users to try out functions, input fields for the function arguments might be fine, but I think we're assuming explicit function name nesting (imagine, for example, there's a show-source-code toggle, which ought to show an actual function name / Z-ID). A canonical article model with well-defined potential elements of the sort you suggest for the central abstract article seems sensible, and probably a good way to reasonably avoid detrimental forking. I wonder if forking should happen at the per-language renderer level, although information hierarchy and the actual prose to be generated interplay a lot. One thing to be wary of is potentially recreating the copy-paste orphan phenomenon of templates, but it seems like there's a bit more work to determine the right sort of approach. Would it make sense to carry the discussion about article "building blocks" over to Talk:Abstract Wikipedia#Hybrid article?
@ABaso (WMF): Can we pretend for a moment that a function does not actually have "an actual function name / Z-ID"? Even if it really does, can we pretend that there's no reason why someone using the function or changing the function would need to know what its name is? I mean, you need to know where it lives, and it needs to be able to have a name (or address) in every language. But let's pretend that all its aliases (including its ZID) are equally secondary. The same applies to the function's arguments. Now, do we envisage mixed-language invocations? If so, can a name–value pair be bilingual? Can I call {{sv:funktion=de:Fubar|Q1027788=it:testo("per esempio")}} (passing an Italian string as an argument to the foobar function), or something equally daft? (Because I saw this great function on the Swedish Wikipedia, and they got it from the German, and I don't see why it wouldn't work in Italian...) What about {{Q1027788=it:testo("per esempio")|sv:funktion=de:Fubar}}? And is {{foo|"for example"}} converted to {{en:foo|en:text("for example")}} if my language is set to English? And if it's not, could I say {{en:foo|"for example"}}? (Sorry if my "foo" looks like a "fool".)
@GrounderUK: good questions. And funny, on the foo-pipe tomfoolery :) There are so many potential use cases that I can see why and why not, and how best and how best not to support that sort of thing... some of this will definitely be constrained by facets of the software architecture and, as I think @GZWDer: suggests, potential UXes. Nonetheless, do you have a particular point of view on this? If we should take it to a different talk page, please do advise!
@ABaso (WMF): I do not have a settled view on this, but it's probably best to pick this up under Talk:Abstract_Wikipedia/Function_model#Internationalization_concerns for now. :) "tomfoo|ery" is right! --GrounderUK (talk) 16:15, 28 July 2020 (UTC)
We need a Scratch-like visual editor for content. --GZWDer (talk) 18:29, 24 July 2020 (UTC)

Conjunction


I know this is just a list of oversimplified examples/mockups ("just meant to roughly display the idea", as you said), but the example of generating a conjunction from data really drives home how complicated this is all going to be. Even ignoring things like adjectives and joint prepositional phrases, things like definite/indefinite articles get very complicated, even in English. Compare the phrases "the president and vice president" (or "the president, vice president, and chief of staff") with "the United States, the United Kingdom, and Canada", or "an apple, a pear, and a plum".

I haven't read up on NLG. Are there existing established methods for handling things like this reasonably well? --Yair rand (talk) 03:22, 5 August 2020 (UTC)

@Yair rand: thanks for the question - yes, fortunately there are. E.g. d:Q5593683 deals with that by putting the definiteness into the noun phrase. In the example with the countries, it is part of the lexicographic entity; in the vice president example, the interesting part is that there is no difference between "the president and vice president" and "the president and the vice president", etc.
The example with the conjunctions is indeed vastly oversimplified: the type would never be a list of strings, as the tests make it look, but would always need to be an array of noun phrases (as claimed by the type of the function). And the noun phrase would have a slot for its definiteness.
This becomes more clearly visible in languages that have cases on nouns, where each element of the conjunction would need to inflect to the proper case. So the result of the conjunction would not be a string (as suggested by the tests) but rather a noun phrase again, which, in turn, could be inflected.
So sorry for these oversimplified examples, and for the confusion they create. But as said, the goal here is to convey a rough idea of what we are planning for. Reality will look rather more complex.
Thank you for the question! --DVrandecic (WMF) (talk) 02:27, 21 August 2020 (UTC)
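For readers following along, a toy Python sketch of the point made above (English-only rules, heavily simplified; a fuller model would return a noun phrase rather than a string):

    # Conjunction over noun phrases with a definiteness slot, not strings.
    from dataclasses import dataclass

    @dataclass
    class NounPhrase:
        text: str
        definite: bool = False      # "the" vs. "a/an"
        takes_article: bool = True  # proper nouns like "Canada" take none

    def surface(np):
        if not np.takes_article:
            return np.text
        if np.definite:
            return "the " + np.text
        article = "an" if np.text[0].lower() in "aeiou" else "a"
        return article + " " + np.text

    def conjunction(nps):
        parts = [surface(np) for np in nps]
        if len(parts) <= 2:
            return " and ".join(parts)
        return ", ".join(parts[:-1]) + ", and " + parts[-1]

    # "an apple, a pear, and a plum"
    print(conjunction([NounPhrase("apple"), NounPhrase("pear"), NounPhrase("plum")]))
    # "the United States, the United Kingdom, and Canada"
    print(conjunction([NounPhrase("United States", definite=True),
                       NounPhrase("United Kingdom", definite=True),
                       NounPhrase("Canada", takes_article=False)]))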

Views - Display of Arguments

[Screenshot caption: adds tabular grid, as well as adding Returns and Tests]

It's probably useful to actually say "arguments" or "parameters" within these displays. The bold-only approach currently shown isn't enough of an indicator of what the element really is. So I think the arguments should be listed vertically, instead of presented horizontally (with a comma separator), and given a label Arguments: and also Notes:, perhaps in a table display with the Notes: auto-retrieved from the function docs? The reasoning is that sometimes there will be a lot to say about the arguments of a function themselves. It's also a more pleasing look for functions that might have many arguments, not only two. -- Thadguidry (talk) 18:46, 24 August 2020 (UTC)

(Screenshot markup fixed :) ) Quiddity (WMF) (talk) 20:17, 24 August 2020 (UTC)

Function contracts


@DVrandecic (WMF): As a professional software engineer, I think Wikifunctions is a wonderful project with huge potential; congratulations to all the people involved in it. These early mockups also look really good, and I would like to suggest a couple of ideas that maybe you want to consider too. Sorry if you have already discussed this topic; I was unable to find any page with improvement suggestions here or on Phabricator.

Maybe it's a good idea to add some function contracts, as already provided by different programming languages (like Eiffel, Spark-Ada, or JML; or even Python with different packages), that is:

  1. Preconditions: A new section after the function arguments with a list of boolean predicates indicating the conditions required by the arguments before calling this function (e.g. list cannot be empty, or first parameter must be greater than the second one)
  2. Postconditions: A new section after the function arguments with a list of boolean predicates indicating the conditions that are guaranteed by the result of the function (e.g. result is between zero and the first parameter, or the length of the output list is the length of the input list plus one)
  3. Type invariants: A new section in the data type page with a list of boolean predicates indicating the conditions met by every value of this data type (e.g. value must be strictly greater than zero, or is a prime number)

Probably the simplest approach is that each predicate is just a function in the same namespace as the rest of the functions. In programming languages, extra operations are usually allowed in contract predicates (like the "for all" or "there is at least one" quantifiers), but this may be optional. Preconditions would only need to reference the function arguments, while postconditions would also need a way to reference the function result. As I understand that arguments cannot be modified, there is no need for postconditions to reference the original value of a parameter at function start. All predicates of the list must be true (i.e. a logical AND of all the predicates), and preconditions can be as detailed as needed, probably being useful for renderers too, e.g. the argument must be a Wikidata item of a human, and one who has already died.

Besides formally documenting the function for implementers and users (e.g. if a precondition fails, the user will get a unified and clear error message, without the need to handle the different errors inside each function implementation), postconditions are very useful for automatically checking the results during tests, and it would be nice if the platform generated a report of constraint violations for each implementation whenever a result doesn't fulfill the postcondition for a set of input parameters (so the implementation or the postcondition can be corrected). Type invariants would be implicit preconditions of every function with arguments of that type, and implicit postconditions of every function with a result of that data type.

Another advantage usually obtained is the potential to simplify implementations, because some defensive code can be removed thanks to the preconditions, which avoids the need to handle exceptional situations in a compatible way across all supported languages. Maybe "robustness tests" should also be provided, i.e. special tests for a function checking that certain parameters are rejected by the current preconditions of the function.

I hope this helps in the design process, and thank you very much again for this awesome project. —surueña 21:26, 21 March 2021 (UTC)

@Suruena: I really like the suggestion. I think the idea of type invariants will be automatically fulfilled, since every type comes with a validator, and the validator is basically a predicate. Regarding pre- and post-conditions, I was thinking that the type system would lift most of that weight, but it obviously can't do the whole job. It is an interesting separation of the work, as you correctly point out - it would allow the implementations to be less defensive and clearer. What I hope is that it would be possible to implement the predicates, as you suggest, the core function, and a function that takes the predicates and the core function and wraps them together to create a classical function. Then all such wrapped functions could use the pre- and post-conditions as they see fit, etc.
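A minimal Python sketch of that wrapping idea (all names here are invented for illustration): predicates are ordinary functions, and a higher-order function combines them with the core implementation into a "classical" checked function.

    # Wrap predicates and a core function into a checked function.
    def with_contract(pre, post, core):
        def checked(*args):
            if not pre(*args):
                raise ValueError("precondition violated")
            result = core(*args)
            if not post(*args, result):
                raise ValueError("postcondition violated")
            return result
        return checked

    # Example: division with a non-zero divisor; the quotient's magnitude
    # never exceeds the dividend's.
    safe_div = with_contract(
        pre=lambda a, b: b != 0,
        post=lambda a, b, result: abs(result) <= abs(a),
        core=lambda a, b: a // b,
    )
    print(safe_div(7, 2))   # 3
    safe_div(7, 0)          # raises ValueError: precondition violated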
So, in short, yes, I hope that all of this will be possible, but I don't think it needs to be part of the core system, as it can be developed entirely in the contributor layer. I hope to keep the core system as lean as possible, but to allow exactly these kind of contributor-driven extensions on top of it.
(Which might lead to the question of me being inconsistent, because hey, I still introduce types, which could totally be also living in a contributor-driven extension, just as pre- and post-conditions would. That's true. At the same time I think that types are so fundamentally relevant for a good UX and for learning how to use the system that it was a product decision to bake them into the core system.)
Thanks for the suggestion! I think it is a great idea, and I hope to see it develop! --DVrandecic (WMF) (talk) 22:30, 19 April 2021 (UTC)