Abstract Wikipedia/Updates/Office hours 2021-06-22
Participants
(IRC and Telegram usernames)
- Aishwarya
- apine
- Csisc1994
- genocation
- Hogü-456
- Jan_ainali
- Julio974
- lucaswerkmeister
- mahir256
- Nikki
- Nortix
- Philippe
- quiddity
- vrandecic
Content (in English)
<vrandecic> Hello everybody!
<vrandecic> Who's here for the very first office hour? :D
<mahir256> o/
<vrandecic> Yay, hello Mahir!
<Hogü-456> Hogü-456
<Nikki> hi
<Jan_ainali> \o
<vrandecic> Hello Jan! Hi Hogü!
<Julio974> Hello!
<Philippe> Hi!
<vrandecic> Hi Philippe!
<vrandecic> Hi Julio!
<vrandecic> Hello and welcome to the first ever Wikifunctions and Abstract Wikipedia office hour!
<vrandecic> We are here to let you know what we are doing, where we are at, to answer any questions, and to start or have discussions on various topics.
<vrandecic> I mean, we are here at any time to do this, but now is a designated time where you have our full attention :)
<vrandecic> Wikifunctions is a new Wikimedia sibling project for everyone to collaboratively create and maintain a library of code functions to support the Wikimedia projects and beyond, for everyone to call and re-use in the world's natural and programming languages. It will allow us to collaborate on a new knowledge format, functions.
<vrandecic> Abstract Wikipedia is not a sibling project in the sense of Wikidata, Wiktionary, Wikipedia, etc., but rather a development that aims to combine the pieces that are or will be in place in order to allow for the creation of content in a way that abstracts from a concrete natural language, and then can be translated into the individual languages of the Wikipedia projects. This way we hope to massively increase
<vrandecic> coverage, correctness, and currency in Wikipedia across many languages.
<vrandecic> So, that's the canned text I had prepared.
<lucaswerkmeister> o/
<vrandecic> Hey Lucas!
<vrandecic> Thanks for everything and welcome!
<Nortix> o/
<vrandecic> Nortix, yay! Hi
<vrandecic> So, we can either start taking questions, or I will babble on about where we are currently, in which state of development etc.
<Nortix> Start with some babbling ;)
<vrandecic> Since we are not too many, I suggest we can informally have questions at any time, and I will keep babbling as Nortix suggested :)
<Nortix> (Y)
<vrandecic> We have subdivided the development work until launch into eleven phases.
<vrandecic> And named them after the first letters of the Greek alphabet
<Julio974> I have quite a few questions
<Philippe> About talks; see https://meta.wikimedia.org/wiki/Talk:Abstract_Wikipedia/Updates/2021-06-17
<Julio974> First: functions will be stored as ZObjects, but what exactly will be stored? Will it be the implementation (meaning it depends on the language) or the instructions (meaning it is basically in an abstract programming language)?
<vrandecic> We are currently in Phase Epsilon, which is the fifth phase.
<vrandecic> Philippe, yes we can discuss that. The page is a bit long, can you say which question to discuss first?
<vrandecic> Julio974: both!
<apine> Julio974: yes, implementations will be stored. There are also abstract programming language features, but I'm not entirely sure what is meant by "instructions."
<Philippe> I think we know that basic implementations (using composition) will be based on a compact JSON form. Other native implementations will be represented as metadata in the JSON, allowing us to locate where to run them
<vrandecic> What apine / Cory said. We will have two main different forms of implementations, one is by using contributed code - i.e. something written in a programming language, the other is by using so-called composition (we are working on that right now in phase Epsilon), where one function is implemented as a nested composition of other functions.
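A minimal sketch of the idea of composition described above: a function implemented as a nested tree of calls to other functions, whose leaves are literal arguments. The registry, the `Call` node, and `square_plus_one` are hypothetical illustrations; Wikifunctions itself stores functions and implementations as ZObjects, not Python objects.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# Hypothetical registry of "contributed code" implementations, keyed by name.
# (Assumption: real Wikifunctions identifies functions by ZIDs, not names.)
REGISTRY: Dict[str, Callable[..., Any]] = {
    "add": lambda a, b: a + b,
    "multiply": lambda a, b: a * b,
}


@dataclass
class Call:
    """One node of a composition: a function name applied to arguments."""
    function: str
    args: List[Any] = field(default_factory=list)


def evaluate(node: Any) -> Any:
    """Evaluate a composition: evaluate the arguments first, then apply the call."""
    if isinstance(node, Call):
        values = [evaluate(arg) for arg in node.args]
        return REGISTRY[node.function](*values)
    return node  # a literal argument evaluates to itself


def square_plus_one(x: int) -> Any:
    """Hypothetical function implemented purely by composition: add(multiply(x, x), 1)."""
    return evaluate(Call("add", [Call("multiply", [x, x]), 1]))


print(square_plus_one(4))  # 17
```

Only `add` and `multiply` need contributed code here; `square_plus_one` is built entirely out of them.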
<vrandecic> Julio974: does this answer your question?
<vrandecic> Philippe: yes, I think that is right as far as I understand your comment.
<Philippe> Composition will not necessarily be slow, as it should work jointly with caches, and the minimum runtime will contain basic functions (notably common data aggregators)
<Julio974> vrandecic: so implementations that are not compositions will be dependent on a programming language, right?
<vrandecic> Philippe: fully agreed! I really hope that compositions won't be slow, I actually hope this will be the main way to provide implementations, but we have different opinions on this within the development team
<vrandecic> Julio974: yes that is correct
<Philippe> However, the wiki talk page has an interesting question: why do results have to be unique?
<Julio974> vrandecic: Okay, thanks
<vrandecic> Philippe: unique in the sense of "deterministic"?
<vrandecic> I.e. always the same given the same inputs?
<Philippe> results are more than just a direct reply to a question (function and its parameters), they should contain various other metadata. And determinism is not guaranteed given the unavoidable constraints we'll have: computing a result can take a lot of time, depending on the level of accuracy or coverage we need, so there should be some additional metadata like relevance and performance, and results may be estimations whose accuracy w
<quiddity> Re: development phases mentioned earlier, details are within https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Phases
<Philippe> In other words, a function call would behave more like a web search: you get several results with relevance scores (that are also evolutive)
<vrandecic> The idea is to start by avoiding these non-deterministic cases. So, we won't have random results in the beginning. This is in order to simplify our life for now.
<vrandecic> At some point it will make A LOT of sense to introduce non-deterministic functions, e.g. with random elements.
<Philippe> In this sense, it would not be fully "deterministic", but organized by the orchestrator as a heuristic to converge to an acceptable solution
<vrandecic> Yes, that could work. Another thing that could work would be, if you want a ranked list of results have a function that returns a ranked list of results.
<Philippe> it will then be up to the client to provide parameters for what is acceptable (including the maximum delay to get an estimation or result)
<vrandecic> I.e. add(2, 5) will not require a ranked list of results
<vrandecic> but other functions might
<Philippe> yes but basic arithmetic on small integers will be certainly builtin, like in most programming languages
<vrandecic> we're currently not planning to do that :D
<Philippe> so one result, with 100% accuracy and a fast reply (not even requiring any cache)
<vrandecic> It might and will likely happen as an optimization later on
<vrandecic> But we'll start without having arithmetics built in. Let's see where this will take us :)
<vrandecic> (If we use any words that are unclear or that you'd like to be explained, please ask! That's why we are here)
<vrandecic> I am looking forward to seeing how caching, builtins, and other implementations will play out
<vrandecic> I agree that it sounds rather silly to cache the result of add(2,5) and then look it up in a cache :D
<vrandecic> When the calculation would be a single computing cycle
<vrandecic> My expectation is that our first implementation will aim at being feature complete, but might have grave inefficiencies like these
<vrandecic> and over time, understanding where the actual pain points are, we will start improving and optimizing the system
<vrandecic> One thing that will help us is that we hope for an organic growth of the platform
<Julio974> Also, the results of any function involving "time right now" or random functions should by default not be cached
<vrandecic> i.e. fewer people at the beginning, and then ramping up, in the meantime we can watch and see what happens
<vrandecic> Julio974: yes, exactly. That's why we'll try to avoid these functions in the beginning :)
<vrandecic> "time right now" can always be given as an argument, so that it doesn't break cacheability (random too, in a way)
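A small sketch of the point about passing "time right now" as an argument: if a function reads the clock internally, its result changes over time and cannot be cached safely; if the reference date is an explicit input, the function stays deterministic and a plain memoization cache works. `age_in_years` is a hypothetical example function, and `functools.lru_cache` stands in for whatever caching layer Wikifunctions will actually use.

```python
from datetime import date
from functools import lru_cache


@lru_cache(maxsize=None)
def age_in_years(birth: date, today: date) -> int:
    """Hypothetical function: age in whole years on a given reference date.

    Because "today" is an argument rather than read from the clock,
    the same inputs always give the same output, so caching is safe.
    """
    years = today.year - birth.year
    # Subtract one if this year's birthday has not happened yet.
    if (today.month, today.day) < (birth.month, birth.day):
        years -= 1
    return years


print(age_in_years(date(2001, 1, 15), date(2021, 6, 22)))  # 20
print(age_in_years(date(2001, 1, 15), date(2021, 6, 22)))  # 20, served from the cache
```

The caller (for example, a template or the orchestrator) decides what "now" is and passes it in.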
<vrandecic> Here's one thing that's bouncing in my head a lot (and I wanted to write a weekly on it for a while), if I may open a discussion thread
<Philippe> Note that it is still useful to anticipate, and prepare the code for, the fact that a function will not necessarily return a single result, but a list of results (possibly with different ranking factors: accuracy, speed of computation/costs, freshness of the result, where caches could still store several historical results as long as the latest is not fully completed).
<vrandecic> We want Wikifunctions to aim at a wide audience
<Philippe> And so the orchestrator will have to find a way to determine which result to use
<vrandecic> Philippe: yes, but that should be handled by the orchestrator transparently
<Philippe> advanced users will want to see more results and tune the decision factors used by the orchestrator
<vrandecic> I don't want Wikifunctions to be only used by the people who already know how to use functions, if that makes sense
<vrandecic> Philippe: ah, yes! And they will have the opportunity. Every function evaluation will come wrapped with metadata about the evaluation.
<Philippe> and even if we get a single response, we should be able to get metadata details on how and where it was computed
<vrandecic> Cory is currently pondering what to put into this envelope
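Since the contents of the envelope were still undecided at the time, the following is only a hypothetical sketch of the general idea: every evaluation returns the value together with metadata about how and where it was computed. The `Envelope` class and its field names (`evaluator`, `duration_seconds`, `arguments`) are assumptions for illustration, not the Wikifunctions format.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class Envelope:
    """Hypothetical result envelope: the value plus metadata about its evaluation."""
    value: Any
    metadata: Dict[str, Any]


def evaluate_with_metadata(fn: Callable[..., Any], *args: Any,
                           evaluator: str = "local") -> Envelope:
    """Run fn on args and wrap the result with illustrative metadata."""
    start = time.perf_counter()
    value = fn(*args)
    elapsed = time.perf_counter() - start
    return Envelope(
        value=value,
        metadata={
            "evaluator": evaluator,       # where the result was computed
            "duration_seconds": elapsed,  # how long the call took
            "arguments": list(args),      # what it was called with
        },
    )


result = evaluate_with_metadata(lambda a, b: a + b, 2, 5)
print(result.value)                  # 7
print(result.metadata["evaluator"])  # local
```

Ordinary callers would just read `value`; advanced users could inspect `metadata`.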
<vrandecic> yes, agreed!
<Julio974> As the {{LAMBDA:}} keyword will allow calling functions within other wikis, is it planned to allow functions to replace (or act as) modules on other wikis?
<Philippe> that's why I think that "function evaluators" should better be named "estimators"
<quiddity> (naming things is hard!)
<Hogü-456> What do you think is necessary so that Wikifunctions is also used by less advanced users, and what is the current state of the user interface? I think this is important, especially for people without much experience using something like this.
<vrandecic> Julio974: functions will be similar in some ways to modules, but we do not plan to replace them.
<vrandecic> Julio974: IIRC modules can expose several functions? We would be more fine-grained than that.
<vrandecic> Julio974: both can and will for most users conveniently be hidden behind template calls, I expect. And they will have much of the same functionality, but cross-wiki.
<vrandecic> Julio974: does this sound right? or do you have specific follow ups?
<Julio974> vrandecic: Alright!
<vrandecic> Hogü: yes, a good user experience will be crucial! We are currently working towards it. We conducted some user research, and started the foundational steps of the UX design work. Assume that what you see in the current prototype will be completely rewritten.
<Philippe> Another thing not yet discussed: functions are described for now by segregating "input" and "output" parameters. Consider add(2,5)=7, formally it is a relation add(2,5,7) with 3 variables and any one could be unknown, so that it also describes inverse functions. Consider multiply(0,1)=0, it is the same as multiply(1,0,0), but if the unknown parameter is not the last one, multiply(y,0,x,0) is also true but the result we "want" is an
<Philippe> Still, this infinite list has a simple representation as lambda(x,(multiply,0,x,0))
<Julio974> By the way, will be it possible to "export" composition implementations as code in a chosen programming language?
<Philippe> some programming languages are good for such processing, and even maths contains such abstractions: generalizing functions (or "applications") as relations. And then we get into graph theory, and graph exploration and reduction methods
<vrandecic> Philippe: yes, that's true, and programming languages like Datalog and Prolog make good use of relations instead of functions. But general relations are so much more computationally intensive in many cases.
<vrandecic> Philippe: I was considering relations instead of functions, but looking at the state of research and implementations in that area that seemed too risky.
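To make the function-versus-relation contrast concrete, here is a toy sketch of Philippe's add(2,5,7) example treated as the relation a + b = c, solved for whichever single argument is unknown (`None`). The name `add_relation` is hypothetical. Even this tiny case needs special handling per position, and multiplication shows the harder part: multiply(x, 0, 0) holds for every x, so "solving" for x yields an infinite set rather than one result. General relational solving as in Prolog or Datalog is far more involved than this.

```python
from typing import Optional, Tuple


def add_relation(a: Optional[int], b: Optional[int],
                 c: Optional[int]) -> Tuple[int, int, int]:
    """Treat add as the relation a + b = c and fill in the single unknown (None).

    Illustrative only: it handles exactly one unknown and integers only.
    """
    unknowns = sum(x is None for x in (a, b, c))
    if unknowns != 1:
        raise ValueError("exactly one argument must be unknown")
    if c is None:
        return a, b, a + b  # the usual "forward" function
    if b is None:
        return a, c - a, c  # inverse: subtraction
    return c - b, b, c      # inverse: subtraction


print(add_relation(2, 5, None))  # (2, 5, 7)
print(add_relation(2, None, 7))  # (2, 5, 7)
print(add_relation(None, 5, 7))  # (2, 5, 7)
```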
<Aishwarya> Hogü-456 yes we wish to design an interface for creating functions that would be understandable, (hopefully) interesting, and usable by non programmers but this will definitely be a big challenge! we currently do not have a user interface but are working on it as i type 🙂
<Philippe> And these languages are used a lot in IA: language processing will require IA.
<Nikki> what is IA?
<Aishwarya> of course there are ways to contribute besides writing functions. someone can help curate/organize the library, label functions, help write documentation, or do translation or integration into others wikis!
<Csisc1994> Artificial Intelligence (re @Nikki: what is IA?)
<vrandecic> Philippe: additionally, relations require the end-user to understand the concept of variables. Based on my experience teaching computer science, that's one of the hardest concepts. The current function model of Wikifunctions avoids variables. But that's a design decision, it could have gone the other way.
<Philippe> sorry, AI (artificial intelligence; I used the French acronym)
<Nortix> great to hear! =) (re @Aishwarya: Hogü-456 yes we wish to design an interface for creating functions that would be understandable, (hopefully) interesting, and usable by non programmers but this will definitely be a big challenge! we currently do not have a user interface but are working on it as i type 🙂)
<Csisc1994> I know :) (re @Philippe: sorry, AI (artificial intelligence; I used the French acronym))
<Hogü-456> Will you be present at the virtual Wikimania this year? This is a chance to make Wikifunctions better known. At the moment I don't know how many people who are active in the Wikimedia projects know what Wikifunctions is.
<Julio974> Will be it possible to "export" composition implementations as code in a chosen programming language?
<vrandecic> Julio974: re: will it be possible to export compositions in a programming language. I very much hope so!! We are not currently planning this as a feature, because it is potentially hard to do it right for all programming languages, but I think it should be possible to compile an implementation in a programming language from a composition if you have enough of the involved functions in that programming language.
<vrandecic> Julio974: so I really hope that will be possible, but I don't make it a promised feature.
<vrandecic> Julio974: there might be some limitations, e.g. regarding the allowed types when exporting compositions as implementations etc.
<vrandecic> Julio974: the good news is, since it will all be on an open wiki, others can try to create such a module that automatically synthesizes an implementation from a composition, in case we don't do it soon enough or it turns out to be more complicated than I currently think.
<vrandecic> Julio974: I certainly hope and expect PL researchers around the world to jump on this opportunity ;)
<lucaswerkmeister> yeah, I was thinking that sounds like something that could be done in a tool ^^
<quiddity> Hogü-456: Yes, we have applied for a slot in the Wikimania schedule, and we very much hope to see the topic discussed in a few places. I know there are at least 3 submissions that mention "Abstract Wikipedia" (and thus implicitly Wikifunctions) in https://wikimania.wikimedia.org/wiki/2021:Submissions
<vrandecic> lucas: yes, I hope so. Or as a Wikifunctions function.... <inception_meme>
<vrandecic> Since composition is basically just nested function calls, it *should* be easy to translate that in many different programming languages straightforwardly. But I am not a Programming Languages expert, so there might be edge cases that complicate that step.
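As a rough illustration of why translating a composition "should" be straightforward: walking the call tree and emitting nested call syntax is enough in the simple case. This sketch assumes a hypothetical `Call` node and that every referenced function already has a Python implementation under the same name. It ignores the edge cases mentioned above, such as references to the function's own parameters, type mapping, and name clashes. The `eval` at the end is only there to show that the generated code runs; it should never be used on untrusted input.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Call:
    """Hypothetical composition node: a function name applied to arguments."""
    function: str
    args: List[Any] = field(default_factory=list)


def to_python(node: Any) -> str:
    """Emit a Python expression for a composition.

    Assumes each called function exists in Python under the same name.
    """
    if isinstance(node, Call):
        inner = ", ".join(to_python(arg) for arg in node.args)
        return f"{node.function}({inner})"
    return repr(node)  # literals become Python literals


composition = Call("add", [Call("multiply", [3, 3]), 1])
source = to_python(composition)
print(source)  # add(multiply(3, 3), 1)

# Run the generated expression against local implementations (demo only).
namespace = {"add": lambda a, b: a + b, "multiply": lambda a, b: a * b}
print(eval(source, namespace))  # 10
```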
<lucaswerkmeister> will evaluation be limited to functions stored on the wiki? (the current API allows specifying any function implementation in the API parameters, if I remember correctly)
<lucaswerkmeister> which relates to another thing I was wondering – how do we prevent people from running cryptominers on wikifunctions 😐
<Philippe> the complex part will occur where functions are builtins required by the platform: it may be hard to integrate these builtins
<vrandecic> lucas: it currently is not limited, I think we will need to limit it eventually
<Philippe> cryptominers can only be prevented by metadata in results, containing estimations of costs (CPU cycles, or storage used)
<vrandecic> lucas: i mean, limit to implementations stored on wiki. Since eventually only the orchestrator will be communicating with the evaluators, that should be rather straightforward to limit.
<Julio974> lucaswerkmeister: I'm not sure how cryptominers would work on wf, but is it possible to limit the runtime of functions?
<Philippe> so the orchestrator will need to monitor these resources, and pace things: everything can be computed, but it will take just more time and people will have to live with just estimations
<Philippe> or people will have to provide resources to the system
<Hogü-456> Do you want people to run calculations with functions from Wikifunctions on their own computers or hardware, or how many resources will you offer for calculations on Wikifunctions?
<vrandecic> regarding cryptominers: yes, that's something we are looking at actively, and need to figure out a strategy for it. We certainly don't want cryptomining to be happening on our compute resources.
<Philippe> That's why I like the design goal to allow evaluators/estimators to run externally outside Wikimedia. But then we'll need resource resolvers (e.g. a DHT-based P2P network)
<vrandecic> Hogü: yes, we explicitly hope and will support the creation of an ecosystem of evaluators. It should eventually be possible to run functions on your own hardware or on cloud hardware etc.
<Philippe> But then we also need guarantees for preserving privacy, if we allow things to run externally
<vrandecic> Hogü: we will only provide limited compute
<vrandecic> Philippe: a P2P network of people sharing compute results and resources would be awesome! We have a few people on the team who are already mulling such an infrastructure, and some people with experience developing DHTs. It is as of now not on our critical path, but I would love to see such an ecosystem evolve over the coming years.
<vrandecic> Julio974: yes, we plan to make it possible to limit resources
<Philippe> Maybe then integrate a Wikifunctions evaluator inside a torrent client?
<vrandecic> Julio974: we plan to keep track of resource usage, and we plan to constrain the running of the evaluators in monitored containers / VMs
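The plan described above constrains evaluators at the container/VM level. A complementary and much simpler technique, shown here only as a hypothetical sketch and not as the Wikifunctions design, is to give each evaluation a step budget ("fuel") and abort once it is spent. `Budget`, `BudgetExceeded`, and `evaluate` are illustrative names.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


class BudgetExceeded(Exception):
    """Raised when an evaluation uses more steps than it was allowed."""


@dataclass
class Budget:
    """Hypothetical per-evaluation step budget."""
    steps: int

    def charge(self, what: str) -> None:
        self.steps -= 1
        if self.steps < 0:
            raise BudgetExceeded(f"step limit reached at {what}")


@dataclass
class Call:
    function: str
    args: List[Any] = field(default_factory=list)


REGISTRY: Dict[str, Callable[..., Any]] = {"add": lambda a, b: a + b}


def evaluate(node: Any, budget: Budget) -> Any:
    """Evaluate a composition, charging one unit of budget per function call."""
    if isinstance(node, Call):
        budget.charge(node.function)
        values = [evaluate(arg, budget) for arg in node.args]
        return REGISTRY[node.function](*values)
    return node


expr = Call("add", [Call("add", [1, 2]), 3])  # two calls in total
print(evaluate(expr, Budget(10)))  # 6
try:
    evaluate(expr, Budget(1))
except BudgetExceeded as err:
    print(err)  # step limit reached at add
```

Counting steps only bounds the composition layer; contributed code running inside an evaluator would still need OS-level limits such as the monitored containers mentioned above.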
<Julio974> There's something I still don't get about awp. How will awp be able to make the link between lexemes, forms, and glosses in multiple languages?
<vrandecic> Philippe: I certainly wouldn't be opposed to that :)
<Philippe> but still allow people to use unlimited resources on their own system even if they share a part with the world
<vrandecic> Julio974: so let's take a look at one sentence from the Jupiter example here: https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Examples/Jupiter
<Julio974> For example, the jupiter example (https://w.wiki/3XTH) uses the argument "large". But if it is translated to (for example) french, how can it find which lexeme, form, and gloss it is translated to?
<vrandecic> Superlative(subject: Jupiter, quality: large, class: planet, location constraint: Solar System)
<vrandecic> ok, let's focus on large
<Philippe> some people could offer, for example, a space for free deployment of containers running arbitrary code from the sharing network, monitored by the Wikifunctions orchestrator (and a set of people enforcing a usage policy)
<vrandecic> so large would actually be a ZID
<vrandecic> it is not literally large
<genocation> Philippe: Having a P2P infrastructure for running functions would be fascinating, no doubt. The biggest challenge on this would be as always with p2p, how do we build a minimum authority model to validate that the execution of functions is trusted
<vrandecic> but a ZID with the meaning of "big in terms of space", which then for each language would point to a specific Lexeme
<vrandecic> so that ZID would point to L3415 for English, L400213 for German etc.
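A minimal sketch of the simple case described above: one concept ZID that maps to a specific Lexeme per language. The ZID `Z_LARGE` is a hypothetical placeholder, because the actual ZID was not given. The Lexeme IDs are the ones quoted in the chat. A plain dictionary stands in for however the mapping is actually stored.

```python
from typing import Dict

# Hypothetical placeholder for the concept "big in terms of space".
LARGE = "Z_LARGE"

# Per-language Lexeme for each concept (IDs as quoted in the chat).
CONCEPT_LEXEMES: Dict[str, Dict[str, str]] = {
    LARGE: {"en": "L3415", "de": "L400213"},
}


def lexeme_for(concept: str, language: str) -> str:
    """Look up the Wikidata Lexeme that expresses a concept in a given language."""
    try:
        return CONCEPT_LEXEMES[concept][language]
    except KeyError:
        raise LookupError(f"no lexeme for {concept} in {language}") from None


print(lexeme_for(LARGE, "en"))  # L3415
print(lexeme_for(LARGE, "de"))  # L400213
```

As the discussion goes on to note, some concepts need a function rather than a fixed mapping.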
<Julio974> vrandecic: So it would have a ZID for each concept, each pointing to a specific lexeme/form/sense?
<Julio974> Okay
<vrandecic> more or less
<vrandecic> in some cases that can be a bit more complex
<vrandecic> we had the discussion of different horse types a few weeks ago
<vrandecic> where there is not a single ZID for each of the concepts, but rather one ZID for a function that, then given further information, selects the appropriate lexemes
<Hogü-456> When do you plan to launch the website for Wikifunctions
<vrandecic> so basically, the ZID does not have to be a single constant relation, it could be a function
<Philippe> traditional means of control: contributors of resources will need to register a secure account and gain "points", they will be ranked. On top of that we build a network of confidence (but without automatic transitive delegation: each participant should have control over their network of trust, and can disconnect any one; no need for a single central authority, even if a few good authorities with high rankings develop
<vrandecic> Hogü: when we are ready :) I expect later this year. Some time after Wikimania.
<Philippe> Non-hierarchical systems like this are what blockchains are realizing. And if we develop points of trust, then we also develop a chain of value... a new cryptocurrency by itself!
<vrandecic> Philippe: yes, agreed on those thoughts regarding a dapp to run wikifunctions functions :) that would be great!
<quiddity> We are at time for the hour. Please do keep discussing though! The team just won't be as immediately responsive. We will do another one of these in 4-6 weeks (possibly with video). Thanks for joining us, and please check out the Arctic Knot conference later this week: https://meta.wikimedia.org/wiki/Arctic_Knot_Conference_2021
<Jan_ainali> https://tools-static.wmflabs.org/bridgebot/b6cd2e28/mp4.mp4
<vrandecic> thanks everyone! this was fun!
<vrandecic> See you around, I have to run to my next meeting!
<Philippe> Yes, this sounds silly, but do we have a choice?
<Jan_ainali> The usual wiki way (re @Philippe: Yes, this sounds silly, but do we have a choice?)
<vrandecic> I don't think it sounds silly at all
<Philippe> either a central hierarchical administration (and limitations of freedom and privacy) or some new system like distributed blockchains
<Julio974> So it would mean it isn't superlative(..., "large", ...) but superlative(..., sizeExpression(2.5), ...), where sizeExpression would for example return the lexeme for "small" for an argument smaller than 2, and the lexeme for "large" for an argument larger than 2?
<mahir256> That's certainly one way that could work
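A hypothetical sketch of Julio974's suggestion: a function chooses the size concept from a number, and a renderer then produces the sentence. The names `size_expression` and `superlative_en` are illustrative. The threshold of 2 comes from the message above, and treating exactly 2 as "large" is an assumption. The renderer is English-only and uses a hard-coded table of superlative forms, whereas real renderers would work per language from Lexeme data.

```python
from typing import Dict

# English superlative forms for the two concepts (illustrative, hard-coded).
SUPERLATIVE_EN: Dict[str, str] = {"small": "smallest", "large": "largest"}


def size_expression(ratio: float, threshold: float = 2.0) -> str:
    """Pick a size concept from a number, following the example above.

    Assumption: values below the threshold are "small", all others "large".
    """
    return "small" if ratio < threshold else "large"


def superlative_en(subject: str, quality: str, cls: str, location: str) -> str:
    """Render Superlative(subject, quality, class, location constraint) in English."""
    return f"{subject} is the {SUPERLATIVE_EN[quality]} {cls} in the {location}."


print(superlative_en("Jupiter", size_expression(2.5), "planet", "Solar System"))
# Jupiter is the largest planet in the Solar System.
```

As vrandecic notes below, a fixed one-point mapping may be enough to start with, and a selecting function only becomes worthwhile once a clear need appears.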
<Philippe> @Jan_ainali : the usual wiki way no longer works. It has excluded MANY people and is no longer as open as it is supposed to be.
<Jan_ainali> That's a bold claim. Can you prove it? (re @Philippe: @Jan_ainali : the usual wiki way no longer works. It has excluded MANY people and is no longer as open as it is supposed to be.)
<lucaswerkmeister> can we please not have a blockchain discussion here
<Philippe> We know the battles and problems caused by selfish admins, abusing their power to threaten and artificially building up others' trust only through their omnipresence and the absence of any working appeal
<vrandecic> Julio974: maaaaybe. Up to the community. It could go either way. For the beginning, I would probably just have a single one-point mapping for small and large into each language, and start making functions only when a clear need is visible.
<vrandecic> ah, he just left
<vrandecic> our first implementation will be centralized. I think it is worthwhile exploring a decentralized solution, and anyone should feel welcome to work on that if they so want, but in order for us to simplify life, we'll start with a centralized deployment. We aim to make Wikifunctions as widely usable as possible and hope that we can make some of the barriers to participation more porous. We will see how far we'll get.