Jump to content

Abstract Wikipedia/Tasks

From Meta, a Wikimedia project coordination wiki

This is part of the development plan for Abstract Wikipedia.
Continues from Abstract Wikipedia components.

Development happens in two main parts. Part P1 is about getting Wikifunctions up and running, and sufficiently developed to support the Content creation required for Part P2. Part P2 is then about setting up the creation of Content within Wikidata, and allowing the Wikipedias to access that Content. After that, there will be ongoing development to improve both Wikifunctions and Abstract Wikipedia. This ongoing development is not covered by this plan. Note that all timing depends on how much headcount we actually work with.

In the first year, we will work exclusively on Part P1. Part P2 starts with the second year and adds additional headcount to the project. Parts P1 and P2 will then be developed in parallel for the next 18 months. Depending on the outcome and success of the project, the staffing of the further development has to be decided around month 24, and reassessed regularly from then on.

This plan covers the initial 30 months of development. The main deliverables after that time will be:

  1. Wikifunctions, a new WMF project and wiki, with its own community, and its own mission, aiming to provide a catalog of functions and allow everyone to share in that catalog, thus empowering people by democratizing access to computational knowledge.
  2. A cross-wiki repository to share templates and modules between the WMF projects, which is a long-standing wish by the communities. This will be part of Wikifunctions.
  3. Abstract Wikipedia, a cross-wiki project that allows to define Content in Wikidata, independently of a natural language, and that is integrated in several local Wikipedias, considerably increasing the amount of knowledge speakers of currently underserved languages can share in.

Part P1: Wikifunctions

Task P1.1: Project initialization

We will set up Wikipages on Meta, bug components, mailing lists, chat rooms, an advisory board, and other relevant means of discussion with the wider community. We will start the discussion on and decide on the names for Wikifunctions and Abstract Wikipedia, and hold contests for the logos, organize the creation of a code of conduct, and the necessary steps for a healthy community.

We also need to kick off the community process of defining the license choice for the different parts of the project: abstract Content in Wikidata, the functions and other entities in Wikifunctions, as well as the legal status of the generated text to be displayed in Wikipedia. This needs to be finished before Task P1.9. This decision will need input from legal counsel.

Task P1.2: Initial development

The first development step is to create the Wikifunctions wiki with a Function namespace that allows the storage and editing of Z-Objects (more constrained JSON objects - in fact we may start with the existing JSON extensions and build on that). The initial milestone aims to have:

  • The initial types: unicode string, positive integer up to N, boolean, list, pair, language, monolingual text, multilingual text, type, function, builtin implementation, and error.
  • An initial set of constants for the types.
  • An initial set of functions: if, head, tail, is_zero, successor, predecessor, abstract, reify, equal_string, nand, constructors, probably a few more functions, each with a built-in implementation.

This allows to start developing the frontend UX for function calls, and create a first set of tools and interfaces to display the different types of Z Objects, but also generic Z Objects. This also includes a first evaluator that is running on Wikimedia servers.

Note that the initial implementations of the views, editing interfaces, and validators are likely to be thrown away gradually after P1.12 once all of these are becoming internalized. To internalize some code means to move it away from the core and move it into userland, i.e. to reimplement them in Wikifunctions and call them from there.

Task P1.3: Set up testing infrastructure

Wikifunctions will require several test systems. One to test the core, one to test the Web UI, one to test the Wikifunctions content itself. This is an ongoing task and needs to be integrated with version control.

Task P1.4: Launch public test system

We will set up a publicly visible and editable test system that runs the bleeding edge of the code (at least one deploy per working day). We will invite the community to come in and break stuff. We may also use this system for continuous integration tests.

Task P1.5: Server-based evaluator

Whereas the initial development has created a simple evaluator working only with the built-ins, and thus having very predictable behavior, the upcoming P1.6 function composition task will require us to rethink the evaluator. The first evaluator will run on Wikimedia infrastructure, and needs monitoring and throttling abilities, and potentially also the possibility to allocate users different amounts of compute resources depending on whether they are logged-in or not.

Task P1.6: Function composition

We allow for creating new function interfaces and a novel type of implementation, which are composed function calls. E.g. it allows to implement

add(x, y)

as

if(is_zero(y), x, add(successor(x), predecessor(y))

and the system can execute this. It also allows for multiple implementations.

Task P1.7: Freeze and thaw entities

A level of protection that allows to edit the metadata of an entity (name, description, etc.), but not the actual value. This functionality might also be useful for Wikidata. A more elaborate versioning proposal is suggested here.

Task P1.8: Launch beta Wikifunctions

The beta system runs the next iteration of the code that will go on Wikifunctions with the next deployment cycle, for testing purposes.

Task P1.9: Launch Wikifunctions

Pass security review. Set up a new Wikimedia project. Move some of the wikipages from Meta to Wikifunctions.

Task P1.10: Test type

Introduce a new type for writing tests for functions. This is done by specifying input values and a function that checks the output. Besides introducing the Test type, Functions and Implementations also have to use the Tests, and integrate them into Implementation development and Interface views.

Task P1.11: Special page to write function calls

We need a new Special page that allows and supports to write Function calls, with basic syntax checking, autocompletion, documentation, etc. A subset of this functionality will also be integrated on the pages of individual Functions and Implementations to run them with more complex values. This Special page relies on the initial simple API to evaluate Function calls.

Task P1.12: JavaScript-based implementations

We allow for a new type of Implementations, which are implementations written in JavaScript. This requires to translate values to JavaScript and back to Z-Objects. This also requires to think about security, through code analysis and sandboxing, initially enriched by manual reviews and P1.7 freezing. If the O1 lambda-calculus has not yet succeeded or was skipped, we can also reach the self-hosting for Wikifunctions through this task, which would allow to internalize most of the code.

Task P1.13: Access functions

Add F7 the Lambda magic word to the Wikimedia projects which can encode function calls to Wikifunctions and integrate the result in the output of the given Wikimedia project. This will, in fact, create a centralized templating system as people realize that they can now reimplement templates in Wikifunctions and then call them from their local Wikis. This will be preceded by an analysis of existing solutions such as TemplateData and the Lua calls.

This might lead to requests from the communities to enable the MediaWiki template language and Lua (see P1.15) as a programming language within Wikifunctions. This will likely lead to requests for an improved Watchlist solution, similar to what Wikidata did, and to MediaWiki Template-based implementations, and other requests from the community. In order to meet these tasks, an additional person might be helpful to answer the requests from the community. Otherwise P2 might start up to a quarter later. This person is already listed in the development team above.

Task P1.14: Create new types

We allow for the creation of new types. This means we should be able to edit and create new type definitions, and internalize all code to handle values of a type within Wikifunctions. I.e. we will need code for validating values, construct them, visualize them in several environments, etc.. We also should internalize these for all the existing types.

Task P1.15: Lua-based implementations

We add Lua as a programming language that is supported for implementations. The importance of Lua is due to its wide use in the MediaWiki projects. Also, if it has not already happened, the value transformation from Wikifunctions to a programming language should be internalized at this point (and also done for the JavaScript implementations, which likely will be using built-ins at this point).

Task P1.16: Non-functional interfaces

Whereas Wikifunctions is built on purely functional implementations, there are some interfaces that are naively not functional, for example random numbers, current time, autoincrements, or many REST calls. We will figure out how to integrate these with Wikifunctions.

Task P1.17: REST calls

We will provide builtins to call out to REST interfaces on the Web and ingest the result. This would preferably rely on P1.16. Note that calling arbitrary REST interfaces has security challenges. These need to be taken into account in a proper design.

Task P1.18: Accessing Wikidata and other WMF projects

We will provide Functions to access Wikidata Items and Lexemes, and also other content from Wikimedia projects. This will preferably rely on P1.17, but in case that didn’t succeed yet at this point, a builtin will unblock this capability.

Task P1.19: Monolingual generation

The development of these functions happens entirely on-wiki. This includes tables and records to represent grammatical entities such as nouns, verbs, noun phrases, etc., as well as Functions to work with them. This includes implementing context so that we can generate anaphors as needed. This allows for a concrete generation of natural language, i.e. not an abstract one yet.

Part P2: Abstract Wikipedia

The tasks in this part will start after one year of development. Not all tasks from Part P1 are expected to be finished before Part P2 can begin, as, in fact, Parts P1 and P2 will continue to be developed in parallel. Only some of the tasks in Part P1 are required for P2 to start.

Task P2.1: Constructors and Renderers

Here we introduce the abstract interfaces to the concrete generators developed in P1.19. This leads to the initial development of Constructors and the Render function. After this task, the community should be able to create new Constructors and extend Renderers to support them.

  • Constructors are used for the notation of the abstract content. Constructors are language independent, and shouldn't hold conditional logic.
  • Renderers should hold the actual conditional logic (which gets applied to the information in the Constructors). Renderers can be per language (but can also be shared across languages).
  • The separation between the two is analogous to the separation in other NLG systems such as Grammatical Framework.

Task P2.2: Conditional rendering

It will rarely be the case that a Renderer will be able to render the Content fully. We will need to support graceful degradation: if some Content fails to render, but other still renders, we should show that part that rendered. But sometimes it is narratively necessary to render certain Content only if other Content will definitely be rendered. In this task we will implement the support for such conditional rendering, that will allow smaller communities to grow their Wikipedias safely.

Task P2.3: Abstract Wikipedia

Create a new namespace in Wikidata and allow Content to be created and maintained there. Reuse UI elements and adapt them for Content creation. The UI work will be preceded by Design research work which can start prior to the launch of Part P2. Some important thoughts on this design are here. This task will also decide whether we need new magic words (F5, F6, and F7) or can avoid their introduction.

Task P2.4: Mobile UI

The creation and editing of the Content will be the most frequent task in the creation of a multilingual Wikipedia. Therefore we want to ensure that this task has a pleasant and accessible user experience. We want to dedicate an explicit task to support the creation and maintenance of Content in a mobile interface. The assumption is that we can create an interface that allows for a better experience than editing wikitext.

Task P2.5: Integrate content into the Wikipedias

Enable the Abstract Wikipedia magic word. Then allow for the explicit article creation, and finally the implicit article creation (F1, F2, F3, F4, F5, F6).

Task P2.6: Regular inflection

Wikidata’s Lexemes contain the inflected Forms of a Lexeme. These Forms are often regular. We will create a solution that generates regular inflections through Wikifunctions and will discuss with the community how to integrate that with the existing Lexemes.

Task P2.7: Basic Renderer for English

We assume that the initial creation of Renderers will be difficult. Given the status of English as a widely used language in the community, we will use English as a first language to demonstrate the creation of a Renderer, and document it well. We will integrate the help from the community. This also includes functionality to show references.

Task P2.8: Basic Renderer for a second language

Based on the community feedback, interests, and expertise of the linguists working on the team, we will select a second large language for which we will create the basic Renderer together with the community. It would be interesting to choose a language where the community on the local Wikipedia has already confirmed their interest to integrate Abstract Wikipedia.

Task P2.9: Renderer for a language from another family

Since it is likely that the language in P2.8 will be an Indo-European language, we will also create a basic Renderer together with the community for a language from a different language family. The choice of this language will be done based on the expertise available to the team and the interests of the community. It would be interesting to choose a language where the community on the local Wikipedia has already confirmed their interest to integrate Abstract Wikipedia.

Task P2.10: Renderer for an underserved language

Since it is likely that the languages in P2.8 and P2.9 will be languages which are already well-served with active and large Wikipedia communities, we will also select an underserved language, a language that currently has a large number of potential readers but only a small community and little content. The choice of this language will be done based on the expertise available to the team and the interests of the community. Here it is crucial to select a language where the community on the local Wikipedia has already committed their support to integrate Abstract Wikipedia.

Task P2.11: Abstract Wikidata Descriptions

Wikidata Descriptions seem particularly amenable to be created through Wikifunctions. They are often just short noun phrases. In this task we support the storage and maintenance of Abstract Descriptions in Wikidata, and their generation for Wikidata. We also should ensure that the result of this leads to unique combinations of labels and descriptions.

See Abstract Wikipedia/Updates/2021-07-29 for further details and discussion.

Task P2.12: Abstract Glosses

Wikidata Lexemes have Senses. Senses are captured by Glosses. Glosses are available per language, which means that they are usually only available in a few languages. In order to support a truly multilingual dictionary, we are suggesting to create Abstract Glosses. Although this sounds like it should be much easier than creating full fledged Wikipedia articles, we think that this might be a much harder task due to the nature of Glosses.

Task P2.13: Support more natural languages

Support other language communities in the creation of Renderers, with a focus on underserved languages.

Task P2.14: Template-generated content

Some Wikipedias currently contain a lot of template-generated content. Identify this content and discuss with the local Wikipedias whether they want to replace it with a solution based on Wikifunctions, where the template is in Wikifunctions and the content given in the local Wikipedia or in Abstract Wikipedia. This will lead to more sustainable and maintainable solutions that do not require to rely on a single contributor. Note that this does not have to be multilingual, and might be much simpler than going through full abstraction.

Task P2.15: Casual comments

Allow casual contributors to make comments on the rendered text and create mechanisms to capture those comments and funnel them back to a triage mechanism that allows to direct them to either the content or the renderers. It is important to not lose comments by casual contributors. Ideally, we would allow them to explicitly overwrite a part of the rendered output and consider this a change request, and then we will have more involved contributors working on transforming the intent of the casual contributor to the corresponding changes in the system.

Task P2.16: Quick article name generation

The general public comes to Wikipedia mostly by typing the names of the things they are looking for in their language into common search engines. This means that Wikidata items will need labels translated into a language in order to be able to use implicit article creation. This can probably be achieved by translating millions of Wikidata labels. Sometimes it can be done by bots or AI, but this is not totally reliable and scalable, so it has to involve humans.

The current tools for massive crowd-sourced translation of Wikidata labels are not up to the task. There are two main ways to do it: editing labels in Wikidata itself, which is fine for adding maybe a dozen of labels, but quickly gets tiring, and using Tabernacle, which appears to be more oriented at massive batch translations, but is too complicated to actually use for most people.

This task is to develop a massive and integrated label-translation tool with an easy, modern frontend, that can be used by lots of people.

Non-core tasks

There is a whole set of further optional tasks. Ideally these would be picked up by external communities and developed as Open Source outside the initial development team, but some of these might need to be kickstarted and even fully developed by the core team.

Task O1: Lambda calculus

It is possible to entirely self-host Wikifunctions without relying on builtins or implementations in other programming languages, by implementing a Lambda calculus in Wikifunctions (this is where the name proposal comes from). This can be helpful to allow evaluation without any language support, and so easier start up the development of evaluators.

Task O2: CLI in a terminal

Many developers enjoy using a command line interface to access a system such as Wikifunctions. We should provide one, with the usual features such as autocompletion, history, shell integration, etc.

Task O3: UIs for creating, debugging and tracing functions

Wikifunctions’s goal is to allow the quick understanding and development of the functions in Wikifunctions. Given the functional approach, it should be possible to create a user experience that allows for partial evaluation, unfolding, debugging, and tracing of a function call.

Task O4: Improve evaluator efficiency

There are many ways to improve the efficiency of evaluators, and thus reduce usage of resources, particularly caching, or the proper selection of an evaluation strategy. We should spend some time on doing so for evaluators in general and note down the results, so that different evaluators can use this knowledge, but also make sure that the evaluators maintained by the core team use most of the best practises.

Task O5: Web of Trust for implementations

In order to relax the conditions for implementations in programming languages, we could introduce a Web of Trust based solution, that allows contributors to review existing implementations and mark their approval explicitly, and also to mark other contributors as trustworthy. Then these approvals could be taken into account when choosing or preparing an evaluation strategy.

Task O6: Python-based implementations

Python is a widely used programming language, particularly for learners and in some domains such as machine learning. Supporting Python can open a rich ecosystem to Wikifunctions.

Task O7: Implementations in other languages

We will make an effort to call out to other programming language communities to integrate them into Wikifunctions, and support them. Candidates for implementations are Web Assembler, PHP, Rust, C/C++, R, Swift, Go, and others, but it depends on the interest of the core team and the external communities to create and support these interfaces.

Task O8: Web-based REPL

A web-based REPL can bring the advantages of the O2 command line interface to the Web, without requiring to install a CLI in a local environment, which is sometimes not possible.

Task O9: Extend API with Parser and Linearizer

There might be different parsers and linearizers using Wikifunctions. The Wikifunctions API can be easier to use if the caller could explicitly select these, instead of wrapping them manually, which would allow the usage of Wikifunctions with different surface dialects.

Task O10: Support talk pages

In order to support discussions on talk pages of Wikifunctions, develop and integrate a mechanism that allows for (initially) simple discussions, and slowly raising their complexity based on the need of the communities.

An interesting application of Wikifunctions is for the creation of legal text in a modular manner and with different levels (legalese vs human-readable), similar to the different levels of description for the different Creative Commons licenses.

An interesting application of Wikifunctions is for the creation of health-related texts for different reading levels. This should be driven by WikiProject Medicine and their successful work, which could reach many more people through cooperation with Wikifunctions.

Task O13: NPM library

We will create a Wikidata library for NPM that allows the simple usage of Functions from Wikifunctions in a JavaScript program. The same syntax should be used to allow JavaScript implementations in Wikifunctions to access other Wikifunctions functions. Note that this can be done by virtue of calling to a Wikifunctions evaluator, or by compiling the required functions into the given code base.

Task O14: Python library

We will create a Python library that allows the simple usage of Functions from Wikifunctions in a Python script. The same syntax should be used to allow Python implementations in Wikifunctions to access other Wikifunctions functions. Note that this can be done by virtue of calling to a Wikifunctions evaluator, or by compiling the required functions into the given code base.

Task O15: Libraries for other programming languages

We will call out to the communities of several programming languages to help us with the creation of libraries that allow the simple call of Wikifunctions functions from programs in their language. Note that this can be done by virtue of calling to a Wikifunctions evaluator, or by compiling the required functions into the given code base.

Task O16: Browser-based evaluator

One of the advantages of Wikidata is that the actual evaluation of a function call can happen in different evaluators. The main evaluator for Abstract Wikipedia will be server-based and run by the Wikimedia Foundation, but in order to reduce the compute load, we should also provide evaluators running in the user’s client (probably in a Worker thread).

Task O17: Jupyter- and/or PAWS-based evaluator

One interesting evaluator is from a Jupyter or PAWS notebook, and thus allowing the usual advantages of such notebooks but integrating also the benefits from Wikifunctions.

Task O18: App-based evaluator

One evaluator should be running natively on Android or iOS devices, and thus allow the user to use the considerable computing power in their hand.

Task O19: P2P-based evaluator

Many of the evaluators could be linking together and allow each other to use dormant compute resources in their network. This may or may not require shields between the participating nodes that ensure the privacy of individual computes.

Task O20: Cloud-based evaluator

One obvious place to get compute resources is by allowing to use a cloud provider. Whereas it would be possible to simply run the server-based evaluator on a cloud-based infrastructure, it is likely to be beneficial for the cloud providers to provide a thinner interface to a more custom-tailored evaluator.

Task O21: Stream type

Add support for a type for streaming data, both as an input as well as an output. Stream types means for example the recent changes stream on a Wikimedia wiki.

Task O22: Binary type

Add support for binary files, both for input and output.

Task O23: Integration with Commons media files

Allow direct access to files on Commons. Enable workflows with Commons that require less deployment machinery than currently needed. Requires O22.

Task O24: Integrate with Machine Learning

Develop a few example integrations with Machine Learning solutions, e.g. for NLP tasks or for work on image or video e.g. using classifiers. This requires how and where to store models, possibly also how to train them, and how to access them.

Task O25: Integrate into IDEs

Reach out to communities developing IDEs and support them in integrating with Wikifunctions, using type hints, documentation, completion, and many of the other convenient and crucial features of modern IDEs.

Task O26: Create simple apps or Websites

Develop a system to allow the easy creation and deployment of apps or Websites relying on Wikifunctions.

Task O27: Display open tasks for contributors

Abstract Wikipedia will require one Renderer for every Constructor for every language. It will be helpful if contributors could get some guidance to which Renderer to implement next, as this is often not trivially visible. Just counting how often a Constructor appears could be a first approximation, but if some Constructors are used more often in the lead, or in parts of the text that block other text from appearing, or in articles that are more read than others, this approximation might be off. In this task we create a form of dashboard that allows users to choose a language (and maybe a domain, such as sports, geography, history, etc., and maybe a filter for the complexity of the renderer that is expected) and then provide them with a list of unrendered Constructors ranked by the impact an implementation would have.

We could also allow contributors to sign up for a regular message telling them how much impact (in terms of views and created content) they had, based on the same framework needed for the dashboard.

This is comparable to seeing the status of translations of different projects on TranslateWiki.Net (for a selectable language), or the views on topics, organizations or authors in Scholia. For each project, it shows what % of the strings in it were translated and what % need update, and a volunteer translator can choose: get something from 98% to 100%, get something from 40% to 60%, get something from 0% to 10%, etc.

Task O28: Active view

Whereas the default view of Rendered content would look much like static text, there should also be a more active view that invites contribution, based on existing Content that failed to render due to missing Renderers. In the simplest case this can be the creation of a Lexeme in Wikidata and connecting to the right Lexeme. In more complex cases that could be writing a Renderer, or offering example sentences as text, maybe using the Casual contribution path described in P2.15. This would provide an interesting funnel to turn more readers in contributors.

There are product and design decisions on how and where the active view should be used and whether it should be the default view, or whether it should be only switched on after an invitation, etc. There could also be mode where contributors can go from article to article and fill out missing pieces, similar to the more abstract way in O27.

It would probably be really useful that ensure that the active way and the contribution path it leads to work on mobile devices as well.

Task O29: Implementation compilation

The function composition used for implementations should allow to create highly performant compilations of rather high-level functions in many different target programming languages, particularly Web Assembler and JavaScript. This should speed up evaluation by several orders of magnitude.

Task O30: Integrate with Code examples on Wikipedia, Wikibooks, etc.

Allow Wikipedia, Wikibooks, and the other projects to integrate their code examples directly with the wiki of functions, so that these can be run live.


This is part of the development plan for Abstract Wikipedia.