Wikianswers/Technical discussion/Computer-aided document authoring

From Meta, a Wikimedia project coordination wiki

Introduction

On Wikipedia, more than 1.9 edits are made every second. An unknown percentage of these edits involve users correcting one another’s language use, e.g., spelling, grammar, or readability.

This page broaches computer-aided document authoring with MediaWiki software. MediaWiki software could support and interoperate with an extensible set of server-side document-processing tools. The outputs of these tools can be represented as annotations. These annotations can be merged, aggregated, and presented to users as they edit, view, or view more information about wiki documents.

Server-side document-processing tools

For further information, see Server, Cloud computing, and Natural-language processing.

There are two flavors of server-side document-processing tools:

  1. Tools for formal languages, e.g., programming languages and domain-specific languages. These include parsers, linters, and compilers, each of which can output annotations such as informational, warning, and error messages about portions of content.
  2. Tools for natural languages. These include, but are not limited to, tools for spellchecking, grammar checking, readability analysis, sentiment analysis, analysis of subjectivity and objectivity, fact-checking, reasoning checking, and argument verification and validation.

Traditionally, natural-language spellchecking has been performed client-side and integrated into Web browsers, e.g., spellchecking in HTML text boxes. As the list above suggests, many more forms of document processing can be performed concurrently by server-side software. The outputs of multiple asynchronous document-processing tools can then be merged, aggregated, and presented to users.
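As a minimal sketch of merging the outputs of several asynchronous tools, the Python example below runs two toy tools concurrently and merges their annotations in document order. The `Annotation` class, both checker functions, and `annotate` are hypothetical names invented for illustration; they are not part of MediaWiki, and real tools would call out to actual services.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    tool: str      # name of the tool that produced the annotation
    start: int     # offset of the first annotated character
    end: int       # offset just past the last annotated character
    message: str   # human-readable note for the user


async def long_word_checker(text: str) -> list[Annotation]:
    """Toy stand-in for a spellchecker: flags words longer than 12 characters."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    found, pos = [], 0
    for word in text.split():
        start = text.index(word, pos)
        pos = start + len(word)
        if len(word.strip(".,;:!?")) > 12:
            found.append(Annotation("long-word", start, pos, "very long word"))
    return found


async def passage_length_checker(text: str) -> list[Annotation]:
    """Toy stand-in for a readability tool: flags passages of over 20 words."""
    await asyncio.sleep(0)
    if len(text.split()) > 20:
        return [Annotation("readability", 0, len(text), "long passage")]
    return []


async def annotate(text: str, tools) -> list[Annotation]:
    """Run every tool concurrently and merge the results in document order."""
    results = await asyncio.gather(*(tool(text) for tool in tools))
    merged = [annotation for result in results for annotation in result]
    return sorted(merged, key=lambda a: (a.start, a.end, a.tool))
```

A caller could obtain the merged list with `asyncio.run(annotate(text, [long_word_checker, passage_length_checker]))`. Sorting by offset is only one possible merge policy; grouping by tool or by severity would work equally well.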

Annotations

For further information, see Annotation, Text annotation, and Web annotation.

The outputs of document-processing tools can be represented as annotations. Annotations are data structures which bear information about documents or about portions of documents. Annotations refer to portions of documents via selectors. Means of selecting portions of document content include: URI fragments, CSS selectors, XPath selectors, text quote selectors, text position selectors, data position selectors, SVG selectors, and range selectors. Selections can also refine other selections; a selection can be made relative to the contents of a containing selection.

For more information about annotations and selectors, see the Web Annotation Data Model.
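To make selectors concrete, the sketch below builds an annotation shaped like the Web Annotation Data Model's JSON serialization and resolves its selector against a plain-text document. The property names (`TextQuoteSelector`, `TextPositionSelector`, `exact`, `prefix`, `suffix`, `start`, `end`, `refinedBy`) come from that model; the `resolve` function, the example URL, and the reading that a refining text position selector counts offsets relative to the enclosing selection are assumptions made for illustration.

```python
def resolve(text: str, selector: dict):
    """Return (start, end) offsets for a selector, or None if it does not match."""
    if selector["type"] == "TextQuoteSelector":
        prefix = selector.get("prefix", "")
        needle = prefix + selector["exact"] + selector.get("suffix", "")
        index = text.find(needle)
        if index < 0:
            return None
        start = index + len(prefix)
        span = (start, start + len(selector["exact"]))
    elif selector["type"] == "TextPositionSelector":
        span = (selector["start"], selector["end"])
    else:
        raise ValueError("unsupported selector type: " + selector["type"])

    # A refining selector is applied to the content chosen by the outer one.
    refinement = selector.get("refinedBy")
    if refinement is None:
        return span
    inner = resolve(text[span[0]:span[1]], refinement)
    if inner is None:
        return None
    return (span[0] + inner[0], span[0] + inner[1])


# Hypothetical annotation: select a sentence, then refine to one word in it.
example_annotation = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "type": "Annotation",
    "body": {"type": "TextualBody", "value": "Consider a more specific noun."},
    "target": {
        "source": "https://example.org/wiki/Example_page",
        "selector": {
            "type": "TextQuoteSelector",
            "exact": "The dog ran.",
            "refinedBy": {"type": "TextPositionSelector", "start": 4, "end": 7},
        },
    },
}
```

Text quote selectors tolerate edits elsewhere in a document better than absolute positions do, which may matter for wiki pages that change often.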

User-experience concepts

Users can view and interact with annotations from multiple server-side document-processing tools while editing documents, viewing them, or viewing further information about them.

  1. Document previews. Some annotations from server-side document-processing tools could accompany previews of provisional documents and be visualized there.
  2. Published documents. Some annotations from server-side document-processing tools could accompany published documents and be visualized there.
    1. Some categories of annotations could be removed automatically when a page is edited, so that notified systems can revisit the edited document, reprocess it, and add annotations that match the new content.
  3. Page information. Some annotations from server-side document-processing tools could accompany published documents without being visualized on them. Instead, they could be displayed when users navigate to "Page information" in the "Tools" menu in the left margin (or to a new item in the "Tools" menu).
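The removal of some annotation categories upon page edits can be sketched as follows. This is a minimal in-memory illustration in Python; `AnnotationStore`, `volatile_categories`, and `on_page_edit` are hypothetical names, and which categories should be treated as volatile is an open design decision.

```python
from dataclasses import dataclass, field


@dataclass
class StoredAnnotation:
    category: str   # e.g., "spelling" or "reviewer-note"
    message: str


@dataclass
class AnnotationStore:
    # Categories whose annotations become stale as soon as the page changes.
    volatile_categories: set
    by_page: dict = field(default_factory=dict)

    def add(self, page: str, annotation: StoredAnnotation) -> None:
        self.by_page.setdefault(page, []).append(annotation)

    def on_page_edit(self, page: str) -> set:
        """Drop volatile annotations for the page.

        Returns the categories that were dropped, so that the tools
        responsible for them can be notified to reprocess the page.
        """
        kept, dropped = [], set()
        for annotation in self.by_page.get(page, []):
            if annotation.category in self.volatile_categories:
                dropped.add(annotation.category)
            else:
                kept.append(annotation)
        self.by_page[page] = kept
        return dropped
```

Annotations outside the volatile categories survive the edit in this sketch; a fuller design would also need to decide whether their selectors still point at the intended content.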

User-interface and user-experience considerations are broadly similar to those for integrated development environments and word-processing software.

Annotations can be visualized in a number of ways. These include text decorations such as colored highlighting and underlining.

Some varieties of annotations may require users to select them, in some manner, before any relevant document content is visually indicated.

Useful content could be presented to users in hoverboxes when they hover over visually decorated document content that has one or more corresponding annotations.
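One simple way to produce such decorations is to wrap each annotated span in an HTML element whose class drives the styling and whose `title` attribute supplies a basic hover tooltip. The Python sketch below assumes non-overlapping annotations given as `(start, end, css_class, message)` tuples; the `decorate` function and the class names are hypothetical, and a production interface would likely use richer hoverboxes than `title`.

```python
from html import escape


def decorate(text: str, annotations) -> str:
    """Wrap annotated spans of plain text in <span> elements.

    `annotations` is an iterable of (start, end, css_class, message)
    tuples. Overlapping annotations are skipped to keep the sketch short.
    """
    pieces, cursor = [], 0
    for start, end, css_class, message in sorted(annotations):
        if start < cursor:
            continue  # overlaps an earlier span; ignored in this sketch
        pieces.append(escape(text[cursor:start]))
        pieces.append(
            '<span class="%s" title="%s">%s</span>'
            % (escape(css_class), escape(message), escape(text[start:end]))
        )
        cursor = end
    pieces.append(escape(text[cursor:]))
    return "".join(pieces)
```

A stylesheet could then, for example, give a spelling class a wavy red underline and a readability class a pale highlight, much as word processors and integrated development environments do.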

Some annotated document content could offer context menus from which users select options for editing their provisional wiki documents.

MediaWiki software discussion

If MediaWiki software supported interoperation with an extensible set of server-side document-processing tools, annotations, and related client-side user interfaces, this would benefit the proposed Wikianswers project, a number of existing Wikimedia projects (e.g., Wikifunctions, Wikipedia, and Wikinews), and the projects of downstream users of MediaWiki software.

Should MediaWiki software come to support an extensible set of server-side document-processing tools, this might entail a new type of extension. Presently, MediaWiki supports an extensible set of content models, and this new type of extension is envisioned as interoperating with content model extensions. Many content models involve formal languages which can be processed into HTML for display. In some cases (e.g., wikitext or Markdown), the resulting HTML documents are natural-language documents. Accordingly, many content models could benefit from server-side document-processing tools for both formal-language and natural-language documents.
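The relationship between content models and document-processing tools could be sketched as a registry keyed by content-model identifier. The Python example below is purely illustrative: `ToolRegistry`, its methods, and the three toy tools are hypothetical, and an actual MediaWiki extension type would be written in PHP against APIs that are yet to be designed.

```python
import json
from collections import defaultdict


class ToolRegistry:
    """Hypothetical registry mapping content-model IDs to processing tools."""

    def __init__(self):
        self._tools = defaultdict(list)

    def register(self, content_model: str, language_kind: str):
        """Decorator that registers a tool for one content model."""
        def decorator(tool):
            self._tools[content_model].append((language_kind, tool))
            return tool
        return decorator

    def run(self, content_model: str, content: str) -> list:
        """Run each registered tool; return (tool name, kind, message) tuples."""
        messages = []
        for language_kind, tool in self._tools.get(content_model, []):
            for message in tool(content):
                messages.append((tool.__name__, language_kind, message))
        return messages


registry = ToolRegistry()


@registry.register("json", "formal")
def json_syntax(content):
    """Formal-language tool: reports JSON syntax errors."""
    try:
        json.loads(content)
    except json.JSONDecodeError as error:
        return ["line %d: %s" % (error.lineno, error.msg)]
    return []


@registry.register("wikitext", "formal")
def unbalanced_links(content):
    """Formal-language tool: crude check of [[ ]] link brackets."""
    if content.count("[[") != content.count("]]"):
        return ["unbalanced [[ ]] link brackets"]
    return []


@registry.register("wikitext", "natural")
def repeated_words(content):
    """Natural-language tool: flags immediately repeated words."""
    words = content.lower().split()
    return ["repeated word: " + a for a, b in zip(words, words[1:]) if a == b]
```

Here the wikitext content model receives both a formal-language tool and a natural-language tool, reflecting the observation that wikitext is a formal language whose rendered output is natural-language text.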

A first implementation

Server-side document-processing tools could be provided as third-party extensions to MediaWiki software. It could be useful to design a first implementation of this new type of extension alongside the architecture, infrastructure, and APIs. This first implementation could also serve as a coding example, illustrating to developers how to implement the new type of extension.

It is an open question which specific implementation should be created first while designing the new architecture, infrastructure, and APIs. It could be a tool for formal languages, e.g., a Python or wikitext linter, or a tool for natural languages, e.g., a spellchecker, grammar checker, or readability analyzer.

See also
