Toolhub/Data model

From Meta, a Wikimedia project coordination wiki

The goal of Toolhub is to make it easier for Wikimedians to find tools to use in their work. This data model describes what pieces of information the Toolhub collects and organizes to assist with that goal.

Toolhub field reference

Below is a table summarizing the fields for describing a tool in Toolhub. This table omits some metadata fields that don't aid tool discovery, and in general glosses over the implementation details. The #Technical details section below describes how this works in greater detail.

For more information about the field types, see the User Manual. Fields marked as "Core + Annotation" are core fields that can be annotated by the community if the field in the core record is empty or missing. When both the core and annotation data for a given field are populated, Toolhub defaults to displaying the core data rather than the annotation data.

Field type | Field name | Field summary
Core + Annotation | api_url | A link to the tool's API, if available.
Annotation | audiences | Who is the intended user of the tool?
Core | author | The primary tool developers.
Core + Annotation | available_ui_languages | The language(s) the tool's interface has been translated into. Use ISO 639 language codes like `zh` and `scn`. If not defined it is assumed the tool is only available in English.
Core | bot_username | If the tool is a bot, the Wikimedia username of the bot. Do not include 'User:' or similar prefixes.
Core + Annotation | bugtracker_url | A link to the tool's bug tracker on GitHub, Bitbucket, Phabricator, etc.
Annotation | content_types | With what type of content or data does the tool interact?
Core + Annotation | deprecated | If true, the use of this tool is officially discouraged. The `replaced_by` parameter can be used to define a replacement.
Core | description | A longer description of the tool.
Core + Annotation | developer_docs_url | A link to the tool's developer documentation, if available.
Core + Annotation | experimental | If true, this tool is unstable and can change or go offline at any time.
Core + Annotation | feedback_url | A link to a location where the tool's users can leave feedback.
Core + Annotation | for_wikis | A string or array of strings describing the wiki(s) this tool can be used on. Use hostnames such as `zh.wiktionary.org`. Use asterisks as wildcards. For example, `*.wikisource.org` means 'this tool works on all Wikisource wikis.' `*` means 'this works on all wikis, including Wikimedia wikis.'
Core + Annotation | icon | A link to a Wikimedia Commons file description page for an icon that depicts the tool.
Core | license | The software license the tool code is available under. Use a standard SPDX license identifier like 'GPL-3.0-or-later'.
Core | name | Unique tool identifier. Prefixes recommended to avoid name clashes.
Core | openhub_id | The project ID on OpenHub.
Core + Annotation | privacy_policy_url | A link to the tool's privacy policy, if available.
Core + Annotation | replaced_by | If this tool is deprecated, this parameter should be used to link to the replacement tool.
Core + Annotation | repository | A link to the repository where the tool code is hosted.
Core | sponsor | Organization that sponsored the tool's development.
Annotation | subject_domains | Is the tool targeted at helping in a specific type of wiki project or topic area?
Core | subtitle | Longer than the full title but shorter than the description. It should add some additional context to the title.
Annotation | tasks | What type of task does the tool help with?
Core | technology_used | A string or array of strings listing technologies (programming languages, development frameworks, etc.) used in creating the tool.
Core | title | Human readable tool name.
Core | tool | Unique identifier for this tool. Must be unique for every tool. It is recommended you prefix your tool names to reduce the risk of clashes.
Core + Annotation | tool_type | The manner in which the tool is used. Select one from the list of options.
Core + Annotation | translate_url | A link to the tool's translation interface.
Core | url | A direct link to the tool or to instructions on how to use or install the tool.
Core | url_alternates | Alternate links to the tool or install documentation in different natural languages.
Core + Annotation | user_docs_url | A link to the tool's user documentation, if available.
Annotation | wikidata_qid | Wikidata item ID for the tool.
Core | _language | The language in which this toolinfo record is written. If not set, the default value is English. Use ISO 639 language codes.
Core | _schema | A URI identifying the jsonschema for this toolinfo.json record. This should be a short uri containing only the name and revision at the end of the URI path.
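
The "Core + Annotation" display rule described above (core data wins when both are populated, and annotation data is used when the core value is empty or missing) can be illustrated with a small sketch. This is not Toolhub's actual implementation: the function name `displayed_value`, the two plain dictionaries, and the exact definition of "empty" are assumptions made for illustration only.

```python
from typing import Any, Dict, Optional

# Values treated as "empty" in this sketch (an assumption).
# Note that the boolean False is NOT treated as empty.
_EMPTY = (None, "", [], {})


def displayed_value(
    core: Dict[str, Any], annotations: Dict[str, Any], field: str
) -> Optional[Any]:
    """Return the core value when it is populated, else the annotation value."""
    core_value = core.get(field)
    if core_value not in _EMPTY:
        return core_value
    return annotations.get(field)


# Hypothetical records, for illustration only.
core = {
    "name": "example-tool",
    "repository": "https://example.org/repo",
    "api_url": "",
}
annotations = {
    "repository": "https://example.org/other-repo",
    "api_url": "https://example.org/api",
    "audiences": ["Developers"],
}

print(displayed_value(core, annotations, "repository"))
print(displayed_value(core, annotations, "api_url"))
print(displayed_value(core, annotations, "audiences"))
```

In this sketch `repository` resolves to the core value, while `api_url` and `audiences` fall back to the annotation data because the core record leaves them empty or missing.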

Taxonomy and controlled vocabulary

A taxonomy is a hierarchy of concepts. In the Toolhub taxonomy, the concepts represent attributes of tools. The Toolhub taxonomy seeks to enable filtering and browsing of tools by adding fields to the toolinfo.json schema and defining a controlled vocabulary for each of those attributes. A controlled vocabulary limits the values in a field to a predetermined (or controlled) set of options, to help ensure that tools are described consistently and similar tools can be discovered together.

The Toolhub taxonomy is limited to tool attributes that require human curation. For example, an attribute like "coding language" may be an important tool attribute, but the list of coding languages that exist doesn't require human curation, so it isn't part of the proposed taxonomy. In contrast, an attribute like "use case" can have many different values which may overlap or conflict with one another. This is the type of attribute where a controlled vocabulary is useful.
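
As a minimal sketch of what "limiting a field to a controlled set of options" means in practice, the snippet below checks proposed values against the Audiences vocabulary listed under Taxonomy v2 on this page. The helper name `invalid_terms` is hypothetical, and this is not how Toolhub itself enforces vocabularies.

```python
from typing import FrozenSet, Iterable, List

# The Audiences vocabulary, copied from the Taxonomy v2 section of this page.
AUDIENCES: FrozenSet[str] = frozenset({
    "Admins",
    "Organizers and program coordinators",
    "Editors and content contributors",
    "Readers and content consumers",
    "Researchers",
    "Developers",
})


def invalid_terms(values: Iterable[str], vocabulary: FrozenSet[str]) -> List[str]:
    """Return the values that are not part of the controlled vocabulary."""
    return [value for value in values if value not in vocabulary]


# A free-text value such as "Power users" is rejected; vocabulary terms pass.
print(invalid_terms(["Developers", "Power users"], AUDIENCES))
```

Rejecting free-text values at entry time is what keeps similar tools grouped under the same term instead of scattered across near-synonyms.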

Taxonomy v2

During the Wikimedia Foundation's 2022-2023 fiscal year (July 2022-June 2023) the Toolhub team seeks to identify which additional tool attributes would be most useful to expand the data model and facilitate tool discovery. To identify a set of attributes and controlled vocabulary, User:TBurmeister_(WMF) completed a taxonomy research project and the team gathered community feedback that resulted in the following set of attributes.

Audiences

Who is the intended user of the tool?

Values:

  • Admins
  • Organizers and program coordinators
  • Editors and content contributors
  • Readers and content consumers
  • Researchers
  • Developers

Content types

With what type of content or data does the tool interact?

Values:

  • Articles
  • Audio
  • Books
  • Data
    • Bibliographic data
    • Categories or labels
    • Diffs and revision data
    • Event data
    • Geographic data
    • Linguistic data
    • Page metadata
    • Structured data
    • User data
  • Discussions
  • Drafts
  • Emails
  • Images
  • Links
  • Lists
  • Logs
  • Maps
  • References
  • Software or code
  • Templates
  • Videos
  • Watchlist
  • Webpages
  • Wikitext

Feedback received and changes implemented to this attribute

Feedback received:

  • "It would be good to have higher-kinded categories… I want to see only data tools that work with entity data only in Wikidata(not Discussion, Images, Files etc)" [1]
  • "Drafts seems extremely English Wikipedia-specific. Images, audio/sound, video and books overlap with files - it's not clear whether selecting only "files" would include all tools for working on Commons files or if I would need to include the other four categories as well. What's the difference between audio and sound anyway?" [2]
  • "Content types: looks like a thorough list; I wonder if this could be made a bit hierarchical in the future so that there's only 2-5 top-level data categories?"[3]

Changes implemented:

  • Add additional level of hierarchy to group content types and enable both broad or specific values to be applied.
  • Remove "Files".
  • Split "Maps" and "Geographic Data"
  • Split "Books" and "Bibliographic Data"
  • Rename "Audio or sound files" to "Audio"
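
The added level of hierarchy means a filter on the broad value "Data" can also find tools tagged with a more specific value such as "Geographic data". The sketch below shows one way such a filter could behave. The names `CONTENT_TYPE_CHILDREN`, `expand`, and `matches` are hypothetical, and how Toolhub actually stores or queries the hierarchy is not described on this page.

```python
from typing import Dict, Iterable, List, Set

# Parent-to-children mapping, copied from the Content types list on this page.
CONTENT_TYPE_CHILDREN: Dict[str, List[str]] = {
    "Data": [
        "Bibliographic data",
        "Categories or labels",
        "Diffs and revision data",
        "Event data",
        "Geographic data",
        "Linguistic data",
        "Page metadata",
        "Structured data",
        "User data",
    ],
}


def expand(selected: Iterable[str]) -> Set[str]:
    """Return the selected terms plus any child terms they contain."""
    expanded = set(selected)
    for term in list(expanded):
        expanded.update(CONTENT_TYPE_CHILDREN.get(term, []))
    return expanded


def matches(tool_content_types: Iterable[str], selected: Iterable[str]) -> bool:
    """True if the tool carries any selected term or one of its children."""
    return bool(expand(selected) & set(tool_content_types))


print(matches(["Geographic data"], ["Data"]))  # broad filter, specific tag
print(matches(["Maps"], ["Data"]))             # "Maps" is not under "Data"
```

In this sketch a broad filter matches specific tags but not the reverse: a tool tagged only "Data" is not returned for a "Geographic data" filter. That is a design choice of the example, not a documented Toolhub behavior.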

Tasks

What type of task does the tool help with? This is a more precise concept than "use case", which was proposed in the v1 taxonomy and is included in the data model as an annotation. This list of tasks was created as part of the taxonomy research and design process, which sought to map the large, uncontrolled list of tasks represented in various tool categorizations to the following more concise list of values:

Values:

  • Analysis
  • Annotating and linking
  • Archiving and cleanup
  • Categorizing and tagging
  • Citing and referencing
  • Communication and supporting users
  • Converting and formatting content
  • Creating content
  • Deleting and reverting
  • Disambiguation
  • Downloading or reusing content
  • Editing or updating content
  • Event and contest planning
  • Hosting and maintaining tools
  • Identifying policy violations
  • Identifying spam
  • Identifying vandalism
  • Listing and ranking
  • Merging content
  • Migrating content
  • Patrolling recent changes
  • Project management and reporting
  • Reading
  • Recommending content
  • Translating and localizing
  • Uploading or importing
  • User management
  • Warning users

Feedback received and changes implemented to this attribute

Initial questions:

  • Is "fixing" content more like Editing or more like Creating / Generating new content? Or do people generally consider it to be more like cleanup, closer to tasks like archiving unused pages or cleaning up sandboxes?
    • Similar questions about what is covered by "Patrolling" – too broad?
  • How do you feel about the number of values and what they capture? Is it too overwhelming? Should we try to make them even broader groupings? For reference: here is the even bigger list of terms that was used to generate this controlled vocabulary.

Feedback received:

  • "Tasks: love this list. "Patrolling" is probably too broad, as you say. "Communication and supporting users" seems broad as well; that could include tasks related to education, to building community, etc." [4]
  • "In the tasks there is a division which says Creating or uploading content IMHO these are two separate tasks supporting different projects creating content refers to article editing. While uploading is related to media files and may overlap with Converting and Formatting assuming its about files types and not page clean up."[5]
  • "I think adding and/or updating content is not well covered by the other tasks categories and could be a useful addition...I think it will be better to remove the "adding" part, because it can be considered a particular case within "updating" [...] I believe that "editing" mainly involves content introduced by the user with total or great freedom, while "updating" involves a fully or almost fully automatic change proposal with which the user only has to interact minimally."[6]

Changes to be implemented:

  • Revise the Tasks attribute values:
    • Remove "Creating or uploading content"
    • Add "Creating new content"
    • Rename "Generating and recommending content" to "Recommending content"
    • Add "Uploading or importing"
    • Rename "Editing" to "Editing or updating"
    • Remove "Patrolling"
    • Add:
      • Identifying policy violations
      • Identifying spam
      • Identifying vandalism
      • Patrolling recent changes
      • Warning users

Subject Domains

Is the tool targeted at helping in a specific type of wiki project or topic area?

Values:

  • Biography
  • Cultural heritage
  • Education
  • Geography and mapping
  • GLAM
  • History
  • Language and internationalization
  • Outreach
  • Science

Attributes proposed but excluded

Platforms

Where does the tool run?

Proposed values:

  • Command-line
  • Desktop
  • MediaWiki
  • Mobile / smartphone
  • Web or browser

Initial questions:

  • Several of these values are already represented in the uncontrolled "tool type" field. Do we think it's worth having a controlled attribute for this concept?

Feedback received:

  • "The list of platforms is really confusing. An on-wiki gadget, for example, could come under desktop, mobile, MediaWiki and web/browser. I would expect to be able to distinguish mobile apps from web tools which work on mobile, web tools which work on mobile from web tools which only work on desktop, web tools which work on desktop from browser extensions, and tools on external websites from on-wiki tools. What about command-line tools that can be used in PAWS? Do they count as web/browser tools too?"[7]
  • "I'm a bit confused around the Platforms -- what would be the difference between desktop, mobile/smartphone, and web/browser? In my limited understanding, if a tool has a web interface, it nominally works for all three. Maybe there are tools that were designed specifically for mobile phones but I assume for most, web/browser covers it."[8]

Decision:

  • Exclude the proposed Platform attribute for now. Monitor tags and community-created lists to determine if this attribute would be useful or feasible in the future.

Programming languages

What programming languages does the tool use?

Proposed initial set of values and their Wikidata QIDs (see phab:T308030#8045397 for background):

  • JavaScript (Q2005)
  • JSON (Q2063)
  • Lua (Q207316)
  • MySQL (Q107385678)
  • Node.js (Q756100)
  • PHP (Q59)
  • Python (Q28865)
  • SPARQL (Q54871)
  • SQL (Q47607)

Initial questions:

  • Would you use this attribute to look for projects to contribute to based on your skills or learning goals?
  • Would you want this attribute to be broadened to include frameworks like Flask, Django, etc?

Feedback received:

  • "Programming languages: my personal thought here is that what's most useful about these attributes as a tool developer is seeing what other people are doing. Basically, I don't want to find myself "accidentally" doing something no one has ever done before, and so it's most useful for looking at solutions to "solved problems" and ensuring I'm adopting a tech stack that others are using."[9]
  • Several comments on JSON and other things that are not programming languages being included.[10]
  • "Frameworks might be better suited to separate uncontrolled attribute instead of being included with programming languages since tool authors could use any number of frameworks, which would vary based on programming language."[11]

Decision:

  • Exclude the proposed "Programming languages" attribute for now, and rely on annotations and the existing "technology_used" field in the data model (though that field is uncontrolled). Monitor tags and community-created lists to determine if this attribute would be useful or feasible in the future.

Taxonomy v1

These categories will be superseded by or integrated into the v2 Toolhub taxonomy described above.

Categories from Research Phase 1 Data Model

Audiences v1

This refers to the audience categories in the Wikimedia Resource Center, which currently are:

  • For program coordinators
  • For contributors
  • For developers
  • For affiliate organizers

Tool use cases

Use cases for tools are represented by a controlled vocabulary meant to represent different purposes a tool may serve. Tools can have multiple use cases.

To put it briefly, tools can be used for developing or consuming content, facilitating interactions among community users, writing code, and organizing projects. With respect to content-related tools, the type of content is treated separately from the thing done with the content; appropriate Wikimedia projects to use a given tool on are represented through a separate tool attribute.

  • Content format
    • Content pages (encyclopedia articles, original texts)
    • Media (images, videos, sound recordings)
    • Data (Wikidata items, structured file data)
    • Code
    • Templates
    • Documentation
  • Contributors
    • Prepare
      • Research
      • Collection curation (curating datasets, curating image sets)
    • Create
      • Page creation
      • Uploading
      • Drafting
    • Change
      • Annotating
      • Expanding
      • Copyediting
      • Formatting
      • Illustrating
      • Renaming
      • Merging
      • Splitting
      • Categorizing
      • Format conversion (e.g. OCR, video conversion)
    • Quality assurance
      • Copyright management
      • New page patrolling
      • Recent changes patrolling
      • Maintenance tagging
      • Assessment
    • Destroy
      • Reverting
      • Deleting
      • Suppressing
  • Interacting with users
    • Socializing users
      • Welcoming
      • Training and mentoring
      • Counseling and social support
    • Conduct
      • Reverting
      • Warning
      • Blocking
      • Dispute resolution
    • Other
      • Assistance (solving specific problems)
      • Talk page discussion
      • User rights (admin, rollback, etc.)
      • User activity analysis
  • Developers
    • APIs
    • Coding environments
    • Data services
    • Productivity tools
    • Tool development kits
    • Wikimedia operational tools
  • Organizers
    • Online project planning (WikiProjects, etc.)
    • Event planning
    • Contest organizing
    • Governance
    • Learning and evaluation
    • Worklist development
    • Project communication
    • Partnership development
  • Consumers
    • Reading
    • Data and metrics
    • Visualization and remixing
    • Large-scale content analysis

Technical details

This section documents technical implementation details. You don't need to know most of this for day-to-day use.

Some parts of the data model rely on controlled vocabularies, where a field can only be set to one of several predefined terms. Those are described above in #Taxonomy and controlled vocabulary.

Toolinfo schema

Version 1.2.2

Version 1.2.2 of the schema, published on 16 March 2022. This schema introduces a new "person" data type and allows it to be used to declare multiple authors. The url_multilingual object definition no longer allows additional undeclared properties to pass validation.

Version 1.2.1 of the schema, published on 06 January 2022. This schema introduces two new tool types: "lua module" and "template".

Version 1.2.0 of the schema, published on 15 October 2021. This schema includes some new fields, but maintains backwards compatibility with the previous 1.1.1 and 1.0.0 schemas.

Changes from 1.1.1:

  • Update syntax for json-schema draft 7
  • Fix validation rules for "license" property. Prior schema referenced a non-existent spdx schema.
  • Add "user_docs_url" property.
  • Various description string copy edits.
  • MaxLength constraints added for all string types
  • Extracted #/definitions/url
  • Extracted #/definitions/url_multilingual_or_array
  • toolinfo_version replaced by $schema
  • toolinfo_language replaced by $language

toolinfo.json schema v1.2.2
{
  "title": "toolinfo",
  "description": "A tool is a piece of software that helps facilitate contribution toward, or consumption of, Wikimedia projects and associated data, not including the core MediaWiki software and its extensions.",
  "$id": "/toolinfo/1.2.2",
  "$schema": "http://json-schema.org/draft-07/schema#",
  "definitions": {
    "tool": {
      "type": "object",
      "properties": {
        "name": {
          "type": "string",
          "maxLength": 255,
          "description": "Unique identifier for this tool. Must be unique for every tool. It is recommended you prefix your tool names to reduce the risk of clashes.",
          "examples": [
            "toolforge-admin",
            "user-bdavis_wmf-GlobalWatchlistReset.js"
          ]
        },
        "title": {
          "type": "string",
          "maxLength": 255,
          "description": "Human readable tool name. Recommended limit of 25 characters."
        },
        "description": {
          "type": "string",
          "maxLength": 65535,
          "description": "A longer description of the tool. The recommended length for a description is 3-5 sentences. Future versions of this schema will impose a character limit."
        },
        "url": {
          "$ref": "#/definitions/url",
          "description": "A direct link to the tool or to instructions on how to use or install the tool."
        },
        "keywords": {
          "type": "string",
          "maxLength": 2047,
          "description": "[DEPRECATED] Comma-delineated list of keywords. This parameter is deprecated and will be removed in the next major version.",
          "$comment": "Remove in version 2."
        },
        "author": {
          "oneOf": [
            {
              "type": "string",
              "maxLength": 255
            },
            {
              "type": "array",
              "items": {
                "$ref": "#/definitions/person"
              }
            }
          ],
          "description": "The primary tool developers."
        },
        "repository": {
          "$ref": "#/definitions/url",
          "description": "A link to the repository where the tool code is hosted."
        },
        "subtitle": {
          "type": "string",
          "maxLength": 255,
          "description": "Longer than the full title but shorter than the description. It should add some additional context to the title."
        },
        "openhub_id": {
          "type": "string",
          "maxLength": 255,
          "description": "The project ID on OpenHub. Given a URL of https://openhub.net/p/foo, the project ID is `foo`."
        },
        "url_alternates": {
          "type": "array",
          "items": {
            "$ref": "#/definitions/url_multilingual"
          },
          "description": "Alternate links to the tool or install documentation in different natural languages."
        },
        "bot_username": {
          "type": "string",
          "maxLength": 255,
          "description": "If the tool is a bot, the Wikimedia username of the bot. Do not include 'User:' or similar prefixes."
        },
        "deprecated": {
          "type": "boolean",
          "default": false,
          "description": "If true, the use of this tool is officially discouraged. The `replaced_by` parameter can be used to define a replacement."
        },
        "replaced_by": {
          "$ref": "#/definitions/url",
          "description": "If this tool is deprecated, this parameter should be used to link to the replacement tool."
        },
        "experimental": {
          "type": "boolean",
          "default": false,
          "description": "If true, this tool is unstable and can change or go offline at any time."
        },
        "for_wikis": {
          "oneOf": [
            {
              "type": "array",
              "items": {
                "$ref": "#/definitions/wiki"
              }
            },
            {
              "$ref": "#/definitions/wiki"
            }
          ],
          "default": "*",
          "description": "A string or array of strings describing the wiki(s) this tool can be used on. Use hostnames such as `zh.wiktionary.org`. Use asterisks as wildcards. For example, `*.wikisource.org` means 'this tool works on all Wikisource wikis.' `*` means 'this works on all wikis, including Wikimedia wikis.'"
        },
        "icon": {
          "$ref": "#/definitions/commons_file",
          "description": "A link to a Wikimedia Commons file description page for an icon that depicts the tool."
        },
        "license": {
          "type": "string",
          "maxLength": 255,
          "description": "The software license the tool code is available under. Use a standard SPDX license identifier like 'GPL-3.0-or-later'.",
          "examples": [
            "GPL-2.0-or-later",
            "GPL-3.0-or-later"
          ]
        },
        "sponsor": {
          "$ref": "#/definitions/string_or_string_array",
          "description": "Organization that sponsored the tool's development."
        },
        "available_ui_languages": {
          "oneOf": [
            {
              "type": "array",
              "items": {
                "$ref": "#/definitions/language"
              }
            },
            {
              "$ref": "#/definitions/language"
            },
            {
              "type": "string",
              "maxLength": 1,
              "enum": [
                "*"
              ]
            }
          ],
          "default": "en",
          "description": "The language(s) the tool's interface has been translated into. Use ISO 639 language codes like `zh` and `scn`. If not defined it is assumed the tool is only available in English."
        },
        "technology_used": {
          "$ref": "#/definitions/string_or_string_array",
          "description": "A string or array of strings listing technologies (programming languages, development frameworks, etc.) used in creating the tool."
        },
        "tool_type": {
          "type": "string",
          "maxLength": 32,
          "enum": [
            "web app",
            "desktop app",
            "bot",
            "gadget",
            "user script",
            "command line tool",
            "coding framework",
            "lua module",
            "template",
            "other"
          ],
          "description": "The manner in which the tool is used. Select one from the list of options."
        },
        "api_url": {
          "$ref": "#/definitions/url",
          "description": "A link to the tool's API, if available."
        },
        "developer_docs_url": {
          "$ref": "#/definitions/url_multilingual_or_array",
          "description": "A link to the tool's developer documentation, if available."
        },
        "user_docs_url": {
          "$ref": "#/definitions/url_multilingual_or_array",
          "description": "A link to the tool's user documentation, if available."
        },
        "feedback_url": {
          "$ref": "#/definitions/url_multilingual_or_array",
          "description": "A link to location where the tool's user can leave feedback."
        },
        "privacy_policy_url": {
          "$ref": "#/definitions/url_multilingual_or_array",
          "description": "A link to the tool's privacy policy, if available."
        },
        "translate_url": {
          "$ref": "#/definitions/url",
          "description": "A link to the tool's translation interface."
        },
        "bugtracker_url": {
          "$ref": "#/definitions/url",
          "description": "A link to the tool's bug tracker on GitHub, Bitbucket, Phabricator, etc."
        },
        "_schema": {
          "type": "string",
          "format": "uri-reference",
          "maxLength": 32,
          "description": "A URI identifying the jsonschema for this toolinfo.json record. This should be a short uri containing only the name and revision at the end of the URI path.",
          "examples": [
            "/toolinfo/1.2.1"
          ]
        },
        "_language": {
          "$ref": "#/definitions/language",
          "default": "en",
          "description": "The language in which this toolinfo record is written. If not set, the default value is English. Use ISO 639 language codes."
        }
      },
      "required": [
        "name",
        "title",
        "description",
        "url"
      ]
    },
    "url": {
      "type": "string",
      "maxLength": 2047,
      "format": "uri"
    },
    "wiki": {
      "type": "string",
      "maxLength": 255,
      "pattern": "^(\\*|(.*)?\\.?(mediawiki|wiktionary|wiki(pedia|quote|books|source|news|versity|data|voyage|media))\\.org)$"
    },
    "commons_file": {
      "$ref": "#/definitions/url",
      "pattern": "^https://commons.wikimedia.org/wiki/File:.+\\..+$",
      "maxLength": 2047
    },
    "language": {
      "type": "string",
      "maxLength": 16,
      "pattern": "^(x-.*|[A-Za-z]{2,3}(-.*)?)$"
    },
    "url_multilingual": {
      "type": "object",
      "properties": {
        "language": {
          "$ref": "#/definitions/language"
        },
        "url": {
          "$ref": "#/definitions/url"
        }
      },
      "additionalProperties": false
    },
    "url_multilingual_or_array": {
      "oneOf": [
        {
          "type": "array",
          "items": {
            "$ref": "#/definitions/url_multilingual"
          }
        },
        {
          "$ref": "#/definitions/url"
        }
      ]
    },
    "string_or_string_array": {
      "oneOf": [
        {
          "type": "string",
          "maxLength": 255
        },
        {
          "type": "array",
          "items": {
            "type": "string",
            "maxLength": 255
          }
        }
      ]
    },
    "person": {
      "type": "object",
      "properties": {
        "name": {
          "type": "string",
          "maxLength": 255,
          "description": "The full/formatted name of the person."
        },
        "wiki_username": {
          "type": "string",
          "maxLength": 255,
          "description": "The person's Wikimedia username."
        },
        "developer_username": {
          "type": "string",
          "maxLength": 255,
          "description": "The person's Wikimedia Developer account username."
        },
        "email": {
          "type": "string",
          "maxLength": 255,
          "format": "email",
          "description": "Email address"
        },
        "url": {
          "$ref": "#/definitions/url",
          "description": "Home page or other URL representing the person."
        }
      },
      "required": [
        "name"
      ],
      "additionalProperties": false
    }
  },
  "oneOf": [
    {
      "type": "array",
      "items": {
        "$ref": "#/definitions/tool"
      }
    },
    {
      "$ref": "#/definitions/tool"
    }
  ]
}
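
To make a few of the schema's constraints concrete, the following sketch checks a record against the `required` list and the `wiki` and `language` patterns copied from the v1.2.2 schema above. It covers only that small subset of the schema and is not a substitute for a real JSON Schema validator. The function name `check_record` and the sample records are hypothetical.

```python
import re
from typing import Any, Dict, List

# Patterns and required fields copied from the v1.2.2 schema above.
WIKI_PATTERN = re.compile(
    r"^(\*|(.*)?\.?(mediawiki|wiktionary|"
    r"wiki(pedia|quote|books|source|news|versity|data|voyage|media))\.org)$"
)
LANGUAGE_PATTERN = re.compile(r"^(x-.*|[A-Za-z]{2,3}(-.*)?)$")
REQUIRED = ("name", "title", "description", "url")


def _as_list(value: Any) -> List[Any]:
    """Wrap a single value in a list; return lists unchanged."""
    return value if isinstance(value, list) else [value]


def check_record(record: Dict[str, Any]) -> List[str]:
    """Return a list of problems found; an empty list means these checks pass."""
    problems = [f"missing required field: {key}" for key in REQUIRED if key not in record]

    # for_wikis defaults to "*" and may be a string or an array of strings.
    for wiki in _as_list(record.get("for_wikis", "*")):
        if not WIKI_PATTERN.search(wiki):
            problems.append(f"invalid for_wikis value: {wiki}")

    # available_ui_languages defaults to "en"; the literal "*" is also allowed.
    languages = record.get("available_ui_languages", "en")
    if languages != "*":
        for language in _as_list(languages):
            if not LANGUAGE_PATTERN.search(language):
                problems.append(f"invalid language code: {language}")

    return problems


# Hypothetical record, for illustration only.
record = {
    "name": "example-tool",
    "title": "Example tool",
    "description": "An example record used to illustrate the schema.",
    "url": "https://example.org/tool",
    "for_wikis": ["*.wikisource.org", "zh.wiktionary.org"],
    "available_ui_languages": ["zh", "scn"],
}
print(check_record(record))
```

A record that omits `title`, `description`, and `url`, or that uses values such as `example.com` for `for_wikis` or `english` for `available_ui_languages`, would produce one problem entry per violation.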

Version 1.1.1

Version 1.1.0, published on 30 June 2018, updated the schema with new fields while maintaining full backwards compatibility with the previous schema.

Version 1.1.1, published on 13 October 2018, corrects a typographical error from 1.1.0.

The JSON Schema is below.

toolinfo.json schema v1.1.1
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "id": "https://tools.wmflabs.org/toolhub/schema/1.1.1",
  "title": "Wikimedia Tool",
  "description": "A tool is a piece of software that helps facilitate contribution toward, or consumption of, Wikimedia projects and associated data, not including the core wiki software and its extensions",
  "version": "1.1.1",
  "authors": [
    "Hay Kranen",
    "James Hare"
  ],
  "definitions": {
    "tool": {
      "type": "object",
      "properties": {
        "name": {
          "type": "string",
          "description": "Unique identifier for tools. Must be unique for every tool. It is recommended you prefix your tool names to reduce the risk of clashes."
        },
        "title": {
          "type": "string",
          "description": "Human readable tool name. Recommended limit of 25 characters."
        },
        "subtitle": {
          "type": "string",
          "maxLength": 250,
          "description": "Longer than the full title but shorter than the description. It should add some additional context to the title."
        },
        "openhub_id": {
          "type": "string",
          "description": "The project ID on OpenHub. Given a URL https://openhub.net/p/foo, the project ID is `foo`."
        },
        "description": {
          "type": "string",
          "description": "A longer description of the tool. The recommended length for a description is 3-5 sentences. Future versions of this schema will impose a character limit."
        },
        "url": {
          "type": "string",
          "format": "uri",
          "description": "A direct link to the tool or to instructions on how to use or install the tool."
        },
        "url_alternates": {
          "type": "array",
          "items": {
            "$ref": "#/definitions/url_multilingual"
          }
        },
        "keywords": {
          "type": "string",
          "description": "Comma-delineated list of keywords. This parameter is deprecated and will be removed in the next version."
        },
        "author": {
          "type": "string",
          "description": "The primary tool developer."
        },
        "repository": {
          "type": "string",
          "format": "uri",
          "description": "A link to the repository where the tool code is hosted."
        },
        "bot_username": {
          "type": "string",
          "description": "If the tool is a bot, the Wikimedia username of the bot. Do not include 'User:' or similar prefixes."
        },
        "deprecated": {
          "type": "boolean",
          "default": false,
          "description": "If true, the use of this tool is officially discouraged. The `replaced_by` parameter can be used to define a replacement."
        },
        "replaced_by": {
          "type": "string",
          "format": "uri",
          "description": "If this tool is deprecated, this parameter should be used to link to the replacement tool."
        },
        "experimental": {
          "type": "boolean",
          "default": false,
          "description": "If true, this tool is unstable and can change or go offline at any time."
        },
        "for_wikis": {
          "oneOf": [
            {
              "type": "array",
              "items": {
                "$ref": "#/definitions/wiki"
              }
            },
            {
              "$ref": "#/definitions/wiki"
            }
          ],
          "default": "*",
          "description": "A string or array of strings describing the wiki(s) this tool can be used on. Use hostnames such as `zh.wiktionary.org`. Use asterisks as wildcards. For example, `*.wikisource.org` means 'this tool works on all Wikisource wikis.' `*` means 'this works on all wikis, including Wikimedia wikis.'"
        },
        "icon": {
          "$ref": "#/definitions/commons_file",
          "description": "A link to a Wikimedia Commons file description page for an icon that depicts the tool."
        },
        "license": {
          "$ref": "https://tools.wmflabs.org/spdx/schema/licenses.json#/definitions/license",
          "description": "The software license the tool's code is available under. Use a standard SPDX license keyword."
        },
        "sponsor": {
          "$ref": "#/definitions/string_or_string_array",
          "description": "Organization that sponsored the tool's development."
        },
        "available_ui_languages": {
          "oneOf": [
            {
              "type": "array",
              "items": {
                "$ref": "#/definitions/language"
              }
            },
            {
              "$ref": "#/definitions/language"
            },
            {
              "type": "string",
              "enum": [
                "*"
              ]
            }
          ],
          "default": "en",
          "description": "The language(s) the tool's interface has been translated into. Specify this field manually only if the tool does not handle interface translation through translatewiki.net. Use ISO 639 language codes like `zh` and `scn`. If not defined it is assumed the tool is only available in English."
        },
        "technology_used": {
          "$ref": "#/definitions/string_or_string_array",
          "description": "A string or array of strings listing technologies (programming languages, development frameworks, etc.) used in creating the tool."
        },
        "tool_type": {
          "type": "string",
          "enum": [
            "web app",
            "desktop app",
            "bot",
            "gadget",
            "user script",
            "command line tool",
            "coding framework",
            "other"
          ],
          "description": "The manner in which the tool is used. Select one from the list of options."
        },
        "api_url": {
          "type": "string",
          "format": "uri",
          "description": "A link to the tool's API, if available."
        },
        "developer_docs_url": {
          "oneOf": [
            {
              "type": "array",
              "items": {
                "$ref": "#/definitions/url_multilingual"
              }
            },
            {
              "type": "string",
              "format": "uri"
            }
          ],
          "description": "A link to the tool's developer documentation, if available."
        },
        "feedback_url": {
          "oneOf": [
            {
              "type": "array",
              "items": {
                "$ref": "#/definitions/url_multilingual"
              }
            },
            {
              "type": "string",
              "format": "uri"
            }
          ],
          "description": "A link to where tool users can leave feedback."
        },
        "privacy_policy_url": {
          "oneOf": [
            {
              "type": "array",
              "items": {
                "$ref": "#/definitions/url_multilingual"
              }
            },
            {
              "type": "string",
              "format": "uri"
            }
          ],
          "description": "A link to the tool's privacy policy, if available."
        },
        "translate_url": {
          "type": "string",
          "format": "uri",
          "description": "A link to the tool translation interface."
        },
        "bugtracker_url": {
          "type": "string",
          "format": "uri",
          "description": "A link to the tool's bug tracker on GitHub, Bitbucket, Phabricator, etc."
        },
        "toolinfo_version": {
          "type": "integer",
          "default": 1,
          "description": "The major version number of the Toolinfo schema used. The default value assumed is 1, referring to versions 1.0.0 and 1.1.0."
        },
        "toolinfo_language": {
          "$ref": "#/definitions/language",
          "default": "en",
          "description": "The language the toolinfo record is written in, if not the default value of English. Use ISO 639 language codes."
        }
      },
      "required": [
        "name",
        "title",
        "description",
        "url"
      ]
    },
    "wiki": {
      "type": "string",
      "pattern": "^(\\*|(.*)?\\.?(mediawiki|wiktionary|wiki(pedia|quote|books|source|news|versity|data|voyage|tech|media|mediafoundation))\\.org)$"
    },
    "commons_file": {
      "type": "string",
      "format": "uri",
      "pattern": "^https://commons.wikimedia.org/wiki/File:.+\\..+$"
    },
    "language": {
      "type": "string",
      "pattern": "^(x-.*|[A-Za-z]{2,3}(-.*)?)$"
    },
    "url_multilingual": {
      "type": "object",
      "properties": {
        "language": {
          "$ref": "#/definitions/language"
        },
        "url": {
          "type": "string",
          "format": "uri"
        }
      }
    },
    "string_or_string_array": {
      "oneOf": [
        {
          "type": "string"
        },
        {
          "type": "array",
          "items": {
            "type": "string"
          }
        }
      ]
    }
  },
  "oneOf": [
    {
      "type": "array",
      "items": {
        "$ref": "#/definitions/tool"
      }
    },
    {
      "$ref": "#/definitions/tool"
    }
  ]
}
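
As a rough illustration of how a few of the v1.1.1 rules apply to a record, the Python sketch below checks the four required fields and tests `for_wikis` and `available_ui_languages` against the `wiki` and `language` patterns, written here as Python raw strings. It is not a full JSON Schema validator, and the helper names (`as_list`, `check_tool`) and the example record are hypothetical, not part of Toolhub.

```python
import re

# Patterns taken from the "wiki" and "language" definitions of schema v1.1.1.
WIKI_RE = re.compile(
    r"^(\*|(.*)?\.?(mediawiki|wiktionary|wiki(pedia|quote|books|source|news"
    r"|versity|data|voyage|tech|media|mediafoundation))\.org)$"
)
LANGUAGE_RE = re.compile(r"^(x-.*|[A-Za-z]{2,3}(-.*)?)$")
REQUIRED = ("name", "title", "description", "url")


def as_list(value):
    """Several fields accept either a single value or an array of values."""
    return value if isinstance(value, list) else [value]


def check_tool(record):
    """Return a list of problems found in one tool record (empty if none)."""
    problems = [f"missing required field: {key}"
                for key in REQUIRED if key not in record]
    # Absent fields fall back to the schema defaults "*" and "en".
    for wiki in as_list(record.get("for_wikis", "*")):
        if not isinstance(wiki, str) or not WIKI_RE.search(wiki):
            problems.append(f"invalid for_wikis entry: {wiki!r}")
    for lang in as_list(record.get("available_ui_languages", "en")):
        if lang == "*":
            continue  # the schema allows "*" as a special value
        if not isinstance(lang, str) or not LANGUAGE_RE.search(lang):
            problems.append(f"invalid available_ui_languages entry: {lang!r}")
    return problems


# Hypothetical record, for illustration only.
example = {
    "name": "example-hypothetical-tool",
    "title": "Example tool",
    "description": "An illustrative record that is not a real tool.",
    "url": "https://example.org/tool",
    "for_wikis": ["zh.wiktionary.org", "*.wikisource.org"],
    "available_ui_languages": "zh",
}

print(check_tool(example))
print(check_tool({"name": "incomplete", "for_wikis": "example.com"}))
```

A complete check would run the full schema through a JSON Schema validator; this sketch only covers the rules shown.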

Version 1.0.0

Hay's Tool Directory established a de facto standard for describing Wikimedia tools using JSON files. This standard has been retroactively established as version 1.0.0 of the toolinfo JSON schema.

toolinfo.json schema v1.0.0
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Wikimedia Tool",
  "description": "A tool is a piece of software that helps facilitate contribution toward, or consumption of, a Wikimedia project, not including the core wiki software and its extensions",
  "version": "1.0.0",
  "authors": [
    "Hay Kranen",
    "James Hare"
  ],
  "definitions": {
    "tool": {
      "type": "object",
      "properties": {
        "name": {
          "type": "string"
        },
        "title": {
          "type": "string"
        },
        "description": {
          "type": "string"
        },
        "url": {
          "type": "string"
        },
        "keywords": {
          "type": "string"
        },
        "author": {
          "type": "string"
        },
        "repository": {
          "type": "string"
        }
      },
      "required": [
        "name",
        "title",
        "description",
        "url"
      ]
    }
  },
  "oneOf": [
    {
      "type": "array",
      "items": {
        "$ref": "#/definitions/tool"
      }
    },
    {
      "type": "object",
      "$ref": "#/definitions/tool"
    }
  ]
}
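
The top-level `oneOf` in both schema versions means a toolinfo.json file may hold either a single tool object or an array of them. A consumer might normalize the two shapes before further processing. The sketch below shows one way to do that in Python; the function name `load_tools` is hypothetical, not part of Toolhub.

```python
import json


def load_tools(text):
    """Parse toolinfo.json text and always return a list of tool records."""
    data = json.loads(text)
    if isinstance(data, dict):
        return [data]  # a single tool object
    if isinstance(data, list):
        return data    # already an array of tool objects
    raise ValueError("toolinfo.json must contain an object or an array")


single = '{"name": "a", "title": "A", "description": "d", "url": "u"}'
several = '[{"name": "a"}, {"name": "b"}]'

print(len(load_tools(single)))
print(len(load_tools(several)))
```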

Annotations

These are additional pieces of data that can be used to describe tools. Annotations cannot be submitted through toolinfo files; they are meant to be submitted through Toolhub or the Toolhub API only.

Currently planned annotations include:

  • Additional info – expands on the tool description.
  • Audiences – Wikimedia Resource Center audiences.
  • Broken – yes/no flag to indicate that a tool is no longer working, with the username of the person making that report and an associated report.
  • Collections – community-curated groupings of tools.
  • Documentation URL – link to user documentation, including both official documentation and user-generated documentation.
  • Official maintainer – the people who are currently responsible for maintaining the tool's code.
  • Related topics – links between tools and Wikidata items as another way of describing tools.
  • Screenshots – visual aids showing the tool in use.
  • Testers – people who have signed up to test new versions of the tool.
  • Use cases – controlled vocabulary outlining different uses for tools.
  • Video – tutorials and other such audio-visual guides.
  • Volunteer (user assistance) – people who have volunteered to help other users with using the tool.
  • Wikidata item ID – the Wikidata item ID for the tool.

Automated data inputs

Automatically generated data will factor into tool relevance. Note that not all of these inputs will be available right away, nor will they be available for every tool.

  • Tool availability – is the tool up? When was the last time it was up? How often is the tool down?
  • Translators – credits for translation, based on translatewiki.net statistics.
  • Total gadget users – based on data from the wikis
  • Active gadget users – based on data from the wikis
  • Web hits – for Toolforge tools, based on data from Toolforge
  • Unique devices – nice to have, but would probably be harder to accomplish in practice
  • Last updated – probably based on changes to the tool's Git repository
  • Wikis where used – for gadgets
  • Toolforge maintainers