User:Lucas Werkmeister/QuickCategories

From Meta, a Wikimedia project coordination wiki

QuickCategories is a tool to add and remove categories from pages in batches. It’s similar to Magnus Manske’s QuickStatements tool for Wikidata, but editing categories instead of statements, and for all Wikimedia wikis, not just Wikidata.

Usage

Submitting batches

First, log in via OAuth. Then you can start to create batches on the index page. Enter the domain of the wiki you want to work with (en.wikipedia.org, commons.wikimedia.org, etc.) into the first input field, and the batch contents into the second one (the large, multi-line text area). When you’re done, press the “submit batch” button below the second input field.

A batch consists of a list of commands, which are specified in individual lines of the input: each line corresponds to one command. A command is a list of fields separated by vertical bar (|) or Tab characters, where the first field is the page that the command applies to and the remaining fields specify categories to be added (+Category:Category name) or removed (-Category:Category name). The page can be in any namespace, but the categories must always be specified using the English word “Category” (though the tool will use the localized name when actually adding the category, adding e. g. [[Kategorie:Category name]] on German-language wikis).
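
The command format described above can be sketched in Python as follows. This is only an illustration of the syntax, not the tool’s actual parser: the function name `parse_command` is hypothetical, trimming whitespace and skipping empty fields are assumptions, and sort keys are not handled here.

```python
import re
from typing import List, Tuple


def parse_command(line: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Split one batch line into (page, [(sign, category name), ...])."""
    # Fields are separated by vertical bars or tabs.
    # Assumption: surrounding whitespace is trimmed and empty fields are skipped.
    fields = [f.strip() for f in re.split(r"[|\t]", line) if f.strip()]
    if not fields:
        raise ValueError("empty command")
    page, actions = fields[0], []
    for field in fields[1:]:
        sign, rest = field[0], field[1:]
        # Categories must always use the English word "Category".
        if sign not in "+-" or not rest.startswith("Category:"):
            raise ValueError(f"not a category action: {field!r}")
        actions.append((sign, rest[len("Category:"):]))
    return page, actions
```

For example, `parse_command("Douglas Adams|+Category:English writers|-Category:Writers")` yields the page `Douglas Adams` with one addition and one removal.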

Optionally, you can specify a title for a batch, to explain what it’s doing and why. The title will be included in the edit summaries of edits made for this batch, so [[wikilinks]] can be used, but other elements of wikitext syntax are not supported.

You can (within reason) specify any number of actions (add or remove category) for a page, and any number of commands in a batch. Categories that are already present on a page will not be added again, and categories that are not present cannot be removed either, so it’s possible that no edit will be made when running a command. You can think of it like this: a command describes a state that the page should be brought into, and if it is already in that state, there’s nothing more to do.
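
The “desired state” view can be illustrated with a small sketch that models a page’s categories as a set. The function `apply_actions` and the set model are assumptions made for illustration; the real tool works on the page’s wikitext.

```python
from typing import Iterable, Set, Tuple


def apply_actions(
    current: Set[str], actions: Iterable[Tuple[str, str]]
) -> Tuple[Set[str], bool]:
    """Return (resulting categories, whether an edit would be needed)."""
    result = set(current)
    for sign, category in actions:
        if sign == "+":
            result.add(category)      # no effect if already present
        elif sign == "-":
            result.discard(category)  # no effect if not present
    return result, result != current
```

With `current = {"A"}`, the actions `+A` and `-B` leave the set unchanged, so no edit would be made; the action `+B` changes it, so an edit would be made.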

If you often work with the same few wikis, you can enter them on the preferences page as the “suggested wiki domains for new batches”, and they will replace the default suggestions. If you most frequently work with just one wiki, you can enter that as the “default wiki domain for new batches” and it will be the default value of the first input field on the index page. (You can still edit the value of that input field to submit batches for other wikis.)

Sort key support

You can also modify the sort key of a category link, which is separated from the title by a hash (#) character. (MediaWiki uses the vertical bar for this, but in QuickCategories that already separates multiple fields.) Several flavors are available:

+Category:Title#sort key
If the page isn’t yet in this category at all, add a category link with that sort key, otherwise do nothing.
+Category:Title##sort key
If the page isn’t yet in this category, add a category link with that sort key. If it has a category link for that category without a sort key, add the sort key. If it has a category link with a different sort key, do nothing.
+Category:Title###sort key
If the page isn’t yet in this category, add a category link with that sort key. If it has a category link for that category, with or without sort key, set its sort key. This can also be used to remove a sort key by omitting it in the action (e. g. +Category:Some category###|+Category:Something else, i. e. with nothing after the ###).
-Category:Title#sort key
If the page has a category link with this sort key, remove it, otherwise do nothing.

The most useful mode is probably +Category:Title##sort key, leaving existing sort keys intact, but if you know that your sort key is better than the existing one you can also use +Category:Title###sort key to override it.
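
The three flavors for additions can be summarized in a short sketch. The names `ABSENT`, `parse_add` and `resolve_add` are hypothetical, and the existing link is modeled as either absent, present without a sort key (`None`), or present with a sort key (a string); this only restates the rules above and is not the tool’s own code.

```python
import re
from typing import Optional, Tuple

ABSENT = object()  # sentinel: the page has no link to the category at all


def parse_add(action_body: str) -> Tuple[str, int, Optional[str]]:
    """Split 'Title##sort key' into (title, number of # characters, sort key or None)."""
    title, hashes, key = re.fullmatch(r"([^#]*)(#{0,3})(.*)", action_body).groups()
    return title, len(hashes), (key or None)


def resolve_add(existing, hashes: int, key: Optional[str]):
    """Return the sort key the category link ends up with after the action."""
    if existing is ABSENT:
        return key       # every flavor adds the link if it is missing
    if hashes == 2 and existing is None:
        return key       # ## fills in a sort key only where there is none
    if hashes == 3:
        return key       # ### always sets the key (None removes it)
    return existing      # otherwise the existing link is left alone
```

For instance, `resolve_add("old", 2, "new")` keeps `"old"`, while `resolve_add("old", 3, "new")` gives `"new"`.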

Redirects

Since 2020-02-16, redirects are resolved by default: if a batch contains the line A|+Category:C, and page A is actually a redirect to page B, then the category will be added to page B, not page A. If you want to edit the categories of the redirect itself, you can prefix the title with an exclamation mark: !A|+Category:C will always edit page A.

Prior to 2020-02-16, the ! syntax was not available, and batches would always edit the specified page even if it was a redirect. For compatibility, batches which were created prior to this date but not completely run will continue to behave this way: that is, they will display without the exclamation mark (as that is how they were specified originally), but if they are run now they will behave as if the exclamation mark was there (as that was the default behavior when they were created).
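
As a rough sketch of the current behavior, the page field can be interpreted like this. The function names and the `redirects` dictionary (mapping a redirect title to its target) are hypothetical stand-ins; the tool itself asks the wiki whether a page is a redirect.

```python
from typing import Dict, Tuple


def split_redirect_marker(page_field: str) -> Tuple[str, bool]:
    """Return (title, whether redirects should be resolved)."""
    if page_field.startswith("!"):
        return page_field[1:], False  # "!A": always edit A itself
    return page_field, True           # "A": follow a redirect if there is one


def page_to_edit(page_field: str, redirects: Dict[str, str]) -> str:
    """Pick the page that would be edited, given known redirects."""
    title, resolve = split_redirect_marker(page_field)
    if resolve:
        return redirects.get(title, title)
    return title
```

With `redirects = {"A": "B"}`, the field `A` leads to an edit of page B, and the field `!A` leads to an edit of page A.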

PagePile support

Instead of listing all the pages in a batch each with individual commands, you can also create a new batch from a PagePile (also available as an option in the “use this pagepile in one of these tools” menu on an individual PagePile page). In this case, the list of actions you specify is applied to all the pages from that PagePile; the syntax is the same as for normal batches, just with the first field (the page) missing, e. g. +Category:Category name|-Category:Category name.

Note that the pages are imported from the PagePile at batch creation time, and later changes to the PagePile will have no effect. The tool also doesn’t currently keep track of whether a batch was created from a PagePile, so you may want to mention it in the batch title.
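
Conceptually, a PagePile batch is equivalent to a normal batch in which every page gets the same actions. A minimal sketch of that equivalence, with a hypothetical helper `expand_pile` and a plain list standing in for the PagePile contents:

```python
from typing import List


def expand_pile(pages: List[str], actions: str) -> List[str]:
    """Build one normal batch line per page, all sharing the same actions."""
    return [f"{page}|{actions}" for page in pages]
```

For example, `expand_pile(["A", "B"], "+Category:X|-Category:Y")` produces the two lines `A|+Category:X|-Category:Y` and `B|+Category:X|-Category:Y`.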

Running batches

After submitting a batch, you will be redirected to the page for this batch. It has been assigned a unique ID by which you and others can review the contents of this batch, now or at any later time, by loading the same URL where you were redirected. (A link to the batch is also included in the edit summaries of any edits made as part of this batch.) The commands of the batch are listed in pages, with fifty commands per page by default; you can navigate to the previous or next page of commands (of the same batch) with the buttons at the top, or directly adjust the offset and limit parameters in the URL.

If any of the commands on the current page have not yet been run (the default state if you just submitted this batch), and you are logged in as the same user who submitted this batch, then you can use the “run these commands” button at the bottom of the page to run the commands being shown. (To run only a few commands, set a smaller limit in the URL.) This may take a while (the tool can run a bit over two commands per second – a full page of fifty commands takes some 22 s to run), but afterwards you should see the page in a different state: the commands should now say “done” instead of “not done”. The “done” badge is also a link to the edit if an edit was made, or otherwise to the revision of the page where the tool determined that no edit was necessary. If the batch has more pages, you may now want to proceed to the next page to run those commands as well, and so on.
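
To get a feeling for how long a larger batch takes when run page by page, the figures above (fifty commands per page, about 22 s per full page) can be extrapolated. This is only a back-of-the-envelope estimate; the function name is hypothetical, and the real speed depends on the wiki and on server load.

```python
import math
from typing import Tuple

SECONDS_PER_FULL_PAGE = 22  # figure quoted above for fifty commands
COMMANDS_PER_PAGE = 50      # default limit


def estimate_foreground_run(
    total_commands: int, limit: int = COMMANDS_PER_PAGE
) -> Tuple[int, float]:
    """Return (number of pages to run, estimated total seconds)."""
    pages = math.ceil(total_commands / limit)
    seconds = total_commands * SECONDS_PER_FULL_PAGE / COMMANDS_PER_PAGE
    return pages, seconds
```

By this estimate, a batch of 1000 commands means 20 pages and roughly 440 s of pure running time, not counting the time spent clicking through the pages.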

You can also check your contributions on the target wiki to see what edits the tool is making (hopefully none other than the ones you specified).

Single-category additions are marked as minor edits, all other edits are considered normal (unless you have the “Mark all edits minor by default” preference set, in which case all edits by this tool are minor). The tool also sets the “bot” flag on all its edits, so if you’re a bot, your edits will not be shown in recent changes by default (otherwise, MediaWiki ignores the flag).
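
The minor-edit rule might be sketched as below. This assumes that “single-category addition” means an edit consisting of exactly one added category and nothing else, which is an interpretation of the sentence above rather than something confirmed by the tool’s source; `is_minor` and its parameters are hypothetical names.

```python
from typing import List, Tuple


def is_minor(actions: List[Tuple[str, str]], minor_by_default: bool = False) -> bool:
    """Decide whether an edit made up of these (sign, category) actions is minor."""
    if minor_by_default:
        # "Mark all edits minor by default" preference is set.
        return True
    # Assumption: only a lone category addition counts as minor.
    return len(actions) == 1 and actions[0][0] == "+"
```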

Background runs

If you are autoconfirmed on the target wiki, you can also run batches in the background, using the grey “run whole batch in the background” button next to the “run these commands” button. After the page reloads, you’ll see a line of text near the top of the page indicating when the background run was started, as well as a button to stop it and a link to the full history of background runs of this batch (you can start and stop the same batch running in the background multiple times). If nobody stops a background run explicitly, it will stop automatically as soon as there are no more commands to run (or when certain errors, e. g. “you are blocked from editing”, are encountered). In this case the text also indicates when the background run stopped, and the “stop” button disappears.

Running batches in the background is slightly slower, but much more convenient for you: you don’t need to run each page of commands in a long batch separately – in fact, you don’t need to do anything while the commands are running, and can even turn off your computer, e. g. while the batch is running in the background overnight.

In addition to the user who submitted a batch, background runs can also be stopped by administrators on the target wiki, as well as by stewards, to reduce the impact of vandalism via this tool. (However, only the submitting user can start background runs.)

Generating batches using the Wikidata query service

You can also use the Wikidata Query Service to generate batches. By default, it outputs data in tabular form, so you can copy its rows and columns (in Firefox, hold down the Ctrl key while dragging the mouse across the table to select by row/column instead of textually) and paste them into the “new batch” form. Two example uses are described below.

Remember that with great power comes great responsibility – make sure that you only edit pages where you’re sure the category is correct! It may be a good idea to start with a shorter list of commands first and see if there are objections before proceeding with the rest.

Categorizing categories based on Wikidata information

If the pages you want to add categories to are themselves categories, you can use the category graph in the Wikidata Query Service to generate commands for a batch where a category is not yet a subcategory of the desired category. This is especially useful on Wikimedia Commons. You can use the following query as a template:

SELECT ?item ?commonsCategory ?page ?command1
WITH {
  # SELECT ITEMS HERE
  # should look like: SELECT DISTINCT ?item WHERE { … }
} AS %items
WITH {
  SELECT ?item ?commonsCategory WHERE {
    hint:SubQuery hint:optimizer "None". # we have to disable the optimizer for this subquery because it gets confused by the ?commonsCategoryStatementEn part
    INCLUDE %items.
    OPTIONAL {
      ?item wdt:P373 ?commonsCategoryStatement.
      BIND(STRLANG(CONCAT("Category:", ?commonsCategoryStatement), "en") AS ?commonsCategoryStatementEn)
      ?commonsCategoryFromStatement schema:name ?commonsCategoryStatementEn;
                                    schema:isPartOf <https://commons.wikimedia.org/>.
    }
    OPTIONAL {
      ?commonsCategoryFromSitelink schema:about ?item;
                                   schema:isPartOf <https://commons.wikimedia.org/>.
      FILTER(STRSTARTS(STR(?commonsCategoryFromSitelink), "https://commons.wikimedia.org/wiki/Category:"))
    }
    BIND(COALESCE(?commonsCategoryFromStatement, ?commonsCategoryFromSitelink) AS ?commonsCategory)
    FILTER(BOUND(?commonsCategory))
  }
} AS %itemsWithCommonsCategories
WHERE {
  INCLUDE %itemsWithCommonsCategories.
  MINUS {
    SERVICE <https://query.wikidata.org/bigdata/namespace/categories/sparql> {
      SERVICE gas:service {
        gas:program gas:gasClass "com.bigdata.rdf.graph.analytics.BFS";
                    gas:linkType mediawiki:isInCategory;
                    gas:traversalDirection "Reverse";
                    gas:in <https://commons.wikimedia.org/wiki/Category:CATEGORY_GOES_HERE>;
                    gas:out ?commonsCategory.
      }
    }
  }
  BIND(REPLACE(wikibase:decodeUri(SUBSTR(STR(?commonsCategory), STRLEN("https://commons.wikimedia.org/wiki/") + 1)), "_", " ") AS ?page)
  BIND("+Category:CATEGORY GOES HERE" AS ?command1)
}

In place of SELECT ITEMS HERE near the top, insert a query selecting the items you’re interested in and whose Commons categories you want to edit. You can ask for help with this part on Wikidata’s Request a query page, if you want. At the bottom, replace CATEGORY_GOES_HERE and CATEGORY GOES HERE with the title of the category, first in “URL form” (underscores instead of spaces), then in “title form” (spaces instead of underscores).
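
If you fill in the template programmatically, the two placeholder forms can be derived from a single category title. The helper below is a hypothetical sketch; it only swaps spaces and underscores and does not percent-encode special characters, which a real URL may additionally need.

```python
def fill_placeholders(query: str, category: str) -> str:
    """Insert a category title into the query template in both forms."""
    title_form = category.replace("_", " ")  # e.g. "Views of Berlin"
    url_form = title_form.replace(" ", "_")  # e.g. "Views_of_Berlin"
    return (
        query
        .replace("CATEGORY_GOES_HERE", url_form)
        .replace("CATEGORY GOES HERE", title_form)
    )
```

Calling it with the category `Views of Berlin` puts `Views_of_Berlin` into the `gas:in` URL and `Views of Berlin` into the `?command1` string.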

When you run the query, the last two columns (?page and ?command1) can be copied and used as input for QuickCategories.

Copying category members from another wiki

When you’re working in a non-English language edition, you might sometimes create new categories on your wiki that already exist on English Wikipedia. To populate such a category, you can add it to the pages on your wiki that are linked, via Wikidata, to the members of the English Wikipedia category. You can find those via the query service as well, using MWAPI and the categorymembers API module. The following query only selects direct category members (using Basque Wikipedia as the example target wiki):

SELECT ?titleEu ("+Category:BASQUE CATEGORY HERE" AS ?command) WHERE {
  hint:Query hint:optimizer "None".
  SERVICE wikibase:mwapi {
     bd:serviceParam wikibase:api "Generator";
                     wikibase:endpoint "en.wikipedia.org";
                     mwapi:generator "categorymembers";
                     mwapi:gcmtitle "Category:ENGLISH_CATEGORY_HERE";
                     mwapi:gcmnamespace 0;
                     mwapi:gcmprop "title";
                     mwapi:gcmlimit "max".
     ?titleEn_ wikibase:apiOutput mwapi:title.
  }
  BIND(STRLANG(?titleEn_, "en") AS ?titleEn)
  ?articleEn schema:name ?titleEn;
             schema:isPartOf <https://en.wikipedia.org/>;
             schema:about ?item.
  ?articleEu schema:about ?item;
             schema:isPartOf <https://eu.wikipedia.org/>;
             schema:name ?titleEu.
}

The following variant selects members of subcategories as well:

SELECT ?titleEu ("+Category:BASQUE CATEGORY HERE" AS ?command) WHERE {
  hint:Query hint:optimizer "None".
  SERVICE <https://query.wikidata.org/bigdata/namespace/categories/sparql> {
    SERVICE mediawiki:categoryTree {
      bd:serviceParam mediawiki:start <https://en.wikipedia.org/wiki/Category:ENGLISH_CATEGORY_URL_HERE>;
                      mediawiki:direction "Reverse";
                      mediawiki:depth 5 . # change this if needed
    }
  }
  ?out schema:name ?categoryEn.
  SERVICE wikibase:mwapi {
     bd:serviceParam wikibase:api "Generator";
                     wikibase:endpoint "en.wikipedia.org";
                     mwapi:generator "categorymembers";
                     mwapi:gcmtitle ?categoryEn;
                     mwapi:gcmnamespace 0;
                     mwapi:gcmprop "title";
                     mwapi:gcmlimit "max".
     ?titleEn_ wikibase:apiOutput mwapi:title.
  }
  BIND(STRLANG(?titleEn_, "en") AS ?titleEn)
  ?articleEn schema:name ?titleEn;
             schema:isPartOf <https://en.wikipedia.org/>;
             schema:about ?item.
  ?articleEu schema:about ?item;
             schema:isPartOf <https://eu.wikipedia.org/>;
             schema:name ?titleEu.
}
LIMIT 100

In both cases, replace BASQUE CATEGORY HERE with the category you want to add on the target wiki, and ENGLISH_CATEGORY_HERE or ENGLISH_CATEGORY_URL_HERE with the English Wikipedia category whose members you are interested in.

Limitations

Most of these limitations could, to some degree, be addressed if the need arises; please contact me on the talk page in that case.

Wikitext-based

The tool analyzes and edits the Wikitext of the page, which means it doesn’t know anything about what templates do to the categories. If a template adds a category (e. g. based on information from Wikidata), the tool will not know that the category is already there and will add it explicitly to the article. It is also not possible to remove categories added via templates.

While the tool will recognize category links under any supported namespace name (e. g. [[Category:…]], [[Категория:…]] or [[К:…]] on Russian Wikipedia) when checking whether a category is already present (and doesn’t need to be added again), it will always use the translated category namespace name to add new categories ([[Категория:…]] in this case).

The tool will also not recognize any other namespace names when parsing submitted batches: they must always use Category:… syntax.
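
To illustrate what “recognizing category links under any supported namespace name” means, here is a simplified, regex-based sketch. It is not how the tool is implemented; the function `has_category` is hypothetical, and the list of namespace names is passed in by hand, whereas a real implementation would obtain it from the wiki.

```python
import re
from typing import Iterable


def has_category(wikitext: str, category: str, namespace_names: Iterable[str]) -> bool:
    """Check whether the wikitext links to the category under any namespace name."""
    names = "|".join(re.escape(name) for name in namespace_names)
    pattern = (
        r"\[\[\s*(?i:" + names + r")\s*:\s*"  # namespace name, case-insensitive
        + re.escape(category)                  # category title, matched exactly
        + r"\s*(?:\|[^\]]*)?\]\]"              # optional sort key
    )
    return re.search(pattern, wikitext) is not None
```

With the names `["Category", "Категория", "К"]`, both `[[К:Люди]]` and `[[Категория:Люди|sort key]]` count as the category Люди being present.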

Subcategories

The tool is not aware of sub- or parent categories: when tasked to add [[Category:People]] to a page that already has the subcategory [[Category:Living people]], it will add the category, even though it may be considered redundant.

Underscore handling

The tool will replace underscores with spaces when saving a submitted batch, so that, no matter whether you specified +Category:Category name or +Category:Category_name, [[Category:Category name]] will be added to the page. If you know of any categories that should really be added using underscores, similar to w:Category:Articles with underscores in the title, feel free to let me know, but even then I think it will almost certainly be better to fix those category links using a separate bot, rather than change this tool and force all its users to replace underscores with spaces before submitting batches.

(Note: up to and including batch #10, the tool would use whatever combination of spaces and underscores was specified by the user, so you’ll see a few underscores in those early batches.)

Edit conflicts

MediaWiki suppresses conflicts between edits by the same user, so I recommend not editing any pages in a batch manually (or with another tool) while the batch is running. It’s okay if anyone else edits them (either this tool or the other editor will get an edit conflict, and if the tool gets an edit conflict, it will automatically retry later), it’s just you who need to back off for a bit.

Protected pages

The OAuth consumer used by the tool does not have the “edit protected pages” grant, so protected pages will be skipped even if you normally have the right to edit them.