Jump to content

User:Beetstra/Overhaul spam-blacklist

From Meta, a Wikimedia project coordination wiki

Overhaul spam-blacklist

  • Problem: The current blacklist system is archaic; it does not allow for levels of blacklisting, is confusing to editors. Main problems include that the spam blacklist is indiscriminate of namespace, userstatus, material linked to, etc. The blacklist is a crude, black-and-white choice, allowing additions by only non-autoconfirmed editors, or only by admins is not possible, nor is it possible to allow links in certain namespaces. Also giving warnings is not possible (on en.wikipedia, we implemented XLinkBot, who reverts and warns - giving a warning to IPs and 'new' editors that a certain link is in violation of policies/guidelines would be a less bitey solution).
  • Who would benefit: Editors on all Wikipedia's
  • Proposed solution: Basically, replace the current mw:Extension:SpamBlacklist with a new extension with an interface similar to mw:Extension:AbuseFilter, where instead of conditions, the field contains a set of regexes that are interpreted like the current spam-blacklists, providing options (similar to the current AbuseFilter) to decide what happens when an added external link matches the regexes in the field (see more elaborate explanation in collapsed box).

    Note: technically, the current AbuseFilter is capable of doing what would be needed, except that in this form it is extremely heavyweight to use for the number of regexes that is on the blacklists, or one would need to write a large number of rather complex AbuseFilters. The suggested filter is basically a specialised form of the AbuseFilter that only matches regexes against added external links. Alternatively, it could be incorporated into the current AbuseFilter as a specialized and optimized 'module'.

description of suggested implementation

description of suggested implementation

  1. Take the current AbuseFilter, create a copy of the whole extension, name it ExternalLinkFilter, take out all the code that interprets the rules ('conditions').
  2. Make 2 fields in replacement for the 'conditions' field:
    • one text field for regexes that block added external links (the blacklist). Can contain many rules (one on each line, like current spam-blacklist).
    • one text field for regexes that override the block (whitelist overriding this blacklist field; that is generally more simple, and cleaner than writing a complex regex, not everybody is a specialist on regexes).
  3. Leave all the other options:
    • Discussion field for evidence (or better, a talk-page like function)
    • Enabled/disabled/deleted (not turn it off when not needed anymore, delete when obsolete)
    • 'Flag the edit in the edit filter log' - maybe nice to be able to turn it off, to get rid of the real rubbish that doesn't need to be logged
    • Rate limiting - catch editors that start spamming an otherwise reasonably good link
    • Warn - could be a replacement for en:User:XLinkBot
    • Prevent the action - as is the current blacklist/whitelist function
    • Revoke autoconfirmed - make sure that spammers are caught and checked
    • Tagging - for certain rules to be checked by RC patrollers.
    • I would consider to add a button to auto-block editors on certain typical spambot-domains (a function currently taken by one of Anomie's bots on en.wikipedia).

This should overall be much more lightweight than the current AbuseFilter (all it does is regex-testing as the spam-blacklist does, only it has to cycle through maybe thousands of AbuseFilters). One could consider to expand it to have rules blocked or enabled on only certain pages (for heavily abused links that actually should only be used on it's own subject page). Another consideration would be to have a 'custom reply' field, pointing the editor that gets blocked by the filter as to why it was blocked.

Possible expanded features (though highly desired)
  1. create a separate userright akin AbuseFilterEditor for being able to edit spam filters (on en.wikipedia, most admins do not touch (or do not dare to touch) the blacklist, while there are non-admin editors who often help on the blacklist).
  2. Add namespace choice (checkboxes like in search; so one can choose not to blacklist something in one particular namespace, with addition of an 'all', a 'content-namespace only' and 'talk-namespace only'.
    • some links are fine in discussions but should not be used in mainspace, others are a total nono
    • some image links are fine in the file-namespace to tell where it came from, but not needed in mainspace (e.g. flickr is currently on revertlist on en.wikipedia's XLinkBot)
  3. Add user status choice (checkboxes for the different roles, or like the page-protection levels)
    • disallow IPs and new users to use a certain link (e.g. to stop spammers from creating socks, while leaving it free to most users).
    • warn IPs and new users when they use a certain link that the link often does not meet inclusion standards (e.g. twitter feeds are often discouraged as external links when other official sites of the subject exists; like the functionality of en:User:XLinkBot).
  4. block or whitelist links matching regexes on specific pages (disallow linking throughout except for on the subject page) - coding akin the title blacklist
  5. block links matching regexes when added by specific user/IP/IP-range (disallow specific users to use a domain) - coding akin the title blacklist
Downsides

We would lose a single full list of material that is blacklisted (the current blacklist is known to work as a deterrent against spamming). That list could however be independently created based on the current rules (e.g. by bot).


Modular approach: make the current AbuseFilter 'modular', where upon creation of a new filter, you can define a 'type' of filter. That module can be a module like the current existing AbuseFilter, or specialised modules like this spam-blacklist filter described above.


Discussion

[edit]