WikiLoop/DoubleCheck/RfC:Levels for WikiLoop DoubleCheck Reviewers

From Meta, a Wikimedia project coordination wiki
First RfC.

Overview

Reviewing Wikipedia edits is an important activity for ensuring content quality, but it can also be abused or misused. Existing major review or counter-vandalism tools (e.g. STiki or Huggle) made the tradeoff of restricting access to a small group of trusted and privileged users. WikiLoop DoubleCheck, an open-source web app that makes reviewing edits easier, wants to explore an alternative model: just as everyone can edit Wikipedia, DoubleCheck would like to allow everyone to review Wikipedia, while having mechanisms to address concerns about reviewer competence and trustworthiness.

On this page we propose a design that gives reviewers at different levels different restrictions or powers.

Ideally we hope to complete the discussion by the end of 2020-08-29 (23:59), unless the community suggests a large-scale change to this proposal.

Design summary

We propose the following ladder levels, and their restrictions or additional powers, for WikiLoop DoubleCheck.

A DoubleCheck reviewer's level on the credibility ladder is determined by their Wikipedia permissions, their number of DoubleCheck judgements, or their endorsements, whichever yields the highest level.

For example,

  • if a Wikipedian is a Wikipedia admin but hasn't conducted any reviews on WikiLoop DoubleCheck, their ladder is Level 5, the top level.
  • if a Wikipedian is an auto-confirmed user but has conducted 10K or more reviews on WikiLoop DoubleCheck, their ladder is Level 4.
  • if a Wikipedian is an extended-confirmed user but has received public endorsements from two Level 4 reviewers, their ladder is also Level 4.

The main rationale is:

  • For reviewers of below-average credibility, we allow them to contribute reviews but give them no additional power to edit Wikipedia content or discussion pages.
  • For reviewers of average credibility, the tool hopes to make it easier to discover revisions worth reviewing, e.g. based on their topic interests or the probability of vandalism.
  • For more credible reviewers, such as rollback permission holders or admins, we want to give them at least the same powers as alternative tools, e.g. STiki and Huggle, and to let them be even more productive in their patrol and administrative work if they choose to.

Either reach Wikipedia User Level | Or has # of DoubleCheck contributions | To be imposed with Restrictions or allow using Powers
blocked IP or user | 0 | Level 0
anonymous user | such as 5 or 10, to be discuss in Separate Discussion 1 | Level 1
logged in user / auto-confirmed user | such as 50 or 100, to be discuss in Separate Discussion 1 | Level 2
extended confirm user | such as 500 or 1K, to be discuss in Separate Discussion 1 | Level 3
users in {rollback} group | such as 5K or 10K, to be discuss in Separate Discussion 1, or publicly endorsed by two Level 4 reviewers or one Level 5 reviewer | Level 4
users in {admin, bureaucrats, global stewards} group | such as 50K or 100K, to be discuss in Separate Discussion 1 | Level 5
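
To make the "whichever comes higher" rule concrete, here is a minimal Python sketch. It is illustrative only and is not DoubleCheck's code: the names (Reviewer, reviewer_level, the group keys) are hypothetical, and the contribution thresholds simply take the larger of each pair of candidate numbers from the table, which are still open in Separate Discussion 1.

```python
from dataclasses import dataclass
from typing import Tuple

# Level granted by a reviewer's Wikipedia user group (first table column).
# The group keys are hypothetical labels, not MediaWiki group names.
PERMISSION_LEVEL = {
    "blocked": 0,
    "anonymous": 1,
    "autoconfirmed": 2,
    "extendedconfirmed": 3,
    "rollback": 4,
    "admin": 5,
}

# ASSUMPTION: the larger candidate from each "such as X or Y" pair.
# Listed from highest to lowest so the first match wins.
CONTRIBUTION_THRESHOLDS = [(100_000, 5), (10_000, 4), (1_000, 3), (100, 2), (10, 1)]


@dataclass
class Reviewer:
    wiki_group: str                        # key into PERMISSION_LEVEL
    judgements: int = 0                    # number of DoubleCheck judgements
    endorser_levels: Tuple[int, ...] = ()  # levels of reviewers who endorsed them


def level_from_contributions(judgements: int) -> int:
    for threshold, level in CONTRIBUTION_THRESHOLDS:
        if judgements >= threshold:
            return level
    return 0


def level_from_endorsements(endorser_levels: Tuple[int, ...]) -> int:
    # Table rule: two Level 4 endorsers, or one Level 5 endorser, give Level 4.
    level5 = sum(1 for lv in endorser_levels if lv == 5)
    level4 = sum(1 for lv in endorser_levels if lv == 4)
    return 4 if (level5 >= 1 or level4 >= 2) else 0


def reviewer_level(reviewer: Reviewer) -> int:
    # The proposal does not say whether a block overrides an earned level;
    # this sketch applies the stated "whichever is higher" rule literally.
    return max(
        PERMISSION_LEVEL[reviewer.wiki_group],
        level_from_contributions(reviewer.judgements),
        level_from_endorsements(reviewer.endorser_levels),
    )
```

Under these assumed thresholds, the three examples above come out as stated: an admin with no reviews is Level 5, an auto-confirmed user with 10K reviews is Level 4, and an extended-confirmed user endorsed by two Level 4 reviewers is Level 4.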

Here is a list of our current or planned restrictions / power features for different levels.

Levels # / Restrictions or Powers (columns: Level 0, Level 1, Level 2, Level 3, Level 4, Level 5)

Restrictions
  • R1. Highlighted as contributions from blocked users
  • R2. Speed restricted
  • R3. Can not be endorsed

Powers
  • P1. URL-to-Undo: display a button taking the reviewer to the Wikipedia page before they manually undo bad the revision
  • P2. Generate WikiText based on reviewer-selected reasons for warnings RfPP like Twinkle
  • P3. Publicly endorse other WLDC reviewers
  • P4. Revert or Undo multiple revisions from WLDC like Huggle and STiki, if also having rollback permission in given wiki
  • P5. Issue reviewer-selected talk page warnings or RfPP directly like Twinkle
  • P6. Directly block users or protect page if also having admin permissions in given LANG of wiki

Comparison 1

STiki, for vandal patrollers, requires either 1,000 mainspace edits or the rollback right (see Wikipedia:STiki#Using STiki), and supports only English Wikipedia.

Comparison 2

Huggle requires rollback permission, or a place on a global whitelist, before one can start using it.

Request for comments

There are a few specific questions on which I want to seek community feedback:

  • (1) Do the criteria defining each level make sense?
  • (2) Should we add or remove restrictions or powers for these levels?
  • (3) Do you have any concerns about providing wiki admins with features like BLOCK or PROTECT directly in the DoubleCheck interface? (Only those with adminship on a given wiki will be able to use them.)

Please leave your comments below. Feel free to use the templates {{support}}, {{oppose}}, and {{doubtful}}. Xinbenlv (talk) 02:27, 15 August 2020 (UTC)

@Can I Log In:, thanks for the comment. Yes, I feel some of our reviewers will think those bars are too high, and some will think otherwise. Yesterday most of the voices were "just fine" or "too high". I am happy that you now offer an opposite voice, that the bars are "too low". Yes, the proposal tries to resolve concerns about the bars being too low by requiring a high number of contributions and also imposing a speed restriction. That way, one will not be able to auto-click and reach a higher level within 2 hours. The rationale is that, if someone is willing to spend time hacking it, the current Wikipedia autoconfirmed-level model is extremely easy to exploit, e.g. this case: open an account, wait an arbitrary number of days (>4), then suddenly edit and revert one's own user page; see this case[1]. Our hope is to (1) allow at least some people to gain the status without needing approval from someone in a permissioned or prestigious group, and also (2) be at least less vulnerable to exploitation than Wikipedia and existing widely used tools, e.g. Twinkle. -- The preceding unsigned comment was added by Xinbenlv (talk) 21:57, August 22, 2020
  • Oppose tl;dr: (1) no (2) no (3) nope
I'm leading a team of developers for RedWarn, another counter-vandalism tool. Following a number of bad actors, we've been planning restrictions for the next version of RedWarn at redwarn.wmcloud.org. Our relevant drafted decisions here are:
  • We cannot assume that this is an editor's first and only counter-vandalism tool. Many of Wikipedia's docs encourage the use of Twinkle, so many users will be coming from it. If we use our tool as the sole way of noting a user's counter-vandalism experience, it's unlikely people will pick it up, growing tired of a restricted mode and being unable to do things. A simple way to work this out is to calculate the number of the user's reverts, which can be done by checking their contributions for edits made with Twinkle, the RedWarn user script, and other tools such as undo and rollback.
  • Creating a complex restrictions system will just make everything harder for us to work with and confusing for users. Either you are in a restricted mode, or you're free to go as you like, given your account has the permissions to use features on a wiki (such as blocking).
My notes here are also that:
  • 10,000 and 100,000 actions (note: don't know why this 100k unlock is here for blocks, which non-admins can't even do) is an INSANELY high bar. This will really end up restricting actually constructive patrollers.
  • The fact that P4 and P5 are only Level 4+ is a huge drawback; editors might as well just use Huggle at that point, as WLDC doesn't actually give them any more incentive to use it at this point in time. In my opinion, this should be given as a Level 3 feature, considering the same can be done with Twinkle and even semi-automatically with RedWarn, both popular tools on the English Wikipedia.
Best wishes, Ed6767 (talk) 03:27, 22 August 2020 (UTC)

@Ed6767: Thank you, I will take that feedback and digest it. Thank you for your detailed comment; I really appreciate it! I'd like to explain that the reason for those high edit bars is that "even if you don't have endorsements from others, you can gain trust just by reviewing". We also provide an allowlist-like feature: any two endorsements from L4 (rollbackers) or one endorsement from L5 (admins) will give the user L3 power. This is equivalent to the Huggle/STiki whitelist model, except that to get whitelisted for Huggle/STiki one needs approval from its developer or a given group, whereas WikiLoop DoubleCheck's model recognizes endorsements from already trusted people to establish such an allowlist, and is hence more decentralized and less controlled by WikiLoop DoubleCheck. Xinbenlv (talk) 05:39, 22 August 2020 (UTC)


  • Support Tipeditor Looks good to me!
  • Oppose per Ed6767. The requirements are too strict and discourage constructive contributions. Maybe 500 and 5,000 edits would be better? (note: on enwiki extended confirmed is 500 edits, that's the most permission you can get automatically based on edit count.) Buidhe (talk) 04:07, 22 August 2020 (UTC)
  • Oppose I think the idea of 500 and maybe 2,500 makes a more attainable incentive than 1K and 10K. - AppleBsTime (talk) 04:13, 22 August 2020 (UTC)
  • Support but doubtful of some "bars". The Level 4 and Level 5 requirement to do things as simple as leaving a message on the talk page (which can be done manually by anybody, and also through Twinkle) seems excessively high. MrConorAE (talk) 04:41, 22 August 2020 (UTC)
@MrConorAE:, @Ed6767:, @Buidhe:, thank you for your early feedback! I previously thought the community generally worried that people would abuse review tools because they are too powerful, which is why STiki, Huggle and many others require rollback permission to even begin using them. I am happily surprised that you would instead like to allow more people to use them; that's exactly what I like! How would you feel if we separated the discussion of "which number of revisions is needed to conduct certain things" from the rest of the proposal? We could make the general proposal more generic in wording, while opening a separate section to discuss the number bars. How do you like it? Xinbenlv (talk) 05:24, 22 August 2020 (UTC)
  • Weak oppose I feel there is definitely a need for a system to prevent abuse of the tool, but as Ed6767 brought up, the system is overly complex and restricts and deters many users from using the tool. -- Yours, Berrely • TalkContribs 08:45, 22 August 2020 (UTC)
  • Weak oppose With WikiLoop DoubleCheck, users like me who have little time to edit whole articles can contribute by crowdsourcing their time. I worry that restrictions on editors with fewer edits (for example, I did only 300 edits in 11 years) limit users with little time. I hope I am a rare example. --Paperworkorange (talk) 10:32, 22 August 2020 (UTC)
  • Oppose I primarily use Twinkle for counter-vandalism work, while having AWB access for programmatic edits on enwiki. Auto-confirmed users (on enwiki, and L2 as defined here) can use Twinkle to do the tasks that are defined for L3 and L4. Rather than restricting access in such a manner, I feel that access to the tool should remain as open as possible (at least similarly to Twinkle), and that any abusive use of the tool on any particular language wiki should be dealt with in the way existing abusive users have been dealt with there so far. E.g. would-be users of such tools are shown a responsibility notice when visiting the tool's documentation page, warning them that they may be blocked for abusive use, and abusive users are blocked through enwiki's existing administrative system. If an individual wiki's community has concerns over new reviewers' competence and trustworthiness, why not offer the ability to have a whitelist and/or blacklist (i.e. like AWB's checkpage)? Robertsky (talk) 11:04, 22 August 2020 (UTC)
  • Support per MrConorAE. P,TO 19104 (talk) 14:52, 22 August 2020 (UTC)
  • Support It looks fine to me. Maybe it will be convenient to reduce the "bars", as proposed. Alexcalamaro (talk) 17:24, 22 August 2020 (UTC)
    • Doubtful One question: to perform P4 it states "if also having rollback permission", so you'll already be in Level 4. So maybe you will have Level 4 by # of contributions but you wouldn't be able to perform P4. Same for P6 and admin: it states "if also having admin permissions", so Level 5 will not be enough to perform P6. Am I right? Alexcalamaro (talk) 18:07, 22 August 2020 (UTC)
  • Oppose Restricting autoconfirmed users from being endorsed seems completely arbitrary. Levels 2 and 3 should be collapsed. --Mathnerd314159 (talk) 17:52, 22 August 2020 (UTC)
  • Much simpler permission structure, from scratch:
    • Blocked users: Highlighting contributions from blocked users is helpful, and blocked users should be blocked from the tool. They are unable to revert or post talk-comments anyway, and they would be likely to make poor quality or even malicious reviews.
      • Recently a partial-block system was introduced. It would probably be difficult to try to deal with this inside the tool, and I expect it would be extremely rare for it to be relevant in the tool. You can probably ignore partial blocks, relying on the wiki to report back any attempt to make a blocked undo or blocked talk edit.
    • Admin and rollback: These should be based on the wiki userrights, period. The tool may display the relevant buttons if the user has the userright.
    • Autoconfirmed: Full use of the tool, including links for undo and talk. Undo and talk are not considered restricted actions (unless a user is actively blocked), and I think the concern is people trying to do these things without having any knowledge or familiarity with the native wiki system. Twinkle is usable by autoconfirmed users. (Edit: The rate limit thing is probably a good idea.)
    • Regarding IP users or not-yet-autoconfirmed users: While they are able to identify vandalism, users who don't know policies and guidelines even exist can't really review whether an edit is appropriate. And as noted above they probably shouldn't be using undo and talk through the tool without familiarity with the native system. I think it's simpler and makes more sense to just activate the entire tool at autoconfirmed. Alsee (talk) 03:16, 24 August 2020 (UTC)
  • Weak oppose Per Ed6767 above, plus some additional comments. What I'd like to see is some global permissioning structure encompassing all anti-vandalism tools. It doesn't make sense for the requirements to be extremely different for each tool. I don't know what it would take to achieve the following but I think it would be great to have a "basic anti-vandal" user right (maybe rollback already satisfies this?) which grants you basic access to all anti-vandalism tools, and then if absolutely necessary, one separate user right per individual tool that gets granted if you somehow demonstrate you've completed a quick training (could be as simple as just watching a 5 or 10 minute tutorial video for the specific tool) or get grandfathered in if you've already used the tool a certain number of times. Paradoxsociety (talk) 17:07, 24 August 2020 (UTC)
  • Comment I received a request for comments on this proposal but I have never heard of DoubleCheck and have no idea (nor can I find) how to use it. I'd appreciate if someone would leave a message on w:UserTalk:Deisenbe with instructions. Thank you. Deisenbe (talk) 10:46, 30 August 2020 (UTC)

Separate Discussion 1: Number of Contributions needed for each Level

If we allow reviewers to gain power without needing to be on a given allowlist, or without endorsements from anyone, what numbers are appropriate to be set as bars? Xinbenlv (talk) 05:43, 22 August 2020 (UTC)

Responded as Special:Diff/20388160 Xinbenlv (talk) 22:00, 22 August 2020 (UTC)
What is even more absurd, even though there is no difference in the functionality, is Level 5. Non-administrator in enwiki, reach X DoubleCheck contributions, and you have P6, but you don't have the user rights to do so. What's the point? Except to endorse a Level 3 "twice" as if you were a Level 4, all I can say is "bruh". Can I Log In (talk) 22:43, 22 August 2020 (UTC)

Questions

  • What is a DoubleCheck judgement?
A judgement is the opinion a reviewer gives on a Wikipedia edit (aka revision). It currently has 3 possible values:
1. ShouldRevert (means Damaging),
2. NotSure (means neutral, or that whether the edit is damaging is not immediately obvious), and
3. LooksGood (means Not damaging; some tools call this Innocent)
  • What is a DoubleCheck endorsement?
If Wikipedia editor A thinks Wikipedia editor B is a trusted editor, A can give B an endorsement on WikiLoop DoubleCheck. When A is a highly trusted editor, B's perceived trust level in WikiLoop DoubleCheck is bumped up as well. This feature has not been implemented yet.
  • What is a DoubleCheck review?
The process of reading a Wikipedia edit and giving a judgement.
  • What is a DoubleCheck contribution?
DoubleCheck contributions currently count only how many judgements a reviewer has given, i.e. how many reviews this reviewer has done. However, in the future we might consider counting other forms of action as contributions too, such as "taking follow-up actions" (issuing warnings, reverting revisions, rolling back multiple revisions) or "reviewing other people's judgements". So we use the more general term "contribution". For now, think of it as counting judgements.
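
The terms above can be sketched as a small data model. This is a minimal illustration in Python, not the tool's actual schema: only the three judgement values come from the answer above, while Review, contribution_count and tally are hypothetical names introduced for the example.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import List


class Judgement(Enum):
    SHOULD_REVERT = "ShouldRevert"  # damaging
    NOT_SURE = "NotSure"            # neutral, or not immediately obvious
    LOOKS_GOOD = "LooksGood"        # not damaging


@dataclass(frozen=True)
class Review:
    """One review: a reviewer reads a revision and gives a judgement."""
    reviewer: str
    revision_id: int
    judgement: Judgement


def contribution_count(reviews: List[Review], reviewer: str) -> int:
    # Currently a contribution is just a judgement, so count this reviewer's reviews.
    return sum(1 for r in reviews if r.reviewer == reviewer)


def tally(reviews: List[Review], revision_id: int) -> Counter:
    # Hypothetical helper: how many of each judgement one revision has received.
    return Counter(r.judgement for r in reviews if r.revision_id == revision_id)
```

If follow-up actions are later counted as contributions too, contribution_count is the only function in this sketch that would need to change.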

Mark D Worthen PsyD (talk) 18:07, 22 August 2020 (UTC)

Thank you @Markworthen: for asking these clarifying questions. They helped me realize these terms were used without explanation. Xinbenlv (talk) 18:26, 22 August 2020 (UTC)

Suggestions

First, I know only one language fluently, therefore please take my suggestions in that light, i.e., I respect others who know two or more languages. Suggestion #1: Ask a colleague who knows your native tongue and English well to proofread the English version for proper grammar, usage, syntax, etc. Only minor improvements are needed, e.g., "P1. URL-to-Undo: display a button taking the reviewer to the Wikipedia page before they manually undo bad the revision" (I am not sure what that sentence means), and "such as 5 or 10, to be discuss in Separate Discussion 1" ("discuss" should be "discussed").

Thank you, I will take your suggestion in the future and ask my native-speaking friends to proofread for me. Some of the difficulty in understanding my writing might also be due to the new terms and concepts I am trying to introduce, and I will try to explain them better. Xinbenlv (talk) 05:11, 24 August 2020 (UTC)

Suggestion #2: Emphasize the either/or nature of the first two columns in the first table, e.g., Either reach Wikipedia User Level Or has # of DoubleCheck contributions. Mark D Worthen PsyD (talk) 18:07, 22 August 2020 (UTC)

Good idea, will do! Xinbenlv (talk) 05:11, 24 August 2020 (UTC)

It's important for the tool to prominently display whether there are unread Notifications or unread Talk messages for the user. If the user's edits are getting reverted, or if someone is leaving messages on their user talk, it's important for the user to be aware so they can defuse any potential conflict. If the user is unaware and they keep making more edits the situation may escalate badly, possibly resulting in anger between editors or even resulting in a block. Alsee (talk) 03:47, 24 August 2020 (UTC)

Thank you, that's a very good idea. I have filed it as a feature request, tracked as issue #350. We plan on adding these features. Please stay tuned! Xinbenlv (talk) 05:11, 24 August 2020 (UTC)

Request for Comments (Second version after your input)

Hi all, thank you for your kind discussion in the first RfC version. We heard many of you ask us to simplify the user level structure and to avoid abuse while giving all users convenient ways to discover problematic edits.

We discarded our original complex model and created a new simplified model as follows:

  1. Basic Power: all users will have DoubleCheck's convenience features to discover vandalism and share their opinions, but will not have FAST revert ability.
  2. Super Power: Admins and rollbackers will have FAST revert ability, similar to Huggle and STiki.
  3. Endorsements: Admins, rollbackers and other endorsed users can endorse a user, which grants that user Super Power. An endorsement can be revoked should there be abuse of power.

Note: FAST revert ability means reverting edits without a strict rate limit (same as Huggle).
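
As a rough illustration of the two tiers, the following Python sketch throttles Basic Power users and lets Super Power users through. It is a sketch under stated assumptions, not the tool's implementation: RevertLimiter and may_revert are hypothetical names, and the 5-minute interval is only the figure floated for feedback in this RfC, not a decided value.

```python
from typing import Dict

# ASSUMPTION: "1 revert per 5 minutes" is the value put up for feedback in this RfC.
BASIC_MIN_INTERVAL_S = 5 * 60


class RevertLimiter:
    """Tracks the last allowed revert per user and enforces a minimum interval."""

    def __init__(self, min_interval_s: float = BASIC_MIN_INTERVAL_S) -> None:
        self.min_interval_s = min_interval_s
        self._last_revert: Dict[str, float] = {}

    def may_revert(self, user: str, has_super_power: bool, now: float) -> bool:
        # Super Power (admins, rollbackers, endorsed users): no strict rate limit.
        if has_super_power:
            return True
        last = self._last_revert.get(user)
        if last is not None and now - last < self.min_interval_s:
            return False  # Basic Power user is still inside the cool-down window
        self._last_revert[user] = now
        return True
```

Passing the timestamp in as `now` (in seconds) keeps the sketch easy to test; a caller would typically supply time.monotonic().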

Feedback Wanted

We look forward to your feedback or questions; please comment below. Thank you!

1. Does the simplified power tier model above look good to you?

  • I don't understand why a person just wouldn't get rollback. That's a defined process with onwiki transparency and can also be revoked by anyone, rather than having to resort to say a block if there's a problem. Best, Barkeep49 (talk) 20:22, 19 November 2020 (UTC)

+1 to Barkeep's point, Sadads (talk) 01:09, 20 November 2020 (UTC)

Thank you everyone for voicing your support and also some of your concerns.
For everyone who has concerns about Option 3, here is my thinking: we want to provide Option 3 because we want to make this process as easy as possible. And Super Power here ONLY means reverting faster than a slow limit. As you can see below, the majority think 1 revert per 5 min would be too slow. But if we used something like 1 revert per 1 min, I anticipate many would think it too fast. Therefore, to strike a balance, we say that while we want everyone to be able to revert fast, we need to assess the trustworthiness of a user in some simple way.
Some of you asked, why doesn't the reviewer just get ROLLBACK permission? Let me express my concern about requiring ROLLBACK permission: ROLLBACK is actually a very, very high bar, and once obtained, it also gives the user a lot of power globally. For example, en:Wikipedia:Rollback#Requesting_rollback_rights suggests "it's very rare for someone to get ROLLBACK permission with less than 200 main space edits. Also one have to demonstrate being able to appropriately warn user with bad edits." Therefore, ROLLBACK really isn't for everyone.
Now, if you were an admin, would you want to give someone ROLLBACK permission just because they have demonstrated some sense of responsibility by doing a few dozen or a few hundred reviews? I assume not; you would follow the requirements in en:WP:ROLLBACK and require the higher bar to be demonstrated.
In practice, not many people have ROLLBACK permission. I counted on EN WIKI: in the past 12 months only around 36 users were granted ROLLBACK. Requiring ROLLBACK permission has caused the tool Huggle to maintain a whitelist of users who didn't get ROLLBACK but were approved by Huggle's developers/maintainers. STiki is the same. Essentially that approach bypasses the ROLLBACK requirement, but just makes the tool developers a smaller group of people who decide who is on the whitelist. Well, why don't Huggle and STiki avoid using such a per-app whitelist? It's because ROLLBACK is too high a bar and too few people currently have it; there are far more people able to vandalise Wikipedia than people who can review it.
And how do other popular tools solve it? en:WP:Twinkle and en:WP:RedWarn end up creating something that bypasses ROLLBACK entirely for people who don't have ROLLBACK permission.
However, I think it's not very hard for a human to spot many common problematic edits with just common sense. This tool, WikiLoop DoubleCheck, is trying to be easy for everyone to use. It is based on an assumption just like Wikipedia's: everyone can review, and should by default be able to review, unless abuse is proven. It also wants to allow cross-checking, meaning that for each revision, multiple opinions can be given by different people. Requiring ROLLBACK permission to revert fast will either limit the number of reviewers who are able to use it, or push reviewers to argue for a faster revert rate limit, which would also apply to anonymous users and might be abused by blocked or malicious users.
Also, giving users a per-app privilege is a good way to give them a little incentive to contribute, and also a longer runway to demonstrate that they are indeed responsible and competent to conduct reviews and warnings appropriately, because we would otherwise be granting them a privilege that works across the entire EN Wikipedia and many tools.
In summary, that's why we propose Option 3. To make it easier to understand the tradeoff we are making here, these are the alternatives to Option 3:
3-A. Have a whitelist of users with a non-throttled rate limit, but controlled only by developers of WikiLoop DoubleCheck, such as me, just like Huggle and STiki (I don't personally want that dictatorship though).
3-B. Remove such a whitelist, making fast revert available only to a very small group (last time I checked there were about O(10K), i.e. a few tens of thousands of ROLLBACKers, with only O(100), i.e. a few hundred, active ROLLBACKers every month, facing about 30K vandalisms / damaging edits per month).
3-C. Use "Extended Confirm/Confirm": but it can easily be gamed by editing and self-reverting, and it also has nothing to do with review competency.
3-D. Create some on-wiki lower level of trust: this is not likely to happen without a database change or a MediaWiki software change.
3-O. (the original Option 3) Keep such whitelists, but allow Admins and ROLLBACKers to grant a lower level of trust before the full ROLLBACK is granted. Xinbenlv (talk) 23:22, 6 December 2020 (UTC)
With these explanations, I hope I have made my concern clearer. What do you think? Please let me know which of the Option 3 alternatives, or the original, you prefer.
Thank you! Xinbenlv (talk) 23:22, 6 December 2020 (UTC)
If the tool developers plan to give access to non-rollbackers as well, then 3-A, or, if the tool developers don't want to handle such a list, 3-C, would be great options. But putting this burden on admins/rollbackers (3-O) seems illogical: if an admin is actually reviewing someone and feels that the user can review recent changes, then they can easily grant rollback, and in the case of rollbackers, I don't think they should have the power to give other users rollback-like access.
About the rate limit, I think it would be great if the tool used the rate limits that apply on the Wikipedia site. --1997kB (talk) 14:41, 10 December 2020 (UTC)

2. For non-FAST revert ability, would 1 revert per 5 minutes be too fast or too slow?

Thank you all, please see above my argument about Option 3, and vote for your preference. Thank you! Xinbenlv (talk) 23:22, 6 December 2020 (UTC)