IP Editing: Privacy Enhancement and Abuse Mitigation/Research and tools/be-tarask

Data on Portuguese Wikipedia disabling IP edits

Portuguese Wikipedia’s metrics following restriction

30 August 2021 Update

Hello. This is a brief update about Portuguese Wikipedia’s metrics since they started requiring registration to edit. We have a comprehensive report on the Impact report page. This report includes metrics captured through data as well as a survey that was conducted among active Portuguese Wikipedia contributors.

All in all, the report presents the change in a positive light. We have not seen any significant disruption over the time period these metrics have been captured. In light of this, we are now encouraged to run an experiment on two more projects to see if we observe similar impact. All projects are unique in their own ways and what holds true for Portuguese Wikipedia might not hold true for another project. We want to run a limited-time experiment on two projects where registration will be required in order to edit. We estimate that it will take approximately 8 months for us to collect enough data to see significant changes. After that time period, we will return to not requiring registration to edit while we analyse the data. Once the data is published, the community will be able to decide for themselves whether or not they want to continue to disallow unregistered editing on the project.

We are calling this the Login Required Experiment. You will find more detail as well as a timeline on that page. Please use that page and its talk page to discuss this further.

Portuguese Wikipedia IP editing restriction

Portuguese Wikipedia banned unregistered editors from making edits to the project last year. Over the last few months, our team has been collecting data about the repercussions of this move on the general health of the project. We have also talked to several community members about their experience. We are finishing the work of compiling the data so that it presents an accurate picture of the state of the project. We hope to have an update on this in the near future.

Tools

Tool development

As you might already know, we are working on building some new tools, partly to soften the effect of introducing temporary accounts, but also just to build better anti-vandalism tools for everyone. It is no secret that the current moderation tools on our projects are not the tools our communities deserve. There is a lot of scope for improvement. We want to build tools that make it easier for anti-vandalism fighters to work effectively. We also want to reduce the barrier to entry into these roles for non-technical contributors.

We have talked about ideas for these tools before and I will provide a brief update on these below. Note that progress on these tools has been slow in the last few months as our team is working on overhauling SecurePoll to meet the needs of the upcoming WMF Board elections.

IP Info feature

We are building a tool that will display important information about an IP address which is commonly sought in investigations. Typically patrollers, admins and checkusers rely on external websites to provide this information. We hope to make this process easier for them by integrating information from reliable IP-vendors within our websites. We recently built a prototype and conducted a round of user testing to validate our approach. We found that a majority of the editors in the interview set found the tool helpful and indicated they would like to use it in the future. There is an update on the project page that I would like to draw your attention to.

Key questions on which we would like your feedback on the project talk page:

When investigating an IP what kinds of information do you look for? Which page are you likely on when looking for this information?
What kinds of IP information do you find most useful?
What kinds of IP information, when shared, do you think could put our anonymous editors at risk?

Editor matching feature

This project has also been referred to as "Nearby editors" and "Sockpuppet detection" in earlier conversations. We are trying to find a suitable name for it that is understandable even to people who don't understand the word sockpuppetry.

We are in the early stages of this project. Wikimedia Foundation Research has a project that could assist in detecting when two editors exhibit similar editing behaviors. This will help connect different unregistered editors when they edit under different auto-generated account usernames. We heard a lot of support for this project when we started talking about it a year ago. We also heard about the risks of developing such a feature. We are planning to build a prototype in the near term and share it with the community. There is a sparse project page for this project. We hope to have an update for it soon. Your thoughts on this project are very welcome on the project talk page.

As mentioned previously, our foremost goal is to provide better anti-vandalism tools for our communities, which will give our vandal fighters a better moderation experience while also making the IP address string less valuable to them. Another important reason to do this is that IP addresses are hard to understand and are really useful only to tech-savvy users. This creates a barrier for new users without a technical background who want to enter functionary roles, as working with IP addresses has a steeper learning curve for them. We hope to get to a place where we can have moderation tools that anyone can use without much prior knowledge.

The first thing we decided to focus on was to make the CheckUser tool more flexible, powerful and easy to use. It is an important tool that services the need to detect and block bad actors (especially long-term abusers) on a lot of our projects. The CheckUser tool was not very well maintained for many years and as a result it appeared quite dated and lacked necessary features.

We also anticipated an uptick in the number of users who opt in to the CheckUser role on our projects once temporary accounts are introduced. This reinforced the need for a better, easier CheckUser experience for our users. With that in mind, the Anti-Harassment Tools team spent the past year working on improving the CheckUser tool, making it much more efficient and user-friendly. This work has also taken into account a lot of outstanding feature requests by the community. We have continually consulted with CheckUsers and stewards over the course of this project and have tried our best to deliver on their expectations. The new feature is set to go live on all projects in October 2020.

The next feature that we are working on is IP info. We decided on this project after a round of consultation on six wikis which helped us narrow down the use cases for IP addresses on our projects. It became apparent early on that there are some critical pieces of information that IP addresses provide which need to be made available for patrollers to be able to do their roles effectively. The goal for IP Info, thus, is to quickly and easily surface significant information about an IP address. IP addresses provide important information such as location, organization, possibility of being a Tor/VPN node, rDNS, listed range, to mention a few examples. By being able to show this quickly and easily, without the need for external tools that not everyone can use, we hope to make it easier for patrollers to do their job. The information provided is high-level enough that we can show it without endangering the anonymous user. At the same time, it is enough information for patrollers to be able to make quality judgements about an IP address.

After IP Info we will be focusing on a feature for finding similar editors. We’ll be using a machine learning model, built in collaboration with CheckUsers and trained on historical CheckUser data, to compare user behavior and flag when two or more users appear to be behaving very similarly. The model will take into account which pages users are active on, their writing styles, their editing times and similar signals to make predictions about how similar two users are. We are doing our due diligence in making sure the model is as accurate as possible.
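To make the idea of comparing behavioural signals more concrete, here is a minimal sketch in Python. It is not the model described above, which is a trained machine learning model. It is a hand-written illustration that assumes only two of the mentioned signals (pages edited and time of day of edits) and leaves out writing style entirely. All function names, the input format of `(page_title, hour_of_day)` pairs, and the `page_weight` parameter are hypothetical.

```python
from collections import Counter
from math import sqrt


def jaccard(a: set, b: set) -> float:
    """Overlap between two sets of page titles (0.0 = disjoint, 1.0 = identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def hour_histogram(hours) -> list:
    """Count edits per hour of day (0-23)."""
    counts = Counter(hours)
    return [counts.get(h, 0) for h in range(24)]


def cosine(u, v) -> float:
    """Cosine similarity between two equal-length count vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = sqrt(sum(x * x for x in u))
    norm_v = sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity(edits_a, edits_b, page_weight: float = 0.5) -> float:
    """Blend page overlap and editing-time overlap into one score in [0, 1].

    Each edits_* argument is a list of (page_title, hour_of_day) pairs.
    The weighting is an illustrative assumption, not a tuned value.
    """
    pages_a = {page for page, _ in edits_a}
    pages_b = {page for page, _ in edits_b}
    hours_a = hour_histogram(hour for _, hour in edits_a)
    hours_b = hour_histogram(hour for _, hour in edits_b)
    return (page_weight * jaccard(pages_a, pages_b)
            + (1 - page_weight) * cosine(hours_a, hours_b))
```

A score near 1 means the two editors touched the same pages at the same times of day, and a score near 0 means no overlap on either signal. A real system would need many more signals and careful evaluation before any score is shown to a CheckUser.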

Once it’s ready, there is a lot of scope for what such a model can do. As a first step we will be launching it to help CheckUsers detect socks easily without having to perform a lot of manual labor. In the future, we can think about how we can expose this tool to more people and apply it to detect malicious sockpuppeting rings and disinformation campaigns.

You can read more and leave comments on our project page for tools.

Research

IP masking impact report

IP addresses are valuable as a semi-reliable partial identifier that the associated user cannot easily change. Depending on the provider and device configuration, IP address information is not always accurate or precise, and deep technical knowledge and fluency are needed to make the best use of it, although administrators are not currently required to have such knowledge to obtain their status. This technical information is used where possible to support additional information (referred to as "behavioural knowledge"), and the information taken from IP addresses significantly affects the actions administrators take.

On the social side, the question of whether to allow unregistered users to edit has been the subject of extensive debate. So far, opinion has leaned towards allowing unregistered users to edit. The debate has mostly centred on the tension between the desire to prevent vandalism and the desire to preserve pseudo-anonymous editing and keep the barrier to editing low. There is a certain bias against unregistered users because of their association with vandalism, which is also built into the algorithms of tools such as ORES. In addition, there are major problems with communicating with unregistered users, because they cannot receive notifications and there is no guarantee that a message left on an IP address talk page will be read by the same person.

If IP masking is introduced, it will significantly affect administrator workflows and will considerably increase the burden on CheckUsers in the short term. Once IP addresses are masked, we should expect our administrators to become much less effective at fighting vandalism. This can be mitigated by providing equivalent or greater functionality, but we should expect reduced administrator efficacy during the transitional period. To give our administrators proper tool support for their work, we must take care to preserve or provide alternatives to the following functions, which are currently fulfilled by IP information:

Block efficacy and collateral estimation
Some way of surfacing similarities or patterns among unregistered users, such as geographic similarity, certain institutions (e.g. if edits are coming from a high school or university)
The ability to target specific groups of unregistered users, such as vandals jumping IPs within a specific range
Location or institution-specific actions (not necessarily blocks); for example, the ability to determine if edits are made from an open proxy, or public location like a school or public library.

Depending on how we handle temporary accounts or identifiers for unregistered users, we will be able to improve communication with unregistered users. The underlying discussions and concerns about unregistered editing, anonymous vandalism and bias against unregistered users are unlikely to change significantly if we mask IPs, because the ability to edit projects while logged out will remain.

CheckUser workflow

We interviewed CheckUsers on multiple projects throughout our process for designing the new Special:Investigate tool. Based on interviews and walkthroughs of real-life cases, we broke down the general CheckUser workflow into five sections:

Triaging: assessing cases for feasibility and complexity.
Profiling: creating a pattern of behaviour which will identify the user behind multiple accounts.
Checking: examining IPs and useragents using the CheckUser tool.
Judgement: matching this technical information against the behavioural information established in the Profiling step, in order to make a final decision about what kind of administrative action to take.
Closing: reporting the outcome of the investigation on public and private platforms where necessary, and appropriately archiving information for future use.

We also worked with staff from Trust and Safety to get a sense for how the CheckUser tool factors into Wikimedia Foundation investigations and cases that are escalated to T&S.

The most common and obvious pain points all revolved around the CheckUser tool's unintuitive information presentation, and the need to open up every single link in a new tab. This caused massive confusion as tab proliferation quickly got out of hand. To make matters worse, the information that CheckUser surfaces is highly technical and not easy to understand at first glance, making the tabs difficult to track. All of our interviewees said that they resorted to separate software or physical pen and paper in order to keep track of information.

We also ran some basic analyses of English Wikipedia's Sockpuppet Investigations page to get some baseline metrics on how many cases they process, how many are rejected, and how many sockpuppets a given report contains.

Patroller use of IP addresses

Previous research on patrolling on our projects has generally focused on the workload or workflow of patrollers. Most recently, the Patrolling on Wikipedia study focused on the workflows of patrollers and on identifying potential threats to current anti-vandal practices. Older studies, such as the New Page Patrol survey and the Patroller work load study, focused on English Wikipedia. They also looked solely at the workload of patrollers, and more specifically at how bot patrolling tools have affected patroller workloads.

Our study tried to recruit from five target wikis:

Japanese Wikipedia
Dutch Wikipedia
German Wikipedia
Chinese Wikipedia
English Wikiquote

They were selected for known attitudes towards IP edits, percentage of monthly edits made by IPs, and any other unique or unusual circumstances faced by IP editors (namely, use of the Pending Changes feature and widespread use of proxies). Participants were recruited via open calls on Village Pumps or the local equivalent. Where possible, we also posted on Wiki Embassy pages. Unfortunately, while we had interpretation support for the interviews themselves, we did not extend translation support to the messages, which may have accounted for low response rates. All interviews were conducted via Zoom, with a note-taker in attendance.

Supporting the findings from previous studies, we did not find a systematic or unified use of IP information. Additionally, this information was only sought out after a certain threshold of suspicion. Most further investigation of suspicious user activity begins with publicly available on-wiki information, such as checking previous local edits, Global Contributions, or looking for previous bans.

Precision and accuracy were less important qualities for IP information: upon seeing that one chosen IP information site returned three different results for the geographical location of the same IP address, one of our interviewees mentioned that precision in location was not as important as consistency. That is to say, so long as an IP address was consistently exposed as being from one country, it mattered less if it was correct or precise. This fits with our understanding of how IP address information is used: as a semi-unique piece of information associated with a single device or person, that is relatively hard to spoof for the average person. The accuracy or precision of the information attached to the user is less important than the fact that it is attached and difficult to change.

Our findings highlight a few key design aspects for the IP info tool:

Provide at-a-glance conclusions over raw data
Cover key aspects of IP information:
- Geolocation (to a city or district level where possible)
- Registered organization
- Connection type (high-traffic, such as data center or mobile network versus low-traffic, such as residential broadband)
- Proxy status as binary yes or no
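
As an illustration of these design aspects, the following Python sketch shows one possible shape for the summary such a tool could present. It is a hypothetical data structure, not the IP Info implementation: the class name, field names and the one-line format are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IPInfoSummary:
    """Hypothetical at-a-glance summary of the key aspects listed above."""
    country: Optional[str]          # geolocation, coarse level
    city: Optional[str]             # geolocation, city or district where known
    organization: Optional[str]     # registered organization
    connection_type: Optional[str]  # e.g. "data center", "mobile", "residential"
    is_proxy: bool                  # proxy status as a binary yes or no

    def at_a_glance(self) -> str:
        """Render a one-line conclusion instead of raw vendor data."""
        location = ", ".join(
            part for part in (self.city, self.country) if part
        ) or "unknown location"
        org = self.organization or "unknown organization"
        conn = self.connection_type or "unknown connection type"
        proxy = "proxy: yes" if self.is_proxy else "proxy: no"
        return f"{location} | {org} | {conn} | {proxy}"
```

Missing values are shown explicitly as unknown rather than hidden, so that a patroller can see where the underlying IP information is incomplete.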

As an ethical point, it will be important to be able to explain how any conclusions are reached, and to convey the inaccuracy or imprecision inherent in pulling IP information. While this was not a major concern for the patrollers we talked to, if we are to create a tool that will be used to provide justifications for administrative action, we should be careful to make it clear what the limitations of our tools are.

––
Best regards,
Trust and Safety Product

Please use the project talk page for discussions on the matter. For any issues concerning this release, please don't hesitate to leave a message on the project talk page or contact Szymon Grabarczuk.