User:RAdimer-WMF/Proxy blocks

From Meta, a Wikimedia project coordination wiki

This page is the collation of three Diff posts published in December 2022.

My goal with this series of blog posts was to develop a relatively concise and accessible guide to the proxy blocks problem. Any feedback is certainly welcome.

What’s happening with proxy blocks?

This post is the first part of a three-part series on proxy blocks, covering an introduction and history.

Open proxies – shared IP addresses accessible anywhere through free or paid services – have been blocked on Wikimedia projects for over 16 years.

Since its inception the policy has remained unchanged, with growing costs to accessibility and equity that so far have not seen widespread discussion.

“No open proxies”

In early 2004, the question of anonymizing proxies was discussed on the English Wikipedia’s mailing list, with few objections to prohibiting their usage.

Arguments in favor noted that proxies were often used by vandals to evade blocks placed on their previous IP addresses and accounts. (On Wikimedia projects, accounts and IP addresses (or ranges of IP addresses) can be blocked to prevent editing from them; this is the primary anti-abuse tool that administrators have.)

Arguments against the proposal pointed to the need for users subject to censorship, specifically China’s “Great Firewall”, to use proxies to access the internet safely.

A consensus to block editing from open proxies developed among the dozen or so participants on the mailing list thread. This was enacted on the English Wikipedia, and in 2006 it was established as a global policy, enshrined on Meta-Wiki’s “No open proxies” page.

In the 18 years since that first discussion, the dynamics of open proxies and the problems associated with blocking them have changed significantly.

What constitutes a proxy?

For the purpose of Wikimedia blocks, a proxy is generally anything that allows a user to connect to Wikimedia projects through an IP address other than that assigned to them by their Internet Service Provider (ISP). Instead of a usual residential or cellular IP address, which has some degree of consistency and is often based on location, Wikimedia servers see the IP address of the proxy service.

Today, various types of proxy connections, and connections often affected by proxy blocks, exist:

  • Traditional Virtual Private Networks (VPNs)
    • Route traffic through the VPN’s server
  • Private relays, such as iCloud Private Relay or in-browser VPNs (Edge, Opera)
    • These function similarly to traditional VPNs, but are far more accessible and easier for customers to enable, especially as some services lean towards enabling them by default
  • Peer to peer or residential proxies (p2p)
    • Users route traffic through other users’ residential internet connections
  • IP switching tools
    • Change the user’s IP address regularly
  • Carrier-grade Network Address Translation (CGNAT)*
    • Common in areas with older internet infrastructure, where many users share the same IPv4 address
  • T-Mobile-style IP assignment*
    • When a user resets their connection, they are assigned an IP address on an entirely different IP range

*The latter two are not technically proxies, but are often caught up in proxy-type blocks.

T-Mobile IPs present a problem very similar to proxies, where any customer can switch their cellular connection off and back on again, and be assigned a new IP on a different range. This allows users to easily evade a block placed on a previous IP address. All T-Mobile ranges are blocked for this reason.

CGNAT IPs suffer from an opposite issue; instead of one person having access to hundreds of IP ranges, hundreds of people have access to only one IP address. Because there are so many people on very few IP addresses, if one person vandalizes or connects using a peer-to-peer proxy, this can result in the IPs being blocked, affecting everyone who was connecting through that IP.

These types of connections vary in ease of identification, from traditional VPNs almost always being evident in IP lookups, to peer-to-peer proxies being difficult to distinguish from normal residential connections.

Why are proxies blocked?

The predominant reasons for blocking proxies have not changed since the initial discussion in 2004: their anonymizing effect prevents IP-based anti-abuse systems from working.

When someone connects to a proxy, they funnel their internet traffic through the servers of that proxy service. This means that Wikimedia servers see the IP address of that proxy service rather than the IP address assigned to them by their ISP.

Most proxy services offer their customers dozens or hundreds of available connections, often in different countries (to access geo-blocked content). In contrast, most residential ISPs assign single IPs or relatively narrow ranges to their customers.

For example, my ISP assigns me a single IPv6 /64 subnet, which has been static for roughly 4 months. Searching it through Bullseye, a community-developed IP lookup tool, returns my approximate location, my ISP, and my range. Were I to turn on a VPN, I could connect to servers in any region or country; and if one IP were blocked, I could switch to another. This makes static identification, and thus effective anti-abuse blocks, next to impossible for people using proxies.
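
To make the idea of a range block concrete, here is a minimal sketch using Python's standard `ipaddress` module. The block list is hypothetical and uses only addresses from ranges reserved for documentation; it illustrates range matching in general and is not how MediaWiki stores or evaluates blocks.

```python
import ipaddress

# Hypothetical block list (assumption): reserved documentation ranges only.
BLOCKED_RANGES = [
    ipaddress.ip_network("2001:db8:1234:5678::/64"),  # one residential-style /64
    ipaddress.ip_network("203.0.113.0/24"),           # one VPN-style IPv4 range
]


def is_blocked(address: str) -> bool:
    """Return True if the address falls inside any blocked range."""
    ip = ipaddress.ip_address(address)
    # An IPv4 address is never "in" an IPv6 network (and vice versa),
    # so mixed-version comparisons evaluate to False.
    return any(ip in network for network in BLOCKED_RANGES)
```

In this sketch, every address in the blocked /64 matches the block, which works well when a subscriber keeps the same subnet for months. A proxy user who switches to an exit address in a different range (for example `198.51.100.7`) no longer matches anything on the list.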

Bad actors can use and rotate between open proxies to prevent identification of sockpuppets, evade IP address and range blocks, avoid scrutiny of contentious edits, or harass good faith editors. Where some websites may rely on cookies and other methods of tracking, MediaWiki anti-abuse tools nearly exclusively rely on IP addresses. Blocking open proxies is a necessity for functioning anti-abuse mechanisms on Wikimedia projects.

Though standard VPNs/proxies can be used to evade anti-abuse mechanisms, it is easy to identify when someone is using one, and to prevent editing from those IP ranges. Peer-to-peer proxies, however, evade this identification by funneling a user’s traffic through another user’s connection. This means that someone using a proxy in one country can appear to be connecting from a residential IP in another country. The next post in this series discusses the effect of peer-to-peer proxies, and responses to them.

Proxy blocks: Automation and scope

This post is the second part of a three-part series on proxy blocks, covering peer-to-peer proxies and responses.

Open proxies – shared IP addresses accessible anywhere through free or paid services – have been blocked on Wikimedia projects for over 16 years, due to their usage in bypassing IP-based anti-abuse mechanisms.

Throughout that time, however, the landscape of open proxies and the types of proxies available have changed. Where traditional VPNs are easily identifiable as such, the rise of peer-to-peer proxy networks has allowed bad-faith actors to avoid identification more easily by posing as residential users, which in turn significantly increases the collateral costs of blocks.

Peer-to-peer proxies

Peer-to-peer proxies expand the scope of proxy blocks significantly. Where VPN IPs are generally exclusively used by people who intentionally connected to that proxy service, peer-to-peer connections can turn any IP into a proxy.

Various services exist to facilitate peer-to-peer proxies, creating a network where internet traffic of one user is funneled through the internet connection of another. Bad actors can use these peer-to-peer proxy networks to access residential IP addresses in any part of the world.

Unfortunately, the nature of peer-to-peer proxies means that collateral effects, i.e. users affected by proxy blocks who are not using proxies, are significantly more prevalent. This is because blocks on these IPs or ranges affect everyone connecting on those IPs, not just the computer being used as a proxy. There are also instances where peer-to-peer proxies may be running on a device without the knowledge of the device’s user.

This issue of collateral effects is particularly problematic with carrier-grade NAT (CGNAT) configurations, where many users connect through a single IP address. If any one of those users is using a peer-to-peer proxy, and the IP is blocked, no one using that IP can edit.
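
The collateral effect described above can be sketched in a few lines of Python. The subscriber names and the shared address are invented for illustration (the address comes from a range reserved for documentation), and the function name `collateral_users` is hypothetical.

```python
# Hypothetical CGNAT scenario (assumption): four subscribers share one public IPv4 address,
# and a fifth subscriber sits behind a different address.
SUBSCRIBERS = {
    "user_a": "198.51.100.10",
    "user_b": "198.51.100.10",
    "user_c": "198.51.100.10",
    "user_d": "198.51.100.10",
    "user_e": "198.51.100.11",
}


def collateral_users(subscribers, blocked_ip, proxy_users):
    """Return users who lose editing access through blocked_ip without running a proxy."""
    return sorted(
        user
        for user, ip in subscribers.items()
        if ip == blocked_ip and user not in proxy_users
    )


# Only user_a runs a peer-to-peer proxy, but the block lands on the shared address.
affected = collateral_users(SUBSCRIBERS, "198.51.100.10", {"user_a"})
```

Here `affected` contains `user_b`, `user_c`, and `user_d`: three people who never used a proxy are blocked because of one device on the shared address. On a real CGNAT deployment the shared pool may cover hundreds of subscribers.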

Increases in blocks

Before 2020, blocks were generally placed on proxy IP ranges individually, when found by administrators and CheckUsers. As peer-to-peer proxies became more available, the English Wikipedia saw significant abuse from these proxies, leading to the automation of their identification and blocking.

ST47ProxyBot, run by an English Wikipedia administrator/checkuser and active only on that project, was approved in April 2020, and started running shortly after. Its mandate is to automatically identify and block open proxies; at first, this did not significantly change the existing trend in block numbers. In August 2021, its scope expanded to include peer-to-peer proxies, and the English Wikipedia’s monthly proxy blocks spiked from 78,000 in July 2021 to over 440,000 in August. This change was described in more depth on the administrators’ noticeboard.

English Wikipedia proxy blocks placed, monthly

Given the highly dynamic nature of peer-to-peer proxies, the bot starts with a short block duration and extends the block as needed; many of the blocks counted above are re-blocks of the same individual IP addresses. These are also generally hard-blocks, meaning that they apply to both logged-out and logged-in users.

Around the same time, Stewards set up a script to mirror many of ST47ProxyBot’s English Wikipedia proxy blocks as global blocks, which apply on all Wikimedia projects. This resulted in a spike of similar magnitude, with global proxy blocks increasing from 33,000 in July 2021 to 415,000 in August.

Global proxy blocks placed, monthly

Despite the English Wikipedia’s proxy-blocking bot having made over 5.7 million blocks since its inception in March 2020, only roughly 165,000 are currently active. These make up 97% of all active English Wikipedia proxy blocks, many or most of which are on normal VPNs.

And despite over 2.8 million global proxy blocks being made in the last 2 years, only roughly 29,000 are active at the time of writing. Stewards ceased mirroring ST47ProxyBot’s blocks in March 2022, after concerns about the collateral effects were raised on Meta-Wiki.
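
The figures quoted in this section can be put in proportion with some quick arithmetic. The sketch below uses only the numbers stated above; because several of them are given as "over" or "roughly", the results are approximations rather than exact values.

```python
def growth_factor(before: int, after: int) -> float:
    """How many times larger the later monthly count is than the earlier one."""
    return after / before


def active_share(active: int, total: int) -> float:
    """Fraction of all blocks ever placed that are still active."""
    return active / total


# July 2021 -> August 2021 monthly proxy blocks.
enwiki_spike = growth_factor(78_000, 440_000)      # roughly 5.6 times
global_spike = growth_factor(33_000, 415_000)      # roughly 12.6 times

# Currently active blocks as a share of all blocks placed.
enwiki_active = active_share(165_000, 5_700_000)   # roughly 2.9%
global_active = active_share(29_000, 2_800_000)    # roughly 1.0%
```

In other words, only a few percent of the blocks ever placed are active at any one time, which is consistent with short block durations and frequent re-blocks of the same addresses.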

It is important to note that many of these peer-to-peer proxy blocks are of CGNAT IPs, where a single IP address may be used by hundreds of users. Just one person using that IP as a peer-to-peer proxy can result in that IP being blocked. And as these blocks are hard-blocks, they prevent both unregistered users and users with accounts from editing. The next post in this series discusses the collateral effects of these blocks, and their implications for equity and growth.

IP block exemption process

For length reasons, this section does not appear in the published Diff post.

People affected by hard-blocks, or people who need to edit using a VPN, must request an IP block exemption (IPBE) in order to edit. The process to request IPBE differs by project, and by whether the user currently has an account.

At a basic level, people who intend to edit multiple projects either through a VPN or on an IP/range that is hard-blocked globally will need to request global IP block exemption, the process for which is outlined interactively in this Stewards Wizard. Multiple venues exist: a Meta-Wiki requests page, a UTRS form, and the Stewards email queue.

Note that global IPBE exempts users only from global blocks, whereas local IPBE exempts users from both global and local blocks on the project where they have that exemption. Because some projects make their own local proxy blocks or mirror those of other projects, multiple IP block exemptions (global and local) often need to be requested. The process to do so varies by project.
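
The interaction between the two kinds of exemption described above can be written out as a small decision function. This is a simplified sketch for a logged-in user on a single project, considering only IP hard-blocks; the class and field names are hypothetical and do not correspond to MediaWiki's actual permission checks.

```python
from dataclasses import dataclass


@dataclass
class EditContext:
    """Hypothetical summary of one logged-in user's situation on one project."""
    globally_hard_blocked: bool  # the user's IP is under a global hard-block
    locally_hard_blocked: bool   # the user's IP is under a local hard-block
    has_global_ipbe: bool
    has_local_ipbe: bool


def can_edit(ctx: EditContext) -> bool:
    """Apply the exemption rules as described in the text above."""
    # Local IPBE exempts from both global and local blocks on this project.
    if ctx.has_local_ipbe:
        return True
    # Without local IPBE, a local block always applies; global IPBE does not help.
    if ctx.locally_hard_blocked:
        return False
    # Global IPBE exempts from global blocks only.
    if ctx.globally_hard_blocked:
        return ctx.has_global_ipbe
    return True
```

For example, a user with global IPBE whose IP is blocked both globally and locally still cannot edit on that project, which is why a second, local request is often needed.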

Additionally, the process changes depending on whether the user has an account. If not, and they are prevented from creating one by the IP block, they can request one through the English Wikipedia’s request an account process, or from Stewards by email. Other projects have their own processes for this as well.

There have been efforts to provide a single point of contact for users facing these issues, such as the Stewards Wizard linked above, but the system remains confusing and difficult to navigate for new contributors.

Proxy blocks: Equity and growth

This post is the third part of a three-part series on proxy blocks, covering equity and growth effects.

Open proxies – shared IP addresses accessible anywhere through free or paid services – have been blocked on Wikimedia projects for over 16 years, due to their usage in bypassing anti-abuse mechanisms.

Throughout this time, the increasing prevalence of peer-to-peer proxy networks – where users funnel traffic through other users’ residential internet connections – has created problems in differentiating proxy users from non-proxy users. Blocks of these peer-to-peer IP addresses collaterally affect innocent, residential users connecting through these IP addresses.

Rising concerns

In April 2022, the Meta-Wiki page “Talk:No open proxies/Unfair blocking” was created, outlining concerns with proxy blocks and starting a discussion on the equity effects of blocking peer-to-peer proxies.

Many contributors to the discussion are from Africa, where CGNAT internet configurations, and thus peer-to-peer proxy blocks, are very common. This especially affects Ghana and Nigeria, where a significant majority of ISPs use CGNAT, and few IP addresses are left unblocked.

Contributors to the page described their experience: the pervasiveness of these blocks affecting members of their communities, confusing processes to request IP block exemption, difficulties running edit-a-thons and recruiting new editors, and overall growing concerns with the additional barriers this has imposed on editing the encyclopedia that “anyone can edit”.

Minimal data is available either on abuse from peer-to-peer connections, or on the collateral effects of peer-to-peer and specifically CGNAT blocks, and it’s thus very difficult to quantify the net result of these actions. Additionally, policy makes no distinction between peer-to-peer proxies and open proxies; in 2004 this was not even a consideration, as proxies were relatively easily identifiable, whereas today entire countries’ residential IPs are blocked because a probably very small minority of their populations use peer-to-peer proxy services.

In an age where any IP address can be a proxy, and more and more people are collaterally prevented from editing by these blocks, when will the costs become too great, if they haven’t already?

Editor recruitment and technical constraints

The limitations on editing, from the outset, likely dissuade many potential editors from making their first contribution. Those whose interest remains are then directed, in the case of a global block, to the Wikimedia Stewards.

The email queue managed by these volunteers has a significant backlog, and the time until response is uncertain; there are only 38 stewards. When the blocked user gets a response, they’ll be asked for their IP address, and which projects they intend to edit. From there, a steward can determine what next steps are required, which range from the steward simply making an account for them and granting global IP block exemption, to the user needing to make three separate requests themselves: an account through a local process, local IP block exemption, and finally global IPBE.

The process is complicated and largely opaque to new users, not to mention generally confusing for people accustomed to the more user-friendly parts of the internet.

The negative effects of this experience are multiplied during edit-a-thons, where potential new contributors, who made an effort to attend an in-person event, can have their first introduction to the Wikimedia community interrupted by these blocks. As a Steward or local administrator is unlikely to be on call, account creations and IP block exemptions can take days to be set up. Without significant preparation and coordination, these edit-a-thons can’t happen.

Regardless of whether the process eventually resolves itself, which it doesn’t always, every additional hoop to jump through makes new users’ first editing experience more difficult and frustrating than it should be.

Experienced editors face difficulties too. New proxy blocks can cause confusion, as the reasoning behind the block is not always clear from the block message, and few people are familiar with peer-to-peer proxies. Some users assume the block is directed at them personally. It is also often difficult to identify the specific processes to request block exemptions, especially as the block summaries and Meta-Wiki documentation are available in relatively few languages. Even if they find the right processes, the users they need to talk to may not know their language, and the response can be significantly delayed.

The combined difficulties for editors affected by peer-to-peer proxy CGNAT blocks significantly reduce the effectiveness of editor recruitment and outreach, and cause frustration for experienced editors.

The wider proxy blocks problem

Movement and Wikimedia Foundation values emphasize the importance of minimal barriers to contributing to the sum of all human knowledge.

And yet, for many editors these barriers have increased dramatically: whether it be editors in Ghana affected by peer-to-peer CGNAT blocks, editors in China who need to request a series of exemptions to contribute through a proxy, or editors in the United States who use T-Mobile, proxy-type blocks have a marked impact on the experience of contributors in editing Wikimedia projects.

Regardless of what is decided about proxy blocks, the identification problem remains: proxies are becoming a ubiquitous part of basic internet privacy, CGNAT IP addresses are often blocked for reasons other than peer-to-peer usage (such as vandalism), and ISPs have been trending towards more dynamic IP assignment. The usefulness of IPs themselves as individual identifiers is in question.

In the near term, however, there is a lot of work to be done to create a more equitable, responsive, and user-friendly structure to address proxy blocks’ more immediate effects.