
User:Sj/Design chats/UPE

From Meta, a Wikimedia project coordination wiki
Reminder: This is a wiki! Edit and refactor at will.
Problem
Improving tools for monitoring, tracking, and countering undisclosed paid editing. (a chat with MarioGom and Jwindleberg.)
Current discussions
from Wikiproject:Antispam, archived in 2021

Ideas


I commented on wikimedia-l that it seemed there's much more that could be done to support preventing + catching UPE. MarioGom pointed me here. Some of the things that could be done include:

  • Monitoring the market for UPE
    Work with groups that are in the market and completely transparent about their work to maintain a sense of rates and volume
    Search general contracting sites, general search engines, and specific reputation brokers for new options; maintain a catalog
    Spot-check and commission work. Last week Jan Böhmermann spent under 500 Euros and identified two German UPE networks
  • Building better tools for tracking and countering UPE
    Tracking: automated scoring (seems like some of this exists but not everyone uses it?)
    Countering: tools to coordinate work, making it more satisfying + collaborative to tackle organized UPE networks. Especially for often-targeted categories (politicians + companies)
    Both: Focus on tools for detecting large farms/networks over time, and cleaning up the mess they leave. (how automated is this now?)

Thanks @MarioGom: for pointing me here. I am curious how much of this tooling and automation exists already, and how to make it more visible to editors on individual projects (who might not know about cross-wiki antispam efforts).

And I recognize that we don't have a uniform definition for the heart of the challenge -- Spam, COI, UPE (constituting a kind of hidden COI), and sock- and meat-puppet networks: all are slightly different contexts + scopes. How do you think about those in relation to one another? –SJ talk  20:52, 8 September 2021 (UTC)

Hi SJ: thank you for bringing this up here. Here's some thoughts:
Monitoring the market for UPE
  • Work with groups that are in the market and completely transparent about their work to maintain a sense of rates and volume
    • Disclosed paid editors (DPE) are a tiny fraction of paid editing, both in terms of number of accounts and edit volume. Some may be collaborative, like Beutler Ink (ping WWB Too), but I'm not sure that would be very useful for tackling UPE. If you refer to groups who are transparent about their work and rates off-wiki but don't disclose on-wiki, that probably only covers some Upwork freelancers, who are also just a small part of the market. Large actors rarely disclose their rates or clients, and often not even the fact that they offer Wikipedia services.
  • Search general contracting sites, general search engines, and specific reputation brokers for new options; maintain a catalog
    • We currently do some monitoring on English Wikipedia (see en:Wikipedia:List of paid editing companies), and I think French Wikipedia does something similar (ping Jules*). Some editors also monitor sites like Upwork, and some results can be seen at en:WP:COIN. However, public disclosure can conflict with harassment policies (en:WP:OUTING); functionaries and some admins have access to more of the evidence discovered this way. A Volunteer Response Team queue exists at paid-en-wp@wikipedia.org, but only functionaries have access, and as far as I know, it is severely understaffed.
  • Spot-check and commission work. Last week Jan Böhmermann spent under 500 Euros and identified two German UPE networks
    • Some people did this on French Wikipedia too, also uncovering a cross-wiki UPE operation in France (see en:Wikipedia:Wikipedia Signpost/2020-05-31/News and notes). There seem to be some disputes about how mystery shopping and our policies interact, especially on English Wikipedia. If we want more of this, we'd need a framework to ensure that mystery shoppers who are members of our community know the dos and don'ts to avoid policy issues.
Building better tools for tracking and countering UPE
  • Unlike ORES-style scoring of individual edits, with UPE it is usually easier to detect the editors than isolated edits. A number of tools exist (though not enough), for example MER-C's suspicious new articles report (see en:Wikipedia talk:WikiProject Spam#Suspicious new articles). We usually conceal operational details of such tools (en:WP:BEANS). Publicly disclosing all detection heuristics would be like publishing a manual on how to do UPE and get away with it. Anyway, here are some categories of tooling I'm aware of:
    • General discovery: Tools that help find UPE editors in general (not a specific sockfarm). MER-C's report falls into this category. I use some heuristics that yield around a 20%-35% verifiable true positive rate among the discovered accounts. I don't publish the results of this tool to avoid harm to innocent users, but a number of accounts I report come from these results after manual research.
    • Specific discovery: Tools to discover new accounts of known sockfarms. These include things as simple as watching recent changes on articles heavily targeted by specific sockfarms (e.g. en:User:MarioGom/TOPCOI/Bx), as well as more targeted heuristics to detect a known behavioral fingerprint. This works for some big UPE operations (e.g. Yoodaba SPI).
    • Attribution/Verification: Tools to check a suspicious account and attribute it to a known sockmaster. This is also quite feasible for some large operations (e.g. Yoodaba SPI, CharmenderDeol SPI, Jaktheladz SPI).
    • Investigation: Tools for behavioral sockpuppetry investigation, such as toolforge:spi-tools, toolforge:sigma, toolforge:xtools. Tooling is particularly lacking when it comes to research of cross-wiki UPE operations.
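The simplest "specific discovery" technique mentioned above, watching recent changes on articles that a known sockfarm has repeatedly targeted, can be sketched as an offline filter over recent-change records. This is a hypothetical illustration, not any of the tools named in this discussion and not a detection heuristic: it assumes records shaped like the `title` and `user` fields that the MediaWiki Action API's `list=recentchanges` query returns (with `rcprop=title|user`), and the page list and allowlist of known-good users are placeholders. It only surfaces candidates for manual review; it attributes nothing.

```python
from collections import Counter


def flag_edits_on_watched_pages(changes, watched_titles, known_good_users):
    """Return change records that touch a watched title and were made by a
    user not on the (hypothetical) allowlist of known-good editors.

    Each record is assumed to be a dict with at least "title" and "user"
    keys, as in a recentchanges API response queried with rcprop=title|user.
    """
    watched = set(watched_titles)
    return [
        c for c in changes
        if c["title"] in watched and c["user"] not in known_good_users
    ]


def users_by_hits(flagged):
    """Rank users by how many watched-page edits they made, most first."""
    return Counter(c["user"] for c in flagged).most_common()


if __name__ == "__main__":
    # Placeholder data for illustration only.
    recent = [
        {"title": "Example Corp", "user": "NewAccount1"},
        {"title": "Example Corp", "user": "LongtimePatroller"},
        {"title": "Unrelated page", "user": "NewAccount1"},
        {"title": "Example Corp founder", "user": "NewAccount1"},
    ]
    hits = flag_edits_on_watched_pages(
        recent,
        watched_titles=["Example Corp", "Example Corp founder"],
        known_good_users={"LongtimePatroller"},
    )
    print(users_by_hits(hits))
```

Ranking by hit count rather than flagging every edit keeps a reviewer's attention on accounts that repeatedly return to the same cluster of pages, which is the pattern this kind of watching is meant to catch.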
Other than that, the recent creation of Wikiproject:Antispam has been a good step in terms of cross-wiki coordination.
As for cleaning up the mess, that area is quite lacking. We have organized some ad hoc cleanups on enwiki (en:User:Blablubbs/Wolfram, en:User:MarioGom/LoboReview, en:User:MarioGom/KazakhReview), but there is still nothing formal or systematic.
Nosebagbear has a collaborative sandbox for UPE proposals (see en:User:Nosebagbear/UPE Proposals), which is worth the attention of anyone interested in moving some proposal forward. --MarioGom (talk) 10:45, 9 September 2021 (UTC)
@MarioGom: on fr-wp, we have information on paid editing companies, but it is scattered. I took the initiative to create fr:Projet:Antipub/Agences de communication today (we still have to complete it); it will help us centralize the information. We also intend to trap some agencies again in the future, in order to find the socks they use, as we did in May 2020. Best, — Jules* Talk 16:56, 10 September 2021 (UTC)
Interesting, thanks to you both. @MarioGom: I think the number of larger orgs involved in en:WP:CREWE is enough that we could get a rough estimate of market rates and volume. Even one or two active PR agencies will have a better sense of it than most editors, and comparisons over time will be useful even if they see a biased subset. –SJ talk  16:29, 14 September 2021 (UTC)
Our anti-abuse tools are a pile of garbage and the API is actually worse by virtue of having random missing functionality. Once we have a working CAPTCHA, all anti-abuse tools in MediaWiki and extensions used on WMF sites have dedicated maintainers, the technical debt in the anti-abuse tools is gone, the API has been audited for missing functionality (phab:T192023, phab:T20104, phab:T188672, phab:T261752 are all material impediments) and that functionality provided, the spam blacklist is still public and infinitely scalable, PageTriage is available for all wikis and in a much improved way, ..., then we can discuss improvements. I want to focus on finding spam, not fighting the WMF's bad software and bad management decisions.
That said, if someone wants to put together a training set that would be great. I know what I am looking for, it's just a matter of weeding out the false positives. The better I can do this, the less I can rely on behavior. MER-C 17:42, 9 September 2021 (UTC)
MER-C: Helpful ideas and links, thank you. I'm not sure what the right way is to bundle these up and push for prioritization; is it helpful to have an umbrella ticket for this that we can all add our +1's to? –SJ talk  16:29, 14 September 2021 (UTC)
The state of the admin tools is so bad that any improvement to anti-abuse tools in MediaWiki is welcome. Anti-spam is not the only problem we face; a general improvement in anti-abuse tools will make progress against several problems. If I were to pick quick wins, they would be deleted title search in the API and UI improvements to deleted title search, so that both are available in new page patrol and other scripts. There's no escaping the massive pile of work that is phab:T20493, though. MER-C 18:25, 14 September 2021 (UTC)
  • SJ: I have surveyed data from various major UPE companies and they charge $500-$2,500, with median probably around $1,500. With respect to volume, after some back-of-the-envelope estimates, I would say that UPE page creations on English Wikipedia are in the hundreds per year (conservative), possibly in the thousands. It's hard to know for sure, since there's a long tail of UPE editors beyond the few major UPE companies. MarioGom (talk) 12:54, 28 January 2022 (UTC)
Thank you for looking into it. That's substantive, and good context for improving related tools. –SJ talk  21:01, 29 January 2022 (UTC)

Reduction + diversion of misapplied energy


Another thought: have we looked into ways to reduce or divert demand for paid editing into constructive channels? The demand for run-of-the-mill publicity helps hide malicious manipulation. To limit this, we might create an (off-WP) space for freely-licensed intermediate steps along the path towards an on-WP proposed edit. And make the above even clearer by having all commercial paid editing start in this space instead, w/ its own tools and assumptions and expectations:

  1. Define (or create) a preferred site for sharing free knowledge about the 80% of this that is non-deceptive but still out of scope: self-curated statements by subjects, possibly-NN summaries, related media (less of an issue, given Commons' policies)
  2. Build tools to make it easy to create + transfer drafts there, streamlining deletion + migration processes.
  3. Observe activity there -- likely w/ meaningful correlation to UPE on Wikipedia.

As an example: sharing freely licensed media intended? proposed? for use in a paid article is easy, b/c Commons has fewer steps to follow, and is set up in a way that doesn't suggest reputation or notability by association -- so its pages aren't very good targets for astroturfing, but they are a mechanism for releasing media w/ metadata under a free license. An equivalent for nominally factual (or self-reported) statements about verifiable entities could similarly narrow the subset of such edits that hit RC. –SJ talk  16:24, 14 September 2021 (UTC)

Hammering UPE operations does divert some demand into constructive channels. I have observed a few companies switching to in-house, policy-compliant, disclosed paid editors after the agency they hired was banned, exposed, and their customers appeared connected to it.
As for other venues, you can look at Wikitia and Everipedia. BLP and companies there are often correlated with UPE on Wikipedia. These are wikis they can use to freely dump all the promotional garbage they want.
On English Wikipedia we do have drafts, which is where commercial paid editing is supposed to go. For simple, hard facts, Wikidata is friendly to COI and paid editing, disclosed or undisclosed. Commons is also a relatively safe place unless they commit copyvio (which is often the case). I'm not sure I fully grasp the kind of solution you're proposing here. MarioGom (talk) 16:43, 14 September 2021 (UTC)
Widely publicise examples of paid editing backfiring.
Unfortunately I've seen some diversion towards the Simple English Wikipedia. MER-C 18:29, 14 September 2021 (UTC)
Thanks, good examples. Wikitia and Everipedia could perhaps be such a place. [Update: it seems not.] They both still claim to be curated third-party encyclopedic content... even though they have neutrality and balance problems that far exceed our own. What I mean is
1. identifying a specifically recommended autobio-index -- a channel for a) freely-licensed, b) sourced + verifiable, self- or PR-descriptions of possibly non-notable entities, which explicitly states that it makes no claims of independence, balance, or neutrality, and does not present itself as an "encyclopedia".
2. sending commercial-PE (paid or otherwise) to such a channel to update a page there, away from RC and the on-wiki process here.
3. while all editors can use Drafts here to develop articles, commercial-PE should just post a link to this other site; from which a page or section could be imported if appropriate.
4. building tools to easily transfer pages to that channel and remove them from WP. self-promotions or articles by sock farms (that might one day be WP-suitable, if cleaned up) could be more readily deleted + migrated; and we might impose a higher standard of balance + notability for CPE, or a policy of bulk deletion/migration of articles created by UPE. All of this might be easier / less controversial if we know this isn't removing free knowledge from the web, but rather removing it from the search-engine-visibility and implicit curatorial approval of being on WP.
Do you think we could do the above w/ a pointer to Wikitia? Do you see value in trying to move CPE away from using drafts, and adding specific "delete and migrate" tools? @MarioGom: I haven't thought through the implications in detail, but that's roughly what I meant. –SJ talk  17:07, 15 September 2021 (UTC)
Do not ever consider Wikitia (but Everipedia is OK). Wikitia is a shady site documented in the MediaWiki spam blacklist. It is operated by Avoof [1], an Indian SEO company in Udaipur, Rajasthan that is well documented on en:WP:PAIDLIST. Avoof employees are the only users who are allowed to edit Wikitia. The main operators are Himank Seth [2] and Sonali Kavdia, who are highly active spammers on LinkedIn and Upwork and actively spam their own profiles all over the Internet to promote themselves. They regularly scrape articles off Wikipedia and charge clients a few hundred dollars to publish on Wikitia. Probably only 5% of the pages showing up on Wikitia are Avoof's clients.
They evidently have poor English skills as this confusing jumble of words on Wikitia's main page shows.

The community at Wikitia is limited to only specialized editors who are experts in their fields and can helping in building verified content by the continuous efforts. All articles and pages on Wikitia are protected to provide verified encyclopedic content by controlling the substandard edits by all users, who dont have knowledge in that industry or field.

Unlike Wikipedia, we believe that just negativity or controversial content are not a qualification for a page, which has garnered a lot of attention and mostly hate crime content is highlighted and not the positive work.

Dealing with this requires close monitoring of anything related to Wikitia that shows up on the Internet, such as job ads relating to Wikitia pages. Jwindleberg (talk) 02:28, 11 October 2021 (UTC)
Thanks @Jwindleberg:, so it sounds like that one is not a good option. Again suggesting we might want to set up our own venue, which rejects spam and abuse, but supports permanent rough drafts and not-yet-notable material. –SJ talk  12:51, 12 October 2021 (UTC)
Everipedia is a very good venue for "permanent rough drafts and not-yet-notable material." Larry Sanger, one of the co-founders of Wikipedia, was active in founding and managing Everipedia. The site looks well managed and has a very supportive community. On the other hand, there is no way that Wikitia can be viable option since it's completely controlled by shady Indian spammers. Jwindleberg (talk) 17:34, 12 October 2021 (UTC)

Obviously the top priority is to first hunt down the big-time harassers and extortionists. They are the ones causing the biggest damage to WMF projects. All the other small-time spammers who don't do extortion or blackmailing are just annoying but not diabolic. We should get rid of those online gopnik gangs like Wikibusiness.ua first. Jwindleberg (talk) 17:45, 12 October 2021 (UTC)

++ That makes sense to me. –SJ talk  16:03, 13 October 2021 (UTC)

Sophisticated (Eastern) European paid editing rings


At Wikiproject:Antispam we see highly sophisticated paid editing syndicates operating mainly from Europe, especially Ukraine and Russia but with a smattering of US, French and German operators. On en-wp, the main actors are low-skilled spammers from India, Pakistan and Bangladesh who usually do not display much sophistication and largely confine their activities to en-wp. Starting off with a list of known sockfarms and their countries of origin listed at meta,

  • Çelebicihan - Ukraine
  • Wikibusiness - Ukraine
  • Serghiy Hrabarook - Ukraine
  • Olga Vaganova - Ukraine
  • Pierre Malinowski - Russia and France
  • Wiki PR, Status Labs - US
  • Ross kramerov - US, of Russophone origin
  • Olaf Kosinsky - Germany
  • Prix Versailles - France
  • KapitalBrand - Morocco
  • UA85 - Pakistan
  • Xenen1970 - Nigeria

Quite a different situation from en-wp, which gets overwhelmed with clueless, unsophisticated spammers from developing Anglophone countries such as India, Pakistan, Bangladesh, Sri Lanka, Nigeria, Ghana and the like, as observed at en:WP:SPI. See this observation by Bri on en:Wikipedia:Wikipedia Signpost/2020-05-31/Recent research:

I've been working in this arena for a while, and in fact have a credit in the paper for contributing labeled data that was used to train the model. We aren't sure how sophisticated some of these operations are but my feeling is there's a distinct break between the activities of the outfits catering to well-funded Global North entities (in particular corporations and their executives, entertainers/entertainment companies, and politicians and political groups) – probably what you mean by the "professionals" – and the rest. I wouldn't be surprised if the former are highly aware of the investigative techniques used on-Wiki, and adapt to whatever metrics and techniques we apply, but the latter are unable to, at least quickly. But the greatest volume of stuff that has to be dealt with is due to the less sophisticated group, and it would still be useful to have tools that willow that away so human effort can be focused on the remainder.

This means we may need different strategies and tools to deal with, for instance, sophisticated Ukrainian paid editing rings vs. noob-type Indian freelancers. A pattern emerging now is that sophisticated x-wiki spam tends to come out of Ukraine, Russia and to some extent Western Europe, but the mostly monolingual Americans and British are typically occupied only with en-wp. Jwindleberg (talk) 02:28, 11 October 2021 (UTC)

For what it's worth, fairly sophisticated UPE operations are run from the US, UK, and Israel too. At least on English Wikipedia. Also some bigger operations are internationalized, e.g. primarily run from the US with additional teams in Philippines or Hong Kong. MarioGom (talk) 21:06, 5 November 2021 (UTC)

Challenges

  • Hard to get a new sort of project off the ground
  • Aversion to facing the scope and impact of UPE, by editors, by admins, by WM
  • The community of disclosed paid editors measures in the thousands? a notable fraction of active editors (though many are inactive). The community of UPE may be of similar size. Together these can dominate 'organic' community discussions about policy development and change.


design chats