Grants:IdeaLab/Reduce attack pages which get picked up by other sites
Project idea
What is the problem you're trying to solve?
Attack pages can be created on Wikipedia. At inception, these pages are often indexed, so they may get picked up by Google or other search engines very quickly, and they are licensed under a Creative Commons license, so third parties can copy and retain them even after we choose to delete such an attack page.
What is your solution?
I have a modest idea, inspired by a real email received at Wikimedia (OTRS).
Someone created an attack page about an individual. This is, sadly, not that uncommon a situation.
On the positive side, we have some mechanisms in place to deal with such attack pages. Many are detected fairly early on; they are not only eligible for speedy deletion but are given a privileged position in the administrative dashboards, so they are often deleted fairly quickly.
Unfortunately, there are third-party sites that scrape new Wikipedia pages very diligently. Even though this particular attack page was deleted soon after creation, an outside site saved the content and, you guessed it, did not delete it when we deleted it.
One could argue that this is not a Wikipedia problem: the attack page did not last very long, we responded swiftly, and the material is not easily visible. However, the third-party site has chosen a name which makes it seem as if it is associated with Wikipedia. (As an aside, this has been reported to legal.) Again, this is arguably not our problem, but if somebody does a Google search for the individual, finds this rather graphic description, and thinks it is coming from Wikipedia (which, in a sense, it did), that is little solace to the injured party, who has people asking her about the content. She is being harassed, not by Wikipedia, but we played a role.
I think there is a relatively simple solution, although it would require some changes to our processes, which by definition makes it not truly simple.
What if we were to change our processes so that any new article not created by an editor who is exempt from new page patrol is automatically no-indexed, and not granted a CC license, until it has been patrolled by New Page Patrol (NPP)? I believe that process doesn't take very long, and I believe most new page patrollers would recognize such an attack page and nominate it for speedy deletion (CSD) rather than accept it.
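To make the proposed gating concrete, here is a minimal sketch of the rule, written in Python purely for illustration; the names (NewArticle, creator_is_patrol_exempt, reviewed_by_npp) are assumptions of mine and do not correspond to any existing MediaWiki fields or APIs.

    # Hypothetical sketch of the proposed gating rule; not MediaWiki code.
    from dataclasses import dataclass

    @dataclass
    class NewArticle:
        creator_is_patrol_exempt: bool  # creator is exempt from new page patrol
        reviewed_by_npp: bool           # a new page patroller has accepted the page

    def is_cleared(article: NewArticle) -> bool:
        # A page is "cleared" if a patrol-exempt editor created it or NPP has reviewed it.
        return article.creator_is_patrol_exempt or article.reviewed_by_npp

    def robots_directive(article: NewArticle) -> str:
        # Uncleared pages tell search engines not to index or follow them.
        return "index,follow" if is_cleared(article) else "noindex,nofollow"

    def show_cc_license_notice(article: NewArticle) -> bool:
        # The CC BY-SA reuse notice would only be displayed once the page is cleared.
        return is_cleared(article)

    # A brand-new page from a non-exempt account is neither indexed nor presented
    # to re-users as freely licensed until NPP signs off.
    page = NewArticle(creator_is_patrol_exempt=False, reviewed_by_npp=False)
    assert robots_directive(page) == "noindex,nofollow"
    assert show_cc_license_notice(page) is False

On the indexing side, the "noindex" state is essentially what the existing __NOINDEX__ magic word already produces (a robots meta tag on the rendered page, where configuration permits it); the licensing deferral is the part that would need policy and legal review.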
If this were the case, such an attack page might never be picked up by third-party scrapers at all, and even if they found a way to copy it, they could not rely on the Creative Commons license, because the page would not be licensed until the NPP review.
I am sure there are some details to be worked out, but I think we could put a dent in the ability of people interested in harassing others to create a Wikipedia attack page in the knowledge that, even if it does not last very long on Wikipedia, it may survive on the Internet through third-party re-users.
Project goals
The goal is not simply that attack pages are deleted quickly, but that they are never picked up by third-party re-users in the first place.
Get involved
Participants
Endorsements
- Sounds like an excellent improvement. (Of course, an existing page can be turned into an attack page, but this would be a good start.) Sminthopsis84 (talk) 09:51, 3 June 2016 (UTC)
- Excellent idea; we tried to get this change into the last big reworking of the new page patrol process, but unfortunately it was considered too big a change. Currently "patrolled" can mean either patrolled and OK for mainspace, or patrolled and tagged for deletion. To change the system so that all "unpatrolled" articles start as "noindex" and "patrolled" means OK for mainspace would require a change to the colour system at new page patrol, with "unpatrolled but tagged for deletion" as a separate colour and a group that patrollers can choose to ignore. More broadly, this could be part of the process of combining draft and mainspace, with drafts being unpatrolled, and therefore noindexed, pages in mainspace. WereSpielChequers (talk) 10:44, 3 June 2016 (UTC)
- I don't think we can do this under a non-CC license. The user is releasing their contributions with a ShareAlike requirement, so we would have to release under the same license as the original. If we changed policy to remove "ShareAlike" from the edit form and had users release under a less restrictive license, then we could possibly release under a more restrictive license at first and come back to CC-BY-SA later. In any case, I definitely support the no-index idea wholeheartedly - at least until a page is reviewed. TParis (talk) 18:55, 4 June 2016 (UTC)
- Setting unpatrolled pages as no-index is an excellent idea, although I don't think that the licensing idea will work. Nigel Ish (talk) 20:11, 4 June 2016 (UTC)
- At the very least, we should not index new pages until they're patrolled. This will help somewhat to prevent hoaxes and attack pages from propagating. The licensing sounds difficult to work out legally, but maybe we can find some way to implement that aspect. NinjaRobotPirate (talk) 20:56, 4 June 2016 (UTC)
- Setting the no-crawl bit won't help because most bots will just ignore it. Special:NewPages exists for discovery (or aggressive and distributed scraping of Special:RecentChanges), where anything can slurp up the content. Closing those off hinders project transparency. The underlying issue is societal, not technical; fix the people and their robots will nofollow (until the Singularity happens anyway ;-) Dsprc (talk) 00:23, 5 June 2016 (UTC)
Expand your idea
Would a grant from the Wikimedia Foundation help make your idea happen? You can expand this idea into a grant proposal.