Research talk:Harassment survey 2015
This talk page is for discussion of the Harassment survey 2015, not discussion of the consultation on the same topic. If you have comments or questions about the consultation, please share them at talk:Harassment consultation 2015.
Move to Research:Index?
Can we move this page to Research:Index? This is a research project and should probably live there. The purpose is to keep all research organized in one space. I realize this might add a bit more work, but this is important for long-term documentation of the work that we do as a movement. Thanks! --EGalvez (WMF) (talk) 20:36, 2 November 2015 (UTC)
- Also, I'd be happy to contribute to the page if/once it's moved. Thanks --EGalvez (WMF) (talk) 20:37, 2 November 2015 (UTC)
- There seem to be far more surveys in mainspace, but I have no objections, if there's a move to reorganize surveys to the Research namespace. If you want to, go ahead, Edward. However, the consultation should not move. Please leave a redirect if you do move the page, because otherwise people coming to it from the email especially will be lost. :) --Maggie Dennis (WMF) (talk) 20:42, 2 November 2015 (UTC)
- Thanks, Maggie! Will move and leave redirects. --EGalvez (WMF) (talk) 20:46, 2 November 2015 (UTC)
- I remember suggesting this to Patrick, but I might have forgotten to CC everyone else. Btw, the research index is still searchable by default. --HaithamS (WMF) (talk) 01:07, 3 November 2015 (UTC)
- The move seems to be uncontroversial, as long as you leave redirects. :) I myself am comfortable with where it is, but if you're not and you're familiar enough with the Research space protocols to be sure it's welcome, have at it! --Maggie Dennis (WMF) (talk) 15:24, 3 November 2015 (UTC)
Design fault
There was a harassment survey on a pagetop banner today. I started to answer it. It might be a good idea, before doing any more such surveys, to do a survey on why people don't complete surveys. There's a basic design fault.
Here's a hint on how to do it better: I hope it's useful! If answers are wanted, allow people to submit their answers when they get tired of going round the "You must answer this question" loop; because if the only way to end the loop is to close the window, that's what they'll eventually do. If answers are not wanted, don't do the survey ...
The system didn't give me any clues, but I believe that the questions I couldn't answer were "Other: please specify", twice. Anyone who can tell me what the answer to that question would have been is quite a philosopher.
It wasn't easy to find this page -- that's another design fault, I'd say -- so I have meanwhile posted this comment at en:Wikipedia:Harassment. Andrew Dalby (talk) 09:55, 3 November 2015 (UTC)
- Hello Andrew, thank you for raising your concerns. There are currently a few questions in the survey where you are required to provide an answer for every statement, including the statement 'Other'. In those questions, you can select 'never' or 'not applicable' as a response for 'Other'. This will allow you to move to the next question. Participants are not able to go back to questions because the question order helps them keep context in mind when responding. We will explore how we can improve the experience though, based on your comments. So, thank you for sharing them. Kalliope (WMF) (talk) 14:14, 3 November 2015 (UTC)
- The page with sliders on which forms of harassment I have experienced would not advance until they were all non-zero. I selected a distinct number and added a note to say that actually meant zero. Burninthruthesky (talk) 16:39, 3 November 2015 (UTC)
- I'd like to say that I also found having to answer a question marked 'Other' completely unintuitive; it did not occur to me to select 'not applicable' without writing anything into the box. Bilorv (talk) 18:00, 3 November 2015 (UTC)
- I concur that requiring "other" be answered and requiring sliders to be nudged before they will register zero are both poorly designed. Figuring out how to make them work was a waste of time for me. For some this must have been a non-starter that led them to give up. ~ Ningauble (talk) 18:44, 3 November 2015 (UTC)
- I tried to do the survey today, and gave up after the first question, which asked how often I contribute to a "Wikimedia project". I didn't know what a "Wikimedia project" was, so I tried to find out, failed, and gave up. I have since learned that en:Wikipedia counts as a "Wikimedia project". So the very first question is enough to weed out all but the most experienced users. (A subsequent attempt to find the survey took me to the mystifying [1].) Maproom (talk) 18:05, 3 November 2015 (UTC)
- Thank you for letting us know Maproom. The first question is now linked to the list of Wikimedia projects. Any users unsure what those are can now quickly refer to it. I hope this will be helpful. Kalliope (WMF) (talk) 10:27, 4 November 2015 (UTC)
The tech issue with the multi-statement questions (like the one with the sliders) should now be fixed and no longer forces participants to enter a number in order to progress. You will still get a reminder that "you have not answered all questions/statements" if some of the statements are unanswered, but it should no longer prevent you from moving on to the next question. Do keep in mind that it is still not possible to move backwards in the survey. So, if you choose to leave statements unanswered, you won't be able to return to them later. Thank you all for raising your concerns, as this did not appear to be an issue during the testing phase, before the survey's launch. If you become aware of other such glitches, do let us know. Kalliope (WMF) (talk) 09:15, 4 November 2015 (UTC)
- Thanks very much for your responses, Kalliope. I believe the designers and testers of surveys of this kind don't realise how many potentially good responses they lose halfway. It's for a similar reason that I have never managed to complete a survey on The Guardian website: I'd love to respond to them, it's a great newspaper, but eventually I don't have the patience to work out what it is that they think I am failing to do. Andrew Dalby (talk) 14:02, 4 November 2015 (UTC)
Meaning of "disclosing" identity
There was a question near the end of the survey where it asks about how often the surveyed person "discloses" various aspects of his/her identity. Now I use my real name as my account name, and provide a link from my user page (on at least one wiki, not meta) to a social-media page where I disclose religion, gender, location, etc.; but I don't have any identity or POV badges on my user page, and so far I've never qualified a commit comment or discussion participation with "I have a COI with subject X because my identity is Z", or anything like that. Furthermore, my name provides clues to my gender and ethnicity. Does that count as "always" or "never"? If someone used his or her real name but no outside links, and a Google search on that name would turn up identifying information, how should that person answer the questions?
So that part of the survey might be better with an "N/A" option, or gradations of directness within "Always". DavidLeeLambert (talk) 16:25, 3 November 2015 (UTC)
Agreed. It's not like I've gone out of my way to explicitly make any of those things clear, but I'd imagine some of them are obvious given the way I go about my business...I, too, use my real name.--MichaelProcton (talk) 16:54, 3 November 2015 (UTC)
- There are direct and indirect ways of disclosing private information about you. For example, if you are using your real name as your WP username, you don't need to link it to your social media profiles. No matter how easy or difficult it is for somebody to trace you through google searches, it remains a fact that your real name is out there in an obvious and hard-to-miss way (others will always know your name by looking at any of your activity in the projects). So, this question would warrant 'always' as a response. A more indirect way to disclose information, such as gender, is to use words that change format depending on the gender they refer to or are gender-specific. For example, the username 'moonriddengirl' indicates that the person using it is female through the word girl. So, this would also warrant an 'always' response. If nothing in your username or user page indicates where you are from but you have at times shared this information on discussion pages, then this would be a 'sometimes'. Once it's out, it's technically out. But one should also consider the degree of difficulty (or ease) in finding that piece of information and spreading it or using it against you. Kalliope (WMF) (talk) 09:31, 4 November 2015 (UTC)
Content-free page
I saw this mentioned at enwiki and looked here to get some information, but there is just market-speak. What survey? Is it by invitation, or is there a link I have missed seeing? Johnuniq (talk) 01:31, 4 November 2015 (UTC)
- What is "market-speak"? --FeralOink (talk) 15:52, 7 November 2015 (UTC)
- Hello, Johnuniq. :) It's a link via banner which displays randomly to a percentage of a project's users and which is being sent to a random selection of dormant contributors. It is starting on smaller wikis first and working its way towards larger projects, culminating on English Wikipedia. We have a limited number of survey responses we can accept, and English is likely to overwhelm that number fairly quickly. It will run until we run out (or for two weeks, in the very unlikely circumstances that we don't). --Maggie Dennis (WMF) (talk) 01:38, 4 November 2015 (UTC)
- Thanks but my point is that it is very irritating that there is no indication of that on the page; that leads to people like me wasting time by carefully examining every link to see what I'm missing. There should be a community communications team who can check that information pages actually provide information. Johnuniq (talk) 02:28, 4 November 2015 (UTC)
- You raise a good point, Johnuniq. I created the page, and I'm sorry if it disappoints. But it's a wiki page and changeable, so I'll add those details now. --Maggie Dennis (WMF) (talk) 02:33, 4 November 2015 (UTC)
Reporting harassment
Responding to the survey, I think I have some stories to make the CA team aware of a long-term harassing user. But I failed to find the relevant contact information for CA. Could you enlighten me? Is it ca@wikimedia.org? Not being sure, posting here... — regards, Revi 12:57, 4 November 2015 (UTC)
- Hi Revi. Particular harassment or abuse issues should not be posted here, as this discussion page is neither a reporting mechanism nor a resolution process. The CA team can indeed be contacted directly through the email address that you mention, also found at the very bottom of the CA page. Kalliope (WMF) (talk) 14:54, 4 November 2015 (UTC)
- I know ;) I was just double-checking since I was in doubt. Thanks for confirming. — regards, Revi 16:08, 4 November 2015 (UTC)
User page vandalism
I was quite surprised to see user page vandalism listed as a form of harassment. It is usually a form of vandalism and very, very rarely a way to harass people. Just look at Jimbo's page history to see that his page was vandalised at least once a week, but hardly anyone wanted to harass him. And the real harassment (where a person vandalises a user page to harass someone) is double counted, as it always falls under using obscene words / discrimination based on some identity / threats / trolling, or at least is a form of stalking. I would suggest not counting this line as it is irrelevant (or at most is a proxy for the counter-vandalism activity of a user) — NickK (talk) 17:56, 5 November 2015 (UTC)
- NickK, thanks for this feedback, and for taking the survey. This may be a good point to raise during the upcoming consultation as well. With this survey, we're trying to track both the forms that harassment is taking, and the method/medium/mechanism used to deliver it. I agree that most user page vandalism takes the form of blanking, template-breaking, or silly/nonsense text, but some of it is directly intended to insult or threaten the page owner. So we did want to track this as well. Patrick Earley (WMF) (talk) 18:25, 5 November 2015 (UTC)
- Certainly userpage vandalism is used to harass. I have seen many examples. As to whether vandalism on Jimbo's page is intended to harass him (the example used by NickK), well, I guess Jimbo would be better placed to give an opinion on that. Andrew Dalby (talk) 10:16, 9 November 2015 (UTC)
Is harassment a global or a local problem?
This survey seems to be treating harassment as a universal problem that is present to more or less the same extent across all Wikipedia editions. My experience tells me this is far from true: some wikis are much worse than others. I have a concrete wiki in mind, but the survey is not interested in finding out which one, as it is not the wiki where I edit the most. In the end, everything will be averaged, and input from en wiki editors (en wiki is not perfect but is still a paradise compared to some wikis), who are the most numerous, will drown out everything else. While the WMF is obviously aware of harassment as a universal problem, it refuses to deal with the concrete issues and complaints. Surveys do not solve anything; sometimes, one needs to check where the stink is coming from. GregorB (talk) 20:15, 5 November 2015 (UTC)
- I had a similar issue. I edit/contribute on several projects (more and less frequently) and found it hard to make over-all statements. It would be ideal if one could do the survey for every project, but that would not be very time effective. I disagree with "surveys do not solve anything". They do provide awareness but I wonder how project-specific this awareness can be. The more the better.--Jetam2 (talk) 22:20, 5 November 2015 (UTC)
- What I meant by "surveys do not solve anything" is not that surveys are generally unnecessary or useless. It's just that I don't understand why a survey which is specifically about harassment asks me where I edit the most (en wiki, thank you for asking...), instead of where I'm harassed the most. Of course nobody will be happy editing where they are harassed, and will move someplace else. So, this question, instead of detecting problematic wikis, will detect wikis which are not problematic (or, more likely, detect nothing). To me, phrasing the question this way indicates a lack of concern about the root problem, which is toxic communities. And, since there is seemingly no interest in root causes, I'm afraid that the only outcome of the survey will be generalized statements such as "yeah, people are not perfect, let's introduce more policies". GregorB (talk) 11:22, 7 November 2015 (UTC)
Copy-paste mistake
There is a duplication in the question about how long the harassment lasted. The second-to-last item should probably read 'more than a month but less than a year'. Cheers, Pgallert (talk) 08:03, 6 November 2015 (UTC)
- Corrected. Thank you for letting us know! Kalliope (WMF) (talk) 14:00, 6 November 2015 (UTC)
A few more notes
The initiative is to be applauded, and I do so. I was in fact thinking of a similar one on my home wiki. On the other hand, I think that the survey could be made more wiki-specific. The survey does not work so well for a (predominantly) online community like ours. Perhaps a question on harassment could be included in post-event feedback after offline meetings. Sexual harassment is a serious issue. In our community, however, I would not expect it to happen. Not that we are awesome that way; rather, there is close to zero personal (offline, face-to-face) contact between Wikipedians. Online-type harassment happens much more often: escalating discussions, name-calling (mostly related to educational underachievement, mental health and similar topics), edit wars, reverting edits without explanation, vandalizing/blanking of personal discussion pages or, at the other extreme, ignoring a person totally. These types of harassment could have been studied in more detail. Another aspect that could have been included in the survey is the responses of different types of users. An admin has more tools and responsibility to deal with harassment than an editor. Blocking users that harass, for example, was not mentioned in the survey. Some wiki projects are more global and international (English Wikipedia, French Wikipedia, maybe Arabic Wikipedia?), others are less so (Slovak Wikipedia etc.). In my experience, there is sometimes a homeland vs diaspora attitude that can become a cause of harassment or, at least, cause feelings of being unwelcome. Some languages are gendered. Unless a person chooses not to participate in discussion, it is very hard to avoid revealing one's gender. Having project- or language-specific surveys would be a good step but I also understand there are time and staff limits.--Jetam2 (talk) 14:48, 6 November 2015 (UTC)
- Jetam2, thank you for this. This is the first survey on this topic in the Wikimedia movement (that I know of :), and there is always room for improvement. We hope to do follow-up surveying in the future, and your input is very valuable for deciding what changes we can make in the data being collected. Patrick Earley (WMF) (talk) 19:30, 6 November 2015 (UTC)
- So, exactly how much money is WMF wasting on this project? Doesn't Wikimedia already have plenty of conflict resolution mechanisms? Do they even need my donation anymore? XavierItzm (talk) 08:05, 7 November 2015 (UTC)
Selection bias?
When I logged onto Wikipedia, a banner appeared asking me to participate in a survey on harassment. This, I'm afraid, will produce selection bias. Someone who believes that they have been subjected to harassment, or who's witnessed what they regard as the harassment of others, will be strongly motivated to respond to the survey; someone who's never felt that they've been harassed will be less likely to respond.
At the "Research" page, it's stated that "Community Advocacy has been advised that prematurely releasing the questions may bias the survey results." I'm afraid that the description on the banner will itself tend to produce a bias toward overestimating the incidence of harassment. It may be too late, but could I suggest that the description be changed to a more neutral phrasing, e.g. "participate in a survey", with no mention of the survey's topic? — Ammodramus (talk) 15:52, 6 November 2015 (UTC)
- Even worse, the two groups that would be most interested in participating will be: a) those who want the WMF to adopt serious measures against it, b) those who think that the WMF should not act against harassment and should leave it up to local communities. Unfortunately, there is little chance we will have participants in the middle, i.e. those who are not that concerned with the topic — NickK (talk) 17:26, 6 November 2015 (UTC)
- Ammodramus, NickK - Sampling bias is something we have certainly considered when constructing this survey. We have used banners because we don't have any other data to segment users based on their relation to harassment; we are thus aware of the homogeneity of the response pool, and it's addressed as a conscious bias in the survey input.
- In terms of the description of the survey in the link, removing the word "harassment" from the banner wouldn't make much difference, as people will still see the survey topic on the first page in Qualtrics. They still have the chance to close it without responding, so removing "harassment" from the banner would likely only drive people to click on the link out of curiosity. Curiosity can create a bias just as motivation can.
- In short, each sampling method is going to introduce some margin of error, and we've chosen what we thought was best in this case. We've assessed the risk and outcome of each of the sampling methods, and we decided to go with what we believe will bring the most reliable data from the survey. Patrick Earley (WMF) (talk) 19:25, 6 November 2015 (UTC)
- There's nothing wrong with engaging those who have strong feelings about a topic. That's just how a democracy should work. In this instance it's really better than a random sampling.
- Wikipedia has major issues with suppression of opinion. All too often legitimate criticism of administrator actions is falsely labeled as AGF, threat, or harassment. Most of the administrators of today came in during a time of extraordinary growth and were not selected based on their ability to be good cops or judges. Another problem is that the English ARBCOM does not examine points of fact or allow public viewing of their BASC discussions. Final point: the doctrine of NOTTHEM is flawed. In some circumstances YESTHEM arguments are legitimate. PhanuelB (talk) 16:26, 11 November 2015 (UTC)
Survey Design
Moving through the questions, I found many basic design problems in the manner in which questions were worded and answer choices were worded or presented. I'm fairly certain there are other WP users who have relevant social science survey design experience and expertise. I'd strongly suggest you tap some of that expertise for your next site-wide survey. Meclee (talk) 22:42, 6 November 2015 (UTC)
- I am going to agree with Meclee here. There is an enormous gap between "once a day or more" and "once or twice a week". Many of my answers would fall into that gap. I'm not sure I can answer the questions with any degree of reliability given this. The question "How many times have you experienced incidents like the ones described below while working on any of the Wikimedia projects?" is not entirely clear; much of the problematic behaviour can happen off-wiki (e.g., IRC, emails, mailing lists, other public forums) because of work on a Wikimedia project, but it's not clear whether or not you want those events included or excluded. The sliders on the third page are not appropriately set, and give the impression that getting called a bad name once is equivalent to being stalked once, and realistically stalking doesn't fit into the list because it is almost guaranteed to include significant off-wiki activity. "What was the harassment based on?" doesn't include some of the most obvious and common reasons for harassment, which include "content dispute" and "movement-related role" (e.g., administrator, steward, chapter executive). "How did the witnessing of another community member being harassed affect your own participation to the Wikimedia projects?" does not consider the possibility that users continued to contribute while contemplating stopping. The questions in the personal profile section that list what personal information one might share does not have an option "I share it with some people but not others" - which is probably amongst the most frequent answers. Finally, there's no back button on the survey, which means people cannot go back to correct responses without starting all over again.
On the other hand, I'm very happy that the WMF is actively working to collect information in this area. One hopes that the information will lead to actionable plans (actionable both in the sense of "something that can be done" and "the resources are put in place to take actions"). Wikimedia can't patrol the internet (i.e., activities outside of venues that are under WMF control), and it may become clear that there are some pretty divergent ideas on what does and does not constitute harassing behaviour, but this is a good first step. Risker (talk) 06:34, 7 November 2015 (UTC)
Firefox 41.0 with Adblock user here. I confirm that the survey layout itself was all messed up. There were no tables as such, and the strings were shifted; I had to mentally reconstruct which radio button was meant for which option. Also, I got stuck many times, as the radio buttons did not cancel one another. I think the survey engine should be changed. Zezen (talk) 14:43, 7 November 2015 (UTC)
Add "prefer to skip this question"
All questions should have an explicit "prefer to skip this question" choice. Davidwr/talk 22:56, 6 November 2015 (UTC)
"Other" button
Several questions have an "Other - please specify" option, but if one has nothing to put in the other category the survey still makes you check a box next to 'Other'. ONUnicorn (talk) 12:56, 8 November 2015 (UTC)
- The survey was initially set so that if one had to select a response for the *Other* statement in grid questions but had nothing to add in the text box, they could select 'Never' or 'Not applicable'. As a few participants found this confusing, all grid questions with such an option have been set so that they no longer prevent the participant from progressing to the next question(s). In other words, you no longer have to select something for the *Other* statement. You will get a reminder letting you know that you have skipped a question if that's the case, but this will still allow you to continue. Kalliope (WMF) (talk) 12:12, 9 November 2015 (UTC)
Qualtrics
I use the "Ghostery" browser plugin, which prevented a redirection to the survey in the first place. I didn't even start the survey, because you use the services of a private company which is obviously active in the field of data collection and analysis, but over which almost no information is available. The Wikipedia article is a stub, and you expect users to trust this service? The surveys conducted through Qualtrics will be biased, because privacy-aware and generally suspicious users are likely to not even look at the first question. This is also a design fault. The WMF should spend the money for the development of their own survey software instead, which would make the projects more independent and provide better protection for our data. --CHF (talk) 22:08, 9 November 2015 (UTC)
- Hi, CHF. Thanks for your feedback. I'm sorry that you weren't comfortable completing the survey. The invitation links to their privacy policy specifically because we wanted users to be able to see what their practices are. It would be fantastic if we had in-house built survey tools, but unfortunately we currently don't. We really want feedback on this issue, though. Hopefully you will feel comfortable talking about the issue in general in our upcoming consultation which we intend to launch next Monday at Harassment consultation 2015. We were going to run the survey and consultation simultaneously but were advised that this would bias our survey results, so we had to delay the consultation component. --Maggie Dennis (WMF) (talk) 23:30, 9 November 2015 (UTC)
- Hello again, since I didn't find a better suited place, I'm updating my old statement here, despite the comment being about Qualtrics rather than the Harassment survey. Seven years have passed, and there's still a WMF login page at this company; according to this issue in "Phabricator" their survey software is still used as of now. The same criticism applies to the WMF's account at Slack mentioned in the same "Phabricator" issue. What comes next? Hosting WMF projects at Cloudflare or Amazon? In 2018 Qualtrics was acquired by SAP, "the world's third-largest publicly traded software company by revenue"; please see the paragraph about controversies there. You are running one of the planet's largest internet sites, so you should be able to self-host surveys as well as all communication facilities. Open source alternatives to the above mentioned "solutions" provided by large business corporations do exist. The German chapter, for instance, performs many surveys "on-wiki"; those are probably not as simple to evaluate, but avoid many privacy problems. Slack is even simpler to replace: you can choose between many free IRC and XMPP (Jabber) servers, as well as set up mailing lists – their "lack" of "hierarchical tools" for corporate use is not a problem, it's a good thing from the users' point of view. This also seems to be a cultural issue; the foundation is US based, it's not by chance that you don't have anything like the EU's GDPR over there, and it seems that US people generally view "big business" more positively than most people here do. The main problem is: in theory you can – as a community member, but not as a foundation employee – make the choice not to trust the third parties mentioned above, but this comes at a huge cost: doing so prevents you from participating in decisions and discussions potentially affecting all Wikimedia projects. I know, those services "just work" and are now well established and easy to use, but this sort of vendor lock-in should not prevent the foundation from becoming as fully independent as possible from external IT infrastructure – or at least using the services of other free projects with goals similar to our own (like the Libera.Chat IRC server network, which already hosts many Wikimedia related channels). --CHF (talk) 15:06, 14 December 2022 (UTC)
Language, focus - get help - better help
You would think that a survey started due to gross incaution with words would be put together with the utmost care. I did not get that impression at all.
In one case I was stunned at an omission. For the question (paraphrased) "how did you react to witnessing harassment", where was the answer "I edit less than before" or "I now edit defensively"? There was "I walked away entirely". There was "but I came back". There was "my name is Pangloss". But not "I was depressed at the level of acceptance of ongoing nastiness, and so naturally repelled, felt less desire to participate, though I keep coming back because I love the WP idea." How much review did these questions get?
Again, in this too frequently poisonous climate of modern life, where respectful discourse is preconditioned on how reasonable "the other side" is, how could you not yourselves monitor your own speech? Do you have any idea of the alarm bells set off by calling something a 'movement', as you did on one page? You may know what you mean. I get the inkling it may mean WP/WM in general. But 'movements' are those things where social action / social justice groups exercise domination ostensibly for some societal good, but employing mobs and shaming, and chiefly to maintain themselves. Did you mean that kind of 'movement'?
You have not been careful enough here, and need to broaden the people reviewing and finding those 'unforgivable' mistakes. Like calling a trough a spade. Not giving a fig can have great repercussions. (And if you don't know what that refers to, ask someone who knows) Shenme (talk) 01:19, 11 November 2015 (UTC)
- Hi, Shenme. I'm sorry that you felt the language was inappropriate. The questions received extensive review, from staff, our community workgroup and the external experts we consulted. In terms of "movement," it's fairly commonly used to describe our work - see Wikimedia movement affiliates and wmf:Wikimedia Movement Strategic Plan Summary, for two examples. It is linked in the sidebar on Meta as "Movement affiliates." I was unaware that the term was controversial, but its use in the survey reflects its usage in such locations as those. --Maggie Dennis (WMF) (talk) 01:27, 11 November 2015 (UTC)
Wondering how many others have seen "elitism" in reverts
[edit]as i did the survey it brought up the anger of every reverted edit.... now my edits often take hours due to being disabled and a slow typist, but often these hard thought about edits end up being deleted in a single click of the revert button and some 10k editor has taken two seconds to eradicate hours of work and record his 10,001st edit
(it is my understanding that a revert counts as an edit for the one that clicks the revert link and the revert also removes the edit from the reverted authors edit tally, please correct me if i have misinterpreted what i have read)
it might do for those interested in motivations of wiki participation to read up on game motivation i found Bartle's Player types fascinating and upon thinking about it seemingly on point. (one should note that even Wikipedia awards badges as well as contributor status not to mention the list of superuser permissions that are given to users to be more productive in titled roles such as moderator and administrator thus the extras like badges, titles and extra permissions and tools used are all "rewards" in the parlance of Bartles Taxomony of Player types and game theory making the editing of a wiki a game just like the board game "Risk")
Before someone says wiki is not a game, i know that. but editing a wiki acts just like a game and i purposely used "Risk" as an example. you can stay in a small area of expertise or "Explore" by clicking the "Random Page" link those who focus on single area often become the experts in that area making them the "killer" according to Bartle. some are more social while the "achievement killers" tend to find an insignificant error and then revert with a terse rule reference (Rule Nazi's) (has anyone been frustrated by someone who spent hours memorizing the rules so they catch you playing Risk phase three before you are done doing your phase two but "Rule Nazi" says you cannot finish phase two because you started phase three on the other side of the board.
there are so many other parallels but as i have pointed out it is hard (even painful) for me to tyle for long periods so i will revert to my main question...
Please give examples of "elitism" you have encountered in wikis (not just Wikipedia products but other wikis also) and any ideas you have to prevent it from discouraging others like you reverts are certainly a key used by this sort but a rewrite that changes the meaning could be just as bad (do not include edits that add info or that changes structure but not the meaning unless there is evidence that the re editor did so with an air of "you have no idea what you are doing but i know everything") remember that a wiki is supposed to be edited and reedited
While my reason to add this question is to get an idea of the extent of lazy editors reverting instead of editing for clarification, i want full feedback on anything you think was motivated by "Elitism", the reason(s) it felt elitist and especially any ideas you can think of that will reduce or eliminate the problem in the future
in the survey i asked for a rule change and a procedural change to verify the rule change (limit reverts to blatant vandalism and require a revert to be verified by multiple veteran editors) i also thought about other checks and balances like a time limit that only an edit can occur (to encourage editing rather than reverting) but that doesn't feel as good as the verify option (besides any edit only time span wouldn't give me enough time unless it was at least a week, i just don't check wiki that often) Qazwiz (talk) 07:26, 11 November 2015 (UTC)
Murder of Meredith Kercher Article
I just became aware of this survey. I think it's great that the WMF is willing to take a look at the subject. The problems here are worse than the community generally recognizes.
I hope that the organizers of this survey will take a hard look at the events surrounding the Meredith Kercher topic and its treatment of Amanda Knox and Raffaele Sollecito. There has been some measured improvement in the topic since the acquittal of the two in March 2015, but the fact remains that many editors who broke few if any rules were blocked because they challenged a deeply troubled article. The use of false allegations (including harassment) was a central element of the events.
Here is an article I wrote which documents many of the flaws in the article. Please note that Jimmy Wales has made extraordinary statements about the article and the associated blocks.
Please take note of this proposed presentation at the WikiconferenceUSA last August. Toward the end one of the commenters, Dominic, points out that I had been banned for threats and harassment. The allegations of harassment and threats are manifest nonsense and were part of a concerted campaign by British administrators convinced of Knox and Sollecito's guilt to eliminate all those who challenged their treatment of the two.
Notes:
- (1) RS excluded from the article
- (2) Commentary by Jimmy Wales
- (3) Criticism of the article by RS and retired FBI agent Steve Moore
- (4) Signpost article talking about the dispute
- (5) Article on Hate Site by a Wiki administrator who has implemented many blocks on the topic
- (6) British tabloid reporting
- (7) Threats made to family of RS Nina Burleigh by Peter Quennell, Webmaster of TJMK
- (8) Presentation to a Wiki Meetup in NY
- (9) Article detailing the dichotomy between the article's administrators, RS, and Jimmy Wales
- (10) List of Editors blocked on the topic
PhanuelB (talk) 14:46, 11 November 2015 (UTC)
- Hello PhanuelB, thank you for your contribution. Please note that this page is not a place to report incidents of harassment, dispute actions taken (or not taken), discuss specific incidents of harassment or dispute specific article content actions. Rather, it is for raising issues or concerns about the survey itself in terms of content, translation errors, formatting errors, design, approach, methodology, things missed, etc. If you wish to report a specific incident or raise your objection to actions taken, I would advise that you use the appropriate channels. Otherwise you are welcome to follow the discussions soon to take place at the Harassment consultation 2015 page, which will be open from November 15th onwards. Kalliope (WMF) (talk) 14:43, 12 November 2015 (UTC)
Translations
In the German version of the survey, one section is not translated:
- Being treated differently/unfairly based on personal characteristics, instead of merit [Discrimination]
"Unfair" is translated as "unfähr" (instead of "unfair", as in English). --Martina Nolte (talk) 20:54, 14 November 2015 (UTC)
- Thank you Martina Nolte. Corrected. Kalliope (WMF) (talk) 11:57, 15 November 2015 (UTC)
Witnessing others being harassed
On the third-to-last page of part II you ask whether the participant has witnessed others being harassed and supply as a possible answer ‘never/not sure’ (quoted from memory, as it is not possible to return to a page once you’ve gone on). On the following pages you then force the participant to state in what ways they responded to this harassment, not supplying as a possible answer ‘not applicable’ and thus forcing them to give nonsensical answers if they want to continue filling in the survey. --JaS (talk) 14:54, 15 November 2015 (UTC)
- @JaS: I also noticed this: an odd mistake. --Rubbish computer (HALP!: I dropped the bass?) 01:07, 4 December 2015 (UTC)
Survey results suggest flaws in design and execution
I have been challenged by @User:GorillaWarfare on Twitter to generate a discussion on Meta to review potential flaws in the design and execution of this Harassment survey, prompted by some of the hard-to-believe findings in the report.
An initial glaring problem (to me) was the finding that 54% of Wikimedia (Wikipedia, Commons, Wikidata, etc.) users said that they definitely or maybe (they didn't know for sure) have experienced harassment on these sites (Figure 13), of which 61% said that the harassment took the form of "revenge porn" (Figure 16). So, netting those together, it means that just about 33% of all Wikimedia project users have been or may have been personally victimized by revenge porn on a Wikimedia project site.
Now, let me say this. GorillaWarfare has been victimized by pornographic image associations with her name and/or image, through photoshopping and the like, but (as far as I know) she has not been the victim of actual "revenge porn", which Wikipedia defines as "sexually explicit footage of one or more people distributed without their consent via any medium. The sexually explicit images or video may be made by a partner of an intimate relationship with the knowledge and consent of the subject, or it may be made without his or her knowledge". The key difference being -- is the image actually of the person in question, or is the image doctored to look like the person in question? While I am sure that across the vast number of users of Wikimedia projects, there may have been a half-dozen instances of a Wikimedia user being depicted in sexually explicit footage who then found another Wikimedian distributing that content without the subject's consent, it is implausible that one-third of Wikimedians have been thus victimized. If that were the case, it would be on national nightly news for weeks running.
What I believe is happening here, is that respondents to the survey are taking the opportunity to "check yes" to things that come close to what happened to them, or that they heard have happened to others around them, by which the survey then attributes these specific things actually and literally happening to everyone who "checked yes". Another example, one verbatim from the study results says: "Had an explicit pornographic website created based my username". That's horrifying, demeaning, and defeats the human integrity of the victim. But that's not "revenge porn", and I will bet $10 that the respondent who typed that specific example marked that they had been subject to "revenge porn".
There are numerous other contradictory findings from this survey, which we can continue to discuss here. For example, Figure 28 reveals data that is impossible within a sample tallying to 100%. How could 56% say that they "did not react / ignored" the incident of harassment, while another 50% said that they "discussed it with other community members"? If you did not react, you can't have also discussed it with others. So, at least 6% of respondents to this question misstated something. Or, there are other ways to interpret the discrepancy -- they may have ignored one act of harassment, and then discussed a second act of harassment; or, they initially ignored an act of harassment, but later decided that they felt the need to discuss it with the community. Either way, it's an example of confusing or misleading survey design. You want to construct a survey so that there is not a strong chance that the data will come out in a misleading way or make the respondents look like they don't know how to answer a seemingly straightforward question. This is a challenge for all survey writers, and professionally I've written at least 2,000 surveys in my career. Outside of a few people on the WMF staff, I may be one of the most qualified survey designers who has ever volunteered to help the WMF with survey design, but they don't seek my assistance when new surveys are cooked up. And when I've stepped in before (such as at Talk:Fundraising 2009/Survey), it seemed that the WMF staff leader of the study didn't have any time to respond to the community's input.
I don't want my criticism to be taken as "Well, Mr. Kohs doesn't think there is a problem with harassment on Wikimedia projects". There is absolutely a huge problem with it. I myself have been victim of numerous ongoing forms of harassment -- even an off-Wikimedia wiki site (which had and has heavy participation by Wikipedia regulars, even some administrators) lampooning my sex life, illustrating a "FleshLight" as my favorite toy, and questioning whether my daughter is even my own biological offspring. So, I know what it's like to be a victim of online harassment. On the flip side, I have pursued information about various Wikimedians that they have in turn interpreted as "harassment" actions. For example, when one Wikimedian who had numerous photos of herself on Commons was accused by another Wikimedian that she physically threatened him at a conference, when I copied a couple of her Commons photos to identify her as the accused, suddenly her ID badge on her photos was blurred out, and then shortly after that, all of her photos were summarily removed from Commons. I understand through back-channels that she views my action of copying her freely-licensed images as "harassment", when all I was trying to document is that when Wikimedians are caught doing something embarrassing, the insiders' first instinct is to hide the identity of the person who messed up. Not the hallmark of an "open" and "transparent" community.
If the Wikimedia Foundation is going to tackle important issues like harassment of participants on the Wikimedia sites, then they owe it to us to design surveys that produce credible results, not exaggerated and misleading results. - Thekohser (talk) 13:39, 1 February 2016 (UTC)
- Thekohser, I don't know if this will help you, but the questions from the survey show that some allowed for multiple answers. The confusion you mention in adding up to 100% is due to respondents replying with multiple answers (I believe it was question #13 that Figure 28 came from). CKoerner (WMF) (talk) 16:10, 1 February 2016 (UTC)
- I do already understand that question #13 was a multiple response question, but the WMF set up the list of possible answers with one mutually exclusive item -- if you "did not react / ignored" something, you cannot have also taken any of the other actions listed. So, with 56% saying they "did not react / ignored", that means that the largest possible tally for any of the other ACTIONS would be 44%, the complement to the 56% for INACTION. Do you see what I am saying? It has nothing to do with the fact that it is a multiple-response question. That was implicitly understood, even in my critique. - Thekohser (talk) 16:18, 1 February 2016 (UTC)
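For what it's worth, the complement argument above can be checked with simple arithmetic (a minimal sketch in Python; the percentages are the ones quoted from Figure 28, and the variable names are illustrative):

```python
# Percentages quoted above from Figure 28 (a multiple-response question).
did_not_react = 56  # % who "did not react / ignored"
discussed = 50      # % who "discussed it with other community members"

# If "did not react / ignored" were truly mutually exclusive with every
# listed action, the two groups would be disjoint, so their percentages
# could sum to at most 100. Any excess is the minimum share of respondents
# who must have selected both, i.e. the inconsistency described above.
min_overlap = did_not_react + discussed - 100
print(f"minimum inconsistent overlap: {min_overlap}%")  # -> 6%
```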
- Thank you for your comments Thekohser. Let me see if I can address some of your concerns.
- When reading through a report it is always worth going through the entire document for a clear understanding of the information presented. For example, p. 2 of the report states “As the survey was a voluntary opt-in survey, the sample of people who opted to respond to it might not be representative of the general Wikimedia user base.” To me this statement leaves little room to misunderstand the findings as representing the entire Wikimedia user base; rather they can only reflect the contributors who took the time to fill in the survey.
- You have concluded that “just about 33% of all Wikimedia project users have been or may have been personally victimized by revenge porn on a Wikimedia project site.” based on figure 13, p. 15. Upon a more careful look at figure 13, you will notice the statement “Out of 2,495 that responded to this question...” [followed by the figure, presenting the %]. If this is not a clear enough statement that the figures presented reflect only the participants who answered that question, let me run some reverse math on your conclusion (spelled out in the sketch below this comment). By all means correct me if I’m wrong but, 54% of 2,495 respondents accounts for 1,347 users. 61% of those 1,347 users accounts for 822 [users]. Based on your conclusion’s logic this means that 822 users... account for 33% of all Wikimedia project users. I am fairly certain that this is grossly inaccurate, and thus the very statement it was based on is also erroneous. If anything, the % presented can only reflect the contributors who took the survey - not the entire Wikimedia community. This, I believe, is made clear in several parts of the report, as pointed out above.
- In regards to the definition of revenge porn, it is worth noting that brief definitions of the types of harassment listed in the survey had been provided to the respondents, in an effort to avoid misunderstanding of said terms. Those same definitions are also listed in the report’s Appendix, as per the note at the bottom of page 17 [which is where the different types of harassment appear in the report for the first time]. Revenge porn, for the purposes of the survey, had been defined as “publishing of sexually explicit or sexualised photos without one’s consent”. Which means that this term [for the purposes of the survey and the subsequent report] includes more than just somebody's NSFW photo being publicised. It includes any kind of sexual or sexualised [photoshopped] image that has been linked to a Wikimedia contributor, even if only through their username. This may differ from other definitions of revenge porn but it may justify the higher-than-expected % of respondents who selected it.
- Not all figures total 100%. It is possible for a respondent to have been subjected to more than one form of harassment while contributing to Wikimedia projects. As such it is possible that they reacted in more than one way. Certain survey questions allowed the respondents to select more than one of the options listed, and/or add their own if it wasn’t listed already. The survey was not a follow-up questionnaire about a specific experience of harassment but rather an inquiry into whether respondents had experienced harassment overall.
- The report that was released on Friday is a preliminary version, as per the file's description. We are certainly open to suggestions on clarifying certain points made, if those do not appear to be clear. Kalliope (WMF) (talk) 16:45, 1 February 2016 (UTC)
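For reference, the reverse math referred to in the reply above, spelled out (a minimal sketch in Python; the counts are those quoted from the report, and the rounding is an assumption):

```python
respondents = 2495                     # answered the question (Figure 13)
harassed = round(0.54 * respondents)   # 54% -> 1,347 users
revenge_porn = round(0.61 * harassed)  # 61% of those -> 822 users
print(harassed, revenge_porn)          # 1347 822

# 822 users is about 33% of the 2,495 *respondents* (0.54 * 0.61 = 0.33),
# not 33% of all Wikimedia project users.
print(round(100 * revenge_porn / respondents))  # -> 33
```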
- @Kalliope (WMF) — So why is WMF spending money on non-scientific surveys? Why are we not doing scientific sampling to generate survey participants? Why are we not presenting response data in tabular format but are instead producing a "report" that appears to be little more than a set of PowerPoint slides? Why are we hiding the percentage of men and women who took the survey (pg. 12) and why are we not more thoroughly examining the ways in which harassment and the reaction to harassment differs according to gender? It seems to me that this product is little more than a propaganda document to support an ongoing political debate rather than a serious examination of a very real problem. WMF would be wise to take up Mr. Kohs's offer of assistance in survey design. It would also be good to get a couple people with degrees in statistics on board. (And, for the record, I'm also a victim of harassment related to my Wikipedia activities, via a non-WMF attack site. It does go with the territory, lamentably.) Carrite (talk) 17:32, 1 February 2016 (UTC)
- Kalliope, your comments about the people answering the survey not necessarily being representative of the userbase are true but trivial. Of course no survey is perfect. But this kind of survey is useful only insofar as the surveyed population is close to the actual userbase, so as to reasonably extrapolate from it. One can't have it both ways. Your last point has already been addressed by thekohser above in reply to another question. Some other comments:
- The Pew survey breaks down harassment by gender, and also divides it into "less severe" (name calling and embarrassment) and "more severe" (stalking, physical threats etc.). This kind of breakdown should be done here because, as the survey notes, men and women experience different types of harassment differently (men experience name calling, embarrassment and physical threats more, while women experience stalking and sexual harassment more). Also, the most popular (and most effective) response to harassment is ignoring it (as both the Pew survey and this study note): less severe types are more easily and effectively ignored.
- Also, young people experience much more harassment. The data should also have been broken down by age. Kingsindian (talk) 17:39, 1 February 2016 (UTC)
- Astonishing! Upon what foundation can you state with any hint of credibility that young people "experience 'much' more harassment"? Citation needed, dude. I am so strangling that I wish I had taken an extra 'nerve pill' today. Please come be a fly on my wall. And yes, I participated in the survey, and yes, I sadly fall into many demographics other than youth... Fylbecatulous talk 15:19, 2 February 2016 (UTC)
- The Pew Research Center survey he linked to. It's in the summary of findings. MLauba (talk) 00:21, 3 February 2016 (UTC)
- Thank you. Then decidedly another flaw in self-selection. This certainly does not pan out in reality. Fylbecatulous talk 14:53, 4 February 2016 (UTC)
- As a member of enWP arb com I have certainly witnessed examples of revenge porn, but I find it almost unbelievable that, even using the broad WP definition, 61% (or 31% -- I am not sure which number is the relevant one) of participants have received it. I urge that it be reported to us if related to enWP, for it's the sort of thing for which I will support our taking whatever on-wiki action is possible, and referring it to the Foundation. (There has been some dissatisfaction that we have not taken action when the identity of the harasser cannot be reliably determined, or where the person receiving it is quite sure but our very limited investigatory power cannot confirm it; nonetheless it should always be notified--either to us or directly to the Foundation.) If it really is a problem to the extent specified, I think we very much need to consider our responses. The same goes for similarly extreme forms of harassment. (But though I share GK's concern, I think that most people receiving this sort of harassment would very much like the matter hidden from public view as soon as possible, and we serve them best by doing so.) (All this is my personal comment, not that of the committee.) DGG (talk) 20:55, 3 February 2016 (UTC)
- That's funny. I'm sure you mean well, but everyone I know of whose concerns about harassment have come to the attention of the arbitration committee has either been indeffed or doxed or both. —Neotarf (talk) 00:30, 4 February 2016 (UTC)
- And as a current Arbitrator I can say with confidence that I've taken part in a case recently where that didn't happen. I don't think this is the place however to discuss the enWiki's ArbCom. I will say that I've certainly been harassed offwiki for my activities onwiki. Oh, and sex and my advanced age were both involved. Doug Weller (talk) 12:04, 4 February 2016 (UTC)
- Ha, I could have said those very words, Doug, except that my age is a bit less advanced than yours (haha!). But what Neotarf says sounds a lot more exciting and Tweetable. Drmies (talk) 18:20, 4 February 2016 (UTC)
- This must be the one that was closed by motion. I'll take a closer look at it. I do realize this arbcom is a new group.
- I do not consider that being harassed qualifies anyone for dealing with harassment as a policy matter--if anything, the opposite, if you consider that children who experience abuse often grow up to be abusers themselves and that the bystander effects of witnessing abuse are well documented. The age question as well may not be particularly significant, especially since the relative ages of various users is not generally known. Of course there can always be backchanneling--I understand that bits of personal information get placed on the arbcom mailing list, completely against policy, where it cannot be monitored, verified, or oversighted, and there may be other factors as well. Of course if you have a bunch of 60-something academics and GLAM types being harassed by a bunch of 10-year-old trolls in their mother's basements, the survey is not designed to catch that--it is only meant to measure the harassees, not the harassers. There were a number of anti-harassment proposals in the recent harassment consultation that mentioned arbcom, you may be interested in browsing the discussions. —Neotarf (talk) 02:36, 6 February 2016 (UTC)
Not sure what "I do not consider that being harassed qualifies anyone for dealing with harassment as a policy matter" means -- where does policy come into it? I agree, though, that an adult experience of harassment doesn't necessarily make anyone more qualified to deal with harassment. Experience of being harassed as an adult can't really be compared to being abused as a child. Doug Weller (talk) 16:33, 6 February 2016 (UTC)
- Maybe "policy" wasn't the best word choice, but I can't think of a better one. And I'm not sure why two arbitrators would mention being harassed in this context if not to try to establish some kind of expertise, or at least some kind of rapport. I'm assuming the rumors are true and that all arbitrators automatically get harassed as a function of their high profile. Although I have had to deal with these issues in a professional capacity, I am far from an expert in this topic; but from what I do know about it, being harassed does not necessarily make someone more empathetic; it can have the opposite effect, creating the illusion that harassment is "business as usual" -- a de-sensitizing effect. But just to bring the "policy" thing full circle, quite a few of the arbcom cases of the last year or two have had a harassment component, but the thinking of the committee has been rather opaque on this. Hence a number of meta discussions around the theme of WP governance being broken. —Neotarf (talk) 21:42, 8 February 2016 (UTC)
- The good news is the mystery of the obviously false "revenge porn" numbers has been solved. The bad news is that the cause is a software defect that renders all the data on page 17 worthless and which makes any conclusions based upon them impossible. I'll cross post the summary I presented on En-WP at Jimbotalk under a hat for those of you who have not been following the blow-by-blow of the analysis on Wikipediocracy. (this comment was posted by User:Carrite on 23:15, 3 February 2016 - signed below)
x-post from Jimbotalk... (The following discussion has been closed. Please do not modify it.)
Defective WMF harassment survey

The mystery of the obviously incorrect "revenge porn" results of the WMF Harassment Survey has been solved on Wikipediocracy by Belgian poster Drijfzand... Basically, this survey of 3,845 Wikipedians across a range of WMF projects (45% of whom were from En-WP) generated 2,495 responses to a question asking whether they personally experienced harassment. Of these, 38% (about 948 people) said yes (pg. 15). However, on page 17, in what is purported to be a breakdown of the forms of harassment experienced by these editors, an astounding 61% (about 578 people) are said to have claimed to be victims of "revenge porn." This, to anyone who ponders the number for more than 6 seconds, appears patently absurd — bearing in mind that the survey respondents were about 88% male and that the great majority of Wikipedians maintain some degree of anonymity.

Drijfzand observed that the number of responses for doxxing, revenge porn, hacking, impersonation, and threats of violence all fell within a range of 5% of one another — which she or he argued "simply can't happen." I theorized that the problem was a software glitch, and Drijfzand identified the problem as a set of defective sliders in the survey form which refused to accept a value of 0, a bug identified by Burninthruthesky on November 3 and apparently remedied on November 4. LINK. Unfortunately, the survey was not launched on En-WP until Day 5 (to allow more responses from smaller wikis so as to reduce the weight of the large projects, see pg. 2), meaning that bad data was generated on some projects for nearly a week. Whereas the survey should have been aborted and restarted, it apparently was not, and so the data presented on page 17 (and any conclusions derived therefrom) is a case of Garbage-In-Garbage-Out. Once again: a failure to adequately beta-test software is evident.

There is one saving grace, and that is we have a very good snapshot of the magnitude of the gender gap based on survey respondents (a ratio of 88:12 for those who indicated a gender, with some 7% of survey participants declining to respond). Assuming a heavier-than-average percentage of women in the "decline to respond" group, this means we are probably in the ballpark of 86:14 or 85:15. There is also, for the first time ever as far as I am aware, a decent survey of the age of Wikipedians. Your takeaway numbers: 35% of respondents (and presumably Wikipedians in general) are age 45 or over; only 24% are under the age of 25. All the fresh faces, many on travel grants, at Wikimania are deceiving — it appears that the median age of Wikipedians is right around 31 years old, give or take. So the expenditure on the harassment survey wasn't a total loss even if it failed at its intended mission (at least in part) due to bad software (leaving aside the very real question of sketchy survey design). Carrite (talk) 19:50, 3 February 2016 (UTC) (male, age 54) Last edited: Carrite (talk) 23:00, 3 February 2016 (UTC)
Please beef up the beta-testing. Carrite (talk) 23:15, 3 February 2016 (UTC)
What was the wording of the invitation?
How exactly did the invitation text read? - Thekohser (talk) 19:45, 3 February 2016 (UTC)
- A kind and gentle reader at Wikipediocracy pointed me to this text: "We invite you to participate in a survey about online harassment on Wikimedia projects." I hope that the survey design team can understand how this immediately biases the results away from reality. People who are interested in commenting about online harassment on Wikimedia projects will self-select into the survey sample, while people who are not interested in commenting about online harassment on Wikimedia projects will self-select out of it. There is a very easy-to-understand Wikipedia article about self-selection bias, which I think the WMF research staff should read. See also accidental sampling (which I've always called 'convenience sampling').
- Another question -- was this survey data analyzed and reported on by WMF staff only, or was an outside third-party vendor also involved? - Thekohser (talk) 13:18, 4 February 2016 (UTC)
Comments on design and presentation
There have been a number of questions asked about the design of this survey and the execution of this preliminary report. We’d like to offer some clarity on our approach and our plans for the final document.
- First, as to the question of why we’re not doing scientific sampling to generate survey participants: when we design surveys at the Foundation, we tend to keep them as open and inclusive as possible. This approach often introduces some error when it comes to studying very specific issues related to small user populations. Using a sampling approach that targets very narrow user groups would be better, but the high privacy standards that Wikimedia projects maintain tend to hinder such high-accuracy sampling, and consequently leave us with a broad intake pool of respondents. We constantly try to get better at this while maintaining the inclusivity of our surveys, and ideas for future surveys are always welcome - this page is a good place to leave general feedback about how the Wikimedia Foundation conducts surveys.
- In terms of presentation of the data, the report has been published in both tabular and slide formats. We invested some time in preparing a preliminary executive summary of what we thought were the first important highlights of the survey. A number of people have requested comparisons and summaries of data that were not included here. We are reviewing those requests to see what we can include in the further analysis we’ve already planned.
- Understanding how community members perceive harassment gives greater insight when trying to address it. This insight can contribute to (though not necessarily determine) the setting of directions for action. In collecting data on an issue as divisive and sometimes poorly defined as harassment, we realize there will be difficulties. The data may have ambiguities resulting from how participants define the problem, or specific concepts within it. We will examine these areas for improvement in future data collection. We agree that further methods of analysis would be necessary for findings that reflect on the community as a whole. This survey is a start in better understanding this issue, not the end point. Patrick Earley (WMF) (talk) 23:38, 3 February 2016 (UTC)
- Patrick, thank you for engaging the community in this way. I hope that this question doesn't ruffle feathers, but I'd like to know, do you have professional experience in survey design and administration; and did any of the key developers of this Harassment Survey have previous professional experience in survey design and administration? The main reason I am asking this is because I am scratching my head at how scientific sampling methods are in conflict with openness and inclusiveness. To me, that seems like saying we're not making lemonade with real sugar and real lemon juice because we tend to keep our beverages as cold and frosty as possible. The objective doesn't seem to rationally follow the process being chosen. Random probability sampling does not mean that you target "narrow user groups". I just don't see where your explanation for why the Foundation does non-scientific survey sampling moves this conversation forward in a productive way. I am happy to be shown why I'm wrong, though. - Thekohser (talk) 23:53, 3 February 2016 (UTC)
- I find it ironic that the two individuals who have expressed the most concern about the survey here are also both virulent opponents of any attempts to stop harassment--Mr. Carrite routinely dismisses people who look for institutional solutions to harassment as "safe spacers" and Mr. Kohs has characterized the reporting of harassment as "whining". If the discussion is to be dominated by the "hasten the day" crowd, I doubt that any convincing analysis of the report will be able to emerge from the political posturing. —Neotarf (talk) 00:51, 4 February 2016 (UTC)
- Neotarf is correct that I initially used the verb "whine" to describe some of the participants in the survey; however, Neotarf fails to note that upon further reflection I overtly and publicly apologized for using such a term. Of course, Neotarf's hurtful characterization of me as a "virulent opponent of any attempts to stop harassment" is baseless and mean-spirited. Further, I reject Neotarf's characterization of Carrite as a "hasten the day" crowd member. Among all the people I know, Carrite is one of the most avid supporters of the enduring longevity of Wikipedia. I'm personally interested in this area of endeavor because I've worked in the field of survey research for 24 years now, and I have been a victim of repeated harassment on Wikimedia projects (some of it deserved, much of it not). That doesn't amount to "political posturing", Neotarf. You are an out-of-line troll here. - Thekohser (talk) 01:46, 4 February 2016 (UTC)
- I've consistently been a member of the "reform, not revolution" caucus at Wikipediocracy, as nearly 65,000 edits and some 277 article starts will attest. It is also a little-known fact that my friend Greg Kohs, a dreaded banned editor at EN (as are you, Neotarf), is also a Wikipedian of sorts — although it will pain him to hear me say that. I'd certainly include him in the big tent. Also, to be precise, my term is "friendly spacers," not "safe spacers." Carrite (talk) 16:14, 4 February 2016 (UTC)
- Why Greg, when did you stop being the owner of record for the Blog Which Must Not Be Named? You know, the one that has a reputation for doxing and for making false accusations against people where they cannot defend themselves? And is the longest discussion on that site still the one in the members-only section about Gamergate? And now you are positioning yourself as an expert statistician with 24 years' experience? Based on what? That little survey about women's BLPs you ran in your blog? It was an interesting exercise, to be sure, but remind us again about the sample size? The statistical significance? Sorry, but I am not particularly convinced of your bona fides to critique this survey. And why is Carrite trotting out his edit-history numbers? Is that meant to express some sort of entitlement for disrupting the workflow of someone else's project? "Trolling" I believe he called it--if I have left out an adjective somewhere, I'm sure someone will let me know.
- And finally, it's "out of line", not "out-of-line". See Webster's Third. Also n-gram viewer, which can now process queries with hyphens. And no, I'm not. There are valid questions about this survey. Your entrance here makes it less likely they will be given a measured answer. —Neotarf (talk) 01:57, 6 February 2016 (UTC)
- Let's all thank Neotarf here, for reminding us why so many editors of Wikimedia projects end up wanting to bash in someone's face, in response to so much unwarranted venom being spewed about, like a broken lawn sprinkler. Since Neotarf doesn't seem to understand the difference between a paid exercise that one does for one's career and an unpaid exercise that one does for leisure, there's probably no point in trying to explain why a professionally-prepared report isn't the same as a blog post about women's BLPs, nor would it be worthwhile to explain that not all market research practitioners are statisticians. No, there's so much out of line with Neotarf's hateful screed, it would just be a waste of time trying to defend against the silly accusations made therein. I'll bet Neotarf gets pissed off at Kyle Busch if he isn't the fastest driver on the road at all times, even if he's just dropping his son off at daycare. - Thekohser (talk) 03:30, 6 February 2016 (UTC)
- You want to bash in my face? —Neotarf (talk) 04:02, 6 February 2016 (UTC)
- I said "someone". You're no one. - Thekohser (talk) 04:39, 6 February 2016 (UTC)
- Whose face do you want to bash in? —Neotarf (talk) 20:22, 8 February 2016 (UTC)
- I don't want to bash in anyone's face, and I never said I did. Your question is ridiculous, similar to "Have you stopped beating your wife?" - Thekohser (talk) 20:11, 9 February 2016 (UTC)
- (This post can be skipped; the cause is given below, in the green-highlighted explanatory note.)
- edit: explanatory note: the post below assumed that between Nov 2 and Nov 4 people had to select all types of harassment before they could proceed, resulting in inflated figures. This turned out not to be the case (see the next two posts, of Feb 5): users had to enter a value for each type but could choose 0 (zero). The real error was made by the people analyzing the data, who counted zero values as cases of harassment. So instead of a technical error, it was a human error in interpreting the data. Since both errors produce the same end result, the analysis is valid for both cases.
- The first question should be: does the data correspond to what the participants wanted or tried to report?
- Several things point to a problem: the discrepancy between observed and reported harassment; the contrast between the selected types and the "others" (3% misogyny, 34% abuse of admin tools); the striking difference in variance between this one question and all the others in the report; and the unlikely distribution of cases -- more than 60% report severe harassment (hacking, revenge porn, threats of violence...), 30% are unsure whether they were harassed, and 90% of harassment complaints at WP:ANI belong to neither of those. Are we to assume that the people most likely to complain about harassment on WP are the least likely to answer a survey about harassment?
- We know that between 2 and 4 November, people choosing "yes" or "unsure" could not complete the survey unless they selected all ten types of harassment. Correcting the data for X such cases, we get results that better match the (often or occasionally) observed harassment reported in fig. 35 of the report. For example, with X=729, the figures become:
  Type                  Corrected   Observed
  content vandalism         63          67
  trolling                  58          61
  name calling              48          57
  discrimination            35          42
  stalking                  35          34
  outing                    18          20
  threats of violence       13          15
  impersonation             13          24
  hacking                    8           6
  revenge porn               3           3
(I'm not saying that these are the correct values; reported and observed figures don't have to be the same, but one would at least expect them to be reasonably similar.)
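- (A minimal sketch of this correction, for anyone who wants to check the arithmetic. The respondent base N and the raw counts below are back-solved assumptions chosen so that the published 61% figure and the corrected 3%/8% figures come out; they are not numbers taken from the data dump.)

<syntaxhighlight lang="python">
# Sketch of the correction: if X respondents were forced to register every
# harassment type, each category's raw count is inflated by exactly X.
def corrected_percent(raw_count, total, forced):
    """Percentage after removing the forced positives from count and base."""
    return 100.0 * (raw_count - forced) / (total - forced)

N, X = 1219, 729                               # assumed base; X=729 as above
raw = {"revenge porn": 744, "hacking": 768}    # back-solved, illustrative counts

for kind, count in raw.items():
    print(f"{kind}: reported {100.0 * count / N:.0f}%, "
          f"corrected {corrected_percent(count, N, X):.0f}%")
# -> revenge porn: reported 61%, corrected 3%
# -> hacking: reported 63%, corrected 8%
</syntaxhighlight>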
- The main issue here is not the prevalence of harassment on WP. People called the results "deeply disturbing", and something is indeed deeply disturbing: that the report was published, and that nobody questioned the data or saw the connection with the technical problems encountered.
- That many people may not have been aware of the technical problems is no excuse; the data was obviously flawed. One could assume that the severe forms of harassment didn't come from the 30% "unsure" reports, meaning that 85% of the "yes" group reported doxxing, 83% threats of violence, 80% hacking, 83% impersonation, 78% revenge porn... I've seen arguments like "people are more likely to participate when they have encountered severe forms of harassment", but that wouldn't explain why all those who complain at ANI would not participate in a survey.
- People have been focusing on "revenge porn", trying to find plausible explanations for the high numbers, such as a broad definition being used (photoshopped pics). They ignore the hacking claims, which are even more unlikely. If you don't like an editor, most forms of harassment are easy to apply, but hacking isn't one of them. You would need some very unlikely assumptions about why people are targeted, about the likelihood of participating in the survey, or about a correlation between unsafe internet behavior and being harassed in order to explain the results.
- I considered a whole range of scenarios to explain the results (off-wiki canvassing, etc.), but rejected them as too unlikely. Yet all of them are still more plausible than the results being real. Prevalence (talk) 17:10, 4 February 2016 (UTC)
- Could someone from the WMF tell us how exactly they counted the total number of each type of harassment? The raw data shows a column "responses", and based on the Qualtrics website, a question counts as answered once the user has moved the slider slightly. So unless I'm missing something, you're counting every answer given, even if a user entered a value of zero. Prevalence (talk) 02:14, 5 February 2016 (UTC)
- This seems to make more sense. It's still (mainly) the answers submitted between 2 and 4 November that produce the wrong results, but the users didn't have to enter wrong data. Counting the ones with values >=1 should give the correct figures; I'm not sure what to do with values between zero and one (I see that min and max are given as 0.00 and 100.00, so I assume that the entered values have two decimals). Prevalence (talk) 06:50, 5 February 2016 (UTC)
Mathematical impossibility
The raw data lists the number of responses, the average, and the standard deviation. The report claims these represent the persons who experienced the given type of harassment; in other words, the values (the number of times experienced) are equal to or greater than 1.
For given minimum a, maximum b, and average m, the variance is bounded by (m-a)*(b-m); the standard deviation is the square root of the variance, so the maximum possible sdev=sqrt((m-a)*(b-m)).
- revenge porn: average=2.09, standard deviation=12.73
- Maximum sdev when bounded by 1, 100 is sqrt((2.09-1)*(100-2.09))= 10.33
- when bounded by 0, 100 is sqrt((2.09-0)*(100-2.09))= 14.30
- hacking: average=2.69, standard deviation=13.67
- Maximum sdev when bounded by 1, 100 is sqrt((2.69-1)*(100-2.69))= 12.82
- when bounded by 0, 100 is sqrt((2.69-0)*(100-2.69))= 16.18
In both cases the reported sdev is larger than the maximum sdev for range 1 to 100. QED. Prevalence (talk) 11:41, 5 February 2016 (UTC)
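- (The same check in a few lines of Python, for anyone who wants to verify the arithmetic; the means and standard deviations are the ones quoted from the raw data above.)

<syntaxhighlight lang="python">
# Maximum possible standard deviation for values confined to [lo, hi] with a
# given mean m (the bound used above): sd <= sqrt((m - lo) * (hi - m)).
from math import sqrt

def max_sd(m, lo, hi):
    return sqrt((m - lo) * (hi - m))

reported = {"revenge porn": (2.09, 12.73), "hacking": (2.69, 13.67)}

for kind, (m, sd) in reported.items():
    bound = max_sd(m, 1, 100)   # bound if every counted answer were >= 1
    print(f"{kind}: reported sd {sd}, max possible sd {bound:.2f}, "
          f"exceeds bound: {sd > bound}")
# -> revenge porn: reported sd 12.73, max possible sd 10.33, exceeds bound: True
# -> hacking: reported sd 13.67, max possible sd 12.82, exceeds bound: True
</syntaxhighlight>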
What is salvageable from this mess?
As I noted above, this survey actually provides good measures of things it was not actually trying to study — the magnitude of the gender gap and the age demographics of Wikipedians. I believe the 38% number for those who report harassment is correct (bearing in mind Mr. Kohs's on-point observation about selection bias inflating the number). It is worthy of note that administrative tool abuse, threats of sanctions, etc. are considered "harassment" under the parameters of this survey, as are edit-warring with other editors and snarky posts on talk pages. If the actual point of the exercise is to study other forms of harassment, there should be a better way to differentiate the felonies from the misdemeanors, so to speak — so this survey has value in terms of learning what not to do next time. But, all in all, this is a complete clusterhug due to a failure to beta-test adequately. Carrite (talk) 16:06, 5 February 2016 (UTC)
- So selection bias does apply to reporting harassment, but does not apply to gender and age demographics. Gotcha. —Neotarf (talk) 02:03, 6 February 2016 (UTC)
- Correct. That is why these are excellent numbers. Gender gap sits around 88:12, perhaps as narrow as 85:15 if one assumes a strong female preponderance in the "declines to answer" group (7% of total). And we have a really good set of age numbers. That stuff is valuable. Carrite (talk) 06:13, 6 February 2016 (UTC)
- To clarify, if this was promoted as a "gender study" or an "age study" in the same way it was promoted as a "harassment study," then we would expect selection bias to corrupt the results for those topics. But since it was not marketed as either of those things, it seems logical that there was no pronounced selection bias with respect to age or gender and that it is in essence a random survey. Carrite (talk) 06:48, 8 February 2016 (UTC)
- I'm looking at the 2014 Pew online harassment survey: "Key findings... Age and gender are most closely associated with the experience of online harassment." If we can assume that individuals who experience harassment are more likely to complete a harassment survey, I think we can assume these demographic groups will also be more likely to complete the survey -- therefore, selection bias.
- I am wondering if the survey was able to capture data about the age (of the account)/cohort of the users who completed surveys -- whether they were all relatively new users, etc. This might not be able to answer the questions about the possibility of off-site coordination, but it might be an interesting data point. —Neotarf (talk) 20:42, 8 February 2016 (UTC)
- On what can be salvaged: I would hope the data on how many people reported each type of harassment would be recoverable. As I understood it, the online tools had some limitations, but all the survey data could be downloaded, a table with every answer by every participant.
- Judging by the raw data available, the specific values entered may not be so useful. Large standard deviations, like the ones mentioned in the previous section, indicate that most values are close to the bounds; for example, 720 answers of 0 and 20 answers of 80 would fit the revenge porn data. The average values for the "rare" types also seem too high. Hacking: 2.69 may seem low, but it represents a total of 2040 hacking incidents; doxxing: 4000+. The average for name calling is only 3 times that of doxxing.
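- (For what it's worth, that two-point example checks out; a minimal sketch, where the 720/20 split is the hypothetical distribution from the comment above, not the actual responses.)

<syntaxhighlight lang="python">
# Verify that 720 answers of 0 plus 20 answers of 80 roughly reproduce the
# reported revenge-porn statistics (mean 2.09, sd 12.73).
from statistics import mean, pstdev

values = [0] * 720 + [80] * 20   # hypothetical split from the comment above
print(f"mean = {mean(values):.2f}, sd = {pstdev(values):.2f}")
# -> mean = 2.16, sd = 12.97 -- close to the reported 2.09 and 12.73, so a
#    distribution with most values at the bounds is consistent with the data.
</syntaxhighlight>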
- My guess: the instruction "Drag the slider to select the number of times you experienced each type of harassment" was largely ignored for most types. I'm speculating -- I haven't seen the actual survey (users who did participate and who answered this question could shed more light on it) -- but suppose someone is reporting 1 case of revenge porn (or doxxing or hacking) and numerous cases of name calling, trolling, and vandalism. With a 0 to 100 range, a value of 1 would correspond to moving the slider 1%, hardly noticeable. Perhaps users indicated the importance of the different types of harassment encountered rather than the number of cases? And if the sliders also allowed decimal values, which the min and max values of 0.00 and 100.00 seem to indicate, selecting exactly 1.00 would be near impossible...
- In future surveys, it may be better to split the question up into several parts: one question about the types of harassment experienced, with checkboxes for selection, and for each type selected a separate question to indicate the frequency. Simply using input fields instead of sliders would also have avoided most or all of the problems. Prevalence (talk) 14:44, 6 February 2016 (UTC)
- No response from Support and Safety since the error was identified. If they stand by the results, all they have to do to silence the critics is publish the distribution of the responses received: the number of people who reported experiencing hacking 1 time, 2 times, etc. Considering that such a distribution having the average and standard deviation listed in the raw data would disprove the Bhatia–Davis inequality, I'm not holding my breath. Prevalence (talk) 01:06, 9 February 2016 (UTC)
- This is the most sensitive topic on which the WMF has ever attempted to acquire data. Before publishing results on a topic like this, it's advisable to look at the results. People with any concern at all about harassment can be expected to have some awareness of the usual problems, and of the rarer very serious problems. Those of us who do deal with those serious problems should have some understanding of their likely, or at least expected, nature and prevalence. Anyone should know that when results defy common sense, or even are very different from what was expected, they need to be checked. (Among the ways of doing so are looking at the statistical probability and checking the programming for errors, as reported above.) But even with a total lack of knowledge of either discipline -- of any aspect of survey design and analysis, or of any specific management or technical knowledge of any sort -- the impossibility is immediately obvious. The only inabilities that would explain it are the inability to read, or to think, or the lack of exercise of those faculties.
- Did nobody at the Foundation look at the results before publishing them? I think we really need a response from everybody who should have been involved. DGG (talk) 06:16, 10 February 2016 (UTC)
- Hi @DGG:, your comment goes too far, and is unfair to the people at the WMF who worked hard to start gathering information about the nature of harassment across the movement. No one is suggesting the survey represents an accurate picture of each type of harassment. But doing it this way made it possible to collect information from the targeted group of people that needs to be reached to understand more about those who are disturbed by harassment. A different type of survey that did not reach those people would not serve the foundational work that the WMF is attempting to do.
- They didn't sit on the results but released them in a timely way so the community could begin digesting them and forming next steps. --Sydney Poore/FloNight (talk) 20:57, 10 February 2016 (UTC)
- @FloNight: At issue is whether the percentages listed in the report (p.17) are correct, i.e. correspond to the answers given. The evidence suggests that this is not the case. Prevalence (talk) 10:35, 11 February 2016 (UTC)
- @FloNight: results at least an order of magnitude off are not worth releasing. If this were raw data, being published before analysis so people in the community could work on it, your comment would be correct. But this was an analyzed report, reduced to PowerPoint, and therefore meant for presentation. The question at the moment isn't whether it was a perfect survey--I agree that other approaches should be used also, for this type of survey has an inevitable selection bias. The issue is that the programming was so erroneous as to let the most critical values be 10 times the actual rate (based on the reconstruction above). It makes a great deal of difference knowing what we have to deal with: revenge porn & hacking so widespread that 2/3 of the users report them, or something much rarer. That a few % of users have these very disturbing problems indicates that we have a sufficiently real problem that we must find better ways to deal with it, but not one so enormous that we must give up hope of coping. Framing this in terms of whether it was "accurate" implies the ordinary need for corrections--but this was not just inaccurate; it was off-the-wall absurd. You suggest we "digest it". The proper next step with such garbage is: whatever you do, do not digest it. DGG (talk) 23:42, 12 February 2016 (UTC)
- DGG, FloNight is correct about our approach here. We did release the raw data with the report, and asked for feedback on our presentation - what could be improved, and any observed problems. The (former) slide 17 was indeed problematic, and we corrected the issue for the revised version. Patrick Earley (WMF) (talk) 02:10, 13 February 2016 (UTC)
- @FloNight: — Re: "No one is suggesting the survey represents an accurate representation of each type of harassment." — Of course they are! It's a published survey! These are the conclusions! Page 17 lists some forms of harassment and in very large, cheesy graphical form shows the percentage of the 38% of the respondents who are said to have experienced each of these forms of harassment. There is no retraction by the survey authors or by WMF. These are the conclusions being drawn from the (faulty) data set. Of course, to some, this is all about advancing an agenda rather than examining a problem. Various forms of data inflation, including selection bias and the incorporation of some of the common inconveniences of collective editing under the rubric of "harassment" to inflate the rate of incidence, serve these political purposes perfectly. The fact that pg. 17 is gibberish is irrelevant to them, perhaps, as it is illustrative of their axiom that there is a big, big, big problem out here and that money needs to be spent right now, dammit. Well, yeah, there is a problem, but the process of politicization and the sloppy approach to survey science doesn't expedite its examination and elimination in the least. This survey was flubbed. Learn from the mistakes, design another, and run it. WMF has time, personnel, and money to do the job right — there is no excuse for doing it wrong or producing an erroneous propaganda document for a political campaign. Carrite (talk) 16:21, 11 February 2016 (UTC)
- Carrite, I want to emphatically state that at no point in constructing this survey, or in its analysis, did we discuss or even consider how to make it exaggerate the scope of the problem. That is a pure bad faith assumption. Patrick Earley (WMF) (talk) 02:09, 13 February 2016 (UTC)
One of the things I have learned on WP is the immense benefit to a situation--and to oneself--of saying outright & unequivocally that one has made an error. To release an incomplete analysis that is badly needed can be helpful; to release one that is absurd in key derived numbers is an error that can even have the effect of destroying the credibility of those trying to solve the problem. To call an error mere incompleteness does not seem a forthright approach to the problem or the audience, but I understand the need to be tactful to one's colleagues. DGG (talk) 16:44, 13 February 2016 (UTC)
- There have been some questions here about the second version of our report, including why we made some of the changes we did. I’d like to address some of them.
- The issue with the content of slide 17 was that I (the author of the report) wanted to address a specific question through that slide (what types of harassment do Wikimedians experience?), but I made an error in my choice of data set to do it through. It’s not that the data used originally for that slide was inaccurate; rather, it was the context in which it was presented that was not the right one. This naturally caused confusion, and for that I must offer my apologies. I also need to mention here that the data was not just a single number corresponding to each type of harassment - the analysis was a lot more complex, especially when gender and other filters were applied. So, it’s more accurate to say that we’re talking about a set of data.
- Even though the report was reviewed extensively prior to its release, this was missed, and I’m grateful that it was picked up during the preliminary release of the report. Once this was brought up, I went back and reviewed the data with the help of colleagues with more expertise in data analysis than myself, which led to the updated slide that more accurately answers the question posed. The reason the data set used on slides 17 & 18 is different is that it is the data that provides the correct context for answering the question. I should point out that the entire slide was reviewed and corrected, not just the graph. Kalliope (WMF) (talk) 19:16, 17 February 2016 (UTC)
Preliminary report
Hello, all. As this page itself notes, the report as released late last month was preliminary. We knew that work was not complete on it, and rather than delay release we chose instead to release it as it was along with the raw data so that everyone could see what we were working on and with while it was further developed.
We anticipated feedback, and feedback is welcome - but I've received multiple notes from community members pointing out the deteriorating tone of this page as an example of the aggressive environment that makes volunteering here (in this movement, or in this area) so challenging. Questions and criticism are healthy, but in making them please try to be respectful. There's no scheming corporate machine at work here trying to politicize and monetize the movement. There's a small group of people who deal with harassment complaints almost every day who want to try to do something about it - a goal that many, many have supported.
In terms of some of the questions posed on this talk page, yes, we consulted a survey specialist. The Wikimedia Foundation employs one. We also consulted community and outside experts at the Berkman Center with whom Patrick has been collaborating on harassment and misogyny issues since June.
We have, as we said we would, continued working on this report with the feedback of many people, left here and sent privately. The updated report will be released soon. --Maggie Dennis (WMF) (talk) 15:24, 12 February 2016 (UTC)
Updated report
In response to community feedback and requests, as well as our own ongoing analysis and refinement, we have uploaded a revised and updated version of the survey report.
New analyses
We have improved how the data regarding gender and harassment are presented. This can be seen on Slide 19.
Revised visualizations
Several people wrote that they found (former) Slide 17 (“Forms of Harassment”) problematic or confusing. While response data obtained from each question can be presented in various ways, we agree that the data and graphs weren't as informative as they should have been. They’ve been revised. You can see the new presentation on Slide 18. We thank those who have engaged with the data and helped with improvements. Patrick Earley (WMF) (talk) 02:01, 13 February 2016 (UTC)
- Instead of correcting the percentages, and thereby admitting that mistakes were made, they have simply been removed from the report.
- Edit: removed some of my earlier comments.
- If you ignore what the data represents (averages) and view it as the number of respondents for each type, set 100% at 35 (or thereabouts), and you get a rough idea of the actual values (within a +-20% fault range). Trolling, vandalism, and discrimination are too high; doxxing and impersonation too low; hacking and revenge porn could be either (but revenge porn the higher of the two). (Based on the assumption that 715 people answered all questions, zeros included, and 500 skipped N/A questions, but most of this should apply regardless.) The decent thing would still be to admit the mistake and give the correct data. That would tell me the mistakes I made as well.
- The result seems to be a case of errors, bad design, and changed settings during the survey coming together in a happy coincidence. The high number of zero values makes the average fairly proportional to the number of people who gave non-zero values, assuming the non-zero values have the same average. For example, compare 100 zeros plus one 100 with 100 zeros plus ten 100s: the averages are 100/101 and 1000/110, a ratio of 0.1089, while the actual prevalence ratio is 0.1.
- This wouldn't work if people gave correct answers, because they would experience more incidents of vandalism and trolling than of hacking, doxxing, and revenge porn; but enough people gave unlikely high values for those (hacking: 14 people together reported more than 1000 hacking incidents). Sometimes you're just lucky, I guess... Prevalence (talk) 21:21, 15 February 2016 (UTC)
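- (The proportionality is easy to see numerically; a minimal sketch using the toy numbers from the comment above, none of which come from the actual survey data.)

<syntaxhighlight lang="python">
# When zeros dominate, the mean is nearly proportional to the number of
# people giving non-zero answers (toy numbers from the comment above).
from statistics import mean

a = [0] * 100 + [100] * 1    # 1 person answers 100, everyone else 0
b = [0] * 100 + [100] * 10   # 10 people answer 100, everyone else 0

print(f"ratio of means: {mean(a) / mean(b):.4f}")   # -> 0.1089
print(f"ratio of non-zero counts: {1 / 10}")        # -> 0.1
</syntaxhighlight>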
- @Prevalence:. I want to thank you again for engaging with this data and presentation. Your feedback has been helpful in identifying a problem area.
- With respect to why the content was removed, Kalli has posted a further explanation above.
- The original version is not deleted; it is available in the history and can be accessed through the "file history" section of the file page. We uploaded under the same file name because we wanted to avoid having people link to the wrong/outdated document. The problematic slide was replaced with a better representation. That was always the intention, which is why the summary indicated the original file would be updated within the week. (Although we didn’t quite make it “within the week” - almost two.) Best, Patrick Earley (WMF) (talk) 19:34, 17 February 2016 (UTC)
Page 17 is still quite confusing. What exactly is meant by an "average of 23.90 times"? 23.90 times in what period? Were the people surveyed asked to keep a diary or record of incidents? From what I can see this refers to question 6 here. It does not say what period is being considered. How can one compare the numbers of two persons, one who has been on Wikipedia for 10 months vs one who has been here for 10 years? The large standard deviations in the data pointed out by Prevalence seem to suggest that many people treated it like a binary variable, rather than a sliding scale. How would people be expected to remember 20 cases of name calling anyway?
The Pew survey simply asked a binary question: "have you experienced harassment of type X", together with follow-up questions about the most recent harassment episode, which makes much more sense to me. The graph on page 19 is weird. For instance, in the gender breakdown of the Pew survey, the binary variable shows men experiencing much more name calling than women, but here, with the corresponding variable counting the number of occurrences, it is reversed. Kingsindian (talk) 13:42, 18 February 2016 (UTC)
- Hello Kingsindian, the data set used for slides 17 & 18 was gathered through responses to question 6 of the survey, as indeed listed here and here.
- The questions of the survey are not referring to a specific timeframe. We consciously designed it this way for several reasons.
- First, harassment often lasts for longer than a year, which would be a typical timeframe to use here. Placing a limited timeframe would exclude responses from respondents whose experience of harassment lasted for more than a year.
- Second, there have been no major policy changes in the Wikimedia projects over the past several years that I am aware of, which would have been a natural cut-off point for defining a timeframe. So, we decided to allow respondents to tell us about all the harassment experiences they have had in, or because of, their participation in our projects, ever. I need to stress that one of the main focuses of the survey is the respondents' perception of their experiences (or not) of harassment. For this reason, we felt that a timeframe would be limiting.
- Third, this is the first time a survey of this type and magnitude has been conducted by the Foundation across all projects. Because it is the first of its kind, there is no established point of time-reference. On a side note, ideally we'd want to run this survey again in, say, 1-2 years' time and compare the results. When/if that happens, I completely agree with you that the question(s) should be posed for a specific timeframe - the amount of time elapsed from this survey to the next. Even more so if the Harassment project, which includes this survey as well as the Community consultations led by my colleague Patrick Earley, results in major changes in policy or in the ways harassment is addressed in the projects. Then we'd certainly have something to compare the results against, and the use of a timeframe would be crucial.
- We did not expect people to list the totals of their experiences with 100% accuracy. As you point out, people won't necessarily keep track of each incident, and we did not ask them to keep a diary. Though we felt it was reasonable to expect that one would know whether the harassment incidents they have been subjected to fall under a single-digit number, under the 'dozens' scale, or nearer the 'hundreds' scale. What we were hoping for here was a median number for each type of harassment that can be considered a rough scale / starting point. This is what the averages listed correspond to. Perhaps ranges or numbers would have worked better for this question. This, as well as further review of other surveys, is something we can definitely explore if this survey is run again in the future. Kalliope (WMF) (talk) 15:59, 18 February 2016 (UTC)
- @Kalliope (WMF): Thanks for the response. I do not have any experience in survey design, but as a layperson who has read some of the literature about surveys measuring poverty and unemployment, the recall period makes a lot of difference to the statistics. The Bureau of Labor Statistics in the US, for instance, chooses its recall period very carefully. In the case you mention about harassment lasting longer than a year (continuous I presume), it will still capture the incidents in a single year, which will provide a uniform foundation for comparison. Simply not having a recall period at all simply ignores the problem without eliminating it. Another problem I see with this kind of question is that if your expectation was that the numbers would only be rough, then probably one should not aggregate the numbers like this. Is it really meaningful to think that there is a difference between 26.48 and 23.90 or 17.02? One should break the data into bands, for instance, less than 5, less than 10, more than 10 and simply report the frequency. It would also be good to see the underlying data, about the distribution of responses. If, as Prevalence suggests, it is close to binary, then one can simply dispense with the number of occurrences. It is simply an artifact of the question. Kingsindian (talk) 19:25, 18 February 2016 (UTC)
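- (Banding the raw values as suggested above is straightforward; a minimal sketch, where the sample values are invented and the band edges are just the ones proposed in the previous comment, not anything from the published data dump.)

<syntaxhighlight lang="python">
# Report frequencies per band instead of averaging noisy slider values.
from collections import Counter

def band(n):
    """Map a raw incident count to a coarse band."""
    if n < 5:
        return "<5"
    if n < 10:
        return "5-9"
    return "10+"

responses = [0, 1, 2, 7, 20, 100, 3, 0, 15]   # invented per-respondent values
print(Counter(band(v) for v in responses))
# -> Counter({'<5': 5, '10+': 3, '5-9': 1})
</syntaxhighlight>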
- Without knowing the settings for the sliders, it's hard to say what the most likely cause is. Was "show values" on or off, were 0 or 2 decimals specified? The raw data gives min and max as 0.00 and 100.00, which suggests that 2 decimals were specified, but I would expect feedback on this page if it was so, unless the users didn't know because "show value" was off. But if users couldn't see what value they entered I would have expected feedback about that too.
- Could be trolls (someone entered "fuck you" as language, so...).
- Could be they entered importance rather than number of times, because of the visual effect (a value of 1 for hacking or revenge porn would seem insignificant compared to vandalism and name calling at 20 or so).
- Could be difficulty in setting the sliders, people on mobiles for example. And even on a normal screen, with a range of 100 it could be difficult to set the exact value. My mouse moves 2.5cm for the full width of the screen, and I use half the screen for my browser. The slider would be less than that, so moving my mouse less than 1/10 of a millimeter would change the value.
- With all types on the same page, entry fields would be better; but checking the options Qualtrics offers, specific number fields seem to be missing, and text fields would require more work (a validation formula, data downloading for analysis). Perhaps a drop-down list with a choice of ranges (1, less than 5, less than 10, 25, 50, 100, more than 100). Prevalence (talk) 09:49, 19 February 2016 (UTC)
The significance of the survey
So what does it all mean?
- The WMF discovered harassment on Wikipedia. Let's hope there will be no need for further surveys to discover the color of the sky or the religious affiliation of the pope.
- 3,845 Wikimedians participated in the survey. Of that number, 38% said they had been harassed. Of those who said they were harassed, 54% said they decreased their participation in the project as a result. So, does harassment drive people away? Looks like it.
- The Foundation delivered the results of the survey on the date promised, along with the data dump. Even though there were obvious initial questions about the results, it was clear from the "preliminary" label on the survey that they understood what they had, and there was no attempt to hide or whitewash anything. Compare this to the 2012 editor survey. It seems odd to thank someone for performing their job in a professional, competent, and transparent manner, but thank you, WMF.
- The Foundation has taken a hit lately in its relationship with the volunteer community, and not just with implementation of software products. There has been much skepticism surrounding the WMF commitment to the principles of privacy and non-discrimination and whether these sentiments are mere lip service or whether the Foundation intends to actually do anything about them. Much has yet to be seen, but this is a step in the right direction towards regaining the trust of the community, and in being willing to partner with the volunteer community in working towards mutual goals.
- Wikimedians will no longer speak out publicly about harassment. They will only do so in the context of an anonymous survey.
- The usual suspects (WP:BADSITES) showed up on this thread, having organized off-wiki, and predictably, once again made a collegial discussion impossible.
- The arbitration committee has frequently been in collusion with WP:BADSITES in enabling harassment. I can't really get a take on the individual arbitrators who commented here -- whether they believe the committee to be qualified to recognize and deal with harassment and discrimination, or whether they wish the Foundation to take a larger role. I looked at the case they cited and didn't recognize any discrimination/harassment issues, although it used to take me 12-15 hours to go through a case properly--time I don't have right now.
So what next? Does the survey go in a drawer, to be forgotten while the Safety team reorganizes itself, or will there be next steps? —Neotarf (talk) 20:23, 16 February 2016 (UTC)
- This is Meta, not the English Wikipedia, where "BADSITES" was a conclusively failed policy proposal. You were banned from that project for adopting a consistently hostile attitude to other contributors, so are hardly the person to be pontificating about collegial discussions. — Scott • talk 13:49, 17 February 2016 (UTC)
- Funny you should mention GGTF, Scott, because the publicity from that arbitration case was a huge embarrassment to both the Foundation and the arbitration committee. It's just possible that the new arbcom doesn't agree that anyone who has the common human decency to object to sexual harassment should just "keep a low profile" or be shown the door.
- This might be a good place to repost a link to the Q&A for Professor Citron's harassment speech. —Neotarf (talk) 22:57, 17 February 2016 (UTC)
New Article summarizing some of the problems this survey had
Please discuss. Pinguinn (talk) 17:03, 31 May 2016 (UTC)
new section
This survey or any future one will be forever meaningless unless those who have been driven from Wikipedia by harassment are sought out for their input..... i.e., those blocked on whatever arbitrary grounds fielded by admins (typically NPA and AGF from the start of their witchhunts). WP:EXR is one place to start to find them... and List of Missing Wikipedians... and of course those blocked by the confabulation of conduct "policies" which were all written in defiance of IAR (Ignore All Rules). In my experience, admins are the BIGGEST violators of AGF and NPA that there are... the Voice of the Blocked must be heard... but it won't be. 69.67.176.34 18:10, 1 June 2016 (UTC)
Harassment based on political ideas
I was surprised, reading the report, to see that 25% of those who self-reported having been harassed credited said harassment to "political ideas". I imagine it can be hard to parse out the cause of harassment sometimes ("harasser's insecurities" was not presented as an option), but still, the incidence of "political ideas" as the cause far exceeds that of age/ethnicity/gender. I'd be curious to know more about this subset of reporters and what exactly they were harassed about, whether the simpler idea of partisan divisions or something more complex like hounding after trying to address a source bias. The extra information would be useful, as multiple approaches will likely be necessary to address differences in individual causes. (not watching, please {{ping}} if needed) czar 23:29, 13 September 2017 (UTC)