Community Safety/Methodology

Key Points

The survey is currently in the pilot stage (2022).

This means that the methodology is being updated based on what we learn from our first data collections. This is necessary because it is the first time we have conducted a survey about safety on so many Wikimedia projects. We realize that this is a difficult subject to be “testing”, but we must start somewhere. We value the feedback from participating spaces so that we can make the survey better and more useful over time. We especially thank Catalan Wikipedia users for being the first test, and for bearing with any mistakes we made.

This survey is based on simple random sampling by the QuickSurveys tool.

This means that the survey is shown to a percentage of users when they log in to a specific site (for example, fa.wikipedia.org). Every user who is logged in and has at least 5 edits is eligible to be selected randomly by the tool while the survey is active.

Since there is no way to ask every single user this question, random sampling is used to ensure that:
- The responses are not biased, because every eligible user who logs in to the site has an equal chance of being randomly selected to participate
- The responses are not skewed toward, for example, users who are most involved in communications channels
- The responses cannot be biased by users who have an interest in the results being one way or another
- We do not tire contributors by repeatedly asking them the same question
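
The eligibility rule and random selection described above can be illustrated with a minimal sketch. This is not the QuickSurveys implementation; the function names and the sample_rate parameter are hypothetical, and only the 5-edit threshold and the logged-in requirement come from this page.

```python
import random

MIN_EDITS = 5  # eligibility threshold stated in the text above


def is_eligible(logged_in: bool, edit_count: int) -> bool:
    """Return True if the user meets the stated eligibility rule."""
    return logged_in and edit_count >= MIN_EDITS


def should_show_survey(logged_in, edit_count, sample_rate, rng=random):
    """Decide whether to show the survey to one user on one visit.

    sample_rate is a hypothetical fraction between 0.0 and 1.0.
    """
    if not is_eligible(logged_in, edit_count):
        return False
    # rng.random() is uniform on [0, 1), so every eligible user has the
    # same probability (sample_rate) of being selected.
    return rng.random() < sample_rate
```

Because the decision depends only on a random draw, it cannot be influenced by how active a user is in communications channels or by what result a user would like to see. The sketch does not model remembering who has already been asked.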

Each survey will need to be active until it gains enough responses.

For most large Wikimedia projects, the number of responses needed to make a statistically valid assessment of the results with 99% confidence is between 260 and 274. For 95% confidence in the results, we need about 200 responses from each large project. These response numbers are based on surveys intended to measure change over time. They would be different if the survey were intended to take place only once.
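
This page does not state the margin of error or the formula behind these figures, so they cannot be reproduced here. As a general illustration of how the confidence level drives the number of responses, the following is a minimal sketch of the standard calculation for a survey that takes place only once (Cochran's formula with a finite population correction). The function name and all inputs are hypothetical, and the results are not expected to match the numbers above, which come from a design for measuring change over time.

```python
import math
from statistics import NormalDist


def required_responses(confidence, margin_of_error, population=None, proportion=0.5):
    """Estimate responses needed for a one-time survey of a proportion.

    confidence      -- e.g. 0.95 or 0.99
    margin_of_error -- e.g. 0.05 for plus or minus 5 percentage points (assumed input)
    population      -- number of eligible users, or None for a very large population
    proportion      -- expected share giving one answer; 0.5 is the most cautious choice
    """
    # Two-sided z-score for the requested confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Cochran's formula for a very large population.
    n = z**2 * proportion * (1 - proportion) / margin_of_error**2
    if population is not None:
        # Finite population correction: smaller projects need fewer responses.
        n = n / (1 + (n - 1) / population)
    return math.ceil(n)
```

For example, required_responses(0.95, 0.05) gives 385 for a very large population. Raising the confidence level or narrowing the margin of error increases the number of responses needed.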

The question using the words “unsafe” and “uncomfortable” is open to interpretation on purpose.

Even in one language and one geographic location, it is difficult to be sure that a question about feelings will mean the same thing to any two people. We believe it is better to risk over-measuring these concepts than to under-measure them. As the intent of the survey is to measure change over time, we should still be able to see whether there are differences in how users feel in different months. However, we still want to know about any language-specific variations in meaning that we should be aware of. We are always happy to take a better translation if one exists.

Methodology

This section will be updated with methods as we find out more from our pilot studies.

For any detailed questions before the reports become accessible, please post on our Talk page or write to surveys@wikimedia.org.