Global Reach/Mexico Survey/Documentation

Overview

Mexico phone survey 2016

In the spring of 2016, the WMF partnered with Votomobile and conducted a phone survey to learn more about technology and Wikipedia use in Mexico.

The 19 questions in the survey covered:

Internet use
Mobile phone use (smartphones & basic voice/SMS phones)
Awareness and use of Wikipedia
General demographics

This was a large-scale interactive voice response (IVR) phone survey that gathered over 2600 completed responses from randomly generated phone numbers across all of Mexico. A voice survey was chosen so that respondents without internet access could be included. This approach allowed us to measure internet and smartphone penetration and to answer other Wikipedia-related questions. The scale and methodology of the survey also kept the margin of error low (under 2%) for questions asked of all respondents.

Questions this survey was designed to answer

What is the actual number of people who use the internet?

Real-world behavior makes this difficult to measure from industry reports, since people might have access to the internet through school, friends, internet cafés, public Wifi, etc.

What do people mostly use the internet for?
How many people use smartphones?
Do people with smartphones access the internet only over Wifi, only over cellular service, or both?
How many people thought they didn’t use the internet, but do use Facebook or WhatsApp?
How many people have heard of Wikipedia? What do they use it for? How often?
If they have heard of Wikipedia, but weren’t using it, why not?

Goal: Represent the population of Mexico

To get the most representative data possible, we worked with Votomobile to conduct an IVR phone survey. A phone survey can reach nearly the full spectrum of age, gender, geography, income and education levels. In Mexico, the survey covered both landlines and mobile phones.

Another important part of this goal was making sure that the survey did not concentrate on just one geographic area. To do this, we estimated the approximate population of each state and set representative target percentages for the number of responses needed from each state. Although Votomobile did not precisely match the per-state targets, the responses still closely covered the full geography of Mexico, so the country's different regions are represented in the final results.
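The page does not say exactly how the per-state targets were computed. One common way to turn population shares into whole-number response targets is the largest-remainder method, sketched below. The function name `allocate_targets` and the populations are hypothetical placeholders, not real census figures or the survey's actual targets.

```python
import math


def allocate_targets(populations: dict[str, int], total: int) -> dict[str, int]:
    """Split `total` completed responses across states in proportion to
    population, using the largest-remainder method so the counts sum to `total`."""
    pop_sum = sum(populations.values())
    quotas = {s: total * p / pop_sum for s, p in populations.items()}
    targets = {s: math.floor(q) for s, q in quotas.items()}
    leftover = total - sum(targets.values())
    # Give the responses lost to rounding down to the states with the
    # largest fractional remainders.
    by_remainder = sorted(quotas, key=lambda s: quotas[s] - targets[s], reverse=True)
    for s in by_remainder[:leftover]:
        targets[s] += 1
    return targets


# Hypothetical populations for illustration only; not real census figures.
populations = {"State A": 5_000_000, "State B": 3_000_000, "State C": 2_000_000}
print(allocate_targets(populations, 2600))
# → {'State A': 1300, 'State B': 780, 'State C': 520}
```

Rounding every quota down and then handing out the remainder keeps the targets summing exactly to the overall sample size.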

For statistical validity, the phone numbers were randomly generated within each region's area code(s). With 2600 completed responses, the survey is large enough that questions asked of all respondents have a margin of error within 2% at a 95% confidence level.
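The 2% figure can be checked with the standard normal-approximation margin of error for a proportion, z * sqrt(p * (1 - p) / n), using the worst case p = 0.5 and z = 1.96 for 95% confidence. This sketch assumes simple random sampling and ignores design effects and any weighting, so it is a rough check rather than a reproduction of the survey's own calculation.

```python
import math


def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Half-width of the normal-approximation confidence interval for a
    proportion p estimated from n simple random responses.
    p = 0.5 gives the largest (worst-case) margin; z = 1.96 is the 95% level."""
    return z * math.sqrt(p * (1 - p) / n)


print(f"{margin_of_error(2600):.2%}")  # → 1.92%
print(f"{margin_of_error(500):.2%}")   # → 4.38%
```

The second line shows why the under-2% claim applies only to questions asked of everyone: a question that reaches only a few hundred respondents has a noticeably wider margin.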

The survey was recorded in both Spanish and English, but over 99% of respondents chose the Spanish version of the survey.

Addressing Bias

One issue with phone surveys is that some respondents tend to favor the first response offered. To address this, most questions presented their responses in a random order on each call. This spreads any bias evenly across the responses instead of concentrating it on one. Questions with a 'none of these' or 'other' response always kept that option last.
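The page does not describe how Votomobile implemented the randomization. Below is a minimal sketch of the rule as described, assuming each question's options are a list of text labels; the function name `order_options` and the example labels are hypothetical.

```python
import random

# Options that always stay at the end, matched case-insensitively.
PINNED_LAST = {"none of these", "other"}


def order_options(options: list[str], rng: random.Random) -> list[str]:
    """Shuffle answer options for a single call, keeping any pinned option
    ('none of these' / 'other') at the end in its original relative order."""
    movable = [o for o in options if o.lower() not in PINNED_LAST]
    pinned = [o for o in options if o.lower() in PINNED_LAST]
    rng.shuffle(movable)
    return movable + pinned


# Hypothetical option labels for an "internet use" question.
options = ["News", "Email", "Social media", "Other"]
rng = random.Random()
print(order_options(options, rng))  # order varies per call; 'Other' is always last
```

Passing in a `random.Random` instance lets a test seed the generator and get a reproducible order.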

A few questions, however, have responses with a natural order and would be confusing if presented in a completely random order. For instance, when asking how often respondents use Wikipedia, an unordered sequence such as “once a week”, “once a month”, “once a day” would not make sense. For these questions, the responses were randomly presented in one of two orders: either lowest to highest or highest to lowest.
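For ordered scales, the rule above amounts to a coin flip between the natural order and its reverse. A minimal sketch, with a hypothetical function name and hypothetical frequency labels:

```python
import random


def order_ordinal(options: list[str], rng: random.Random) -> list[str]:
    """Present an ordered scale either as given (e.g. lowest to highest
    frequency) or fully reversed, each with probability 1/2 per call."""
    return list(options) if rng.random() < 0.5 else list(reversed(options))


# Hypothetical labels, listed from least to most frequent.
FREQUENCY = ["once a month", "once a week", "once a day"]
print(order_ordinal(FREQUENCY, random.Random()))
```

Unlike a full shuffle, this keeps the scale readable while splitting any first-position bias between the two ends of the scale.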

Where to get the data

This page shows graphs of the responses received for each question in the survey.

The full data set can be found at:

Dan Foy (2016). Mexico phone survey 2016. figshare. doi:10.6084/m9.figshare.4287683

This is the canonical version of the data, including a CSV file that contains every answer from each of the 2600 responses.

The full text of the questions can be found here.

Using the data

The format of the CSV is straightforward: each row represents one survey response, and each column contains the response to the associated question.
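A minimal sketch of loading the file with Python's standard `csv` module. The column names and rows below are a made-up excerpt, not the real data; check the header row of the downloaded CSV for the actual column names. For the real file, open it with `newline=""` (as the `csv` documentation recommends) and the correct encoding, assumed here to be UTF-8.

```python
import csv
import io

# Hypothetical excerpt: the real file's headers and values will differ.
SAMPLE = """response_id,state,uses_internet,has_smartphone
1,Jalisco,yes,yes
2,Oaxaca,no,no
3,Ciudad de México,yes,no
"""


def load_responses(f) -> list[dict[str, str]]:
    """Return one dict per row (one survey response), keyed by column header."""
    return list(csv.DictReader(f))


responses = load_responses(io.StringIO(SAMPLE))
print(len(responses))                 # → 3
print(responses[0]["uses_internet"])  # → yes
```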

It’s important to note that this survey is not linear: depending on how a question is answered, the flow of the rest of the survey may change. For example, if a respondent says they do not have a smartphone, the smartphone-related questions are skipped. You can review the flow diagram to see how the survey progresses.
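Because of this branching, percentages for a question should be computed only over respondents who were actually asked it. The sketch below assumes, without confirmation from this page, that skipped questions appear as empty cells in the CSV; the function name `share` and the column names are hypothetical.

```python
def share(responses: list[dict[str, str]], column: str, value: str) -> float:
    """Fraction of respondents who answered `value` to `column`, counting only
    respondents who were asked the question (assumed: skipped = empty cell)."""
    asked = [r for r in responses if r.get(column, "").strip() != ""]
    if not asked:
        return float("nan")
    return sum(r[column] == value for r in asked) / len(asked)


# Hypothetical rows: respondents without a smartphone skip the follow-up.
responses = [
    {"has_smartphone": "yes", "smartphone_access": "wifi only"},
    {"has_smartphone": "yes", "smartphone_access": "both"},
    {"has_smartphone": "no", "smartphone_access": ""},
    {"has_smartphone": "no", "smartphone_access": ""},
]
print(share(responses, "smartphone_access", "both"))  # → 0.5, not 0.25
```

Dividing by all four rows would understate the result by counting people who were never asked.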

The questions asking whether the respondent uses Facebook or WhatsApp were only asked if they had previously said that they do not use the internet. This is by design: we wanted to use these questions to gauge how many people did not realize that services such as Facebook are part of the internet. The responses to these two questions were not intended to measure overall use of Facebook or WhatsApp.
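Since only self-described non-users were asked these questions, the meaningful figure is a share of that subgroup. A minimal sketch, with a hypothetical function name and hypothetical column names and values:

```python
def hidden_internet_users(responses: list[dict[str, str]]) -> float:
    """Among respondents who said they do not use the internet, return the
    fraction who nonetheless said they use Facebook or WhatsApp."""
    non_users = [r for r in responses if r["uses_internet"] == "no"]
    if not non_users:
        return float("nan")
    hidden = [r for r in non_users
              if "yes" in (r.get("uses_facebook"), r.get("uses_whatsapp"))]
    return len(hidden) / len(non_users)


# Hypothetical rows; internet users are never asked, so their cells are empty.
responses = [
    {"uses_internet": "yes", "uses_facebook": "", "uses_whatsapp": ""},
    {"uses_internet": "no", "uses_facebook": "yes", "uses_whatsapp": "no"},
    {"uses_internet": "no", "uses_facebook": "no", "uses_whatsapp": "no"},
    {"uses_internet": "no", "uses_facebook": "no", "uses_whatsapp": "yes"},
]
print(hidden_internet_users(responses))
```

The denominator is deliberately the non-user subgroup, not the full sample, which is why the result cannot be read as overall Facebook or WhatsApp usage.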

Even though respondents were not asked where they are located, each response includes a state/region. This is derived from the area code of the number that was called. For the urban/rural distinction, respondents decided for themselves which category applied to them.
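A minimal sketch for tabulating responses by the area-code-derived state or by the self-reported urban/rural category. The function name `tally` and the column names `state` and `area_type` are hypothetical. Keep in mind that the state reflects the area code that was dialed, which for a mobile number may not be where the respondent actually was at the time of the call.

```python
from collections import Counter


def tally(responses: list[dict[str, str]], column: str) -> Counter:
    """Count responses per value of `column`, e.g. the area-code-derived
    state or the self-reported urban/rural category."""
    return Counter(r[column] for r in responses)


# Hypothetical rows for illustration.
responses = [
    {"state": "Jalisco", "area_type": "urban"},
    {"state": "Jalisco", "area_type": "rural"},
    {"state": "Oaxaca", "area_type": "rural"},
]
print(tally(responses, "state"))      # → Counter({'Jalisco': 2, 'Oaxaca': 1})
print(tally(responses, "area_type"))  # → Counter({'rural': 2, 'urban': 1})
```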

External links

Mexico survey data, figshare