Research:Effectiveness of the new participant pipeline for Wiki Loves campaigns

From Meta, a Wikimedia project coordination wiki
Tracked in Phabricator: Task T290387
Created: 06:13, 14 October 2021 (UTC)
Contact: no affiliation
Duration: 2021-October – 2022-August
This page documents a completed research project.


Many photo competitions and other engagement campaigns that aim for participation by a general audience use CentralNotice banners to recruit readers and invite them to take part in the competition or activity. To inform community discussions and decision-making, it would be helpful to better understand the effectiveness of CentralNotice banners and of the later parts of the engagement pipeline. CentralNotice banners in particular are increasingly regarded as valuable and scarce real estate, with multiple priorities competing for readers' attention. However, most of our understanding of banner effectiveness comes directly from fundraising.

In this project, we will aim to get a better understanding of the effectiveness of banners and landing pages in the context of trying to engage readers to contribute content. Specifically, we will limit ourselves to Wiki Loves style campaigns. This understanding should help organizers and the community to collaboratively make effective choices regarding banner usage, and help organizers to make effective choices in landing page design.

Methods

New contributors who join through Wiki Loves Monuments generally go through these steps:

  1. User sees banner
  2. User clicks on banner and goes to landing page
  3. User goes through a number of steps, specific to the country (e.g., identify a monument)
  4. User creates a Wikimedia account
  5. User uploads image through Wiki Loves campaign

The assumption is that these contributors can be identified after the fact through their uploads, which are labeled with a dedicated template, together with the upload date and the registration date. What we don't know is how many people could have participated: how many people did we lose in each of these steps?

There are two initial main questions that we will focus on:

  • For each country, in which 'phase' do we lose the most potential participants (funnel analysis)?
  • What 'diet' (a cap on the number of banner impressions shown to each reader) would be appropriate for future campaigns?

I welcome additional questions that could be answered in the same analysis process, given the required investment of effort.

Some relevant sub-questions that have been identified:

  • Given a unique landing page visitor, what is the first banner they click on (first view, second view, m-th view)?
    • If their first clicked banner is the m-th one they saw, what is the distribution of m? How many people clicked on the first banner, how many on the second, etc.?
  • Given a unique uploader (eventual success!) what was the last banner they clicked on?
    • How many uploaders made use of the 6th, 11th, 21st banner view? (the hypothesis based on fundraising insights is that only the first few banners are effective)
  • Given a banner click, how many people click on the banner a second time? Third time?
  • How do these numbers differ from mobile vs from desktop?
  • How do these numbers differ between countries?
  • Are certain landing page designs more effective than others?
  • What can we say about where users are coming from? (tourists vs residents)

These questions need further ironing out, based on what data is available.

Data

We will use different datasets to capture interaction with banners and the respective landing pages:

  • webrequest logs, which allow us to collect page views of landing pages and the referrers of the requests
  • Banner activity, which contains data about banner impressions, campaigns, status codes and other fields across all wikis and languages at minute-level resolution
  • EventLogging from CentralNoticeBannerHistory: a summary of recent items in the client-side log of CentralNotice banner events, for CentralNotice campaigns with the banner history feature enabled
  • (potentially others)


Tentative Timeline

  • Onboarding (1 month)
  • Identify relevant datasets for capturing main question (1 month)
  • Exploratory analysis (1 month)
  • Detailed analysis (3 months)


Outcomes

Banner performance on Wikimedia projects has historically been analyzed quite comprehensively by the Wikimedia Foundation for fundraising banners. However, most of the banners being shown are actually related to community activities. In this project, we took a first look at banner and pipeline performance for Wiki Loves Monuments (WLM), the annual photography competition. The competition is organized in a federated fashion, and each participating country has its own banners, landing pages and upload process. We analyzed web requests from readers that were shown a WLM banner during 24-hour time windows. We defined a 'reader' in this context by an approximate user ID, as a proxy for a visitor. For each reader, we noted how many banners they received and whether they visited a landing page, an upload page (a proxy for an attempt to upload an image) and/or an account creation page (a proxy for an attempt to create an account). We describe the methodology in more detail below.

We find that, of the readers that see a WLM banner, fewer than 1% (0.2-0.8%) visit the landing page within the same day. Of those readers that both saw a WLM banner and visited a WLM landing page, roughly 0.1-1% went on to visit an upload page or an account creation page.

We observed that visiting a WLM landing page is associated with a higher likelihood that the same reader visits an upload page and/or an account creation page.

One of the initial questions was whether showing additional banners beyond the first few is still effective. We find that about a third of the readers that see the WLM landing page have seen 5 or more WLM banners that day (this is a conservative estimate).

Data Collection

Between 24 September and 21 November 2021, we queried the webrequest logs of the Wikimedia servers using Spark. For readers that had seen at least one WLM banner on a given day, we extracted the log entries related to a banner impression, a visit to a landing page, a visit to an account-creation page or a visit to the file-upload page. The data was collected per UTC day: we created a daily approximate user ID for a reader by hashing the IP address and user agent. If the UTC day, user agent or IP address changed, this would therefore be considered a new 'reader'. These approximate user IDs (hereafter: readers) were used to aggregate data, but were not stored longer than necessary.
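The daily reader ID described above can be illustrated with a minimal Python sketch. This is not the project's actual Spark code: the function name `daily_reader_id`, the choice of SHA-256, the optional salt, the field separator and the inclusion of the UTC date in the hashed payload are all assumptions made for illustration.

```python
import hashlib
from datetime import datetime, timezone


def daily_reader_id(ip: str, user_agent: str, ts: datetime, salt: str = "") -> str:
    """Return an approximate per-day reader ID (hypothetical helper).

    The ID changes whenever the UTC day, the IP address or the user agent changes.
    `ts` is expected to be a timezone-aware datetime.
    """
    utc_day = ts.astimezone(timezone.utc).date().isoformat()
    # A separator keeps field boundaries unambiguous before hashing.
    payload = "\x1f".join([salt, utc_day, ip, user_agent]).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Illustrative values only: a documentation IP address and a made-up user agent.
ua = "Mozilla/5.0 (example)"
before_midnight = datetime(2021, 10, 8, 23, 59, tzinfo=timezone.utc)
after_midnight = datetime(2021, 10, 9, 0, 1, tzinfo=timezone.utc)

same_day = daily_reader_id("203.0.113.7", ua, before_midnight)
next_day = daily_reader_id("203.0.113.7", ua, after_midnight)

# The same visitor becomes a different 'reader' once UTC midnight passes.
print(same_day != next_day)
```

The example also shows the splitting effect of this definition: one browsing session that crosses UTC midnight is counted as two readers.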


From this data, we created five anonymized tables per day: banner impressions (in total 4.8B rows), landing page views (5.4M rows), visits to Create Account (0.5M rows), visits to Upload (90K rows) and aggregated data by reader (1.2B rows). The tables were anonymized and shuffled to only maintain timestamps at the date level.

Caveats

Data was not collected for the first 3.5 weeks of September 2021, which means that for most countries, data was collected only for the tail of the campaign. We were not able to analyze landing page visits for WLM campaigns with a landing page outside the Wikimedia projects.

There was a known data loss issue during the data collection, which means that absolute numbers of page loads are underestimates (by up to 15-21%, depending on the country). There is no reason to suspect that the loss was non-random with respect to which page loads were affected.

Readers are defined within a UTC day. This definition has some downsides: a reader is artificially split in two if their activity extends across UTC midnight. All percentages are therefore likely underestimates to some extent. Visits to account creation and upload pages do not directly match actual account creations and uploads. For example, actual account creations have been found to be about 25% higher than the webrequest logs suggest. It is unclear how stable this ratio is over time or how randomly the discrepancy is distributed.

Results

We were able to collect data with landing page information for the entire campaign for 11 countries (Algeria, Benin, Brazil, Malaysia, Moldova, Peru, Qatar, Slovenia, Uganda, United States and Zimbabwe) and for part of the campaign for 12 countries (Armenia, Croatia, France, Germany, Ghana, Ireland, India, Israel, North Macedonia, Sweden, Taiwan, Ukraine). In this set, the United States represented by far the most impressions, because the other large WLM campaigns were organized in September and we only collected partial data for those (such as France, Germany, India, Ukraine), or their landing page was hosted externally and they were therefore excluded from this analysis (such as the UK).

Conversion Rates

In our analysis, we assume a pipeline in which readers first see a banner, click on it to arrive at a landing page (showing interest in the competition), possibly create an account, go to the upload page (intending to upload an image) and then upload an image (we cannot identify successful uploads from a reader). For each step, we calculated the conversion rate for each country's banner. The conversion rate at each step is low (0.2-0.8% click on the banner, fewer than 1% go from the landing page to the upload page), but the differences between countries may offer some insights into the effectiveness of the upload pipeline per campaign.

We capture the effectiveness of banners in terms of how likely people are to click on them, i.e. a click-through rate. These rates are mostly within the same range, with a few outliers that have a much lower rate. These low outliers can be explained by encoding errors in the page title during data collection, or by the use of an external website to host the landing page. We see only small numbers of readers that have seen banners from multiple countries (WLM campaigns are country-specific by design). The upward outlier is this group of readers that saw multiple different banners; they were excluded from further analysis.

More interesting may be the effectiveness of landing pages. We attempt to estimate this in terms of the percentage of (attempted) uploaders: readers who, after seeing the landing page, continued to visit the upload page. There is a big difference between the countries with the highest and the lowest percentage of upload attempts, with the percentage ranging from 0.08% to 1.7%. While there are plenty of caveats about sample size, we noticed qualitatively that the countries with the lowest percentage often had an encyclopedic-looking page design, lacked visuals and/or lacked a clear single call to action. Every situation is unique, however, and national organizers can use this as a single data point. It should be noted that this only shows people who visited an upload page, and not actual uploads.
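As a rough illustration of the per-step conversion rates discussed above, the following Python sketch computes the rate between consecutive pipeline stages for each campaign. The function name `step_conversion_rates`, the campaign labels and all counts are invented for the example and are not figures from this study.

```python
def step_conversion_rates(stage_counts):
    """Return the conversion rate between each pair of consecutive stages.

    stage_counts is an ordered list of (stage_name, reader_count) tuples.
    """
    rates = {}
    for (prev_name, prev_n), (next_name, next_n) in zip(stage_counts, stage_counts[1:]):
        # An empty stage has no defined conversion rate.
        rate = next_n / prev_n if prev_n else float("nan")
        rates[f"{prev_name} -> {next_name}"] = rate
    return rates


# Invented counts for two hypothetical campaigns, NOT data from the analysis.
campaigns = {
    "country_a": [("banner", 1_000_000), ("landing", 5_000), ("upload", 50)],
    "country_b": [("banner", 400_000), ("landing", 1_200), ("upload", 3)],
}

rates_by_campaign = {
    name: step_conversion_rates(counts) for name, counts in campaigns.items()
}

for name, rates in rates_by_campaign.items():
    for step, rate in rates.items():
        print(f"{name}: {step}: {rate:.2%}")
```

Comparing the same step across campaigns, rather than comparing absolute counts, is what makes differences in landing page effectiveness visible despite very different numbers of impressions.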

We also detected some correlations with visiting the landing page (which one reaches through clicking on the banner):

  • The likelihood that a reader visiting a landing page then visits an account creation page is more than 25 times larger than for readers that did not visit a landing page. This means that people who are interested in participating in Wiki Loves Monuments are much more likely to try to create an account.
  • The likelihood that a reader visiting a landing page then visits an upload page is more than 100 times larger than for readers that did not visit a landing page. This means that people who are interested in participating in Wiki Loves Monuments are much more likely to try to upload an image.
  • Readers that visit an account creation page are more likely to try to visit an upload page if they visited an account creation page through Wiki Loves Monuments. The likelihood that a reader of an account creation page continues to visit an upload page is around 10 times higher if they also have seen a landing page in comparison to if they didn't.
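The "N times larger" comparisons above are ratios of two proportions. A minimal Python sketch of that calculation follows; the function name `likelihood_ratio` and the counts are hypothetical and chosen only to produce a round number.

```python
def likelihood_ratio(hits_with, total_with, hits_without, total_without):
    """Ratio of the visit probability in two groups of readers.

    'with' = readers that visited a landing page, 'without' = readers that did not.
    'hits' = readers in that group that went on to visit the target page.
    """
    if total_with <= 0 or total_without <= 0 or hits_without <= 0:
        raise ValueError("groups must be non-empty and the baseline rate non-zero")
    p_with = hits_with / total_with
    p_without = hits_without / total_without
    return p_with / p_without


# Invented counts, NOT data from the analysis:
# 50 of 5,000 landing page visitors reached an upload page (1%),
# versus 100 of 1,000,000 readers without a landing page visit (0.01%).
ratio = likelihood_ratio(50, 5_000, 100, 1_000_000)
print(f"{ratio:.0f}x more likely")
```

Such a ratio describes an association between the two groups; on its own it does not show that the landing page caused the later visit.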

Number of banners seen

We were able to determine how many banners a reader visiting a landing page, account creation page or upload page had seen prior to that page. This may be helpful for the community in their discussions about whether to limit the number of banners that people can see.

We found that, of the readers visiting a landing page, about half did so after seeing only 1-3 banners in the same 24-hour period. This percentage quickly decays as a function of the number of banners seen (from 7% with 4 banners to less than 2% for 9 banners). Nevertheless, about 24% of the readers visiting a landing page had seen 10 or more banners by the time they clicked on it. These numbers vary somewhat between banner campaigns, but the trend is always the same: 19-29% of the readers had seen 10 banners or more.

It should be noted that in the middle of the campaign, around October 8, a so-called "diet" of 3 banner impressions per reader per week was introduced in the US campaign. This provides a kind of natural experiment, which we can analyze to infer what the effect may have been.

First, when the diet was introduced, we indeed observe a steep drop of 63% in readers that saw a banner, of 46% in landing page visitors, of 20% in upload page visitors and of 24% in account-creation page visitors. If readers mostly clicked on one of their first few impressions, capping the later impressions should hardly reduce landing page visits, so this drop means that the theory does not entirely hold. The US is the only country with such a significant drop, which suggests that it is caused by the diet. There is a lot of noise in this data, though. To test whether the diet is indeed responsible for the drop, we created a synthetic control based on the 5 countries other than the US that had the most landing page visits, in order to see how the US campaign might have behaved without the introduction of the diet. This test confirms that the drop in landing page visits is indeed much bigger than would have been expected without the diet.

Second, we see a substantial drop in the proportion of readers with landing page visits that saw 10+ banners around the time the diet was introduced. However, we also observed similar drops in the campaigns of other countries, which we cannot explain. In addition, there is still a large number of readers with more banner impressions than could have been expected based on the diet (around 15% of the landing page visitors had seen 10+ banners), suggesting that a substantial portion of the readers were exposed to more banners than the diet should allow. This might be because of how the diet is enforced in practice in the browser versus how we constructed 'readers'. Therefore, for more robust conclusions with fewer caveats, a controlled trial would be needed, comparing the behavior of readers with and without a diet in the same region.
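The banner-count breakdown used in this section can be sketched as a simple bucketed share calculation. The helper name `share_by_bucket`, the bucket boundaries and the sample values below are assumptions for illustration; the sample is invented and does not reproduce the study's data.

```python
def share_by_bucket(banners_seen, buckets):
    """Share of readers per bucket of 'banners seen before the landing page visit'.

    banners_seen: one integer per reader (banners seen that UTC day before the visit).
    buckets: list of (label, low, high) with inclusive bounds; high=None means open-ended.
    """
    total = len(banners_seen)
    shares = {}
    for label, low, high in buckets:
        n = sum(1 for b in banners_seen if b >= low and (high is None or b <= high))
        shares[label] = n / total if total else 0.0
    return shares


# Hypothetical bucket boundaries mirroring the ranges discussed in the text.
BUCKETS = [("1-3", 1, 3), ("4-9", 4, 9), ("10+", 10, None)]

# Invented sample of eight landing page visitors, NOT data from the analysis.
sample = [1, 1, 2, 3, 5, 12, 10, 4]
shares = share_by_bucket(sample, BUCKETS)

for label, share in shares.items():
    print(f"{label} banners: {share:.0%} of landing page visitors")
```

Running the same calculation separately for the days before and after a diet is introduced gives the kind of before/after comparison of the 10+ bucket described above.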

Future Work

Following these results, a number of questions remain open that may benefit from future research.

  • Based on the results we see in the United States, we are forming a hypothesis that the use of diets results in a significant drop in contributions. However, from our data it was not possible to establish how large that effect would be. For that, a cleaner comparison would be needed, for example a randomized controlled trial, a more granular discontinuity analysis or a combination of both.
  • We collected some data that provided insights into how many banners readers had seen before they arrived on the landing page. However, this was rather limited by time (a UTC day) and we did not log how many banners readers had observed when they continued to the account creation or upload pages. Ideally we would like to understand how many banners a reader had observed in the time period just before visiting the landing page and then subsequently the upload page.
  • It would be helpful to better understand how often people visit the landing page before they continue to create an account or upload a file.
  • In order to better understand why people participate, or don't, a mini survey of people who visit the landing page may shed a lot of light.
  • In order to better understand the effectiveness of landing page designs, a more traditional usability study might be more effective as a first step, comparing different designs across different cultures.
