Jump to content

Research and Decision Science/Data glossary/Retention Rate

From Meta, a Wikimedia project coordination wiki

This document was created as part of Signals and Data Services Objective 2 Key Result (KR) 2 from the Wikimedia Foundation's 2024-2025 Annual Plan. The KR focused on developing a set of product and experimentation-focused essential metrics. This document outlines frameworks for defining retention rates for both single cohort and multiple cohorts and provides relevant use cases. It  also specifies the instrumentation requirements for accurate measurement. Finally, this document details the  analysis methodologies for evaluating retention rates in A/B testing.

Glossary

[edit]
Unit
A single source of related interactions such as a session, user, or install.
Cohort
A set of units that share a common characteristic or experience within a specific timeframe.

Metric definitions and measurement

[edit]

We explored the retention rate metrics commonly used in the industry and examined the measurement use cases typically handled by the WMF product team. We then integrated this with our in-house data tracking methods on app users and proposed four definitions of retention rate: retention rate of one cohort, cumulative retention rate,  average retention rate of multiple cohorts, and retention rate with two return windows.  

Retention rate of one cohort

[edit]

In general, the retention rate measures the percentage of users who continue to use an app after their initial installation or interaction over a specific period. There are three major steps to calculate it: 1) measure the number of units in the cohort, 2) measure the number of returning units in the cohort over a given period, and 3) calculate the proportion. In the first two steps, specific details need to be defined based on your analysis needs and project requirements.

Step1: Measure the cohort size

When measuring the cohort size, there are two factors to define based on your application: the initial action and the cohort window.

Initial action: this could be an event such as app installation or the first interaction with the app.

Cohort window: this refers to the time period during which you want to sample the units that performed the initial action and group them into one cohort. The cohort window could be as short as a day or a few days, or it could be longer, like 1 week, 2 weeks, 1 month, etc. The length of the window depends on the unit’s behavior patterns and the purpose of the measurement.

Cohort Window Case Example

For shorter-term measurements, such as daily monitoring, the most commonly used cohort window is one day. For medium-term measurement, such as A/B testing (which typically runs 2-4 weeks in WMF), the ideal cohort timing window could be a few days or a couple of weeks to ensure sufficient data for return measurement, which is the numerator in the equation.  For longer-term measurements, such as high-level metrics, the cohort timing window can be one month or even several months. Since some unit identifiers may expire or be hashed after a certain period, they should also be examined when selecting the window.

To measure the cohort size, count the total number of unique units that performed the initial action during the cohort window. Here are a few examples.

  • If app installation is the initial action and 1 day is the cohort window, the cohort is counted as the number of units that installed the app on a specific day.
  • If app opening is the initial action and 1 week is the cohort window, the cohort is counted as the number of unique units that opened the app during a specific week.
  • If app engagement is the initial action and 1 month is the cohort window, the cohort is counted as the number of unique units that engaged in the app in the given month.

In the example shown above, User 1,2,3,4 are included in the cohort, while User 5 is not included because they did not take initial action within the cohort window.

Step 2: Measure the returned units in cohort over a given time period

When measuring the returned units, there are five factors to to define based on your application: unit, return action and its frequency, retention length and return window

Units: these units should belong to the cohort you selected in step1.

Return action: This is an interaction with the app that indicates users have reused it. Typically, this action is the same as the initial action.

Frequency of the return action: this refers to the number of return actions the unit must take to be considered retained.

Retention length: This refers to the duration over which you want to assess retention. Depending on varying measurement interests, retention length can range from hours to days, weeks, or even months.

Return window: This refers to the time frame during which the user’s return actions are measured. For the retention rate of one cohort, the return window is defined with the same length for all units. While the length of the return window is consistent, the start date may vary based on when the user’s initial action occurred, and the end day may differ due to these varying start dates. With the return window defined in this way,  each unit has the same amount of time to complete their return action. This method is often used to measure N-th day/week retention rates.

Here are a few examples.

  • If the retention length is 1 day and the return window lasts for 1 day, the return window starts 1 day after the unit’s initial action, covering the entire 2nd day. In the example shown below, User 2 and User 3 are considered retained because they took actions within their return windows.
Return Window Case Example 1
  • If the retention length is 7 days and the return window lasts for 1 day, the return window starts 7 days after the unit’s initial action, covering the entire 8th day. In the example shown below, User 1 is considered retained because they took actions within their return windows.
    Return Window Case Example 2
  • If the retention length is 7 days and the return window lasts for 3 days, the return window starts 7 days after the unit’s initial action, covering the period between 8th day and 10th day. In the example shown below, User 1, User 2  are considered retained because they took actions within their return windows.
    Return Window Case Example 3
  • If the retention length is 2 weeks and the return window lasts for 1 week, the return window starts two weeks after the unit’s initial action, covering the entire 3rd week.

We suggest using this definition for A/B testing, as it offers a more accurate and equitable return status for all units. But it increases query complexity and demands more data over a longer period.

For example, during a 3-week A/B test period, from August 1 to August 21. If we choose a cohort that made the initial action between August 1 and August 7, the retention length is 7 days, and the length of return windows is also 7 days,  then return window varies as follows:

  • For a unit that took the initial action on August 1, the return window is from August 8 to August 14.
  • For a unit that took the  initial action on August 7, the return window is from August 15 to August 21.

This means that for a 1-week retention rate, you need to collect data for 2 weeks, and for a 2-week retention rate, you need to collect data for 3 weeks. Please keep this in mind when planning your A/B testing.

With the above defined elements, you can measure the returned units by counting the total number of unique units that performed the return action during their own return windows.

Step 3: Calculate the proportion

Using the results from the previous two steps, you can calculate the retention rate by dividing the number of returned units by the number of units in cohort at the start. For example, the retention rate might be the percentage of units that opened the app on one day and then used it again the next day, known as the second-day retention rate. Another example would be the percentage of units that opened or engaged with the app on August 1st and then reopened or engaged with it in the second week– this is known as the second-week retention rate. In summary, if you are interested in the Nth-week retention rate, it is defined as the percentage of units that opened or engaged with the app in week 1 and then reopened or engaged with it in the Nth week.

Data aggregation

[edit]

This is the query framework. Analyst needs to define the query conditions for 1) cohort window, 2) initial action, 3) return action, 4) return window, 5) return frequency.

Cumulative retention rate of one cohort

[edit]

In real projects, analysts often face limited data points, short experiment durations, and a small number of returned units. To include as many returning units as possible, analysts might  loosen the time criteria and set a return window as long as possible, covering all potential time periods and ending at the same time for all units, usually the last day of the available data.  It’s often referred to as the cumulative retention rate.

The definition of cumulative retention rate follows the same framework mentioned in ‘ Retention rate of one cohort’, except for how the return window is defined. The return window is set to end at the same time for all units. In this scenario, the start date may vary based on when the user’s initial action occurred, but the end date is the same for everyone. This implies that units that initiate their actions later have less time to complete their return action.

Here is an example of a 10-day cumulative retention rate. The return window starts on the 2nd day of the unit’s initial action and ends on day 10. As long as the unit completes the return action by the end of day 10, it is counted as retained.

Cumulative Retention Rate Case Example

Data aggregation

[edit]

The query framework is the same as the one used for the retention rate of one cohort. The analyst needs to define the query conditions for the following: 1) cohort window, 2) initial action, 3) return action, 4) return window, and 5) return frequency. The major difference lies in how the return window is defined: the return window ends on the same day, rather than shifting over a period of the same length. For more details, refer to the example below.

Average retention rate of multiple cohorts

[edit]

In some cases, you might want to measure the retention rate across multiple cohorts, for purposes like daily monitoring and baseline collection. To do this, you can calculate the average retention rate for the multiple cohorts, which may be categorized based on different initial action windows or other criteria, such as by wikis. It is important to use the same definition for calculating the retention rate across all cohorts before averaging them. For example,  If you are interested in the average Nth-week retention rate, make sure to use the same retention length (N weeks) for all cohorts before averaging the results.


Retention rate of one cohort:

Data aggregation

[edit]

This is the query framework. Analyst needs to define the query conditions for 1) cohort window, 2) initial action, 3) return action, 4) return window, 5) return frequency.

Retention rate with two return windows

[edit]

In the interest of user stickiness, analysts may want to measure retention with two return windows. Specifically, you might want to know the percentage of units that returned during window 1 and  then also returned during window 2. To do this, most steps are the same as calculating the retention rate for one cohort. However, when calculating the number of returned units, you need to add the condition that users must complete return actions in both windows.

Here is an example of a 7-day retention rate with 2 return windows. To be counted as retained, the unit needs to complete a return action on the second day and another between the 8th and 10th days.

Two Return Windows Case Example

Data aggregation

[edit]

The query framework is the same as the one used for the retention rate of one cohort. The analyst needs to define the query conditions for the following: 1) cohort window, 2) initial action, 3) return action, 4) return window, and 5) return frequency. The major difference lies in the number of return windows defined. You define two return windows instead of one. For more details, refer to the example below.

Data Collection (Instrumentation)

[edit]

To measure retention rate, instrumentation must log unit identifiers, like user ID, app installation ID, session ID, and so on. As some types of the IDs may have a limited lifespan for privacy or other reasons and could be hashed or regenerated for individual users after a certain period, this needs to be carefully planned along with defining the cohort window and return window.

Instrumentation must also log the actions taken and their timestamps.

For the purpose of A/B testing, a column indicating the experiment assignment is also needed.

Statistical analysis

[edit]

The recommendations of methods for calculating confidence intervals for baselines and analyzing A/B tests are documented at https://people.wikimedia.org/~jiawang/notes/05_demo_AB_test_analysis.html

Repo:https://gitlab.wikimedia.org/jiawang/sds2_2024_retention_rate/