
Longitudinal Experimentation for Personalization: Understanding and Acting on Dual-Layered Impacts

For Personalization, Why Longitudinal Experimentation over A/B Testing?

Personalization is the art and science of optimizing individual experiences. By harnessing data, advanced machine learning algorithms, and business insights, a well-designed personalization system can deliver tailored experiences through the most suitable channels at the right moment. Personalizing across many dimensions is hard: each marketing lever can have many variations, which together produce millions of possible combinations from which to build the best experience for each individual. Such a solution does not come out of the box.

While product recommendations can be based to some extent on historical data (except for new products), responses to offer incentives cannot rely on past data alone, especially when the incentives are personalized and continually optimized. The key to finding the best response is experimentation. But experimenting with so many permutations, while still managing the overall customer experience, can be both time-consuming and expensive once computation resources, FTE time, and the opportunity cost of holding out control groups are taken into account.

Simple A/B testing frameworks are effective when there is limited piloting activity and the customer experience is relatively straightforward. However, when optimizing multiple correlated marketing levers, simple A/B testing quickly runs out of statistical power: each added variant splits the audience into smaller groups that may be too small to reach significance, and isolated tests miss the broader picture of the cumulative personalization impact. In these cases, longitudinal experimentation is an efficient approach to understanding and acting on the “dual-layered personalization impact” to maximize customer experience and value delivery.
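The statistical-significance constraint on simple A/B testing can be made concrete with a standard sample-size approximation for comparing two conversion rates. The sketch below assumes a two-sided, two-proportion z-test; the function names are hypothetical, and the 2% baseline conversion rate, 5% relative lift, and 1,000,000-customer population in the usage note are illustrative numbers, not figures from this article.

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_per_arm(p_base: float, lift: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate per-arm sample size for a two-sided, two-proportion z-test.

    p_base: baseline conversion rate; lift: relative lift to detect (0.05 = +5%).
    """
    p_test = p_base * (1 + lift)
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_b = NormalDist().inv_cdf(power)          # quantile for the desired power
    p_bar = (p_base + p_test) / 2
    numerator = (z_a * sqrt(2 * p_bar * (1 - p_bar))
                 + z_b * sqrt(p_base * (1 - p_base) + p_test * (1 - p_test))) ** 2
    return ceil(numerator / (p_test - p_base) ** 2)

def max_arms(population: int, p_base: float, lift: float) -> int:
    """How many equally sized arms (control included) a fixed population can power."""
    return population // n_per_arm(p_base, lift)
```

With those illustrative inputs, `n_per_arm(0.02, 0.05)` comes to roughly 315,000 customers per arm, so `max_arms(1_000_000, 0.02, 0.05)` returns 3: one control and only two variants before the population is exhausted, far short of the combinations a multi-lever program would want to test.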

Understanding the Dual-Layered Personalization Impact

Dual-layered personalization impact refers to the effectiveness of:

  1. Individual test-and-learn pilots
  2. The overall program, which may consist of hundreds of pilots combined over time

We need to move beyond simple A/B testing to understand the dual-layered impact of personalization while maintaining a consistent customer experience over a long period of time. A longitudinal experimentation framework establishes a global test-and-control structure to measure impact at the program level, plus a set of individual pilot-level tests and controls to understand the impact of specific strategies. The global test group receives personalized interventions guided by a learning agenda, while the global control group serves as a benchmark and receives business-as-usual (BAU) experiences. Individual pilot-level tests can then be carved out within the global test group. Pilot-level controls can be drawn from the global test group for simplicity, or built as synthetic controls from the global control group to avoid diluting the measured impact.
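The nested structure described above can be sketched with deterministic hashing: one hash splits customers into global test and global control, a second slices the global test group into non-overlapping pilots, and a third splits each pilot into a pilot test and a local pilot control drawn from the global test (the simpler of the two options mentioned above). The function names, salts, and pilot shares are hypothetical, and hashing is one common way to get stable assignments, not necessarily the approach the authors use.

```python
from __future__ import annotations
import hashlib

def bucket(customer_id: str, salt: str) -> float:
    """Stable pseudo-random number in [0, 1) per customer and salt (48 bits of SHA-256)."""
    digest = hashlib.sha256(f"{salt}:{customer_id}".encode()).digest()
    return int.from_bytes(digest[:6], "big") / 2**48

def assign(customer_id: str, global_test_share: float,
           pilots: list[tuple[str, float, float]]) -> tuple[str, str | None, str | None]:
    """Return (global_group, pilot_name, pilot_group) for one customer.

    pilots: hypothetical (name, share_of_global_test, local_control_share) tuples.
    """
    if sum(share for _, share, _ in pilots) > 1:
        raise ValueError("pilot shares of the global test group must sum to at most 1")
    # Layer 1: program-level split; global control gets BAU and is never piloted.
    if bucket(customer_id, "global") >= global_test_share:
        return "global_control", None, None
    # Layer 2: slice the global test group into non-overlapping pilot ranges.
    u = bucket(customer_id, "pilot-slice")
    start = 0.0
    for name, share, control_share in pilots:
        if start <= u < start + share:
            # Layer 3: local pilot control carved out of the global test group.
            arm = "pilot_control" if bucket(customer_id, name) < control_share else "pilot_test"
            return "global_test", name, arm
        start += share
    return "global_test", None, None  # personalized, but not in any active pilot
```

Using a separate salt per layer keeps the layers independent, so changing a pilot's split does not reshuffle the program-level groups.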

There are two main approaches to creating test and control groups: stratified sampling and synthetic control. Stratified sampling ensures comparability by selecting groups that are balanced on a small set of key performance indicators (KPIs), such as last-30-day engagement and purchase history. This method is straightforward and provides clear measurements at both the program and pilot levels. However, stratified sampling carves local controls out of the global test group (e.g., the pilot control groups in the chart above), which incurs a high opportunity cost and dilutes program-level impact. In contrast, the synthetic control approach builds each pilot-level control as a weighted combination of customers from the global control group. This minimizes the control population and reduces impact dilution, though the iterative weight calculations require significant computational resources.
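A minimal sketch of stratified sampling, assuming each customer record carries hypothetical `sessions_30d` and `purchases_90d` fields: customers are grouped into strata by coarse KPI buckets, and each stratum is shuffled and split with the same test share, so test and control end up with matching KPI mixes. The thresholds and field names are illustrative only.

```python
import random
from collections import defaultdict

def strata_key(customer: dict) -> tuple[str, bool]:
    """Hypothetical strata: engagement tier from last-30-day sessions, plus buyer flag."""
    engagement = "high" if customer["sessions_30d"] >= 5 else "low"
    return engagement, customer["purchases_90d"] > 0

def stratified_split(customers, key, test_share=0.5, seed=42):
    """Split customer ids into test/control with the same test share in every stratum."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    strata = defaultdict(list)
    for c in customers:
        strata[key(c)].append(c["id"])
    test, control = [], []
    for ids in strata.values():
        rng.shuffle(ids)
        k = round(len(ids) * test_share)
        test.extend(ids[:k])
        control.extend(ids[k:])
    return test, control
```

Only a handful of KPIs can be stratified on before strata become too small, which is why this method relies on a small set of key indicators.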

In summary, choose synthetic control when you need to minimize control size, when avoiding impact dilution is crucial, and when you have the necessary computational resources. Opt for stratified sampling when ease of implementation and clean measurement are the priorities.
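The synthetic control approach can be sketched as a constrained least-squares problem: find non-negative weights that sum to one over "donor" units from the global control group so that their weighted pre-period KPIs match those of a pilot test group. This is a simplified form of the classic synthetic control objective (it omits the predictor-importance weighting), solved here with projected gradient descent; the repeated iterations are where the computational cost comes from. Treating donors as customer segments, the feature layout, and the iteration count are assumptions for illustration.

```python
import numpy as np

def project_to_simplex(v: np.ndarray) -> np.ndarray:
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u * idx > (css - 1))[0][-1]
    theta = (css[rho] - 1) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def synthetic_control_weights(X0: np.ndarray, x1: np.ndarray, iters: int = 5000) -> np.ndarray:
    """Weights w on the simplex minimizing 0.5 * ||X0 @ w - x1||^2.

    X0: (n_features, n_donors) pre-period KPIs of hypothetical donor segments
        drawn from the global control group.
    x1: (n_features,) the same KPIs for the pilot test group.
    """
    n_donors = X0.shape[1]
    w = np.full(n_donors, 1.0 / n_donors)  # start from equal weights
    # Step size 1/L, where L (largest singular value squared) bounds the gradient's slope.
    step = 1.0 / (np.linalg.norm(X0, 2) ** 2 + 1e-12)
    for _ in range(iters):
        grad = X0.T @ (X0 @ w - x1)
        w = project_to_simplex(w - step * grad)
    return w
```

The pilot's synthetic control outcome is then the same weighted combination of the donors' post-period outcomes, so no additional customers need to be held out of the global test group.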

Furthermore, longitudinal experimentation enables regular, systematic population refreshes that accommodate new and engaged customers and provide the most relevant experiences along their journeys. For example, in a personalized promotion campaign based on a predicted product-affinity score, newly eligible customers may be assigned to either the pilot test or control group every week, while customers with recent purchases “graduate” from the test group and stop receiving promotions. Designing the audience refresh requires close collaboration between the data science and marketing teams. Key decisions include defining the eligibility and experience of the global control group, determining the audience size, and estimating the time needed to achieve statistical significance.
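The weekly refresh in the promotion example can be sketched as two steps: graduate recent purchasers out of the test group, then assign newly eligible customers who have never been enrolled. The class and function names are hypothetical, eligibility (for instance, an affinity score above some threshold) is assumed to be computed upstream, and `assign_to_test` stands in for whatever assignment rule the team chooses.

```python
from dataclasses import dataclass, field

@dataclass
class PilotAudience:
    """Hypothetical container for one pilot's audience across weekly refreshes."""
    test: set[str] = field(default_factory=set)
    control: set[str] = field(default_factory=set)
    graduated: set[str] = field(default_factory=set)

def weekly_refresh(aud: PilotAudience, eligible: set[str],
                   recent_purchasers: set[str], assign_to_test) -> PilotAudience:
    """Graduate recent purchasers from test, then enroll never-seen eligible customers."""
    # Graduation: test customers with a recent purchase stop receiving promotions.
    grads = aud.test & recent_purchasers
    aud.test -= grads
    aud.graduated |= grads
    # Enrollment: only customers not already enrolled or graduated get an assignment.
    seen = aud.test | aud.control | aud.graduated
    for cid in sorted(eligible - seen):
        (aud.test if assign_to_test(cid) else aud.control).add(cid)
    return aud
```

Whether control-group purchasers should also be retired, so that both groups age the same way, is one of the design decisions this sketch deliberately leaves open.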

Acting on the Dual-Layered Personalization Impact

  1. On the individual test-and-learn pilot level: Know when to stop testing and move on to the next experiment
  2. On the overall program level: Know when to increase the size of the global test population and when it should eventually become the new BAU

As noted earlier, experimentation is time-consuming and carries an opportunity cost (holding individuals out of marketing). With a longitudinal experimentation framework that measures the dual-layered personalization impact, however, the personalization system can also act efficiently on the insights, orchestrating the piloting and scaling process to maximize the financial impact of personalization. Value-driven orchestration involves prioritizing which tactics to test and in what order, and adjusting dynamically based on real-time data and insights.

On the individual test-and-learn pilot level, if measurement results show sustained positive impact (e.g., pilots 1, 3, and 4 in the chart above), the longitudinal experimentation framework may remove the need for further experiments before scaling that impact, freeing up capacity for new experiments.
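The article does not define "sustained positive impact," so the sketch below assumes one hypothetical rule: the lower bound of a confidence interval on the lift in cumulative conversion rate (pilot test minus pilot control) stays above zero for `k` consecutive weekly readouts. The names, the normal approximation, and the default `k=3` are assumptions. Checking a test every week inflates the false-positive rate, so a production system might prefer a formal sequential-testing method.

```python
from math import sqrt
from statistics import NormalDist

def lift_ci(conv_t: int, n_t: int, conv_c: int, n_c: int, alpha: float = 0.05):
    """Absolute conversion-rate lift (test - control) with a normal-approximation CI."""
    p_t, p_c = conv_t / n_t, conv_c / n_c
    se = sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    diff = p_t - p_c
    return diff - z * se, diff + z * se

def sustained_win(weekly, k: int = 3, alpha: float = 0.05) -> bool:
    """True if the CI lower bound was above zero in each of the last k weekly readouts.

    weekly: list of cumulative (conv_t, n_t, conv_c, n_c) tuples, one per week.
    """
    if len(weekly) < k:
        return False
    return all(lift_ci(*week, alpha=alpha)[0] > 0 for week in weekly[-k:])
```

A pilot that passes the rule can be promoted to the global test experience, and its pilot slice released for the next experiment in the learning agenda.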

On the overall program level, as the personalization program identifies more successful pilots and a positive collective impact, the longitudinal experimentation framework expands the program by increasing the global test group (e.g., from 50% to 80% of the population) to scale the personalization impact. Down the road, the global test group effectively becomes the new BAU group, the starting point for another round of program-level optimization.
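One way to expand the global test group without disrupting customers already receiving personalization is to give each customer a stable random draw and place them in the global test group whenever the draw falls below the current test share. Raising the share from 50% to 80% then only moves customers from control into test, never the reverse. The function names and the hashing scheme are hypothetical choices for this sketch.

```python
import hashlib

def unit_draw(customer_id: str, salt: str = "global") -> float:
    """Stable pseudo-random number in [0, 1) per customer (48 bits of SHA-256)."""
    digest = hashlib.sha256(f"{salt}:{customer_id}".encode()).digest()
    return int.from_bytes(digest[:6], "big") / 2**48

def in_global_test(customer_id: str, test_share: float) -> bool:
    """A customer is in the global test group when their draw falls below the share."""
    return unit_draw(customer_id) < test_share

# Ramping from 50% to 80%: everyone already in test stays in test.
ids = [f"c{i}" for i in range(10_000)]
before = {c for c in ids if in_global_test(c, 0.5)}
after = {c for c in ids if in_global_test(c, 0.8)}
newly_added = after - before  # hypothetical cohort worth tracking separately after the ramp
```

Because newly added customers join with less exposure to personalization than long-standing test members, analyzing them as a separate cohort after the ramp may keep program-level measurement clean.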

Building a Foundation for the Future

Longitudinal experimentation is an efficient approach to understanding and acting on the “dual-layered personalization impact” to maximize customer experience and value delivery. By acting on insights and orchestrating the piloting and scaling process, the personalization system can prioritize successful tactics and adjust strategies dynamically based on real-time data and insights. This supports both individual test-and-learn pilots and program-level optimization, ultimately expanding and enhancing personalized customer experiences. But building longitudinal experiments from scratch is not a simple undertaking; complexities such as cold starts, customer dropouts, and overlapping strategies may arise. Marketing and data science teams can best address these complexities by working together to evaluate and customize the experiment design so that it achieves its full potential.