Minimizing Margin of Error for A/B Testing - 4 Easy Steps

Learn simple best practices for mitigating type I and II errors, cumulative alpha error, and sample pollution while A/B testing.


There’s always a risk that the data you’re reviewing for your A/B tests is not truly representative of the impact a change will have on your customers - that unfortunately comes with the territory. However, there are things you can do to reduce the margin of error and give your team more confidence that the results you’re seeing are legitimate. If you’re concerned about the validity of your A/B tests, or the effectiveness of your CRO plan in general, here are 4 questions to ask yourself.

Are you running multiple experiments at once?

Many companies choose to run lots of parallel tests to increase their experimentation velocity. This can be an effective strategy, especially if you are running experiments on different audience segments or funnels of your site that have very little crossover. However, by running tests simultaneously you do open yourself up to a higher risk of inaccurate results.

For example, let’s say you are running 3 simultaneous experiments: Experiment A on your homepage, Experiment B on the category page, and Experiment C on the product page. A potential customer lands on the homepage, moves to the category page, and finally reaches the product page, being served a variant from each test along the way. This user ends up purchasing something - triggering a revenue conversion. Now the question is: which test actually influenced their behavior, and did the experiments influence each other? This cross-pollination of users dramatically increases your chance of a Type I error, also known as a false positive. In A/B testing, that means concluding a variant won with statistical significance when the observed difference was really due to chance.
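
If you do want to keep running tests in parallel, one common way to limit cross-pollination is to make the experiments mutually exclusive, so each visitor is only ever entered into one of them. Here is a minimal sketch in Python, assuming you have a stable user ID to hash; the experiment names are made up for illustration and not tied to any particular testing tool:

```python
import hashlib

# Hypothetical experiment names - any stable identifiers would work.
EXPERIMENTS = ["homepage_hero", "category_layout", "product_cta"]

def assign_exclusive_experiment(user_id: str) -> str:
    """Deterministically place each user into exactly one experiment,
    so parallel tests cannot contaminate each other's results."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % len(EXPERIMENTS)
    return EXPERIMENTS[bucket]

# The same user always lands in the same (single) experiment.
print(assign_exclusive_experiment("user-12345"))
```

The trade-off is that each experiment now receives only a fraction of your traffic, which slows down how quickly each one reaches significance.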

How many variants are you running in each experiment?

One way to avoid simultaneous tests influencing each other in unpredictable combinations is to implement a multivariate test. A multivariate test allows you to change multiple elements on a page and test every possible combination to determine what works best - for example, testing both the color and the text of a CTA button by creating a variant for each possible combination.
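
As a simple illustration of how quickly those combinations multiply, here is a small sketch; the colors and button texts are made up:

```python
from itertools import product

# Hypothetical elements under test.
colors = ["green", "orange"]
texts = ["Buy now", "Add to cart"]

# Every combination of the elements becomes its own variant.
variants = list(product(colors, texts))
print(len(variants))  # 2 colors x 2 texts = 4 variants
for color, text in variants:
    print(f"{color} button labelled '{text}'")
```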

Multivariate tests can be great for determining how variables may be positively or negatively impacting each other. However, the more variants you run in a single experiment, the more you will be affected by cumulative alpha error. When a simple two-variant experiment reaches 95% statistical significance, there is roughly a 5% chance the result is a false positive - the odds are in your favor. But every variant you add introduces another comparison, and each comparison carries its own chance of a Type I error. Run an experiment with 10 variants, for example, and even at 95% significance per comparison there is roughly a 40% chance that at least one of them is a false positive.
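
The arithmetic behind cumulative alpha error is simple to sketch: with a 5% significance threshold per comparison, the chance of at least one false positive across k independent comparisons is 1 - 0.95^k. For example:

```python
def cumulative_alpha(comparisons: int, alpha: float = 0.05) -> float:
    """Probability of at least one false positive across independent comparisons."""
    return 1 - (1 - alpha) ** comparisons

for k in (1, 3, 5, 10):
    print(f"{k} comparisons -> {cumulative_alpha(k):.0%} chance of a false positive")
```

Treating each variant as an independent comparison is a simplification, but it shows how quickly the risk accumulates - at 10 comparisons the false-positive risk is already around 40%.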

This can be overcome, but it will require your test to run for significantly longer. Longer experiments reduce your experimentation velocity, so it’s important to weigh the value of multivariate or A/B/n tests against running a larger number of experiments with fewer variants.

Additionally, the more variants you run, the more likely your results will be skewed by sample pollution. Ton Wesseling describes this by saying, “When users return to an experiment, some of them will have deleted their cookies, some of them (more often a lot!) will use a different device. With 1 variation there is a 50% chance they end up in the same variation if they return in the experiment. If you have 3 variations, there is only a 25% chance they end up in the same variation. The more variations, the bigger the pollution.” 
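
To put numbers on that quote: if a returning visitor is effectively re-bucketed at random, the chance they land in the same arm they saw before is simply one divided by the total number of arms. A tiny sketch, assuming a uniform split between the control and the variations:

```python
def chance_same_arm(variations: int) -> float:
    """Chance a re-bucketed returning visitor sees the same arm as before
    (control plus the given number of variations, split uniformly)."""
    return 1 / (variations + 1)

for v in (1, 3, 9):
    print(f"{v} variation(s) -> {chance_same_arm(v):.0%} chance of consistent assignment")
```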

How long are you allowing each experiment to run?

Every company will have a different maximum velocity for experimentation depending on the number of visitors to its website. You should allow your experiments to reach statistical significance before confidently proclaiming a winning variant. Statistical significance describes how likely it is that the difference between your experiment's control version and the test version isn't due to error or random chance. Ideally, every experiment should reach a minimum of 95% confidence before you decide on a winner, and for that to happen you need enough traffic. Without a large enough sample size you run a higher risk of a Type II error (a false negative) - concluding there was no statistically significant improvement for any of the variants when there actually was. The most reliable way to reduce your chance of a Type II error is to increase your sample size by allowing your test to run for longer.
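
If you want a rough sense of how much traffic "long enough" means, the required sample size per variant can be estimated from your baseline conversion rate, the smallest lift you care about detecting, and your significance and power targets. Here is a minimal sketch using only the Python standard library; the 3% baseline and 10% relative lift are made-up numbers:

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_variant(p_base: float, lift: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate visitors needed per variant to detect a relative lift
    in conversion rate with a two-sided z-test."""
    p_var = p_base * (1 + lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for 95% confidence
    z_power = NormalDist().inv_cdf(power)          # ~0.84 for 80% power
    variance = p_base * (1 - p_base) + p_var * (1 - p_var)
    n = (z_alpha + z_power) ** 2 * variance / (p_base - p_var) ** 2
    return ceil(n)

# Hypothetical numbers: 3% baseline conversion, detecting a 10% relative lift.
print(sample_size_per_variant(0.03, 0.10))  # visitors needed in each variant
```

Dividing the result by the weekly traffic each variant receives gives a rough minimum run time, which you would then round up to whole weeks.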

It’s also a good idea to ask yourself if your customers’ behavior and purchasing patterns are different throughout the week. If so, you may want to consider allowing all of your experiments to run for at least one calendar week to account for differences in your audience and their activity on the site.

Are you running experiments without a clear strategy?

This one is less related to actual margins of error, but it’s still an incredibly important question for evaluating the overall effectiveness of your CRO efforts. By establishing an actual CRO strategy you can be sure that the hypotheses you’re testing align with your primary business goals. Plus, by organizing your CRO strategy with planning and roadmapping tools you can increase the velocity of your experimentation by keeping your team on the same page about what’s coming next. For more information on how to create a clear and effective CRO strategy - check out this article.

Conclusion

Ultimately, it’s entirely up to you and your business whether you want to run simultaneous experiments or test lots of variables at a time. There’s no single right answer in CRO - these are simply ways to mitigate risk when accuracy matters most to your team. Weigh your priorities and decide what balance of velocity and risk is right for you. But no matter what, it’s important to understand the risks and margins of error in A/B testing, how you can limit them, and ultimately how you can orchestrate a more effective CRO strategy that generates growth for your business.