Are Multi-Armed Bandit Tests Superior To A/B Tests?


Multi-armed bandits seek to aggressively optimise content

What are multi-armed bandit tests?

Multi-armed bandit (MAB) tests use an algorithm to proactively seek out the best-performing experience. The algorithm aggressively optimises traffic allocation, which raises the average conversion rate during the test, so you can earn and learn simultaneously. Traffic is automatically shifted towards the variant with the highest observed conversion rate.

What is a bandit?

A bandit is another name for a slot machine. Imagine that you are in Vegas with a limited budget and limited time to play a selection of slot machines with different pay-outs. Multi-armed bandit (MAB) tests seek to maximise your winnings by working out which slot machine has the highest pay-out. They automatically adjust resources (i.e. traffic) to optimise revenues.

This is very different from A/B testing, where traffic is evenly split between the variants. For example, in a test with two variants (a control and a challenger), each receives 50% of all traffic from the beginning to the end of the test.

Multi-armed bandit tests work so that for 10% of the time the traffic is split equally between the two variants (the exploration phase). For the remaining 90% of the time, traffic is sent to the best-performing variant (the exploitation phase). Bandit tests also provide the option to weight traffic according to the expected value of each variant from the beginning of the test. This is a best-guess approach. For example, a three-variant experiment could start with a different traffic weight for each variant.
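
The 10% exploration and 90% exploitation split described above resembles a strategy usually called epsilon-greedy. The article does not name a specific algorithm, so the following is only a minimal sketch under that assumption. The function names, the default `epsilon=0.10` and the simulated conversion rates are all hypothetical and chosen for illustration.

```python
import random

def choose_variant(conversions, visitors, epsilon=0.10, rng=random):
    """Pick a variant index with an epsilon-greedy rule (assumed strategy)."""
    n = len(visitors)
    # Exploration: a random share of visitors is spread evenly over all variants.
    if rng.random() < epsilon:
        return rng.randrange(n)
    # Exploitation: send the visitor to the best observed conversion rate.
    # Variants with no visitors yet are treated as having a rate of 0.
    rates = [c / v if v else 0.0 for c, v in zip(conversions, visitors)]
    return max(range(n), key=lambda i: rates[i])

def simulate(true_rates, n_visitors=10_000, epsilon=0.10, seed=0):
    """Simulate a test; true_rates are made-up rates unknown to the algorithm."""
    rng = random.Random(seed)
    conversions = [0] * len(true_rates)
    visitors = [0] * len(true_rates)
    for _ in range(n_visitors):
        i = choose_variant(conversions, visitors, epsilon, rng)
        visitors[i] += 1
        if rng.random() < true_rates[i]:
            conversions[i] += 1
    return visitors, conversions

if __name__ == "__main__":
    visitors, conversions = simulate([0.04, 0.06])
    print("visitors per variant:", visitors)
    print("conversions per variant:", conversions)
```

Note that this sketch assumes each conversion is observed immediately after the variant is served, which is one of the assumptions discussed later in the article.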

What about statistical confidence?

Multi-armed bandit tests aggressively favour the best-performing variant, so the worst-performing variant receives little traffic outside the exploration phase. However, this reallocation will usually occur before full statistical confidence is obtained, so we may not be able to tell whether a variant is genuinely the worst performer or is just behind by chance. As a result, an MAB test requires a lot more traffic to reach full statistical confidence for poorly performing variants and thus takes longer to produce a conclusive result.
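
To see why a variant that receives little traffic takes longer to judge, it can help to look at how the uncertainty around a conversion rate shrinks with the number of visitors. The sketch below uses the standard normal-approximation confidence interval for a proportion. The visitor and conversion counts are hypothetical, and the function names are illustrative only.

```python
import math

def standard_error(conversions, visitors):
    """Standard error of an observed conversion rate (normal approximation)."""
    p = conversions / visitors
    return math.sqrt(p * (1 - p) / visitors)

def confidence_interval(conversions, visitors, z=1.96):
    """Approximate 95% confidence interval for the conversion rate."""
    p = conversions / visitors
    margin = z * standard_error(conversions, visitors)
    return p - margin, p + margin

if __name__ == "__main__":
    # Hypothetical counts: both variants show an observed 5% conversion rate,
    # but the favoured variant has received 19 times as much traffic.
    for name, conv, vis in [("favoured", 475, 9500), ("starved", 25, 500)]:
        low, high = confidence_interval(conv, vis)
        print(f"{name}: {low:.4f} to {high:.4f}")
```

With these made-up numbers the interval for the starved variant is roughly plus or minus 1.9 percentage points, against roughly plus or minus 0.4 for the favoured one, so far more time is needed before the weaker variant can be ruled out with confidence.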

What Assumptions Do Multi-Armed Bandit Tests Make?

Most MAB tests use algorithms that make a number of assumptions about conversion rates.

  • Serving a variant and observing a conversion happen instantaneously. This means that multi-armed bandit tests are not suitable for email marketing, or for any situation where there is a significant time-lag between a customer seeing a variant and the conversion occurring.
  • Conversion rates are fairly constant and don’t change significantly over time. If your conversion rate is subject to substantial fluctuations due to factors such as the weather or seasonality, then MABs may not be appropriate.
  • Samples in MABs are independent of each other, so one visitor’s behaviour doesn’t influence whether another visitor converts.

What Are The Benefits of Multi-Armed Bandit Tests?

Exploit winning variants:
  • MABs generally achieve a higher average conversion rate during the test period. They reduce the opportunity cost of testing by allowing a smooth transition from exploration to exploitation, which increases revenues.
Automate optimisation:
  • MABs allow you to automate the optimisation process with machine learning so that low performing variants can be dropped and traffic can be channelled towards the best revenue generating variant.
Continuous optimisation:
  • Where you are frequently adding or removing variants, MABs provide a flexibility that A/B testing is not designed for. If you want to add new variants to replace low performing experiences during the testing process, MABs facilitate this. They also work well for targeting specific ads or content to customer segments.
Innovation tests:
  • MABs perform best when there is a very large difference in the conversion rates of different variants. They are best suited to optimisation when you have radically different experiences, as in an innovation test, where you might expect to see big differences in the conversion rates of each variant.
Persuasive profiling:
  • MABs are suitable for persuasive profiling, where you identify what content works best for a particular personality trait.
Time is not a priority:
  • When you are not in any rush to identify the best performing variant and want to optimise the average conversion rate, MABs can be a suitable tool.
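
The claim that MABs achieve a higher average conversion rate during the test, and that the gain depends on how different the variants are, can be illustrated with simple arithmetic. The sketch below is an idealised calculation, not a simulation: it assumes the bandit favours the true winner from the very start, with 10% of traffic split equally for exploration. All conversion rates are hypothetical.

```python
def blended_rate(traffic_shares, conversion_rates):
    """Average conversion rate across variants, weighted by traffic share."""
    return sum(s * r for s, r in zip(traffic_shares, conversion_rates))

AB_SPLIT = [0.5, 0.5]        # A/B test: even split throughout
BANDIT_SPLIT = [0.05, 0.95]  # 10% explored equally, 90% sent to the winner

if __name__ == "__main__":
    # Hypothetical large difference: 4% versus 6% conversion.
    print(blended_rate(AB_SPLIT, [0.04, 0.06]))      # about 0.050
    print(blended_rate(BANDIT_SPLIT, [0.04, 0.06]))  # about 0.059

    # Hypothetical small difference: 5.0% versus 5.1% conversion.
    print(blended_rate(AB_SPLIT, [0.050, 0.051]))      # about 0.0505
    print(blended_rate(BANDIT_SPLIT, [0.050, 0.051]))  # about 0.05095
```

In the first case the bandit lifts the blended rate from about 5.0% to about 5.9%. In the second the gain is under 0.05 percentage points, which is why the benefit largely vanishes when variants perform similarly.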


What Are The Disadvantages of Multi-Armed Bandit Tests?

Traffic greedy:
  • MABs require more traffic and more time to reach full statistical confidence. If you are not concerned about the average conversion rate during the test and need a speedy but conclusive result, then A/B testing is probably the right methodology for you.
Needs large differences:
  • When there is little difference between the conversion rates of the variants, the benefit of multi-armed bandits disappears. This is a concern because we know from experience that it is almost impossible to predict how much of a difference a new design or heading will make to the conversion rate. The danger is that our own subjective opinions and biases come into play when we guess, which is exactly what experimentation is designed to avoid.
More room for error:
  • Because bandits begin switching traffic before full statistical confidence is reached, there is more danger that a variant that is performing better purely by chance will be selected as the winning experience. Conversely, a variant that initially performs poorly due to chance is more likely to be dropped by the algorithm, and revenues are lost as a result.
Implementation is not easy:
  • Setting up MABs is technically challenging. You may need a data scientist to advise on how to integrate and scale the code and a developer to program the test.
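
The "more room for error" point above can be explored with a small simulation. The sketch below estimates how often the truly worse of two variants is ahead on conversions after a given number of visitors each, which is the situation in which an early traffic switch would favour the wrong variant. The conversion rates, sample sizes and function name are hypothetical, and the result is only an estimate that varies with the random seed.

```python
import random

def wrong_leader_rate(rate_a, rate_b, visitors_per_variant, trials=2000, seed=0):
    """Estimate how often the truly worse variant has more conversions."""
    rng = random.Random(seed)
    worse, better = sorted([rate_a, rate_b])
    wrong = 0
    for _ in range(trials):
        # Simulate independent visitors for each variant.
        worse_conv = sum(rng.random() < worse for _ in range(visitors_per_variant))
        better_conv = sum(rng.random() < better for _ in range(visitors_per_variant))
        # Count only strict leads by the worse variant (ties are not counted).
        if worse_conv > better_conv:
            wrong += 1
    return wrong / trials

if __name__ == "__main__":
    # Hypothetical true rates of 5% and 6%, checked early and later in a test.
    for n in (100, 2000):
        share = wrong_leader_rate(0.05, 0.06, n, trials=500)
        print(f"{n} visitors per variant: worse variant ahead in {share:.0%} of runs")
```

The earlier the check, the more often chance puts the weaker variant in front, which is the risk an algorithm takes when it reallocates traffic quickly.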

When Should You Use Multi-Armed Bandit Tests?

  • When you want to simultaneously explore and exploit an optimisation opportunity.
  • Optimising radically different variants where there is a need to begin exploiting the best performing experience without delay.
  • Headlines and short-term campaigns, particularly if the content has a limited time span.
  • Automation for scale.
  • Targeting to understand how different customer segments respond to content.
  • Combining optimisation with attribution. By including a bandit algorithm on your website and in your call centre automated software you can seek to optimise across multiple touch points.


Multi-armed bandit tests are not an alternative to A/B testing, as the two are designed for different roles in the optimisation toolkit. A/B testing is excellent for conducting online experiments to identify the best performing variant with a high degree of statistical confidence. MABs are more suited to continuous optimisation and short-term campaigns where the objective is to achieve a high average conversion rate. Ideally you would use both A/B and MAB testing as part of a comprehensive optimisation programme.


Gambling icons created by justicon – Flaticon
