The null hypothesis in a statistical test is normally the default position that there is no relationship or significant difference between two sets of observations. If a test shows a significant difference between two experiences, the null hypothesis is said to be rejected. In A/B testing, a test set up in this way is referred to as a superiority test. The aim is to avoid making the mistake of implementing something that turns out to be no better than the existing experience.
It is important to define the null hypothesis before you begin an A/B test because it can influence the outcome of an experiment. For example, for a simple and low-risk test like changing the colour of a call-to-action button, it may not be necessary to prove the change is significantly better than the default variant. The business may be happy for it to be implemented provided it is not significantly worse than the current experience. They may simply want to change the colour to be consistent with brand guidelines or a new design template.
Types of Null Hypothesis:
There are four different types of null hypothesis:
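- Superiority test – the conventional hypothesis where the null hypothesis is that the difference between the variant and the default is zero.
- Non-inferiority test – where the null hypothesis is that the new experience is worse than the default experience by more than a preset margin. Rejecting it shows that the new experience is not meaningfully worse than the default.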
- Strong superiority test – when the null hypothesis is only rejected if the variant is significantly better than the default by a preset margin.
- Equivalence test – often used in medical trials, this aims to show that the difference between the default and the variant is too small to matter (clinically unimportant, in a medical setting).
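The four types can be written side by side as hypotheses about a single quantity. The notation below is an assumption made for illustration and is not taken from any particular tool: d is the true difference in conversion rate (variant minus default) and δ > 0 is the preset margin. The superiority test is shown in its one-sided form.

```latex
\[
\begin{array}{lll}
\text{Superiority}        & H_0: d \le 0         & H_1: d > 0 \\
\text{Non-inferiority}    & H_0: d \le -\delta   & H_1: d > -\delta \\
\text{Strong superiority} & H_0: d \le \delta    & H_1: d > \delta \\
\text{Equivalence}        & H_0: |d| \ge \delta  & H_1: |d| < \delta
\end{array}
\]
```

In each case the change is supported only when the null hypothesis is rejected, so moving the threshold from 0 to −δ makes the test easier to pass, and moving it to +δ makes it harder.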
When to Use the Different Types of Hypothesis:
The type of hypothesis should be chosen on the merits of each individual test. The superiority test should be considered where:
- The test is considered high risk and a small decline in conversion could significantly impact the North Star metric.
- Implementation is costly and difficult to change.
- Maintenance and ongoing costs are high.
A non-inferiority test is more appropriate when:
- Design and IT implementation costs are very low.
- It is very easy and quick to roll back changes.
- The change is unlikely to have any significant impact on conversions or revenues.
- Maintenance costs are low.
A strong superiority test may be needed when:
- Senior stakeholders are dead against the idea and need convincing that the change will definitely improve conversions by a large margin.
- Very high risk part of the user journey, such as the checkout or payment gateway.
- Rolling back the change would take time and be very costly.
- Extremely high maintenance costs.
An equivalence test is normally conducted for pharmaceuticals to demonstrate that the response to two or more drugs differs by an amount which is clinically unimportant. For A/B testing, an equivalence test could be considered when we want confirmation that a cosmetic change to the look and feel of the website has no significant impact on our North Star metric.
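One common way to run an equivalence test is the two one-sided tests (TOST) procedure. The following is a minimal sketch, assuming conversion rates are compared with a normal approximation. The function names, the visitor and conversion counts, and the margin of one percentage point are all hypothetical and chosen only for illustration.

```python
import math


def upper_tail(z):
    """Probability that a standard normal variable exceeds z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def equivalence_p_value(conv_a, n_a, conv_b, n_b, margin):
    """TOST p-value for H0: |p_b - p_a| >= margin (normal approximation)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    diff = p_b - p_a
    # Unpooled standard error of the difference between two proportions.
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    # First one-sided test: is the difference clearly above -margin?
    p_lower = upper_tail((diff + margin) / se)
    # Second one-sided test: is the difference clearly below +margin?
    p_upper = upper_tail((margin - diff) / se)
    # Equivalence is supported only if both one-sided tests reject.
    return max(p_lower, p_upper)


# Hypothetical data: default converts 500 of 10,000, variant 505 of 10,000.
p_equiv = equivalence_p_value(500, 10_000, 505, 10_000, margin=0.01)
print(f"TOST p-value: {p_equiv:.4f}")
print("Equivalent within margin" if p_equiv < 0.05 else "Equivalence not shown")
```

Note that a non-significant superiority test is not the same as showing equivalence. The margin has to be agreed before the test starts, otherwise it can be tuned after the fact to produce the desired answer.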
In statistical tests such as z-tests and t-tests, we use a test statistic such as the z score to help calculate the margin of error of an experiment. The z score tells us how many standard errors the observed difference lies from the value stated in the null hypothesis. The larger the z score, the less likely it is that a difference of this size would occur if the null hypothesis were true, and so the stronger the case for rejecting the null hypothesis. Google Optimize uses Bayesian statistics instead to determine the outcome of experiments.
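As a rough illustration of how a z score is computed for two conversion rates, and how the choice of null hypothesis changes the result, here is a minimal sketch using a one-sided two-proportion z-test. The function name `z_test_with_margin`, the visitor and conversion counts, and the margin of half a percentage point are assumptions made for this example only.

```python
import math


def z_test_with_margin(conv_a, n_a, conv_b, n_b, margin=0.0):
    """One-sided z-test of H0: p_b - p_a <= margin against H1: p_b - p_a > margin.

    margin = 0  gives a superiority test,
    margin < 0  gives a non-inferiority test,
    margin > 0  gives a strong superiority test.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    diff = p_b - p_a
    # Unpooled standard error of the difference between two proportions.
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    # Number of standard errors between the observed difference and the margin.
    z = (diff - margin) / se
    # Upper-tail probability of the standard normal distribution.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value


# Hypothetical data: default converts 500 of 10,000, variant 560 of 10,000.
z_sup, p_sup = z_test_with_margin(500, 10_000, 560, 10_000)
z_ni, p_ni = z_test_with_margin(500, 10_000, 560, 10_000, margin=-0.005)
z_ss, p_ss = z_test_with_margin(500, 10_000, 560, 10_000, margin=0.005)

print(f"Superiority:        z = {z_sup:.2f}, p = {p_sup:.4f}")
print(f"Non-inferiority:    z = {z_ni:.2f}, p = {p_ni:.4f}")
print(f"Strong superiority: z = {z_ss:.2f}, p = {p_ss:.4f}")
```

With the same data, the non-inferiority test gives the smallest p-value and the strong superiority test the largest, which is why the hypothesis needs to be fixed before the experiment begins.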