Although the mere mention of the word ‘statistics’ might make your skin crawl, statistics are an essential tool of A/B testing. We created a simple cheat sheet to help you navigate the basic statistical terminology of A/B testing.
1 – Populations and samples
In statistics, we want to find results that apply to an entire population of people or things. However, in most cases, it is impossible to collect data from the entire population. This is why we collect data from a small subset of the population, called a sample.
2 – Mean
The mean is the simple mathematical average of a set of numbers. If you, for example, sell lamps and your customers’ purchases are as follows:
- Customer A: purchased 2 lamps
- Customer B: 2 lamps
- Customer C: 3 lamps
- Customer D: 2 lamps
You would calculate the mean by adding the values (lamps purchased) and dividing that sum by the number of values measured (number of customers): (2+2+3+2)/4 = 2.25.
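If you prefer to see this in code, here is the same calculation as a tiny Python snippet (the numbers are simply the lamp purchases from the example above):

```python
# Mean: add up the values and divide by how many values there are.
purchases = [2, 2, 3, 2]  # lamps bought by customers A, B, C and D

mean = sum(purchases) / len(purchases)
print(mean)  # 2.25
```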
3 – Standard deviation vs. standard error
These two metrics are very often mixed up. In short, standard deviation is about the spread of your data, and standard error is about the precision of your sample.
The standard deviation is a measure of how well your mean represents your data. It indicates how much individual values differ from the mean value of the group. Conversely, the standard error tells you how precisely your sample mean estimates the true population mean.
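As a rough illustration in Python (reusing the lamp purchases from above): the standard deviation describes how spread out the values themselves are, while the standard error is that spread divided by the square root of the sample size:

```python
import math
import statistics

purchases = [2, 2, 3, 2]  # the sample from the lamp example above

# Standard deviation: how much individual values spread around the sample mean.
std_dev = statistics.stdev(purchases)          # 0.5 for this sample

# Standard error: how precisely the sample mean estimates the population mean.
std_err = std_dev / math.sqrt(len(purchases))  # 0.25 for this sample

print(std_dev, std_err)
```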
4 – Confidence intervals
Confidence intervals are boundaries within which we believe the true population value (for example, the true mean) lies. Usually, we look at 95% confidence intervals and sometimes 99%. A 95% confidence interval means that if we repeated the experiment many times and computed an interval each time, we would expect about 95% of those intervals to contain the true value.
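As a sketch, here is how a 95% confidence interval for a mean could be computed with the normal approximation. The purchase data is made up, and the 1.96 multiplier is the value that corresponds to 95% confidence (for very small samples a t-distribution multiplier would be more appropriate):

```python
import math
import statistics

sample = [2, 2, 3, 2, 1, 4, 2, 3, 2, 2]  # hypothetical purchase data

mean = statistics.mean(sample)
std_err = statistics.stdev(sample) / math.sqrt(len(sample))

# 95% confidence interval under the normal approximation.
z = 1.96
lower, upper = mean - z * std_err, mean + z * std_err
print(f"mean = {mean:.2f}, 95% CI = [{lower:.2f}, {upper:.2f}]")
```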
5 – Null hypothesis vs. experimental hypothesis
An experimental hypothesis is the prediction that your experimental manipulation will have some effect or that certain variables will relate to each other. In contrast, the null hypothesis is the prediction that you are wrong and that the predicted effect doesn’t exist. In other words, the experimental hypothesis is something a researcher tries to prove whereas the null hypothesis is something the researcher tries to disprove.
6 – Type I and Type II errors
A Type I error, also called a false positive, occurs when we believe that there is an effect in our population when in fact there isn’t. The opposite is a Type II error, a false negative, which occurs when we believe that there is no effect in the population when in reality there is.
7 – p-value
The p-value is the probability of observing an outcome at least as extreme as the one observed in the test, assuming that the null hypothesis is true. The smaller the p-value, the stronger the evidence against the null hypothesis, and the more confident we can be in rejecting it.
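To make this concrete, here is a sketch of how a p-value could be computed for an A/B test on conversion rates, using a standard two-proportion z-test; all visitor and conversion numbers are made up for illustration:

```python
import math
from statistics import NormalDist

# Hypothetical A/B test results (all numbers made up).
conversions_a, visitors_a = 200, 5000   # control
conversions_b, visitors_b = 240, 5000   # variation

p_a = conversions_a / visitors_a
p_b = conversions_b / visitors_b

# Pooled conversion rate, assuming the null hypothesis (no real difference) is true.
p_pool = (conversions_a + conversions_b) / (visitors_a + visitors_b)
std_err = math.sqrt(p_pool * (1 - p_pool) * (1 / visitors_a + 1 / visitors_b))

z = (p_b - p_a) / std_err

# Two-sided p-value: probability of a result at least this extreme under the null.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```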
8 – Statistical significance
The result is statistically significant when the p-value is smaller than the significance level. The significance level (𝛂) is the probability of rejecting the null hypothesis when it is actually true. In other words, it is the probability of wrongly rejecting the null hypothesis. For example, a significance level of 0.05 indicates a 5% risk of drawing the wrong conclusion: rejecting the null hypothesis when, in reality, it is true.
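One way to build intuition for that 5% risk is a small simulation: run many A/A ‘tests’ in which both variations share the same true conversion rate (so the null hypothesis is true by construction) and count how often the p-value still drops below 0.05. A rough Python sketch with made-up numbers:

```python
import math
import random
from statistics import NormalDist

random.seed(1)

alpha = 0.05      # significance level
runs = 1000       # number of simulated A/A tests
n = 2000          # visitors per variation
rate = 0.04       # the SAME true conversion rate for A and B, so the null is true

false_positives = 0
for _ in range(runs):
    conv_a = sum(random.random() < rate for _ in range(n))
    conv_b = sum(random.random() < rate for _ in range(n))
    p_pool = (conv_a + conv_b) / (2 * n)
    std_err = math.sqrt(p_pool * (1 - p_pool) * (2 / n))
    z = (conv_b / n - conv_a / n) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    false_positives += p_value < alpha

print(f"Observed false positive rate: {false_positives / runs:.3f}")  # should hover around 0.05
```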
9 – Statistical power
Statistical power is the probability of observing a statistically significant effect if there really is one. In other words, it is the chance that your test will detect a difference between test variations, provided that difference actually exists.
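As a rough sketch of how power could be estimated before running a test, here is a normal-approximation calculation for two conversion rates; the baseline rate, the hoped-for rate, the sample size and the helper name approx_power are all illustrative assumptions:

```python
import math
from statistics import NormalDist

def approx_power(rate_a, rate_b, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test (normal approximation)."""
    std_err = math.sqrt(rate_a * (1 - rate_a) / n_per_group
                        + rate_b * (1 - rate_b) / n_per_group)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    effect = abs(rate_b - rate_a)
    # Chance that the observed difference clears the significance threshold,
    # given that the true difference really is rate_b - rate_a
    # (the tiny contribution from the opposite tail is ignored).
    return 1 - NormalDist().cdf(z_alpha - effect / std_err)

# Hypothetical scenario: 4% baseline conversion, hoping for 5%, 5,000 visitors per variation.
print(f"Estimated power: {approx_power(0.04, 0.05, 5000):.2f}")
```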
We hope this A/B testing blog will help you along! Part 1 of our A/B testing blog can be found here.
Feel like reading more data stories? Then take a look at our blog page. Always want to stay up to date? Be sure to follow us on LinkedIn!