A/B testing your SEO changes can give you a competitive edge and help you dodge the bullet of negative changes that could reduce your traffic. In this episode of Whiteboard Friday, Emily Potter not only shares why it’s important to A/B test your changes, but also how to develop a hypothesis, what goes into collecting and analyzing the data, and how to reach your conclusions.

Click on the whiteboard image above to open a high resolution version in a new tab!

## Video Transcription

Hello, Moz fans. I’m Emily Potter, and I work at Distilled in our London office. Today I’m going to talk to you about **hypothesis testing in SEO and statistical significance.**

At Distilled, we use a platform called ODN, the Distilled Optimization Delivery Network, to run SEO A/B tests using hypothesis testing. You may not be able to use ODN yourself, but I think you can still take something valuable from what I’m talking about today.

## Hypothesis Testing

### The four main steps of hypothesis testing

So when we use hypothesis testing, we follow four main steps:

- First, we **formulate a hypothesis.**
- Then, we **collect data** on that hypothesis.
- Next, we **analyze the data,** and then…
- Finally, we **draw some conclusions** from it at the end.

**The most important part of A/B testing is a strong hypothesis.** So, up here, I’ve outlined how to formulate a strong SEO hypothesis.

## 1. Form your hypothesis

### Three mechanisms for formulating a hypothesis

Now, we have to remember that with SEO we are trying to influence one of three things in order to increase organic traffic:

- **Improve organic click-through rate.** This is any change that makes your appearance in the SERPs more attractive to searchers, so that more people click on your result.
- **Improve your organic rankings,** so you climb higher in the SERPs.
- **Rank for more keywords.**

A change could also affect a mix of all three, but you want to make sure it clearly targets at least one of them; otherwise it’s not really an SEO test.

## 2. Collecting the data

Next, we collect our data. Again, at Distilled we use the ODN platform to do this. With ODN, we run A/B tests by splitting a site’s pages into statistically similar buckets: a control group and a variant group.
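To make “statistically similar buckets” concrete, here is one simple way such a split could be done. This is a minimal sketch, not ODN’s actual (more sophisticated) method: rank pages by traffic and alternate assignment, so both buckets get a similar mix of high- and low-traffic pages.

```python
def split_into_buckets(page_sessions):
    """Sketch of one way to build statistically similar buckets
    (illustrative only, not ODN's actual method): rank pages by
    traffic and alternate assignment, so control and variant each
    receive a similar mix of high- and low-traffic pages."""
    ranked = sorted(page_sessions.items(), key=lambda kv: kv[1], reverse=True)
    control = [page for i, (page, _) in enumerate(ranked) if i % 2 == 0]
    variant = [page for i, (page, _) in enumerate(ranked) if i % 2 == 1]
    return control, variant
```

A real split would also check that the two buckets behave similarly over time before the test starts, not just that their traffic totals match.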

### A/B testing with your control and your variant

Once we’ve done that, we make our change to the variant group and use mathematical analysis to forecast what we think the variant group would have done if we hadn’t made that change.

So up here we have the black line, which is our model’s prediction of what the variant group would have done if we hadn’t made a change. This dotted line here is the start of the test, and you can see that after the test begins there’s a separation. The blue line is what actually happened.

Because there’s a difference between these two lines, we can see that the change had an effect. Moving down here, we’ve simply plotted the difference between those two lines.

Because the blue line is above the black line, we call this a positive test. Now, this green area right here is our confidence interval, which by default is a 95% confidence interval; we use that because we’re running statistical tests. If the green area sits entirely above the zero line (or, for a negative test, entirely below it), we can call the test statistically significant.

Our best estimate in this case is that sessions increased by 12%, which is roughly 7,000 extra organic sessions per month. Now, on either side here, you can see I’ve written 2.5%. Together with the 95%, it all adds up to 100%, because you never get a 100% certain result: there’s always a chance the result is down to random noise, giving you a false positive or a false negative. So we say we are 97.5% sure this test was positive, because 95 plus 2.5 is 97.5.
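The 97.5% figure is just the two-sided interval converted into a one-sided statement; as a sketch (the helper name is illustrative):

```python
def one_sided_confidence(two_sided):
    """Convert a two-sided confidence level into the one-sided
    confidence that the effect is positive, when the whole interval
    sits above zero: the interval itself plus the far (upper) tail."""
    tail = (1 - two_sided) / 2
    return two_sided + tail

# A 95% interval entirely above zero gives roughly 97.5% one-sided confidence.
print(one_sided_confidence(0.95))
```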

### Tests without statistical significance

Now, at Distilled we’ve found there are many circumstances where a test isn’t statistically significant, but there’s still fairly strong evidence of an uplift. If we move down here, I have an example of that: a test that wasn’t statistically significant, but where we saw a clear increase.

Now you can see that our green area still includes some negative territory, which means that at a 95% confidence interval there’s still a chance this was a negative test. Moving back down, I’ve shaded the tails pink again. With 5% on either side, we can say here that we are 95% confident there was a positive outcome, because everything except the lower 5% tail sits above zero.

## 3. Analyze the data to test the hypothesis

Now, the reason we do this is so that we can implement changes backed by a strong hypothesis and capture those wins, rather than rejecting them outright. Part of that is because, as we like to say, we’re doing business, not science.

Here I’ve created a chart of when we might act on a test that isn’t statistically significant, based on how strong or weak the hypothesis is and how cheap or expensive the change is.

### Strong hypothesis / cheap change

Over here in the top right corner, if we have a strong hypothesis and a cheap change, we’d probably roll it out. For example, we recently ran a test like this with one of our clients at Distilled, where we added their main keyword to the H1.

The end result looked something like this graph here. It was a strong hypothesis, it wasn’t an expensive change to implement, and we chose to roll out this change because we were fairly confident it would still have a positive effect.

### Weak hypothesis / cheap change

Now, on the other side here, if you have a weak hypothesis but the change is still cheap, then evidence of an uplift may still be reason enough to roll it out. You would want to communicate the risk to your client.

### Strong hypothesis / expensive change

For an expensive change with a strong hypothesis, you need to weigh the potential benefit against the cost: calculate the expected revenue from the percentage change you measured, and compare it with the return on investment you need.
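As a hypothetical worked example of that calculation (every input number here is illustrative, not from the transcript):

```python
def expected_monthly_revenue_gain(monthly_sessions, uplift,
                                  conversion_rate, avg_order_value):
    """Hypothetical ROI arithmetic: turn a fractional uplift in
    organic sessions into expected extra monthly revenue.
    All inputs are illustrative assumptions."""
    extra_sessions = monthly_sessions * uplift
    return extra_sessions * conversion_rate * avg_order_value

# e.g. a 12% uplift on ~58,000 sessions/month (close to the transcript's
# ~7,000 extra sessions), at a 2% conversion rate and a $50 order value:
gain = expected_monthly_revenue_gain(58_000, 0.12, 0.02, 50)
```

Comparing that figure against the cost of building the change tells you whether the expensive implementation is worth it.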

### Weak hypothesis / expensive change

If the hypothesis is weak and the change is expensive, we would only want to roll it out if the result is statistically significant.
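The four quadrants above can be sketched as a simple decision helper. The return values and branch order are illustrative, a rough encoding of the judgment calls in the transcript, not a formal Distilled policy:

```python
def rollout_decision(strong_hypothesis: bool, cheap_change: bool,
                     significant: bool, evidence_of_uplift: bool) -> str:
    """Encode the four quadrants described above (illustrative sketch)."""
    if significant and evidence_of_uplift:
        return "ship"  # a statistically significant win clears every quadrant
    if strong_hypothesis and cheap_change:
        # Strong + cheap: evidence of an uplift is usually enough.
        return "ship" if evidence_of_uplift else "hold"
    if not strong_hypothesis and cheap_change:
        # Weak + cheap: an uplift may justify it, after discussing the risk.
        return "discuss with client" if evidence_of_uplift else "hold"
    if strong_hypothesis and not cheap_change:
        # Strong + expensive: weigh expected revenue against build cost.
        return "weigh ROI" if evidence_of_uplift else "hold"
    # Weak + expensive: require statistical significance.
    return "hold"
```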

## 4. Draw conclusions

Now, we need to remember that with hypothesis testing we are only ever trying to reject the null hypothesis. A null result doesn’t mean there was no effect at all; it just means we can neither accept nor reject our hypothesis, because the data is too noisy to say whether the effect is real.

At a 95% confidence interval, we can accept or reject the hypothesis and say that our data is not just noise. When confidence is below 95%, like over here, we can’t claim to have learned something the way we would in a scientific test, but we can still say we have fairly strong evidence that the change has a positive effect on these pages.

### The benefits of testing

Now, when we talk about this with our clients, it’s because we’re really aiming to give them a **competitive advantage over other people in their industry.** And the main benefit of testing is **avoiding negative changes.**

We just want to make sure the changes we make don’t tank our traffic, and we see that happen a lot. At Distilled we call this a **dodged bullet.**

I hope you can take this into your work and use it with your clients or on your own website. Hopefully you can start forming hypotheses, and even if you can’t deploy something like ODN, you can still use your GA data to get a better idea of whether the changes you’re making help or harm your traffic. That’s all I have for you today. Thanks very much.

Video transcription from Speechpad.com
