Stability in Your Testing Data – Conversion Wednesday

By Chris C.


How do we identify stability in our testing results? The first step is to look at the significance levels of your test data (to read more: Have Confidence in Your Testing Data). However, sometimes you don’t have confidence in your results even though the test has been running for over four weeks. You have your next test ready to go, and you aren’t sure whether the current test is helping your site. Should you let the test continue to run, or should you pull the plug and start the next test? AND, if you DO pull the plug on the current test, should you make the change to your site or not?

In order to answer these questions, we have to know how to determine if our results are stable regardless of our significance/confidence levels. To do this, we look at our graphs…

So, let’s look at a few graphs and try to identify traits in those graphs that show stability in your testing data. There are a few different ways you can plot your data to understand what is happening. So, let’s cover the different types of plots and then identify trends displayed by each one.

The first is by plotting your averages over time. Oftentimes, this is the clearest picture that can be provided for your test data, so it is always the first view to look at. In the graph below, the red line is our control and the blue line is our winner. You can see stability in our results in two ways:

  1. The lines don’t cross. They stay parallel to each other as the test continues.
  2. The lines remain pretty equidistant throughout the test.

The two added red marks are the same height and show that the distance between the lines has remained stable. Not only that, but you can see that the control AND test experience move together. They both move up and down during the test, but they stay parallel to one another as the test is run. This graph is a good example of stable test results.
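The two checks above can also be run on the numbers behind the graph. Here is a minimal Python sketch, assuming you can export one value per day for each experience. The function names and the sample numbers are made up for illustration, and treating each plotted point as a running average is an assumption; if your tool plots per-day averages, pass those straight to `gap_spread`.

```python
from itertools import accumulate

def running_average(daily_values):
    """Average of all values seen up to and including each day."""
    totals = accumulate(daily_values)
    return [total / day for day, total in enumerate(totals, start=1)]

def gap_spread(control_line, variant_line):
    """Return the (smallest, largest) gap between the two plotted lines."""
    gaps = [v - c for c, v in zip(control_line, variant_line)]
    return min(gaps), max(gaps)

# Hypothetical daily values (e.g. revenue per visitor), not real test data.
control_daily = [10.0, 12.0, 11.0, 13.0, 12.0]
variant_daily = [12.0, 14.0, 13.0, 15.0, 14.0]

control_line = running_average(control_daily)
variant_line = running_average(variant_daily)
smallest, largest = gap_spread(control_line, variant_line)

print(f"gap ranges from {smallest:.2f} to {largest:.2f}")
```

If the smallest gap stays above zero, the winner never dipped below the control. If the smallest and largest gaps are close to each other, the lines stayed roughly equidistant. How close is "close enough" is a judgment call for your own data.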

However, this next graph does not show consistency over time. These lines do not show parallel movement, and they are not moving together either. They almost seem independent of one another.

The big giveaway is the fact that the lines in this graph cross three times over the course of the test! These results are not helpful in understanding anything about this metric unless we do some digging and find another reason why we might be seeing results like these.
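Counting crossings is easy to automate if you have the plotted values. The sketch below is one way to do it in Python; `count_crossings` is a hypothetical helper and the numbers are invented to show a case with three crossings.

```python
def count_crossings(control_line, variant_line):
    """Count how many times the variant line switches sides of the control line."""
    crossings = 0
    previous_sign = 0
    for c, v in zip(control_line, variant_line):
        diff = v - c
        sign = (diff > 0) - (diff < 0)  # +1 above control, -1 below, 0 if equal
        if sign == 0:
            continue  # lines touch; wait to see which side the variant lands on
        if previous_sign != 0 and sign != previous_sign:
            crossings += 1
        previous_sign = sign
    return crossings

# Hypothetical plotted values, not real test data.
control = [10, 11, 10, 12, 11, 10]
variant = [12, 10, 11, 13, 10, 9]

print(count_crossings(control, variant))
```

A stable graph like the first one would give a count of zero. Anything above zero is a cue to dig into what happened around the days where the lines switched places.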

The second way you can plot your data is by the Percentage Of Lift Over Time. This graph shows you the percentage difference above (or below) the control as the test is being run. In this graph, you are looking for the same things as you were for the Average Over Time graph (again the control is in red):

  1. The graphs don’t cross, and
  2. The lines run parallel to the control.

In this graph, the green line is much more stable than the blue line because it has stayed pretty parallel to the control for the entire test. The blue line is starting to decline, but it hasn’t crossed the control. It’ll probably level out if we let this test run another week or two, but the only way to know for sure is to actually let this run for another week or two.

By contrast, this graph is unstable because the lines cross constantly and there is no parallel trend developing. This graph is not very helpful when trying to decide whether the test is working or not.
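For reference, the lift plotted in this kind of graph can be computed as the percentage difference from the control at each point, which puts the control at a flat 0%. The Python sketch below assumes that definition; the helper names and the sample values for a "green" and a "blue" experience are made up for illustration.

```python
def lift_over_time(control_line, variant_line):
    """Percentage lift of the variant over the control at each point in time."""
    return [(v - c) * 100 / c for c, v in zip(control_line, variant_line)]

def lift_range(lifts):
    """Spread between the highest and lowest lift; small means near-parallel."""
    return max(lifts) - min(lifts)

# Hypothetical plotted values, not real test data.
control = [100, 120, 100, 130]
green = [110, 132, 110, 143]
blue = [125, 144, 115, 143]

green_lift = lift_over_time(control, green)
blue_lift = lift_over_time(control, blue)

print("green:", green_lift, "range:", lift_range(green_lift))
print("blue: ", blue_lift, "range:", lift_range(blue_lift))
```

A lift that stays on one side of zero has not crossed the control, and a small range means the line is running close to parallel with it. In this made-up data the green lift holds steady while the blue lift keeps sliding toward zero.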

The last type of graph we’ll cover is Trajectory. Trajectory graphs are good for raw sums of numbers, and the distance between the two lines is the improvement (or the negative effect!).

Here’s an example of a stable increase in revenue. It shows the total sum of revenue added over time. Just like our two previous graph types, we want to make sure that the lines don’t cross. Unlike the other graph types, though, we aren’t looking for parallel lines.

Here, we are looking for an increasing distance between the two lines. In this example, you can see that the green line continues to separate from the red line more and more each day.

This graph is the complete opposite. Not only do the lines cross, but after they cross, there’s no continual improvement over the control.
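The trajectory check can be sketched in Python as well: build the running totals, take the gap between them, and see whether it keeps growing. The function names and revenue figures below are hypothetical, and requiring the gap to grow every single day is a deliberately strict assumption that you may want to relax for noisy data.

```python
from itertools import accumulate

def cumulative_gap(control_daily, variant_daily):
    """Gap between the running totals of the variant and the control."""
    control_total = accumulate(control_daily)
    variant_total = accumulate(variant_daily)
    return [v - c for c, v in zip(control_total, variant_total)]

def gap_keeps_widening(gaps):
    """True if the variant starts ahead and the gap grows at every step."""
    return gaps[0] > 0 and all(b > a for a, b in zip(gaps, gaps[1:]))

# Hypothetical daily revenue, not real test data.
control_daily = [100, 110, 105, 120]
stable_daily = [110, 125, 118, 140]
unstable_daily = [110, 100, 100, 125]

stable_gaps = cumulative_gap(control_daily, stable_daily)
unstable_gaps = cumulative_gap(control_daily, unstable_daily)

print(stable_gaps, gap_keeps_widening(stable_gaps))
print(unstable_gaps, gap_keeps_widening(unstable_gaps))
```

In the first series the gap grows every day, which is the widening shape described above. In the second the gap shrinks, goes negative (the lines cross), and never shows steady improvement afterward.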

So, to review, here are a few tips to remember when looking for stability in your test data graphs:

  • Make sure the test winner and the control lines don’t cross.
  • If it is an Average or Percentage over time graph, look for parallel lines.
  • If it is a Trajectory graph, look for the space between the lines to continue to widen (it should look like a less than sign, “<”).

Looking for these trends in your graphs will help you see if your results are stable or not. Even if your data is telling you that you have reached confidence, it’s always good to look at these graphs to see if there’s stability in those results.