
A/B Testing 101

Since the start of this year, I have been working on a controlled online experimentation system implemented from scratch, christened:
Abacus: The A/B Testing Project™.
It’s currently in the SIT environment at the company where I’m interning, with a few natural holdups that come with cross-team collaboration. Nevertheless, I’m confident we’ll take it to production soon.

Case Study

Let’s say I bake two different batches of cookies. The first batch I call Hide n’ Sleep (my favourite), and the second I call Padhle-G (the classic). I’m not sure which batch would do better in the market if I were to sell them, so I conduct a small experiment on 20 friends to see which cookie is liked more.
I give 10 of my 20 friends Hide n’ Sleep, the remaining 10 get Padhle-G, and let’s say:

  • 3/10 friends in the first group liked Hide n’ Sleep.
  • 7/10 friends in the second group liked Padhle-G.

Assuming my friends were split into the two groups randomly and without bias, it’s safe to say that Hide n’ Sleep was a bad idea. Our winning cookie, to turn a profit in the market: Padhle-G 🏆
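
As a quick sanity check on the arithmetic, here is a minimal sketch in Python (the counts are the ones from the story above; nothing else is implied) that computes the preference rate for each batch:

    # Results of the cookie experiment above: likes out of a group of 10 each.
    results = {
        "Hide n' Sleep": {"liked": 3, "group_size": 10},
        "Padhle-G": {"liked": 7, "group_size": 10},
    }

    for cookie, stats in results.items():
        rate = stats["liked"] / stats["group_size"]
        print(f"{cookie}: {rate:.0%} preference rate")

    # Padhle-G wins by 40 percentage points (70% vs 30%).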

What is A/B Testing?

The between-subjects study above is fundamental to the concept of A/B testing.
By definition, A/B testing (a.k.a. split testing, bucket testing) is a method of comparing two versions (A and B) of a webpage, application or other user experience to determine which performs better.

This is carried out by randomly assigning each group of users a different variant and measuring how well each variation fulfils a Key Performance Indicator (KPI) defined for the experiment.
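
As an illustration of the assignment part, here is a minimal sketch of deterministic, hash-based bucketing (the hashing scheme and the 50/50 split are assumptions for the example, not necessarily how Abacus does it); hashing the experiment and user IDs together means the same user always lands in the same bucket:

    import hashlib

    def assign_variant(user_id: str, experiment_id: str, variants=("control", "variation")) -> str:
        """Deterministically bucket a user: same (experiment, user) pair -> same variant."""
        key = f"{experiment_id}:{user_id}".encode()
        bucket = int(hashlib.md5(key).hexdigest(), 16) % len(variants)
        return variants[bucket]

    # The assignment is stable across calls, which keeps the user's experience consistent.
    print(assign_variant("user-42", "cookie-preference"))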

In the previous example, our variations, conventionally called:

  • 0: Control (Hide n’ Sleep)
  • 1: Variation (Padhle-G)

were tested for the KPI - Cookie Preference Rate. This metric allowed us to decide the winning variant.

Why Exactly?

Instead of guessing what might work based on opinions, experience or assumptions, A/B tests help test hypotheses directly on consumers and show how they respond to different variants. This helps entities running A/B tests see firsthand what clicks with their audience and what falls flat.

Data-driven insights offered by A/B tests and other similar methodologies let you say “we know this will work” instead of the “we think this will work” offered by opinions.

Steps to Conduct an A/B Test

The procedure for conducting an A/B test is as simple as:

  1. Identifying the goal (or KPI) - The metric we want to measure and improve performance for.
  2. Creating variations - Develop the variations against which the KPI has to be measured.
  3. Variation assignment - Users need to be bucketed randomly and without bias into the different variants. We also need to ensure that each user experiences their assigned variant consistently, or the integrity of the data is compromised.
  4. Collect data - Track the performance of each variant based on the user journeys/actions performed.
  5. Data analysis - Analyze the gathered data to compare the performance of the variants against each other (see the sketch after this list).
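
To make the data collection and analysis steps concrete, here is a minimal sketch (the event format and the click-through KPI are assumptions for illustration) that aggregates collected events and compares the KPI per variant:

    from collections import Counter

    # Hypothetical collected events: (user_id, variant, clicked_submit).
    events = [
        ("u1", "control", True),
        ("u2", "control", False),
        ("u3", "variation", True),
        ("u4", "variation", True),
    ]

    exposures = Counter(variant for _, variant, _ in events)
    clicks = Counter(variant for _, variant, clicked in events if clicked)

    for variant in exposures:
        rate = clicks[variant] / exposures[variant]
        print(f"{variant}: {clicks[variant]}/{exposures[variant]} clicked ({rate:.0%})")

In practice you would also want enough samples and a statistical significance check before declaring a winner, but the core comparison boils down to this.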

Findings of the A/B test can support data-driven decision making, a preferred alternative to the traditional HiPPO (Highest Paid Person’s Opinion).

Going back to the case study, if I hadn’t conducted the A/B test, I would’ve baked and sold my favourite Hide n’ Sleep. But thankfully, now I know that selling Padhle-G would easily outperform selling Hide n’ Sleep: its preference rate was 70% against 30%, a gap of roughly 40 percentage points (eliminating other complicated market factors, for this simple example).

Super Basic A/B Test

A simple A/B test I developed compares the performance of different texts on a ‘submit’ button.

[Figure: Variants for the super basic A/B test]

The goal of this A/B test is to find out which text gets more clicks on the submit button. While the impact may appear negligible at first, at a larger scale small changes like this can have huge financial implications.
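
Here is a minimal sketch of the serving side (the button texts, function names and experiment ID are hypothetical placeholders, not the actual implementation): on a page hit, the user is bucketed and the matching button text is rendered.

    import hashlib

    # Hypothetical button copy for the two variants.
    BUTTON_TEXTS = {"control": "Submit", "variation": "Submit your details"}

    def assign_variant(user_id: str, experiment_id: str = "submit-button-text") -> str:
        """Same deterministic bucketing idea as sketched earlier."""
        key = f"{experiment_id}:{user_id}".encode()
        return ("control", "variation")[int(hashlib.md5(key).hexdigest(), 16) % 2]

    def render_submit_button(user_id: str) -> str:
        variant = assign_variant(user_id)
        # In the real flow, the (user, variant) exposure is also written to the assignment table here.
        return f'<button type="submit">{BUTTON_TEXTS[variant]}</button>'

    print(render_submit_button("user-42"))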

Data flow

[Figure: Flow of data when a user experiences a variation on a webpage hit]

Combining the data from:

  • the table that records which variant each user experienced
  • the table where submit button clicks are logged against each user

gives me a result set from which I can calculate how many clicks each variant received. This makes it easy to identify the winner. And it’s backed by science!
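
In database terms this is a join of the two tables followed by a group-by on the variant; here is a minimal in-memory sketch with made-up rows (the real tables live in the experiment’s data store, and the column names here are assumptions):

    # Hypothetical rows standing in for the two tables described above.
    assignments = [  # which user experienced which variant
        {"user_id": "u1", "variant": "control"},
        {"user_id": "u2", "variant": "variation"},
        {"user_id": "u3", "variant": "variation"},
    ]
    click_log = [  # submit button clicks logged against each user
        {"user_id": "u2"},
        {"user_id": "u3"},
    ]

    clicked_users = {row["user_id"] for row in click_log}

    clicks_per_variant = {}
    for row in assignments:
        if row["user_id"] in clicked_users:
            clicks_per_variant[row["variant"]] = clicks_per_variant.get(row["variant"], 0) + 1

    print(clicks_per_variant)  # -> {'variation': 2}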