OPINION: Lazy statistics could be our greatest UX challenge

14 Jun 2014


If you toss a coin 10 times, what is the probability you will get five heads and five tails?

The question isn't a general one about how many heads and tails you might expect; it refers to that specific instance of the experiment: what is the probability you will get exactly five heads and five tails on that particular go? The answer isn't a half; it is in fact less than half of a half, 0.246 to be precise. P(X = 5) = (10C5) × 0.5^5 × 0.5^5 ≈ 0.246, for those who really wish to know.
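That calculation can be reproduced in a few lines of standard-library Python (a minimal sketch; the function name is my own, not from the article):

```python
from math import comb

def exact_heads_probability(n, k, p=0.5):
    # Binomial probability of exactly k heads in n tosses:
    # P(X = k) = C(n, k) * p^k * (1 - p)^(n - k)
    return comb(n, k) * p**k * (1 - p)**(n - k)

# 252 ways to arrange 5 heads in 10 tosses, out of 2^10 = 1024 sequences
print(round(exact_heads_probability(10, 5), 3))  # 0.246
```

So even the single most likely outcome happens less than a quarter of the time.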

The City of Boston has famously embraced big data as part of its ongoing programme of regeneration and has been highly regarded for its Street Bump initiative. This programme involved smartphone users downloading an app that measured their car's acceleration and deceleration as they drove around the city, allowing the authorities to infer where potholes were occurring and repairs were required. As Boston residents drove around the city, their smartphones were collecting small data, which city authorities collated into big data to keep roads smoother and safer.

The city proudly proclaims that the "data provides the city with real-time information it uses to fix problems and plan long-term investments".

Predictable outcome

While the initiative is laudable, the outcome, when examined, is entirely predictable from statistical theory. Unmoderated, Street Bump strongly favours young, affluent areas, where a greater proportion of residents own smartphones. The key insight is that the potholes detected by Street Bump-enabled smartphones are not every pothole in the city.

This represents a key statistical challenge: avoiding sample bias. The other challenge is to ensure the data set used is large enough to give the experiment sufficient statistical power.

Statistical power is the probability that a statistical test will detect a difference between two values when the underlying difference is real. Going back to our coin test, if we tossed it 10 times it's not inconceivable that we would get three heads and seven tails. Without considering the size of the data set and the power of the test, we might incorrectly conclude that tails is the dominant result for the coin. In the context of the A/B or multivariate tests upon which many website improvement programmes are based, this might lead us to recommend the "tails" option, which as we know would be incorrect.
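A quick simulation (my own illustration, not the article's experiment) shows just how often a fair coin produces a split at least as lopsided as three–seven in 10 tosses:

```python
import random

random.seed(42)  # fixed seed so the estimate is reproducible

def lopsided_rate(tosses=10, trials=100_000):
    # Fraction of experiments in which a fair coin lands at least
    # as unevenly as 3 heads / 7 tails (i.e. 0-3 or 7-10 heads).
    lopsided = 0
    for _ in range(trials):
        heads = sum(random.random() < 0.5 for _ in range(tosses))
        if abs(heads - tosses / 2) >= 2:
            lopsided += 1
    return lopsided / trials

print(lopsided_rate())  # ≈ 0.34 — roughly a third of all 10-toss runs
```

The exact figure is 352/1024 ≈ 0.344, so a "dominant" result from 10 tosses is close to a coin flip away from appearing by pure chance.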

The reason statistical power matters is that because we live in a random universe (we may not, but that is a discussion for another day when we have lots more time), tests will sometimes produce false positives, such as the one above. The probability of a false positive is fixed by common convention at 5pc, the significance level. Power guards against the opposite error, the false negative: the closer a test's power is to 80pc or 90pc, the more confident we can be that a real difference will not be missed.

Volume = meaningful insights

Web mega-brands such as eBay, Amazon and Google have built nearly their entire user experiences using A/B and multivariate tests, and we are right to replicate their approach to product design and improvement. It is said we may never know what the "true" Google is because at any one time it is running up to 7,000 split tests in a bid to constantly improve and enhance life for the user. However, the great luxury these online behemoths enjoy is volume, and that enables them to glean statistically meaningful insights very quickly and very regularly.

Let's copy their focus on user behaviour and learn from their pioneering processes, but let's remember that until we reach their scale, we are going to have to be much more focused on avoiding statistical bias and on getting our hands on sample sizes of adequate scale to make robust recommendations.

Veteran American sports broadcaster Vin Scully claims "Statistics are used much like a drunk uses a lamppost: for support, not illumination". It's time for the UX industry to sober up and tackle lazy statistics.

Gareth Dunlop

Gareth Dunlop owns and runs Fathom, a user-experience consultancy that helps ambitious organisations get the most from their website and internet marketing by viewing the world from the perspective of their customers. Specialist areas include user testing, usability and customer journey planning, web accessibility and integrated online marketing. Clients include Fáilte Ireland, Power NI, Telefónica, Ordnance Survey Ireland, and Savile Row. Visit Fathom online.
