The good, the bad and the ugly: Battles of bias in statistical stories
Vas Vasileiou, Head of Data Science
The good
During the tumultuous era of World War II, the skies were a battleground. The Allies were desperate to improve their aircraft's survivability, but they faced a conundrum - how to add armour without compromising speed and fuel efficiency.
Enter Abraham Wald, a brilliant Jewish-Hungarian mathematician who fled Nazi persecution and found solace in the Statistical Research Group (SRG) at Columbia University. The SRG, a team of savvy statisticians and mathematicians, were tasked with analysing military data to aid in decision-making.
One day, Wald stumbled upon a peculiar revelation while studying the damaged areas of returning planes - a concept that would forever change their approach!
In the iconic picture below, the red dots mark the areas most frequently hit by bullets. Conventional wisdom suggested reinforcing these red zones to protect the planes from incoming fire. But Wald, with a twinkle in his eye, saw through the illusion.
He realised that they were only looking at the planes that made it back safely - the survivors! Naturally, these planes had endured hits in those specific areas without fatal consequences. The key insight was simple yet profound - the untouched or minimally damaged areas were the real weak spots: planes hit there did not make it back, so they never appeared in the data at all. Wald proposed a radical shift in strategy. Instead of reinforcing the heavily hit red areas, reinforce the non-red ones!
This brilliant revelation exposed what we now call "survivorship bias." Wald understood that by solely focusing on the survivors and neglecting the fallen, they were drawing incorrect conclusions due to incomplete data. Survivorship bias is a cautionary tale for decision-makers and analysts alike. It reminds us to consider the full picture, not just the successes, in our assessments. Whether we're analysing historical data, making business decisions, or even charting our personal journeys, let's avoid falling into the trap of survivorship bias.
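For the curious, here is a minimal simulation sketch of the effect. Everything in it is an assumption made for illustration: the four plane sections, the per-hit lethality figures and the three hits per plane are invented, not historical estimates. Hits land evenly across the plane, yet the returning planes show far fewer engine and cockpit hits, simply because planes hit there rarely come home.

```python
import random

# Hypothetical plane sections and the chance that a single hit there downs
# the plane. These numbers are invented for illustration only.
LETHALITY = {"engine": 0.6, "cockpit": 0.4, "fuselage": 0.05, "wings": 0.05}


def simulate(n_planes=10_000, hits_per_plane=3, seed=42):
    """Return (hits on all planes, hits on returning planes) per section."""
    rng = random.Random(seed)
    sections = list(LETHALITY)
    all_hits = dict.fromkeys(sections, 0)
    survivor_hits = dict.fromkeys(sections, 0)
    for _ in range(n_planes):
        # Each hit lands on a section chosen uniformly at random.
        hits = [rng.choice(sections) for _ in range(hits_per_plane)]
        # The plane returns only if it survives every single hit.
        survived = all(rng.random() > LETHALITY[s] for s in hits)
        for s in hits:
            all_hits[s] += 1
            if survived:
                survivor_hits[s] += 1
    return all_hits, survivor_hits


all_hits, survivor_hits = simulate()
for section in LETHALITY:
    print(f"{section:9s} all planes: {all_hits[section]:5d}   "
          f"survivors only: {survivor_hits[section]:5d}")
```

Looking only at the "survivors only" column, the engine appears to be the safest place to get hit, which is exactly the wrong conclusion.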
The bad
Step back in time to the historic 1936 US Election, where one influential weekly magazine embarked on a quest to predict the showdown between Roosevelt and Landon. Armed with ambition, they dispatched a staggering 10 million straw ballots to gauge voting preferences. Remarkably, 2.4 million responses flooded back, a feat that would still impress today.
The magazine's prediction? Landon would reign supreme, winning 57% to 43% against Roosevelt. But alas, reality had a different script, as Roosevelt emerged victorious with an overwhelming triumph of roughly 61% to 37%. What went awry, you ask? Let's delve into the intriguing twofold reasons for this predictive failure:
1. Sampling Bias - The Not-So-Average Voter
Ah, the infamous "sampling bias" - the devil in the details! The magazine's first misstep was selecting participants from its own readers, and then randomly plucking names from the lists of registered automobile owners and telephone users. Sounds innocuous, right? Not quite! Picture the 1936 landscape - the mighty Great Depression reigning supreme. Those who could afford weekly subscriptions, cars, or telephones were a league apart from the average US voter. So, naturally, their preferences diverged significantly from the general populace, leading to a skewed prediction.
2. Participation Bias - The Enthusiastic Few
The sneaky "participation bias" - a subtler foe! Hindsight reveals a startling pattern: those who harboured a strong distaste for Roosevelt were all too eager to air their opinions and mail back the survey. Meanwhile, the more ambivalent or supportive voters were less enthusiastic about partaking in the polling process.
Nowadays, we know that the telltale signs are a low response rate (here, 2.4 million replies to 10 million ballots is just 24%) or significant differences between the characteristics of the people selected for a survey and those who actually respond. One common remedy? Offer incentives to sweeten the deal and entice a more representative pool of respondents.
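The two biases can be seen stacking up in a small simulation. This is a sketch under made-up assumptions, not a reconstruction of the 1936 poll: the share of "affluent" voters, the support levels in each group and the reply rates are all hypothetical numbers chosen so that the true support is about 61%.

```python
import random


def simulate_poll(n_voters=200_000, seed=1936):
    """Return the incumbent's share among all voters, among the people on
    the mailing list, and among those who actually reply."""
    rng = random.Random(seed)
    supporters = on_list = on_list_supporters = replies = reply_supporters = 0
    for _ in range(n_voters):
        # Hypothetical: 30% of voters are affluent (car, phone, subscription).
        affluent = rng.random() < 0.30
        # Hypothetical: affluent voters back the incumbent less often.
        supports = rng.random() < (0.40 if affluent else 0.70)
        supporters += supports
        if not affluent:
            continue  # sampling bias: never on the mailing list at all
        on_list += 1
        on_list_supporters += supports
        # Participation bias: opponents are keener to mail the ballot back.
        if rng.random() < (0.20 if supports else 0.35):
            replies += 1
            reply_supporters += supports
    return (supporters / n_voters,
            on_list_supporters / on_list,
            reply_supporters / replies)


true_share, frame_share, respondent_share = simulate_poll()
print(f"true support:         {true_share:.1%}")
print(f"on the mailing list:  {frame_share:.1%}")
print(f"among respondents:    {respondent_share:.1%}")
```

With these invented inputs, the mailing list alone drags the estimate down to around 40%, and the uneven reply rates push it down further to somewhere near 28%. Note that the simulated poll is large in both cases: a huge sample does nothing to fix a skewed one.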
As we unravel this enthralling tale of polling pitfalls, it becomes evident that understanding biases is the key to unlocking accurate predictions. Today, we stand armed with knowledge, armed with incentives, ready to embrace the intricate dance between sampling and participation. So, dear reader, the next time you encounter a poll, remember the lessons of 1936, and venture forth with newfound confidence in the quest for unbiased, reliable predictions.
The ugly
Picture this: a nasty disease named cholera, wreaking havoc on London between 1830 and 1860, wiping out 6% of the city's population. Doctors believed it was caused by "bad air" (they called it miasma) while the non-medical populace attributed it to all sorts of mysterious reasons…
Enter John Snow!!
John Snow was a young English doctor, armed with scepticism and an unyielding belief in the power of logic and data.
In 1854, an ominous cholera outbreak struck Broad Street in Soho (now Broadwick Street). Unfazed by the prevailing theories, John Snow embarked on a groundbreaking mission. He traversed the streets, knocking on doors, interviewing residents, and recording cholera-related deaths on a map. Little did he know that this map would be a pivotal piece in unravelling the disease's secrets.
An intriguing pattern emerged as Snow's ink traced the chilling tale of casualties. Deaths clustered most densely around one specific pump on Broad Street. Curiously, none of the workers at the nearby brewery fell ill. Why? The answer lay in daily allowances of beer that kept the brewery workforce hydrated without having to rely on water from the suspect well.
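In today's terms, Snow's map boils down to a simple question: which pump is closest to each death? The toy sketch below shows the idea. The coordinates are entirely made up on an arbitrary grid, and "Pump B" and "Pump C" are hypothetical names; none of this is Snow's actual survey data.

```python
from collections import Counter
from math import dist

# Made-up pump locations on an arbitrary grid (not real map coordinates).
pumps = {
    "Broad Street": (0.0, 0.0),
    "Pump B": (5.0, 1.0),   # hypothetical neighbouring pump
    "Pump C": (-4.0, 4.0),  # hypothetical neighbouring pump
}

# Made-up locations of recorded deaths, for illustration only.
deaths = [
    (0.5, 0.2), (-0.3, 0.8), (1.0, -0.5), (0.2, -1.1),
    (4.6, 1.3), (-0.9, 0.4), (-3.5, 3.8), (0.7, 0.9),
]


def nearest_pump(location):
    """Name of the pump with the smallest straight-line distance."""
    return min(pumps, key=lambda name: dist(pumps[name], location))


# Tally deaths by their nearest pump, most affected pump first.
deaths_per_pump = Counter(nearest_pump(d) for d in deaths)
for pump, count in deaths_per_pump.most_common():
    print(f"{pump}: {count}")
```

Straight-line distance is a simplification, since people walk along streets rather than through walls, but even this crude tally makes a cluster around one pump jump out.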
Armed with his map and unwavering determination, Snow successfully convinced authorities to shut down the problematic pump, saving countless lives in the process. Soon, a profound realisation dawned on the people - the significance of clean water and sanitation in safeguarding public health.
What makes this story truly captivating is the revelation that one can study a disease and save lives using nothing more than logic, statistics, and graphs (hello, data science!). John Snow never peered through a microscope, as the concept of germs and bacteria was yet to be proven. Nevertheless, he unlocked a critical truth using data: cholera is often waterborne, a consequence of inadequate sanitation.
So, next time you stroll through the vibrant streets of Soho, take a moment to pay homage to the heroic pump on Broadwick Street. Its closure may have been a simple act, but it led to a monumental leap in our understanding of disease transmission, thanks to a brilliant doctor armed with wit, wisdom, and the magic of data!!