Bias-variance tradeoff: a data science concept that applies to living well
It's all about the balance between simplicity and complexity
Welcome to Art Science Millennial, a newsletter for non-techies navigating the world of tech! I know the struggle because I’m one of you.
When it comes to living the good life, it’s striking how many aspects have to do with balance. Juggling family and career? You need work-life balance. Eating healthily? That’s a balanced diet. Seeking inner well-being? It’s all about finding your inner balance.
Finding balance is about tradeoffs, and it turns out that tradeoffs are a fundamental aspect of data scientists’ work as well — specifically, the bias-variance tradeoff, one of the first concepts you will learn if you explore data science.
What is the bias-variance tradeoff?
Data science is about using data to train a model that can make predictions. For instance, you could train a model to spot signs of disease in eye scans.
No matter your prediction problem, there are many models to choose from but they all come with two forms of error: bias and variance. Keep those two words in mind and we’ll return to them in a while.
A basic example below shows how a straight line is used as a model to approximate the trend of the data.
Simple models, such as a straight line, tend to be less flexible. Indeed, notice that the straight line doesn’t quite capture the data points to the right. Inflexible models like this are said to have high bias.
If we add some flexibility to the line and make it a curve that fits the data better, we lower the bias of the model.
In general, the more complex a model, the lower its bias. So if we crank up the complexity, we can end up with something like this squiggly line that has really low bias.
But such an overly complicated model leads to the other problem: high variance. Models with high variance perform really well on the data they were trained on but really poorly on data they haven’t seen. In data science, this is known as overfitting the training data.
The curve, having struck a happy balance between simplicity and complexity, represents both the old and new data equally well, precisely because it doesn’t fit either set perfectly.
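If you’d like to see this for yourself, here is a minimal Python sketch of the three models described above. Everything in it is an assumption made for illustration: the data is made up (a gentle curve plus random noise), and the straight line, curve and squiggly line are stood in for by polynomials of degree 1, 2 and 9. It simply measures how far off each model is on the data it learned from and on fresh data it has never seen.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

def make_data(n):
    # Made-up data for illustration: a gentle curve plus random noise
    x = rng.uniform(-1, 1, n)
    y = x**2 + rng.normal(0, 0.2, n)
    return x, y

x_train, y_train = make_data(12)   # "old" data the models learn from
x_test, y_test = make_data(200)    # "new" data the models have never seen

def mse(model, x, y):
    # Mean squared error: the average squared gap between prediction and reality
    return float(np.mean((model(x) - y) ** 2))

results = {}
for name, degree in [("straight line", 1), ("curve", 2), ("squiggly line", 9)]:
    # Least-squares polynomial fit of the given degree (higher degree = more flexible)
    model = Polynomial.fit(x_train, y_train, degree)
    results[name] = (mse(model, x_train, y_train), mse(model, x_test, y_test))
    print(f"{name:14s} train error {results[name][0]:.3f}   test error {results[name][1]:.3f}")
```

The squiggly line will always hug the training data at least as closely as the other two, because it is the most flexible. What to watch for is the gap between its training error and its test error: a large gap is the signature of overfitting. The exact numbers depend on the random data, so treat them as a demonstration rather than a result.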
Simple models have high bias and low variance while complex models have low bias and high variance. As seen in the diagram below, data science is about minimising error by getting the right bias-variance tradeoff.
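For the curious, this tradeoff is usually written as a formula. The notation here is my own shorthand rather than anything from the diagram: f is the true pattern, f-hat is the model's prediction, and sigma squared is random noise that no model can remove. When error is measured as squared error, the expected error on new data splits into three parts:

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{noise}}
```

Making a model more complex typically shrinks the first term and grows the second, which is why the lowest total error tends to sit somewhere in the middle.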
Check out this video explanation of bias and variance for a fuller understanding.
Lessons for life
We grapple with balance in life all the time as well. When we have too much bias, our perspective becomes inflexible even in the face of new evidence. But too much variance and we flip-flop every time we hear a new opinion.
I try to keep this in mind when it comes to my career. If I get too comfortable and set in my ways in a certain job, then it’s likely that I won’t be able to see the writing on the wall even when change and disruption are imminent. But there’s also the danger of perpetually chasing the next trend and never staying long enough in one area to make a lasting impact.
Another good example of balance is in investing. It’s a poor strategy to invest in a company just because it’s famous without knowing anything else. On the other hand, obsession with every fluctuation in the markets usually leads to losing money.
To me, it’s comforting to know that even something as sophisticated as data science boils down to not having a simplistic view and also not overly complicating matters. It reminds me that in life, there is often no perfect solution but rather tradeoffs to make the best of a situation.
I’d love to know what you think of this newsletter and what you’d like me to write about. You can reach me by replying to this email or by leaving a comment if you’re reading this on the Art Science Millennial website. If you enjoyed this piece, sign up so you get subsequent updates in your inbox!