Simplifying Machine Learning: Less is More

As businesses across the globe accelerate their digitalisation efforts, they are increasingly captivated by the power of artificial intelligence (AI) and machine learning (ML). Companies, especially those new to AI and ML, are often intrigued by the allure of complex models which promise high performance and prestige. Businesses often proudly announce that they use “haute couture” ML techniques such as random forests, support vector machines, neural networks, K-means clustering and boosting.

This idea is not difficult to wrap your head around. Let’s be honest: as a Data Science Consultant, who would not be tempted to tell potential clients that they will be using a “support vector machine with infinity kernel” instead of a logistic regression, or a “least absolute shrinkage and selection operator” regression instead of a linear regression? Complex technologies are often considered to have extraordinary capabilities and tell compelling narratives of success.

Life-saving simplicity

However, a trend towards “state of the art” models overlooks one crucial question: do we actually need them? Think about this: if you had to put a campfire out, would you prefer to mobilise a thousand firefighters and a helicopter able to airlift a water tank, or would you rather reach for a nearby bucket of water? Is the goal not just to extinguish the fire, and would a bucket not accomplish the same goal? The same concept applies to the world of machine learning: are complex models necessary for driving business success?

Black boxes

Questioning necessity is especially important as complex models come with a trade-off: they are often a “black box”. The underlying mechanism and computation are difficult, sometimes near impossible, to understand. Given input data, the model produces predictions as output, but we cannot know for certain what happens in between. Since business decisions are high-stakes by nature, this uncertainty poses a risk.

Business decisions require accountability, trust, and an understanding of the underlying reasoning. Imagine making a million-dollar investment based on a prediction from a model you don’t understand. It is as if you are driving blindfolded on a high-speed highway. You might be able to drive straight for a while with help, but would you take the risk?

To further illustrate, consider an e-commerce company that is trying to increase sales by recommending products to its customers based on their browsing and purchasing history. A complex model, such as a deep neural network, is able to provide highly personalised product recommendations and potentially drive up sales. However, this model is a “black box”. It churns out recommendations, but it does not provide insights as to why it recommends a certain product to a particular customer.

Imagine a scenario where a customer who has only ever purchased cat food suddenly receives a recommendation for a high-end camera. If the customer questions the recommendation, the company would not be able to provide a satisfactory explanation. This could lead to confusion, mistrust, and a loss of engagement. More importantly though, by understanding why certain products are recommended more than others, the e-commerce company can gain valuable insights into customer behaviour.

These insights can help drive marketing strategies. For example, if the model shows that customers who buy cat food are more likely to purchase certain types of cat toys, the company could recommend cat toys to these customers, or decide to offer a bundle deal combining cat food and cat toys, attempting to increase sales.
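As a rough illustration of how such an insight could be read off directly from purchase data, the sketch below computes two simple and fully transparent measures for the rule "cat food, therefore cat toy": confidence (how often cat food buyers also buy a cat toy) and lift (how much more often than customers in general). The baskets and product names are made up for the example, and the helper functions `support`, `confidence` and `lift` are hypothetical names, not part of any particular library.

```python
# Hypothetical shopping baskets; every product name is invented for illustration.
BASKETS = [
    {"cat food", "cat toy"},
    {"cat food", "cat toy", "litter"},
    {"cat food"},
    {"cat food", "cat toy"},
    {"dog food", "leash"},
    {"dog food"},
    {"camera"},
    {"cat toy"},
]


def support(baskets, items):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in baskets) / len(baskets)


def confidence(baskets, antecedent, consequent):
    """Estimated share of `antecedent` buyers who also buy `consequent`.

    Assumes `antecedent` appears in at least one basket.
    """
    both = support(baskets, {antecedent, consequent})
    return both / support(baskets, {antecedent})


def lift(baskets, antecedent, consequent):
    """Confidence relative to the overall purchase rate of `consequent`.

    A value above 1 suggests the two products are bought together
    more often than chance alone would explain.
    """
    return confidence(baskets, antecedent, consequent) / support(baskets, {consequent})


print(f"confidence: {confidence(BASKETS, 'cat food', 'cat toy'):.2f}")
print(f"lift: {lift(BASKETS, 'cat food', 'cat toy'):.2f}")
```

On this toy data, three of the four cat food baskets also contain a cat toy, while a cat toy appears in half of all baskets, so the rule has a confidence of 0.75 and a lift of 1.5. Both numbers can be explained to a marketing team in one sentence, which is exactly the kind of reasoning a black box does not hand over.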

Opting for simple

To help identify clear patterns in a customer’s purchase history, a simpler and more interpretable model such as a rule-based recommendation system can be used. It could, for instance, identify that a customer buys dog food every month, and therefore recommend products popular among dog owners. Although it might not offer the same level of personalisation, there is an argument to be made for interpretability. Interpretability is crucial for model debugging and improvement. If a model is making consistent errors, understanding why it makes those errors is the first step towards improving it. If we are unable to interpret the model, the debugging process becomes akin to finding a needle in a haystack.
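A minimal sketch of what such a rule-based recommender might look like is shown below. It is an assumption-laden toy, not a production design: the rule table `POPULAR_BY_CATEGORY`, the function `recommend` and the `min_months` threshold are all hypothetical. The point is that every recommendation carries the reason it was made.

```python
# Hypothetical rule table: purchased category -> products popular with those buyers.
POPULAR_BY_CATEGORY = {
    "dog food": ["chew toy", "leash", "dog treats"],
    "cat food": ["cat toy", "scratching post"],
}


def recommend(purchases, min_months=3):
    """Return (product, reason) pairs for categories bought in enough distinct months.

    `purchases` is a list of (month, category) tuples, e.g. ("2024-01", "dog food").
    A category triggers a rule once it appears in at least `min_months` months.
    """
    # Collect the distinct months in which each category was bought.
    months_per_category = {}
    for month, category in purchases:
        months_per_category.setdefault(category, set()).add(month)

    recommendations = []
    for category, months in months_per_category.items():
        if len(months) >= min_months and category in POPULAR_BY_CATEGORY:
            reason = f"bought {category} in {len(months)} different months"
            for product in POPULAR_BY_CATEGORY[category]:
                recommendations.append((product, reason))
    return recommendations


history = [
    ("2024-01", "dog food"),
    ("2024-02", "dog food"),
    ("2024-03", "dog food"),
    ("2024-03", "camera"),
]

for product, reason in recommend(history):
    print(f"{product}: recommended because the customer {reason}")
```

If this system ever suggested a camera to a cat owner, the faulty rule could be found by reading the table, which is the debugging advantage described above.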

Complex models are often not as interpretable. This means that when they make a decision, it is difficult to understand why. If an AI model makes significant decisions, such as who gets a loan, who gets hired, or even who gets parole, the lack of transparency can be deeply problematic. It undermines accountability: when things go wrong, it is challenging to identify where and how the system failed. Moreover, there is a risk that decision-makers over-rely on complex models because they perceive them as all-knowing, without fully understanding them. This could lead to critical decisions being made without any human judgement.

While advanced models have their place and are undeniably powerful, the pursuit of complexity for the sake of complexity makes little sense and can lead us astray. The fact that a model is complex and can yield high accuracy does not necessarily mean it is the best choice for a given problem. In many cases, simpler models can be just as effective, if not more so. They can provide a good trade-off between interpretability and accuracy, and can be much easier and more cost-effective to implement.

Wait, we’re not done! Read part 2, where the advantages of both will be put side by side for comparison.

Nino Weerman

"As a passionate data enthusiast, I am fascinated by the incredible potential of data to uncover hidden insights and predict future trends. I'm constantly energized by the myriad ways data can be harnessed to drive success and thrive on transforming complex business goals into clear, actionable data analysis plans."
