Choosing the right path: following the Dataform trail

In the dense forest that data science can often be, finding a way through can be cumbersome. Although this forest is not made up of decision trees, making choices is essential: without choosing the right path, you will never reach the right destination. Recently, The Data Story decided to leave the familiar path of DBT for a new route, Dataform. After a brief explanation of what DBT and Dataform are, this blog discusses four reasons that led us to switch: integration with Google Cloud Platform, the choice of programming language, support for custom operations, and more advanced testing.

DBT, which stands for Data Build Tool, and Dataform are both data transformation tools: they handle the Transform step of the ELT (Extract, Load, Transform) process. DBT is an open-source analytics engineering tool that transforms raw data into a more structured and refined format using SQL queries. Dataform does much the same.

As the Dataform Docs put it, ‘Dataform is a service for data analysts to develop, test, version control and schedule complex SQL workflows for data transformation in BigQuery’. Both tools also let you use a second language to define operations that would be impossible or cumbersome to write in SQL alone. DBT, being open-source, is free to use through its command-line interface; you only pay your data platform charges. Dataform, on the other hand, offers an IDE for free, again excluding data platform charges. DBT also has a cloud version with an IDE, but it comes at an extra cost on top of the data platform charges.

GCP

The main difference between the two is Dataform’s seamless integration with Google Cloud Platform (GCP), which is also the first and most significant reason for our switch. Since Google acquired Dataform in 2020, it has become a big part of data analytics in Google Cloud. The Data Story uses GCP products for nearly all stages of data analysis: from Google Analytics to BigQuery to Looker Studio. GCP is at the core of our data analyses, which is why Dataform makes such a big difference. For instance, it integrates so well with BigQuery that it almost feels like a part of it, helping us provide efficient solutions while saving time.

It is as if we brought a bike, GCP, with us that rides a lot smoother on one of the two paths of the forest: the Dataform trail. Obviously we want to use this bike if we can; integration with GCP made the decision to switch an easy one.

Use of languages

Although both tools use SQL to query data, they differ in the language they pair with it. As explained before, both DBT and Dataform offer a way to go beyond what is possible with SQL alone: DBT uses Jinja, a templating language, and Dataform uses JavaScript. This second language allows developers to, for instance, use if statements and for loops and define reusable functions. JavaScript is one of the most popular programming languages, so many developers already know it. It also resembles other commonly used languages more closely than Jinja does, which makes the learning curve less steep. This ease of use is another plus for Dataform.
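To make the idea of loops, if statements and reusable functions more concrete, here is a minimal sketch in plain JavaScript. The function names, the table `analytics.events` and its columns are made up for illustration. In a Dataform project a helper like this would typically live in a JavaScript file under `includes/`, but the sketch runs on its own in Node.js and only builds a SQL string.

```javascript
// Hypothetical helpers for illustration; names, table and columns are assumptions.

// Use a loop (map) to build one "COUNTIF(...) AS ..." column per event name.
function countEventsByName(eventNames, column = "event_name") {
  return eventNames
    .map((name) => `COUNTIF(${column} = '${name}') AS ${name}_count`)
    .join(",\n  ");
}

// Use an if statement to add a WHERE clause only when a start date is given.
function buildDailyEventsQuery(sourceTable, eventNames, startDate) {
  let sql = `SELECT\n  event_date,\n  ${countEventsByName(eventNames)}\nFROM ${sourceTable}`;
  if (startDate) {
    sql += `\nWHERE event_date >= '${startDate}'`;
  }
  return sql + "\nGROUP BY event_date";
}

// Print the generated SQL for two event names.
console.log(
  buildDailyEventsQuery("analytics.events", ["page_view", "purchase"], "20230101")
);
```

Adding a third event only requires one extra entry in the array, instead of another hand-written column in the query.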

Custom operations

What’s more, on top of what JavaScript offers, Dataform lets you write custom SQL operations. Contrary to DBT, Dataform can execute custom SQL statements that do not fit the Dataform model of defining tables. You create a separate file with the custom SQL commands, and Dataform executes them in BigQuery. This broader range of capabilities leaves more room to find a good solution: another point for Dataform.
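As a rough sketch of what such a custom operation could look like, the snippet below builds two SQL statements that do not define a table. The table name, the `_staging` suffix and the retention period are hypothetical. The sketch assumes the `operate` function that Dataform's JavaScript API makes available in definition files; outside Dataform that function does not exist, so the statements are simply printed.

```javascript
// Hypothetical cleanup statements; table name and retention period are assumptions.
// event_date is assumed to be a DATE column.
function buildCleanupStatements(table, retentionDays) {
  return [
    `DELETE FROM ${table} WHERE event_date < DATE_SUB(CURRENT_DATE(), INTERVAL ${retentionDays} DAY)`,
    `DROP TABLE IF EXISTS ${table}_staging`,
  ];
}

const statements = buildCleanupStatements("analytics.events", 90);

// Inside a Dataform definitions file, operate() is assumed to be available and
// registers the statements as an operation. The guard keeps this sketch
// runnable in plain Node.js, where operate is not defined.
if (typeof operate === "function") {
  operate("cleanup_events").queries(statements);
} else {
  console.log(statements.join(";\n"));
}
```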

Testing

Lastly, Dataform allows for more advanced testing than DBT. Both tools support assertions, which are tests that run on the data itself. Through assertions, you can automatically safeguard data quality, for example by requiring that certain columns are unique or not null. Where the testing options in DBT reach their limit, Dataform goes a step further and lets you write unit tests. Assertions test the content of tables, whereas unit tests verify the quality of the SQL code. As the Dataform Docs put it: ‘assertions verify data, unit tests verify logic’. With that, Dataform scores against DBT once more.
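To illustrate what an assertion boils down to, the sketch below models it as a query that returns the offending rows: if the query returns no rows, the check passes. This is an illustration in plain JavaScript with hypothetical function, table and column names, not Dataform's own implementation. In Dataform itself, uniqueness and not-null checks are normally declared through its built-in assertion options rather than written by hand.

```javascript
// Hypothetical helpers; table and column names are assumptions.

// Returns a query listing every key that occurs more than once.
function uniqueKeyAssertion(table, keyColumns) {
  const keys = keyColumns.join(", ");
  return `SELECT ${keys}, COUNT(*) AS row_count\nFROM ${table}\nGROUP BY ${keys}\nHAVING COUNT(*) > 1`;
}

// Returns a query listing every row where one of the columns is NULL.
function nonNullAssertion(table, columns) {
  const conditions = columns.map((column) => `${column} IS NULL`).join(" OR ");
  return `SELECT *\nFROM ${table}\nWHERE ${conditions}`;
}

// Print both assertion queries for a made-up users table.
console.log(uniqueKeyAssertion("analytics.users", ["user_id"]));
console.log(nonNullAssertion("analytics.users", ["user_id", "signup_date"]));
```

A unit test works the other way around: it feeds a query a small set of fixed input rows and compares the result with the expected output, so it checks the logic rather than the data.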

In the dense forest of data science, the right path has become clearer. Dataform integrates seamlessly with Google Cloud Platform, which is its most obvious and convincing advantage. Moreover, Dataform uses JavaScript instead of Jinja, making the developer’s life a lot easier. And finally, Dataform offers more possibilities in terms of custom operations and testing. All in all, although DBT is a great and very similar tool, opting for Dataform has turned out to be a great decision. We have chosen the right path and have not considered turning back through the forest to the DBT trail. Tune in next time for a quickstart guide with some tips and tricks!

Jorian Faber The Data Story

“Data and creativity go hand in hand: my heart beats faster when I come up with innovative solutions and discover patterns that are not obvious.”
