Choosing the right path: following the Dataform trail

In the dense forest that data science can often be, finding a way through can be cumbersome. Although this forest is not made up of decision trees, making choices is essential: without choosing the right path, the right destination will never be reached. Recently, The Data Story decided to switch from the familiar path of DBT to a new route, Dataform. After a brief explanation of what DBT and Dataform are, this blog discusses four of the reasons that led us to switch: Dataform's integration with Google Cloud Platform, its choice of programming language, its support for custom operations, and its more advanced testing.

Dataform trail

DBT, which stands for Data Build Tool, and Dataform are both data transformation tools: they help with the Transform step in the ELT (Extract, Load, Transform) process. DBT is an open-source analytics engineering tool for transforming raw data into a more structured and refined format using SQL queries. Dataform works in much the same way.

As the Dataform Docs put it, ‘Dataform is a service for data analysts to develop, test, version control and schedule complex SQL workflows for data transformation in BigQuery’. Both tools also let you use a second language to define operations that would be impossible or cumbersome to construct with SQL alone. The open-source version of DBT is free and runs from a command-line interface; only the data platform charges apply. Dataform, on the other hand, offers an IDE for free, again excluding data platform charges. DBT does offer a cloud version with an IDE, but it charges extra on top of the data platform costs.

GCP

The main difference between the two is Dataform’s seamless integration with Google Cloud Platform (GCP), which is also the first and most significant reason for switching to Dataform. Since Google acquired Dataform in 2020, it has become a big part of data analytics in Google Cloud. The Data Story uses Google products for nearly all stages of data analysis: from Google Analytics to BigQuery to Looker Studio. GCP is at the core of our data analyses, which is why Dataform makes such a big difference. For instance, it integrates so well with BigQuery that it almost feels like a part of it, helping us provide efficient solutions while saving time.

It is as if we brought a bike, GCP, with us that rides a lot smoother on one of the two paths of the forest: the Dataform trail. Obviously we want to use this bike if we can; integration with GCP made the decision to switch an easy one.

Use of languages

Although both tools use SQL to query data, they differ in the second language they offer. As explained before, both DBT and Dataform provide a way to go beyond what is possible with SQL: DBT uses Jinja, a templating language, and Dataform uses JavaScript. This second language allows developers to, for instance, easily use if statements and for loops and define reusable functions. JavaScript is one of the most popular programming languages and will therefore likely be the preferred option for most people. It also has more in common with other widely used languages, making the learning curve less steep. This ease of use is another plus for Dataform.
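To make the loops, if statements and reusable functions a bit more concrete, here is a minimal sketch in plain JavaScript. The helper `countByChannel`, the table `sessions` and the column names are made up for illustration; in a Dataform project a helper like this would typically be exported from a file in the `includes/` folder and called from a SQL definition.

```javascript
// Hypothetical helper: builds one COUNTIF column per marketing channel.
// All table, column and channel names are illustrative assumptions.
function countByChannel(channels, column = "channel") {
  const parts = [];
  for (const channel of channels) {
    // Skip empty entries instead of emitting invalid SQL.
    if (!channel) continue;
    parts.push(`COUNTIF(${column} = '${channel}') AS ${channel}_sessions`);
  }
  return parts.join(",\n  ");
}

// The same function can be reused in every query that needs these columns.
const sql = [
  "SELECT",
  "  date,",
  `  ${countByChannel(["organic", "paid", ""])}`,
  "FROM sessions",
  "GROUP BY date",
].join("\n");

console.log(sql);
```

Adding a channel then means adding one entry to the list rather than copying another line of SQL by hand.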

Custom operations

What’s more, Dataform lets you write custom SQL operations on top of the possibilities provided by JavaScript. Where DBT handles such cases mainly through hooks and macros, Dataform has a dedicated way to execute custom SQL queries that do not fit its model of defining tables: you create a separate file with custom SQL commands, and Dataform executes these in BigQuery. This broader range of capabilities opens the door to better solutions, which is a big advantage and another point for Dataform.
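As a rough sketch of such a custom operation, the snippet below builds a clean-up statement and hands it to Dataform through its JavaScript API. The function `cleanupStatements`, the table `analytics.raw_events`, the column `event_date` and the 90-day window are all assumptions for illustration. The `operate` global is only available inside Dataform definition files, so the call is guarded to keep the snippet runnable elsewhere.

```javascript
// Hypothetical statements that do not create a table or view.
// Table name, column name and retention window are illustrative assumptions.
function cleanupStatements(table, days) {
  return [
    `DELETE FROM ${table} ` +
      `WHERE event_date < DATE_SUB(CURRENT_DATE(), INTERVAL ${days} DAY)`,
  ];
}

const statements = cleanupStatements("analytics.raw_events", 90);

// Inside a Dataform definition file, operate() registers the statements
// as an operation. Outside Dataform the global does not exist, so we
// only print the SQL.
if (typeof operate === "function") {
  operate("cleanup_raw_events").queries(statements);
} else {
  console.log(statements.join(";\n"));
}
```

The same operation can also be written as a SQL file; the JavaScript form is used here only to stay in one language.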

Testing

Lastly, Dataform allows for more advanced testing than DBT. Both tools support assertions, which are tests that run on the data itself. Through assertions, one can automatically ensure data quality, for example by requiring that certain columns are unique or not null. On top of that, where the testing options in DBT reach their limit, Dataform lets you write unit tests. Where assertions test the content of tables, unit tests verify the quality of the SQL code. As the Dataform Docs put it: ‘assertions verify data, unit tests verify logic’. With that, Dataform extends its lead on the scoreboard and beats DBT once more.
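The idea behind an assertion can be sketched as a query that returns every row breaking a rule: in Dataform, an assertion passes when its query returns no rows. The two helpers below only build such queries as strings; the function names, the table `shop.orders` and the column `order_id` are hypothetical, and in practice Dataform's built-in assertion options cover the same checks without custom code.

```javascript
// Rows where the column is NULL violate a not-null requirement.
function nonNullViolations(table, column) {
  return `SELECT * FROM ${table} WHERE ${column} IS NULL`;
}

// Values that occur more than once violate a uniqueness requirement.
function uniquenessViolations(table, column) {
  return [
    `SELECT ${column}, COUNT(*) AS occurrences`,
    `FROM ${table}`,
    `GROUP BY ${column}`,
    `HAVING COUNT(*) > 1`,
  ].join("\n");
}

// Illustrative table and column; zero returned rows means the check passes.
console.log(nonNullViolations("shop.orders", "order_id"));
console.log(uniquenessViolations("shop.orders", "order_id"));
```

A unit test works the other way around: it feeds a query a small, fixed input and compares the result with an expected output, so it checks the logic rather than the data.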

In the dense forest of data science, the right path has become clearer. Dataform integrates seamlessly with Google Cloud Platform, its most obvious and convincing advantage. Moreover, Dataform uses JavaScript instead of Jinja, making the developer’s life a lot easier. And finally, Dataform offers more possibilities in terms of custom operations and testing. All in all, although DBT is also a great and very similar tool, opting for Dataform has turned out to be a great decision. We have chosen the right path and have not considered turning back through the forest to the DBT trail. Tune in next time for a quickstart guide with some tips and tricks!

Jorian Faber The Data Story

“Data and creativity go hand in hand: my heart beats faster when I come up with innovative solutions and discover patterns that are not obvious.”
