Choosing the right path: following the Dataform trail

In the dense forest that data science can often be, finding a way through can be cumbersome. Although this forest is not made up of decision trees, making choices is essential: without choosing the right path, you will never reach the right destination. Recently, The Data Story decided to leave the familiar path of DBT for a new route, Dataform. After briefly explaining what DBT and Dataform are, this blog discusses four reasons that led us to switch: Dataform’s integration with Google Cloud Platform, its use of languages, its capability to perform custom operations, and its more advanced testing.

Dataform trail

DBT, which stands for Data Build Tool, and Dataform are both data transformation tools: they help with the Transform step of the ELT (Extract, Load, Transform) process. DBT is an open-source analytics engineering tool for transforming raw data into a more structured and refined format using SQL queries. Dataform serves the same purpose.

As the Dataform Docs put it, ‘Dataform is a service for data analysts to develop, test, version control and schedule complex SQL workflows for data transformation in BigQuery’. Both tools also let you use a second language to define operations that would be impossible or cumbersome to build with SQL alone. DBT, being open-source, is free to use through its command-line interface, excluding any data platform charges. Dataform, on the other hand, offers an IDE for free, again excluding data platform charges. DBT does offer a cloud version with an IDE, but it charges extra on top of data platform costs.

GCP

The main difference between the two is Dataform’s seamless integration with Google Cloud Platform (GCP), which is also the first and most significant reason for switching. Since Google acquired Dataform in 2020, it has become a big part of data analytics in Google Cloud. The Data Story uses GCP products for nearly all stages of data analysis: from Google Analytics to BigQuery to Looker Studio. Because GCP is at the core of our data analyses, Dataform makes a big difference. For instance, it integrates so well with BigQuery that it almost feels like a part of it, helping us provide efficient solutions while saving time.

It is as if we brought a bike, GCP, that rides a lot more smoothly on one of the two paths through the forest: the Dataform trail. Obviously we want to use this bike if we can, so the integration with GCP made the decision to switch an easy one.

Use of languages

Although both tools use SQL to query data, they differ in the second language they offer. As explained before, both DBT and Dataform provide a way to go beyond what is possible with SQL: DBT uses Jinja, a templating language, and Dataform uses JavaScript. Either one allows developers to, for instance, use if statements and for loops and define reusable functions. JavaScript is one of the most popular programming languages, so many developers will already know it and are likely to prefer it. It also resembles other commonly used languages, which makes the learning curve less steep. This ease of use is another plus for Dataform.
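
To make this concrete, here is a minimal sketch in plain JavaScript of the kind of logic described above: a reusable function that uses a for loop and an if statement to generate SQL. The function name `regionCaseExpression`, the column name and the region mapping are hypothetical and chosen only for illustration. In a Dataform project, a helper like this would typically be placed in an includes file and called from a table definition, but the snippet below runs on its own.

```javascript
// Hypothetical helper: builds a SQL CASE expression that maps country codes to regions.
// The mapping and the column name are illustrative assumptions, not from a real project.
function regionCaseExpression(column, regions, fallback) {
  const branches = [];
  for (const [region, codes] of Object.entries(regions)) {
    // Skip regions without country codes instead of emitting an invalid empty IN ().
    if (codes.length === 0) {
      continue;
    }
    const quoted = codes.map((code) => `'${code}'`).join(", ");
    branches.push(`WHEN ${column} IN (${quoted}) THEN '${region}'`);
  }
  // Without any branch, the expression is simply the fallback literal.
  if (branches.length === 0) {
    return `'${fallback}'`;
  }
  return `CASE ${branches.join(" ")} ELSE '${fallback}' END`;
}

const sql = regionCaseExpression(
  "country_code",
  { Benelux: ["NL", "BE", "LU"], Nordics: ["SE", "NO"], Unused: [] },
  "Other"
);
console.log(sql);
```

Adding a region then means adding one entry to the mapping rather than editing a long hand-written CASE statement in several queries.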

Custom operations

What’s more, Dataform lets you write custom SQL operations in addition to what JavaScript offers. Contrary to DBT, Dataform can execute custom SQL queries that do not fit the Dataform model of defining tables. You create a separate file with custom SQL commands, and Dataform then executes them in BigQuery. This broader range of capabilities opens the door to better solutions: another point for Dataform.

Testing

Lastly, Dataform allows for more advanced testing than DBT. Both tools support assertions, which are tests that run on the data itself. Through assertions, one can automatically ensure data quality, for example by requiring that certain columns are unique or not null. Where the testing options in DBT reach their limit, Dataform goes further and lets you write unit tests. Where assertions test the content of tables, unit tests verify the quality of the SQL code. As the Dataform Docs put it: ‘assertions verify data, unit tests verify logic’. Once again, Dataform comes out ahead of DBT.
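
The difference between the two kinds of test can be sketched in plain JavaScript, outside of Dataform. This is a conceptual illustration only and not Dataform’s actual syntax: the functions `dedupeByEmail` and `findDuplicates` and the sample rows are hypothetical. The assertion-style check inspects the rows themselves and passes when it finds no violations, while the unit-test-style check feeds fixed fake input into the transformation and compares the result with an expected output.

```javascript
// Hypothetical transformation: keep the first row seen for each email address.
function dedupeByEmail(rows) {
  const seen = new Set();
  const result = [];
  for (const row of rows) {
    if (!seen.has(row.email)) {
      seen.add(row.email);
      result.push(row);
    }
  }
  return result;
}

// Assertion-style check: returns the values that violate a uniqueness rule.
// An empty result means the data passes.
function findDuplicates(rows, column) {
  const counts = new Map();
  for (const row of rows) {
    counts.set(row[column], (counts.get(row[column]) || 0) + 1);
  }
  return [...counts].filter(([, count]) => count > 1).map(([value]) => value);
}

// Unit-test-style check: fixed fake input and a fixed expected output.
const fakeInput = [
  { id: 1, email: "a@example.com" },
  { id: 2, email: "b@example.com" },
  { id: 3, email: "a@example.com" },
];
const expectedOutput = [
  { id: 1, email: "a@example.com" },
  { id: 2, email: "b@example.com" },
];
const actualOutput = dedupeByEmail(fakeInput);
const logicIsCorrect =
  JSON.stringify(actualOutput) === JSON.stringify(expectedOutput);

// The assertion looks at data: the raw rows violate the rule, the transformed rows do not.
const violationsBefore = findDuplicates(fakeInput, "email");
const violationsAfter = findDuplicates(actualOutput, "email");

console.log({ logicIsCorrect, violationsBefore, violationsAfter });
```

The unit-test-style check can catch a bug in the logic before any real data is touched, whereas the assertion-style check can catch bad data even when the logic is correct.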

In the dense forest of data science, the right path has become clearer. Dataform integrates seamlessly with Google Cloud Platform, which is its most obvious and convincing advantage. Moreover, Dataform uses JavaScript instead of Jinja, making the developer’s life a lot easier. And finally, Dataform offers more possibilities in terms of custom operations and testing. All in all, although DBT is also a great and very similar tool, opting for Dataform has turned out to be a great decision. We have chosen the right path and have not looked back through the forest to return to the DBT trail. Tune in next time for a quickstart guide with some tips and tricks!

Jorian Faber The Data Story

