In the dense forest that data science can often be, finding a way through can be cumbersome. Although this forest is not made up of decision trees, making choices is essential: without choosing the right path, you will never reach the right destination. Recently, The Data Story decided to switch from the familiar path of DBT to a new route, Dataform. After briefly explaining what DBT and Dataform are, this blog discusses the four reasons that led us to switch: Dataform’s integration with Google Cloud Platform, its choice of language, its ability to perform custom operations, and its more advanced testing.
DBT, which stands for Data Build Tool, and Dataform are both data transformation tools: they support the Transform step of the ELT (Extract, Load, Transform) process. DBT is an open-source analytics engineering tool that transforms raw data into a more structured and refined format using SQL queries. Dataform is very similar.
As the Dataform Docs put it, ‘Dataform is a service for data analysts to develop, test, version control and schedule complex SQL workflows for data transformation in BigQuery’. Both tools also let you use a second language to define logic that would be impossible or cumbersome to express in SQL alone. DBT, being open-source, is free to use through its command-line interface, apart from any data platform charges. Dataform, on the other hand, offers an IDE for free, again apart from data platform charges. DBT does offer a cloud version with an IDE, but charges extra for it on top of data platform costs.
GCP
The main difference between the two is Dataform’s seamless integration with Google Cloud Platform (GCP), which is also the first and most significant reason for our switch. Since Google acquired Dataform in 2020, Dataform has become a big part of data analytics in Google Cloud. The Data Story uses GCP products for nearly all stages of data analysis: from Google Analytics to BigQuery to Looker Studio. GCP is at the core of our data analyses, which is why Dataform makes such a big difference. For instance, Dataform integrates so well with BigQuery that it feels almost like a part of it, helping us provide efficient solutions while saving time.
It is as if we brought a bike, GCP, with us that rides a lot smoother on one of the two paths of the forest: the Dataform trail. Obviously we want to use this bike if we can; integration with GCP made the decision to switch an easy one.
Use of languages
Although both tools use SQL to query data, they differ in their choice of second language. As explained before, both DBT and Dataform offer a way to go beyond what is possible with SQL: DBT uses Jinja, a templating language, and Dataform uses JavaScript. These languages allow developers to, for instance, use if statements and for loops and define reusable functions. JavaScript is one of the most popular programming languages, so most developers will likely prefer it or already know it. It also has more in common with other widely used languages, which makes the learning curve less steep. This ease of use is another plus for Dataform.
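To make this a bit more concrete, here is a minimal sketch of the kind of SQL generation that plain JavaScript makes easy. It is ordinary JavaScript that runs under Node and is not taken from one of our projects: the table `shop.orders`, the columns, and the function names are all made up for illustration. It shows a for loop, an if statement, and a reusable function working together to build a query.

```javascript
// Reusable function: one aggregated revenue column per country code.
// (Hypothetical column names: country, revenue.)
function revenueColumns(countries) {
  const columns = [];
  for (const country of countries) {
    columns.push(
      `SUM(IF(country = '${country}', revenue, 0)) AS revenue_${country.toLowerCase()}`
    );
  }
  return columns;
}

// Builds the full query; the total column is only added when requested.
function buildDailyRevenueQuery(table, countries, includeTotal) {
  const columns = ["order_date", ...revenueColumns(countries)];
  if (includeTotal) {
    columns.push("SUM(revenue) AS revenue_total");
  }
  return `SELECT\n  ${columns.join(",\n  ")}\nFROM ${table}\nGROUP BY order_date`;
}

// Print the generated SQL for two example countries.
console.log(buildDailyRevenueQuery("shop.orders", ["NL", "BE"], true));
```

Adding a country then becomes a one-word change to the list instead of another copied line of SQL.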
Custom operations
What’s more, Dataform lets you write custom SQL operations, in addition to what JavaScript makes possible. Dataform can execute custom SQL statements that do not fit its model of defining tables: you create a separate file containing the custom SQL commands, and Dataform executes them in BigQuery. DBT is built around models, so there such statements typically have to go through hooks or macros rather than a dedicated operation type. A broader range of capabilities opens the door to better solutions, which is a big advantage and another point for Dataform.
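As a rough illustration of such a custom operation, the sketch below lists two statements that do not define a table. The operation name, tables, and columns are hypothetical. The call to `operate(...).queries(...)` reflects our understanding of the global that Dataform exposes to JavaScript files in a project, so treat its exact signature as an assumption and check the Dataform reference before relying on it. Outside Dataform that global does not exist, so the sketch simply prints the statements instead.

```javascript
// Custom statements that do not create a table (hypothetical tables and columns).
const cleanupStatements = [
  "DELETE FROM analytics.events WHERE event_date < DATE_SUB(CURRENT_DATE(), INTERVAL 400 DAY)",
  "UPDATE analytics.customers SET email = NULL WHERE deleted_at IS NOT NULL",
];

// Assumption: inside a Dataform project, `operate` is available as a global.
// The typeof guard keeps this file runnable under plain Node as well.
if (typeof operate === "function") {
  operate("cleanup_old_data").queries(cleanupStatements);
} else {
  console.log(cleanupStatements.join(";\n") + ";");
}
```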
Testing
Lastly, Dataform allows for more advanced testing than DBT. Both tools let you run assertions, that is, tests on the data itself. Through assertions, you can automatically safeguard data quality, for example by requiring that certain columns are unique or not null. Where the testing options in DBT reach their limit, Dataform goes a step further and lets you write unit tests. Assertions test the content of tables, whereas unit tests verify the logic of the SQL code. As the Dataform Docs put it: ‘assertions verify data, unit tests verify logic’. Once again, Dataform comes out ahead of DBT.
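The idea behind the uniqueness and not-null assertions mentioned above can be sketched in a few lines. An assertion is essentially a query that looks for rows breaking a rule, and it passes only when that query returns nothing. The helpers below are plain JavaScript written for illustration, with a hypothetical table and column; they are not Dataform’s built-in assertion syntax.

```javascript
// Query that returns every row where the column is NULL.
function notNullAssertion(table, column) {
  return `SELECT * FROM ${table} WHERE ${column} IS NULL`;
}

// Query that returns every value occurring more than once in the column.
function uniqueAssertion(table, column) {
  return (
    `SELECT ${column}, COUNT(*) AS occurrences FROM ${table} ` +
    `GROUP BY ${column} HAVING COUNT(*) > 1`
  );
}

// An assertion passes when its query finds no violating rows.
function assertionPasses(violatingRows) {
  return violatingRows.length === 0;
}

// Print both assertion queries for a hypothetical customers table.
console.log(notNullAssertion("shop.customers", "customer_id"));
console.log(uniqueAssertion("shop.customers", "customer_id"));
```

A unit test works the other way around: instead of inspecting real tables, it feeds a query a small fixed input and compares the result with the output you expect.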
In the dense forest of data science, the right path has become clearer. Dataform integrates seamlessly with Google Cloud Platform, which is its most obvious and convincing advantage. Moreover, Dataform uses JavaScript instead of Jinja, making the developer’s life a lot easier. And finally, Dataform offers more possibilities in terms of custom operations and testing. All in all, although DBT is a great and very similar tool, opting for Dataform has turned out to be a great decision. We have chosen the right path and have not once considered heading back through the forest to the DBT trail. Tune in next time for a quickstart guide with some tips and tricks!