Help starting Dataform

Getting started with Dataform: tips and tricks to make you more confident

Now that we have decided to switch from DBT to Dataform, how do we get started? Starting to use a new tool always comes with some hurdles. We definitely came across them and had some painful stumbles. To help you get started and avoid some of these hurdles, this guide will take you by the hand for your first steps on the Dataform path. If you already have some experience with Dataform, take a good look at the four section headers to find what you want to learn more about!

First, this guide goes over the basics of Git, the backbone of version control in Dataform. Next, it gives an example file structure to hold on to when working on your first Dataform project. After that, it explains how to incorporate JavaScript into SQL. Lastly, it discusses how to reuse variables and methods, and how to apply this to create ‘templates’ for your workflows that can be deployed across different projects.

1. Basics of version control with Git

Dataform uses Git as its version control system. Although Git is familiar to many, there are definitely people who can use a short explanation of the basics. For developers, version control is a foundational utility because it allows them to go back and forth between different versions of their code. This way, when a new feature causes errors, they can easily switch back to a working version.

It can be considered a more advanced variant of the ‘undo’ action when writing documents. It works as follows: all versions are saved on a branch. When developers want to add a new feature, they push their update to the branch. When they want to get the latest version, they pull from the branch.

Developers can also create a new branch, which is a copy of another branch. Changes made on this new branch do not change anything on the original branch, so new features can be developed without interfering with the working product. To add a finished feature to the working product, the new branch is merged back into the original branch. Dataform creates a new branch for each development workspace that is created. Version control actions can be performed at the top left of Dataform’s editor.

Visualisation of how branches work in Git

2. File structure for your Dataform project

As you might have learned over the years, an organised working environment helps you work more efficiently. Working in Dataform is no different: organising your project goes a long way. It is good practice to use a fixed file structure, such as the one we use, shown below:

  • /definitions: contains SQLX files for views and tables
    • sources: declaration of data sources
    • directory for all stages: definition of base, mart, reporting tables
    • tests: assertions
  • /includes: JavaScript constants and functions, to be reused in the rest of your repository
    • constants.js
    • functions.js
  • dataform.json: default settings file

3. How to use JavaScript in Dataform

The ability to use JavaScript in Dataform, or Jinja in DBT, is what makes these data transformation tools so versatile and useful. JavaScript can be incorporated into the SQL code of an SQLX file by placing it between curly braces preceded by a dollar sign, like so: ${JavaScript code here}. This way, JavaScript variables and methods defined in other files can be referenced and reused. When your code contains an operation or piece of SQL that is used over and over again, writing a JavaScript method for it can be a wise choice.
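As a minimal sketch of such a reusable method, the plain JavaScript below builds a SQL fragment the same way a ${...} expression in an SQLX file would. The helper name safeDivide, the column names and the visits table are made up for illustration; in a Dataform project the function would typically live in a file under /includes.

```javascript
// Hypothetical helper: returns a SQL expression that avoids division by zero.
function safeDivide(numerator, denominator) {
  return `CASE WHEN ${denominator} = 0 THEN NULL ELSE ${numerator} / ${denominator} END`;
}

// Plain-JavaScript stand-in for an SQLX query body that calls the helper inside ${...}.
const query = `SELECT ${safeDivide("revenue", "sessions")} AS revenue_per_session FROM visits`;

console.log(query);
```

Writing the CASE expression once and calling the helper wherever a ratio is needed keeps the SQL consistent across tables.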

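Be careful when using JavaScript inside strings, so-called templating: use backticks instead of apostrophes! Within a JavaScript string, a ${...} expression is only evaluated when the string is delimited by backticks; between apostrophes it stays literal text. This tip would have saved me an embarrassing amount of time. Let’s not twist the knife in the wound and quickly move on!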
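The difference between the two kinds of quotes is easy to see in plain JavaScript. In this small sketch, the variable tableName and its value are made up for illustration:

```javascript
const tableName = "orders";

// Apostrophes: the placeholder is NOT evaluated and stays in the string as-is.
const withApostrophes = 'SELECT * FROM ${tableName}';

// Backticks: a template literal, so the placeholder is replaced by the variable's value.
const withBackticks = `SELECT * FROM ${tableName}`;

console.log(withApostrophes); // SELECT * FROM ${tableName}
console.log(withBackticks);   // SELECT * FROM orders
```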

Developers can use the ref() function to reference another SQL view or table and declare it as a dependency for the view or table that contains the ref() function. Dataform uses this to execute actions in the right order when running a workflow, as well as for constructing a dependency graph that gives developers a handy overview of dependencies in a project.
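To make the idea concrete, the sketch below uses a simplified stand-in for ref(), not Dataform's actual implementation. It assumes that ref() does two things: return a table name that can be placed in the SQL, and record the referenced table as a dependency. The schema name analytics and the table names are hypothetical.

```javascript
// Simplified stand-in for Dataform's ref(): an illustration, not the real implementation.
function makeContext(schema) {
  const dependencies = [];
  const ref = (name) => {
    dependencies.push(name);          // remember which table this query depends on
    return `\`${schema}.${name}\``;   // return a quoted, schema-qualified table name
  };
  return { ref, dependencies };
}

const ctx = makeContext("analytics");
const sql = `SELECT * FROM ${ctx.ref("base_orders")} JOIN ${ctx.ref("base_customers")} USING (customer_id)`;

console.log(sql);
console.log(ctx.dependencies);
```

With the dependencies recorded per table, a tool can work out that both base tables have to be built before the table that joins them.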

An example dependency graph is shown at the end of this section. More information about the ref() function can be found in the Dataform Docs.

Another use for JavaScript in Dataform is to define how operations should differ when an incremental table is run incrementally or non-incrementally. Incremental tables are a way to append new data to a table without looking through all the old data that is already there, making it a more efficient way of, for instance, saving snapshots. To do so, use the when() function as follows: ${when(incremental(), THEN, OPTIONAL ELSE)}. This enables developers to, for example, define a date range that Dataform should query when an action is run incrementally.
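The behaviour of the when() pattern can be imitated in plain JavaScript. The sketch below defines its own stand-ins for when() and incremental(), which Dataform provides itself inside SQLX files; the events table, the event_date column and the three-day window are assumptions chosen for illustration.

```javascript
// Stand-in for Dataform's when(): returns the THEN part if the condition holds,
// otherwise the optional ELSE part (an empty string by default).
function when(condition, thenSql, elseSql = "") {
  return condition ? thenSql : elseSql;
}

// Builds the query for an incremental or a full run.
function buildQuery(isIncrementalRun) {
  const incremental = () => isIncrementalRun; // stand-in for Dataform's incremental()
  const dateFilter = "WHERE event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 3 DAY)";
  return `SELECT * FROM events ${when(incremental(), dateFilter)}`.trim();
}

console.log(buildQuery(true));  // query limited to the recent date range
console.log(buildQuery(false)); // query over the full table
```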

Example dependency graph in Dataform

4. Reusing JavaScript variables and methods in Dataform

Reusing JavaScript variables and methods is one of the most important features, if not the most important, for us in Dataform. It means code can easily be reused across different datasets and projects, making the setup of similar projects much easier. JavaScript variables and methods can be defined in files and exported to the rest of the project by putting them inside a module.exports object in that file, like so: module.exports = { variablename, methodname }.

To change variable values across different deployments, use the dataform.json file. This file can be used to change a project’s default settings as well as to define project variables. These project variables can be changed per deployment, so let’s consider them ‘template variables’. To define them, add a vars object to dataform.json with one entry per variable, such as the deploycode variable used in this guide.

Once these variables have been added, they can be referenced in other files like so: ${dataform.projectConfig.vars.deploycode}. When you create a new release configuration, which is how you deploy and schedule a Dataform project, you can set compilation variables that override the values in vars. After following these steps, you have created a ‘template’ that can be deployed to different projects with only a few clicks. Super handy!
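The sketch below imitates this mechanism in plain JavaScript. It is only an illustration of the behaviour described above, not Dataform's own code: the settings that would normally sit in dataform.json are written as a JavaScript object, and the values "nl" and "be", the schema name and the table name are made up.

```javascript
// Hypothetical project settings, mirroring what dataform.json could contain.
const baseConfig = {
  defaultSchema: "dataform",
  vars: { deploycode: "nl" },
};

// Hypothetical compilation variables, as set on a release configuration.
const compilationVars = { deploycode: "be" };

// Compilation variables take precedence over the defaults in vars.
function applyCompilationVars(config, overrides) {
  return { ...config, vars: { ...config.vars, ...overrides } };
}

// Stand-in for the dataform object that is available inside a project.
const dataform = { projectConfig: applyCompilationVars(baseConfig, compilationVars) };

const sql = `SELECT * FROM sales_${dataform.projectConfig.vars.deploycode}`;
console.log(sql); // SELECT * FROM sales_be
```

The defaults in baseConfig stay untouched, so the same repository can be deployed again with a different deploycode.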

Overwriting ‘deploycode’ by adding it as compilation variable

After reading these four sections of tips and tricks, you should have a lot more fun and confidence using Dataform! If you still happen to stumble over another hurdle and fall down, take another good look at the Dataform Docs. A more inspiring alternative when falling down is to think of what Rocky said and “keep moving forward”! We are also more than happy to help you move forward!

Jorian Faber The Data Story
