A scalable way to handle multiple GA4 properties in Dataform

Many organisations don’t just have one GA4 property – they have several. A webshop might split brands, countries and domains across different properties, and before you know it you’re maintaining a small fleet of almost-identical setups. That setup works fine until you need to change something: Add a new metric? Fix a bug in your event model? Adjust a channel grouping? Suddenly you’re touching the same logic in every property-specific SQL file. It’s repetitive, error-prone, and it certainly doesn’t scale when “a few properties” slowly turns into “many”.

In this blog, I’ll walk you through how we tackle this in Dataform by generating SQL through modular functions in JavaScript.

The problem: many properties, many manual changes

A typical GA4 modelling setup in BigQuery often looks like this:

  • One flattened events table per property
  • Multiple sessions models per property  
  • One trends or reporting model per property 

Structurally, those models are almost identical: they query the same GA4 export schema, apply the same session logic, and use the same channel grouping rules. The main differences between properties are the dataset name and (possibly) property-specific event parameters.

If everything is written directly in SQL, every change in the pipeline becomes a mini-migration across all properties. For each adjustment, you have to open every property’s model and edit its SQL by hand.

That’s manageable for two or three properties. It’s far less fun for ten or more, and it becomes a real risk when your analytics landscape keeps evolving: typos slip in, or a property gets missed.

The solution: JavaScript-generated SQL in `/includes`

Dataform allows you to mix SQL with JavaScript. An `.sqlx` file can contain a `js { … }` block and embed JavaScript expressions inside the SQL using `${ … }`. We use this to push all core logic into JavaScript files in the `/includes` folder, and keep the property-specific models as thin as possible.

A simplified example for an events model might look like this:

-- models/property_1/base__events_flattened.sqlx
config {
  type: "incremental",
  schema: dataform.projectConfig.vars.ga4_schema
}

js {
  // JavaScript files in /includes are available here by filename, e.g. ga_events_flattened below
  var _project          = dataform.projectConfig.vars.project;
  var _dataset          = dataform.projectConfig.vars.property_1_dataset;
  var _table_final_days = dataform.projectConfig.vars.table_final_days;
  var _param_set        = "property_1";
}

${ga_events_flattened.base__events_flattened(
  _project,
  _dataset,
  _table_final_days,
  _param_set
)}

This file doesn’t contain the actual query logic. Instead, it calls a JavaScript function:

// includes/ga_events_flattened.js
function base__events_flattened(source_project, source_dataset, table_final_days, set_key) {
  // build and return a full SQL query as a string
}
module.exports = { base__events_flattened };

For another property, say `property_2`, we create almost the same `.sqlx` file, but only change the dataset variable and the parameter key:

var _dataset   = dataform.projectConfig.vars.property_2_dataset;
var _param_set = "property_2";
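
For completeness, the whole `property_2` file is simply the `property_1` model above with those two values swapped:

-- models/property_2/base__events_flattened.sqlx
config {
  type: "incremental",
  schema: dataform.projectConfig.vars.ga4_schema
}

js {
  var _project          = dataform.projectConfig.vars.project;
  var _dataset          = dataform.projectConfig.vars.property_2_dataset;
  var _table_final_days = dataform.projectConfig.vars.table_final_days;
  var _param_set        = "property_2";
}

${ga_events_flattened.base__events_flattened(
  _project,
  _dataset,
  _table_final_days,
  _param_set
)}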

The pattern repeats for every step in the pipeline:

  • `/includes/ga_sessions_traffic_source.js` for sessions and traffic source
  • `/includes/ga_sessions_channelgroup_lnd.js` for channel groupings
  • `/includes/ga_reporting.js` for trend and reporting tables 

Each file exports functions like `base__sessions_traffic_source(…)` or `reporting_trends(…)` that return SQL as a string. The models themselves simply plug in the right project, dataset and key. As a result, we have all business logic in one place, and each property-specific model only needs to pass names and identifiers. If at some point we decide to adjust session logic, change how we read GA4 events, or update our trend tables, we do that once in `/includes` and every property benefits automatically.

Extra customisation: per-property `event_params` in config

Of course, GA4 properties aren’t completely identical. One property might track the parameter `current_location`, whereas another might have `content_group`, and yet another might capture something entirely different. To support this, we add a shared configuration layer for event parameters (and other fields) in a central config file in `/includes`.

Conceptually, that config looks like this:

// includes/config.js

const event_params_set_property_1 = [
  { type: "int",    name: "batch_ordering_id" },
  { type: "string", name: "campaign_id" },
  { type: "string", name: "current_location" }
];
const event_params_set_property_2 = [
  { type: "int",    name: "batch_ordering_id" },
  { type: "string", name: "campaign_id" },
  { type: "string", name: "content_group" }
];
// same idea for item_params_set_* and user_properties_set_*

module.exports = {
  event_params_set_property_1,
  event_params_set_property_2,
  // ...
};

We follow a simple naming convention:

  • `event_params_set_<property>` 
  • `item_params_set_<property>` 
  • `user_properties_set_<property>`

The key we pass from the `.sqlx` file (`_param_set = "property_1"`) is enough to look up the right configuration inside `ga_events_flattened.js`:

const config = require("includes/config");
const { generateParamsSQL } = require("includes/helpers");

function base__events_flattened(source_project, source_dataset, table_final_days, set_key) {
  const resolvedEventParams    = config[`event_params_set_${set_key}`] || [];
  const resolvedUserProperties = config[`user_properties_set_${set_key}`] || [];
  return `
    SELECT
      event_date,
      event_timestamp,
      user_pseudo_id,
      -- ga_session_id is an event parameter in the raw GA4 export, so extract it
      (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS ga_session_id,
      ${generateParamsSQL(resolvedEventParams)}
      -- plus user_properties, etc.
    FROM \`${source_project}.${source_dataset}.events_*\`
    WHERE _TABLE_SUFFIX >= FORMAT_DATE(
      '%Y%m%d',
      DATE_SUB(CURRENT_DATE(), INTERVAL ${table_final_days} DAY)
    )
  `;
}

A helper like `generateParamsSQL` then turns that configuration into the repetitive SQL needed to safely extract and cast each parameter from `event_params`.
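
We won’t reproduce the real helper here, but a minimal sketch could look like this (the `TYPE_COLUMNS` mapping is our assumption about how the config types map onto the GA4 `value` struct):

// includes/helpers.js
// Sketch of a parameter-extraction helper. Each configured parameter becomes
// a scalar subquery over event_params that pulls out the matching typed value.
const TYPE_COLUMNS = {
  int: "int_value",
  string: "string_value",
  double: "double_value"
};

function generateParamsSQL(params) {
  return params
    .map(({ type, name }) => {
      const column = TYPE_COLUMNS[type] || "string_value";
      return `(SELECT value.${column} FROM UNNEST(event_params) WHERE key = '${name}') AS ${name},`;
    })
    .join("\n      ");
}

module.exports = { generateParamsSQL };

Each generated line ends with its own comma, so the surrounding SELECT list stays valid regardless of how many parameters are configured (BigQuery’s GoogleSQL also accepts a trailing comma before FROM).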

This allows each GA4 property to have its own list of parameters and user properties, all defined in one place.

Keeping the pipeline aligned downstream

So far, we’ve focused on the base models. But downstream tables (sessions, channel groupings, trends, dashboards) still need to know which property-specific tables to use.

Two things matter here:

  1. Consistent naming: We keep a clear naming scheme such as `ga_property_1_sessions_intermediate`, `ga_property_2_sessions_intermediate`, and so on, inside each dataset.
  2. Explicit dependencies: Even though the SQL is generated by JavaScript, we still declare dependencies in Dataform so the DAG is correct and things run in the right order.

Because the SQL is generated in JavaScript, we don’t rely on `ref()` inside SQL strings. Instead, the JavaScript itself uses the consistent naming pattern to address the right tables.

A simplified example for a sessions step might look like this:

// includes/ga_sessions_traffic_source.js

function base__sessions_traffic_source(
  source_project,
  source_dataset,
  source_property
) {
  return `
    SELECT
      session_date,
      session_id,
      traffic_source,
      medium,
      campaign
    FROM
      \`${source_project}.${source_dataset}.ga_${source_property}_sessions_intermediate\`
  `;
}

module.exports = { base__sessions_traffic_source };

And the corresponding model for `property_1`:

-- models/property_1/base__sessions.sqlx
config {
  type: "table",
  schema: dataform.projectConfig.vars.ga4_schema,
  dependencies: ["property_1__base__events_flattened"]
}

js {
  var _project         = dataform.projectConfig.vars.project;
  var _dataset         = dataform.projectConfig.vars.property_1_dataset;
  var _source_property = "property_1";
}

${ga_sessions_traffic_source.base__sessions_traffic_source(
  _project,
  _dataset,
  _source_property
)}

A few things to notice here: 

  • The dependency on `property_1__base__events_flattened` is declared in the `config` block. That tells Dataform that the sessions table for `property_1` depends on the events table for `property_1`, even though the actual SQL is generated in JavaScript
  • Inside the JavaScript, we construct the table name using a consistent pattern:
      `ga_${source_property}_sessions_intermediate` (see the sketch below for one way to centralise this).
  • The `_source_property` string (`"property_1"`, `"property_2"`, etc.) is the only thing that changes per property; the logic stays the same.
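
Where that naming pattern appears in several include files, it could be centralised in one place. A hypothetical helper (our naming, not part of the setup shown above) might look like:

// includes/helpers.js
// Hypothetical helper that owns the intermediate-table naming convention,
// so every include file builds table references the same way.
function sessionsIntermediateTable(project, dataset, property) {
  return `\`${project}.${dataset}.ga_${property}_sessions_intermediate\``;
}

module.exports = { sessionsIntermediateTable };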

Configuration and naming conventions make it easy to target the right property; dependencies make sure everything runs in the right order.

Conclusion

Working with multiple GA4 properties does not have to mean maintaining multiple parallel SQL codebases.

By:

  • recognising the core problem,
  • moving pipeline logic into JavaScript functions,
  • capturing per-property differences in a simple `config` file,
  • and using consistent naming for intermediate tables

we end up with a setup that is both flexible and maintainable! New GA4 properties become a matter of adding configuration and a small wrapper, rather than cloning and editing a stack of queries. And when we refine our modelling logic, we do so once, confident that every property stays in sync and every downstream model uses the right inputs. In other words: the number of properties may grow, but the amount of manual work – and the risk of mistakes – does not need to grow with it.
