Analyzing trends in search keywords
And here it is, our very first own blogpost! As already mentioned in the announcement of our new site, we will use this blog to share interesting insights, knowledge, work arounds, fixes and scripts that we have created over time. In this very first, we will share and discuss an easy to use BigQuery script to find trending search keywords, products, etc.
Often the problem with analyzing trends is that it is hard to make a distinction between meaningless spikes and impactful trends. A search word that is used a lot on a single day might not mean anything, but if it is used often over a longer period it might be meaningful. To analyze the meaningful trends, we took inspiration from stock trading.
Defining trends in stock trading
In stock trading a frequently used calculation to evaluate the price trend of a stock is based on two moving averages. For a full explanation on how this is used in stock trading, please click here. In short, the average of a value is calculated based on a set timeframe to create the Moving Average. This can then be done for multiple timeframes, let’s say one long and one short timeframe. For example, the average of the last five days (MA5) and the average over the last 20 days (MA20). The longer timeframe (MA20) will move more smoothly and slowly, while the shorter timeframe (MA5) is more susceptible to sudden changes. The difference of these two numbers indicates which items are suddenly spiking and how meaningful this is.
Let’s take a stock and keep track of it over time in a graph (black line in figure 1). We will calculate the MA200 and MA50 and plot these as well in the graph. If the stock declines in value, the MA50 line will follow that decline shortly after, while the MA200 will not show a decline yet. The moment the MA50 line drops below the MA200 line, you should sell your stock (red arrow). Similarly when an upward trend is started and the MA50 line crosses the MA200 again after a price increase, you should buy stock (green arrow). In other words when the index MA50/MA200 is positive then there is a positive trend and vice versa.
The advantage of this approach is that it neglects meaningless single day spikes and looks at the longer term trends. Therefore this same approach can be used in finding trends in search keywords.
Create the calculations in BigQuery
To create the calculations in BigQuery you need an item-set with items you tracked over time. If you have a Google 360 environment where Google Analytics data is offloaded into a BigQuery project, this is already available.
The script we have created makes the same calculations as the chart above but now for search keywords. Next to that, it creates some measures to compare keywords and their trends with each other. These are called the “Power_nr_searchers” and the “Power_share” that indicate how strong the trend is compared to other search keywords in terms as the absolute number of searches or the share of searches that keyword has.
# Create the long and short range (in days) for the trend calculation.
# See the explanation in the blog for the proper use of these values.
# SET (short, long) = (5,14)-- any number
# Create dataset with all onsite searches based on unique visitors and keyword+date
# Only picking up words that do not contain other values then text or only numbers
WITH
preselect AS (
SELECT
DISTINCT(date) AS date_derived,
UPPER(Page.SearchKeyword) AS SearchKeyword,
COUNT (DISTINCT fullvisitorID) AS nr_searches_word
FROM
--- enter: dataset with the Google Analytics offloading schema
,
UNNEST(hits) AS T
WHERE
Page.SearchKeyword IS NOT NULL
AND REGEXP_CONTAINS(Page.SearchKeyword, '^[A-Za-z0-9]+$')
AND REGEXP_CONTAINS(Page.SearchKeyword, r'[a-z]')
AND (isEntrance IS NULL
AND trafficSource.medium NOT LIKE 'cpc')
GROUP BY
1,
2
ORDER BY
2,
1 DESC ),
# Calculating the number of searches per day
dates_volumes AS (
SELECT
DISTINCT(date) AS date_derived,
COUNT (DISTINCT fullvisitorid) AS nr_searches_dag
FROM
--- enter: dataset with the Google Analytics offloading schema, same as in "preselect"
,
UNNEST(hits) AS T
WHERE
Page.SearchKeyword IS NOT NULL
AND (isEntrance IS NULL
AND trafficSource.medium NOT LIKE 'cpc')
GROUP BY
1 ),
# Calculating the share of searches for each word on that specific day
share_of_searches AS (
SELECT
A.*,
B.nr_searches_dag,
nr_searches_word/nr_searches_dag AS share
FROM
preselect A
JOIN
dates_volumes B
ON
A.date_derived = B.date_derived
WHERE
nr_searches_word > 2 ),
#Finding the previous values for number of words/share for each unique keyword
previous_share_of_searches AS (
SELECT
*,
LAG(nr_searches_word) OVER(PARTITION BY Searchkeyword ORDER BY date_derived) AS LAG_nr_word,
LAG(share) OVER(PARTITION BY Searchkeyword ORDER BY date_derived) AS LAG_share
FROM
share_of_searches
ORDER BY
searchkeyword,
date_derived ),
# Calculating indexnumbers for share and number of searches for each unique keyword
index_share_of_searches AS (
SELECT
*,
nr_searches_word/LAG_nr_word AS index_searches_yesterday,
share/LAG_share AS index_share_yesterday
FROM
previous_share_of_searches ),
# Calculating the power of the indexnumbers
# Calculating the moving averages
moving_averages AS (
SELECT
*,
(index_searches_yesterday - 1)* nr_searches_word AS Power_nr_searches,
(index_share_yesterday - 1) * nr_searches_word AS Power_share,
AVG(nr_searches_word) OVER(PARTITION BY SearchKeyword ORDER BY date_derived ASC ROWS BETWEEN 5 PRECEDING AND CURRENT ROW) AS MA_short_searches,
AVG(nr_searches_word) OVER(PARTITION BY SearchKeyword ORDER BY date_derived ASC ROWS BETWEEN 15 PRECEDING AND CURRENT ROW) AS MA_long_searches
FROM
index_share_of_searches )
# Creating the final dataset for DataStudio
# Calculating the power of the trend numbers based on the moving averages
SELECT
*,
(MA_short_searches - MA_long_searches) AS trend_power,
(MA_short_searches - MA_long_searches) / MA_short_searches AS relative_trend_power
FROM
moving_averages
Implement it yourself
To implement it just follow the following steps:
- Enter the correct dataset from Google BigQuery in line 15.
- Enter the correct dataset from Google BigQuery in line 37.
- Choose the correct Short term period at line 90 for the ‘X’ PRECEDING ROWS
- Choose the correct Long term period at line 91 for the ‘X’ PRECEDING ROWS
- Run the script
For the long and short term period we have chosen 5 and 15 since this worked best with our dataset and the business we applied this script to the last time. However, you should always try to understand your business. This means that you should know:
- How fast is the business moving?
- How fast do/can we react?
- How important/expensive is it to be on time/too early/too late?
You can either save the query result to a table for analysis purposes, or run it on a daily basis via a scheduled query. Currently the script is focused on a single run for the complete timeframe. Although it does not require a lot of resources, you can optimize it when you want to do a run daily. Therefore you need to save only the last rows that are applicable for your timeframe (say last day in a daily run) and the timeframe of the datasets needs to be at least the same as the longest timeframe used for the Moving Average.
Creating a dashboard with this data
In our next blog, we will share a dashboard that uses this data to create an interactive chart that looks like the example above. Please feel free to use, alter or add to this code in your own environment or a business environment. We hope it helps you to guide your business and would love to hear from you how it worked and if it gave you any useful insights.