Category Archives: Google Analytics

Sequence segments in Google Analytics

With segments in Analytics, we can analyze a certain group of users on our website. Segments can help us to get conclusions which we can show to our client in order to help them to make better business decisions.

Google Analytics has some predefined segments, for example:

  • Bounced Sessions
  • Mobile Traffic
  • New Users
  • Non-Converters etc.

If we can’t extract a group of users with predefined segments, Analytics has an option to create custom segments. Custom segments give us the ability to group users based on our needs and to analyze their behavior on our website.

But, before you start creating custom segments, it is important to know the difference between User, Session and Hit scopes.

Which scope should I use? Depending what you want to analyze, you will use a different scope in the segment:

  1. User segments – we use them when we want to analyze the behavior of our users on the website (for example, how many users created $5.000 revenue across all their transactions in certain period)
  2. Session segments – they come in handy when we want to analyze certain traffic source or to see what happens inside sessions (for example: what users from newsletter campaign do on our website )
  3. Sequence segments – we use them exclusively when we want to segment users who done something in particular order, when we want smoother segmentation of users behavior (for example: how many users went straight from homepage to contact page)

Let say, we define segment with User scope, where user visited the category /Office and added product to cart:

user-scope-segment

Let’s take an example where we have three users who visited our website. With user scope segment, we will include all three users. The first user in his first session added the product to cart on category /Electronics, while in his second session he visited the category /Office. Although these two activities happen in different sessions (visits), this user will be included in our segment.

The second user who visited category /Electronics in his third session added the product to cart and after that visited the category /Office, will also be included in our segment.

The third user, who in his second session visited the category /Office and added the product to cart will be also included in our segment.

user-segments

What will happen if we create session scope segment with the same conditions?

session-scope-segment

Session scope segment will include only second and third users because these two activities happen inside the same session (visit):

session-segments

Sequence segments allow us more precise segmentation if we want to follow a particular order of activities that users made on our website.

In our previous example, when we created session scope segment, we included only last two users with our segment. If we take a better look, the second user visited the category/Electronics, added product to cart and after that visited the category /Office, where he didn’t add the product to cart.

If we want to segment only users who were on category /Office and added products to cart from /Office category, then we will create sequence segments where we define two steps.

sequence-segment

In the first step we define hit (interaction) in which we want to include only users who visited the category /Office. And in the second step, we define hit where we want to include only users who also added the product to cart.

But, between steps you have two options:

  • is immediately followed by – means the user visits the category /Office and then the next thing he does is add the product to cart
  • is followed by – means the user visits category /Office and then at any point later in the same session (visit) they add the product to cart

Since we chose is immediately followed by we can be pretty sure that user added to cart one of products from /Office category.

With this segment, we will include only third user, for which we can be sure that he added the product to cart from category /Office:

sequence-segments-user

To conclude, if we use User scope segments, conditions that we define in our segment can happen across multiple sessions (visits). If we use Session scope segments, then conditions must happen inside the same session (but without any particular order).

With sequence segments, we define the order of hits (interactions) that user made, which allows us more precise segmentation. It is important to understand scope in segments because otherwise, we can read the data on a completely wrong way.

When creating sequence segments you can set up to 10 steps, and inside one segments you can combine more sequences.

Solving Attribution with GA Measurement Protocol Payload Widening

 

For starters let’s have Google explain what Measurement Protocol actually is and when it makes sense to use it (explained by Justin Cutroni):

As the documentation states Measurement protocol is used for this purpose:

  • Measure user activity in new environments.
  • Tie online to offline behavior.
  • Send data from both the client and server.

You can even find some of the basic hit payload examples in the dev docs here https://developers.google.com/analytics/devguides/collection/protocol/v1/devguide#commonhits. These hits will be valid yet if sent as is it is a question how valuable the data will be perceived once in reports. A suggestion would be to widen. Always. Making the hit valid and valuable at the same time.

The issues with MP can best be explained by using a simple example such as:

Company A sells product B in multiple countries. The process contains multiple steps. 1st step where a user visits the site and fills in and successfully submits a lead form. After validation and internal risk check  we enter 2nd step where the initial transaction is processed. Delay between step 1 and 2 can be up to 14 days. 3rd step – a user can have a recurring payment contract which automates billing process each 30 days – there is no need for the end user to visit the site in order to confirm.

Scenario – Sending a valid hit with bare essentials in step 2 and 3 results in:

Device, Geo and traffic attribution will be or can easily be skewed. For instance sending a minimum valid hit will basically send information which will be generically attributed to the IP of the server sending the hit (messes up geo attribution), device sending the hit (messes up device attribution), and traffic defaults to direct (which is in some cases fine as GA uses last non direct traffic source attribution meaning IF any campaign exists prior to the MP hit tied to a specific Client Id within the Campaign Timeout window (6 months by default) the direct traffic will just be overridden).

One additional thing which may happen if scenario follows this path:

  1. step 1 form submit via google / cpc
  2. between step 1 and step 2 an additional session / visit happens via google / organic
  3. step 2 sends the transaction to GA yet the transaction will be attributed to google / organic – so it is a question which of the sources you want the transaction to be tied with – cpc or organic?
  4. subsequently recurring transaction attribution for step 3 will always subscribe to the latest known campaign and not the initial source of Lead or Transaction

In order to prevent this from happening there are some additional parameters which you should:

  1. store (in your internal system for processes such as lead form submits) and
  2. send with the measurement protocol hit

Parameters for widening the MP hit (solving attribution)

dl – document location
Circumvent any hostname filtering (dp and dh parameters as options to this)

uip – user IP
Preserve location

ua – User agent
Preserve info on user agent (optional)

ds – Data source
Segment on data source. (cdN (custom dimension) as option)

cn/cm/cs/gclid/dclid/dr – traffic – campaigns from UTM / Adwords or DC / document referrer / (none if no info exists or create a virtual direct?)
Preserve traffic – partially. The issue is to preserve attribution for the traffic source which generated the sales – if no info would be provided some mid newsletter driven session may miss attribute results.

Sending the Measurement protocol hit

Prior to pushing all your work to production you should definitely test the setup and outcomes using https://ga-dev-tools.appspot.com/hit-builder/ and a dummy Google Analytics property.  In most cases you will send the hits as events, non interaction ones so make sure ni parameter is set to 1. This may challenge the way you see the standard reports as if you send a non interaction hit something unexpected will happen – no session will be created but there will be a user reported for the period in question – the attribution will stay intact though.  MCF reports will attribute and report as expected.

I would really be interested to know which other parameter you find essential or some of your MP cases where the valid hit is just not enough – Happy tracking!

Logging Oversized Enhanced ecommerce Google Analytics hits in Google Sheets

 

If you ever worked on a Enhanced Ecommerce implementation project there is a chance you encountered the ‘Payload size is too large (15226).  Max allowed is 8192.’ message which actually means there will be nothing stored in reports. There are some workarounds for this particular issue such as:

In this post I’ll only focus on explaining how to determine the EE actions and hits the payload may be considered an Issue.

Step 1 – Google Sheets setup

The Sheets procedure and the entire script which reports on the questionable hits can be found here – https://gist.github.com/mhawksey/1276293 (by Martin Hawksey). Follow the entire procedure and do not forget to publish and update the Script once you made the necessary changes!

A quick note on the Google Sheet itself. This one needs to be filled with header row so that we can map our payload to Google Sheets as in this example:

screenshot_54

Just the header (1st) Row. The column entries are case sensitive so whatever you define in the payload sent to Google Sheets needs to exact match! For my example I used these column labels (you may notice a ‘slight’ resemblance to Measurement protocol parameters):

  • timestamp
  • payLoadLength
  • tid (tracking id)
  • cid (client id)
  • uid (user id)
  • t (type of hit)
  • pa (product action)
  • ni (non interaction)
  • dl (document location)
  • dp (document path)
  • dt (document title)
  • ec (event category)
  • ea (event action)
  • el (event label)
  • ti (transaction id)
  • tr (transaction revenue)

Make sure that the cid and ti columns are formatted as Plain Text otherwise some unwanted auto formatting may occur. Additional parameters may be added as needed.

 

Step 2 – Google Tag Manager Setup (Mimic the Google Analytics payload)

When proceeding with this it would be prudent to create an additional GA tag in GTM leading to a dummy GA property just to be sure nothing goes wrong with the data. Basically add a new field to set to each GA tag which passes on EE payload as in the following example:

screenshot_55

So I will use the sendHitTask as means to read the entire payload and send it to Google Sheets once the payload exceeds a certain length. The sendHitTask field will use a Custom JS variable (Code below – based on this post) which returns a function which is designed (apologies for the sucky code) to push the default payload to GA and further on send the custom payload to Google Sheets – :

function sendHitTask(){
  return function(model) {
    var payLoad = model.get('hitPayload');                     
    var trackingBaseUrls = ['https://www.google-analytics.com/collect', 'https://script.google.com/macros/s/AKfycbwIYbCxf2HOBIQwpjsc3bmCSEpzcfwXC7q9ZMMaO3lOQRxGSWEi/exec'];

    for (i = 0; i < trackingBaseUrls.length; i++) { var baseUrl = trackingBaseUrls[i]; if (trackingBaseUrls[i].indexOf('collect') > -1) {
          var req = new XMLHttpRequest();
          req.open('POST', baseUrl, true);
          req.send(payLoad);
      } else if (payLoad.length > 7000){
        var payLoadExtract = payLoad.split('&');
        var payLoadArray = {};
        // Push values to array for later access
          for (i = 0; i < payLoadExtract.length; i++){
                  var splitArray = payLoadExtract[i].split('=');
                  payLoadArray[splitArray[0].trim()] = splitArray[1].trim();
          }
            // Specify values to be sent to Google Sheets from array
            var tid = 'tid=' + payLoadArray.tid,
            cid = '&cid=' + payLoadArray.cid,
            uid = '&uid=' + payLoadArray.uid,
            t = '&t=' + payLoadArray.t,  
            pa = '&pa=' + payLoadArray.pa,
            ni = '&ni=' + payLoadArray.ni,
            dl = '&dl=' + payLoadArray.dl,
            dp = '&dp=' + payLoadArray.dp,
            dt = '&dt=' + payLoadArray.dt,                                      
            ec = '&ec=' + payLoadArray.ec,                                      
            ea = '&ea=' + payLoadArray.ea,                                      
            el = '&el=' + payLoadArray.el,                                      
            ti = '&ti=' + payLoadArray.ti,                                      
            tr = '&tr=' + payLoadArray.tr,                                      
            timestamp = '&timestamp=' + {{UTIL - Date}},
            payLoadLength = '&payLoadLength=' + payLoad.length;
            var collectPayLoad = tid + cid + uid + t + pa + ni + dl + dp + dt + ec + ea + el + ti + tr + timestamp + payLoadLength;
        // Send Values to Google Sheets
        var collectUrl = baseUrl +'?'+ collectPayLoad;
        var myImage = new Image();
        myImage.src = collectUrl;
      }
    }
  }
}

And this is basically it. Whenever this tag is triggered it will push data to GA but if the payload exceeds the defined condition it will log it in the Google Sheet which will hopefully help you in debugging the implementation.

DataStudio how to Report Google Analytics MCF and Standard Ecommerce data from Google Sheets

 

How to use DataStudio in order to report on Average Days to Purchase or Average Touchpoints to purchase a specific product or category of products based on data collected in Google Analytics? Easy! For the entire setup you only need to have / do:

  1. Have access to a Google Analytics property with Ecommerce implemented
  2. Have access to DataStudio
  3. Read one of the previous articles on how to join MCF report data and Standard report data in Google Sheets

Google Sheets as Data Source

Google Sheets GA plugin extension set and queries prepared as in example:

datastudio_google_sheets_0

Reporting Sheet where you do all the join operations:

datastudio_google_sheets_0b

Once this is set proceed to adding a Data Source inside Data Studio (Google Sheets as Data source):

datastudio_google_sheets_1

Select the appropriate file and Sheet (the reporting one) and proceed in adapting the source by formatting the fields as in example:

datastudio_google_sheets_2

Note that we only had to edit Aggregation settings to Average and created an additional Calculated Field Metric which basically Averages the Transaction Value based on the Count of Transactions Ids – Formula: Total Conversion Value (MCF) / COUNT(Transaction Id (MCF)).  Proceed to add any desired visualization to the report page – in this example I used Scatter Chart and Simple Scorecard. For example the Scatter Chart uses a simple setup as follows:

datastudio_google_sheets_4

The final result was a simple one page dashboard:

datastudio_mcf_core_google_sheets

The point of the exercise should be to easily push data to DataStudio (from any available source – even ‘offline’) in order to quite simply explore the data and relations of 2 or more variables – who knows something interesting may pop up:)

 

 

DataStudio how to create a Google Analytics Custom Channel Grouping

Google DataStudio is an excellent addition to the Google Suite tool arsenal which deals with the reporting / exploration part.  A much more complex system than what we are used to with Google Analytics Dashboards yet still really easy to use! The product is still in Beta but looks very promising.

For more information about DataStudio here are some useful links to get you up and running:

  1. Product overview – http://online-behavior.com/analytics/google-data-studio
  2. Help center – https://support.google.com/360suite/datastudio?hl=en#topic=6267740
  3. Creating a report (case) – http://datarunsdeep.com.au/blog/visualising-m-night-shyamalan-google-data-studio

Back to the issue at hand:

How to create a Google Analytics Custom channel grouping inside DataStudio

Most of avid Google Analytics users are really keen on redefining the default channel grouping or adding custom channel grouping to have a better understanding of how ‘channels’ perform on a high / low detail level. Unfortunately GA Reporting API / Core API still does not have such a dimension exposed / available so in our reporting and analysis effort we have to resort to recreating the conditions with filtering. Same process has to be applied when reporting in DataStudio. So, whenever you want to widen the Default channel grouping by creating a Custom Channel grouping inside GA and want this to be accessible via DataStudio you need to do this:

  1. Add a data source – Google Analytics – procedure described in links above
  2. Add a custom field to the data source – which I will describe next

To begin with here is the official support for calculated fields – https://support.google.com/360suite/datastudio/answer/6299685?hl=en.

Step 1
Create the field
04 datastudio create a calculated field

 

Step 2
The formula

07 datastudio building blocks

 

This is the tricky part. You need to mimic the order and definitions as you would in the GA Admin interface using prebuild DataStudio functions (https://support.google.com/360suite/datastudio/table/6379764).  The basis will be the function CASE (https://support.google.com/360suite/datastudio/answer/7020724) which uses this simple syntax:

CASE
WHEN X1 THEN Y1
WHEN X2 THEN Y2
WHEN Xn THEN Yn
ELSE Y
END

This will translate to:

CASE
WHEN our source or campaign matches ‘some string’ THEN the dimension output will be ‘desired channel grouping dimension value’
ELSE if we missed to group some traffic inside a group it will output ‘You missed this’
END

The best way to proceed is to open Notepad or similar text editor and do it there as DataStudio formula entry has auto fill feature and you can only write in a single line so it will eventually strat to be extremely hard to track what is defined or what needs to be corrected.

Examples:
‘Referral’ channel group
WHEN Default Channel Grouping = “Referral” THEN “Referral”

‘Paid listings’ channel group
WHEN
REGEXP_MATCH(Medium, “.*(listing|xyz).*”) AND REGEXP_MATCH(Source, “.*(tripadvisor|abc).*”) THEN “Paid Listings”
In this case we are using REGEXP_MATCH function where we evaluate Medium AND Source for phrases we use when creating Custom Channel Grouping in Google Analytics.  We can also use other logical operators to create a more complex condition. Please note that REGEX we are used to in Google Analytics does not behave the same as in DataStudio (after some tinkering guys from http://datarunsdeep.com.au/ confirmed it – thanks!)  – more on DS regex – https://github.com/google/re2/wiki/Syntax.

Your end result should have the following structure:

CASE 
WHEN Default Channel Grouping = “Referral” THEN “Referral”
WHEN Default Channel Grouping = “Direct”  THEN “Direct”
WHEN Medium = “cpc” AND REGEXP_MATCH(Campaign, “.*(brand|bcmp).*”) THEN “Paid Brand Campaign”
WHEN REGEXP_MATCH(Medium, “.*(listing|xyz).*”) AND REGEXP_MATCH(Source, “.*(tripadvisor|abc).*”) THEN “Paid Listings”
… add more definitions as you see fit
ELSE “Other”
END

Copy your entire code and paste it inside the formula field:

05 datastudio fill in the formula

Step 3
Update the data source

07 update data source

 

 

Step 4
Change the Dimension in your widget from Default Channel grouping to your Custom Channel grouping one.

Click on the current dimensions.
09 choose the calculated field

Choose the newly created one.

10 My custom channel grouping

A small hint – add a secondary dimensions – Source / Medium just to check if all the traffic is indeed in predefined and expected channel group.

11 make sure it is applied

Step 5
Refresh the data

12 refresh the data

 

Final notes:

  • Any change you do inside GA should be redone inside DataStudio Custom field (at least till GA surfaces some kind of a custom channel grouping dimension via API)
  • The same approach can be used for any kind of filtering / segmentation and once you get used to the syntax it is pretty useful and offers nice customization options

 

Connecting products and categories with Google Analytics MCF API Intro

 

How to track specific product purchased by name, id, category or similar and tie the sales to the MCF API data (Multi Channel Funnel)? – as an example:

  • Black T-shirt was bought with this purchase path Organic Search > Paid Search > Social Network inside 5 days with total conversion value of $5 (Like this will ever happen:))

Some additional insight may come up:

  • Track how much time, on average, is needed for a purchase of specific product or a group of products (currently limited to max default lookback window of 30 days – shame – feature request to Google – increase to 90 days or give the ability to set it manually for the MCF API – please).
  • Track if expensive products need a longer conversion path in terms of days and touch points.
  • Track channels which create awareness for a particular group of products and find ideas for scaling sales.

You will hopefully find some delicious insights while you explore this pretty useful connection between CORE and MCF API – just by browsing the dimensions and metrics available from the APIs you’ll pile up some exploration ideas:

  •  CORE API dimensions and metrics – https://developers.google.com/analytics/devguides/reporting/core/dimsmets – please note this page includes the Reporting API v4 dimensions and metrics as well
  • MCF API dimensions and metrics – https://developers.google.com/analytics/devguides/reporting/mcf/dimsmets/

Before getting into details we will be working with CORE API v3 as it is supported in Google Sheets Google Analytics add on. This is an intro article so the idea is to enable access to this data without unneeded tech hassle.

Prerequisites

  1. Working ecommerce implementation
  2. At least some historic data
  3. Access to Google Analytics account / property / view
  4. Some basic Excel / Google Sheets experience

Step 1 Google Sheets setup

Using Google Analytics add on for Google Sheets create the following 2 queries using this procedure …

After you open a new Google Sheets document and installed the GA add on create a new report:

1 create new report GA sheets

Fill in the details for the GA Account / Property / View access (remember use only views with ecommerce enabled (goals may work as well)):

2 define account property view for pulling data

Add the required metrics and dimensions and build the queris for mcf and core API:

3 mcf and core queries

1st query is a MCF API call which uses:
Metrics:

  • mcf:totalConversionValue

Dimensions:

  • mcf:transactionId
  • mcf:conversionDate
  • mcf:basicChannelGroupingPath
  • mcf:pathLengthInInteractionsHistogram
  • mcf:timeLagInDaysHistogram

2nd query is for CORE API (standard reports) where we pull product info:
Metrics:

  • ga:itemRevenue

Dimensions:

  • ga:transactionId
  • ga:productName

Both queries can of course be customised extensively based on the questions you are trying to find answers for.

Note that the max count of results returned from the API is 10 000 if you have more than that you will have to create additional queries yet you will use the Start Index filed in the query set to 10001 and so on – there are existing plugins such as Analytics Edge for Excel which can help with that.

If a transaction can contain multiple products the entire process is a bit different so your main metric becomes item revenue instead of total conversion value. You can in that case add an additional calculation where you would basically do item revenue / transaction revenue and the attribution would be used only if the result would be > 50%. 

Once all is set you can proceed with running the report:

4 run report

The result of running the reports will be two additional Sheets which hold the data for further tasks.

5 reports in sheets

2 Google Sheets String manipulation (adding fields for more granular analysis and pivot options)

Add a new empty sheet (one of the ways):

6 add new sheet

Copy data from MCF report (Dummy MCF sheet) to the empty sheet (the new sheet should be named Reporting) (If transactions contain multiple products you will have to revert the process in terms of first copying data from Dummy Standard):

7 referencing fields in Reporting sheet

You basically reference the cell from a different sheet so it will always refresh once you pull a different data set.

Next sub step is adding some more columns for pivoting purposes (just as ideas – you can always come up with your own columns).

The idea is to come up with the First channel responsible for the sale and then the Penultimate channel. You can also do last and what not yet penultimate is something we do not often analyse but can prove to be valuable (better say the channel before the decision itself).

Case:

  • Conversion path contains the following string …
  • Display > Direct > Paid Search > Organic Search
  • We want to extract Display (First channel) in one cell and Paid Search (Penultimate channel) in a different cell.

We need to create an extraction formula which will go through all the purchase paths (regardless which channels these contain and the number of touch points). Note that the formula will be different based on which dimensions you use (channels, vs adwords campaigns vs source / medium etc.). In your newly created Reporting sheet after you referenced the Dummy MCF sheet cells successfully click on the next empty cell and enter this formula:

8 extract string

The formula for First channel is:
=IFERROR(TRIM(REGEXEXTRACT(C2,”^[a-zA-Z()]+”)),”-“)

The formula for Penultimate channel is:
=IFERROR(TRIM(REGEXEXTRACT(TRIM(REGEXEXTRACT(C2,”[a-zA-Z() ]+ > [a-zA-Z() ]+$”)),”^[a-zA-Z() ]+”)),”-“)

Copy / paste the formulas throughout the rows (remember max rows is by default 10 000). These formulas will not work in Excel though.

3 Tie the Dummy Standard data with the new reporting tab based on transaction Id

In our Reporting tag where we did all our string manipulation we need to fill in on more column – product name in this case (it can be whatever you queried from CORE API – just remember you always need the transaction ID as it serves as a primary key).

Use the free column on the right and name it like Product Name and in the first data row enter the formula which will tie the product name based on transaction Id:

9 vlookup

Formula used to tie product to channel grouping:
=IFERROR(VLOOKUP(A2,’Dummy Standard’!$A$16:C$10015,2,FALSE),”No Product Info”)

The formula:

  • takes the value from the current sheet set in 1st column starting at A2 and
  • finds the same transaction Id value in Dummy Standard sheet
  • in the cell range from A16:C10015 and
  • returns value from the 2nd column from that sheet to our Reporting sheet.

And your data set is ready for pivoting (some reformatting may be required).

4 Pivoting

My advice, as Google Sheets pivoting is a bit limiting, is to export the Reporting sheet to CSV and import all to Excel (much easier to group and what not).

Screenshot_61

Select the data range > Create the pivot table and explore:

10 Pivot start