There are many types of bot traffic. Some are very subtle and cannot be spotted just by looking at a graph; this type is consistent over a long period of time and typically makes up ~5% of traffic or more, depending on vertical.

The other type is more ‘in your face’, and cannot be overlooked as there is a large spike in your analytics in a tight time period – this type is usually easier to trace to its root or a specific event that occurred.
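As a rough illustration of the second type, a spike like this can also be flagged programmatically from daily session counts. The sketch below is only one possible approach: the function name, the 3x multiplier and the sample numbers are all invented for illustration and do not come from the example in this article.

```python
from statistics import median


def find_spike_days(daily_sessions, multiplier=3.0):
    """Return the days whose sessions exceed `multiplier` times the median.

    `daily_sessions` maps a day label to a session count. The multiplier
    is an arbitrary assumption; tune it to the normal variance of the site.
    """
    baseline = median(daily_sessions.values())
    return [
        day
        for day, sessions in daily_sessions.items()
        if sessions > multiplier * baseline
    ]


# Invented sample data, not taken from the article's screenshots.
sample = {
    "2019-06-03": 100,
    "2019-06-04": 110,
    "2019-06-05": 900,
    "2019-06-06": 95,
    "2019-06-07": 105,
}

spikes = find_spike_days(sample)
```

The median is used as the baseline rather than the mean so that the spike itself does not drag the baseline upwards.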

Not all bot traffic is inherently bad, but we do not want to be analysing the performance of a website with data skewed by non-human traffic.

An extreme example of identifying bot traffic

A large spike in traffic to investigate

Let us take a closer look at the above example and work out exactly what has happened here. First, I have zoomed in on the week of the traffic spike to take an initial look at the top landing pages and their metrics:

Nothing immediately jumps out here, such as a page you have never seen before capturing a disproportionate amount of traffic, or a page’s metrics exhibiting non-human behaviour. Even so, some of the data looks a bit suspicious, so we need to drill down further.

Dimension: Source

I have added a ‘Source’ secondary dimension to see the breakdown of how users (or bots) are finding the website, and ‘Direct’ traffic immediately looks suspicious due to:

  • Almost 100% new session rate.
  • 96% bounce rate.
  • ~1 page per session.
  • Short average session duration (compared to the ‘normal’ of the website).
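The four signals above can be sketched as a simple check against exported report rows. This is a minimal sketch rather than a definitive rule: the `SourceRow` type, the `looks_like_bot` function, every threshold and the sample figures are hypothetical and should be adjusted to what is ‘normal’ for the website in question.

```python
from dataclasses import dataclass


@dataclass
class SourceRow:
    source: str
    new_session_rate: float     # fraction between 0 and 1
    bounce_rate: float          # fraction between 0 and 1
    pages_per_session: float
    avg_session_seconds: float


def looks_like_bot(row, site_avg_session_seconds):
    """Flag a row only when all four signals are present at once.

    All thresholds below are assumptions chosen for illustration.
    """
    return (
        row.new_session_rate >= 0.95
        and row.bounce_rate >= 0.90
        and row.pages_per_session <= 1.1
        and row.avg_session_seconds < 0.25 * site_avg_session_seconds
    )


# Invented rows loosely modelled on the pattern described above.
direct = SourceRow("(direct)", 0.99, 0.96, 1.02, 5.0)
organic = SourceRow("google", 0.60, 0.45, 3.10, 140.0)

flagged = [r.source for r in (direct, organic) if looks_like_bot(r, 120.0)]
```

Requiring all four signals together keeps the check conservative, since any single metric on its own (a high bounce rate, for example) is common for legitimate pages.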

All these things are indicative of bot traffic, so now the question becomes what the hell is this and where is it coming from?

To answer this, we need to add further dimensions to the data to gain a better understanding.

Dimension: Network Domain [R.I.P]

Update: this dimension is no longer supported as of February 2020; the example in this article is from 2019.

Network Domain is useful for showing you where the traffic originated from, but this is often not enough for a clear answer and you will need to try other dimensions.

Network Domain dimension

Now we can clearly see where the spike in traffic is coming from – everything that is originating from this network domain is hitting one page on the website and then immediately bouncing.

This is a good discovery, but the domain name provided is very obscure and raises more questions than it answers.

Note: you can do a domain whois lookup if you are really curious, and you can likely piece things together this way, but there is usually an easier way.

Browser Dimension

The Browser dimension can often provide some information, because bots sometimes use their own custom browser optimised for web crawling.

Browser dimension

Oh no… Screaming Frog was tracked in Google Analytics! Every SEO’s worst nightmare – what has happened here?

At this point, I contacted Screaming Frog because I had never seen this before, and they explained that this most likely happened due to malformed GA tracking code implemented on the website, and they were right.

Unbelievably, there was a tiny mistake in a newly updated piece of tracking code which allowed Screaming Frog to slip through the anti-tracking net and become visible in GA.

How to block unwanted bot traffic

It is important to note that once the traffic appears in Google Analytics it cannot be removed, but it can be filtered out in numerous ways, one of which is by using an advanced filter:

Advanced filtering in GA

When it comes to weekly/monthly/quarterly reports, it should be straightforward to filter out specific unwanted bot traffic using formulas/scripts in your reports too – not ideal, but it works.
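As an example of the script approach, the sketch below drops known bot rows from an exported report before totals are calculated. It assumes a CSV export with `Browser` and `Sessions` columns; the column names, the sample figures and the exact browser label (‘Screaming Frog SEO Spider’) are assumptions, so check them against your own export.

```python
import csv
import io

# Assumption: the label as it appears in your report may differ.
BLOCKED_BROWSERS = {"screaming frog seo spider"}


def drop_bot_rows(rows, blocked=BLOCKED_BROWSERS):
    """Return only the rows whose Browser value is not in the blocked set."""
    return [r for r in rows if r["Browser"].strip().lower() not in blocked]


# Invented sample standing in for a real export file.
SAMPLE_EXPORT = """Browser,Sessions
Chrome,1200
Safari,800
Screaming Frog SEO Spider,5000
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_EXPORT)))
clean_rows = drop_bot_rows(rows)
total_sessions = sum(int(r["Sessions"]) for r in clean_rows)
```

With a real export, `io.StringIO(SAMPLE_EXPORT)` would be replaced by an opened file handle.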

To permanently block future bot traffic from this source, add a custom filter (Admin > Filters), which offers the same filtering options as above.
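Because a filter only affects data collected after it is applied, it can be worth checking the pattern against a few known values first. The sketch below does this with Python's `re` module, which is an approximation: Google Analytics uses its own regular expression engine, so only simple patterns should be assumed to behave the same. The pattern, the function name and the case-insensitive matching are assumptions for illustration.

```python
import re

# Hypothetical filter pattern for the Browser field.
FILTER_PATTERN = r"screaming\s*frog"


def would_exclude(value, pattern=FILTER_PATTERN):
    """Return True if the pattern matches anywhere in the given value."""
    return re.search(pattern, value, flags=re.IGNORECASE) is not None


# Values to check: one that should be excluded and two that should not.
checks = {
    name: would_exclude(name)
    for name in ("Screaming Frog SEO Spider", "Chrome", "Safari")
}
```

Testing against values that should not match is as important as testing the ones that should, as an overly broad pattern would silently exclude real users.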

You should also ensure you are making use of Google’s bot filtering feature:

Admin > View Settings > Bot Filtering

Summary

1. Zoom in on the time period of suspicious activity
2. Go to: Behaviour > Site Content > Landing Pages
3. Eyeball suspicious metrics for clues
4. Add a secondary dimension of ‘Source’ for more clues
5. Add further dimensions such as Network Domain / Browser to further explore
6. Duplicate views and add new filters to block unwanted bots
7. Monitor/repeat