For websites to be indexed within the results pages of search engines, search engine web crawlers (often called “spiders” or “spiderbots”) must first explore their pages.
These crawlers provide essential information to search engines so that the engines can supply users with the most useful and accurate results.
For crawlers to investigate a website efficiently, however, the site must be appropriately structured for navigation. This is where crawl depth comes into play.
What does crawl depth mean?
Put simply, crawl depth refers to the number of clicks, or pathways, that a page is away from the homepage of a website.
The homepage, therefore, has a crawl depth of zero, and any page a crawler reaches by following a single link from the homepage has a crawl depth of one.
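To make the definition concrete, here is a minimal sketch that works out crawl depth with a breadth-first search over a site's internal links. The link map and the `crawl_depths` function are hypothetical illustrations, not how any particular search engine measures depth.

```python
from collections import deque

def crawl_depths(links, homepage):
    """Return the minimum number of clicks from the homepage to each reachable page."""
    depths = {homepage: 0}          # the homepage has a crawl depth of zero
    queue = deque([homepage])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:            # first visit is the shortest path
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

# Hypothetical site: each key links to the pages in its list.
site_links = {
    "/": ["/blog/", "/services/"],
    "/blog/": ["/blog/crawl-depth/"],
    "/services/": ["/services/seo/"],
    "/services/seo/": ["/blog/crawl-depth/", "/contact/"],
}

depths = crawl_depths(site_links, "/")
```

Because the search always records the shortest route first, a page linked from several places is given the depth of its closest link to the homepage.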
How close your page is to the homepage will depend on what kind of page it is, and how important it is to the website.
Websites with thousands of pages will, of course, have different crawl depths to websites with just a couple of hundred pages.
That said, any strategically important page should not have a crawl depth of five or more, as it would signal to the crawler that it is a page of less importance.
It’s also worth noting that a crawler will only investigate a certain number of layers, as at some point it will decide that it is no longer necessary to crawl any deeper.
When a site publishes new pages, whether as commercial or supporting content, it is essential to get them crawled as soon as possible.
How to avoid crawl depth issues
There are several strategies to implement and habits to avoid so that your site has a workable structure for both crawlers and users.
Ensure that you have at least one XML sitemap
XML sitemaps are used to show Google what URLs exist on a website and get crawled more than any other kind of sitemap (such as a video or image sitemap).
There can be many elements included in an XML sitemap, such as when a particular URL was last updated.
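As a rough illustration, the sketch below builds a small XML sitemap that records when each URL was last updated. The `build_sitemap` function, the example.com URLs and the dates are all assumptions made up for this example; most sites would generate the file from their CMS or a plugin instead.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(entries):
    """Build sitemap XML from (url, last_modified) pairs; dates use YYYY-MM-DD."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")

# Hypothetical URLs and dates, for illustration only
sitemap_xml = build_sitemap([
    ("https://www.example.com/", "2024-01-15"),
    ("https://www.example.com/blog/crawl-depth/", "2024-02-01"),
])
```

The resulting string can be saved as a file such as sitemap.xml and submitted to the search engines.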
You can learn more about sitemaps and how to build them in this Google Search Console Help guide.
Inspect your pagination
Websites with a lot of content often use pagination so that they can quickly and easily provide content to users.
For instance, if a user visits a clothes site and searches for “medium white t-shirt”, through pagination, they will be provided with items within their specifications.
Pagination can cause crawling issues, however, as it creates deep pathways when there are very few items on each page or when the list of items is long.
You can avoid pagination-related crawling issues by cutting down lists, offering more items per page, or by instructing crawlers to ignore low-quality pages.
To do the latter, you will need to access and modify your robots.txt file. Again, you can read about how to modify your robots.txt file in this Google Search Reference guide.
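Before publishing a change to robots.txt, it can help to test what the new rules would block. The sketch below uses Python's standard `urllib.robotparser` module for this. The `/search/` path, the example.com URLs and the `is_crawlable` helper are hypothetical, so substitute the paths that apply to your own site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: keep all crawlers out of internal search result pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def is_crawlable(url, user_agent="Googlebot"):
    """Return True if the rules above allow this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)

blocked = is_crawlable("https://www.example.com/search/?q=t-shirt")   # False
allowed = is_crawlable("https://www.example.com/t-shirts/")           # True
```

Note that this module matches rules by simple path prefix, so it is only an approximation of how an individual search engine interprets the file.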
Limit dynamic URL crawling
A dynamic URL is designed to narrow down items within a site’s listing page, which will filter what information is displayed to users.
Typically used by eCommerce sites, dynamic URLs append parameters, which generate similar URLs.
This tactic can cause serious crawling issues when it creates duplicates of important pages.
Although adding a canonical tag can stop the indexing of a page with a dynamic URL, it will not stop it from being crawled, so make sure you mark the links with a nofollow attribute.
Alternatively, you can block them through the parameter tool in Google Search Console and/or through Bing Webmaster Tools.
On the off chance that your site needs URL parameters to serve content, only implement the above if you are confident it will not negatively affect your website.
You can read more about dynamic URLs and faceted navigation in this blog.
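To see how parameters multiply URLs, the sketch below strips filter and sort parameters so that the variants of a listing page collapse to a single address, which is the kind of URL a canonical tag would normally point to. The parameter names in `FILTER_PARAMS`, the `canonical_url` function and the example URLs are assumptions for illustration only.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical filter/sort parameters that only narrow or reorder a listing.
FILTER_PARAMS = {"colour", "size", "sort"}

def canonical_url(url):
    """Drop filter parameters so URL variants collapse to one canonical form."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in FILTER_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

variants = [
    "https://www.example.com/t-shirts/?colour=white&size=m",
    "https://www.example.com/t-shirts/?size=m&colour=white&sort=price",
    "https://www.example.com/t-shirts/?page=2",
]
canonicals = [canonical_url(u) for u in variants]
```

In this example the first two variants reduce to the same listing URL, while the `page` parameter is kept because it is not in the filter list.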
Check for excessive 301 redirects
When sites get migrated, it is sometimes the case that a batch of URLs will get linked to without a trailing slash. This can be an issue if the rest of the site uses trailing slashes.
If a user or crawler goes to such a URL, they will get 301 redirected.
An example of a URL with a trailing slash:
An example of a URL without a trailing slash:
Although a small number of URLs with or without a trailing slash isn’t necessarily a huge issue, if this is the case for URLs numbering in the thousands, the problem simply compounds as Googlebot and others have to crawl more and more unnecessary URLs.
Always update links within your site when URLs are changed so that you can limit the number of 301 redirects.
If you already have this issue, look into creating a rewrite rule to add or remove the slashes.
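A rewrite rule itself lives in your server configuration, but the logic is simple enough to sketch. The example below assumes a site that uses trailing slashes everywhere and lists the internal links that would trigger a redirect, along with the URL they should point to instead. The function names and URLs are hypothetical.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit

def with_trailing_slash(url):
    """Add a trailing slash to the path unless the URL points at a file."""
    parts = urlsplit(url)
    path = parts.path
    last_segment = posixpath.basename(path)
    # Leave empty paths, slash-terminated paths and files (e.g. logo.png) alone
    if path and not path.endswith("/") and "." not in last_segment:
        path += "/"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))

def links_needing_update(internal_links):
    """Map each link that would be 301 redirected to its corrected form."""
    return {u: with_trailing_slash(u) for u in internal_links
            if with_trailing_slash(u) != u}

# Hypothetical internal links found on a migrated site
to_fix = links_needing_update([
    "https://www.example.com/blog",
    "https://www.example.com/blog/",
    "https://www.example.com/logo.png",
])
```

Updating the links reported in `to_fix` at source means crawlers reach the final URL directly rather than spending a request on the redirect.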
How often does Google crawl a website?
Removing crawl depth problems from your website can help Google index more pages on each visit, but you’re still subject to the limits of how often Google (and other search engines) will crawl your website.
Google will only give your site a finite amount of ‘attention’ and even then, only periodically. By understanding how often Google crawls your website, you can make sure your best content falls under that briefly flickering spotlight.
Unfortunately, there is no single answer to how often Google crawls websites. It’s a complex question and depends on Google’s finite resources, the number of new websites and pages published by yourself and other webmasters, and how aware Google is of your content.
By making Google aware of your content – and that you publish good quality content on a regular basis – you can encourage Googlebot to return more frequently to your site.
It’s a win-win situation for you, Google and the internet as a whole. Google gets fresh, compelling content to index, you get your content indexed faster, and web users everywhere get to see that content sooner in their search results.
The URL Inspection Tool in Google Search Console is a useful place to look when trying to find out how often your website is crawled by Google at present.
By checking URLs using the tool, you should be able to see the date and time that each page was last crawled (click on ‘Coverage’ to expand this data) along with whether the crawl was successful and whether Google can index your content.
You’ll usually find Google crawled your site as if viewing it via a mobile device, as the search engine now adopts a ‘mobile-first’ approach to indexing, so the URL Inspection Tool can also flag up any pages that don’t come up to scratch for mobile viewers.
How do I increase crawl rates?
Google wants good content, published frequently, with good and naturally occurring inbound links pointing to it. Within your website itself, you can make sure pages are mobile-friendly, load quickly without errors, contain good, original content, and are linked prominently within your site hierarchy.
Make good use of the different performance and error reports available to you in Google Search Console and Bing Webmaster Tools, and fix any obvious and immediate problems like missing pages and broken links.
Look for anything slowing down your website. That could be a server hardware performance problem, bulky page code that is slow to load, or unnecessarily large multimedia files that you could compress to a much smaller file size.
Think like Googlebot
Imagine you are Googlebot and have just arrived at your homepage. Look at the hyperlinks on your homepage: Which pages do they link to? Are any important pages not linked to? Do your newly published pages and posts appear on your homepage as updated content?
By picturing what Googlebot sees, you can spot any gaps and work to fill them in. Remember, Googlebot has finite resources and will only spend a certain amount of time on your site, as well as visiting only a finite number of pages.
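One way to run this check without clicking through by hand is to collect every link in your homepage's HTML and compare it against a list of pages you consider important. The sketch below does this with Python's standard `html.parser` module. The HTML snippet, the list of important pages and the `missing_from_homepage` helper are hypothetical.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in an HTML document."""
    def __init__(self):
        super().__init__()
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.add(href)

def missing_from_homepage(homepage_html, important_pages):
    """Return the important pages that the homepage does not link to."""
    collector = LinkCollector()
    collector.feed(homepage_html)
    return sorted(set(important_pages) - collector.links)

# Hypothetical homepage markup and priority pages
homepage_html = """
<nav><a href="/services/">Services</a> <a href="/blog/">Blog</a></nav>
<main><a href="/blog/crawl-depth/">Latest post</a></main>
"""
important = ["/services/", "/blog/crawl-depth/", "/contact/"]
missing = missing_from_homepage(homepage_html, important)
```

Here the contact page is the gap: it is on the important list but nothing on the homepage links to it.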
Optimising your crawl rate is about increasing some factors, while decreasing others, to avoid consuming Googlebot’s capacity unnecessarily, and to make best use of the time and clicks ‘budget’ available to you during each crawl.
Factors to Increase
Try to increase factors that show Google your site is updated more regularly, make it easy for Googlebot to discover newly published pages, and generate natural inbound links from social networks and third-party sites to boost discovery of important pages:
- How often you publish new content
- Inbound links (e.g. by posting new content on social networks)
- Content included on your sitemap or linked from your homepage
Factors to Decrease
Some factors have a negative impact on crawl rates, including site templates that are not responsive or mobile-friendly and poor scores on Google’s ‘Core Web Vitals’ of Largest Contentful Paint, First Input Delay and Cumulative Layout Shift. Try to decrease:
- Total page load time
- Server errors and missing pages
- Duplicate content across pages
In essence, by improving the user experience for humans, you also optimise your site to make the best use of its Googlebot crawl budget, so that it gets crawled more often and gets more pages into the search index each time.
What is the impact of page depth on ranking performance?
When thinking about how page depth affects your search ranking performance, it’s important to understand that ‘page depth’ is not about the location of a page in your site’s folder structure.
For example, your blog archive might have a relatively large number of levels:
Likewise, eCommerce product pages can be relatively deep into the folder tree, especially if you structure your URLs using folders for different categories, rather than dynamic URL parameters:
The crucial factor is that the search robots do not discover a page based on its location in the site hierarchy, but by the number of clicks it takes to get there.
If your latest blog post appears linked directly from your homepage, for example, it has a page depth of 1, regardless of how many folders it is nested within.
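The difference between the two measures can be shown in a few lines. In this sketch, `folder_depth` counts the folders in a URL path, while page depth depends only on whether the homepage links to the page. The URLs and the set of homepage links are hypothetical.

```python
from urllib.parse import urlsplit

def folder_depth(url):
    """Count the path segments in a URL; this is NOT the page depth."""
    return len([segment for segment in urlsplit(url).path.split("/") if segment])

# Hypothetical blog post nested several folders deep
post = "https://www.example.com/blog/2024/02/crawl-depth/"

# Hypothetical set of URLs linked directly from the homepage
homepage_links = {post, "https://www.example.com/services/"}

nested_folders = folder_depth(post)
# A page linked from the homepage is one click away, whatever its folder path
linked_from_homepage = post in homepage_links
```

The post sits four folders down, yet it is still only one click from the homepage, so its page depth is 1.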
This is why it’s so important to create good and comprehensive navigation on your website, especially from your homepage and main categories or index pages, and from a sitemap that encompasses all of the most significant pages on your site.
Negative impact of page depth on search rankings
All of that being said, you’re likely to find Googlebot spends its limited time and resources crawling the shallowest pages on your site.
Once page depth goes beyond 2-3 clicks, the chance that your page will not be crawled at all steadily increases, from about 25% at four clicks to around 40% at seven or more.
The most obvious impact of this is that your page might not be crawled or indexed at all, resulting in zero presence in the search results at any rank.
However, even if the page is crawled and indexed, evidence suggests that higher page depth is associated with lower PageRank, leading Google to consider the page less valuable and authoritative, and to rank it lower as a result.
What is the impact of page depth on traffic?
Page depth can affect your inclusion in the search engine indexes, and your rankings in the search results, and both of those have obvious implications for the amount of organic search traffic you receive.
Similarly, fixing page depth problems can have a significant positive effect on your site traffic, as content is crawled and indexed for the first time, indexed pages are moved higher in the search results, and more visitors click through as a result.
If you have good quality content that is not performing well, this could be one factor in why that is the case – and could be holding a perfectly good page back from generating a high number of clicks.
A good way to spot this is if a page performs well when linked directly, for example from social networks or referring sites, but is not attracting good numbers of organic search visitors.
Finally, by improving your page depth, you can help Google to spot your newly published pages faster, which can in turn lead Googlebot to increase how often it crawls your website.
This virtuous circle will help you to get new pages crawled, indexed and ranked even faster in future, to maximise your potential organic search traffic with less lead or lag time.
Tools to use for crawl depth analysis
We’ve mentioned the Google Search Console URL Inspection Tool, one of the most direct ways to see when a page was last crawled, and to help flag up any pages that are not being successfully crawled or indexed.
Bing Webmaster Tools has its own URL Inspection Tool, but the Site Explorer report is particularly useful. It gives you an overview of your website’s folder tree, which you can expand to show all Bing-crawled URLs. Click on any URL for specifics about when it was first discovered and when it was most recently crawled.
A quick and easy way to test if a page has been successfully crawled and indexed is to use a search operator:
- site:https://salt.agency/blog/ on Google
- url:https://salt.agency/blog/ on Bing
These operators function slightly differently. On Google you’ll get indexed results whose URLs fall under the specified domain and path. On Bing you’ll only get a single exact-match result if the full specific URL has been indexed, or no results at all if it hasn’t been indexed.
While this doesn’t give you much data, it’s a fast way to test a single URL (or all the child pages of a particular stub, on Google) if you just want to check whether a page has been crawled yet or not.
By doing this, you can detect pages that are taking longer than usual to be indexed, which allows you to troubleshoot common problems like crawl depth, page load speed and content quality, re-optimise the page and if necessary, re-submit it for fresh consideration using the URL Inspection Tool.