How to optimize websites to maximize crawl budget
Crawl budget isn’t usually the first thing on the list when optimizing a website, but it can be a lifesaver for larger sites or ones that refresh often. To begin, let’s look at what crawl budget actually is.
What is crawl budget?
At its core, crawl budget is how many pages Googlebot will crawl on your website in a single “crawl event” before stopping. Smaller sites don’t usually have an issue hitting their crawl budget limit, but larger sites might. Google sets these limits based on how well your server can handle Googlebot’s requests and on your site’s popularity and authority. It also depends on how ‘stale’ your content is and how much search demand there is for the topic.
A smart strategy for large websites that reach their crawl limit is to incorporate crawl budget optimization as part of your ongoing “hygiene SEO”.
Optimizing your site for crawl budget involves taking steps to encourage Google to crawl and index your site’s most important pages.
Broadly, this means three things: discouraging Googlebot from crawling unimportant pages and pages you don’t want indexed, helping Googlebot find your important pages more easily, and improving how popular and fresh your pages look in Google’s eyes.
How is crawl budget determined by Google?
The way Google determines crawl budgets for websites is very complex and fluctuates often. At its core, the determination of crawl budget comes down to two things: the crawl rate and the crawl demand.
When it comes to crawl rate, smaller sites are usually crawled less frequently than larger ones. Likewise, slower sites are crawled less frequently than faster sites, because Google can’t get through as many pages on a slow site in a set amount of time. Your crawl rate isn’t set in stone, so speeding up your site, for example by upgrading your server and hosting, can raise it. In general, a faster site is good practice for both users and SEO.
On the other hand, crawl demand is determined by popularity and ‘staleness’. Popular pages signal to Google that it should crawl the site more often, because Google wants to serve users fresh content rather than stale content. Google tries to recrawl ‘stale’ content to refresh it where possible.
Large websites like eCommerce sites have larger crawl budgets but can still be negatively impacted by their site structure. Because these sites have thousands of regularly updated landing pages, it can be difficult for Google’s crawlers to keep up. They also often have duplicate product pages and parameter-driven URL variations (from faceted navigation, sessions, and tracking) that drag down the crawl rate, since Google doesn’t see those as valuable pages. When you have a large website, it’s important to make sure the pages Google crawls are valuable.
How to determine your site’s crawl budget
The first step in figuring out your crawl budget is to open Google Search Console and find the Crawl Stats report, which shows the number of pages Google crawls per day and examples of when pages were last crawled. While this gives a good sample for understanding your crawl budget, your server log files are the best place to see the big picture.
Your server log files record what Googlebot (and other bots) are crawling and when. They can help you find error pages, such as 404s, that bots are crawling, along with plenty of other valuable information. When you add up all of the pages Googlebot crawls on your site in a month, that is your Google crawl budget. This number can change from month to month, but it gives a good baseline. If the number of valuable pages on your site is higher than your monthly crawl budget, it’s essential to consider crawl budget optimization.
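As a rough illustration, here is a minimal Python sketch that tallies Googlebot requests from a combined-format access log. The log path is a placeholder, and matching on the user-agent string alone is a simplification; a rigorous audit would also verify Googlebot against its published IP ranges.

```python
# Minimal sketch: count the pages Googlebot requested in an access log.
# Assumes a combined-format log at "access.log"; the filename is a placeholder.
import re
from collections import Counter

GOOGLEBOT = re.compile(r"Googlebot", re.IGNORECASE)
# Combined log format: ip - - [time] "METHOD path HTTP/x" status size "referrer" "user-agent"
REQUEST = re.compile(r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[^"]*" (?P<status>\d{3})')

crawled = Counter()   # path -> number of Googlebot hits
statuses = Counter()  # status code -> count (useful for spotting crawled 404s)

with open("access.log", encoding="utf-8", errors="replace") as log:
    for line in log:
        if not GOOGLEBOT.search(line):
            continue
        match = REQUEST.search(line)
        if match:
            crawled[match["path"]] += 1
            statuses[match["status"]] += 1

print(f"Unique URLs crawled by Googlebot: {len(crawled)}")
print(f"Total Googlebot requests: {sum(crawled.values())}")
print("Status code breakdown:", dict(statuses))
print("Most-crawled URLs:", crawled.most_common(10))
```

Run over a month of logs, the total request count is a reasonable approximation of the monthly crawl budget described above.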
How to optimize for your site’s crawl budget
Create and optimize your sitemap
If you don’t have a sitemap already, creating one is a good first step in optimizing your site for your crawl budget. It’s important to note that this shouldn’t be your only step, as a sitemap doesn’t guarantee that Googlebot won’t also crawl low-value pages. When it comes to best practices for sitemaps, here are a few tips:
- Make sure you only use canonical URLs in the sitemap.
- Only add pages you want to be indexed to the sitemap.
- Ensure your sitemap is kept in sync with your robots.txt file.
If you already have a sitemap, look at your log files to see if Google is crawling all of your high-value and high-priority pages. If it isn’t, consider reordering the sitemap to put those pages higher up.
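If you maintain the sitemap yourself, generating it from a vetted list of canonical, indexable URLs is one way to keep low-value pages out of it. Here is a minimal Python sketch of that idea; the URL list, lastmod dates, and output filename are placeholders for your own data.

```python
# Minimal sketch: build a sitemap from canonical, indexable URLs only.
# The page list, lastmod dates, and output path are placeholders.
import xml.etree.ElementTree as ET

pages = [
    {"loc": "https://www.example.com/", "lastmod": "2024-01-15"},
    {"loc": "https://www.example.com/category/widgets", "lastmod": "2024-01-10"},
    {"loc": "https://www.example.com/blog/crawl-budget", "lastmod": "2024-01-05"},
]

urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
for page in pages:
    url = ET.SubElement(urlset, "url")
    ET.SubElement(url, "loc").text = page["loc"]
    ET.SubElement(url, "lastmod").text = page["lastmod"]

ET.ElementTree(urlset).write("sitemap.xml", encoding="utf-8", xml_declaration=True)
print(f"Wrote sitemap.xml with {len(pages)} URLs")
```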
Implement robots tags
When you look at your log files, you might see that Google is crawling 404s. If you have a number of them, this can eat up your crawl budget. To help combat this, you can add noindex and nofollow robots tags.
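These directives can go in an HTML meta robots tag or, for any response type, an X-Robots-Tag HTTP header. Below is a minimal sketch of the header approach, assuming a Flask application; the path prefixes are hypothetical examples of low-value sections.

```python
# Minimal sketch (assumes Flask): send an X-Robots-Tag header on low-value routes
# so Google treats them as noindex, nofollow. The path prefixes are hypothetical.
from flask import Flask, request

app = Flask(__name__)

NOINDEX_PREFIXES = ("/internal-search", "/print/", "/tag/")  # hypothetical low-value sections

@app.route("/")
def home():
    return "Home page"

@app.route("/internal-search")
def internal_search():
    return "Internal search results"

@app.after_request
def add_robots_header(response):
    # Only mark the paths we consider low value; everything else stays indexable.
    if request.path.startswith(NOINDEX_PREFIXES):
        response.headers["X-Robots-Tag"] = "noindex, nofollow"
    return response

if __name__ == "__main__":
    app.run(debug=True)
```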
Reduce duplicate content
Like crawling 404s, crawling duplicate content can eat away at your precious crawl budget. There are a few different ways to improve this, depending on what is causing the duplicate content.
If you have URLs with parameters, Google can treat each variation as a separate page, which wastes your crawl budget. Pointing those variations at the main page with a canonical tag helps Google understand that they’re related to a particular page rather than standing on their own.
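To see how much crawl activity is going to parameterized duplicates, you can group the URLs from your logs by their path with the query string stripped. A minimal Python sketch, where the input list stands in for URLs pulled from your server logs:

```python
# Minimal sketch: group crawled URLs by path to spot parameter-driven duplicates.
# The input list is a stand-in for URLs pulled from your server logs.
from collections import defaultdict
from urllib.parse import urlsplit

crawled_urls = [
    "https://www.example.com/widgets?color=red",
    "https://www.example.com/widgets?color=blue&sort=price",
    "https://www.example.com/widgets",
    "https://www.example.com/about",
]

by_path = defaultdict(list)
for url in crawled_urls:
    by_path[urlsplit(url).path].append(url)

for path, variants in by_path.items():
    if len(variants) > 1:
        print(f"{path}: {len(variants)} crawled variants")
        for variant in variants:
            print("   ", variant)
```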
If you have several pages with very similar content, you may want to consider consolidating and redirecting to related pages that get organic traffic. There’s a balance to be struck if you decide to implement this, as redirects also eat at crawl budget, so they should be used with care.
Leverage internal linking
Internal linking, used correctly, can help Google find your most important pages more quickly. Internal links to your high-value pages signal to Google that those pages matter and should be crawled. While it might be tempting to link to your important pages from every other page, keep in mind that those links should be relevant to the page and the keywords you’re targeting.
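One way to check how well your internal linking supports your priority pages is to count the internal links pointing at each URL across a sample of pages. A minimal sketch, assuming the requests and beautifulsoup4 packages are installed; the site and page list are placeholders.

```python
# Minimal sketch: count internal links pointing at each URL across a few sample pages.
# Assumes `requests` and `beautifulsoup4` are installed; site and page list are placeholders.
from collections import Counter
from urllib.parse import urljoin, urlsplit

import requests
from bs4 import BeautifulSoup

SITE = "https://www.example.com"
pages_to_scan = [f"{SITE}/", f"{SITE}/blog/", f"{SITE}/category/widgets"]

inbound_links = Counter()
for page in pages_to_scan:
    html = requests.get(page, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    for anchor in soup.find_all("a", href=True):
        target = urljoin(page, anchor["href"]).split("#")[0]
        if urlsplit(target).netloc == urlsplit(SITE).netloc:  # internal links only
            inbound_links[target] += 1

for url, count in inbound_links.most_common(20):
    print(count, url)
```

Pages you consider high value but that collect few internal links are good candidates for more (relevant) links.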
Improve your site’s load times
On many large sites, JavaScript clogs up load times and slows Googlebot’s crawl rate. With crawl budget, every second is precious, so when JavaScript adds several seconds to load times, it cuts into the number of important pages Google can crawl and index. Implementing dynamic rendering or server-side rendering can speed up load times and let Googlebot and other bots spend more of their time crawling your high-priority pages.
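Dynamic rendering means serving a pre-rendered HTML snapshot to known bots while regular visitors get the JavaScript application. A minimal sketch of the pattern, assuming a Flask app; the bot list and the snapshot store are simplified stand-ins for a real prerendering setup.

```python
# Minimal sketch of dynamic rendering (assumes Flask): serve pre-rendered HTML
# to known bots and the normal JavaScript shell to everyone else.
# The bot markers and snapshot store are simplified stand-ins.
from flask import Flask, request

app = Flask(__name__)

BOT_MARKERS = ("googlebot", "bingbot", "duckduckbot")  # simplified user-agent check

# Stand-in for a prerender cache keyed by path (e.g. produced by a headless browser).
PRERENDERED = {
    "/": "<html><body><h1>Home</h1><p>Fully rendered content for bots.</p></body></html>",
}

JS_SHELL = "<html><body><div id='app'></div><script src='/static/app.js'></script></body></html>"

def is_bot(user_agent):
    ua = (user_agent or "").lower()
    return any(marker in ua for marker in BOT_MARKERS)

@app.route("/", defaults={"path": ""})
@app.route("/<path:path>")
def serve(path):
    route = "/" + path
    if is_bot(request.headers.get("User-Agent")) and route in PRERENDERED:
        return PRERENDERED[route]  # static HTML: nothing left for the bot to execute
    return JS_SHELL  # browsers load and run the JavaScript app as usual

if __name__ == "__main__":
    app.run(debug=True)
```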
For large websites, there are many ways to reduce the number of high-value pages Google leaves uncrawled. Carrying out these suggestions will help Google understand which URLs you want crawled and cut down on the time it takes Googlebot to reach the pages that matter most.