In this article, we’ll take you through our favourite tips, tricks and best practices for making your Gatsby sites as search engine friendly as possible.

Overview

If you’re already familiar with Gatsby, static site generators and JAMstack, feel free to skip ahead to the next section. If you’re new to these latest trends in web development or to Gatsby in particular, here’s a quick overview to get you up to speed.

What is a static site generator?

A static site generator is a tool that automatically generates HTML web pages from templates and raw data. With a traditional content management system (CMS), the pages are built on the server when the user visits the relevant URL – known as server-side rendering. By contrast, with a static site generator, the pages can be built ahead of time and can therefore be served up much more quickly, usually via a CDN. This makes for much faster load times.

What about JAMstack?

Building on the performance benefits of static sites, JAMstack provides an architecture for delivering dynamic, interactive web experiences while improving scalability, resilience, and performance. JAM stands for JavaScript, APIs and Markup – the key ingredients of a front-end web application using JAMstack.

JAMstack sites use static site generators to pre-render static HTML pages (generated from source files such as Markdown) and serve them up quickly. In order to deliver dynamic content, these pre-rendered pages also include JavaScript, which is executed client-side (say, on the user’s phone, tablet or laptop) when the page loads – this could be anything from displaying the user’s login status to showing whether they have “liked” content on the page or authored a comment.

The data required to populate the page – such as the user’s account details or activity history – is accessed via APIs. Using APIs avoids tight coupling to specific server-side functionality and makes it possible to leverage a whole host of third-party APIs for your site’s content.

A vibrant ecosystem of tools and frameworks has evolved to help you build JAMstack websites. These include frontend frameworks, like React.js, which make it easier to build modern, dynamic web apps. These frameworks can be used with a static site generator to produce pre-rendered static HTML pages with client-side interactivity.

Where does Gatsby fit in?

Gatsby itself is an open-source React framework and static site generator that aims to make it easier and more enjoyable for people to build highly performant, robust, and accessible websites. It achieves this by abstracting away some of the low-level building blocks of websites and providing a framework that allows you to start delivering web experiences ridiculously fast.

Gatsby uses React.js to construct the UI elements of a web page and GraphQL (a type of API query language) to retrieve data from multiple different sources. One of the benefits of Gatsby is that it takes a “content mesh” approach, allowing you to pull in the data to display from multiple different sources, be that a headless CMS such as ContentStack, markdown files, third party APIs, pre-headless platforms like WordPress, or a traditional CMS database.

So, what’s the SEO problem?

Given that one of the benefits of static sites and JAMstack is their performance, you might assume that SEO is all taken care of. Search engines like fast load times, and so do users, making them more likely to visit your site and further improving your rankings. While that’s true, there’s a bit more to it than that.

As we’ve seen, static sites are based on templates, with some content being populated when the pages are built in advance (i.e. pre-rendered), and the rest being determined when a user loads the page and the JavaScript executes.
Search engines use bots to automatically crawl web pages and collect data from them in order to index and rank them. As bots do this work very quickly, it’s highly likely they won’t see any of the dynamic content that is only loaded after the page has been accessed. This means you need to ensure that as much of your content as possible, including your metadata, is included in the pre-rendered static pages.

Equally, that client-side loaded dynamic content can result in noticeably slower pages for real visitors to your site.

To provide the best experience for your users, you need to leverage all the tricks and best practices available to speed up load times and optimize client-side rendering. Fortunately, Gatsby provides plenty of options.

Prerequisites

Now we’ve covered the background, let’s get into the details of what you can do to ensure your Gatsby sites get the attention they deserve from search engines.

If you want to follow along and don’t already have a Gatsby site, you can set one up in a few minutes (yes, really) by following their tutorial.

Gatsby SEO hygiene basics

When it comes to optimizing your site for search engines, it pays to get the basics right. These points may seem minor, but they are easily forgotten and can make the difference between a site being crawled and ranked or not.

Metadata

Metadata is core to SEO and is often the low-hanging fruit when looking for gains. Metadata is the information contained in the <head> tag of each page – such as the page title, description, and social media tags – and is used both to rank pages and to determine what is shown on the search results page.

We recommend you include the following metadata items on each page:

  • Meta title – The page title is shown on search results pages and should be unique for each page on your site.
  • Meta description – The description is also displayed on the search results page and is important for encouraging users to click the link and visit your site.
  • Twitter cards – Twitter cards are used to provide a rich preview of your site when it’s linked to on Twitter, which in turn helps to drive traffic to your site.
  • Open Graph cards – Like Twitter cards, Open Graph cards are used to provide a rich preview of your site on Facebook.
  • Canonical URL – By providing a canonical URL you can tell search engines which URL is the master copy of a page (e.g. https://example.com vs https://www.example.com) so they know which one to list in search results. Specifying a canonical URL also helps to avoid issues related to duplicate content.

With Gatsby, it’s good practice to store site metadata in gatsby-config.js. You can then create an SEO component that uses Gatsby’s React Helmet plugin to retrieve the metadata from gatsby-config.js for inclusion in site pages at build time. By adding the SEO components to your templates or page components, you ensure that the metadata is available as soon as a search engine crawler loads the page.

Let’s look at how to include metadata in your pages:

  1. If you created your site using gatsby-starter-default or gatsby-starter-blog, both react-helmet and gatsby-plugin-react-helmet will be installed by default. If not, follow the installation instructions to add them.
  2. Open gatsby-config.js (included in your site’s root directory if you created it from a template or using the gatsby new command) and add gatsby-plugin-react-helmet to the plugins list:
    
    module.exports = {
      plugins: ["gatsby-plugin-react-helmet"],
    }
    
  3. Staying in gatsby-config.js, add a siteMetadata object to module.exports and populate it with the relevant details for your site.
    
    module.exports = {
      siteMetadata: {
        title: "Title for your site",
        description: "Brief description of your site",
        image: "mylogo.png",
      },
      plugins: ["gatsby-plugin-react-helmet"],
    }
    
  4. Create an SEO component in src/components to retrieve the metadata from the siteMetadata object using StaticQuery.
    Accepting props for each item makes them easier to override. Here, we’ve defined the metaTitle, metaDescription and metaImage props so we’ll be able to pass in a page-specific title, description and image on the page, or fall back to the values set in gatsby-config.js.
    
    import React from "react";
    import { Helmet } from "react-helmet";
    import { StaticQuery, graphql } from "gatsby";
    
    const SEO = ({ metaTitle, metaDescription, metaImage }) => {
      return (<StaticQuery 
        query={graphql`
          query HeadingQuery {
            site {
              siteMetadata {
                title
                description
                image
              }
            }
          }
        `}
        render={({ site }) => null /* replaced with the Helmet markup in step 5 */}
      />);
    };

    export default SEO;
    

    If you add further metadata items in future, you’ll need to add them to the query and create variables for them.
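
    For instance, here’s a sketch of how the query might grow if you stored a Twitter handle in siteMetadata (the twitterUsername field is our own illustration, not part of the starter):

    query HeadingQuery {
      site {
        siteMetadata {
          title
          description
          image
          twitterUsername
        }
      }
    }

    You’d then read the new value from site.siteMetadata in the render prop and add a corresponding tag to the Helmet output, just as with the existing fields.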

  5. Use the react-helmet component to return the metadata elements you want to include in each page. Helmet renders the metadata into the <head> element when the page is built, so that it’s immediately available to search bots crawling your pages.
    
    render={({ site }) => (
      <Helmet>
        <title>{metaTitle || site.siteMetadata.title}</title>
        <meta name="description" content={metaDescription || site.siteMetadata.description}/>
        <meta name="image" content={metaImage || site.siteMetadata.image}/>
      </Helmet>
    )}
    
  6. Add the SEO component to your page templates by including import SEO from "../components/seo" and adding the SEO element to the JSX you return.
    In this example, we’ve passed in the metaDescription for the page:
    
    import * as React from "react"
    import SEO from "../components/seo";
    const IndexPage = () => {
      return (
        <main>
          <SEO metaDescription="This is a homepage"/>
          <h1>
          This is a homepage 🎉🎉🎉
          </h1>
        </main>
      );
    }
    export default IndexPage;
    

    When the page is built, the metadata will use the values from siteMetadata for the title and image, together with the page description.

  7. Once you’ve got these basics in place, you can reuse these values to populate Twitter and Open Graph tags. Update the SEO component return to include social media tags, such as:
    
    render={({ site }) => (
      <Helmet>
        <title>{metaTitle || site.siteMetadata.title}</title>
        <meta name="description" content={metaDescription || site.siteMetadata.description} />
        <meta name="image" content={metaImage || site.siteMetadata.image} />
        <meta property="og:title" content={metaTitle || site.siteMetadata.title} />
        <meta property="og:description" content={metaDescription || site.siteMetadata.description} />
        <meta property="og:image" content={metaImage || site.siteMetadata.image} />
        <meta property="og:type" content="website" />
        <meta name="twitter:title" content={metaTitle || site.siteMetadata.title} />
        <meta name="twitter:description" content={metaDescription || site.siteMetadata.description} />
        <meta name="twitter:image" content={metaImage || site.siteMetadata.image} />
      </Helmet>
    )}
    
  8. Finally, use Gatsby’s React Helmet Canonical URLs plugin to include <link rel="canonical" /> in your page metadata at build time. Install the plugin, then return to gatsby-config.js and add gatsby-plugin-react-helmet-canonical-urls to the plugins list alongside the React Helmet plugin:
    
    module.exports = {
      plugins: [
        "gatsby-plugin-react-helmet",
        {
          resolve: "gatsby-plugin-react-helmet-canonical-urls",
          options: {
            siteUrl: "https://myawesomewebsite.🦄",
          },
        },
      ],
    };
    

Alternatively, you can use Helmet’s link prop to set the rel="canonical" for your pages from the SEO component.
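
For example, here’s a minimal sketch using the link prop (the canonicalUrl prop and the siteUrl field are our own illustration – adapt them to your setup):

render={({ site }) => (
  <Helmet
    link={[
      // Fall back to a site-wide URL from siteMetadata if no
      // page-specific canonical URL is passed in
      { rel: "canonical", href: canonicalUrl || site.siteMetadata.siteUrl },
    ]}
  >
    {/* ...title, description and social media tags as before... */}
  </Helmet>
)}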

You can use a local build to test your title, description and canonical tag are included on your pages as expected. As you can’t test Twitter and Open Graph cards locally, you’ll need to deploy your changes in order to test them. Once deployed, head over to the Facebook Debugger and Twitter Validator Tool to check that your cards render as expected.

robots.txt

Your site’s robots.txt file tells search engine crawlers which pages or files on your site to index and which to ignore. It’s good SEO practice to disallow any pages containing duplicate content, as duplication negatively impacts your ranking. Preview and staging versions of your site deployed via Netlify can result in duplicate pages. You may also want to exclude pages that are only displayed after a user completes an action, such as confirmation pages.

You can use the Gatsby robots txt plugin to create a robots.txt file for your site automatically. Once installed, add the plugin to gatsby-config.js and specify the host of your site and the location of your sitemap (which you can generate automatically using Gatsby Sitemap plugin). You can create a single policy specifying allow and disallow lists for all search bots, or apply different policies for specific bots.


module.exports = {
  plugins: [
    `gatsby-plugin-react-helmet`,
    {
      resolve: 'gatsby-plugin-robots-txt',
      options: {
        host: 'https://www.mygatsbysite.com',
        sitemap: 'https://www.mygatsbysite.com/sitemap.xml',
        policy: [{ userAgent: '*', allow: '/', disallow: ['/confirmation', '/admin'] }]
      }
    },
  ],
}

You can also set different policies for development, staging and production versions of your site. For more details, have a look at the Gatsby robots txt plugin instructions.
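
For instance, here’s a sketch using the plugin’s env option to block crawlers everywhere except production:

module.exports = {
  plugins: [
    {
      resolve: 'gatsby-plugin-robots-txt',
      options: {
        host: 'https://www.mygatsbysite.com',
        sitemap: 'https://www.mygatsbysite.com/sitemap.xml',
        env: {
          // Keep bots away from development and preview builds
          development: {
            policy: [{ userAgent: '*', disallow: ['/'] }]
          },
          production: {
            policy: [{ userAgent: '*', allow: '/' }]
          }
        }
      }
    },
  ],
}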

Bear in mind that you can’t force crawlers to follow the instructions in your robots.txt file – it’s up to each search engine provider to act responsibly and respect the requests. For that reason, it’s not enough to include pages you want to hide on your disallow list – if you want to block access to content on your site, you need to password protect it. Furthermore, pages on the disallow list can still end up in search results if they are linked to from elsewhere.

Enhancing search results with structured data

Structured data enables search engines to better understand the content of a page by providing it in a machine-readable format, while remaining easy for humans to read and write. Google and other search engines use structured data to improve the appearance of search results by presenting additional relevant information, using rich snippets and knowledge graphs.

For example, when searching for flights, Google lists the available airlines, routes, and prices in a condensed listing, which helps users compare options quickly. Likewise, searching for a recipe returns rich snippets containing the user rating, recipe time, and ingredient list, together with a knowledge graph of nutritional information. Structured data is what makes this possible, and while including it doesn’t directly influence rankings, it can drive traffic to your page, which in turn boosts your rankings.

You can implement structured data in your Gatsby site using the Gatsby React Helmet plugin and the SEO component we used earlier to include metadata on our pages.

In your SEO component, define the type of content (e.g. website, article, event, recipe) and list the relevant properties and values in JSON-LD format (the format preferred by Google):


export const JSONLD = () => {
  const jsonld = {
    "@context": "https://schema.org",
    "@type": "WebSite",
    "url": "https://myawesomewebsite.🦄"
  };
  // Add the script to Helmet to output the JSON-LD formatted data in the head tag
  return (
    <Helmet>
      {jsonld && <script type="application/ld+json">{JSON.stringify(jsonld)}</script>}
    </Helmet>
  );
}

This is just a basic example to show how you could add structured data for a simple website. If your site contains multiple types of content, you can define additional sets of structured data (for example, for breadcrumbs, events, FAQs, or videos), and apply these dynamically according to the type of page.
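
As a simple sketch, a component for article pages might look like this (the prop names are our own illustration):

import React from "react";
import { Helmet } from "react-helmet";

// Structured data for an article page, built from page-level props
export const ArticleJSONLD = ({ title, author, datePublished }) => {
  const jsonld = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": title,
    "author": { "@type": "Person", "name": author },
    "datePublished": datePublished,
  };
  return (
    <Helmet>
      <script type="application/ld+json">{JSON.stringify(jsonld)}</script>
    </Helmet>
  );
};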

For a full list of content types and their associated properties, head over to Google’s search documentation. You can also test your structured data using Google’s Structured Data Testing Tool.

Site speed

Together with the availability of certain metadata tags, your site’s overall speed (and individual page speeds) is one of the factors that directly influence how your pages rank in Google. Site speed also has an indirect effect, as performance and load times impact your users’ experiences.

Having a slow site is likely to increase bounce rate and deter visitors from returning, which further damages your rankings. With users accessing websites from a range of devices, many of which will be relying on flaky Wi-Fi signals or low bandwidth from a cellular network, it’s more important than ever to ensure your pages load quickly and deliver a responsive experience.

A key advantage of Gatsby is that it’s designed to deliver lightning fast experiences, regardless of whether users are viewing your site on a mobile, tablet, laptop, or desktop. Gatsby uses the PRPL pattern to load your pages as fast as possible:

  • Push/Pre-load: First, Gatsby pushes or preloads the key resources needed for the initial route requested.
  • Render: Gatsby then renders the static HTML version of the initial route, which was created when the site was built. By distributing those static pages to CDNs you can improve performance even more.
  • Pre-cache: Once the initial page has been displayed, Gatsby starts pre-caching resources required by other routes linked to from that page.
  • Lazy-load: Remaining routes are created on demand when the user clicks a link, using the pre-cached content.

In addition to its PRPL architecture, Gatsby offers a couple of other tools you can use to further boost your site’s performance.   

Images

Images – or, more specifically, image load times – have long been a major contributor to slow web pages and frustrated users. Used well, images can really elevate the experience of your site. But when a less-than-perfect connection or a dwindling data allowance stops users from clicking through to the information they need, that has a direct impact on your visitor numbers and rankings.

There are a range of best practices you can apply to create the visual impact you want while maintaining performance: resizing, creating alternate versions for different devices, file compression, blur-up with an SVG placeholder, and lazy loading to name a few. The downside, of course, is that applying all of these techniques takes time and it’s easy to forget them when making changes to your site at a later date. Fortunately, Gatsby solves this problem with the Gatsby Image Plugin.

To use the Gatsby Image plugin, install the plugin as normal and add it to gatsby-config.js. For static images (those which are the same each time the component or template is used), you just need to import the StaticImage component and use <StaticImage> where you would have used <img>. For example:


import { StaticImage } from "gatsby-plugin-image"

export function Panda() {
  return <StaticImage src="../images/panda.png" alt="Panda cubs" />
}

You can then pass in props to change the layout, placeholder, transform options and other attributes.
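
For example (a minimal sketch using the plugin’s documented props):

import { StaticImage } from "gatsby-plugin-image"

export function Panda() {
  return (
    <StaticImage
      src="../images/panda.png"
      alt="Panda cubs"
      layout="constrained"   // scale to fit the container, up to 600px wide
      width={600}
      placeholder="blurred"  // show a blurred preview while the full image loads
    />
  )
}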

For dynamic images, add the image to the page query and pass in the relevant arguments to gatsbyImageData. Then use the GatsbyImage component to display the image on the page. Attributes such as layout, size, and transform options are passed into the GraphQL query, whereas inline styles, alt text, and position in the container are passed as props to GatsbyImage. For more details, have a look at the Image Plugin Reference Guide.
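
Here’s a hedged sketch of the dynamic approach, assuming gatsby-source-filesystem is configured and an images/panda.png file exists:

import * as React from "react"
import { graphql } from "gatsby"
import { GatsbyImage, getImage } from "gatsby-plugin-image"

// Layout and transform options are set in the GraphQL query...
export const query = graphql`
  query {
    file(relativePath: { eq: "panda.png" }) {
      childImageSharp {
        gatsbyImageData(width: 600, placeholder: BLURRED)
      }
    }
  }
`

// ...while alt text and styling are passed as props to GatsbyImage
const PandaPage = ({ data }) => {
  const image = getImage(data.file)
  return <GatsbyImage image={image} alt="Panda cubs" />
}

export default PandaPage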

Fonts

Font assets linked from a remote stylesheet can also slow down page loads. By using the Gatsby Preload Fonts plugin, you can avoid this issue and preload font assets on the initial load of the page.

To use the plugin, install it and add it to gatsby-config.js as normal. Then, run the gatsby-preload-fonts script each time you add new routes, new font assets or links to new font assets.
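
For example, you could expose the script in your package.json (the script name here is our own choice):

{
  "scripts": {
    "preload-fonts": "gatsby-preload-fonts"
  }
}

Running npm run preload-fonts then gathers the font assets used by each route so the plugin can preload them on subsequent builds.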

Pre-rendering for maximum performance

A key advantage of Gatsby is that it pre-renders much of the site content in advance. By generating the static HTML for each route when you build your site, you have pages that are ready to serve on request, with only the dynamic content still to load. This not only reduces load times and improves the experience for your users, but also ensures search engine crawlers can discover your content and avoids the SEO pitfalls experienced by client-side rendered sites.

Of course, to create dynamic and interactive web applications, we still need to incorporate client-side JavaScript that executes when the page loads. The process of executing the JavaScript, comparing the output to the static HTML and reconciling the two is known as (re)hydration.

Unfortunately, there is a potential pitfall here. If you’ve ever visited a site that you’re already logged in to, you may have noticed the page initially loads showing you a login link and then updates to show that you’re logged in, possibly accompanied by a UI flicker as the elements reposition. This is a result of a mismatch between the DOM (Document Object Model) of the statically rendered page and the DOM of the page expected after the JavaScript has executed.

React expects the DOM to be set when the page is rendered (which with Gatsby happens server-side at build time), and only expects to patch up differences in the text content when it hydrates containers on that page. This issue can be easy to miss in Gatsby, because the React rehydration warnings are not displayed in development environments.

To avoid changing the DOM when you hydrate your containers with client-side code, React recommends implementing a two-pass render, whereby the static render includes a placeholder for the dynamic content, and a re-render is triggered once the component has mounted. For small pieces of dynamic content, this technique can provide a much smoother experience. If you have large amounts of dynamic content, the second render can slow down page performance – but that’s always true of highly dynamic content.
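
Here’s a minimal sketch of the two-pass pattern (the component and prop names are our own illustration):

import React, { useState, useEffect } from "react";

const LoginStatus = ({ user }) => {
  // The first (static) pass renders the placeholder; once the component
  // has mounted client-side, flip the flag to trigger the second pass.
  const [hasMounted, setHasMounted] = useState(false);
  useEffect(() => {
    setHasMounted(true);
  }, []);

  if (!hasMounted) {
    return <a href="/login">Log in</a>; // placeholder baked into the static HTML
  }
  return user ? <span>Hi, {user.name}</span> : <a href="/login">Log in</a>;
};

export default LoginStatus;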

404 Handling

A common issue with modern web apps arises when a user enters an incorrect URL or follows a broken link. While they may see content telling them the page was not found, the accompanying HTTP status code is still 200 (success). This is because the URL has been translated into an API route: as long as the API responds successfully and serves a page, it will return a 200 status code.

From an SEO perspective, this is a problem. Search engine bots crawling your pages check the status code to determine if there’s a real page at each URL. Returning a success status causes the page to be treated as a normal page, ranked based on the (sparse) “page not found” contents and potentially listed in search results. To avoid this and ensure the page is treated as the error you intended, you need to return a 4xx status.

Luckily, Gatsby handles this for you. You can create a custom 404 page at src/pages/404.js with helpful content to redirect your users, and when a user hits a route that doesn’t exist, that page is served with a 404 status code. Simple!
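
A minimal src/pages/404.js might look like this (the contents are our own illustration):

import * as React from "react"
import { Link } from "gatsby"

const NotFoundPage = () => (
  <main>
    <h1>Page not found</h1>
    <p>
      Sorry, we couldn’t find what you were looking for. Try heading{" "}
      <Link to="/">back to the homepage</Link>.
    </p>
  </main>
)

export default NotFoundPage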

Part two coming soon!