Over the past few months, Python has become more widely understood within the SEO community, thanks to its ability to automate both monotonous and labour-intensive SEO tasks.

First released in 1991, with many subsequent updates, Python takes an object-oriented approach designed to produce readable, logical code that scales easily between small and large projects.

Python has several uses, but for me, it can bring the most benefit to the automation of what are otherwise monotonous data collection tasks.

It is worth highlighting at this stage that using Python to automate these processes doesn’t exclude the need for human involvement and intelligence.

Typically, you can use Python for the following tasks:

  • Data extraction
  • Preparation
  • Analysis & visualization

It’s the data extraction part I’m going to focus on in this article, especially using a website analysis script called Python SEO Analyzer.

It’s also worth noting that other pieces of software can do some of these tasks, but if you’re in a position where you don’t have access to them — or for other reasons they can’t be used — these Python scripts can come in handy.

Automating data collection for a light SEO audit

As mentioned, in this article we’re going to use a publicly available script called the Python SEO Analyzer to gather basic data for a lightweight audit. The data points collected are:

  • Page word count
  • Page title tag
  • Meta description
  • Number of keywords on-page
  • Warnings
  • Missing title tag (if applicable)
  • Missing meta description (if applicable)
  • Missing image alt-text (if applicable)
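
To make these data points a little more concrete, here is a minimal sketch of how a few of them could be collected from a single page using only Python’s standard library. This is not how the Python SEO Analyzer works internally; the class name `AuditParser`, the function `audit_html` and the warning strings are my own assumptions, chosen purely for illustration.

```python
from html.parser import HTMLParser


class AuditParser(HTMLParser):
    """Hypothetical helper: collects a few on-page signals from one HTML document."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self.images_missing_alt = 0
        self.word_count = 0
        self._in_title = False
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = (attrs.get("content") or "").strip()
        elif tag == "img" and not (attrs.get("alt") or "").strip():
            self.images_missing_alt += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif not self._skip_depth:
            # Count whitespace-separated words in visible text only.
            self.word_count += len(data.split())


def audit_html(html):
    """Return title, description, word count and warnings for one page."""
    parser = AuditParser()
    parser.feed(html)
    parser.close()

    title = parser.title.strip()
    warnings = []
    if not title:
        warnings.append("Missing title tag")
    if not parser.description:
        warnings.append("Missing meta description")
    if parser.images_missing_alt:
        warnings.append("Missing image alt-text")

    return {
        "title": title,
        "description": parser.description,
        "word_count": parser.word_count,
        "warnings": warnings,
    }
```

You would pass `audit_html` the HTML of a page you have already downloaded. It only looks at one page at a time, so crawling and sitemap discovery are still best left to the script itself.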

You will need Python version 3.4+, which can be installed easily using Homebrew; the script itself can then be installed through pip. After that, you’re ready to go.

After installing the Python SEO Analyzer script, you can use the commands below to discover URLs to analyse, either by crawling the website or by reading the XML sitemap.

I recommend doing both, as not all URLs end up in the XML sitemap and, likewise, not all URLs are discoverable through a crawl:

seoanalyze https://salt.agency/

or

seoanalyze https://salt.agency --sitemap https://salt.agency/sitemap_index.xml
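
If you do run both, a simple set comparison shows which URLs were found by only one method. The sketch below assumes you have already pulled the two URL lists out of the results yourself; the function names `normalise` and `compare_url_sources` are hypothetical and not part of the Python SEO Analyzer.

```python
def normalise(url):
    # Treat "https://example.com/page/" and "https://example.com/page" as the same URL.
    return url.strip().rstrip("/")


def compare_url_sources(crawled_urls, sitemap_urls):
    """Split URLs into those found by the crawl only, the sitemap only, or both."""
    crawled = {normalise(u) for u in crawled_urls}
    in_sitemap = {normalise(u) for u in sitemap_urls}
    return {
        "crawl_only": sorted(crawled - in_sitemap),
        "sitemap_only": sorted(in_sitemap - crawled),
        "both": sorted(crawled & in_sitemap),
    }
```

URLs that appear in `crawl_only` may be missing from the sitemap, while those in `sitemap_only` may be orphaned pages with no internal links pointing at them.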

Another option is to generate HTML output from the analysis instead of JSON. To do this, use the command below:

seoanalyze https://salt.agency/ --output-format html

If you would rather run the analysis from within a Python script and export the data yourself, you can use:


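# The library's entry point uses the American spelling, "analyze".
from seoanalyzer import analyze

# Analyse the site, passing the XML sitemap as a second source of URLs.
output = analyze('https://salt.agency', 'https://salt.agency/sitemap_index.xml')
print(output)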

A typical output of raw data for processing and analysis
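
Raw output like this is easier to work with once it has been flattened into rows. The sketch below turns the results into CSV text that can be opened in a spreadsheet. It assumes the output is a dictionary with a "pages" list whose entries carry "url", "title", "description", "word_count" and "warnings" keys; check this against the output from your own version of the script, as the exact structure is an assumption here. The function name `pages_to_csv` is hypothetical.

```python
import csv
import io

# Assumed per-page keys; adjust these to match your actual output.
FIELDS = ["url", "title", "description", "word_count", "warnings"]


def pages_to_csv(output):
    """Flatten the assumed {"pages": [...]} structure into CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for page in output.get("pages", []):
        writer.writerow({
            "url": page.get("url", ""),
            "title": page.get("title", ""),
            "description": page.get("description", ""),
            "word_count": page.get("word_count", 0),
            # Join the list of warnings into a single cell.
            "warnings": "; ".join(page.get("warnings", [])),
        })
    return buffer.getvalue()
```

You could then save the result with something like `open("audit.csv", "w").write(pages_to_csv(output))`, where `audit.csv` is just a placeholder filename.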

New to Python and programming languages?

If you’re new to Python and programming languages as a whole, I strongly recommend that you check out Paul Shapiro’s talk from TechSEO Boost 2018 titled: Just Enough to Be Dangerous – Programming Basics for SEOs, in which Paul explains some basic concepts and elements.

He then shows you how to bring them all together to create your first script.

Also, for further reading, this is an excellent guide on how to install Python 3 on Mac.