Removing AMP pages from your Jekyll Sitemap

In this quick post we look at how to remove AMP pages from the sitemap generated by Jekyll.

I love using Jekyll to generate this site. It gives me all the control I need without getting too distracted by low-level details.

Recently I added AMP support to my site. With AMP, a new mobile-optimised page is generated, and Google serves it to some mobile traffic when people search for stuff.

This means I have my standard post about Brighton Ruby and an AMP optimised post about Brighton Ruby (note: I have since removed AMP from the site). So far so good.

I tell one post about the other through an amphtml link tag.

Toby, I thought this post was about Sitemaps?

Heh, sorry – I’m getting there. The important thing about this preamble is that I now have two pages: /post/ and /amp/post/.

If you use Jekyll you probably use the excellent Sitemap Generator plugin for it.

During a brief SEO audit I noticed that only half of the pages I had submitted to Google’s Search Console were getting indexed. Furthermore, about twice as many pages were being submitted as I thought there should be.

A report showing twice as many submitted URLs as indexed URLs

A quick check of my sitemap.xml showed that each of my AMP pages had its own entry in the sitemap.
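
If you want to run the same check on your own site, here is a minimal sketch in Python. It assumes your AMP pages live under an /amp/ path like mine did; the function name amp_urls and the example.com URLs are made up for illustration, so adjust them to match your setup.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace used by standard sitemap.xml files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def amp_urls(sitemap_xml, amp_prefix="/amp/"):
    """Return every <loc> in the sitemap whose path starts with amp_prefix.

    amp_prefix is an assumption about how the AMP pages are laid out.
    """
    root = ET.fromstring(sitemap_xml)
    locs = [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]
    return [url for url in locs if urlparse(url).path.startswith(amp_prefix)]


# A tiny hand-written sitemap standing in for the real one.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/post/</loc></url>
  <url><loc>https://example.com/amp/post/</loc></url>
</urlset>"""

print(amp_urls(sample))  # ['https://example.com/amp/post/']
```

An empty list means no AMP pages are being submitted through the sitemap.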

I hadn’t asked the Sitemap Generator to add these pages and I didn’t think to check at the time.

Looking through the documentation and source code I could see that any generated page will get added to the Sitemap. This is a nice feature for standard use cases, not so great for my AMP setup though!

Luckily, Sitemap Generator always checks for a flag. If you set sitemap: false in the front matter of a particular post or layout, it won’t get added to the sitemap.

Where you have to make the change will depend on how you are generating your AMP pages. I had a layout file called amp.html, and at the very top of it I added:

  sitemap: false

Once I made this change and deployed my site, all my AMP pages were removed from my sitemap.

The final step I took was to resubmit my sitemap to Google. I’m sure it would have picked up the change itself, but it took 10 seconds to do, and the report I took a screenshot of above immediately looked a lot healthier.

When to keep AMP pages in your sitemap

I wanted to remove my AMP pages from my sitemap to improve crawl efficiency. I was happy to do this because I knew there was a 1-to-1 relationship between my AMP pages and the standard pages carrying the amphtml tag.

If you have AMP pages which aren’t a copy of a standard HTML page, then Google suggests keeping the AMP page in your sitemap. In this case I would only suggest adding the pages that aren’t linked to from other locations.
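
To work out which camp your AMP pages fall into, something like the following rough sketch might help. It assumes the /post/ and /amp/post/ URL pattern described earlier; split_amp_pages and the example paths are hypothetical names invented for this illustration.

```python
def split_amp_pages(paths, amp_prefix="/amp"):
    """Split AMP paths into (paired, standalone).

    An AMP path is "paired" when stripping amp_prefix gives a path that
    also exists on the site, and "standalone" otherwise. The prefix is an
    assumption about the site's URL layout.
    """
    all_paths = set(paths)
    paired, standalone = [], []
    for path in sorted(all_paths):
        if not path.startswith(amp_prefix + "/"):
            continue  # not an AMP page
        standard = path[len(amp_prefix):]  # "/amp/post/" -> "/post/"
        if standard in all_paths:
            paired.append(path)
        else:
            standalone.append(path)
    return paired, standalone


paths = ["/post/", "/amp/post/", "/amp/amp-only-story/"]
paired, standalone = split_amp_pages(paths)
print(paired)      # ['/amp/post/']
print(standalone)  # ['/amp/amp-only-story/']
```

Paired pages are the ones that can safely get sitemap: false, while standalone pages are candidates for staying in the sitemap.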
