Canonical and Base URLs
A guide to canonical and base URLs and why you should care about them
Canonical URLs
According to Wikipedia, the word canonical in a computer science sense means “the usual or standard state or manner of something”. In web development, a canonical URL is the preferred URL for a piece of content, as chosen by the web developer. When a website serves the same content at multiple URLs, specifying a canonical URL tells search engines which of those URLs they should display to the public.
To use a canonical URL, place something like the following in the head of your document:
<link rel="canonical" href="https://www.example.com/category/my_page.html">
Unfortunately this is only a signal, not a directive, so search engines don’t have to follow it, but from what I can see it appears to be a pretty strong signal.
What is cool is that the canonical link can point to another domain altogether. This could be useful if you don’t have the access needed to set up a redirect.
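To check which canonical URL a page actually declares, you can pull the tag out of the markup. The following is only a minimal sketch using Python's standard library; the `CanonicalFinder` class, the `find_canonical` helper and the sample page are made up for illustration and are not part of any SEO tool.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        # rel is a space-separated list of tokens, compared case-insensitively
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_tokens:
            self.canonical = attr_map.get("href")


def find_canonical(html: str) -> Optional[str]:
    """Return the declared canonical URL, or None if the page has none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Hypothetical sample page, reusing the example URL from above
page = """<html><head>
<title>My page</title>
<link rel="canonical" href="https://www.example.com/category/my_page.html">
</head><body></body></html>"""

print(find_canonical(page))  # https://www.example.com/category/my_page.html
```

Running this against the different URLs that serve the same content is a quick way to confirm they all declare the same canonical URL.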
Base URLs
Base URLs let the web developer specify a document’s base URL explicitly, that is, the URL that relative links in the document are resolved against. This ties in nicely with canonical links, which can be relative if you wish, and it can help with duplicate content issues for linked content coming from your site.
To set a base URL, place something like the following in the head of your document:
<base href="https://www.example.com/category/my_page.html">
At first glance this looks exactly the same as the canonical URL we wanted to include. The difference is that we can then do something like:
<img src="../img/myimg.png" alt="my test image" />
And this would resolve to https://www.example.com/img/myimg.png - pretty cool eh?
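If you want to see how a relative link resolves against a base href without opening a browser, Python's `urllib.parse.urljoin` applies the same standard resolution rules. This is just a small sketch; the two extra relative paths (`other_page.html` and `/about.html`) are invented for illustration.

```python
from urllib.parse import urljoin

# The base href from the <base> tag above
base_href = "https://www.example.com/category/my_page.html"

# The image path from the example, plus two hypothetical links
for relative in ("../img/myimg.png", "other_page.html", "/about.html"):
    print(relative, "->", urljoin(base_href, relative))

# ../img/myimg.png -> https://www.example.com/img/myimg.png
# other_page.html -> https://www.example.com/category/other_page.html
# /about.html -> https://www.example.com/about.html
```

Note that the file name at the end of the base href (my_page.html) is dropped before resolving, so relative paths start from the /category/ directory.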
Because every relative link on the page is resolved against the base URL, your canonical host and path are replicated throughout the relative links in your site, which improves the chances that search engines will index the URL you want them to index.
I don’t care about SEO - why should I do this?
Some people just don’t care about ranking well in search engines, and that is fine, but I still think you should look into implementing these tags where appropriate on your website - here is why.
I am currently working on a legacy project: a site that accepts multiple domain names and uses some database magic to display a different site depending on the domain. Simple enough. The only issue is that the original developers didn’t think about the need to provide a canonical URL, so b.com/a/b/c and x.com/a/b/c show the same content on different domains. Unfortunately, somewhere along the line Google indexed content belonging to b.com as sitting under x.com, which just looks wrong (even if you don’t care about rank).
Granted, this is just an edge case, but it is one of many that can be solved by the correct application of both canonical and base URLs.
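For a multi-domain setup like the one described, one possible fix is to generate the canonical tag from whichever domain owns the content rather than from the domain in the request. The sketch below is not the project's actual code: the `PRIMARY_DOMAIN` lookup, the site keys and the `canonical_link_tag` function are all hypothetical, and it assumes the canonical URL should use https and drop the query string, which may not suit every site.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

# Hypothetical lookup of which domain "owns" each site.
# In a real project this would come from the database.
PRIMARY_DOMAIN = {"site_b": "b.com", "site_x": "x.com"}


def canonical_link_tag(site_key: str, request_url: str) -> str:
    """Build a canonical <link> tag pointing at the site's primary domain."""
    parts = urlsplit(request_url)
    # Assumption: always https, keep the path, drop query string and fragment
    canonical = urlunsplit(("https", PRIMARY_DOMAIN[site_key], parts.path, "", ""))
    return f'<link rel="canonical" href="{escape(canonical, quote=True)}">'


# Content owned by b.com but requested through x.com
print(canonical_link_tag("site_b", "http://x.com/a/b/c?utm_source=feed"))
# <link rel="canonical" href="https://b.com/a/b/c">
```

Whichever domain the visitor (or crawler) arrives on, the page then names b.com as the preferred home for that content.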
Further Reading:
Canonical Links
- https://support.google.com/webmasters/bin/answer.py?hl=en&answer=139394
- https://www.mattcutts.com/blog/seo-advice-url-canonicalization/
- https://en.wikipedia.org/wiki/URL_normalization
- https://www.seomoz.org/blog/canonical-url-tag-the-most-important-advancement-in-seo-practices-since-sitemaps