Each piece of content should be accessible under only one URL; otherwise, duplicate content arises. If similar or identical content also appears on another page, for example on a partner website, the second URL should point to the original URL.
If there is no such reference, Google decides on its own which version to index, and at least one of the two pages may be dropped from the search results. To avoid this, you can use so-called canonical URLs. Canonical URLs, or canonical tags, are hints in the source code of a website that tell search engines which URL is the original.
Why should website owners use canonical URLs?
Avoiding duplicate content is not the only reason for website owners to choose the canonical URL manually. There are several more:
Avoiding keyword cannibalization
If two pages on the same domain are optimized for the same keyword, they might cannibalize each other. The search engine algorithm then independently decides which of the two pages is more relevant to a search query.
Differentiating multiple URLs for a page
If multiple URLs exist for a page, you must decide which URL should appear in the search results.
As a site operator, you may want to prioritize a specific URL, for example www.example.com/product/exampleproduct.
Furthermore, content may be accessible through different URLs and/or domains, e.g. www.example.com, www.example.net and www.example.com/home.
Consolidating link signals for similar or duplicate pages
For search engines, it helps to consolidate the information available on several individual URLs into a single preferred URL.
For example, links from other websites pointing at example.com/product/exampleproduct should be consolidated at the canonical URL www.example.com/product/color/productin.html.
It is also possible that your server is configured to serve the same content over both HTTP and HTTPS, and both with and without "www".
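The HTTP/HTTPS and "www" variants mentioned above can be mapped to one preferred URL programmatically. The following is a minimal sketch, not a definitive implementation. The preferred scheme, the preferred host and the list of equivalent hosts are assumptions chosen for illustration, and the input must include the protocol.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: the site operator prefers HTTPS and the "www" host.
PREFERRED_SCHEME = "https"
PREFERRED_HOST = "www.example.com"
# Assumption: these host names serve the same content.
EQUIVALENT_HOSTS = {"example.com", "www.example.com"}


def preferred_url(url: str) -> str:
    """Map an HTTP/HTTPS, www/non-www variant to the single preferred URL."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host not in EQUIVALENT_HOSTS:
        return url  # not one of our variants, leave it untouched
    path = parts.path or "/"  # treat an empty path like the root path
    # The fragment is dropped because it is never sent to the server.
    return urlunsplit((PREFERRED_SCHEME, PREFERRED_HOST, path, parts.query, ""))
```

All four variants of a page then yield the same string, which can be used as the target of the canonical tag.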
Gathering correct metrics on a product or topic
Setting a canonical tag makes it easier to generate consolidated metrics for specific content.
Syndicated content management
If content is to be syndicated to other domains for publication, the canonical tag has to specify the preferred URL.
Saving crawl time on duplicate pages
If you want to get the best possible Google rankings for your page, Googlebot should be advised to crawl only the latest version of any URL on your website.
How do site owners know which URL Google considers the canonical one?
The URL Inspection tool in Google Search Console lets you check which page Google considers canonical. Note that for various reasons (such as the performance or content of a page), Google may select a different page as canonical than the one you specified. This can also be due to incorrectly marked language versions or incorrectly configured servers.
In a Google Webmasters video, John Mueller explains how Google chooses which page to rank when there is duplicate content.
Possible methods for applying the canonical tag
For all methods, Google recommends the specification of absolute URLs, i.e. the entire web address.
Method 1: Integration in the head element of the source code
A link element with the attribute rel="canonical" is placed in the head element of the source code and supplements the meta information of the document. The element points to the default page and is used only where one or more pages have identical content but only one should be considered the original source.
A typical case involves two URLs for the same page: the first is the default resource, while the second contains a session ID to store user-related data, such as items placed in a shopping cart.
The canonical tag is integrated into the head element of the second page and contains a reference to the original page. Search engines like Google then give preference to the original page and do not classify the content as duplicate.
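To illustrate the session ID case, the sketch below derives the default URL from a URL that carries a session ID and builds the matching link element for the head. The parameter name "sessionid" and the example URL in the usage note are hypothetical; real shops name this parameter differently.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: the session ID travels in a query parameter named "sessionid".
SESSION_PARAMS = {"sessionid"}


def strip_session(url: str) -> str:
    """Return the URL without session parameters (the default resource)."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_link_element(url: str) -> str:
    """Build the link element to place in the head of the session page."""
    href = escape(strip_session(url), quote=True)
    return f'<link rel="canonical" href="{href}">'
```

Calling canonical_link_element("https://www.example.com/shop/item?sessionid=abc123") returns a link element whose href is the URL without the session parameter.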
This method allows you to map any number of duplicate pages, but it can also increase the page size.
Method 2: Editing the HTTP header
If the default resource is a PDF or another non-HTML file type supported by Google, the canonical reference must be sent in the HTTP response header, since such files have no head element. The syntax differs here, and the integration requires knowledge of the Hypertext Transfer Protocol (HTTP).
Link: <http://www.beispiel.de/beispielseite.pdf>; rel="canonical"
This line is sent as part of the server's HTTP response. When a client requests the file, the server states in its response that the given URL is the canonical one.
If this PDF were available under two URLs, the canonical reference would have to be set in the HTTP header, because a PDF file has no HTML head element to carry it.
This method has the advantages of not enlarging the page and allowing you to map any number of duplicate pages. However, it can make mapping more difficult on larger sites where URLs change frequently.
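The header syntax shown above can be produced and checked with a few lines of code. This is a minimal sketch: the function names are made up for illustration, and the parser is deliberately simplified in that it only recognizes rel as the first parameter directly after the URL.

```python
import re
from typing import Optional, Tuple


def canonical_link_header(url: str) -> Tuple[str, str]:
    """Return the (name, value) pair of the HTTP header for a canonical URL."""
    return ("Link", f'<{url}>; rel="canonical"')


# Simplification: expects rel to follow the URL directly.
_CANONICAL_RE = re.compile(r'<([^>]*)>\s*;\s*rel\s*=\s*"?canonical"?', re.IGNORECASE)


def canonical_from_link_header(value: str) -> Optional[str]:
    """Extract the canonical URL from a Link header value, if there is one."""
    match = _CANONICAL_RE.search(value)
    return match.group(1) if match else None
```

How the header is attached to the response depends on the web server or framework in use, which is outside the scope of this sketch.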
Method 3: 301 redirects
To signal to Googlebot which content should be indexed, 301 redirects can be used as well. However, this option should only be considered if the redirected URL is to be taken offline, as visitors will no longer be able to reach it.
The 301 status code indicates that a page has been permanently moved to a new location.
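What such a redirect looks like on the server side can be sketched with Python's built-in HTTP server. The mapping of old paths to new URLs is hypothetical, and a production site would normally configure redirects in the web server itself rather than in application code.

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical mapping of retired paths to their permanent replacements.
MOVED = {"/old-product": "https://www.example.com/product/exampleproduct"}


def resolve(path: str):
    """Return (status, headers) for a request path."""
    target = MOVED.get(path)
    if target is not None:
        # 301 tells clients and crawlers that the move is permanent.
        return 301, {"Location": target}
    return 200, {"Content-Type": "text/plain; charset=utf-8"}


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, headers = resolve(self.path)
        self.send_response(status)
        for name, value in headers.items():
            self.send_header(name, value)
        self.end_headers()
        if status == 200:
            self.wfile.write(b"current page\n")
```

To try it locally, pass the handler to http.server.HTTPServer, for example HTTPServer(("127.0.0.1", 8000), RedirectHandler).serve_forever().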
Method 4: AMP variant
If one of the given pages is an AMP page, you can use the AMP guidelines when specifying the canonical page.
In principle, a canonical tag can be set on every subpage, with each page pointing at itself. This can correct or prevent unexpected errors and incorrect links.
Such a self-referencing tag indicates to search engines that this document should be the default for indexing.
These mistakes should be avoided when setting canonical tags
If the canonical tag is applied incorrectly, pages may be ignored by Google altogether. A canonical tag only makes sense if two or more pages actually contain identical or almost identical content.
Canonical tags for paginated sites using "rel-next" and "rel-prev"
Canonical tags are not useful for paginated pages that use "rel-next" and "rel-prev" because the contents are not actually the same.
Canonical tags on 404-pages
Pages referenced by a canonical tag should always be accessible; a canonical URL must not return a 404 error.
Combined with "noindex", "disallow" and "nofollow"
Pages that contain a canonical tag should never additionally be marked "noindex" or "nofollow" or be blocked with "disallow".
Multiple use of the canonical tag in meta tags
Canonical tags should not be placed in the body of a document or used more than once among the meta information in the head.
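Both mistakes can be detected automatically. The following sketch uses Python's standard HTML parser to report pages with more than one canonical tag in the head or with a canonical tag outside the head. The class name, the function name and the wording of the messages are chosen for illustration only.

```python
from html.parser import HTMLParser


class CanonicalAudit(HTMLParser):
    """Collect canonical link elements, separated by where they appear."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.head_canonicals = []
        self.misplaced_canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "link":
            attributes = dict(attrs)
            rels = (attributes.get("rel") or "").lower().split()
            if "canonical" in rels:
                bucket = self.head_canonicals if self.in_head else self.misplaced_canonicals
                bucket.append(attributes.get("href"))

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def audit(html: str) -> list:
    """Return a list of problems found with the canonical tags of a page."""
    parser = CanonicalAudit()
    parser.feed(html)
    parser.close()
    problems = []
    if len(parser.head_canonicals) > 1:
        problems.append("multiple canonical tags in head")
    if parser.misplaced_canonicals:
        problems.append("canonical tag outside head")
    return problems
```

An empty list means that neither of the two problems was found; it does not prove that the canonical URL itself is correct.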
Incomplete link in canonical tag
If a relative path is specified as a canonical link destination, Google may interpret it incorrectly and the canonical tag may lose its effect. Therefore, the link in the canonical tag should always be complete.
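Why a relative path is risky becomes visible when it is resolved the way a browser or crawler would resolve it: the result depends on the URL of the page that contains the tag. The sketch below uses hypothetical example.com URLs; the helper names are illustrative.

```python
from urllib.parse import urljoin, urlsplit


def is_absolute(href: str) -> bool:
    """True if the link has both a protocol and a host name."""
    parts = urlsplit(href)
    return bool(parts.scheme and parts.netloc)


def resolved_canonical(page_url: str, href: str) -> str:
    """Resolve a canonical href relative to the page that contains it."""
    return urljoin(page_url, href)
```

For example, the relative href "product/exampleproduct" resolves to https://www.example.com/product/exampleproduct on the homepage, but to https://www.example.com/shop/product/exampleproduct on a page under /shop/. A link without a protocol, such as "www.example.com/product", is also treated as a relative path.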
Disregarding the syntax
Because a canonical tag with faulty syntax is ignored, the URL should always be written out in full, character for character. This also applies to the protocol: Google prefers secure HTTPS pages as canonical URLs. Therefore, the HTTP page should always point to the HTTPS page.
Canonical pointing to the homepage of a domain
If the canonical tags of other pages refer to the homepage of a domain, they wrongly indicate that those pages are duplicates of the homepage. In that case the tag is set incorrectly.
If canonical tags are used incorrectly, canonical chains or reciprocal references can result. Make sure canonical links do not point to pages that themselves declare a different canonical URL.
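Chains and reciprocal references can be found by following the canonical reference of each page until it ends. The sketch below assumes that the canonical URLs of a site have already been collected into a dictionary mapping each page to the URL its canonical tag names; how that data is gathered is left open, and the paths in the usage note are hypothetical.

```python
def follow_canonicals(start: str, canonical_of: dict):
    """Follow canonical references from `start`.

    Returns (path, is_loop): the list of URLs visited and whether the
    references run in a circle.
    """
    path = [start]
    seen = {start}
    current = start
    while True:
        target = canonical_of.get(current)
        if target is None or target == current:
            # Reached a page without a tag or a self-referencing page.
            return path, False
        if target in seen:
            return path + [target], True
        path.append(target)
        seen.add(target)
        current = target


def is_chain(path: list) -> bool:
    """More than one hop means a canonical points at another canonical."""
    return len(path) - 1 > 1
```

With the mapping {"/a": "/b", "/b": "/c", "/c": "/c"}, starting at "/a" visits three pages and is reported as a chain; with {"/x": "/y", "/y": "/x"}, the two pages reference each other and the loop flag is set.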
Google Search Console also allows webmasters to specify how Google should handle the parameters in a website's URLs. Be aware, however, that Google is not bound to use the page you specify as the canonical URL and may choose a different one based on its performance, as mentioned above.