SEO

International Sites, Dupe Content and the Sitemap XML

Duplicate content in the world of SEO is a no-no. You may be familiar with the ongoing uproar caused by the series of animal-themed updates Google has been rolling out over the past 18 or so months. Duplicate content was one of the targets of Google’s Panda update, an algorithm overlay that first rolled out in February 2011. Since then, Google has been making monthly updates to it, and they seem fairly confident in how they’re handling the problem.

With this in mind, duplicated content is one of the (many) things at the front of our minds right now as Search Marketers. Besides the inherent issues that some CMSs can cause through complicated URLs, filters, facets and all the rest, one of the most common issues and questions about duplicate content is internationalisation. This is what I want to focus on in this post.

French, English, German, Russian

Google generally isn’t fussed about duplicate content across multiple languages, provided the content is targeted at different countries; they understand that completely unique content isn’t always possible. However, it’s not always that easy, depending on the website you’re trying to win this international visibility for.

The ultimate question is “what about English content for the UK and English content for the States?” Sure, both sides of the pond have their vocabulary differences: Mom/Mum and labor/labour, to name a couple. These differences in spelling give Google a strong indication that specific content is targeted at specific countries, but it doesn’t end there.

Sub Folders and Sub Domains

Ideally, this is the way to go about things. Divide the content up into country-specific folders or subdomains. That way, there is a clear indication of which is which, and you can also control geographic targeting through Google’s Webmaster Tools. But, again, that’s not always possible either.

If you can’t implement anything

I recently had a situation where we needed to implement changes across a site to demonstrate that content was divided between the UK and US. Unfortunately, due to the CMS, we were unable to implement the usual methods such as <head> changes using hreflang tags, subfolders, addresses in footers and everything in between. So we needed a solution that bypassed all of that but still allowed us to show search engines that particular content was meant for particular geographies.

Enter Sitemap.XML

In March 2012 Google introduced a new way of differentiating content geographically: the sitemap.xml file. The method employs the same rel="alternate" and hreflang="x" annotations, but within the XML file itself.

Let’s say we have a website called www.example.com (original, I know) that has two pages targeting two countries: www.example.com/english.html and www.example.com/usa.html. Each has the same content, with a few subtle differences. You can use the sitemap.xml to tell Google which pages are equivalents of each other in each country. The entries sit inside the sitemap’s <urlset> element, which needs to declare the xhtml namespace (xmlns:xhtml="http://www.w3.org/1999/xhtml"), and use the following syntax:

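<url>
  <loc>http://www.example.com/english.html</loc>
  <!-- Google expects every version to be listed, including this page itself -->
  <xhtml:link
    rel="alternate"
    hreflang="en-gb"
    href="http://www.example.com/english.html"
  />
  <xhtml:link
    rel="alternate"
    hreflang="en-us"
    href="http://www.example.com/usa.html"
  />
</url>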

The part above says “hey, this is content for English speakers in the UK, but this other URL is for English speakers in the US”.

Then for the main USA page entry, you use the syntax:

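<url>
  <loc>http://www.example.com/usa.html</loc>
  <!-- Again, list every version, including this page itself -->
  <xhtml:link
    rel="alternate"
    hreflang="en-us"
    href="http://www.example.com/usa.html"
  />
  <xhtml:link
    rel="alternate"
    hreflang="en-gb"
    href="http://www.example.com/english.html"
  />
</url>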

The above says “this URL has content for English speakers in the US, but this alternative URL has content for English speakers in the UK”.

The same goes for content in other countries. For example, you could target German speakers in Switzerland using de-ch, or English-speaking users in Australia using en-au. Cool stuff.

The first part of the hreflang value specifies the language code; the optional second part specifies the country or region. These are based on the standards ISO 639-1 (language) and ISO 3166-1 Alpha 2 (country).
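To make the two-part structure concrete, here is a minimal Python sketch that splits a value such as en-gb into its language and country parts. The function name split_hreflang and the regular expression are my own illustration, not anything Google provides, and the sketch only checks the shape of the value (two letters, optionally followed by a hyphen and two more letters). It does not check that the codes are actually assigned in the ISO lists, and it ignores any other forms hreflang may accept.

```python
import re

# Hypothetical helper: matches "xx" or "xx-yy" only (shape check, not an ISO lookup).
HREFLANG_RE = re.compile(r"^([a-z]{2})(?:-([a-z]{2}))?$", re.IGNORECASE)


def split_hreflang(value):
    """Return (language, country) for values like 'en' or 'en-gb'.

    The country is None when the value contains only a language code.
    """
    match = HREFLANG_RE.match(value)
    if match is None:
        raise ValueError(f"unexpected hreflang shape: {value!r}")
    language = match.group(1).lower()
    country = match.group(2)
    if country is not None:
        country = country.upper()
    return language, country


print(split_hreflang("de-ch"))
print(split_hreflang("en"))
```

A check like this is handy for catching typos (for example a stray underscore in en_gb) before they end up in a sitemap.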

Automation

For a site with hundreds or thousands or even millions of multilingual pages that are crying out for a solution like this, automating the process would be the answer. However, I’m not currently aware of a solution that employs this – I would love to hear from anyone who does know of a sitemap.xml generator that accurately provides this solution.
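In the meantime, the format is simple enough that a small script could produce it. Below is a minimal sketch in Python using only the standard library. The function name build_sitemap and the page_groups input (a list of dictionaries mapping each hreflang code to the URL of that version) are assumptions of mine for illustration; on a real site you would have to build those groups from your CMS or database. For every URL in a group, it writes a <url> entry that lists all versions in the group, including the page itself, and it declares the xhtml namespace on the <urlset> element.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"


def build_sitemap(page_groups):
    """Build a sitemap string from groups of equivalent pages.

    page_groups is a hypothetical input format: a list of dicts, each
    mapping an hreflang code (e.g. "en-gb") to the URL of that version.
    """
    # Serialise sitemap elements without a prefix and xhtml elements as xhtml:...
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("xhtml", XHTML_NS)

    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for group in page_groups:
        for url in group.values():
            url_el = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
            ET.SubElement(url_el, f"{{{SITEMAP_NS}}}loc").text = url
            # List every version in the group, including this page itself.
            for code, alt_url in group.items():
                ET.SubElement(
                    url_el,
                    f"{{{XHTML_NS}}}link",
                    {"rel": "alternate", "hreflang": code, "href": alt_url},
                )
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    groups = [
        {
            "en-gb": "http://www.example.com/english.html",
            "en-us": "http://www.example.com/usa.html",
        }
    ]
    print(build_sitemap(groups))
```

This is only a starting point: it keeps everything in memory and does nothing about splitting very large sitemaps into multiple files, which a site with millions of pages would need.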

Hope you find the above useful!

Dave is a professional Search Marketer working for UK SEO Company Vertical Leap. Dave has been doing SEO with VL for over 2 years now. He’s also a coffee and internet junkie who loves aviation and also writes on his own personal blog Square Squirrel. More articles by Dave Colgate