
Express newspaper creates an infinite number of URLs using rel = canonical

May 6, 2011

The Express newspaper has cocked up its implementation of the rel=canonical tag SO BADLY that it has created an infinite number of duplicate webpages ... many of which now have links from elsewhere on the internet.

[Image: Buzz Lightyear. Caption: To infinite URLs - and beyond]

Using rel = canonical properly

You use the rel=canonical tag to tell Google that a given URL is really a version of another, main URL - and that the search engine should treat the duplicate as if it were that main URL.

It's useful if you have multiple copies of a page in different directories, have lots of versions of the same page (e.g. because WordPress makes 2 versions of every page), or allow anyone to rewrite your URLs so it looks like you're insulting Pippa Middleton's sister.

Make a mistake with rel=canonical, however, and it can wipe your website off the face of the internet.

Using rel = canonical to make infinite URLs

The Express site's CMS is creating a duplicate version of every single page via the rel=canonical tag. And then a 3rd version, and then a 4th ... and it's never stopping until it gets to infinity.

Take a sample page like this one: http://www.express.co.uk/features/view/244786/AV-referendum-Why-we-must-vote-NO-to-the-new-voting-system

If you look at the HTML code, you can find:

<link rel="canonical" href="http://www.express.co.uk/features/view/244786/AV-referendum-Why-we-must-vote-NO-to-the-new-voting-systemAV-referendum-Why-we-must-vote-NO">

The CMS has miscoded the canonical URL so that the first part of the page-specific slug (the AV-referendum-Why-we-must-vote-NO bit) appears in it twice.
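If you'd rather not read the source by eye, a few lines of Python can pull the canonical link out of a page for you. This is only a sketch using the standard library's html.parser; the names CanonicalFinder, find_canonicals and SAMPLE_HTML are made up for illustration, and the sample is just the tag quoted above wrapped in a minimal page.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> tag it sees."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = dict(attrs)
        # rel can hold several space-separated tokens, so split before checking.
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_tokens and attr_map.get("href"):
            self.canonicals.append(attr_map["href"])


def find_canonicals(html):
    """Return all canonical hrefs found in an HTML string (usually zero or one)."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals


# Illustrative input: the tag from the Express page, wrapped in a minimal document.
SAMPLE_HTML = (
    "<html><head>"
    '<link rel="canonical" href="http://www.express.co.uk/features/view/244786/'
    'AV-referendum-Why-we-must-vote-NO-to-the-new-voting-system'
    'AV-referendum-Why-we-must-vote-NO">'
    "</head><body></body></html>"
)

print(find_canonicals(SAMPLE_HTML))
```

Comparing the result with the URL you actually requested is the quickest way to spot a canonical that points somewhere it shouldn't.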

If you visit that supposedly canonical URL, you see this, with the page-specific bit in there three times.

<link rel="canonical" href="http://www.express.co.uk/features/view/244786/AV-referendum-Why-we-must-vote-NO-to-the-new-voting-systemAV-referendum-Why-we-must-vote-NOAV-referendum-Why-we-must-vote-NO">

Go to that URL, and you find it there 4 times. Etc.

I got bored at http://www.express.co.uk/features/view/244786/AV-referendum-Why-we-must-vote-NO-to-the-new-voting-systemAV-referendum-Why-we-must-vote-NOAV-referendum-Why-we-must-vote-NOAV-referendum-Why-we-must-vote-NOAV-referendum-Why-we-must-vote-NOAV-referendum-Why-we-must-vote-NOAV-referendum-Why-we-must-vote-NO

but this will never stop. Each time you visit the canonical URL, a new canonical URL is created.
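To see why it never stops, here is a rough Python model of the behaviour. I don't know what the Express's CMS code actually looks like, so buggy_canonical is purely a guess that happens to reproduce the URLs above: it assumes the canonical is built by sticking a stored short slug onto whatever URL was requested. follow_canonicals is a hypothetical helper that keeps hopping from canonical to canonical until a page names itself or a hop limit runs out.

```python
BASE = "http://www.express.co.uk/features/view/244786/"
FULL_SLUG = "AV-referendum-Why-we-must-vote-NO-to-the-new-voting-system"
SHORT_SLUG = "AV-referendum-Why-we-must-vote-NO"
ARTICLE_URL = BASE + FULL_SLUG


def buggy_canonical(requested_url, short_slug=SHORT_SLUG):
    """Assumed model of the bug: append the short slug to the requested URL."""
    return requested_url + short_slug


def follow_canonicals(start_url, canonical_of, max_hops=5):
    """Follow canonical pointers from start_url.

    Returns (chain, settled). settled is True only if some page in the
    chain declares itself as its own canonical within max_hops.
    """
    chain = [start_url]
    for _ in range(max_hops):
        next_url = canonical_of(chain[-1])
        if next_url == chain[-1]:
            return chain, True
        chain.append(next_url)
    return chain, False


# The modelled bug: every hop produces a longer URL, so it never settles.
chain, settled = follow_canonicals(ARTICLE_URL, buggy_canonical, max_hops=3)
for url in chain:
    print(url)
print("settled:", settled)

# A healthy setup: every variant points at one fixed URL, which points at itself.
healthy_chain, healthy_settled = follow_canonicals(
    ARTICLE_URL + "anything", lambda _url: ARTICLE_URL
)
print("healthy settled:", healthy_settled)
```

The test that matters is the last one: the canonical of the canonical page should be that same page. Any setup where it isn't will grow a chain like the one above.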

All these URLs are working pages because the Express only looks at the number in the URL to decide what content to show. So http://www.express.co.uk/features/view/244786/AV-referendum-Why-we-must-vote-NO-to-the-new-voting-system is the same as http://www.express.co.uk/features/view/244786/vote-YES is the same as http://www.express.co.uk/features/view/244786/who-exactly-specced-this-CMS.
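Looking up content by the number alone is fine in itself, but it means the canonical has to be built from the number too, not from whatever was typed in the address bar. A minimal sketch of that idea, assuming the URL pattern seen above (/section/view/number/slug); the STORED_SLUGS table is a stand-in for the CMS database and, like the function names, is invented for this example.

```python
import re
from urllib.parse import urlsplit

# Matches paths such as /features/view/244786/any-text-at-all
VIEW_PATH = re.compile(r"^/(?P<section>[a-z]+)/view/(?P<story_id>\d+)(?:/.*)?$")

# Hypothetical stand-in for the CMS database: story number -> the one true slug.
STORED_SLUGS = {
    244786: "AV-referendum-Why-we-must-vote-NO-to-the-new-voting-system",
}


def story_id_from_url(url):
    """Return the story number in the URL, ignoring the slug, or None."""
    match = VIEW_PATH.match(urlsplit(url).path)
    return int(match.group("story_id")) if match else None


def stable_canonical(url):
    """Build the canonical from the story number and the stored slug only."""
    parts = urlsplit(url)
    match = VIEW_PATH.match(parts.path)
    if not match:
        return None
    story_id = int(match.group("story_id"))
    slug = STORED_SLUGS.get(story_id)
    if slug is None:
        return None
    return f"{parts.scheme}://{parts.netloc}/{match.group('section')}/view/{story_id}/{slug}"


variants = [
    "http://www.express.co.uk/features/view/244786/AV-referendum-Why-we-must-vote-NO-to-the-new-voting-system",
    "http://www.express.co.uk/features/view/244786/vote-YES",
    "http://www.express.co.uk/features/view/244786/who-exactly-specced-this-CMS",
]
for variant in variants:
    print(story_id_from_url(variant), stable_canonical(variant))
```

Because the requested slug is thrown away, every variant (including the made-up ones) ends up with the same canonical, and the canonical page points at itself. Redirecting the variants to the stored URL would be another common way to get the same result.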

Dozens of URLs for each Express story

Sometimes these duplicate canonical URLs aren't in Google's index (I guess as each one is cancelled out by the next one), although you can find some of them. This search, for instance, has this URL showing up: http://www.express.co.uk/posts/view/242092/DEBATE-Is-Britain-a-soft-touch-for-benefit-spongers-DEBATE-Is-Britain-a-soft-touch-for-benefit-spongers-DEBATE-Is-Britain-a-soft-touch-for-benefit-spongers-DEBATE-Is-Britain-a-soft-touch-for-benefit-spongers-DEBATE-Is-Britain-a-soft-touch-for-benefit-spongers-DEBATE-Is-Britain-a-soft-touch-for-benefit-spongers-

Even worse, the first URL that appears for that search is the printable URL of the page with no adverts on!

[Image: Google's results. Caption: One paragraph, 55 results ...]

And as that search, with 55 results, reveals, the Express has a massive problem with duplicate content.

The Express then makes the problem even worse ...

This is a problem it makes worse via its use of Tynt to add URLs when you copy and paste content. So if you copy and paste the first sentence from this URL: http://www.express.co.uk/features/view/244786/AV-referendum-Why-we-must-vote-NO-to-the-new-voting-system, what you end up with is this:

"BY the time you read this you will have probably already voted No to AV in today’s referendum.

Read more: http://www.express.co.uk/features/view/244786/AV-referendum-Why-we-must-vote-NO-to-the-new-voting-systemAV-referendum-Why-we-must-vote-NO#ixzz1LW2s00ge".

The Express uses Tynt to add the read more bit and the URL to what you've copied.

But, yes, the code they are adding contains the wrong URL with two versions of the page slug. Follow that link and copy a sentence and you end up with this:

"BY the time you read this you will have probably already voted No to AV in today’s referendum.

Read more: http://www.express.co.uk/features/view/244786/AV-referendum-Why-we-must-vote-NO-to-the-new-voting-systemAV-referendum-Why-we-must-vote-NOAV-referendum-Why-we-must-vote-NO#ixzz1LW31La3e"

Yup, another new URL created by the system that's designed to channel links to the main story.

You can see this in action on this page on the Daily Mail, where someone has copied the opening para from some other batshit story, and the Tynt URL points to http://www.express.co.uk/posts/view/244206/EU-wants-to-merge-uk-with-franceEU-wants-to-merge-uk-with-franceEU-wants-to-merge-uk-with-france#ixzz1LCIcD5jI.

This might explain why the Express can't rank in first place for a paragraph from its own story.

To sum up

The Express isn't appearing top of Google's results for searches using its own content, and Google is serving up versions of its pages with no adverts on. That's all because Google can't work out which page is the correct one: the Express constantly points to yet another URL for every single page - even the made-up ones.

My head hurts.
