NewsNow vs the Times: Right to crawl vs right to link

January 8, 2010

Update: The Sun and News of the World are now also blocking NewsNow.

Original post: I am not sympathetic towards NewsNow over the newspapers' attempts to prevent it linking to them. The site has now been blocked from showing Times Online stories (more background here). But its campaign is a bit odd, as NewsNow - and most of the media reporting the story - are confusing linking with crawling / indexing:

Newsnow's response to the Times

The right to link

Right2Link (founding sponsor NewsNow) asserts that:

We, the supporters of the right to link, declare:

  1. All should be free to create, forward and follow links — they are the signposts to content on the web.
  2. Linking should require no permission nor charge.
  3. The free circulation of publicly accessible information is threatened if individuals, organisations and search engines cannot continue to create, forward and follow links without undue restraint.

Crawling vs linking

What the Times has done, however, is block NewsNow from crawling its site using its robots.txt file. This is nothing to do with linking to its site.

In NewsNow's case, it requires crawling to do its linking, as it's an automatic news aggregator. So it needs to crawl / index the Times site to put its own site together.

But this afternoon's piece of pedantry from me is to point out that, for the rest of us, they are not the same. And nothing the Times has done today stops anyone from linking to them - the demand of the Right2Link campaign.
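To make the distinction concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt rules, the `NewsNow` and `OtherBot` user-agent tokens and the `news.example.com` URL are all assumptions for illustration, not the Times's actual file or NewsNow's real crawler name. It shows a well-behaved crawler being told to stay away, while a hand-written link to the same page needs no crawl at all.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler is shut out of the whole site.
# (Illustrative only; not copied from any real newspaper site.)
ROBOTS_TXT = """\
User-agent: NewsNow
Disallow: /
"""

# Hypothetical story URL used for both the crawl check and the link.
STORY_URL = "https://news.example.com/story/123"


def crawler_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots.txt permits this user agent to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Crawling: the named aggregator is asked not to fetch anything,
# while a crawler with no matching rule is unaffected.
print(crawler_allowed(ROBOTS_TXT, "NewsNow", STORY_URL))
print(crawler_allowed(ROBOTS_TXT, "OtherBot", STORY_URL))

# Linking: anyone can still write a link by hand. robots.txt is never
# consulted, because publishing a link fetches nothing from the site.
link = f'<a href="{STORY_URL}">A story</a>'
print(link)
```

The robots.txt check only matters to software that fetches pages automatically and chooses to honour the file. A person copying a URL into a blog post never touches it.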

So a big red box about undermining access to public information seems irrelevant.

Update: What's more, NewsNow charges for some of its services so this has even less to do with public information ...

Oh wait, we are banned from linking, my mistake

Of course, if you look through the T&Cs of the Times site, it says:

Illegal and/or unauthorized use of the Services, including ... unauthorised framing of or linking to the Website is prohibited.

It's not very clear what unauthorised linking is, however. But I'd better not link to the T&Cs just in case! (This follows on from my posts last year in which I pointed out that most major newspapers and many other organisations forbade you from linking to them. SEO anyone?).

5 Comments »

  • Your Mum says:

    Very nice article Malcolm, i am so proud of you.. have you cleaned your room? Oh and one other thing... if all of these media sites keep denying me and potentially thousands of others from linking to them who is monitoring all the incoming links?

  • [...] Right to crawl vs right to link » malcolm coles clueless newspapers say don’t link to us… [...]

  • Adam Newby says:

    Indeed, the right to crawl and the right to link are not the same, but are they so different? In order to publish a link, one must first obtain it. So even if one has a right to link freely, if one doesn't have the right to obtain those links, that right to link alone wouldn't be worth much. How does one obtain links? By starting somewhere on a site - probably the homepage - then navigating through the links. Looked at this way, crawling is just systematic linking.

    We recognise that with systematic linking, there is a risk of abusing a website's resources. Using robots.txt - a voluntary standard - to prevent this, is fair. However if sites start to use robots.txt in an arbitrary or discriminatory fashion, that's not so obviously fair. It's one thing NewsNow's user base not being able to reach Times Online content but if, for instance, as has been suggested, News Corp were to use robots.txt to block Google but allow Bing, that would be bad for users of the Internet in general. One of the strengths of any search engine is its neutrality: that the results have not been skewed to the commercial advantage of anyone to which it links.

    Also, although Times Online has imposed a crawler restriction via robots.txt, it should be emphasised that The Newspaper Licensing Agency Ltd's proposed 'licence' is about linking, not just crawling. This is shown by the existence of the "end user" licence, that intends to grant permission and impose charges on the customers of news monitoring organisations, for the apparent privilege of receiving or circulating links within their own organisations. Is this a slippery slope towards requiring all organisations pay a news tax for permission to "make commercial use" of newspaper content?

    On a separate note, the Times Online block applies only to NewsNow. Why would this be? If the argument is about paid-for services featuring links to The Times' content, then why has NI not targeted other organisations with paid-for services?

  • [...] like we missed the change going through, but blogger Malcolm Coles didn’t. As of this morning, NewsNow is locked out of Murdoch’s UK newspaper portfolio. [...]

  • Julian Burgess says:

    Glad to see an article which has got this correct. In my spare time I work on a website which crawls a number of websites. I always obey the robots.txt; however, it is perfectly possible to ignore it, or to move your crawler to a different IP address which isn't banned. When someone is crawling in a way which you object to for any reason, then robots.txt is a good way to indicate they should stop.
