
Are You Confusing Search Engine Bots?

Posted in: Blogs, Search Engine Optimisation by Richard Hearne on April 3, 2007

There are two primary factors to getting a page ranked – discovery and relevancy. By and large, search engines are clever creatures, but the very best webmasters will always send out the right signals to gently guide the search engines, and in return receive great rankings for their content.

Search engines discover content using their bots (or ‘crawlers’), and determine relevancy (and by extension ranking) using advanced algorithms.

Discovery

The golden rule of SEO is that search engines can’t rank a page they don’t know about. This is what makes discovery so important. The most natural way for a search engine to discover a new resource is by crawling a link pointing at that content. So to get any new resource crawled quickly you should get a few links from other sites that are crawled regularly. (The major search engines have a sitemap initiative, but remember that without at least one link Google will not index your content, regardless of sitemap.)

Relevancy

Getting your page crawled is less than half the battle. Now comes the hard part – ranking well. The second factor that determines whether your page ranks well is relevancy. Relevancy is determined by search engine algorithms, which decide the order in which results are displayed to searchers. A number of on-site and off-site factors are incorporated into the relevancy determination, which I’ll look at in a moment. (Trust could also be dropped into the mix here, but I’m assuming that away for the moment.)

How can you guide the search engines?

Webmasters actually have the greatest say in signalling for both discovery and relevancy. I use the term signalling because that’s really what SEO is all about – sending the right signal to the search engines.

To explain more about signals I’m going to have a look at another of the Irish Blog Award nominee sites which availed of the free site review offer.

First Partners

I met Paul Browne at the Irish Blog Awards a few weeks back. Paul writes regularly on his technology-themed First Partners blog:

[Screenshot: First Partners Blog]

Back to relevancy

The page title is probably one of the most important on-page elements used by search engines to determine the relevancy of your web pages. By and large you should target 1-3 keyword phrases, and bear in mind that most searches are around 3 words in length.

In the case of Paul’s blog homepage I notice that he is using dynamic titles which include the title of the most recent post. This in my view is a mistake – the homepage page title is about as sacred as it gets, and you don’t want it changing every day or so. I think Paul should concentrate on the main focus of his blog, whatever niche that might be, and use that in his blog homepage title.
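As a rough illustration of the difference, here is a minimal Python sketch of a title rule that keeps the homepage title fixed while still letting individual posts use their own titles. The function name `page_title`, the `"home"`/`"post"` page types, and the `Title | Blog Name` pattern are all assumptions made for this example, not how Paul's blog software works.

```python
def page_title(page_type, blog_name, niche_phrase, post_title=None):
    """Build a <title> string for a blog page.

    Hypothetical helper: the homepage always gets the same niche phrase,
    no matter what the most recent post is called.
    """
    if page_type == "post" and post_title:
        # Individual posts can safely carry their own title.
        return f"{post_title} | {blog_name}"
    # Homepage (and anything else): stable, keyword-focused title.
    return f"{niche_phrase} | {blog_name}"
```

The point is simply that the homepage branch never looks at `post_title`, so publishing a new post cannot change it.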

The canonical URL problem (again)

I’m probably beginning to sound like a broken record. The canonical URL problem is a condition where your site or page is accessible by typing either of the following into your browser:

www.mysite.com

or

mysite.com

(notice this second case drops the www)

If you can reach your page via either URL AND the URL in the address bar does not change, your site is suffering from the canonical URL problem.

In Paul’s case his site is accessible via both the www and non-www URLs. To fix this problem you need to redirect one URL to the other with a 301 redirect.
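The redirect itself is normally configured on the web server, but the decision it makes is simple enough to sketch in a few lines of Python. This is only an illustration: `canonical_redirect` and `CANONICAL_HOST` are names invented for the example, it assumes the www form is the preferred one, and it ignores port numbers.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption for this sketch: the www form is the one we want indexed.
CANONICAL_HOST = "www.mysite.com"


def canonical_redirect(url, canonical_host=CANONICAL_HOST):
    """Return (301, target_url) if url uses the non-canonical host, else None."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()

    # Work out the bare domain so both the www and non-www forms are recognised.
    bare = canonical_host[4:] if canonical_host.startswith("www.") else canonical_host
    variants = {bare, "www." + bare}

    if host in variants and host != canonical_host:
        # Keep the scheme, path and query; only the host changes.
        target = urlunsplit(
            (parts.scheme, canonical_host, parts.path or "/", parts.query, parts.fragment)
        )
        return 301, target
    return None  # already canonical, or a different site entirely
```

For example, `canonical_redirect("http://mysite.com/about")` asks for a 301 to the www version of the same path, while a request that already uses `www.mysite.com` returns `None` and is served normally.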

Don’t use 302 redirects for your homepage

When checking Paul’s blog I noticed that the FirstPartners.net homepage had a Toolbar PR0. This is odd given that the blog has PageRank 5. Then I noticed that the root page is redirecting to www.firstpartners.net/rp/:

http://www.firstpartners.net/

GET / HTTP/1.1
Host: www.firstpartners.net
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8.1.3) Gecko/20070309 Firefox/2.0.0.3
Accept: text/xml,application/xml,application/xhtml+xml,[... ]png,*/*;q=0.5
Accept-Language: en,en-us;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Cookie: __utma=67859462.28111[... ]__utmc=67859462

HTTP/1.x 302 Found
Date: Mon, 02 Apr 2007 16:35:35 GMT
Server: Apache/2.0.52 (Red Hat)
Location: http://www.firstpartners.net/rp/
Content-Length: 304
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Content-Type: text/html; charset=iso-8859-1
X-Pad: avoid browser bug

If the homepage is going to stay there then I suggest changing that to a 301 redirect. The most probable reason why the temporary homepage is currently PageRank 0 is that it has few if any backlinks. The backlinks Paul has accumulated point at www.firstpartners.net rather than www.firstpartners.net/rp/, and Google doesn’t realise that /rp/ is now the homepage. No 301 = no transferral of links and trust.
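If you have captured response headers like the ones above, a small script can flag temporary redirects for you. The Python sketch below parses a raw response head and reports the kind of redirect; the function names `parse_response_head` and `redirect_advice` and the wording of the messages are my own for this example.

```python
def parse_response_head(raw):
    """Split a raw HTTP response head into (status_code, headers dict)."""
    lines = [ln.strip() for ln in raw.strip().splitlines() if ln.strip()]
    # Status line looks like "HTTP/1.x 302 Found"; the code is the second token.
    status = int(lines[0].split()[1])
    headers = {}
    for ln in lines[1:]:
        # Split on the first colon only, so URLs in values stay intact.
        name, _, value = ln.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status, headers


def redirect_advice(raw):
    """Describe what kind of redirect (if any) a response head contains."""
    status, headers = parse_response_head(raw)
    target = headers.get("location")
    if status == 301:
        return f"301 permanent redirect to {target}: fine for a page that has moved for good"
    if status in (302, 307):
        return f"{status} temporary redirect to {target}: switch to 301 if the move is permanent"
    return f"{status}: not a redirect"
```

Fed the response shown above, `redirect_advice` reports a 302 temporary redirect to `http://www.firstpartners.net/rp/`.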

And some advice for the blog?

I had a few ideas when I looked at Paul’s blog. I found that the page weight was a little too beefy, with the blog homepage weighing in at 800KB+ on one occasion last week. I also thought that Paul could cut the number of posts published per page to a more manageable number. And I even considered whether NOFOLLOWing some of the internal links (e.g. the cloud) might help.

But I can safely scrap all that advice for one simple suggestion: give each and every blog post a unique META description.

When I looked at all the pages in the supplemental index it was instantly apparent that Paul wasn’t using META descriptions:

[Screenshot: First Partners supplemental index]

You can see that Google is picking up boilerplate content for every snippet. I’d be willing to bet that at least some of the 265 pages in supplemental will pop out if they have a unique description META.
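Checking for this across a whole blog is easy to automate. The following Python sketch takes the HTML of a set of pages and reports which ones have no META description and which ones share the same description. It uses only the standard library; the names `MetaDescriptionParser` and `find_description_problems` are made up for this example, and fetching the pages is left out.

```python
from collections import defaultdict
from html.parser import HTMLParser


class MetaDescriptionParser(HTMLParser):
    """Collects the content of the first <meta name="description"> tag."""

    def __init__(self):
        super().__init__()
        self.description = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.description is not None:
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "description":
            self.description = (attr.get("content") or "").strip()


def find_description_problems(pages):
    """pages maps URL -> HTML source.

    Returns (missing, duplicates): a list of URLs with no description, and a
    dict mapping each shared description to the URLs that use it.
    """
    missing = []
    by_description = defaultdict(list)
    for url, html in pages.items():
        parser = MetaDescriptionParser()
        parser.feed(html)
        if not parser.description:
            missing.append(url)
        else:
            by_description[parser.description].append(url)
    duplicates = {d: urls for d, urls in by_description.items() if len(urls) > 1}
    return missing, duplicates
```

Any URL that turns up in `missing` or `duplicates` is a candidate for a hand-written, unique description.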

I did spend a short amount of time looking at the backlink profile for the blog and the majority of links use the anchor “Paul Browne – Technology in plain English”. I reckon Paul probably ranks well for his name (he had a thread about his on-line doppelganger but I couldn’t find it). I think some diversification of the link anchor could pay off – non-diverse backlink anchors may actually raise a flag that could damage your site.
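If you can export a list of your backlink anchors from whatever tool you use, a quick tally shows how concentrated they are. This Python sketch is one way to do it; `anchor_profile` is a name invented for the example, and what counts as "too concentrated" is a judgment call I am not putting a number on.

```python
from collections import Counter


def anchor_profile(anchors):
    """Return (anchor_text, share_of_links) pairs, most common first.

    Anchors are lower-cased and whitespace-normalised so trivial variations
    are counted together; empty anchors are ignored.
    """
    normalised = [" ".join(a.lower().split()) for a in anchors if a.strip()]
    total = len(normalised)
    if total == 0:
        return []
    counts = Counter(normalised)
    return [(text, count / total) for text, count in counts.most_common()]
```

A profile where one anchor accounts for nearly all links is the non-diverse pattern described above.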

So find that niche and push it in your titles and anchors. In Paul’s case that niche should be highly relevant to his company’s products and services. I’ll leave the idea generation to Paul.

So to recap my advice

  1. Fix the blog homepage title
  2. Sort the canonical URL
  3. Change the root page 302 redirect
  4. Assign unique META descriptions to each blog post

Previous posts in this series:

Seo for Blogspot (Blogger) blogs: Helping The Blogspot Bloggers – A Tough Test To SEO Blogspot
PhotoBlog SEO: SEO For Thin Content Sites – Making A P h o t o B l o g More Visible
Corporate Blogging SEO: Putting Some Fizz Into Bubble Brothers – Beware Of Corporate Blogspot Blogs


8 Comments »

  1. Richard,

    Thanks for the advice. They’re all simple things (that you’ve said many times before :-) and should be easy to fix.

    Paul

    Comment by Paul Browne - Technology in Plain English — April 3, 2007 @ 9:33 am

  2. Hi Paul

    I tend to find that the same issues come up again and again on sites I check. There are a few other things that you could do (like NOFOLLOWing some of the internal links to category pages and the like), but I reckon just the main 4 or 5 fixes should make a difference.

    Sorry for the delay in getting it done. I’m hoping to get at least 3 more done this week though for anyone else who is waiting.

    Rgds
    Richard

    Comment by Richard Hearne — April 3, 2007 @ 9:40 am

  3. Something I’ve been meaning to ask – is there a canonical issue with blogspot? I know there’s no www. in the URL but I’ve been noticing links for http://greeninkpen.blogspot.com/ and also http://greeninkpen.blogspot.com/index.html . Does this make a difference to Google?

    Comment by Green Ink — April 3, 2007 @ 9:49 am

  4. Hello Green Ink

    Technically what you describe is duplicate content – two URLs which both resolve to the same content. Google will simply filter one and show the other.

    You need not be concerned with this unless you find that your external links are split between / and /index.html.

    Try to make sure that when you’re placing links that they point at the former. I think that the Blogspot banner actually points at /index.html by default but I’m not sure if you can change this manually.

    Not a lot you can do with Blogspot unfortunately, but not too much to worry about in this case (fortunately).

    Rgds
    Richard

    Comment by Richard Hearne — April 3, 2007 @ 10:24 am

  5. Richard, why would you nofollow internal linkage? Is it simply to encourage the crawler to focus on one page at a time? I was led to believe that deep linking to your own content for keywords was a positive thing.

    And also what relevancy does the page title of this blog post -
    Are You Confusing Search Engine Bots? – have to do with the blog post? I was expecting a post about the different types of bots. Just curious.

    Cormac

    Comment by Cormac Moylan — April 3, 2007 @ 10:58 am

  6. Hi Cormac

    NOFOLLOWs can help when you have multiple categories/tags that might result in duplicate content. Let’s say you were using tags and categories – each might have slightly different content but probably a fair proportion of duplicate content.

    Consider the archives – in aggregate they will contain the same posts as the categories so you could NOFOLLOW those links and still retain a path to every post. I wouldn’t advise this all the time, but when you have a large number of pages in the supplemental index it may pay off.

    Obviously I didn’t choose a great title for the post! But basically I think that the SEs will have a hard time ranking Paul’s blog for a primary phrase when his main title changes every time he posts something new. Also the 302 redirect from his root page wasn’t helping. In all I felt that his site might well be confusing the bots, hence the title. I wish I had a bit more time to review the posts, but things are quite hectic these days.

    Sorry for any confusion caused :grin:

    Rgds
    Richard

    Comment by Richard Hearne — April 3, 2007 @ 12:05 pm

  7. Ah, I never associated this blog title with Paul’s title issues.
    Cheers for hitting that one home :)

    Comment by Cormac Moylan — April 3, 2007 @ 1:45 pm

  8. My fault really – the post was written over a number of sessions during a one week period. Generally I write better stuff if I complete in a single session… roll on cloning and I’ll get 2 of me :mrgreen:

    Rgds
    R

    Comment by Richard Hearne — April 3, 2007 @ 1:58 pm
