
Spiderability is the First Step to Search Engine Nirvana

Posted in: Blogs,Search Engine Optimisation by Richard Hearne on July 4, 2007

I have a post brewing about taking things for granted. We’re all guilty of it at some time or other.

The most needy review to date

I’ve been (slowly) working my way through some blog SEO posts to help those folk who took up my offer of some free consulting. Today I’m looking at a site that really will benefit from some basic SEO tips. In fact, I think this is definitely the site most needy of help I’ve looked at to date (although the SEO for Blogspot folk are fighting hard for that particular crown).

Basic SEO starts with letting the bots in

Search engines rely on bots (or ‘spiders’) to crawl the Internet and collect the information you publish on your websites. These bots are basically slimmed-down web browsers whose modus operandi is extremely simple – crawl content, find links, save content, follow links, crawl content, find links… That’s their one and only job. OK, they do a few other things while they’re at it, but that gives the basic gist of what the bots do.
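To make that loop concrete, here’s a rough Python sketch of it. It’s only an illustration, not how any real search engine bot is built: the `crawl` function, the `LinkCollector` class and the little in-memory `SITE` are all made up for the example, and pages are fetched from a dictionary rather than over the web.

```python
from collections import deque
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values from plain <a> tags, as a very simple bot might."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def crawl(fetch, start):
    """Crawl content, find links, save content, follow links."""
    saved = {}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if url in saved:
            continue  # already crawled this one
        html = fetch(url)
        if html is None:
            continue  # nothing at this address
        saved[url] = html  # save content
        collector = LinkCollector()
        collector.feed(html)  # find links
        queue.extend(collector.links)  # follow links
    return saved


# Hypothetical four-page site held in memory instead of on the web.
SITE = {
    "/": '<a href="/about">About</a> <a href="/blog">Blog</a>',
    "/about": '<a href="/">Home</a>',
    "/blog": "<p>No links here</p>",
    "/orphan": "<p>Nothing links to me</p>",
}

pages = crawl(SITE.get, "/")
print(sorted(pages))
```

Notice that `/orphan` never gets saved. Nothing links to it, so the bot has no way of finding it.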

Bots don’t like Javascript

So we’ve learned that search engine bots are slimmed-down web browsers. One of the things they don’t do is Javascript. So here’s the first rule of spiderability:

1. Don’t use Javascript to create navigational components.

Use good old plain HTML. That’s what it’s for so make use of it.
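Here’s a small Python sketch of why this matters, assuming a bot that (as described above) only reads the HTML and never runs Javascript. The two menus and the `find_links` helper are invented for the example.

```python
from html.parser import HTMLParser


class AnchorFinder(HTMLParser):
    """Records the href of every plain <a> tag in the markup."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def find_links(html):
    finder = AnchorFinder()
    finder.feed(html)
    return finder.hrefs


# Hypothetical menu that only works when Javascript runs in a browser.
JS_MENU = """
<span onclick="window.location='/shoes/'">Shoes</span>
<span onclick="window.location='/hats/'">Hats</span>
"""

# The same menu in plain HTML.
PLAIN_MENU = """
<ul>
  <li><a href="/shoes/">Shoes</a></li>
  <li><a href="/hats/">Hats</a></li>
</ul>
"""

print(find_links(JS_MENU))
print(find_links(PLAIN_MENU))
```

The Javascript menu gives this reader nothing to follow, while the plain HTML menu hands over both links.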

META refresh gives a strong spam signal

A few years back spammers started to use META refreshes to spam the search engines. META refresh is a small piece of code that is inserted into the <head> element of an HTML document. It basically tells your browser to go to a new location after a predetermined number of seconds. Spammers used this because the bots didn’t actually follow the instruction when they found it – they simply crawled the content on the page and returned that to the search engine for indexing. The spammer could place nice search engine friendly text on the initial page, which would rank nicely, and as soon as a human visitor came along they would be instantly redirected to the spammer’s real page. The search engines didn’t like this. So the second rule is very simple:

2. Don’t use META refresh when the proper and upstanding thing to do is issue the correct header response.
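If you’re not sure whether a page carries a META refresh, a few lines of Python will tell you. This is just a sketch: `find_meta_refresh` is my own helper name, and the sample page and its example.com address are made up.

```python
from html.parser import HTMLParser


class MetaRefreshFinder(HTMLParser):
    """Collects (delay, target) pairs from any <meta http-equiv="refresh"> tags."""

    def __init__(self):
        super().__init__()
        self.refreshes = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)  # attribute names arrive lower-cased
        if (attr.get("http-equiv") or "").lower() != "refresh":
            return
        content = attr.get("content") or ""
        delay, _, rest = content.partition(";")
        target = rest.strip()
        if target.lower().startswith("url="):
            target = target[4:]
        self.refreshes.append((delay.strip(), target))


def find_meta_refresh(html):
    finder = MetaRefreshFinder()
    finder.feed(html)
    return finder.refreshes


# Hypothetical page that bounces visitors on after zero seconds.
PAGE = """
<html>
<head>
<meta HTTP-EQUIV="REFRESH" content="0; url=http://example.com/blog/">
</head>
<body><p>Please wait...</p></body>
</html>
"""

print(find_meta_refresh(PAGE))
```

An empty list means the page is clean. Anything else shows the delay and where visitors are being sent.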

Every page is delivered with HTTP headers that describe the response. These headers include an HTTP status code that tells the browser what to do with the page. The well-known 301 redirect is simply a status code, passed in the header response, that tells the browser (or any accessing agent) that the location of the requested resource has changed permanently, and to go to the new location. It is fairly trivial to send header responses – on Apache-based systems redirects are relatively easy to set up using the .htaccess file with mod_rewrite.
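To show what a 301 actually looks like, here’s a minimal Python sketch of a tiny WSGI application that answers requests for the root with a permanent redirect. It’s an illustration under assumptions rather than anything a real site runs: `redirect_root_app`, `call_app` and the example.com address are all hypothetical, and the app is called directly instead of being put behind a web server.

```python
def redirect_root_app(environ, start_response):
    """Answer the root with a 301 header response; serve everything else normally."""
    path = environ.get("PATH_INFO", "/")
    if path in ("", "/"):
        start_response(
            "301 Moved Permanently",
            [("Location", "http://example.com/blog/"), ("Content-Length", "0")],
        )
        return [b""]
    body = b"Blog content lives here"
    start_response(
        "200 OK",
        [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))],
    )
    return [body]


def call_app(app, path):
    """Call a WSGI app directly and capture its status line, headers and body."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(app({"REQUEST_METHOD": "GET", "PATH_INFO": path}, start_response))
    return captured["status"], captured["headers"], body


status, headers, body = call_app(redirect_root_app, "/")
print(status, headers["Location"])
```

The redirect lives entirely in the status line and the Location header. There is no page content for a bot to index, which is exactly the difference from a META refresh.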

Some Irish language lessons

So perhaps it would be appropriate to introduce the site that I’m looking at today: siopaeile.com. ‘Siopa eile’ is Irish for ‘another shop’ and the blog in question is the brainchild of Paul O Mahony.

Siopa Eile Shopping Blog

I think the most interesting thing (from my point of view) about siopaeile.com is that it is not indexed in Google. Siopaeile.com is therefore getting zero traffic from Google. Given that Google is often the #1 referrer for many websites, Paul is really starting with a blank page. So here’s the advice I would give to Paul in order to better optimise his site.

The SEO tips

The siopaeile.com blog can be found in a subdirectory called ‘blog’. Currently the root index page contains a nasty META refresh into that directory. Here’s what the root page returns:

<html>
<meta HTTP-EQUIV="REFRESH" content="0; url=http://siopaeile.com/blog/">
<p>
Please wait as you get redirected to <a href="http://siop[... ]">http://siopaeile.com/blog</a>.
</p>
</html>

[NB - edits my own]

Now I would say that as a matter of urgency this needs to be changed:

Tip #1:

If nothing will be published in the root directory then move the entire blog up a level into the root. In general deeper means less important, so the closer to the root the more important the content appears to the search engines. I would normally say that all pages should be redirected after the move, but in this case nothing is indexed. Any inbound links should be redirected to their new homes though.

Moving the blog into the root may require some jiggery-pokery in WordPress, but I think it would be well worth it.

If it is not possible to move the blog into the root directory then I would suggest removing the META refresh and adding a 301 redirect into the .htaccess file.

The .htaccess file can be found in the root directory. You can use an FTP program (such as the free FileZilla) to grab this file and re-upload once you’ve finished editing. Here’s the code that needs to be included:

<IfModule mod_rewrite.c>
Options +FollowSymLinks
RewriteEngine On
# In .htaccess the pattern is matched without the leading slash,
# so the bare root is ^$ rather than ^/$
# Redirect the root (on any hostname) straight to siopaeile.com/blog/
RewriteRule ^$ http://siopaeile.com/blog/ [R=301,L]
# Redirect any other hostname (e.g. www.siopaeile.com) to siopaeile.com,
# keeping the requested path
RewriteCond %{HTTP_HOST} !^siopaeile\.com$ [NC]
RewriteRule ^(.*)$ http://siopaeile.com/$1 [R=301,L]
</IfModule>

BIG FAT WARNING: I know enough to get around .htaccess and mod_rewrite, but I always tell folk to test the code very well. For me personally mod_rewrite is one of the most difficult aspects of my job, and very often I have to experiment to get the code right. Get it wrong and your server is likely going to bang out 500 errors to beat the band.
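One way to do that testing is to write down what each URL should return and then compare it with what the server really sends. Here’s a rough Python sketch of the idea. The helper names are my own, the single expectation is only my reading of what the root redirect is meant to do, and the demo uses a fake fetch function (standing in for a server that still returns a normal page) so it can run without touching the live site.

```python
import http.client
from urllib.parse import urlsplit


def fetch_redirect(url):
    """Return (status, Location header) for url without following any redirect."""
    parts = urlsplit(url)
    conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        conn.request("HEAD", parts.path or "/")
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def check_redirects(expectations, fetch=fetch_redirect):
    """List every URL whose actual response differs from the expected one."""
    problems = []
    for url, expected in expectations.items():
        actual = fetch(url)
        if actual != expected:
            problems.append((url, expected, actual))
    return problems


# What the root should do once the redirect is in place (an assumption).
EXPECTED = {
    "http://siopaeile.com/": (301, "http://siopaeile.com/blog/"),
}


def fake_fetch(url):
    """Stand-in for a server that still answers 200 with no redirect header."""
    return (200, None)


problems = check_redirects(EXPECTED, fetch=fake_fetch)
for url, expected, actual in problems:
    print(url, "expected", expected, "got", actual)
```

Drop the `fetch=fake_fetch` argument to run the same check against the real server after uploading the new .htaccess.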

So why no indexing in Google?

I have to be honest here. I first started writing this post some time ago. I even emailed Paul to ask him about the META refresh (he probably thinks I’m either quite mad or incredibly useless for taking this long to actually write about his site…).

When I first looked at his backlinks in Yahoo! there seemed to be none (but take it from me – never, ever put your entire faith into Yahoo’s Site Explorer tool). That would normally explain the issue – Google won’t index a site unless it finds at least one external link to that site. That external link must be FOLLOWed (i.e. without a rel="nofollow" attribute), and it may be the case that unless the link originates on a semi-trusted site it will be ignored.

Well Yahoo! is reporting quite a few links now. The few links that I checked were from Paul commenting on other people’s blogs. Commenting and interacting with others is a great way to get attention and traffic. But you will not get any Search Engine benefits if the link you acquire from other sites is NOFOLLOWed. Unfortunately for Paul this was the case on the pages I checked.
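Checking whether a link is NOFOLLOWed is easy enough to automate. Here’s a small Python sketch that sorts the links on a page into followed and nofollowed. The `LinkAudit` class and the sample comment markup are invented for the example, though the markup is typical of what blog comment forms produce.

```python
from html.parser import HTMLParser


class LinkAudit(HTMLParser):
    """Sorts <a href> links by whether their rel attribute contains nofollow."""

    def __init__(self):
        super().__init__()
        self.followed = []
        self.nofollowed = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if not href:
            return
        # rel can hold several space-separated values, e.g. "external nofollow"
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollowed.append(href)
        else:
            self.followed.append(href)


# Hypothetical snippet: a link in the post body and a commenter's name link.
SAMPLE = """
<p>Have a look at this <a href="http://siopaeile.com/blog/">shopping blog</a>.</p>
<p class="comment-author">
  <a href="http://siopaeile.com/blog/" rel="external nofollow">Paul</a>
</p>
"""

audit = LinkAudit()
audit.feed(SAMPLE)
print("followed:", audit.followed)
print("nofollowed:", audit.nofollowed)
```

Only the links in the followed list can be expected to help with indexing.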

Tip #2:

Paul needs good text-rich anchored FOLLOWed links (like shopping blog), preferably from on-theme websites (unlike mine).

Use the tools Google gives you

Google has been far and away the most progressive Search Engine in terms of informing webmasters about their sites’ status. The Webmaster Console can give valuable data to a webmaster, enabling you to diagnose all sorts of issues. In Paul’s case the console will likely not yield much information (Google appears to be completely oblivious to his site). The console may give up one useful piece of info in instances where your site is not appearing in Google – penalty notification.

If your site is under any penalty you will be notified within the console. That’s pretty cool because if you inadvertently broke the guidelines (and got caught :mrgreen:) this tool not only informs you, but it also allows you to file a re-inclusion request after you’ve fixed up the offending material.

Tip #3:

Make use of Google Webmaster Console to appraise your site’s condition and diagnose any issues with crawlability and HTTP errors.

Just a quick note: I’m not suggesting that Paul’s site is under a penalty. My gut tells me it’s a spiderability and link issue.

So any other tips?

Well I would strongly suggest reading some of my previous posts in this ‘series’. Many contain tips that can be followed on any site:

  1. Getting your site out of supplemental index (Krishna De)
  2. Page titles and SEO (First Partners)
  3. SEO for Blogspot (Blogger) sites (multiple sites)
  4. SEO for photoblogs (McAwilliams)
  5. Corporate Blogging SEO (Bubble Brothers)
  6. PageRank Flow, Comment Feeds in Supplementals, NoFollow & robots.txt (BifSniff)

Apart from the above, there are many other tweaks that Paul can make. I would certainly include post titles (they seem to be missing from the homepage) along with links straight through to the actual post. I would also consider NOFOLLOWing all the links to the social media sites and Technorati tags. I noticed a few other actionable items, but the first and foremost priority is letting the search engine bots into the site and getting the pages properly indexed. Hope that helps Paul.

To the others who are still on my list

I have to admit when I made my offer I sort of knew it was a little risky. I thought I’d spend a short amount of time on each site and zip through the reviews. I’m finding that I’m actually spending multiple hours on each site, and being just a mere one-man-show means that I often have to drop the review mid-sentence to work with clients (and I haven’t been short of work thankfully). But I do promise that everyone will get a review. So here are the blogging heavy-hitters that still await their reviews:

  1. http://www.argolon.com/
  2. http://blog.roam4free.ie/
  3. http://www.mneylon.com/blog
  4. http://www.headrambles.com
  5. http://www.mediangler.com

Slow but steady progress (emphasis on the slow – sorry guys)


11 Comments »

  1. The link for ‘SEO for blogspot’ is empty…

    Comment by Colm — July 4, 2007 @ 4:13 pm

  2. TY Colm – fixed now.

    Rgds
    Richard

    Comment by Richard Hearne — July 4, 2007 @ 4:50 pm

  3. Richard

    I’m sure you’ll get round to us all eventually, though I was wondering why you didn’t actually hyperlink to the blogs at the end of your post

    Michele

    Comment by Michele — July 4, 2007 @ 6:56 pm

  4. You want the truth, the whole truth and noth…

    I copy pasted that from a previous post. I should really have stuck in the links and perhaps pinged you. But then again it means that you’ll get a post all to yourself when I get around to your blog. In fact I think I’ll be building myself up for when it’s your particular turn :grin: I might have to pull out some stops for that little sojourn.

    Rgds
    Richard

    Comment by Richard Hearne — July 4, 2007 @ 7:10 pm

  5. Richard

    Oh dear! That sounds really bad!

    I’m dreading it now :(

    Michele

    Comment by Michele — July 4, 2007 @ 7:13 pm

  6. Many thanks for your efforts Richard! Give me a few days to assimilate;-)

    Comment by Paul — July 4, 2007 @ 8:47 pm

  7. You’re very welcome Paul

    Sorry it took so long to get something actually published. I just hope that there will be something useful in the post for you.

    Best rgds
    Richard

    Comment by Richard Hearne — July 4, 2007 @ 9:11 pm

  8. It’s probably worth noting also that a handful of links to the blog would go a long way in getting the thing indexed.

    I agree the meta refresh has to go, absolutely.

    Comment by Harvey — July 5, 2007 @ 9:03 pm

  9. Hi Harvey

    Yep that’s very true. I probably could have been more explicit on that point, but I hope that point was implicit in my post. Thanks for dropping by and commenting.

    Best rgds
    Richard

    Comment by Richard Hearne — July 5, 2007 @ 11:13 pm

  10. Ok, have had a chance to digest now. Regarding META refresh, I asked my hosting service was there any way that I could have Siopaeile.com redirecting to Siopaeile.com/blog. This was the quick fix solution. Can’t really complain as I’m getting the blog hosted for free, but…
    I do actually have over half a dozen links from popular bloggers, so I’m going to check out the webmaster console and see what that throws up.

    Comment by Paul — July 6, 2007 @ 10:27 pm

  11. Hey Paul

    You need to get rid of that as #1 priority. You should have an FTP login and password. Or perhaps a hosting control panel. You need to go looking for a file called .htaccess and write the rules I gave into it.

    If you want me to help out quickly shoot me a mail and I’ll try to take a look.

    Rgds
    Richard

    Comment by Richard Hearne — July 6, 2007 @ 10:31 pm
