Robots.txt Guide for WordPress – Avoid Duplicate Content
Today a regular reader sent me an instant message on MSN. He suggested that I write a decent article on robots.txt because he had been searching for one and could not find a good one. I decided that would make a good topic for the Balkhis SEO section. The first thing you should do is view my robots.txt. You are welcome to copy and paste the entire file, but it won’t make sense unless you understand what it is doing.
The main purpose of robots.txt is to control search engine bots. This single file tells well-behaved bots which parts of your site they may crawl and which they may not. That is why it plays an important role in avoiding duplicate content.
Hint:
Use Disallow: to block a path (Disallow: /page/)
Use Allow: to permit a path (Allow: /about/)
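If you want to test simple rules like these before uploading them, Python’s standard library ships a robots.txt parser. Here is a minimal sketch, assuming a hypothetical robots.txt containing only the two example rules above and a placeholder domain (example.com). Keep in mind that urllib.robotparser only does plain prefix matching and applies the first rule that matches, so it does not understand the * and $ wildcards.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt using only the two example rules from the hint.
ROBOTS_TXT = """\
User-agent: *
Disallow: /page/
Allow: /about/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse lines directly instead of fetching a URL

for path in ("/page/2/", "/about/", "/hello-world/"):
    url = "https://example.com" + path  # example.com is a placeholder domain
    print(path, parser.can_fetch("*", url))
# /page/2/ False
# /about/ True
# /hello-world/ True
```

A path that no rule mentions, like /hello-world/, is allowed by default.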
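The main thing you need to know about robots.txt is that a $ at the end of a rule anchors it to the end of the URL, and a * matches any sequence of characters. Together they let you block files by extension, as I do on Balkhis with /*.css$, which matches any URL that ends in .css.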
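To make the * and $ behaviour concrete, here is a small Python sketch of how a wildcard-aware crawler could turn a single rule into a regular expression. The function names are hypothetical, and this only illustrates the matching described above (a rule matches from the start of the path, * matches anything, a trailing $ pins the end); it is not any search engine’s actual code.

```python
import re

def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    # A trailing '$' anchors the rule to the end of the URL path.
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Treat every character literally except '*', which matches any run of characters.
    body = "".join(".*" if ch == "*" else re.escape(ch) for ch in pattern)
    return re.compile(body + ("$" if anchored else ""))

def rule_matches(pattern: str, path: str) -> bool:
    # Rules match from the beginning of the path, hence re.match rather than re.search.
    return robots_pattern_to_regex(pattern).match(path) is not None

print(rule_matches("/*.css$", "/wp-content/themes/default/style.css"))          # True
print(rule_matches("/*.css$", "/wp-content/themes/default/style.css?ver=2.7"))  # False
print(rule_matches("/page/", "/page/2/"))        # True
print(rule_matches("/page/", "/2008/05/page/"))  # False
```

The second call shows why the trailing $ matters: a stylesheet loaded with a query string such as ?ver=2.7 no longer ends in .css, so /*.css$ does not match it.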
Now let’s analyze the important parts of my robots.txt that deal with duplicate content.
Disallow: /category/ – This rule prevents a whole heck of a lot of duplicate content, because your category pages contain exactly the same posts as your single post pages. You don’t want bots to see this.
Disallow: /page/ – I have mentioned multiple times that archives are duplicate content. Pretty obvious, so add this one as well.
Disallow: /tag/ – I don’t know whether you use tags or not. Add this rule anyway in case you ever decide to use them. I show tags on my archive page and my search page, so I have it there. Content grouped by tag is still the same content.
Disallow: */feed/ – Personally, I feel that users should pay attention to my blog rather than my feeds, so I block all feeds.
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
I don’t think spiders should be allowed to see your JavaScript, CSS, or include files, let alone index them. So block these as well.
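Disallow: /*? – This rule keeps bots from crawling any URL that contains a ? (a query string), so use this one too. Only add it if you use pretty permalinks, though: WordPress’s default permalinks (/?p=123) put every post behind a query string.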
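To check a set of rules like the ones above against real WordPress URLs, here is a minimal Python sketch. It assumes the precedence rule from RFC 9309 and Google’s robots.txt documentation: the longest matching rule path wins, and Allow wins a tie. The example URLs, the RULES list, the is_allowed helper, and the Allow: /about/ rule are hypothetical, and the */feed/ rule is left out to keep the sketch simple.

```python
import re

# Hypothetical rule set based on the rules discussed above (*/feed/ omitted).
RULES = [
    ("disallow", "/category/"),
    ("disallow", "/page/"),
    ("disallow", "/tag/"),
    ("disallow", "/*.js$"),
    ("disallow", "/*.inc$"),
    ("disallow", "/*.css$"),
    ("disallow", "/*?"),
    ("allow", "/about/"),
]

def _to_regex(pattern: str) -> re.Pattern:
    # '*' matches any characters; a trailing '$' anchors the end of the path.
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = "".join(".*" if ch == "*" else re.escape(ch) for ch in pattern)
    return re.compile(body + ("$" if anchored else ""))

def is_allowed(path: str, rules=RULES) -> bool:
    """Longest matching rule path wins; on a tie, Allow wins."""
    best_len, allowed = -1, True  # no matching rule means the path is allowed
    for kind, pattern in rules:
        if _to_regex(pattern).match(path):
            length = len(pattern)
            if length > best_len or (length == best_len and kind == "allow"):
                best_len, allowed = length, kind == "allow"
    return allowed

for path in ("/2008/05/robots-txt-guide/", "/category/seo/", "/?s=robots",
             "/about/?replytocom=5"):
    print(path, is_allowed(path))
# /2008/05/robots-txt-guide/ True
# /category/seo/ False
# /?s=robots False
# /about/?replytocom=5 True
```

The last URL stays crawlable because the seven-character Allow: /about/ outranks the three-character Disallow: /*?.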
I hope you now understand what my robots.txt is doing. Feel free to use it as a sample for your own site.

Hey, I am Syed Balkhi, the guy behind Balkhis Inc. I entered the industry back in 2002 not knowing a single thing, and I barely spoke English at that time. In the past six years, my language barrier has been eliminated. Aside from English, I now also speak HTML and PHP. Along with these languages, I have also managed to master a few arts. Web design came first: messing around with Photoshop, I learned how to create my first web design, and I have since founded a web design firm, Uzzz Productions. After running numerous websites in various niches, I have mastered the art of web development. Now I am compiling a resource of what I already know and what I am learning on this blog. It is meant to help me whenever I need a guide to look back to, and to help my fellow webmasters.