WordPress is now one of the most popular CMS solutions out there; the majority of blogs and websites use it. It is also a very SEO-friendly platform that you can use to achieve big results any day. Here are some quick tips on how you can easily add a better robots.txt configuration to your WP-powered website or blog.
Adding robots.txt settings to a WordPress blog / website
This is one of the most ignored areas among the majority of WP users, who simply place their hope in the fact that WordPress is a near-perfect CMS that handles search-engine-friendly metrics out of the box, and as such needs no SEO tweaks to perform better. That was the mentality I nurtured when I started out on the WP platform, until I stumbled on the robots.txt settings of the old mashable.com, which sparked my curiosity about a perfect WordPress robots.txt setup.
After searching up until now (2013), I have realized there is no such thing as a perfect robots.txt configuration for any website, irrespective of the platform it is built on, since each site needs to be configured individually based on how you want it indexed by search engines. Still, adding a working robots.txt to any blog or website, including those on the WordPress platform, will greatly improve the site's performance on the popular search engines, since you will be instructing them on how to index the site. If you ask for my opinion, this is one of those WordPress SEO performance tweaks you wouldn't want to miss.
How to add robots.txt settings / configuration to a WordPress blog / website:
This is dead easy for anyone to implement, whether you are a newbie WordPress user or a pro WP user. All you need to do is this:
- open a blank notepad
- copy and paste this simple robots.txt configuration into it, bearing in mind that each site must be configured individually based on how you want it indexed in search engines
Here is the configuration to copy for a WordPress blog:
User-agent: IRLbot
Crawl-delay: 3600
User-agent: Alexibot
Disallow: /
User-agent: Aqua_Products
Disallow: /
User-agent: asterias
Disallow: /
User-agent: b2w/0.1
Disallow: /
User-agent: BackDoorBot/1.0
Disallow: /
User-agent: BlowFish/1.0
Disallow: /
User-agent: Bookmark search tool
Disallow: /
User-agent: BotALot
Disallow: /
User-agent: BotRightHere
Disallow: /
User-agent: BuiltBotTough
Disallow: /
User-agent: Bullseye/1.0
Disallow: /
User-agent: BunnySlippers
Disallow: /
User-agent: CheeseBot
Disallow: /
User-agent: CherryPicker
Disallow: /
User-agent: CherryPickerElite/1.0
Disallow: /
User-agent: CherryPickerSE/1.0
Disallow: /
User-agent: Copernic
Disallow: /
User-agent: CopyRightCheck
Disallow: /
User-agent: cosmos
Disallow: /
User-agent: Crescent Internet ToolPak HTTP OLE Control v.1.0
Disallow: /
User-agent: Crescent
Disallow: /
User-agent: DittoSpyder
Disallow: /
User-agent: EmailCollector
Disallow: /
User-agent: EmailSiphon
Disallow: /
User-agent: EmailWolf
Disallow: /
User-agent: EroCrawler
Disallow: /
User-agent: ExtractorPro
Disallow: /
User-agent: FairAd Client
Disallow: /
User-agent: Flaming AttackBot
Disallow: /
User-agent: Foobot
Disallow: /
User-agent: Gaisbot
Disallow: /
User-agent: GetRight/4.2
Disallow: /
User-agent: Harvest/1.5
Disallow: /
User-agent: hloader
Disallow: /
User-agent: httplib
Disallow: /
User-agent: HTTrack 3.0
Disallow: /
User-agent: humanlinks
Disallow: /
User-agent: InfoNaviRobot
Disallow: /
User-agent: Iron33/1.0.2
Disallow: /
User-agent: JennyBot
Disallow: /
User-agent: Kenjin Spider
Disallow: /
User-agent: Keyword Density/0.9
Disallow: /
User-agent: larbin
Disallow: /
User-agent: LexiBot
Disallow: /
User-agent: libWeb/clsHTTP
Disallow: /
User-agent: LinkextractorPro
Disallow: /
User-agent: LinkScan/8.1a Unix
Disallow: /
User-agent: LinkWalker
Disallow: /
User-agent: LNSpiderguy
Disallow: /
User-agent: lwp-trivial/1.34
Disallow: /
User-agent: lwp-trivial
Disallow: /
User-agent: Mata Hari
Disallow: /
User-agent: Microsoft URL Control - 5.01.4511
Disallow: /
User-agent: Microsoft URL Control - 6.00.8169
Disallow: /
User-agent: Microsoft URL Control
Disallow: /
User-agent: MIIxpc/4.2
Disallow: /
User-agent: MIIxpc
Disallow: /
User-agent: Mister PiX
Disallow: /
User-agent: moget/2.1
Disallow: /
User-agent: moget
Disallow: /
User-agent: Mozilla/4.0 (compatible; BullsEye; Windows 95)
Disallow: /
User-agent: MSIECrawler
Disallow: /
User-agent: NetAnts
Disallow: /
User-agent: NICErsPRO
Disallow: /
User-agent: Offline Explorer
Disallow: /
User-agent: Openbot
Disallow: /
User-agent: Openfind data gatherer
Disallow: /
User-agent: Openfind
Disallow: /
User-agent: Oracle Ultra Search
Disallow: /
User-agent: PerMan
Disallow: /
User-agent: ProPowerBot/2.14
Disallow: /
User-agent: ProWebWalker
Disallow: /
User-agent: psbot
Disallow: /
User-agent: Python-urllib
Disallow: /
User-agent: QueryN Metasearch
Disallow: /
User-agent: Radiation Retriever 1.1
Disallow: /
User-agent: RepoMonkey Bait & Tackle/v1.01
Disallow: /
User-agent: RepoMonkey
Disallow: /
User-agent: RMA
Disallow: /
User-agent: searchpreview
Disallow: /
User-agent: SiteSnagger
Disallow: /
User-agent: SpankBot
Disallow: /
User-agent: spanner
Disallow: /
User-agent: suzuran
Disallow: /
User-agent: Szukacz/1.4
Disallow: /
User-agent: Teleport
Disallow: /
User-agent: TeleportPro
Disallow: /
User-agent: Telesoft
Disallow: /
User-agent: The Intraformant
Disallow: /
User-agent: TheNomad
Disallow: /
User-agent: TightTwatBot
Disallow: /
User-agent: toCrawl/UrlDispatcher
Disallow: /
User-agent: True_Robot/1.0
Disallow: /
User-agent: True_Robot
Disallow: /
User-agent: turingos
Disallow: /
User-agent: TurnitinBot/1.5
Disallow: /
User-agent: TurnitinBot
Disallow: /
User-agent: URL Control
Disallow: /
User-agent: URL_Spider_Pro
Disallow: /
User-agent: URLy Warning
Disallow: /
User-agent: VCI WebViewer VCI WebViewer Win32
Disallow: /
User-agent: VCI
Disallow: /
User-agent: Web Image Collector
Disallow: /
User-agent: WebAuto
Disallow: /
User-agent: WebBandit/3.50
Disallow: /
User-agent: WebBandit
Disallow: /
User-agent: WebCapture 2.0
Disallow: /
User-agent: WebCopier v.2.2
Disallow: /
User-agent: WebCopier v3.2a
Disallow: /
User-agent: WebCopier
Disallow: /
User-agent: WebEnhancer
Disallow: /
User-agent: WebSauger
Disallow: /
User-agent: Website Quester
Disallow: /
User-agent: Webster Pro
Disallow: /
User-agent: WebStripper
Disallow: /
User-agent: WebZip/4.0
Disallow: /
User-agent: WebZIP/4.21
Disallow: /
User-agent: WebZIP/5.0
Disallow: /
User-agent: WebZip
Disallow: /
User-agent: Wget/1.5.3
Disallow: /
User-agent: Wget/1.6
Disallow: /
User-agent: Wget
Disallow: /
User-agent: wget
Disallow: /
User-agent: WWW-Collector-E
Disallow: /
User-agent: Xenu's Link Sleuth 1.1c
Disallow: /
User-agent: Xenu's
Disallow: /
User-agent: Zeus 32297 Webster Pro V2.9 Win32
Disallow: /
User-agent: Zeus Link Scout
Disallow: /
User-agent: Zeus
Disallow: /
User-agent: Adsbot-Google
Disallow:
User-agent: Googlebot
Disallow:
User-agent: Mediapartners-Google
Disallow:
User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
Disallow: /wp-login.php
Disallow: /wp-register.php
Disallow: /feed/
Disallow: /comments/
Disallow: /author/
Disallow: /archives/
Disallow: /20*
Disallow: /trackback/
Sitemap: http://myblogurl.com/index.xml
If your mobile site URL is different from your parent blog URL, add a Googlebot-Mobile block and enter that URL after the last / in its Allow line. Using mine as an example, the block
User-agent: Googlebot-Mobile
Allow: /
becomes
User-agent: Googlebot-Mobile
Allow: /?mobile
as shown in full below.
You can remove the following lines:
Disallow: /archives/
Disallow: /author/
Disallow: /feed/
Disallow: /20*
Disallow: /images*
Remove those lines only if you want the corresponding sections indexed by search engines; they were added to prevent duplicate content in search engine results. (The Disallow: /20* rule, for instance, is meant to match WordPress's year-based archive URLs such as /2013/05/.) If you want only your post topics and contents indexed, you can add further lines to your robots.txt to block the rest, e.g.:
Disallow: /tag/
Disallow: /category/
Disallow: /next/
For those whose server load is suffering and who wish to reduce the rate or frequency at which search engines visit their blog, you can tweak these lines:
User-agent: IRLbot
Crawl-delay: 3600
using any figure of your choice, in seconds. Note that 3600 seconds equals one hour, so you can use the same format to either reduce the frequency or increase the delay to days, weeks, months, etc., depending on how frequently your blog is updated.
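For example, following that arithmetic, to ask IRLbot to wait a full day between requests (24 × 3600 = 86400 seconds):
User-agent: IRLbot
Crawl-delay: 86400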
Here is the configuration to copy for a WordPress-CMS-powered website:
User-agent: IRLbot
Crawl-delay: 3600
User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/themes/
Disallow: /wp-content/cache/
Disallow: /wp-login.php
Disallow: /wp-register.php
Sitemap: http://mywebsiteurl.com/index.xml
Tweak it accordingly, following the earlier notes above. Also note that you can upload your images into two separate folders on sites that have both private images and images that should be indexed, and then add the folder containing the private images to your disallowed list, as sketched below.
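A minimal sketch, assuming a hypothetical folder named private-images under wp-content/uploads that holds the images you want hidden:
User-agent: *
Disallow: /wp-content/uploads/private-images/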
- Save your notepad content as robots.txt
- log in to your control panel, locate the file manager, and upload the robots.txt file to the root directory where you have WordPress installed
- close the file manager and enjoy your new active robots.txt settings; you can confirm the file is live by opening http://myblogurl.com/robots.txt (with your own domain) in a browser
Please remember to replace Sitemap: http://myblogurl.com/index.xml with your blog's or website's current sitemap link.
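For example, assuming your domain is example.com and your sitemap plugin publishes the sitemap at /sitemap.xml (a hypothetical setup), the line would read:
Sitemap: http://example.com/sitemap.xml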
If you do not have access to your cPanel and wish to add the robots.txt settings to your WordPress-powered blog or website using a plugin, you can easily do so by visiting the WordPress plugin repository for WP Robots Txt, which gives you the privilege of adding and editing your WP robots.txt configuration right from your WordPress admin dashboard.
Also see:
- Why skip Installing this 7 most important wordpress Plugins? and
- How to add Numeric Pagination to Mobilepress Default Theme
robots.txt and WordPress are both well-known elements in the world of online business. Using these tools, a web-based business becomes easier to control and promote.
Nice tutorial, I think I'm gonna tweak mine as well so I can curb all these bots visiting my site. Thumbs up, man!
wow Obasi! your knowledge is incredible! You know so much about coding! I hope to understand coding more in the future to help with my blogs:))
You really know your coding … did you go to school for this or are you self taught?
Thanks for dropping by, Cheryl; you really found this wowing! But this is nothing much, just a little research on malicious bots that are known to cause harm on websites, coupled with the regular settings from the official WordPress Codex. Thanks for the compliments, though.
Oh wow … this content is so above where I am at, will have to come back to this when it sounds less Latin-like :)
Now Kasey, your last lines got me really cracking up
Wow, such great advice, thanks so much for sharing this knowledge. I'm off to update my blogs right now! I have a question: is WordPress a generally good program to use? I am using it and find it very good, but I have also heard that it is more difficult to place paid advertising on it.
Thanks for dropping by, Emily. This part [ it is more difficult to place paid advertising on it ] is the funniest piece I have read in a very long time. Frankly speaking, I would pledge to the fact that WP is the best platform for accepting and placing private advertisements; premium plugins like ads-rotator even make that pretty easy.
That's okay, Don; hope you replaced Sitemap: http://myblogurl.com/index.xml with your own sitemap URL?
yea…. this is a great tutorial, dude…. I just changed mine to this….. I'm loving it in here
How lovely would it be if it could work on the Blogger platform… Love this!!
Thanks for dropping by, bro; it can actually work on Blogger with just a little more tweaking. But the question is, "why bother when master Google is taking care of it for you"?
Nice one, bro, but I don't know most of those bots. This is great; I think I am going to use it.
Yeah, there are many other wicked bots out there. A hacker friend of mine actually gave me some tools that use most of these bots, including "User-agent: HTTrack 3.0", which downloads your website files without your knowledge.