WordPress Robots.txt Guide: What It Is and How to Use It (2022)

Ever heard the term robots.txt and wondered how it applies to your website? Most websites have a robots.txt file, but that doesn’t mean most webmasters understand it. In this post, we hope to change that by offering a deep dive into the WordPress robots.txt file, as well as how it can control and limit access to your site. By the end, you’ll be able to answer questions like:

  • What is a WordPress Robots.txt?
  • How Does Robots.txt Help My Website?
  • How Can I Add Robots.txt To WordPress?
  • What Sorts Of Rules Can I Put In Robots.txt?
  • How Do I Test My Robots.txt File?
  • How Do Big WordPress Websites Implement Robots.txt?

There’s a lot to cover, so let’s get started!

What Is a WordPress Robots.txt?

Before we can talk about the WordPress robots.txt, it’s important to define what a “robot” is in this case. Robots are any type of “bot” that visits websites on the Internet. The most common example is search engine crawlers. These bots “crawl” around the web to help search engines like Google index and rank the billions of pages on the Internet.

So, bots are, in general, a good thing for the Internet…or at least a necessary thing. But that doesn’t necessarily mean that you, or other webmasters, want bots running around unfettered. The desire to control how web robots interact with websites led to the creation of the robots exclusion standard in the mid-1990s. Robots.txt is the practical implementation of that standard – it allows you to control how participating bots interact with your site. You can block bots entirely, restrict their access to certain areas of your site, and more.

That “participating” part is important, though. Robots.txt cannot force a bot to follow its directives, and malicious bots can and will ignore the robots.txt file. Additionally, even reputable organizations ignore some commands that you can put in robots.txt. For example, Google will ignore any rules that you add to your robots.txt file about how frequently its crawlers visit. If you are having a lot of issues with bots, a security solution such as Cloudflare or Sucuri can come in handy.

Why Should You Care About Your Robots.txt File?

For most webmasters, the benefits of a well-structured robots.txt file boil down to two categories:

  • Optimizing search engines’ crawl resources by telling them not to waste time on pages you don’t want to be indexed. This helps ensure that search engines focus on crawling the pages that you care about the most.
  • Optimizing your server usage by blocking bots that are wasting resources.

Robots.txt Isn’t Specifically About Controlling Which Pages Get Indexed In Search Engines

Robots.txt is not a foolproof way to control what pages search engines index. If your primary goal is to stop certain pages from being included in search engine results, the proper approach is to use a meta noindex tag or another similarly direct method.
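
For reference, a noindex directive is a meta tag in the page’s <head> section (most SEO plugins, including Yoast, can set it for you on a per-page basis). A minimal hand-coded version looks like this:

<meta name="robots" content="noindex">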

This is because your robots.txt file is not directly telling search engines not to index content – it’s just telling them not to crawl it. While Google won’t crawl the marked areas from inside your site, Google itself states that if an external site links to a page that you exclude with your robots.txt file, Google still might index that page.

John Mueller, a Google Webmaster Analyst, has also confirmed that a page with links pointing to it might still get indexed, even if it’s blocked by robots.txt. Below is what he had to say in a Webmaster Central hangout:

One thing maybe to keep in mind here is that if these pages are blocked by robots.txt, then it could theoretically happen that someone randomly links to one of these pages. And if they do that then it could happen that we index this URL without any content because it’s blocked by robots.txt. So we wouldn’t know that you don’t want to have these pages actually indexed.

Whereas if they’re not blocked by robots.txt you can put a noindex meta tag on those pages. And if anyone happens to link to them, and we happen to crawl that link and think maybe there’s something useful here then we would know that these pages don’t need to be indexed and we can just skip them from indexing completely.

So, in that regard, if you have anything on these pages that you don’t want to have indexed then don’t disallow them, use noindex instead.

How To Create And Edit Your WordPress Robots.txt File

By default, WordPress automatically creates a virtual robots.txt file for your site. So even if you don’t lift a finger, your site should already have the default robots.txt file. You can test if this is the case by appending “/robots.txt” to the end of your domain name. For example, “https://kinsta.com/robots.txt” brings up the robots.txt file that we use here at Kinsta.

Because this file is virtual, though, you can’t edit it. If you want to edit your robots.txt file, you’ll need to actually create a physical file on your server that you can manipulate as needed. Here are three simple ways to do that…

How to Create And Edit A Robots.txt File With Yoast SEO

If you’re using the popular Yoast SEO plugin, you can create (and later edit) your robots.txt file right from Yoast’s interface. Before you can access it, though, you need to enable Yoast SEO’s advanced features by going to SEO → Dashboard → Features and toggling on Advanced settings pages.

Once that’s activated, you can go to SEO → Tools and click on File editor.

Assuming you don’t already have a physical robots.txt file, Yoast will give you an option to Create robots.txt file.

And once you click that button, you’ll be able to edit the contents of your robots.txt file directly from the same interface.

As you read on, we’ll dig more into what types of directives to put in your WordPress robots.txt file.

How to Create And Edit A Robots.txt File With All in One SEO

If you’re using the almost-as-popular-as-Yoast All in One SEO Pack plugin, you can also create and edit your WordPress robots.txt file right from the plugin’s interface. All you need to do is go to All in One SEO → Tools.

Then, toggle the Enable Custom Robots.txt setting so it’s on. This will enable you to create custom rules and add them to your robots.txt file.

How to Create And Edit A Robots.txt File via SFTP

If you’re not using an SEO plugin that offers robots.txt functionality, you can still create and manage your robots.txt file via SFTP. First, use any text editor to create an empty file named “robots.txt”.

Then, connect to your site via SFTP and upload that file to the root folder of your site. You can make further modifications to your robots.txt file by editing it via SFTP or uploading new versions of the file.
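
Rather than leaving the file empty, you might seed it with the same defaults that WordPress’s virtual file uses (you’ll see this exact snippet again later in this guide), then adjust from there:

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php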

What To Put In Your Robots.txt File

Ok, now you have a physical robots.txt file on your server that you can edit as needed. But what do you actually do with that file? Well, as you learned in the first section, robots.txt lets you control how robots interact with your site. You do that with two core commands:

  • User-agent – this lets you target specific bots. User agents are what bots use to identify themselves. With them, you could, for example, create a rule that applies to Bing, but not to Google.
  • Disallow – this lets you tell robots not to access certain areas of your site.

There’s also an Allow command that you’ll use in niche situations. By default, everything on your site is marked with Allow, so it’s not necessary to use the Allow command in 99% of situations. But it does come in handy where you want to Disallow access to a folder and its child folders but Allow access to one specific child folder.
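
As a quick sketch of that exception pattern (the folder and file names here are placeholders, not real WordPress paths):

User-agent: *
Disallow: /example-folder/
Allow: /example-folder/allowed-file.html

You’ll see a real WordPress version of this exact pattern a little later in this guide.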

You add rules by first specifying which User-agent the rule should apply to and then listing out what rules to apply using Disallow and Allow (there’s a combined example after this list). There are also some other commands like Crawl-delay and Sitemap, but these are either:

  • Ignored by most major crawlers, or interpreted in vastly different ways (in the case of crawl delay)
  • Made redundant by tools like Google Search Console (for sitemaps)
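
That said, if you do want to list your sitemap in robots.txt (as some of the sites profiled later in this guide do), the combined structure looks like this. The folder path and sitemap URL below are placeholders:

User-agent: *
Disallow: /example-subfolder/
Sitemap: https://example.com/sitemap_index.xml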

Let’s go through some specific use cases to show you how this all comes together.

How To Use Robots.txt To Block Access To Your Entire Site

Let’s say you want to block all crawler access to your site. This is unlikely to occur on a live site, but it does come in handy for a development site. To do that, you would add this code to your WordPress robots.txt file:

(Video) 🤖 How To Create a Robots.txt File For SEO Using WordPress? - A Beginners Guide

User-agent: *
Disallow: /

What’s going on in that code?

The asterisk (*) next to User-agent is a wildcard, meaning the rule applies to every single user agent. The slash (/) next to Disallow says you want to disallow access to all pages that contain “yourdomain.com/” (which is every single page on your site).

How To Use Robots.txt To Block A Single Bot From Accessing Your Site

Let’s change things up. In this example, we’ll pretend that you don’t like the fact that Bing crawls your pages. You’re Team Google all the way and don’t even want Bing to look at your site. To block only Bing from crawling your site, you would replace the wildcard asterisk (*) with Bingbot:

User-agent: Bingbot
Disallow: /

Essentially, the above code says to only apply the Disallow rule to bots with the User-agent “Bingbot”. Now, you’re unlikely to want to block access to Bing – but this scenario does come in handy if there’s a specific bot that you don’t want to access your site. This site has a good listing of most services’ known User-agent names.

How To Use Robots.txt To Block Access To A Specific Folder Or File

For this example, let’s say that you only want to block access to a specific file or folder (and all of that folder’s subfolders). To make this apply to WordPress, let’s say you want to block:

  • The entire wp-admin folder
  • wp-login.php

You could use the following commands:

User-agent: *
Disallow: /wp-admin/
Disallow: /wp-login.php

How to Use Robots.txt To Allow Access To A Specific File In A Disallowed Folder

Ok, now let’s say that you want to block an entire folder, but you still want to allow access to a specific file inside that folder. This is where the Allow command comes in handy. And it’s actually very applicable to WordPress. In fact, the WordPress virtual robots.txt file illustrates this example perfectly:

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

This snippet blocks access to the entire /wp-admin/ folder except for the /wp-admin/admin-ajax.php file.

How To Use Robots.txt To Stop Bots From Crawling WordPress Search Results

One WordPress-specific tweak you might want to make is to stop search crawlers from crawling your search results pages. By default, WordPress uses the query parameter “?s=”. So to block access, all you need to do is add the following rule:

User-agent: *
Disallow: /?s=
Disallow: /search/

This can also be an effective way to stop soft 404 errors if you are getting them. Make sure to read our in-depth guide on how to speed up WordPress search.

How To Create Different Rules For Different Bots In Robots.txt

Up until now, all the examples have dealt with one rule at a time. But what if you want to apply different rules to different bots? You simply need to add each set of rules under the User-agent declaration for each bot. For example, if you want to make one rule that applies to all bots and another rule that applies to just Bingbot, you could do it like this:

User-agent: *
Disallow: /wp-admin/

User-agent: Bingbot
Disallow: /

In this example, all bots will be blocked from accessing /wp-admin/, but Bingbot will be blocked from accessing your entire site.

Testing Your Robots.txt File

To ensure it’s set up correctly, you can test your WordPress robots.txt file using Google’s robots.txt Tester tool (formerly part of Google Search Console).

Simply navigate to the tool and scroll to the bottom of the page. Enter any URL into the field, including your homepage, then click the red TEST button.

You’ll see a green Allowed response if everything is crawlable.

You could also test each individual URL you have blocked to ensure it is, in fact, disallowed.

Beware of the UTF-8 BOM

BOM stands for byte order mark and is basically an invisible character that is sometimes added to files by old text editors and the like. If this happens to your robots.txt file, Google might not read it correctly, which is why it’s important to check your file for errors. For example, our file once had an invisible character, and Google complained about the syntax not being understood. This essentially invalidated the first line of our robots.txt file altogether, which is not good! Glenn Gabe has an excellent article on how a UTF-8 BOM could kill your SEO.

Googlebot is Mostly US-Based

It’s also important not to block Googlebot from the United States, even if you are targeting a local region outside of the United States. Google sometimes does local crawling, but Googlebot is mostly US-based.

Googlebot is mostly US-based, but we also sometimes do local crawling. https://t.co/9KnmN4yXpe

— Google Search Central (@googlesearchc) November 13, 2017

What Popular WordPress Sites Put In Their Robots.txt File

To actually provide some context for the points listed above, here is how some of the most popular WordPress sites are using their robots.txt files.

TechCrunch

In addition to restricting access to a number of unique pages, TechCrunch notably disallows crawlers from accessing:

  • /wp-admin/
  • /wp-login.php

They also set special restrictions on two bots:

  • Swiftbot
  • IRLbot

In case you’re interested, IRLbot is a crawler from a Texas A&M University research project. That’s odd!

The Obama Foundation

The Obama Foundation hasn’t made any special additions, opting exclusively to restrict access to /wp-admin/.

Angry Birds

Angry Birds has the same default setup as The Obama Foundation. Nothing special is added.

Drift

Finally, Drift opts to define its sitemaps in the robots.txt file but otherwise leaves the same default restrictions as The Obama Foundation and Angry Birds.

Use Robots.txt The Right Way

As we wrap up our robots.txt guide, we want to remind you one more time that using a Disallow command in your robots.txt file is not the same as using a noindex tag. Robots.txt blocks crawling, but not necessarily indexing. You can use it to add specific rules to shape how search engines and other bots interact with your site, but it will not explicitly control whether your content is indexed or not.

For most casual WordPress users, there’s not an urgent need to modify the default virtual robots.txt file. But if you’re having issues with a specific bot, or want to change how search engines interact with a certain plugin or theme that you’re using, you might want to add your own rules.

We hope you enjoyed this guide and be sure to leave a comment if you have any further questions about using your WordPress robots.txt file.

Save time, costs and maximize site performance with:

  • Instant help from WordPress hosting experts, 24/7.
  • Cloudflare Enterprise integration.
  • Global audience reach with 34 data centers worldwide.
  • Optimization with our built-in Application Performance Monitoring.

All of that and much more, in one plan with no long-term contracts, assisted migrations, and a 30-day money-back guarantee. Check out our plans or talk to sales to find the plan that’s right for you.
