Custom Robots.txt File

Hi, welcome to the Custom Robots.txt for Blogger tutorial. Who doesn’t want to appear at the top of Google and other search engines? Naturally, you want to do everything possible to make sure crawlers don’t pass your content by. In this tutorial, we learn how to create and add a custom robots.txt file in Blogger. Configuring a custom robots.txt file properly can really help your search rankings.

Before we jump in, I would like to talk about what a robots.txt file is and why you should add a custom robots.txt file to your Blogger blog.

What Is Custom Robots.txt File?

A robots.txt file lives at the root of your site. So, for the site www.example.com, the robots.txt file lives at www.example.com/robots.txt. robots.txt is a plain text file that follows the Robots Exclusion Standard. A robots.txt file consists of one or more rules. Each rule blocks (or allows) access for a given crawler to a specified file path on that website.

Here is a simple robots.txt file with two rules, explained below:

# Group 1
User-agent: Googlebot
Disallow: /nogooglebot/

# Group 2
User-agent: *
Allow: /

Sitemap: http://www.example.com/sitemap.xml

Explanation:

  1. The crawler named “Googlebot” should not crawl the folder http://example.com/nogooglebot/ or any of its subdirectories.
  2. All other user agents can access the entire site. (This could have been omitted and the result would be the same, as full access is the assumption.)
  3. The site’s Sitemap file is located at http://www.example.com/sitemap.xml

Basic Robots.txt Guidelines

You can use almost any text editor to create a robots.txt file. The text editor should be able to create standard UTF-8 text files. Don’t use a word processor; word processors often save files in a proprietary format and can add unexpected characters, such as curly quotes, which can cause problems for crawlers.

Syntax

  • robots.txt must be a UTF-8 encoded text file (which includes ASCII). Using other character sets is not possible.
  • A robots.txt file consists of one or more groups.
  • Each group consists of multiple rules or directives (instructions), one directive per line.
  • A group gives the following information:
  • Who the group applies to (the user agent)
  • Which directories or files that agent can access, and/or
  • Which directories or files that agent cannot access.
  • Groups are processed from top to bottom, and a user agent can match only one rule set: the first, most specific group that matches that user agent (see the example after this list).
  • The default assumption is that a user agent can crawl any page or directory not blocked by a Disallow: rule.
  • Rules are case-sensitive. For instance, Disallow: /file.asp applies to http://www.example.com/file.asp, but not http://www.example.com/FILE.asp.
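
As a minimal sketch of the group-matching rule above, consider a file with one specific group for Googlebot and one catch-all group (the folder /private/ is only a placeholder):

# /private/ is a placeholder folder name
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /

Googlebot matches only its own, more specific group, so it may crawl everything except /private/, while every other crawler matches the catch-all group and is blocked from the entire site.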

The following directives are used in robots.txt files:

User-agent: [Required, one or more per group] The name of a search engine robot (web crawler software) that the rule applies to. This is the first line of any rule. Most Google user-agent names are listed in the Web Robots Database or in the Google list of user agents. Using an asterisk (*) as in the example below will match all crawlers except the various AdsBot crawlers, which must be named explicitly. (See the list of Google crawler names.) Examples:

# Example 1: Block only Googlebot
User-agent: Googlebot
Disallow: /

# Example 2: Block Googlebot and AdsBot
User-agent: Googlebot
User-agent: AdsBot-Google
Disallow: /
 
# Example 3: Block all but AdsBot crawlers
User-agent: * 
Disallow: /

Disallow: [At least one or more Disallow or Allow entries per rule] A directory or page, relative to the root domain, that should not be crawled by the user agent. If a page, it should be the full page name as shown in the browser; if a directory, it should end in a / mark. Supports the * wildcard for a path prefix, suffix, or entire string.
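
For example, a small sketch with placeholder paths: the first rule below blocks a whole directory, the second blocks a single page:

# placeholder directory and page names
User-agent: *
Disallow: /private-directory/
Disallow: /private_file.html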

Allow: [At least one or more Disallow or Allow entries per rule] A directory or page, relative to the root domain, that should be crawled by the user agent just mentioned. This is used to override Disallow to allow crawling of a subdirectory or page in a disallowed directory. If a page, it should be the full page name as shown in the browser; if a directory, it should end in a / mark. Supports the * wildcard for a path prefix, suffix, or entire string.
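
For instance, in this sketch (again with placeholder paths), Allow re-opens one page inside an otherwise disallowed directory:

# placeholder directory and page names
User-agent: *
Disallow: /archive/
Allow: /archive/public-page.html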

Sitemap: [Optional, zero or more per file] The location of a sitemap for this website. Must be a fully-qualified URL; Google doesn’t assume or check http/https/www/non-www alternates. Sitemaps are a good way to indicate which content Google should crawl, as opposed to which content it can or cannot crawl. Learn more about sitemaps. Examples:

Sitemap: https://example.com/sitemap.xml
Sitemap: http://www.example.com/sitemap.xml

Useful robots.txt rules

Here are some common useful robots.txt rules:

Rule: Disallow crawling of the entire website. Keep in mind that in some situations URLs from the website may still be indexed, even if they haven’t been crawled. Note: this does not match the various AdsBot crawlers, which must be named explicitly.

User-agent: *
Disallow: /

Rule: Disallow crawling of a directory and its contents by following the directory name with a forward slash. Remember that you shouldn’t use robots.txt to block access to private content: use proper authentication instead. URLs disallowed by the robots.txt file might still be indexed without being crawled, and the robots.txt file can be viewed by anyone, potentially disclosing the location of your private content.

User-agent: *
Disallow: /calendar/
Disallow: /junk/

Rule: Allow access to a single crawler.

User-agent: Googlebot-news
Allow: /

User-agent: *
Disallow: /

Rule: Allow access to all but a single crawler.

User-agent: Unnecessarybot
Disallow: /

User-agent: *
Allow: /

Rule: Disallow crawling of a single webpage by listing the page after the slash.

User-agent: *
Disallow: /private_file.html

Rule: Block a specific image from Google Images.

User-agent: Googlebot-Image
Disallow: /images/dogs.jpg

Rule: Block all images on your site from Google Images.

User-agent: Googlebot-Image
Disallow: /

Rule: Disallow crawling of files of a specific file type (for example, .gif).

User-agent: Googlebot
Disallow: /*.gif$

Rule: Disallow crawling of the entire site, but show AdSense ads on those pages, by disallowing all web crawlers other than Mediapartners-Google. This implementation hides your pages from search results, but the Mediapartners-Google web crawler can still analyze them to decide what ads to show visitors to your site.

User-agent: *
Disallow: /

User-agent: Mediapartners-Google
Allow: /

Rule: To match URLs that end with a specific string, use $. For instance, this sample blocks any URLs that end with .xls.

User-agent: Googlebot
Disallow: /*.xls$

Custom Robots.txt File In Blogger

Each site hosted on Blogger has its own default robots.txt file, which looks something like this:

User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /search
Disallow: /b
Allow: /

Sitemap: https://www.yourblogurl.blogspot.com/feeds/posts/default?orderby=updated

Blogger Custom Robots.txt File Examples

Example 1

User-agent: Mediapartners-Google
Disallow: 

User-agent: *
Disallow: /search
Allow: /

Sitemap: https://www.yourblogurl.blogspot.com/sitemap.xml

Example 2

User-agent: Mediapartners-Google
Disallow: 

User-agent: *
Disallow: /search
Allow: /

Sitemap: https://www.yourblogurl.blogspot.com/feeds/posts/default?orderby=updated

Example 3

User-agent: Mediapartners-Google
Disallow: 

User-agent: *
Disallow: /search
Allow: /

Sitemap: https://www.yourblogurl.blogspot.com/atom.xml?redirect=false&start-index=1&max-results=500
Sitemap: https://www.yourblogurl.blogspot.com/atom.xml?redirect=false&start-index=501&max-results=500

Custom Robots.txt File In Blogger: Key Directives

Let’s understand the custom robots.txt file completely:

The code above is divided into different segments. Let us first study each of them, and later we will see how to add a custom robots.txt file to a Blogger site.

User-agent: Mediapartners-Google
This rule is for the Google AdSense robot, which helps it serve better ads on your blog. Whether you use Google AdSense on your blog or not, simply leave it as it is.

User-agent: *
By default, our blog’s label (tag) links are blocked from search crawlers, which means search engines won’t index our label page links, because of the rule below.

Disallow: /search
This means any link that has the keyword “search” right after the domain name will be ignored. For example, a label (tag) page called SEO has a URL like https://www.yourblogurl.blogspot.com/search/label/SEO, which is covered by this rule.

If we remove Disallow: /search from the code above, then crawlers can crawl and index our entire blog, all of its posts and pages. Here Allow: / refers to the homepage, which means web crawlers can crawl and index our blog’s homepage.

How To Disallow a Particular Post: Now suppose we want to exclude a specific post from indexing; we can add the following line to the code:

Disallow: /yyyy/mm/post-url.html

Here yyyy and mm denote the publishing year and month of the post respectively. For instance, if we published a post in March 2020, we would use the following format:

Disallow: /2020/03/post-url.html

To make this task easy, simply copy the post URL and remove the domain name from the beginning.
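
As a sketch, a complete custom robots.txt that blocks one such post could look like this (the post URL /2020/03/post-url.html is only a placeholder):

User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /search
# placeholder post URL
Disallow: /2020/03/post-url.html
Allow: /

Sitemap: https://www.yourblogurl.blogspot.com/sitemap.xml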

How To Disallow a Particular Page: If we need to disallow a particular page, we can use exactly the same method as above. Simply copy the page URL and remove the blog address from it, so that it looks like this:

Disallow: /p/page-url.html
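
For example, assuming a hypothetical static page named contact-us, the rule sits in the same group as the other Disallow lines:

User-agent: *
Disallow: /search
# placeholder page URL
Disallow: /p/contact-us.html
Allow: /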

Sitemap: https://www.yourblogurl.blogspot.com/feeds/posts/default?orderby=updated
This line points to the sitemap of our blog. By adding your sitemap here, you improve how your blog is crawled: when crawlers read your robots.txt file, they also find a path to your sitemap. If you include your sitemap in the robots.txt file, web crawlers can easily crawl all of your pages and posts without missing any.

How To Add Custom Robots.txt File In Blogger

Step 1: Log in to your Blogger dashboard and go to Settings. Scroll down until you find “Crawlers and indexing”. Right under Crawlers and indexing, you will see “Enable custom robots.txt”. Turn on the toggle beside it so that you can add a custom robots.txt file to your blog. See the infographic below for more information.

[Infographic: Custom Robots.txt For Blogger]

Step 2: Just below the “Enable custom robots.txt” toggle you turned on, you will see “Custom robots.txt”; go ahead and click on it. See the infographic below for more information.

[Infographic: Custom Robots.txt For Blogger]

Step 3: After you click on “Custom robots.txt”, a small box with a text field will pop up. Copy the custom robots.txt file that suits you from the options below and paste it inside the box. Don’t forget to replace https://yourblogurl.blogspot.com with your blog address or your custom domain. Save when you’re done!

Robots.txt Type 1

Use this custom robots.txt file if your sitemap is an XML sitemap that corresponds to the one submitted in Google Search Console (formerly Webmaster Tools). In other words, the sitemap you submitted in Google Search Console should be the same one included in the custom robots.txt file.

User-agent: Mediapartners-Google
Disallow: 

User-agent: *
Disallow: /search
Allow: /

Sitemap: https://www.yourblogurl.blogspot.com/sitemap.xml

Robots.txt Type 2

Use this custom robots.txt file if your sitemap is an Atom sitemap that corresponds to the one submitted in Google Search Console (formerly Webmaster Tools). In other words, the sitemap you submitted in Google Search Console should be the same one included in the custom robots.txt file. Also use this if your blog has no more than 500 posts.

User-agent: Mediapartners-Google
Disallow: 

User-agent: *
Disallow: /search
Allow: /

Sitemap: https://www.yourblogurl.blogspot.com/atom.xml?redirect=false&start-index=1&max-results=500

Robots.txt Type 3

Use this custom robots.txt file if your sitemap is an Atom sitemap that corresponds to the one submitted in Google Search Console (formerly Webmaster Tools). In other words, the sitemap you submitted in Google Search Console should be the same one included in the custom robots.txt file. Also use this if your blog has more than 500 posts, up to 1,000. If your blog has more than 1,000 posts, you can use the same file and add another sitemap line to extend beyond 1,000 posts.

User-agent: Mediapartners-Google
Disallow: 

User-agent: *
Disallow: /search
Allow: /

Sitemap: https://www.yourblogurl.blogspot.com/atom.xml?redirect=false&start-index=1&max-results=500
Sitemap: https://www.yourblogurl.blogspot.com/atom.xml?redirect=false&start-index=501&max-results=500

Don’t forget to change the https://yourblogurl.blogspot.com with your blog address or a custom domain. Save your file when you’re done!

How to Check Custom Robots.txt File For Your Blog

You can check Custom Robots.txt File on your blog by adding /robots.txt at the end of your blog URL in the web browser. For example:

http://www.yourblogurl.blogspot.com/robots.txt

Once you visit the robots.txt file URL you will see the entire code which you are using in your custom robots.txt file. See the infographic below:

[Infographic: Custom Robots.txt For Blogger]

I tried to cover as much as I could about how to add a custom robots.txt file to a Blogger (BlogSpot) blog. Adding a custom robots.txt file in Blogger helps increase your website’s organic traffic. If you enjoyed this article, share it with friends who want to add a custom robots.txt file in Blogger and improve their site traffic. If you have any doubts, feel free to ask in the comment section below!
