Google Is Telling Webmasters To Remove Noindex From Robots.txt

As of 1st September 2019, Google will stop supporting unofficial directives within robots.txt files, including those relating to indexing. ‘Noindex’ directives, in particular, have been targeted, with the search giant recently announcing that any such directives should be removed from robots.txt and implemented through supported methods instead.
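
For context, the kind of unsupported rule Google is referring to looks something like the sketch below; the path is purely illustrative and would need to be replaced with the real directories on an actual site:

    User-agent: *
    Noindex: /example-private-section/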

This move comes after Google posted to their Webmaster Central Blog formalising the Robots Exclusion Protocol specification, proposing the REP as an official internet standard for developers and webmasters across the web. They stated that “developers have interpreted the protocol somewhat differently over the years,” going on to describe how the REP hasn’t been updated to cater for modern internet requirements either. For this reason, Google has worked with webmasters, search engines and the original author of the REP to update the rules and clarify a number of different points. The proposal has since been submitted to the IETF (Internet Engineering Task Force) and appears to be awaiting review.


Drawing on its experience with robots.txt, Google has used the proposed standard to clarify a few previously uncertain points, particularly relating to indexing. The rules are generally the same, but Google states that the new clarification “defines essentially all undefined scenarios for robots.txt parsing and matching, and extends it for the modern web.”


What Are The New Robots Exclusion Protocol Specification Points?

With the above in mind, Google has listed the following as the most notable points from the proposed new standard:

  1. Any URL-based transfer protocol, not just HTTP, can use robots.txt, including FTP or CoAP
  2. There will be a new maximum file size, requiring developers to parse at least the first 500 kibibytes of a robots.txt file. This alleviates strain on servers and ensures connections aren’t kept open for extended periods of time.
  3. Website owners will gain better flexibility to update robots.txt via a new maximum caching time of 24 hours (or the cache directive value, where one is set; see the illustrative header after this list). This ensures that crawlers aren’t overloading websites with robots.txt requests.
  4. In cases where the robots.txt file is no longer available, Google or any other crawl bot will not crawl previously known disallowed pages for a reasonable period of time.
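
To illustrate the caching point above, a server can already hint at how long its robots.txt may be cached using a standard Cache-Control header; the one-hour value and the disallowed path below are purely illustrative:

    HTTP/1.1 200 OK
    Content-Type: text/plain
    Cache-Control: max-age=3600

    User-agent: *
    Disallow: /example-admin/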


How Should I Stop Google From Crawling My Site?

Following Google’s notification to webmasters that they should stop using robots.txt to control indexing, you’d be forgiven for wondering how you should block indexing instead. Thankfully, there are a number of ways you can do so correctly in advance of the September 1st deadline. Google stated that the usage of rules relating to crawl delay, nofollow and noindex was “contradicted by other rules in all but 0.001% of all robots.txt files on the internet”, before stating that these mistakes were actually harming the websites and their visibility.

Their suggested alternative options include:

  • Add noindex to your robots meta tags. This is supported both in HTML and in HTTP response headers, as shown in the snippet after this list.
  • 404 and 410 HTTP status codes can be used to drop the URLs from the index once these codes have been processed.
  • Password protect account-related, paywall or subscription-related content using a log-in page. This will usually remove these URLs from Google’s index.
  • Complete a disallow in robots.txt. You can still use robots.txt to block crawling of a URL with a disallow rule; a page that can’t be crawled usually won’t have its content indexed, although the URL itself may still appear if it is linked from elsewhere.
  • Remove the URL using the Search Console removal tool. This is quick, simple, and reduces the risk of webmaster error.
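
As a minimal sketch of the first option, the same noindex instruction can be given either in a page’s HTML head or as an HTTP response header; the exact header set-up depends on your server, so treat the lines below as illustrative only:

    <!-- In the page's <head> -->
    <meta name="robots" content="noindex">

    # Or sent as an HTTP response header
    X-Robots-Tag: noindex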


It is uncertain whether Google will provide more direct clarification ahead of the IETF publishing the new standard; however, the information given thus far has provided webmasters with a starting point for improving their robots.txt files.

For help with your files or for organising your website with SEO and CRO in mind, get in touch with an expert member of our team today.

To find out more about the digital marketing services available at Absolute Digital, get in touch on 0800 088 6000.
