What is Crawl Budget And Why Does It Matter for SEO?
Every day, Google crawls over 80 billion pages across the web, so, as you can imagine, resources can be stretched pretty thin. For this reason, every site is generally allocated a specific ‘budget’, although this can change from day to day depending on what needs to be crawled, the demand for indexing and plenty of other factors. Everything from the size of your website to its health, popularity and the links pointing to it can have an effect on how much ‘budget’ your site is given.
Despite the fact that this can determine how often Google views your website, it’s something that is often overlooked by webmasters across the online world. We’ve spoken to our SEO and development experts to get the lowdown on what crawl budget is, why it matters for SEO and how you can optimise it for a better presence online.
What Is Crawl Budget?
Crawl budget isn’t actually an official Google term. It was coined by the SEO industry as an umbrella term for a number of different systems the search engine uses when crawling your website. In simple terms, however, your website’s crawl budget indicates how many pages Googlebot will crawl and index within a specific timeframe. In numeric terms, this means that if Google were to crawl your website 1,000 times within a single month, then your crawl budget for that month would be 1,000.
As with any budget, you want to ensure you’re making the most of what Google are effectively giving you. You’ll need to make sure the most important pages are being crawled and indexed properly and that you have the ‘budget’ available for Google to crawl any new information or fresh content that you post throughout the given time period.
While your crawl budget isn’t considered an official ranking factor, the way you use the crawl budget can have an effect on the success of your SEO efforts. Google favours high-quality, informative and relevant content, with a particular liking for fresh results when they’re needed and for this reason, optimising your crawl budget is a must if you want to ensure the search engine finds and indexes the pages you’re updating and uploading. Without this, you could miss out on crucial opportunities.
What Do Google Have To Say?
Google have posted their own advice relating to ‘crawl budget’, though do warn that they “don’t have a single term that would describe everything that ‘crawl budget’ stands for externally”. Despite not directly recognising or adopting the term for themselves, Google’s Webmaster Blog posted an update on January 16th 2017 titled “What Crawl Budget Means for Googlebot”. Defining crawl budget as “the number of URLs Googlebot can and wants to crawl”, the blog gives insight into how Google views, crawls and indexes pages.
Within the blog, they state that in most cases, webmasters won’t have to worry about their crawl budget. Small sites with fewer than 1,000 pages aren’t likely to be affected heavily by limits, however bigger websites such as ecommerce businesses or information directories may need to optimise their websites to prioritise crawling. This can also be beneficial for websites that automatically generate pages (e.g. from URL parameters).
The blog also breaks down crawl budget into two main parts:
- Crawl Rate Limit
Crawl Rate Limit is Google’s way of limiting the fetching rate that any one website has. It takes into account how many parallel connections the bots can use when crawling and how long they have to wait between each fetch action. This rate is ultimately affected by crawl health, which includes how quickly a page can be crawled, and by limits set via Google Search Console by the website owner.
- Crawl Demand
Crawl Demand typically refers to the demand for Google’s bots to crawl a website. If the limit given to a website isn’t reached and there is no further demand for indexing, then crawl activity as a whole will be low. Popular URLs are usually crawled more often due to high crawl demand, and URLs that are going stale may also be re-crawled to keep them fresh in the index. Site-wide events, such as site moves or large numbers of new pages, can also increase crawl demand.
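Taken together, the two components act like a cap on each other: Google will crawl no faster than the rate limit allows, and no more than demand calls for. As a rough illustrative model only (Google does not publish a formula, and the function name and figures here are ours):

```python
def effective_crawl(crawl_rate_limit: int, crawl_demand: int) -> int:
    """Toy model: pages actually crawled per day are capped by
    whichever component is lower. Illustrative only."""
    return min(crawl_rate_limit, crawl_demand)

# A fast, healthy site (high limit) with little fresh content (low demand)
# still sees low crawl activity:
print(effective_crawl(crawl_rate_limit=5000, crawl_demand=200))   # 200

# A popular site can be held back by a low rate limit (e.g. a slow server):
print(effective_crawl(crawl_rate_limit=300, crawl_demand=4000))   # 300
```

This is why fixing only one side of the equation often isn’t enough: a quicker server won’t help if nothing on the site warrants re-crawling, and fresh content won’t help if the server can’t serve it quickly.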
How Can I Optimise My Crawl Budget?
If you do have a large website, post content regularly or simply want to make the most of the crawl budget you have, optimisation is the way to do it. Depending on the current state of your site, this could be a case of a few quick fixes, or a longer, more difficult process. Generally, however, Google lists the following factors as most likely to be affecting your crawl budget:
- Faceted navigation and session identifiers
- On-site duplicate content
- Soft error pages
- Hacked pages
- Infinite spaces and proxies
- Low quality and spam content
If you’re looking to optimise your crawl budget, there are a number of things you can do not only for a better crawl rate, but to improve the overall health of your website.
Carry Out Regular Website Maintenance
Regular website maintenance should be a standard part of your everyday management, but it can also play a huge part in ensuring proper crawling and indexing of your website. Generally speaking, all of your pages should return one of two codes – 200 (OK) or 301 (permanent redirect). Any other codes, particularly 4XX and 5XX codes, need to be dealt with to make proper use of your crawl budget. Google Search Console can be beneficial for finding these errors, though in some cases, you may need to work out the source of the error yourself.
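As a minimal sketch of that triage, the helper below sorts status codes into the categories described above (the function name and categories are ours, for illustration — a real audit would pull live codes from your crawler or server logs):

```python
def triage_status(code: int) -> str:
    """Classify an HTTP status code for a crawl-budget audit:
    200 and permanent redirects are fine; anything else needs attention.
    Illustrative helper only."""
    if code == 200:
        return "ok"
    if code in (301, 308):          # permanent redirects
        return "redirect"
    if 400 <= code < 500:
        return "client error - fix or remove the URL"
    if 500 <= code < 600:
        return "server error - investigate the server"
    return "review manually"

for url, code in [("/", 200), ("/old-page", 301), ("/typo", 404), ("/api", 503)]:
    print(url, "->", triage_status(code))
```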
Block Parts Of The Site
If you have parts of your website that don’t need to be indexed on Google, you need to implement the correct blocks to prevent crawl budget being used. This can usually be done with the right robots.txt rules. Ecommerce sites with filtered navigation, or information directories with endless category pages, could benefit from these blocks, but be careful with what you do and don’t block – you don’t want to block any important functions or pages.
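As a hedged illustration, an ecommerce site might stop crawlers from spending budget on filtered and session URLs with rules like these (the paths and parameters are placeholders – adapt them to your own URL structure):

```
User-agent: *
Disallow: /search
Disallow: /*?filter=
Disallow: /*?sessionid=

Sitemap: https://www.example.com/sitemap.xml
```

Bear in mind that robots.txt blocks crawling, not indexing – a page that must not appear in results but should remain crawlable is better handled with a noindex directive.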
Reduce Redirect Chains
When Googlebot crawls your website, it sets aside any redirecting URLs to be crawled later, and may not do this immediately. If a redirect chain is more than one hop long, you’ll be wasting your crawl budget on ‘dead’ URLs that simply redirect on to another. Shorten redirect chains as much as possible, or remove them completely where you can, to ensure you’re making the most of your crawl budget.
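One practical fix is to point every old URL straight at its final destination. The sketch below flattens a mapping of redirects so no URL is ever more than one hop from its target (the data and function name are invented for illustration):

```python
def flatten_redirects(redirects: dict[str, str]) -> dict[str, str]:
    """Given {old_url: redirect_target}, point every old URL straight
    at its final destination so a crawler never follows more than one
    hop. Illustrative sketch; guards against redirect loops."""
    flattened = {}
    for start in redirects:
        seen = {start}
        target = redirects[start]
        # Follow the chain until we reach a URL that no longer redirects.
        while target in redirects and target not in seen:
            seen.add(target)
            target = redirects[target]
        flattened[start] = target
    return flattened

chains = {"/a": "/b", "/b": "/c", "/old": "/new"}
print(flatten_redirects(chains))  # {'/a': '/c', '/b': '/c', '/old': '/new'}
```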
Build A Strong Backlink Profile
A strong backlink profile is valuable not only for your crawl budget, but for helping your site to rank more highly in general. The trick, however, is to do this naturally. The more links to your website, the more authoritative and trustworthy Google views your website as being. For crawl budget, links to your website can be another way to encourage Google to crawl more often, particularly if it deems your content as being popular.
Improve Site Speed
The longer Googlebot takes to crawl a page, the lower your crawl budget can be. Scanning a page takes time and the slower it loads, the more time it takes – even Google themselves have stated that “making a site faster improves the users’ experience while also increasing crawl rate.” The faster the pages load, the more pages Google can scan in the time it allocates to your website each day. Improve your site speed to improve your crawl budget.
Use Internal Linking Wisely
While backlinks are undoubtedly valuable when they’re natural, internal linking is just as valuable when it comes to showing Google what you consider to be your most important pages. If your homepage links to a particular service page, for example, Googlebot is likely to crawl that service page as a priority over older blogs that may not be linked to from any of your core pages. By carefully structuring your internal links, you can effectively guide the bots through your website to the pages you want indexed.
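One way to audit this is to measure how many clicks each page sits from the homepage – pages buried many levels deep tend to be crawled less often. A breadth-first search over your internal link graph gives you that depth (the link map below is made up for illustration):

```python
from collections import deque

def crawl_depth(links: dict[str, list[str]], home: str = "/") -> dict[str, int]:
    """Breadth-first search from the homepage over an internal link
    graph, returning each page's click depth. Sketch with an
    invented link map."""
    depth = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for linked in links.get(page, []):
            if linked not in depth:
                depth[linked] = depth[page] + 1
                queue.append(linked)
    return depth

site = {
    "/": ["/services", "/blog"],
    "/services": ["/services/seo"],
    "/blog": ["/blog/old-post"],
}
print(crawl_depth(site))
# {'/': 0, '/services': 1, '/blog': 1, '/services/seo': 2, '/blog/old-post': 2}
```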
Avoid Or Link Any Orphan Pages
If you have any pages within your website that are considered ‘orphans’, it’s time to link to them somehow. Whether it’s through a sitemap, your homepage, the footer or the header, orphan pages that don’t have any internal or external links pointing to them need to be linked to, or deleted altogether. Google’s bots often struggle to find these orphan pages, so to make the most of your budget and prevent the crawlers from spending too much time trying to find them, a simple link can optimise your budget considerably.
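Finding orphans is essentially set subtraction: pages you know exist (from your sitemap or CMS) minus pages that something links to. A minimal sketch, with made-up data:

```python
def find_orphans(all_pages: set[str], link_graph: dict[str, list[str]]) -> set[str]:
    """Orphan pages = pages known to exist (e.g. from your sitemap)
    that no other page links to. Illustrative helper; the homepage
    is excluded as it needs no inbound link."""
    linked = {target for targets in link_graph.values() for target in targets}
    return all_pages - linked - {"/"}

pages = {"/", "/about", "/blog/post-1", "/blog/forgotten-post"}
links = {"/": ["/about", "/blog/post-1"]}
print(find_orphans(pages, links))  # {'/blog/forgotten-post'}
```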
Manage Any Duplicate Content
Every time Googlebot crawls a duplicate page, it’s wasting one ‘unit’ of your crawl budget. Google themselves even list duplicate content as a major issue when it comes to poor management of crawl budget, and rightly so. Each time Google crawls or indexes a duplicate page, it’s wasting its own resources, as well as yours. Take the time to make sure that each page on your website is unique and that any and all duplicate content or pages are deleted or reworked.
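A quick way to surface exact duplicates is to hash each page’s body text and group URLs that share a digest. This sketch only catches byte-for-byte copies – a real audit would normalise whitespace and strip boilerplate first – and the page data is invented:

```python
import hashlib

def find_duplicates(pages: dict[str, str]) -> dict[str, list[str]]:
    """Group URLs whose body text is identical, keyed by an MD5
    digest of the content. Minimal sketch for illustration."""
    groups: dict[str, list[str]] = {}
    for url, body in pages.items():
        digest = hashlib.md5(body.encode("utf-8")).hexdigest()
        groups.setdefault(digest, []).append(url)
    # Keep only digests shared by more than one URL.
    return {d: urls for d, urls in groups.items() if len(urls) > 1}

pages = {
    "/red-shoes": "Buy red shoes today.",
    "/shoes-red": "Buy red shoes today.",
    "/blue-shoes": "Buy blue shoes today.",
}
print(find_duplicates(pages))  # one group containing both duplicate URLs
```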
How Can I Keep Track Of My Crawl Budget?
While you can’t keep a direct eye on your crawl budget, so to speak, you can use the Crawl Stats report in Google Search Console to see how many pages Googlebot is crawling each day. From here, you can work out an average crawl budget by taking the average number of pages crawled per day and multiplying it by 30 for a monthly volume. There are cases where Google won’t be maxing out your crawl budget, so these numbers may be lower than the allocated budget, but this typically only occurs on websites that don’t have high crawl demand.
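The arithmetic above can be sketched in a few lines (the daily totals here are made up – substitute the figures from your own Crawl Stats chart):

```python
def estimated_monthly_crawl(pages_crawled_per_day: list[int]) -> int:
    """Average the daily totals from the Crawl Stats report, then
    multiply by 30, as described above. Figures are illustrative."""
    average_per_day = sum(pages_crawled_per_day) / len(pages_crawled_per_day)
    return round(average_per_day * 30)

# e.g. a week of daily totals copied from the Crawl Stats chart:
week = [120, 95, 110, 130, 105, 90, 100]
print(estimated_monthly_crawl(week))  # 3214
```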
The Crawl Stats report on Search Console is invaluable, however, for keeping track of what has been crawled and whether changes need to be made. First, however, you need to understand the data being given to you. If your chart is relatively even with an upward trend as your site increases in size, this is considered a good result. Sudden dips or spikes, however, are cause for attention.
A sudden crawl rate drop, according to Google Support, could be a result of the following:
- Broken or unsupported HTML, or content that Google cannot analyse or view
- Slow responses to requests, causing Googlebot to throttle back its requests to prevent overload
- An increase in server error rates, making Googlebot throttle back to avoid overloading the server
- A reduced maximum crawl rate set by you in Search Console
- Your website being considered low quality or rarely updated
Similarly, a sudden spike in the crawl rate of your site could mean that the website is being overwhelmed. You should do the following:
- Check that Google is accessing your site, and not another requester.
- In extreme cases, return 503 HTTP result codes to Googlebot’s requests to urgently block it.
- Amend your robots.txt file to block pages.
- Set a lower maximum crawl rate in Search Console while you find the source of the issue.
- Block ‘infinite’ pages or features with robots.txt to prevent Googlebot from trying to crawl it all.
- Make sure the correct error response codes are implemented for redirects or deleted pages.
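The ‘emergency brake’ in that list – returning 503 to every request – can be as small as the WSGI application below (the function name is ours, and this is a sketch of one way to do it, not a production setup; you could serve it via Python’s built-in wsgiref server while you investigate):

```python
def maintenance_app(environ, start_response):
    """Minimal WSGI application that answers every request with
    503 Service Unavailable. The Retry-After header suggests when
    crawlers should come back. Illustrative sketch only."""
    start_response(
        "503 Service Unavailable",
        [("Content-Type", "text/plain"), ("Retry-After", "3600")],
    )
    return [b"Temporarily unavailable - please retry later."]
```

Don’t leave a site-wide 503 in place for long: it’s a temporary signal, and extended use can cause pages to drop out of the index.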
Once you’ve got an understanding of the Crawl Stats report, you can start to track when things may not be going quite right. It’s also beneficial to see how Google is responding if you update your website, move it to a new URL or add new content. If you want to check this data against the source, you can head to your server logs to see how often crawlers are visiting your site and compare the two sets of data.
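As a rough sketch of that log check, the helper below counts Googlebot requests per day in combined-format access log lines. The parsing is deliberately naive and the sample lines are invented – adapt it to your own log format, and remember that anyone can spoof a Googlebot user agent, so verify suspicious traffic with a reverse DNS lookup before trusting the numbers:

```python
from collections import Counter

def googlebot_hits_per_day(log_lines: list[str]) -> Counter:
    """Count lines whose user agent mentions Googlebot, grouped by
    date. Naive combined-log-format parsing, for illustration."""
    hits: Counter = Counter()
    for line in log_lines:
        if "Googlebot" in line:
            # The date sits between '[' and the first ':' in combined
            # log format, e.g. [16/Jan/2017:10:15:32 +0000]
            date = line.split("[", 1)[1].split(":", 1)[0]
            hits[date] += 1
    return hits

sample = [
    '66.249.66.1 - - [16/Jan/2017:10:15:32 +0000] "GET / HTTP/1.1" 200 5120 "-" "Googlebot/2.1"',
    '66.249.66.1 - - [16/Jan/2017:11:02:08 +0000] "GET /blog HTTP/1.1" 200 7010 "-" "Googlebot/2.1"',
    '203.0.113.5 - - [16/Jan/2017:11:05:00 +0000] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
]
print(googlebot_hits_per_day(sample))  # Counter({'16/Jan/2017': 2})
```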
Crawl budget, while not considered an official Google term, is still important to take into account when it comes to optimising your website. From fixing any errors to prevent wasted resources, to improving your linking structure to ensure Google is crawling the right pages, optimising the budget you do have can be relatively simple – you just need to work out what is going wrong.
For help with exactly that, feel free to get in touch with a member of our team for more information, or to get a full site audit from our experts. Contact us on 0800 088 6000 today.