Auditing Indexability & Crawlability with Sitebulb

Follow the steps in this document to audit your Indexability and Crawlability with Sitebulb.

Updated over 2 weeks ago

Ensuring that your key pages are indexable is essential to their visibility in search results. Auditing the indexability and crawlability of your pages is likely the first step in your Technical SEO auditing workflow.

Sitebulb provides you with all the relevant data to efficiently audit indexability and crawlability on your website. Follow the article below for a step-by-step guide on finding key insights and opportunities for improvement with Sitebulb.

You will find key Indexability and Crawlability data points in the following Audit areas:

Review Indexable, Noindex, and Nofollow pages

Navigate to the Indexability report to find key metrics about the indexability status of all your pages. From the report overview, you can jump directly into your lists of Indexable, Not Indexable, Nofollow, and Disallowed URLs.

Review these lists to ensure that your key pages are indexable and that noindex and nofollow directives are used intentionally and appropriately.

Alternatively, you can find this data by navigating to the URL Explorer, where all URL data collected by Sitebulb can be found in bulk.

Within the URL Explorer, you can use the top menu to jump into pre-filtered URL lists of the relevant indexability data by navigating to Indexability > Directives.

Here you will find lists of your Indexable, Not Indexable, Noindex, Noindex Nofollow, Nofollow, and Disallowed pages.

Canonicalisation

Canonicalisation data is also part of your Indexability report. Scroll down to the Canonicals graph for a breakdown of how canonicalisation is implemented across your site.

Check that key indexable pages are canonicalised to self, and that all other canonical directives are correct. Review any pages missing canonical tags.
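For reference, a self-referencing canonical is a link element in the page's <head> that points back to the page's own URL. The URL below is purely illustrative:

```html
<!-- In the <head> of https://example.com/widgets/ -->
<link rel="canonical" href="https://example.com/widgets/" />
```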

Alternatively, you can find this data by navigating to the URL Explorer, under Indexability > Canonicals.

Here, you will find all your internal pages filtered down in URL lists, based on their canonical status: self-referencing, to internal URL, to external URL, or missing.

You will also find lists here to help you identify any canonicalised pages that do not point to valid URLs, such as those canonicalised to malformed, noindex, not found, error, or disallowed URLs.

Other canonical checks

It is important that any pages with similar content that you do not want indexed are canonicalised, such as Parametrised URLs and Duplicate URLs.
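As an illustrative example, a parametrised variant of a page would typically point its canonical at the clean version of the URL, so that search engines consolidate signals on the page you want indexed (URLs below are hypothetical):

```html
<!-- In the <head> of https://example.com/widgets/?sort=price -->
<link rel="canonical" href="https://example.com/widgets/" />
```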

Within Sitebulb, you can identify these pages and analyse their canonicalisation status.

Parametrised URLs

Find parametrised URLs manually by filtering your Internal URLs list for relevant parameters using the Adv. Filters feature.

Then check the Indexability column to establish whether these URLs are indexable or correctly canonicalised.

Alternatively, look for Sitebulb’s Internal Hints that flag parametrised URLs.

Duplicate URLs

Sitebulb has a dedicated Duplicate Content Report to help you identify pages with duplicate and similar content.

To find and analyse the indexability status of your duplicate pages, navigate to the Duplicate Content report and view the Duplicate Content URLs list.

Once you have identified pages with duplicate content, analyse the list of URLs and their respective Indexability status to find your points of action.

Other Indexing and Serving Rules

Sitebulb also detects a set of other, more specific indexing and serving rules, which you will find tabulated further down the Indexability Overview, within the Indexability and serving rules data table.

Correct implementation of Meta Robots Tags

Next, you will want to check that the meta robots directives on your pages are correctly implemented, to ensure that they can be successfully picked up by search engines and LLM bots.

First, that means ensuring that meta robots directives are correctly placed in the <head> section of your pages. You do not need to check this manually, as Sitebulb will automatically flag this issue and the affected pages with a dedicated Indexability Hint.
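For reference, a correctly placed meta robots directive sits inside a valid <head>, alongside elements like the title. The markup below is illustrative:

```html
<!DOCTYPE html>
<html lang="en">
  <head>
    <!-- Directives here are only reliable while the <head> remains valid -->
    <meta name="robots" content="noindex, nofollow" />
    <title>Example page</title>
  </head>
  <body>
    <p>Page content</p>
  </body>
</html>
```

If an invalid element (such as a stray <div>) appears before the directive, browsers and crawlers may close the <head> early, leaving the directive in the <body>, where it can be ignored.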

Sitebulb will also flag wider issues affecting the <head> sections of your pages, as a broken head means that your meta robots directives may not be picked up correctly. Look out for the relevant Indexability Hints in your report.

Robots.txt rules

Your robots.txt file is essential in controlling which areas of your website are accessible to search engines and other crawlers, such as LLM bots.

By default, Sitebulb reports on the content of your robots.txt file as part of your Indexability report. Use it to check that relevant search engine user agents are not disallowed from crawling the site, and that essential content, scripts, and stylesheets are accessible and not disallowed.
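As a point of reference, a minimal robots.txt following these principles might look like this. The paths and sitemap URL are purely illustrative and should be adjusted to your own site:

```text
# Illustrative robots.txt: adjust paths to your own site
User-agent: *
# Block low-value areas from crawling
Disallow: /cart/
Disallow: /search/
# Keep scripts and stylesheets crawlable so pages can render
Allow: /assets/

Sitemap: https://example.com/sitemap.xml
```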

Where resources are disallowed by robots.txt, Sitebulb will flag these issues with dedicated Hints.

Disallowed Pages

By default, Sitebulb will respect all robots directives, unless otherwise specified in your Robots Directives Audit Settings.

If you wish to analyse how robots directives impact the crawlability of your pages, you can choose to save disallowed URLs to your Audit by ticking the ‘Save disallowed URLs and make them available in reports’ setting under the Politeness tab within the Robots Directives settings.

If you have enabled the ‘Save disallowed URLs’ setting, you will find your disallowed URLs listed as a key metric in your Indexability report.

View the disallowed URLs list to analyse your pages, and check the Disallowed directives data column to understand why they were disallowed.

XML Sitemap

A complete, up-to-date XML Sitemap ensures that all your key pages can be found and crawled by search engines, LLMs, and other relevant crawlers, no matter the state of your internal linking.
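For reference, a minimal XML Sitemap listing a single page follows the sitemaps.org protocol. The URL and date below are illustrative:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/widgets/</loc>
    <lastmod>2024-01-15</lastmod>
  </url>
</urlset>
```

Ideally, only indexable, canonical, working URLs should be listed here, which is exactly what cross-referencing your sitemap with crawl data helps you verify.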

To crawl and analyse the content of your XML Sitemaps with Sitebulb, you will first need to ensure that XML Sitemaps are selected as a Crawl Source in your Audit Settings. By default, Sitebulb will only crawl pages found through internal links, so make sure you enable this crawl source before starting your Audit.

The resulting audit will contain a dedicated XML Sitemaps report, allowing you to analyse the content of your XML Sitemaps. You can cross-reference this information with your crawl data to ensure that key pages are included in your XML Sitemaps, and to review any noindexed, disallowed, canonicalised, or broken URLs they contain.

Next Steps

This article works in conjunction with Sitebulb's Technical SEO Auditing template. Continue your technical SEO auditing journey by following the step-by-step articles below.

  • Auditing Internal Linking

  • Auditing On Page Elements

  • Auditing Performance & Mobile Friendly

  • Auditing Structured Data

  • Auditing Internationalization

  • Auditing Security Issues

Auditing Indexability & Crawlability - Video Guidance
