Ensuring that your key pages are indexable is essential to their visibility in search results. Auditing the indexability and crawlability of your pages is typically the first step in your Technical SEO auditing workflow.
Sitebulb provides you with all the relevant data to efficiently audit indexability and crawlability on your website. Follow the article below for a step-by-step guide on finding key insights and opportunities for improvement with Sitebulb.
Finding key Indexability and Crawlability data
Analyzing Indexable, Noindex, and Nofollow pages
Navigate to the Indexability report to find key metrics about the indexability status of all your pages. From the report overview, you can jump directly into your lists of Indexable, Not Indexable, Nofollow, and Disallowed URLs.
Review these lists to ensure that your key pages are indexable, and that noindex and nofollow directives are used intentionally and appropriately.
Alternatively, you can find this data by navigating to the URL Explorer, where all URL data collected by Sitebulb can be found in bulk.
Within the URL Explorer, you can use the top menu to jump into pre-filtered URL Lists of the relevant indexability data by navigating to Indexability > Directives.
Here you will find lists of your Indexable, Not Indexable, Noindex, Noindex Nofollow, Nofollow, and Disallowed pages. As you jump into each data point, you’ll see the data columns in the URL Lists change to provide you with the most relevant data. Remember that you can customize your URL Lists by using the Adv. Filter and Add/Remove Columns options.
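For reference, here is a minimal sketch of how these directives appear in a page’s HTML (the values shown are illustrative, not taken from your audit):

    <!-- Default behavior: the page can be indexed and its links followed -->
    <meta name="robots" content="index, follow">

    <!-- Keep the page out of the index, but allow crawlers to follow its links -->
    <meta name="robots" content="noindex, follow">

    <!-- Keep the page out of the index and do not follow its links -->
    <meta name="robots" content="noindex, nofollow">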
Auditing Canonicalized Pages
Canonicalization data is also part of your Indexability report. Scroll down to the Canonicals graph for a breakdown of how canonicalization is implemented across your site.
Check that key indexable pages are canonicalized to self, and that other canonical directives are correct. Review any pages missing canonical tags.
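As a reference point, a correctly self-referencing canonical on an indexable page looks like the following sketch (the URL is a hypothetical example):

    <!-- In the <head> of https://www.example.com/blue-widgets/ -->
    <link rel="canonical" href="https://www.example.com/blue-widgets/">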
Alternatively, you can find this data by navigating to the URL Explorer, under Indexability > Canonicals.
Here, you will find all your internal pages filtered into URL Lists based on their canonical status: self-referencing, canonicalized to an internal URL, canonicalized to an external URL, or missing.
You will also find lists here to help you identify canonicalized pages that do not point to valid URLs, such as pages canonicalized to malformed, noindex, not found, error, or disallowed URLs.
Auditing the Validity of Canonicals
To review the validity of your canonicals, navigate to the Indexability Hints list. Sitebulb will flag common issues here, like canonicals pointing to noindex URLs, broken URLs, and so on.
Relevant Indexability Hints include:
You can also find this data by navigating to the URL Explorer and using the top menu to find canonicalization metrics under Indexability > Canonicals.
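To illustrate the kind of conflict these hints surface, consider this hypothetical sketch, where a canonical points at a URL that is itself noindexed, a contradictory signal that search engines will typically ignore:

    <!-- In the <head> of https://www.example.com/page-a/ -->
    <link rel="canonical" href="https://www.example.com/page-b/">

    <!-- Meanwhile, the <head> of https://www.example.com/page-b/ contains: -->
    <meta name="robots" content="noindex">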
Auditing the Indexability status of Duplicate Pages
Sitebulb has a dedicated Duplicate Content Report to help you identify pages with duplicate and similar content.
To find and analyze the indexability status of your duplicate pages, navigate to the Duplicate Content report, and view the Duplicate Content URLs list:
Once you have identified pages with duplicate content, analyze the list of URLs and their respective Indexability status to find your points of action.
Other canonical checks
Any pages with similar content that you do not want indexed, such as parametrized URLs and duplicate URLs, should be canonicalized.
Within Sitebulb, you can identify these pages and analyze their canonicalization status.
Auditing the Indexability of Parametrized and Paginated URLs
Find parametrized URLs manually by filtering your Internal URLs list for relevant parameters using the Adv. Filters feature.
Then check the Indexability column to establish whether these URLs are indexable, correctly canonicalized, or marked as noindex.
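For example, a correctly handled parametrized URL will typically carry a canonical pointing to its clean equivalent, as in this hypothetical sketch:

    <!-- In the <head> of https://www.example.com/shoes?color=red&sort=price -->
    <link rel="canonical" href="https://www.example.com/shoes">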
Alternatively, look for Sitebulb’s Internal Hints flagging parametrized URLs to find the relevant pages:
Auditing less common Indexing and Serving Rules
Sitebulb also detects a set of other, more specific indexing and serving rules, which you will find tabulated further down the Indexability overview, within the Indexing and serving rules data table:
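Note that indexing and serving rules can also be delivered as HTTP response headers rather than meta tags. As a hypothetical illustration, a response carrying X-Robots-Tag directives might look like this:

    HTTP/1.1 200 OK
    Content-Type: text/html; charset=UTF-8
    X-Robots-Tag: noarchive
    X-Robots-Tag: unavailable_after: 2026-01-01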
Verify the validity of meta robots implementation
Next, you will want to check that the meta robots directives on your pages are correctly implemented, to ensure that they can be successfully picked up by search engines and LLM bots.
First, that means ensuring that meta robots directives are correctly contained in the <head> section of your pages. You do not need to check this manually, as Sitebulb will automatically flag this issue and affected pages with the following Indexability Hint:
Sitebulb will also flag wider issues affecting the <head> sections of your pages, as a broken head will mean that your meta robots directives may not be picked up correctly. The relevant Indexability Hints include:
Finally, look out for hints indicating that your meta robots directives may be mismatched, and therefore invalid:
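Taken together, a valid implementation keeps a single, consistent meta robots directive inside a well-formed <head>, as in this minimal sketch:

    <!DOCTYPE html>
    <html>
      <head>
        <meta charset="utf-8">
        <title>Example page</title>
        <!-- The directive sits inside <head>, before anything that could break the head -->
        <meta name="robots" content="noindex">
      </head>
      <body>
        <!-- Page content -->
      </body>
    </html>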
Reviewing Robots.txt content
Your robots.txt file is essential in controlling which areas of your website are accessible to search engines and other crawlers, like LLMs.
In your Indexability overview, you’ll find the Robots.txt Configuration table, which allows you to verify that all major search engines are able to crawl the root directory.
By default, Sitebulb reports on the content of your robots.txt file as part of your Indexability report. Use it to check that relevant search engine user agents are not disallowed from crawling the site, and that essential content, scripts, and stylesheets are not disallowed.
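As a hedged illustration (the paths are hypothetical), a healthy robots.txt along these lines keeps a private area off-limits without blocking crawlers from the rest of the site or from rendering resources:

    User-agent: *
    Disallow: /admin/

    # Patterns to watch out for:
    # Disallow: /           blocks the entire site
    # Disallow: /assets/    may block CSS and JS needed for rendering

    Sitemap: https://www.example.com/sitemap.xml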
Analyze pages and resources blocked by robots.txt
By default, Sitebulb will respect all robots directives, unless otherwise specified in your Robots Directives Audit Settings.
If you wish to analyze how robots directives impact the crawlability of your pages, you can choose to save disallowed URLs to your Audit by ticking ‘Save disallowed URLs’ under the Politeness tab within the Robots Directives settings:
If you have enabled the ‘Save disallowed URLs’ setting, you will find your disallowed URLs listed as a key metric in your Indexability Report:
View the disallowed URLs list to analyze your pages, and look for the ‘Robots.txt Disallowed Directives’ data column to understand why they were disallowed:
Where resources are disallowed by robots.txt, Sitebulb will also flag these issues under the following Indexability Hints:
The issue will also be flagged up as part of your Indexability report, as an error message:
Review XML Sitemaps content
A complete, up-to-date XML Sitemap ensures that all your key pages can be found and crawled by search engines, LLMs, and other relevant crawlers, no matter the state of your internal linking.
To crawl and analyze the content of your XML Sitemaps with Sitebulb, you will first need to ensure that XML Sitemaps are selected as a Crawl Source in your Audit Settings. By default, Sitebulb will only crawl the pages found through internal links, so make sure you enable this crawl source before starting your Audit.
The resulting audit will contain a dedicated XML Sitemaps report, allowing you to analyze the content of your XML Sitemaps.
You can cross-reference this information with your crawl data by reviewing the ‘Sitemap URLs’, ‘Only in Sitemaps’, and ‘Not in Sitemaps’ metrics to ensure that key pages are contained in your XML Sitemaps.
To review whether your Sitemaps include any non-indexable, disallowed, canonicalized, or broken URLs, scroll to the XML Sitemaps table for a breakdown of the status of all URLs found in each Sitemap.
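For reference, each Sitemap entry should point to an indexable, self-canonical URL that returns a 200 status, as in this minimal sketch (the URL and date are hypothetical):

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>https://www.example.com/key-page/</loc>
        <lastmod>2025-01-15</lastmod>
      </url>
    </urlset>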
Review Meta Robots and On-Page elements modified by JS
Ensuring that meta robots directives and key on-page elements are part of the response HTML is essential to the efficient and correct indexing of your pages.
When auditing with the Chrome Crawler, Sitebulb’s Response vs. Render report identifies changes to your pages made by JavaScript, focusing on the key elements: Meta robots, Canonical, Title, Meta Description, Internal Links, and External Links.
The data is presented in 6 pie charts, which break down the distribution of pages where each of these elements is unchanged, created, modified, duplicated, or deleted by JavaScript:
Click on any of the segments to jump into a list of the corresponding URLs and analyze the differences between the response and rendered versions of the elements.
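To make the distinction concrete, here is a hypothetical snippet that would land a page in the ‘created’ segment of the Meta Robots chart, because JavaScript injects a directive that is absent from the server response:

    <script>
      // Runs at render time only: the noindex directive exists in the
      // rendered HTML but not in the raw response HTML.
      var meta = document.createElement('meta');
      meta.name = 'robots';
      meta.content = 'noindex';
      document.head.appendChild(meta);
    </script>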
Ensure page URLs follow best practice
To ensure that your URLs follow best practice recommendations, review the hints in your Internal report. Sitebulb will flag page URLs containing whitespace, uppercase characters, non-ASCII characters, double slashes, repetitive elements, or tracking and ID parameters.
Look out for the following Internal Hints:
If any of your pages have triggered these hints, they will appear in your Internal Hints list. Click View URLs to view and analyze the relevant pages.
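As a quick reference, here is a hypothetical before-and-after showing the URL patterns these hints target:

    Flagged:  https://www.example.com//Shop/Red%20Shoes?sessionID=8f3a
    Cleaner:  https://www.example.com/shop/red-shoes

The flagged URL combines a double slash, uppercase characters, encoded whitespace, and an ID parameter; the cleaner version resolves all four.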
Next Steps
This article works in conjunction with Sitebulb's Technical SEO Auditing template. Continue your technical SEO auditing journey by following the step-by-step articles below.
Auditing Structured Data
Auditing Security Issues
