Ensuring Sitebulb can find all URLs

How to ensure that Sitebulb finds every page on your website while crawling.

By default, Sitebulb uses 'Crawl Website' as the main crawl source, which means that the crawler will find and follow internal links on your pages in order to discover new pages.

However, if your site's internal linking is weak, or some pages are not linked internally at all, Sitebulb may not find every URL with this default setup.

If you're running into this issue, you'll need to add extra crawl sources to help Sitebulb find all your relevant pages. The sections below explain how.

Add XML Sitemaps as a Crawl Source

Add your XML Sitemap(s) as an extra crawl source to ensure all key pages are crawled, especially if they are not linked internally.
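To preview which URLs a sitemap would contribute, the following minimal sketch reads the `<loc>` entries of a standard XML sitemap. It is not Sitebulb's own code; the `sitemap_urls` helper and the example.com URLs are hypothetical and only illustrate the file format.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text: str) -> list[str]:
    """Return the <loc> values of a <urlset> sitemap, in document order."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Hypothetical sitemap used only for illustration.
EXAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/landing-page</loc></url>
</urlset>"""

for url in sitemap_urls(EXAMPLE_SITEMAP):
    print(url)
```

A sitemap index file (one that lists other sitemaps) uses `<sitemap>` elements instead of `<url>`, so this sketch would return an empty list for it.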

Extract URLs from Google Analytics and Google Search Console

To do this, you will first need to enable and connect to Google Analytics and Google Search Console in your Audit Settings.

Once connected, enable the 'Extract and crawl new URLs' option in each of the two tabs to include URLs found through these sources in your crawl.

This is the best way to find orphaned URLs that may not be linked internally or included in your XML sitemap.
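The idea behind this can be sketched as a simple set comparison: a URL reported by analytics or search data that appears in neither the link crawl nor the sitemap is an orphan candidate. The function and the example URLs below are hypothetical and say nothing about how Sitebulb works internally.

```python
def orphan_candidates(reported_urls, linked_urls, sitemap_urls):
    """Return URLs seen in analytics/search data but in no other source."""
    already_known = set(linked_urls) | set(sitemap_urls)
    return sorted(set(reported_urls) - already_known)


# Hypothetical data used only for illustration.
linked = {"https://example.com/", "https://example.com/about"}
sitemap = {"https://example.com/", "https://example.com/pricing"}
reported = {
    "https://example.com/about",
    "https://example.com/pricing",
    "https://example.com/old-campaign",
}

print(orphan_candidates(reported, linked, sitemap))
```

Here only the old campaign page is returned, because it is the one URL that neither internal links nor the sitemap would have surfaced.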

Include Subdomains

By default, subdomain URLs are treated as external (although they will be reported as subdomains).

If relevant sections of your website are hosted on separate subdomains, you can opt to include these in your Internal reports by adjusting your subdomain settings to 'Audit and report'.
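As a rough illustration of the distinction between internal, subdomain, and external URLs, the sketch below classifies a URL by its hostname. The `classify` function is hypothetical, and treating the `www` host as part of the main site is an assumption made here for simplicity, not a documented Sitebulb rule.

```python
from urllib.parse import urlsplit


def classify(url: str, root_domain: str) -> str:
    """Label a URL relative to root_domain (for example 'example.com')."""
    host = (urlsplit(url).hostname or "").lower()
    root = root_domain.lower()
    # Assumption: the bare domain and its 'www' host count as the main site.
    if host in (root, "www." + root):
        return "internal"
    # Any other host ending in '.<root>' sits on a subdomain.
    if host.endswith("." + root):
        return "subdomain"
    return "external"


for url in (
    "https://example.com/pricing",
    "https://blog.example.com/post",
    "https://other.org/",
):
    print(url, classify(url, "example.com"))
```

With a setting such as 'Audit and report', the URLs labelled as subdomains here are the ones that would move into the Internal reports.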

Add Seed URLs

The URLs in your Seed List function as additional Start URLs. Sitebulb parses the HTML of each Seed URL and extracts its links, just as it does for the Start URL and every other page it crawls.

This is particularly useful if areas of your site are isolated (i.e., not linked internally).
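The effect of extra starting points can be shown with a toy link graph: a breadth-first traversal from the Start URL alone never reaches an isolated section, while adding one Seed URL brings the whole section in. The `discover` function and the paths below are hypothetical and are not Sitebulb's crawler.

```python
from collections import deque


def discover(link_graph, start_urls):
    """Return every page reachable from start_urls by following links."""
    seen = set(start_urls)
    queue = deque(start_urls)
    while queue:
        page = queue.popleft()
        for target in link_graph.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical site: the /campaign section is not linked from anywhere else.
LINKS = {
    "/": ["/products", "/about"],
    "/products": ["/products/a"],
    "/campaign": ["/campaign/offer", "/campaign/signup"],
}

without_seed = discover(LINKS, ["/"])
with_seed = discover(LINKS, ["/", "/campaign"])
print(sorted(with_seed - without_seed))
```

The difference between the two runs is exactly the isolated section: the seed page itself plus the pages it links to.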

These steps will help Sitebulb find and crawl all your internal URLs effectively.