How to crawl a URL List

Set Sitebulb to crawl in list mode to analyse a list of URLs


By default, Sitebulb is set to 'crawl' a website from the start URL you supply - which means finding all the links on the page and following these in turn. However, in some circumstances, you may wish to only crawl a specific set of URLs on a website without following the links.

In such cases, you would need to set a URL List as your crawl source.

Start a new Audit

To crawl a URL list, start a new audit in your desired Project, or set up a new Project, and set the start URL to match the domain of the URLs in your list.

Keep in mind that Sitebulb will only crawl URLs that match the subdomain of the start URL provided (so you can't just upload a massive list of URLs from lots of different sites).
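Before uploading, it can help to check which URLs in your list share a host with the start URL. The following is a minimal sketch, not part of Sitebulb: the function name `split_by_host` is hypothetical, and it assumes that "matching the subdomain" means an exact hostname match, which may be stricter or looser than the rule Sitebulb actually applies.

```python
from urllib.parse import urlsplit


def split_by_host(start_url, urls):
    """Split urls into (matching, other) by comparing hostnames with start_url.

    Assumption: a URL "matches" when its hostname is identical to the
    start URL's hostname (urlsplit lowercases hostnames for us).
    """
    start_host = urlsplit(start_url).hostname or ""
    matching, other = [], []
    for raw in urls:
        url = raw.strip()
        if not url:
            continue  # ignore blank lines
        host = urlsplit(url).hostname or ""
        if host == start_host:
            matching.append(url)
        else:
            other.append(url)
    return matching, other


if __name__ == "__main__":
    keep, skip = split_by_host(
        "https://www.example.com/",
        ["https://www.example.com/a", "https://blog.example.com/b"],
    )
    print("same host:", keep)
    print("different host:", skip)
```

URLs that end up in the second list are the ones most likely to be ignored, so they may belong in a separate audit with its own start URL.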

Navigate to Crawl Sources

Once you get to the Audit Settings screen, navigate to 'Crawl Sources'.

'Crawl Sources' is the only audit setting in Sitebulb that is not optional. Sitebulb needs at least one crawl source, otherwise it cannot crawl!

The default setting is for Sitebulb to crawl the website, so 'Crawl Website' will always be selected by default. However, it can be configured to also crawl XML Sitemap URLs, and/or a provided URL List.

Crawl Sources

Choose 'URL List' as your crawl source

To get Sitebulb to 'crawl' based on a list, check the 'URL List' option.

URL List selected as a crawl source

Typically, a URL List is used on its own, without also crawling the website, to audit a specific area or section of the site. If you wish to do this, make sure to uncheck the 'Crawl Website' option at this point, so that only your URL List is used as a source for the audit.

Add a URL list file

To add a URL List, simply upload a .csv or .txt file from your local computer.

URL List Crawl Source
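If your URLs come from another tool or a script, a small helper can produce the .txt file to upload. This is a sketch under stated assumptions: the name `write_url_list` is hypothetical, and the one-URL-per-line layout is an assumption, since the exact file format Sitebulb expects is not described here.

```python
from pathlib import Path


def write_url_list(urls, path):
    """Write unique, non-empty URLs to a UTF-8 text file, one per line.

    Assumption: a plain one-URL-per-line .txt file is acceptable input.
    Returns the number of URLs written.
    """
    seen = set()
    unique = []
    for raw in urls:
        url = raw.strip()
        if url and url not in seen:  # drop blanks and duplicates, keep order
            seen.add(url)
            unique.append(url)
    Path(path).write_text("\n".join(unique) + "\n", encoding="utf-8")
    return len(unique)


if __name__ == "__main__":
    count = write_url_list(
        ["https://www.example.com/a", "https://www.example.com/b"],
        "url-list.txt",
    )
    print(f"wrote {count} URLs")
```

Removing duplicates up front keeps the uploaded list the same size as the set of pages you actually expect to see in the audit.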

When you use a URL List as your crawl source, Sitebulb isn't strictly crawling, because links on the pages are not followed. Data is still collected and analysed for every URL in the list.
