Before you can even set up a project, Sitebulb carries out a domain resolution test to ensure that your start URL returns a 200 HTTP response and can be crawled.
If that is not the case, you'll see the following message alongside alternative start URLs provided by Sitebulb.
"The start URL you entered does not return a 200 (OK) HTTP response. Select the most appropriate start URL from the options below. If none of these options look right, try changing the User Agent and/or the Language above."
The approach to solving this issue and successfully starting your crawl will depend on the HTTP status code returned by the server.
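As a rough orientation, the status codes covered in this guide can be grouped as in the sketch below. This is an illustration written outside Sitebulb, not part of the product; the function name `triage_start_url` and the returned labels are assumptions chosen to mirror the sections that follow.

```python
def triage_start_url(status: int) -> str:
    """Map an HTTP status code to the troubleshooting category used in this guide.

    Hypothetical helper for illustration only; the labels are not Sitebulb output.
    """
    if status == 200:
        return "crawlable"
    if status in (301, 308):
        return "permanent redirect"
    if status in (302, 307):
        return "temporary redirect"
    if status == 401:
        return "authentication required"
    if status == 403:
        return "forbidden"
    if status == 404:
        return "not found"
    if 500 <= status <= 599:
        return "server error"
    return "other"
```

For example, `triage_start_url(307)` falls into the temporary redirect group, so the geo-targeting and A/B testing checks are the place to start.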
301, 308 Permanent Redirects
301 Moved Permanently is probably the most common status you’ll see at this point. Like 308 Permanent Redirect, it indicates that the start URL you have specified has been permanently moved.
The most likely reason you are seeing this status:
The start URL you have set has been permanently moved to a different location
How to fix it
Hover over the ‘?’ symbol for more details, or use the Single Page Analysis tool to find out where the redirects point, then use the final URL as your start URL.
If Sitebulb has identified the final crawlable (200) URL, you will also see it listed here, and you can click ‘Save and Continue’ to proceed to the next step.
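If you want to trace a redirect chain yourself, outside Sitebulb, the following is a minimal sketch using only the Python standard library. The helper names `head` and `follow_redirects`, the 10-hop limit, and the 10-second timeout are assumptions made for this example; it does not reproduce how Sitebulb resolves redirects.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = {301, 302, 307, 308}


def head(url):
    """Send one HEAD request without following redirects.

    Returns (status, Location header or None).
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def follow_redirects(url, fetch=head, max_hops=10):
    """Return a list of (url, status) hops, stopping at the first non-redirect."""
    hops = []
    for _ in range(max_hops):
        status, location = fetch(url)
        hops.append((url, status))
        if status not in REDIRECT_CODES or not location:
            break
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    return hops
```

The last entry in the returned list is the URL to consider as your start URL, provided its status is 200. The `fetch` parameter is injectable so the chain logic can be exercised without network access.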
302, 307 Temporary Redirects
The 302 and 307 status codes indicate temporary redirects. These may be used in different circumstances:
Geo-targeting - the server may be using geolocation redirects to redirect the user based on their IP address
A/B testing - the website server may be set up to redirect batches of users in order to test different versions of the page
Security configurations
How to fix it
Change the language settings
If the site is geo-targeting, check the selected default language isn't causing the problem.
Allowlist Sitebulb
If the site is geo-targeting via IP address redirects, or conducting A/B testing, you will need to allowlist your IP address or user agent so that Sitebulb is excluded from these redirect rules.
Use a VPN
If the site is geo-targeting via IP address and you are unable to get your IP address allowlisted, a workaround is to connect to a VPN (e.g., ExpressVPN) and select the country you want to crawl from.
401 Unauthorized
The 401 Unauthorized HTTP response indicates that the website requires authentication in order to be accessed and crawled.
How to fix it
Add forms authentication
You can add forms authentication details to your audit settings either when setting up a new project or under the Authentication settings.
Add HTTP Authentication
Alternatively, you can choose to continue from the New Project screen, and Sitebulb will prompt you to add HTTP Authentication.
You are likely to encounter this error code when dealing with staging websites. You can find out more about authenticating Sitebulb for crawling staging sites here.
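For context, staging sites often use the HTTP Basic scheme, in which the client sends the username and password base64-encoded in an `Authorization` header. The sketch below shows how that header is built. It assumes Basic authentication, which is only one possible scheme, and `basic_auth_header` is a hypothetical helper, not a Sitebulb function.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict:
    """Build the Authorization header for HTTP Basic authentication (assumed scheme)."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {token}"}
```

A request that carries this header with valid credentials should receive a 200 instead of a 401. Base64 is an encoding, not encryption, so such credentials should only be sent over HTTPS.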
403 Forbidden
403 Forbidden is a standard HTTP status code. It indicates that the web server understands the request but refuses to authorise it due to permission issues. Client-side misconfigurations often cause this HTTP error.
What may cause this error:
Your outgoing IP Address is blocked by the website firewall (WAF) or server
Your User Agent is blocked by the website firewall (WAF) or server
You are using a Proxy, and the server is returning the Forbidden error due to the wrong credentials or configuration
The URL on the server has misconfigured file permission settings
How to fix it
Change the user agent
The user agent you are using may be blocked by the server. Changing the user agent is potentially the quickest solution, and therefore the first thing to try. Your website server may already have allowlisting in place for some UAs, so try different options from the list.
You can use the Single Page Analysis tool to efficiently check if the Start URL is crawlable with different UAs.
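The same check can be approximated outside Sitebulb with a short script. This is a minimal sketch using the Python standard library; `status_for`, `probe_user_agents`, and `allowed_agents` are names invented for this example, and the user agent strings you pass in are your own choice. Note that `urllib` follows redirects automatically, so the status reported is that of the final response.

```python
import urllib.error
import urllib.request


def status_for(url, user_agent):
    """Request the URL with the given User-Agent and return the HTTP status code."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4XX and 5XX responses are raised as exceptions; the code is what we want.
        return err.code


def probe_user_agents(url, user_agents, fetch=status_for):
    """Return {label: status} for each labelled user agent string."""
    return {label: fetch(url, ua) for label, ua in user_agents.items()}


def allowed_agents(results):
    """Return the labels whose request came back as 200."""
    return [label for label, status in results.items() if status == 200]
```

If one user agent returns 200 while another returns 403 for the same URL, the server or WAF is filtering on user agent, and selecting the working one (or allowlisting a custom one) is the fix to pursue.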
Allowlist your IP or a custom User Agent to crawl with Sitebulb
Most likely, you will need to put allowlisting in place in order to crawl successfully - follow the instructions in this document to do so.
404 Not Found
The URL was not found on the website, and the server is returning a 404 HTTP status code.
How to fix it
Check that the URL is correct in your browser.
Please contact support if the URL is definitely correct and, when you open it in your browser, it does not return a 404 HTTP status code.
5XX Server Errors
HTTP 5XX status codes are generic responses that catch unexplained server or website errors. They are most likely unrelated to Sitebulb, your machine, or your internet connection.
5XX errors are difficult to troubleshoot because a range of issues can trigger them and they are often transient.
How to fix it
First, check that your URL is correct and try again.
If the issue persists, you will need to ask your developer or server admin to review their logs to diagnose it.
5XX errors can also occur if you have been crawling the site regularly and the server has started rejecting the requests. Allowlisting or crawling more slowly may resolve this issue, though you will still need to wait until the server stops returning 5XX responses.
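Because 5XX responses are often transient, a simple way to check whether the server has recovered is to retry with increasing pauses. The sketch below illustrates that idea and is not a description of Sitebulb's crawler; `fetch_with_backoff`, its defaults, and the injected `fetch` callable (any function that takes a URL and returns a status code) are assumptions for this example.

```python
import time


def fetch_with_backoff(fetch, url, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Retry fetch(url) while it returns a 5XX status, doubling the pause each time.

    Returns (last_status, attempts_made).
    """
    status = None
    for attempt in range(1, max_attempts + 1):
        status = fetch(url)
        if not 500 <= status <= 599:
            return status, attempt
        if attempt < max_attempts:
            # Pause 1x, 2x, 4x ... base_delay before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
    return status, max_attempts
```

If the final status is still 5XX after all attempts, the problem is unlikely to clear on its own in the short term, and reviewing the server logs is the next step.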

