Before you can even set up a project, Sitebulb carries out a domain resolution test, to ensure that your start URL returns a 200 HTTP response and can be crawled.
If that is not the case, you'll see the following message alongside alternative start URLs provided by Sitebulb.
The start URL you entered does not return a 200 (OK) HTTP response. Select the most appropriate start URL from the options below. If none of these options look right, try changing the User Agent and/or the Language above.
The approach to solving this issue and successfully starting your crawl will depend on the HTTP status code returned by the server.
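As a rough orientation, the mapping from status code to the sections of this guide can be sketched in a few lines of Python. This is only an illustration of how the guide is organised, not how Sitebulb performs its test; the function name `section_for_status` is made up for this example.

```python
def section_for_status(status: int) -> str:
    """Return the heading in this guide that covers the given HTTP status code."""
    if status == 200:
        return "OK - no action needed"
    if status in (301, 308):
        return "301, 308 Permanent Redirects"
    if status in (302, 307):
        return "302, 307 Temporary Redirects"
    if status == 401:
        return "401 Unauthorized"
    if status == 403:
        return "403 Forbidden"
    if status == 404:
        return "404 Not Found"
    if 500 <= status <= 599:
        return "5XX Server Errors"
    # Any other code is outside the scope of this guide.
    return "Not covered in this guide"
```

For example, `section_for_status(307)` points you to the temporary redirects section.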
301, 308 Permanent Redirects
301 Moved Permanently is probably the most common status you’ll see at this point. Like 308 Permanent Redirect, it indicates that the start URL you have specified has been permanently moved.
You will typically see this status for the following reason:
The start URL you have set has been permanently moved to a different location
How to fix it
Hover over the ‘?’ symbol for more details, or use the Single Page Analysis to find out where the redirects point to and use the correct final URL as your start URL.
If Sitebulb has identified the final crawlable (200) URL, you will also see it listed here, and you can simply click ‘Save and Continue’ to move on to the next step.
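If you want to trace a redirect chain outside Sitebulb, the idea can be sketched in Python as below. This is a minimal sketch, not Sitebulb's own logic: `follow_redirects` and `http_fetch` are hypothetical names, the 10-hop limit is an arbitrary assumption, and `http_fetch` makes a real network request using only the standard library.

```python
import http.client
from typing import Callable, List, Optional, Tuple
from urllib.parse import urljoin, urlsplit

# A fetch function returns (status code, Location header or None) for one URL.
Fetch = Callable[[str], Tuple[int, Optional[str]]]

REDIRECT_CODES = (301, 302, 307, 308)


def http_fetch(url: str) -> Tuple[int, Optional[str]]:
    """Request a single URL without following redirects."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def follow_redirects(start_url: str, fetch: Fetch = http_fetch,
                     max_hops: int = 10) -> List[Tuple[str, int]]:
    """Return the list of (url, status) pairs visited, ending at the final URL."""
    chain: List[Tuple[str, int]] = []
    seen = set()
    url = start_url
    for _ in range(max_hops):
        status, location = fetch(url)
        chain.append((url, status))
        if status not in REDIRECT_CODES or not location:
            break  # reached a non-redirect response
        seen.add(url)
        next_url = urljoin(url, location)  # Location may be relative
        if next_url in seen:
            break  # redirect loop, stop here
        url = next_url
    return chain
```

If the last entry in the returned chain has status 200, its URL is the one to use as your start URL.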
302, 307 Temporary Redirects
The 302 and 307 status codes indicate temporary redirects. These may be used in different circumstances:
Geo-targeting - the server may be using geolocation redirects to redirect the user based on their IP address
A/B testing - the website server may be set up to redirect batches of users in order to test different versions of the page
Security configurations
How to fix it
Change the language settings
If the site is geo-targeting, check the selected default language isn't causing the problem.
Whitelist Sitebulb
If the site is geo-targeting via IP-based redirects, or carrying out A/B testing, you will need to set up whitelisting and exclusion rules for your IP address or user agent in order to exclude Sitebulb from these redirect rules.
Use a VPN
If the site is geo-targeting via IP address and you are unable to get your IP address whitelisted, a workaround would be to connect to a VPN (e.g. ExpressVPN) and select the country you want to crawl from.
401 Unauthorized
The 401 Unauthorized HTTP response indicates that the website may require authentication in order to be accessed and crawled.
How to fix it
Add forms authentication
You can add form authentication details to your audit settings either at the point of setting up a new project or under Crawler Settings.
Add HTTP Authentication
Alternatively, you can choose to continue from the New Project screen, and Sitebulb will prompt you to add HTTP Authentication.
You are likely to encounter this error code when dealing with staging websites. You can find out more about authenticating Sitebulb for crawling staging sites here.
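To confirm that a set of credentials is accepted before you enter it, you could send it yourself. The sketch below only shows how the request header is built, and it assumes the site uses the Basic scheme (the `WWW-Authenticate` header on the 401 response tells you which scheme is actually in use). The function name `basic_auth_header` is hypothetical.

```python
import base64
from typing import Dict


def basic_auth_header(username: str, password: str) -> Dict[str, str]:
    """Build the Authorization header for HTTP Basic authentication."""
    # Basic auth is "username:password", base64-encoded.
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}
```

Sending a request with this header to the start URL should return a 200 rather than a 401 if the credentials are correct.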
403 Forbidden
403 Forbidden is a standard HTTP status code. It indicates that the web server understands the request but refuses to authorize it due to permission issues. Client-side misconfigurations often cause this HTTP error.
What may cause this error:
Your outgoing IP Address is blocked by the website firewall (WAF) or server
Your User Agent is blocked by the website firewall (WAF) or server
You are using a proxy and the server is returning the Forbidden error due to incorrect credentials or configuration
The URL on the server has misconfigured file permission settings
How to fix it
Change the user agent
The user agent you are using may be blocked by the server. Changing the user agent is potentially the quickest solution, and therefore the first thing to try. Your website server may already have whitelisting in place for some UAs, so try different options from the list.
You can use the Single Page Analysis tool to efficiently check if the Start URL is crawlable with different UAs.
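The same check can be sketched in Python if you want to test several user agents in one go. This is an illustration only: the function names and the user agent strings in the usage note are placeholders, not the list Sitebulb offers, and `get_status` makes a real network request. Note that `urllib` follows redirects, so the status returned is that of the final URL.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Optional


def get_status(url: str, user_agent: str) -> int:
    """Return the HTTP status code the server gives to this user agent."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises for 4XX/5XX responses; the code is still what we want.
        return err.code


def first_accepted_user_agent(
    url: str,
    user_agents: Dict[str, str],
    status_for: Callable[[str, str], int] = get_status,
) -> Optional[str]:
    """Return the name of the first user agent that gets a 200, or None."""
    for name, user_agent in user_agents.items():
        if status_for(url, user_agent) == 200:
            return name
    return None
```

For example, calling `first_accepted_user_agent(url, {"crawler": "ExampleCrawler/1.0", "browser": "ExampleBrowser/1.0"})` returns `"browser"` if only the second string is allowed through. If it returns `None`, none of the user agents you tried is accepted and whitelisting is the likely next step.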
Whitelist your IP or a custom User Agent to crawl with Sitebulb
Most likely, you will need to put whitelisting in place in order to crawl successfully - follow the instructions in this document to do so.
404 Not Found
The URL was not found on the website and the server is returning a 404 or 410 HTTP status code.
How to fix it
Check that the URL is correct in your browser.
Please contact support if the URL is definitely correct, and when you open it in your browser it does not return a 404 or 410 HTTP status code.
5XX Server Errors
HTTP 5XX status codes indicate that the server encountered an error and could not complete the request. They are most likely unrelated to Sitebulb, your machine, or your internet connection.
5XX errors are difficult to troubleshoot because a range of issues can trigger them and they are often transient.
How to fix it
First, check that your URL is correct and try again.
If the issue persists, you will need to ask your developer or server admin to look at their logs to diagnose this issue.
5XX errors can also occur if you have been crawling the site regularly and the server has started rejecting the requests. Whitelisting or crawling more slowly may resolve this issue, although you will still have to wait until the server is no longer returning a 5XX.
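Because these errors are often transient, one way to check whether the server has recovered is to re-test the start URL at increasing intervals. The Python sketch below shows the idea under stated assumptions: `wait_until_not_5xx` is a hypothetical helper, `check_status` is any function you supply that returns the current status code, and the delays are arbitrary.

```python
import time
from typing import Callable, Iterable, Optional


def wait_until_not_5xx(
    check_status: Callable[[], int],
    delays: Iterable[float],
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Re-check the status after each delay; return the first non-5XX status.

    Returns None if the server is still returning a 5XX after the last delay.
    """
    status = check_status()
    for delay in delays:
        if not 500 <= status <= 599:
            return status
        sleep(delay)  # give the server time to recover before retrying
        status = check_status()
    return None if 500 <= status <= 599 else status
```

With delays of, say, 60, 120 and 240 seconds, the start URL is checked up to four times before giving up. A result of `None` suggests it is time to ask your developer or server admin to look at the logs.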