Problems starting an audit: Pre-audit and domain resolution failures

Before you can even set up a project, Sitebulb carries out a domain resolution test to ensure that your start URL returns a 200 HTTP response and can be crawled.

If that is not the case, you'll see the following message alongside alternative start URLs provided by Sitebulb.

"The start URL you entered does not return a 200 (OK) HTTP response. Select the most appropriate start URL from the options below. If none of these options look right, try changing the User Agent and/or the Language above."

The approach to solving this issue and successfully starting your crawl will depend on the HTTP status code returned by the server.
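As a quick orientation, the mapping from status code to the sections of this guide can be sketched in a few lines of Python. This is only an illustration of how the guide is organised, not how Sitebulb works internally; the function name `troubleshooting_section` is made up for this example.

```python
def troubleshooting_section(status: int) -> str:
    """Map an HTTP status code to the matching section of this guide.

    Hypothetical helper for illustration only; not part of Sitebulb.
    """
    if status == 200:
        return "OK: the start URL can be crawled"
    if status in (301, 308):
        return "301, 308 Permanent Redirects"
    if status in (302, 307):
        return "302, 307 Temporary Redirects"
    if status == 401:
        return "401 Unauthorized"
    if status == 403:
        return "403 Forbidden"
    if status == 404:
        return "404 Not Found"
    if 500 <= status <= 599:
        return "5XX Server Errors"
    return "Not covered by this guide"
```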

301, 308 Permanent Redirects

301 Moved Permanently may be the most common status you’ll see at this point. It indicates that the start URL you have specified has been moved.

You may see this status for a number of reasons:

  • The start URL you have set has been permanently moved to a different location

  • The site redirects to the preferred version of the page (HTTP to HTTPS, non-www. to www., or trailing slash preferences)

How to fix it

  • Hover over the ‘?’ symbol for more details, or use the Single Page Analysis to find out where the redirects point to and use the correct final URL as your start URL.

  • If Sitebulb has identified the final crawlable (200) URL, you will also see it listed here, and you can click ‘Save and Continue’ to proceed to the next step.
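If you prefer to trace the redirect chain yourself, outside Sitebulb, a minimal Python sketch might look like the one below. It is not Sitebulb's own logic. `follow_redirects` and `http_fetch` are hypothetical names, and `http_fetch` sends a HEAD request, which some servers answer differently from a normal page request.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}


def follow_redirects(start_url, fetch, max_hops=10):
    """Return a list of (url, status) hops, ending at the first non-redirect.

    `fetch` is any callable of the form fetch(url) -> (status, location_or_None).
    """
    chain = []
    url = start_url
    for _ in range(max_hops):
        status, location = fetch(url)
        chain.append((url, status))
        if status not in REDIRECT_CODES or not location:
            return chain
        url = urljoin(url, location)  # the Location header may be relative
    raise RuntimeError(f"gave up after {max_hops} redirects (possible loop)")


def http_fetch(url):
    """Send one HEAD request without following redirects (illustrative only)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()
```

Calling `follow_redirects("http://example.com/", http_fetch)` returns every hop in order. The URL in the last entry is the one to use as your start URL, provided its status is 200.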

302, 307 Temporary Redirects

The 302 and 307 status codes indicate temporary redirects. These may be used in different circumstances:

  • Geo-targeting - the server may be using geolocation redirects to redirect the user based on their IP address

  • A/B testing - the website server may be set up to redirect batches of users in order to test different versions of the page

  • Security configurations

How to fix it

Change the language settings

If the site is geo-targeting, check the selected default language isn't causing the problem.

Allowlist Sitebulb

If the site is geo-targeting via IP address, or running A/B tests, you will need to allowlist your IP address or user agent so that Sitebulb is excluded from these redirect rules.

Use a VPN

If the site is geo-targeting via IP address and you are unable to get your IP address allowlisted, a workaround is to connect to a VPN (e.g., ExpressVPN) and select the country you want to crawl from.

401 Unauthorized

The 401 Unauthorized HTTP response indicates that the website may require authentication in order to be accessed and crawled.

How to fix it

Add forms authentication

You can add forms authentication details to your audit settings either when setting up a new project or under the Authentication settings.

Add HTTP Authentication

Alternatively, you can choose to continue from the New Project screen, and Sitebulb will prompt you to add HTTP Authentication.

You are likely to encounter this error code when dealing with staging websites. You can find out more about authenticating Sitebulb for crawling staging sites here.
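To confirm outside Sitebulb that a set of credentials is what the server expects, it can help to see what HTTP Authentication sends. The Python sketch below assumes the staging site uses the Basic scheme, which is only one of several possible schemes. Both function names are hypothetical and the credentials shown are placeholders.

```python
import base64


def auth_scheme(www_authenticate):
    """Return the lowercased scheme from a WWW-Authenticate header value, or None."""
    if not www_authenticate or not www_authenticate.strip():
        return None
    return www_authenticate.split(None, 1)[0].lower()


def basic_auth_header(username: str, password: str) -> dict:
    """Build the Authorization header for HTTP Basic authentication."""
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Placeholder credentials, for illustration only.
print(auth_scheme('Basic realm="Staging"'))
print(basic_auth_header("user", "pass"))
```

A 401 response that carries no `WWW-Authenticate` header, or a login page served as HTML, suggests the site may rely on a login form instead, in which case forms authentication is the setting to look at.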

403 Forbidden

403 Forbidden is a standard HTTP status code. It indicates that the web server understands the request but refuses to authorise it due to permission issues. Client-side misconfigurations often cause this HTTP error.

What may cause this error:

  • Your outgoing IP Address is blocked by the website firewall (WAF) or server

  • Your User Agent is blocked by the website firewall (WAF) or server

  • You are using a Proxy, and the server is returning the Forbidden error due to the wrong credentials or configuration

  • The URL on the server has misconfigured file permission settings

How to fix it

Change the user agent

The user agent you are using may be blocked by the server. Changing the user agent is potentially the quickest solution, and therefore the first thing to try. Your website server may already have allowlisting in place for some UAs, so try different options from the list.

You can use the Single Page Analysis tool to efficiently check if the Start URL is crawlable with different UAs.
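The same comparison can be scripted outside Sitebulb. The Python sketch below requests one URL with several user agents and records the status each receives. The function names and user agent strings are placeholders invented for this example, not the values Sitebulb sends. Note that `urllib` follows redirects automatically, so the status reported is that of the final response.

```python
import urllib.error
import urllib.request


def compare_user_agents(url, user_agents, fetch):
    """Return {label: status} for each user agent tried.

    `fetch` is any callable of the form fetch(url, headers) -> status.
    """
    results = {}
    for label, user_agent in user_agents.items():
        results[label] = fetch(url, {"User-Agent": user_agent})
    return results


def urllib_fetch(url, headers):
    """Fetch a URL with the given headers and return the HTTP status code."""
    request = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 4XX and 5XX responses are raised as HTTPError


# Placeholder user agent strings, for illustration only.
EXAMPLE_USER_AGENTS = {
    "crawler": "ExampleCrawler/1.0",
    "browser": "Mozilla/5.0 (compatible; ExampleBrowser/1.0)",
}
```

If `compare_user_agents(url, EXAMPLE_USER_AGENTS, urllib_fetch)` shows 403 for one label and 200 for another, the block is probably based on the user agent. If every user agent gets a 403, an IP-based block is more likely.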

Allowlist your IP or a custom User Agent to crawl with Sitebulb

Most likely, you will need to put allowlisting in place in order to crawl successfully - follow the instructions in this document to do so.

404 Not Found

The URL was not found on the website, and the server is returning a 404 HTTP status code.

How to fix it

Check that the URL is correct in your browser.

Please contact support if the URL is definitely correct and, when you open it in your browser, it does not return a 404 HTTP status code.

5XX Server Errors

HTTP 5XX status codes are generic responses that catch unexplained server or website errors. They are most likely unrelated to Sitebulb, your machine, or your internet connection.

5XX errors are difficult to troubleshoot because a range of issues can trigger them and they are often transient.

How to fix it

First, check that your URL is correct and try again.

If the issue persists, you will need to ask your developer or server admin to review their logs to diagnose it.

5XX errors can also occur if you have been crawling the site regularly and the server has started rejecting the requests. Allowlisting or crawling more slowly may resolve this issue, though you will still need to wait until the server stops returning a 5XX.
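To check whether a 5XX is transient before starting the audit again, you could poll the URL with increasing pauses between attempts. The Python sketch below shows one common approach, exponential backoff. It is an assumption about how you might test this yourself, not a description of Sitebulb's behaviour, and `fetch_with_backoff` is a hypothetical name.

```python
import time


def fetch_with_backoff(url, fetch, retries=4, base_delay=1.0, sleep=time.sleep):
    """Retry while the server answers 5XX, doubling the wait each time.

    `fetch` is any callable of the form fetch(url) -> status.
    Returns the last status code seen.
    """
    status = fetch(url)
    for attempt in range(retries):
        if not 500 <= status <= 599:
            break  # anything other than a server error ends the retries
        sleep(base_delay * 2 ** attempt)  # waits 1s, 2s, 4s, ... by default
        status = fetch(url)
    return status
```

If the status is still 5XX after the final attempt, the problem is unlikely to be transient, and the server logs are the next place to look.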
