If your website, or your client's, has anti-crawling or anti-scraping technology in place, you may need to get Sitebulb allowlisted before you can crawl and audit it. You can allowlist Sitebulb in several ways, depending on your needs.
Allowlisting by IP address
You can allowlist the IP address that Sitebulb uses to send requests to the server during crawling.
Sitebulb Desktop
When using Sitebulb Desktop, Sitebulb sends requests from your local machine's IP address, so you will need to allowlist it.
Bear in mind that, depending on your country of residence, residential IP addresses can change frequently, so this may not be the most effective solution if you work on a home network.
Sitebulb Cloud
The IP address of your Sitebulb Cloud server stays the same over time, so allowlisting it is a good long-term solution, and you can share the address with your clients.
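For your dev team, here is a minimal sketch of what an IP allowlist check can look like on the server side. It is an illustration only, not how any particular firewall or CDN implements it. The addresses are taken from reserved documentation ranges and are not real Sitebulb IPs, and the names `ALLOWLIST` and `is_allowlisted` are made up for this example.

```python
import ipaddress

# Hypothetical allowlist. These are reserved documentation addresses,
# NOT real Sitebulb IPs. Replace them with the address shown in Sitebulb.
ALLOWLIST = [
    ipaddress.ip_network("203.0.113.10/32"),  # e.g. a fixed Sitebulb Cloud server IP
    ipaddress.ip_network("198.51.100.0/24"),  # e.g. an office range used for Sitebulb Desktop
]


def is_allowlisted(client_ip: str) -> bool:
    """Return True if client_ip falls inside any allowlisted network."""
    try:
        address = ipaddress.ip_address(client_ip)
    except ValueError:
        # Not a valid IP address at all, so never allow it.
        return False
    return any(address in network for network in ALLOWLIST)
```

A single fixed address (the `/32` entry) suits Sitebulb Cloud, whereas a range may be more practical when a Desktop user's address changes within a known block.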
Finding your IP address
Whether you are on Sitebulb Cloud or Desktop, you can easily find your IP address within the Sitebulb software.
Find your IP address by navigating to Settings > Logging.
On Sitebulb Desktop, it will look like this:
On Sitebulb Cloud it will look like this:
Allowlisting by User Agent
You can also make sure that Sitebulb can crawl your sites by allowlisting the User Agent set for your crawl.
By default, Sitebulb audits will use the Sitebulb Mobile or Sitebulb Desktop User Agent to make requests to your server, both of which have the same User Agent value. For more about the Sitebulb User Agent, read our dedicated documentation.
Finding the User Agent Value
Whichever User Agent you choose to crawl with, you will need to provide your client or dev team with the User Agent value to allowlist on the server.
To find the user agent string, navigate to the Robots Directives tab in your Audit Settings:
Please note that if you choose to allowlist the Sitebulb User Agent, you will need to update your allowlisting rules regularly, because the string changes with new releases to reflect the latest version of Chrome.
As such, we recommend setting and allowlisting a custom user agent if you choose this method of allowlisting.
Allowlisting a Custom User Agent
To set up and allowlist a Custom User Agent you will need to:
Ask your client or dev team to allowlist a Custom User Agent
Ask your dev team to provide you with the relevant Custom User Agent string, for example 'SEO Agency [random character string]'. The random character string makes the user agent value unique and virtually impossible to guess.
Set up your crawl and select 'Custom User Agent' in your Robots Directives settings, then add the UA value in the 'Custom Browser User Agent' box.
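If your dev team needs a way to produce the random character string, the sketch below shows one option using Python's standard `secrets` module. The function names, the 'SEO Agency' prefix, and the token length are assumptions for illustration. Sitebulb does not generate this value for you.

```python
import secrets


def make_custom_user_agent(prefix: str = "SEO Agency", token_bytes: int = 16) -> str:
    """Build a hard-to-guess user agent value: a readable prefix plus a random token."""
    token = secrets.token_hex(token_bytes)  # two hex characters per byte
    return f"{prefix} {token}"


def user_agent_allowed(request_user_agent: str, allowlisted_user_agent: str) -> bool:
    """Exact-match check; compare_digest avoids leaking how much of the value matched."""
    return secrets.compare_digest(
        request_user_agent.encode(), allowlisted_user_agent.encode()
    )
```

The value returned by `make_custom_user_agent()` is what you would paste into the 'Custom Browser User Agent' box, and the same value is what the server would allowlist.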
Custom Robots.txt UA
By default, Sitebulb's Robots Politeness rules are set to respect robots directives. It is likely that your Custom User Agent is not listed in your robots.txt file. As such, if you want Sitebulb to follow the robots.txt rules written for a specific agent, such as Google Smartphone, you can set this here.
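To see why the choice of robots.txt agent matters, the sketch below uses Python's standard `urllib.robotparser` to evaluate the same robots.txt file as two different agents. It illustrates robots.txt matching in general and is not Sitebulb's own parser. The robots.txt content, the `can_crawl` helper, and the use of `Googlebot` as the token for Google Smartphone are assumptions for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: Googlebot may crawl everything except /private/,
# while every other agent is blocked from the whole site.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /
"""


def can_crawl(robots_user_agent: str, url: str) -> bool:
    """Return True if robots_user_agent is allowed to fetch url under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(robots_user_agent, url)
```

With this file, `can_crawl("Googlebot", "https://example.com/products/")` is allowed, but a custom agent such as `"SEO Agency 4f9c2a"` falls through to the `*` group and is blocked from the same URL.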
Providing custom headers
Finally, you can also authorise Sitebulb access to your website using Custom Headers. These will be provided by your client or dev team, and you can set your custom header name and value in the corresponding field under Authentication settings.
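As a rough illustration for dev teams, a server-side check for a custom header could look like the sketch below. The header name `X-Crawler-Auth`, the placeholder value, and the `is_authorised` function are all hypothetical. Your client or dev team decides the real name and value.

```python
import hmac
from typing import Mapping

# Hypothetical header name and value, agreed between you and the dev team.
HEADER_NAME = "X-Crawler-Auth"
EXPECTED_VALUE = "replace-with-shared-secret"


def is_authorised(headers: Mapping[str, str]) -> bool:
    """Return True if the request carries the agreed custom header value."""
    # HTTP header names are case-insensitive, so normalise before looking up.
    normalised = {name.lower(): value for name, value in headers.items()}
    supplied = normalised.get(HEADER_NAME.lower(), "")
    # Constant-time comparison, so timing does not reveal partial matches.
    return hmac.compare_digest(supplied.encode(), EXPECTED_VALUE.encode())
```

The header name and value used here are the same pair you would enter in the custom header fields under Authentication settings.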
