If your or your client’s website uses anti-crawling/scraping technology, you may need to get Sitebulb whitelisted in order to crawl and audit the site. You can whitelist Sitebulb in several ways, depending on your needs.
Whitelist by IP address
You can whitelist the IP address from which Sitebulb is sending requests to the server while crawling.
Sitebulb Desktop
When using Sitebulb Desktop, requests are sent from your local machine, so you will need to whitelist your machine's public IP address.
Bear in mind that residential IP addresses are often dynamic and can change regularly, so this may not be the most effective long-term solution if you work from a home network.
Sitebulb Cloud
The IP address of your Sitebulb Cloud server is static, so whitelisting it is a good long-term solution, and one you can share with your clients.
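To illustrate what an IP allowlist amounts to on the server side, here is a minimal, framework-agnostic sketch. The address used is a documentation-range example, not a real Sitebulb IP: substitute the address shown in your own Sitebulb Settings > Logging screen.

```python
import ipaddress

# Hypothetical allowlist: replace with the IP shown in Sitebulb's
# Settings > Logging screen (203.0.113.10 is a documentation-only address).
ALLOWED_IPS = {ipaddress.ip_address("203.0.113.10")}

def is_whitelisted(client_ip: str) -> bool:
    """Return True if the requesting IP is on the crawler allowlist."""
    try:
        return ipaddress.ip_address(client_ip) in ALLOWED_IPS
    except ValueError:
        return False  # malformed address: treat as not whitelisted

print(is_whitelisted("203.0.113.10"))  # True
print(is_whitelisted("198.51.100.7"))  # False
```

In practice this rule would live in your firewall, CDN, or bot-management tool rather than application code, but the logic is the same: exact-match the crawler's address against an approved list.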
Finding your IP address
Whether you are on Sitebulb Cloud or Desktop, you can easily find your IP address within the Sitebulb software.
Find your IP address by navigating to Settings > Logging.
On Sitebulb Desktop it will look like this:
On Sitebulb Cloud it will look like this:
Whitelisting by User Agent
You can also ensure that Sitebulb can crawl your sites by whitelisting the User Agent set for your crawl.
By default, Sitebulb audits will use the Sitebulb Mobile or Sitebulb Desktop User Agent to make requests to your server, both of which have the same User Agent value. For more about the Sitebulb User Agent, read our dedicated documentation.
Finding the User Agent Value
Whichever User Agent you choose to crawl with, you will need to provide your client or dev team with the User Agent value to whitelist on the server.
To find the user agent string, navigate to the Robots Directives tab in your Audit Settings:
Please note that if you choose to whitelist the default Sitebulb User Agent, you will need to update your whitelisting rules regularly, as the string includes the bundled Chrome version and changes with each new release.
As such, we recommend setting and whitelisting a custom user agent if you choose this method of whitelisting.
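One way a dev team might sidestep the version-churn problem is to match on the stable "Sitebulb" product token rather than the full string. This is a sketch only: the UA string shown is illustrative, not the official value, and keyword matching is easy to spoof, which is one reason a unique custom user agent is more robust.

```python
def allow_crawler(user_agent: str) -> bool:
    # Matching the stable product token avoids breakage when the
    # embedded Chrome version changes between Sitebulb releases.
    return "Sitebulb" in user_agent

# Illustrative UA string only; check Audit Settings > Robots Directives
# for the real value shipped with your version of Sitebulb.
example_ua = "Mozilla/5.0 (...) Chrome/120.0.0.0 Safari/537.36 Sitebulb"
print(allow_crawler(example_ua))  # True
```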
Whitelisting a Custom User Agent
To set up and whitelist a Custom User Agent you will need to:
Ask your client or dev team to whitelist a Custom User Agent
Ask your dev team to provide you with the relevant Custom User Agent string, for example 'SEO Agency [random character string]'. The random character string makes the user agent value unique and virtually impossible to guess.
Set up your crawl and select 'Custom User Agent' in your Robot Directives settings, then add the UA value in the ‘Custom Browser User Agent’ box.
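If you need to produce such a random character string yourself, Python's standard `secrets` module is one convenient way. The 'SEO Agency' prefix below is just the hypothetical example from the steps above: use whatever value your dev team agrees to whitelist.

```python
import secrets

# Generate a hard-to-guess token to append to an agreed prefix.
token = secrets.token_urlsafe(16)  # ~128 bits of randomness
custom_ua = f"SEO Agency {token}"
print(custom_ua)
```

Paste the resulting value into both the server-side whitelist rule and Sitebulb's 'Custom Browser User Agent' box so the two match exactly.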
Custom Robots.txt UA
By default, Sitebulb’s Robots Politeness rules are set to respect robots directives, but your Custom User Agent is unlikely to be listed in your robots.txt file. If you want Sitebulb to read your robots.txt file as a specific agent, such as Google Smartphone, you can set this here.
Providing custom headers
Finally, you can also authorize Sitebulb to access your website using Custom Headers. These will be provided by your client or dev team, and you can set each custom header's name and value in the respective area under Crawler Settings.
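Conceptually, the server simply allows requests that carry the agreed header. The sketch below shows that check in framework-agnostic terms; the header name and value are hypothetical, so use exactly what your dev team configures, and remember that HTTP header names are case-insensitive.

```python
# Hypothetical shared secret agreed between you and the dev team.
EXPECTED_NAME = "X-Crawler-Auth"
EXPECTED_VALUE = "shared-secret-value"

def request_is_authorized(headers: dict) -> bool:
    """Allow requests that carry the agreed custom header."""
    # Header names are case-insensitive per HTTP, so normalize first.
    normalized = {k.lower(): v for k, v in headers.items()}
    return normalized.get(EXPECTED_NAME.lower()) == EXPECTED_VALUE

print(request_is_authorized({"x-crawler-auth": "shared-secret-value"}))  # True
print(request_is_authorized({}))  # False
```

Whatever name and value the dev team whitelists server-side must be entered identically in Sitebulb's Crawler Settings so every crawl request carries them.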