If you have been experiencing issues when trying to crawl websites using the Chrome Crawler, please follow the steps below to diagnose and (hopefully) fix the problem.
1. Restart your computer
An astonishing number of software-related issues can be resolved simply by restarting your computer, so it is almost always the first thing to try.
And yes, we know it's annoying to have to shut all your programs down and interrupt your work, but it's the most straightforward of these resolution steps, so please make sure you do it.
If you're still experiencing issues after restarting, please follow the advice below.
2. Check the error message
There are quite a lot of different error messages you can see relating to our Chrome Crawler, most of which are pretty similar.
There is one, however, that is different; it occurs on the 'New Project' screen and it looks like this:
If you see the error above, you should be able to fix the problem simply by reinstalling the software (please download from here).
You will see the other most common errors either on the audit setup screen, or in the audit overview after Sitebulb has attempted and failed to complete the audit:
The Chrome Crawler is missing or corrupt.
The Chrome Crawler is not running. Please restart your computer.
The Chrome Crawler did not, or could not process the URL. Please restart your computer.
The Chrome Crawler failed to restart.
If you see these messages, it is most likely a crawl configuration issue or an issue with the website itself (although it could also be an internet issue, a proxy issue or a firewall issue...!).
Keep following the steps below to help diagnose and fix these problems.
Please see the list at the bottom of this page for all possible Chrome Crawler error messages.
3. Check if the issue is restricted to one website
You may have a problem with the Chrome Crawler in general, or it may just be a problem with the one site you are trying to crawl.
You can confirm this by running a few websites through the Single Page Analysis tool, which should come back with information as normal. Just use the default settings and don't adjust anything.
If it does come back with data (like the image above), this means the Chrome Crawler is working in general, but there is a site-specific issue that you may be able to resolve by adjusting the configuration.
If you get a failure at this stage, where the Single Page Analysis does not work for any website, skip to Step 8. Otherwise, continue:
4. Check the problem website with Single Page Analysis
Now input the start URL for the problem website you were trying to crawl into the Single Page Analysis tool.
You should see a failure message:
An error at this stage is expected, since we already know that Sitebulb is having problems trying to crawl the website.
5. Check your VPN and Proxy settings
At this point, it is worth ruling out connection issues before we move on to troubleshooting audit settings.
Sitebulb needs a consistent connection to an IP address to perform various checks during the setup and auditing process. If this connection unexpectedly changes or is interrupted, you are likely to see errors.
If you are accessing the internet through a VPN or Proxy, turn it off and try running the audit on your local machine.
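If you are not sure whether a proxy is configured on your machine, a quick check like the one below may help. This is a minimal sketch in Python and is not part of Sitebulb. It relies on the standard library's urllib.request.getproxies(), which reads proxy settings from environment variables (and from the system settings on Windows and macOS). The function name describe_proxies is our own invention, and the script will not detect a VPN.

```python
from urllib.request import getproxies


def describe_proxies(proxies=None):
    """Return a human-readable summary of proxy settings.

    `proxies` is a dict such as {"https": "http://proxy.example:8080"};
    when omitted, the settings detected by urllib are used.
    """
    if proxies is None:
        proxies = getproxies()
    if not proxies:
        return "No proxy detected"
    lines = [f"{scheme}: {address}" for scheme, address in sorted(proxies.items())]
    return "Proxy detected\n" + "\n".join(lines)


if __name__ == "__main__":
    print(describe_proxies())
```

If this reports a proxy, that is a good hint to switch it off before running the audit again.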
6. Run through different configuration options
From the Single Page Analysis screen, click the little cog on the right-hand side to open up the Advanced Settings. This is quite a long pane, so you will need to use the scroll bar on the right to reach some of the configuration options.
You will need to try different options, and different combinations of options. Once you have changed one of the configuration options, press the green Check button at the top to test it again.
The first things to try should be:
User Agent - try a different 'non-crawler' User Agent, such as 'iPhone' or 'Chrome Windows'
Cookies - try both enabled and disabled
If it does start working, you should have found your solution.
For example, if changing the User Agent to 'Google Smartphone' returns information and a 200 status code, as expected, you would then need to go back into your project settings and update the User Agent there.
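To see why the User Agent can make a difference, it may help to look at how a server can answer the same URL differently depending on the User-Agent header. The Python sketch below is only an illustration and is not how Sitebulb works internally: it sends a plain HTTP request rather than rendering the page in a browser. The two User-Agent strings and the function names are made up for the example and are not the values behind Sitebulb's 'iPhone' or 'Chrome Windows' options.

```python
import urllib.error
import urllib.request

# Illustrative User-Agent strings (assumptions, not Sitebulb's actual values).
USER_AGENTS = {
    "crawler-like": "ExampleCrawler/1.0",
    "browser-like": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ExampleBrowser/1.0",
}


def build_request(url, user_agent):
    """Build a GET request that sends the given User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def status_for(url, user_agent, timeout=10):
    """Return the HTTP status code, or a short description of the failure."""
    request = build_request(url, user_agent)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # The server answered, but with an error status (e.g. 403).
        return error.code
    except OSError as error:
        # Covers connection failures and timeouts.
        return f"failed: {error}"
```

Calling status_for() once per entry in USER_AGENTS, with your own start URL, shows whether the server treats the two differently. If the crawler-like string is rejected and the browser-like one is accepted, the website is probably filtering on User Agent.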
More Advanced Settings
Assuming it is still not working, open up the Advanced Settings again, and this time scroll down a little more, so you can also test:
Flatten Shadow DOM - try both enabled and disabled
Flatten Iframes - try both enabled and disabled
Incognito (Session Isolation) - try both enabled and disabled
Enable Service Workers - try both enabled and disabled
Again, you will need to try different options, and different combinations. Once you have changed one of the configuration options, press the green Check button at the top to test it again.
Hopefully, by this stage, you should have established which options need to be toggled on/off in order for the Single Page Analysis to return results as expected.
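Since there are four toggles here, it is easy to lose track of which combinations you have already tried. The Python sketch below simply prints a checklist of every on/off combination, so you can tick them off as you test each one with the Check button. It does not talk to Sitebulb at all. The toggle names come from the list above, and the function name is our own.

```python
from itertools import product

# Toggle names taken from the Advanced Settings described above.
TOGGLES = [
    "Flatten Shadow DOM",
    "Flatten Iframes",
    "Incognito (Session Isolation)",
    "Enable Service Workers",
]


def toggle_combinations(toggles=TOGGLES):
    """Yield every on/off combination as a dict of {toggle name: bool}."""
    for states in product([True, False], repeat=len(toggles)):
        yield dict(zip(toggles, states))


if __name__ == "__main__":
    for number, combination in enumerate(toggle_combinations(), start=1):
        summary = ", ".join(
            f"{name}: {'on' if enabled else 'off'}"
            for name, enabled in combination.items()
        )
        print(f"{number:2}. {summary}")
```

Four toggles give 16 combinations. In practice you will rarely need to try them all, as changing one toggle at a time from the defaults usually narrows things down quickly.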
7. Test new settings in a Project
Once you have worked out which settings need to be tweaked, you'll need to go and set up a new project in Sitebulb with your amended settings - using the start URL of the problem website.
We would always suggest starting a new project rather than just adjusting an existing project.
Depending on which setting(s) you need to change in order to get the Single Page Analysis tool to work, you may need to adjust settings in a couple of different places.
The 'New Project' screen allows you to adjust the Device, User Agent and Cookies (Yes/No) - but to see these options you need to hit Advanced Settings in the bottom right:
The remaining Chrome settings are available once Sitebulb has performed the pre-audit checks (which happens after you hit Save and Continue on the New Project page).
You first need to navigate to the Crawler Settings, from the left-hand menu:
Then scroll down to find the Advanced Chrome Crawler Settings, which include all the toggles for Flatten Shadow DOM, Flatten Iframes, Incognito and Enable Service Workers.
Again, you want to replicate the settings that worked for you on the Single Page Analysis tool.
Then, go ahead and see if Sitebulb will crawl the site (fingers crossed!).
8. Check that your anti-virus isn't blocking Sitebulb
If the Single Page Analysis is not working for any website, it means that there is a broader problem with the Chrome Crawler on your machine. Anti-virus software blocking Sitebulb could be one cause for this.
What we refer to as our 'Chrome Crawler' utilizes a version of headless Chromium, which is basically a developer-friendly, open-source version of the Chrome browser we all use every day.
Anti-virus software can be both aggressive and inconsistent, particularly when it comes to something like headless Chromium, which certainly CAN be used in malware or adware (even though Sitebulb absolutely does not do anything dodgy).
8.1 Antivirus software like AVG
Open your anti-virus software and check blocked apps
Go into your anti-virus settings and find 'Blocked Apps' (or the software's equivalent).
For example, this is what we see in AVG:
Check quarantined files
You might also find Sitebulb or some of its files in 'quarantine':
Exclude Sitebulb from anti-virus blocks
If Sitebulb is being blocked by anti-virus software, go ahead and remove any blocks, and add the entire Sitebulb folder as an exception:
The Sitebulb folder will have the following path:
On Mac:
/Users/<USER>/Library/Application Support/Sitebulb
OR
/Users/<USER>/.local/share/Sitebulb
On Windows:
c:\ProgramData\Sitebulb
OR
c:\Users\<USER>\AppData\Local\Sitebulb
This should then look something like this:
This should stop Sitebulb from being targeted by anti-virus software in the future.
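If you are not sure which of the folder paths listed above applies to your machine, a short script can check for you. This is a minimal Python sketch, not a Sitebulb tool. It assumes the four locations given above, with your home folder standing in for the <USER> part of each path. The function names are hypothetical.

```python
from pathlib import Path


def candidate_folders(home=None):
    """Return the Sitebulb folder locations listed above for this user."""
    home = Path(home) if home is not None else Path.home()
    return [
        home / "Library" / "Application Support" / "Sitebulb",  # Mac
        home / ".local" / "share" / "Sitebulb",  # Mac (alternative)
        Path("C:/ProgramData/Sitebulb"),  # Windows
        home / "AppData" / "Local" / "Sitebulb",  # Windows (alternative)
    ]


def existing_folders(candidates):
    """Keep only the candidates that exist as folders on this machine."""
    return [folder for folder in candidates if folder.is_dir()]


if __name__ == "__main__":
    found = existing_folders(candidate_folders())
    for folder in found:
        print(folder)
    if not found:
        print("No Sitebulb folder found in the usual locations")
```

Whichever folder it prints is the one to add as an exception in your anti-virus software.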
8.2 Windows Defender
Windows Defender can also misidentify executable files like headless Chromium within the Sitebulb folder as a threat.
Check that Windows Defender is up to date
First, check that Windows Defender is up to date and install any pending updates.
Check Quarantined Files
Navigate to Windows Security > Virus & threat protection > Current threats > Protection history to check recent threats identified by Windows Security.
If Windows has misidentified Sitebulb's headless Chromium files as a threat, you will most likely find chrome-headless-shell.exe under quarantine, where you can restore it.
Headless Chromium file path example:
c:\Users\<USER>\AppData\Local\Sitebulb\Browsers\ChromeHeadlessShell\Win64-128.0.6613.119\chrome-headless-shell-win64\chrome-headless-shell.exe
This file is not a threat. It is essential to Sitebulb's pre-audit checks, and to running the Chrome Crawler and auditing websites that depend on JavaScript.
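To confirm whether the headless Chromium file is still in place (or has been removed by Windows Defender), you could search the Sitebulb folder for it. The Python sketch below is one way to do that, under the assumption that the file lives somewhere beneath a Browsers subfolder, as in the example path above. The version number in that path will vary, so the script searches rather than hard-coding it. The function name and the default folder used at the bottom are our own assumptions.

```python
from pathlib import Path


def find_headless_shell(sitebulb_folder, filename="chrome-headless-shell.exe"):
    """Return every copy of the headless Chromium executable under the folder."""
    browsers = Path(sitebulb_folder) / "Browsers"
    if not browsers.is_dir():
        return []
    # Search all subfolders, since the version folder name changes over time.
    return sorted(browsers.rglob(filename))


if __name__ == "__main__":
    # Assumed location; adjust if your Sitebulb folder is elsewhere.
    folder = Path.home() / "AppData" / "Local" / "Sitebulb"
    matches = find_headless_shell(folder)
    if matches:
        for match in matches:
            print("Found:", match)
    else:
        print("chrome-headless-shell.exe not found under", folder)
```

If nothing is found, the file has most likely been quarantined or deleted, and restoring it or reinstalling Sitebulb should bring it back.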
Exclude Sitebulb files from Windows Defender
To prevent Windows Defender from interfering with Sitebulb files, whitelist the Sitebulb folder.
Your Sitebulb folder path will look something like this:
c:\ProgramData\Sitebulb
OR
c:\Users\<USER>\AppData\Local\Sitebulb
If you are not sure, you can find the exact location of the Sitebulb folder on your machine by navigating to Your Account > Logging. Just make sure Sitebulb is shut down before you go ahead with this and the following step.
Reinstall Sitebulb
In our experience, one of the most common things that anti-virus software does is actually delete the installed Chromium .exe file out of the Sitebulb folder. So after you whitelist the Sitebulb folder, please reinstall the latest version of Sitebulb.
Do not worry, you will not lose any of your old audits or anything - this is just like applying an update.
Check your site on the Single Page Analysis tool
Once you have reinstalled Sitebulb, open it up, head back to the Single Page Analysis tool, and try again.
If it works...huzzah! You've fixed it. Now you should be able to go back and run your audit again.
If it doesn't work, move on to step 9.
9. Check your firewall isn't blocking Sitebulb
As with anti-virus software, your firewall could be blocking Sitebulb from making outgoing connections (which it needs in order to crawl websites).
Check that Sitebulb is in your 'allowed' list
Navigate to firewall settings and check that Sitebulb is marked as 'allowed'. If not, add it as an allowed app.
Check ports
Also check that ports 10401 and 10402 are allowed, as Sitebulb needs these ports to communicate.
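One rough way to test these ports is to try opening a connection to each of them while Sitebulb is running. The Python sketch below does this against your own machine (127.0.0.1). That is an assumption on our part, since it presumes Sitebulb listens on these ports locally. A result of "not reachable" may therefore mean the firewall is blocking the port, or simply that nothing is listening on it, so treat the result as a hint rather than a diagnosis. The function name is hypothetical.

```python
import socket


def port_accepts_connections(port, host="127.0.0.1", timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or otherwise unreachable.
        return False


if __name__ == "__main__":
    for port in (10401, 10402):
        state = "reachable" if port_accepts_connections(port) else "not reachable"
        print(f"Port {port}: {state}")
```

Run it once with Sitebulb open. If either port shows as not reachable, it is worth double-checking your firewall rules for that port.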
Check your site on the Single Page Analysis tool
Once you have adjusted your firewall settings, open up Sitebulb, head back to the Single Page Analysis tool, and try again.
If it works...huzzah! You've fixed it. Now you should be able to go back and run your audit again.
If it doesn't work, move on to step 10:
10. Contact Sitebulb support
If you've tried everything listed above, but Sitebulb STILL will not crawl properly, it is probably something we have never seen before. In which case we'll need to work with you to get to the bottom of the issue (which we will!).
The best place to report problems is through the messenger within the software.
You can also reach us through our support email address: [email protected].
Please provide the following information:
The website you are trying to audit
Screenshots of the error(s) you are seeing
Which steps you have taken to try and resolve it
Your Sitebulb log files (find out how to get these from here)
We'll look into it and figure out what we need to do to make it work!
Sitebulb's Chrome Crawler error messages and warnings
For reference, these are all of the error messages you may see related to the Chrome Crawler:
The Chrome Crawler is missing or corrupt.
The Chrome Crawler is not running. Please restart your computer.
The Chrome Crawler did not, or could not process the URL. Please restart your computer.
The Chrome Crawler failed to restart.
The Chrome Crawler could not process the URL.
The Chrome Crawler timed out whilst processing the URL.
The Chrome Crawler failed to return a response.
The Chrome Crawler took too long to respond.
The Chrome Crawler timed out, took too long to respond.
The Chrome Crawler canceled, took too long to respond.
Chrome Crawler navigation error: ERROR DETAILS.
The last one will look something like this: