Sitebulb’s standard Chrome Crawler settings are set to mimic the behavior of Googlebot and accommodate the rendering and loading time of most websites while maintaining crawl speed and efficiency.
These default settings allow Sitebulb to crawl most websites successfully. However, from time to time, you may come across websites with unique setups that require these settings to be adjusted in order to be crawled successfully.
This troubleshooting guide talks you through what to check if your pages are not being rendered correctly. Some of the issues that indicate this may be the case are:
Missing on-page elements (H1, title tags)
Missing content (page word count lower than expected or zero, no outgoing links found, content visibly missing from the returned screenshots)
Cloudflare blocks, reCAPTCHA, and other security features blocking page rendering
A note on testing settings
It is often not evident what is causing the rendering errors, so you may not know which settings you need to adjust. To test most of the suggestions below efficiently, you can run your page URLs through the Single Page Analysis tool and check the results returned with different settings.
Adjusting Crawler Settings
In most cases, rendering issues can be addressed by simply adjusting your project’s settings. The steps below outline the most common issues and settings adjustments to allow you to troubleshoot and crawl successfully.
Block Ad and Tracking Scripts
This setting is enabled by default since third-party URLs are likely to bloat the audit, and they are generally not valuable for technical auditing.
However, some websites depend on third-party scripts to load. When this is the case, the ‘Block third-party URLs’ setting needs to be disabled in order to crawl successfully.
It is worth noting that disabling this setting can affect the site's analytics, as the crawler will trigger page views.
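If you are unsure whether a site depends on third-party scripts, a quick check in the browser console can help. The sketch below (TypeScript, run in DevTools) simply lists the distinct external hostnames that serve scripts on the current page; it does not tell you whether those scripts are essential, but it shows what would be blocked.

```typescript
// List the distinct third-party hostnames that serve scripts on the current page.
// Run in the browser console (DevTools) on a page you suspect relies on external scripts.
const thirdPartyHosts = new Set<string>();
for (const script of Array.from(document.querySelectorAll<HTMLScriptElement>('script[src]'))) {
  const host = new URL(script.src, location.href).hostname;
  if (host !== location.hostname) {
    thirdPartyHosts.add(host);
  }
}
console.log([...thirdPartyHosts]);
```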
Infinite Scroll
If your website lazy-loads content and images, you will need to mimic the behavior of ‘scrolling’ down the page in order to ensure that all of the relevant content is loaded.
Enable Infinite Scroll to instruct Sitebulb to scroll down the page 10,000 pixels to load in lazy-loaded content and images.
Please bear in mind that if your website requires other user interaction, such as clicks, to load content, Sitebulb won't be able to pick this up.
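For context, the snippet below is a simplified sketch of a common lazy-loading pattern (the data-src attribute is just a typical convention, not something specific to your site). Images only receive their real URL once they scroll into view, which is why a crawler that never scrolls would report them as missing.

```typescript
// Simplified example of a common lazy-loading pattern: images only receive their
// real src once they scroll into the viewport, so a crawler that never scrolls
// down the page will miss them.
const observer = new IntersectionObserver((entries) => {
  for (const entry of entries) {
    if (entry.isIntersecting) {
      const img = entry.target as HTMLImageElement;
      img.src = img.dataset.src ?? ''; // swap in the real image URL
      observer.unobserve(img);
    }
  }
});

document.querySelectorAll<HTMLImageElement>('img[data-src]').forEach((img) => observer.observe(img));
```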
Enable Cookies
By default, Sitebulb does not save cookies as it crawls - this setting works for most websites. However, some websites use cookies to authenticate and enforce security measures.
You can choose to Enable Cookies under Crawler Settings in order to crawl your site successfully in these cases.
The cookies collected during the audit will also be reported under your Performance report.
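As an illustration of why this matters, the sketch below shows a hypothetical pattern (the cookie name session_verified and the /challenge path are invented for the example) where a page checks for a cookie set by an earlier security challenge and redirects back to the challenge if it is missing. A crawler that discards cookies would hit that challenge on every request.

```typescript
// Hypothetical sketch of a cookie-gated page: content is only rendered when a
// cookie set by an earlier security or consent check is present.
const hasSessionCookie = document.cookie
  .split('; ')
  .some((cookie) => cookie.startsWith('session_verified='));

if (!hasSessionCookie) {
  // Without cookies enabled, every request looks like a first visit,
  // so the challenge page is served instead of the real content.
  window.location.href = '/challenge';
}
```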
Flatten Shadow DOM
When crawling with Chrome, the ‘Flatten Shadow DOM’ setting will be enabled by default. When ‘Flatten Shadow DOM’ is enabled, the crawler can only see the content that is visible in the rendered HTML.
The way Sitebulb hydrates HTML means the crawler will open every node to check whether or not Shadow DOM is used. This means that when crawling with this setting enabled, the response can break if the HTML on the page is broken (whether or not the page uses Shadow DOM). Disabling this setting can fix the rendering issue.
Note that this setting is only relevant if your site uses Shadow DOM. If your site does not use Shadow DOM, we recommend disabling the setting, since it will slow down your crawl.
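If you are not sure whether your site uses Shadow DOM, a rough console check like the one below can help. It only detects open shadow roots, so treat a count of zero as a hint rather than proof.

```typescript
// Rough check for open shadow roots on the current page (run in the browser console).
// Closed shadow roots cannot be detected this way.
const shadowHosts = Array.from(document.querySelectorAll('*')).filter((el) => el.shadowRoot !== null);
console.log(`Elements with open shadow roots: ${shadowHosts.length}`, shadowHosts);
```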
Flatten Iframes
This setting is enabled by default to mimic Googlebot’s behavior - iframes are fitted into a div element in the rendered HTML of the parent page.
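As a rough illustration of the idea (not Sitebulb's exact implementation), flattening takes an iframe's rendered document and inlines it into a div in the parent page:

```typescript
// Illustrative sketch only: replace each same-origin iframe with a div containing
// the iframe's rendered content, so it becomes part of the parent page's HTML.
document.querySelectorAll('iframe').forEach((frame) => {
  const inner = frame.contentDocument?.body?.innerHTML; // only accessible for same-origin iframes
  if (inner !== undefined) {
    const wrapper = document.createElement('div');
    wrapper.innerHTML = inner;
    frame.replaceWith(wrapper);
  }
});
```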
Incognito (Session Isolation)
This is relevant for websites where ‘cross-tab tracking’ is in place, meaning that the crawling and rendering of one page can affect the functionality or content of another.
Enabling this setting will instruct Sitebulb to crawl in Incognito mode. This will isolate each session and avoid cross-tab tracking.
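As a simple example of what cross-tab tracking can look like, the sketch below shows two pages sharing state through localStorage (the promoDismissed key and .promo-banner selector are invented for the example). A value written while one page is crawled is visible to every other page on the same origin, which is exactly the kind of interaction session isolation prevents.

```typescript
// On one page: write a flag that every tab on the same origin can read.
localStorage.setItem('promoDismissed', 'true');

// On another page: the shared flag changes what this page renders.
if (localStorage.getItem('promoDismissed') === 'true') {
  document.querySelector('.promo-banner')?.remove();
}
```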
Enable Service Workers
Service workers are used on some websites to improve user experience and address potential connectivity issues.
Enable this setting if you know your website renders content or links via service workers. Use Dev Tools to verify the presence of registered service workers on your site.
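One quick way to check from the browser console (alongside the Application panel in Dev Tools) is to list the service worker registrations for the current origin:

```typescript
// Run in the browser console to list any service workers registered for the current origin.
if ('serviceWorker' in navigator) {
  navigator.serviceWorker.getRegistrations().then((registrations) => {
    console.log(`Registered service workers: ${registrations.length}`);
    registrations.forEach((reg) => console.log(reg.scope, reg.active?.scriptURL));
  });
} else {
  console.log('Service workers are not supported in this browser.');
}
```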
Considered Loaded Event
This setting lets you choose which event Sitebulb uses to decide that the page has finished loading before moving on to the next step.
‘Wait for the Load event’ is the recommended setting, as this fires once the whole page has loaded, including all dependent resources (stylesheets, fonts, images) - this will be suitable for most websites.
However, very occasionally, you may encounter websites where the page load process means that Sitebulb cannot collect all page data with this default setting.
In cases like this, you have a range of other options to select the event that marks the page as loaded, including 'Wait for the InteractiveTime event'.
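For reference, the snippet below illustrates the difference between the two standard browser events a page fires while loading: DOMContentLoaded fires once the HTML has been parsed, while load waits for dependent resources as well. Sitebulb's own event options, such as 'Wait for the InteractiveTime event', are chosen in the UI; this sketch only shows the underlying browser behaviour.

```typescript
// DOMContentLoaded fires once the HTML is parsed; load fires once dependent
// resources (stylesheets, fonts, images) have also finished loading.
document.addEventListener('DOMContentLoaded', () => {
  console.log('DOMContentLoaded: the HTML has been parsed');
});

window.addEventListener('load', () => {
  console.log('load: dependent resources have also finished loading');
});
```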
Render Timeout
Render timeout refers to the time Sitebulb will wait before gathering the HTML content from your pages. The wait happens after the 'Considered Loaded Event' occurs.
Increasing this timeframe means that Sitebulb will wait for longer, giving JavaScript-heavy pages time to load the content so you can capture as much HTML as possible.
However, note that this adds ‘waiting time’ to each URL crawled, which will impact the speed of your crawl. Only increase this setting if necessary.
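The sketch below shows the kind of page behaviour that makes a longer render timeout necessary: content injected by a delayed API call (simulated here with setTimeout) that only appears a few seconds after the load event has fired. A crawler that grabs the HTML immediately after 'load' would miss it.

```typescript
// Simulate content that arrives well after the load event has fired.
window.addEventListener('load', () => {
  setTimeout(() => {
    const section = document.createElement('section');
    section.textContent = 'Reviews loaded via a delayed API call';
    document.body.appendChild(section);
  }, 3000); // content appears ~3 seconds after the load event
});
```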
Still having issues?
This article outlines the settings that are most commonly the culprit of rendering issues. If you have followed this guide, tested different settings configurations, and are still having issues, don’t hesitate to reach out to us. We’ll work with you to get to the bottom of the issue and help you get efficient and accurate audit data.
The best way to report problems is through the messenger within the software.
You can also reach us through our support email address: [email protected]
