Choosing the right settings for efficient auditing

Tips on how to choose the correct audit settings for efficient auditing

Choosing the correct audit settings is essential if you’re looking to audit efficiently and get accurate audit data. In this piece, we look at the key audit settings you may need to pay attention to in order to achieve this.

Audit Data

The first screen you’ll encounter while setting up your audit is the Audit Data tab. On this screen, you get to choose what reports will be included in your final Audit. Only the Search Engine Optimisation report is selected by default.

At this point, it is important to have a clear idea of what you are trying to achieve with your Audit and only enable the reports that will provide you with relevant data to meet those goals.

Remember that there is no limit to how many projects you can create in Sitebulb - so you may want to consider separating your auditing goals into different projects (i.e. you may want to set up a separate project to audit Performance).

While it may be tempting to enable everything, every report you enable adds processing time to each URL and slows down your crawl. Reports like Performance, Spelling, and Accessibility are particularly resource-intensive.

Key Tips

  • Have a clear idea of what data you need

  • Only enable the relevant reports

  • Create separate projects for different auditing goals

  • Use Advanced Settings to more granularly control the data included in your reports

The Crawler

When setting up your Sitebulb audit, you’ll be able to choose between the HTML and Chrome Crawler.

The HTML Crawler performs traditional HTML extraction; it is the fastest option and is suitable for most sites.

The Chrome Crawler renders JavaScript using the latest stable version of Chromium. Choosing this option will slow down the crawl, but will ensure that all relevant content is rendered and analyzed by Sitebulb if your website uses a JS framework or relies on JS to render some of its content.
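
If you want a quick manual check outside Sitebulb, the sketch below illustrates the underlying idea: look at the raw HTML response (what the HTML Crawler sees) and check whether it carries real content or just an empty shell for a JavaScript framework to fill in. It is a rough heuristic only; the URL is a placeholder and it assumes the Python ‘requests’ package is installed.

    # Rough heuristic for spotting JavaScript-rendered content: fetch the raw
    # HTML (what the HTML Crawler sees) and check whether it contains visible
    # text or just an empty shell for a JS framework to populate.
    import re
    import requests

    url = "https://www.example.com/"  # placeholder - use a page from your site
    html = requests.get(url, timeout=10).text

    # Strip scripts and tags to approximate the visible text in the raw response.
    text = re.sub(r"<script.*?</script>", " ", html, flags=re.S | re.I)
    text = re.sub(r"<[^>]+>", " ", text)
    visible_words = len(text.split())

    # Common markers left behind by client-side rendering frameworks.
    framework_markers = ["__NEXT_DATA__", "ng-app", 'id="root"', 'id="app"']
    markers_found = [m for m in framework_markers if m in html]

    print(f"Visible words in raw HTML: {visible_words}")
    print(f"Framework markers found: {markers_found or 'none'}")
    # Very little visible text plus framework markers suggests the site relies
    # on JavaScript, so the Chrome Crawler is the safer choice.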

Key Tips

  • Run a sample crawl with the Chrome Crawler or use the Single Page Analysis tool to determine whether your website renders content with JavaScript. Adjust the crawler accordingly.

Crawl Speed

Crawl speed is determined by two key factors: the number of threads (or Chrome instances) and the URLs per second limit. The number of threads determines how many requests Sitebulb can send to your server at any one time. The URLs per second limit sets a maximum for how many HTML URLs Sitebulb can process per second.

By default, Sitebulb is set to crawl with 5 threads (or instances of Chrome) and a limit of 5 URLs per second. While this will work for most websites, you may want to adjust the crawl speed in order to crawl faster or to stay within the limitations of your server.
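
As a rough illustration of how these settings translate into crawl time, the sketch below estimates how long different site sizes would take at the default limit versus a raised one. The site sizes and the 20 URLs per second figure are illustrative assumptions only, and real crawls also depend on server response times and the reports you have enabled.

    # Back-of-the-envelope crawl time estimate from the URLs per second limit.
    # The site sizes and the raised 20 URL/s figure are illustrative only.
    def crawl_time(total_urls: int, urls_per_second: float) -> str:
        seconds = total_urls / urls_per_second
        hours, remainder = divmod(int(seconds), 3600)
        return f"{hours}h {remainder // 60}m"

    for total in (10_000, 100_000, 500_000):
        print(f"{total:>7} URLs: {crawl_time(total, 5)} at 5 URL/s, "
              f"{crawl_time(total, 20)} at 20 URL/s")

At the default limit, for example, a 100,000-URL site needs roughly five and a half hours of HTML processing alone, before rendering or report-specific work is taken into account.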

Key Tips

  • Understand what your server can handle before you adjust (and especially before you increase) crawl speed settings

  • If appropriate, increase the number of threads or instances of Chrome to provide Sitebulb with more crawl resources

  • Increase or remove the URLs per second limit to allow Sitebulb to crawl faster.

Crawl Sources

By default, ‘Crawl Website’ is selected as the crawl source for every new project you set up. With this setting, Sitebulb will perform a website crawl, beginning from your start URL and following links on every page to discover new URLs until every page on the website is crawled.

This means Sitebulb will only find pages that are linked internally. If you are working with a site with poor architecture or internal linking, you may want to add additional crawl sources in order to find every URL.

Key Tips

  • Add XML Sitemaps as a Crawl Source to ensure all key pages are crawled

  • If you are connecting to Google Analytics or Google Search Console, you can also opt to extract URLs found in these platforms and include them in your crawl.

  • If you are using inclusion or exclusion rules to limit your crawl or know of any directories of the site that are not linked internally, use the URL Seed List to help Sitebulb find and crawl all your internal URLs (see the sketch below for one way to build a seed list).
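
If you want to assemble a seed list by hand, the sketch below shows one way to pull URLs out of an XML sitemap into a plain-text list. Sitebulb can already read sitemaps directly as a crawl source, so this is only useful when you want to hand-pick or edit the list; the sitemap URL and output filename are placeholders, and it assumes the Python ‘requests’ package is installed.

    # Minimal sketch: turn an XML sitemap into a plain-text URL list that can
    # be pasted into a URL Seed List.
    import xml.etree.ElementTree as ET
    import requests

    sitemap_url = "https://www.example.com/sitemap.xml"  # placeholder
    root = ET.fromstring(requests.get(sitemap_url, timeout=10).content)

    # Standard sitemap namespace for <loc> entries. If this is a sitemap index,
    # the <loc> entries point at child sitemaps, which would need fetching too.
    ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    urls = [loc.text.strip() for loc in root.findall(".//sm:loc", ns) if loc.text]

    with open("seed-list.txt", "w") as f:
        f.write("\n".join(urls))

    print(f"Wrote {len(urls)} URLs to seed-list.txt")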

Robot Directives

Robot Directives settings allow you to control how Sitebulb's crawler identifies itself to the website server when making HTTP requests, and how Sitebulb responds to the robot directives it encounters.

By default, Sitebulb respects all robot directives, including those found in your robots.txt file and in HTTP response headers.
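
To see why the user agent matters here, the sketch below uses Python's standard robots.txt parser to check the same URL against different user agents; the same file can allow one UA and block another. The URL and user agent strings are placeholders, and this has nothing to do with Sitebulb's internal implementation.

    # Illustration of why the user agent matters: the same robots.txt can
    # allow one UA and block another. Uses Python's standard robots parser;
    # the URL and user agent strings are placeholders.
    from urllib.robotparser import RobotFileParser

    parser = RobotFileParser("https://www.example.com/robots.txt")
    parser.read()

    test_url = "https://www.example.com/private/page"
    for user_agent in ("Sitebulb", "Googlebot", "*"):
        allowed = parser.can_fetch(user_agent, test_url)
        print(f"{user_agent:>10}: {'allowed' if allowed else 'blocked'}")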

Key Tips

  • The default Sitebulb UA will work in most instances, but you can adjust the UA under Robots Directive Settings to emulate different bots or search engines.

  • You can also set a Custom user agent and Custom robots.txt UA from here.

  • Select ‘Is Staging Site’ in the robots politeness settings if you are auditing a website in development.

Running sample audits

If you are unsure of what data you need or what to expect when crawling a new site for the first time, running a sample audit can be a great way of getting an idea of what a full audit will look like and whether you need to adjust key settings.

This is particularly relevant if you are preparing to audit a large website: the process can be time-intensive, so you will want the final result to be relevant and accurate.

Sample audits are also a great way to carry out resource-intensive auditing such as the Performance report, since you will likely only need to audit a sample of URLs to identify the key issues present across your site.

Key Tips

  • Run sample audits before crawling large websites and use the resulting data to determine whether you need to adjust settings or set any exclusions and limitations. For more detail, see the guide on how to crawl large websites.

  • Run sample audits for resource-intensive reports like performance. You can limit the 'Maximum URLs to Audit' or use a URL list of your key pages as the crawl source to ensure Sitebulb audits your key pages.

Limiting the crawl

Just as important as knowing what data you need in your crawl is knowing what data you don’t. Limiting your crawl to the data points and areas of your site that you are trying to analyze can help you find key insights and opportunities for improvement faster.

This may mean excluding external URLs, crawling only the areas of the site you want to analyze, or excluding unnecessary parameterized links and scripts.
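
As a rough illustration of what ‘parameterized’ URLs are, the sketch below separates clean URLs from ones carrying query parameters that typically add noise to a crawl. This is not Sitebulb's exclusion rule syntax, just a sketch of the idea; the URLs and parameter names are made-up examples.

    # Sketch of what "parameterized" URLs look like: the same page reached via
    # query strings that rarely need crawling. Not Sitebulb's exclusion syntax.
    from urllib.parse import urlparse, parse_qs

    urls = [
        "https://www.example.com/shoes/",
        "https://www.example.com/shoes/?sort=price&page=3",
        "https://www.example.com/shoes/?utm_source=newsletter",
    ]

    # Parameters that typically create duplicate or low-value URLs.
    noisy_params = {"sort", "page", "utm_source", "utm_medium", "sessionid"}

    for url in urls:
        params = set(parse_qs(urlparse(url).query))
        verdict = "exclude" if params & noisy_params else "keep"
        print(f"{verdict:>7}: {url}")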

Key Tips

  • Disable crawling of external URLs if you do not need that data

  • Use inclusion and exclusion rules to restrict the crawl to the areas of the site you are analyzing

  • Exclude unnecessary parameterized URLs and scripts

