
Content Search Settings

An overview of content search settings and how to search a website for a word, phrase, or string of text

Updated yesterday

This guide gives a step-by-step walkthrough on how to use Sitebulb to search a website for a word, phrase, or string of text, and find every page on which it appears.

Sitebulb's 'Content Search' feature allows you to configure the crawler to search for a specific word or string of text on every page that it crawls. This data is then presented in the Content Search report.

How to add Content Search to your website audit

To add content search to your audit, enable Content Search in your audit settings.

Once you have toggled Content Search on, use the green Add Rule button to set your search parameters.

The Add Rule button will open up the on-screen rule wizard.

At the top of the window, you'll find three tabs:

  • Basic - Use the Basic tab to set up simple searches for a single word or string of text.

  • Advanced - Use the Advanced tab to set up searches containing multiple related terms and exceptions.

  • URLs - Use the URLs tab to determine on which pages Sitebulb should perform the content search.

Keep reading to learn more about how to set up basic and advanced rules.

Once you're done adding rules and any other audit setup configurations, hit Start Now at the bottom right of the screen to start the audit.

Basic Content Search

For most basic searches, all you need to do is enter the word or text to find in the top box and hit 'Add Rule'.

[Image: Enter text to search]

The default values for the remaining settings work well in most cases, but you can use them to refine your search if necessary.

Basic settings - explanation

Let's dig into what each option means in more detail:

  • Word or text to Find - This is the search term that Sitebulb will search for when crawling each page on your website.

    Sitebulb uses a phrase match, so the search text 'ski goggles' will match on a string like 'best ski goggles' but not on a string like 'best ski or snowboard goggles'.

  • Ignore case - Choose whether Sitebulb should ignore capitalization when matching the search term.
    For example, if your Word or text to find is set to 'ski goggles' and Ignore case is ticked, Sitebulb will match on a string like 'Ski Goggles' or 'SKI goggles'. When Ignore case is unticked, the search will only match the lowercase 'ski goggles'.

  • Element to Search - Choose from the dropdown to select which HTML element Sitebulb should search. The default setting of 'All html elements' works well in most cases, but we will explore some other examples below.

  • Search In - The options here are 'Text Only' or 'HTML and Text.'
    The 'Text Only' option will only search the visible text on the page, while the 'HTML and Text' option will also search in the HTML (e.g., meta descriptions).
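To make the phrase-match and Ignore case behaviour concrete, here is a minimal Python sketch. It is an illustration under assumptions, not Sitebulb's actual implementation: the helper name `count_phrase` is hypothetical, and it assumes a match is a literal, contiguous occurrence of the search text, with no word-boundary checks.

```python
def count_phrase(text: str, phrase: str, ignore_case: bool = True) -> int:
    """Count non-overlapping, literal occurrences of `phrase` in `text`.

    Hypothetical helper: the search text must appear as one contiguous
    string (same word order and spacing). Word boundaries are not checked.
    """
    if ignore_case:
        # casefold() is a stronger lower() intended for caseless comparison
        text, phrase = text.casefold(), phrase.casefold()
    return text.count(phrase)


# Phrase match: the words must appear together, in order
print(count_phrase("best ski goggles", "ski goggles"))               # 1
print(count_phrase("best ski or snowboard goggles", "ski goggles"))  # 0

# Ignore case ticked (default) vs unticked
print(count_phrase("Ski Goggles and SKI goggles", "ski goggles"))                     # 2
print(count_phrase("Ski Goggles and SKI goggles", "ski goggles", ignore_case=False))  # 0
```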

Most of these options are intuitive and straightforward to test and verify yourself. The 'Element to Search' option, however, is more nuanced and needs further explanation.

Element to Search explained

For a start, there are a number of options in the 'Element to Search' dropdown, which refer to the HTML structure of the webpage:

[Image: Basic HTML structure]

  • All html elements will search the entire HTML of the page for a match with your search terms.

  • In the <head> element - will search only in the head section of the page HTML.

  • In the <body> element - will search only in the body section of the page HTML.

  • In the <body> but not <a> - will search in the <body> section of the page HTML only, but it will not include any anchor (<a>) elements (links).

  • A specific element - allows you to select a specific page element you wish to scrape and search.
    When you select 'A specific element' under the 'Element to Search' setting, a new box appears underneath. Here is where you enter the CSS selector for the specific element you wish to scrape.
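The sketch below models the first four 'Element to Search' options using Python's built-in `html.parser`. It is a hypothetical illustration rather than how Sitebulb works internally: the class and function names are made up, it only looks at text nodes (roughly a 'Text Only' search), it does not filter out script or style contents, and it does not render JavaScript.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they are not tracked on the stack
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ScopedTextCollector(HTMLParser):
    """Sort text nodes into buckets mirroring the 'Element to Search' options."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: list[str] = []  # tags that are currently open
        self.scopes: dict[str, list[str]] = {
            "all": [], "head": [], "body": [], "body_not_a": []}

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        self.scopes["all"].append(data)
        if "head" in self.stack:
            self.scopes["head"].append(data)
        if "body" in self.stack:
            self.scopes["body"].append(data)
            if "a" not in self.stack:  # skip text inside links
                self.scopes["body_not_a"].append(data)


def count_in_scopes(html: str, phrase: str) -> dict[str, int]:
    """Count case-insensitive occurrences of `phrase` in each scope."""
    parser = ScopedTextCollector()
    parser.feed(html)
    parser.close()
    needle = phrase.casefold()
    counts = {}
    for scope, chunks in parser.scopes.items():
        # Collapse whitespace so text split across tags can still match
        text = " ".join(" ".join(chunks).split())
        counts[scope] = text.casefold().count(needle)
    return counts


page = """
<html><head><title>JavaScript Crawling guide</title></head>
<body>
  <nav><a href="/js">JavaScript Crawling</a></nav>
  <p>Sitebulb supports javascript crawling with Chrome.</p>
</body></html>
"""
print(count_in_scopes(page, "javascript crawling"))
# {'all': 3, 'head': 1, 'body': 2, 'body_not_a': 1}
```

The navigation link text counts towards `body` but is dropped from `body_not_a`, which mirrors the 'javascript crawling' navigation example later in this guide.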

How to pick the CSS selector

The CSS Selector allows you to pick out a specific section from a page template.

Consider a typical e-commerce product page: I may only be interested in searching the 'content text' portion of the page, not the navigation elements or boilerplate copy in the body. So I need to pick out the selector that contains the relevant text.

To choose the selector, use the 'Inspect' feature in Chrome.

Find the section of the page you wish to search in and select it so that the corresponding element is highlighted in the Elements panel.

[Image: Select CSS selector in Chrome]

In this instance, I can see that the selector I need is: div.product-description-content-text

You can now right-click > Copy > Copy selector to copy the CSS selector for this area.

For clarity, here is how the rule looks once set in Sitebulb:

[Image: Added CSS Selector]
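As a rough illustration of what scoping a search to a CSS selector does, this Python sketch counts a phrase only inside elements matching a simple `tag.class` selector. It is a hypothetical helper, not Sitebulb's selector engine: it supports only the `tag.class` form, whereas Chrome's Copy selector often produces longer selectors (with IDs or `>` combinators) that this sketch cannot parse.

```python
from html.parser import HTMLParser


class SelectorTextCollector(HTMLParser):
    """Collect text inside elements matching a simple `tag.class` selector."""

    def __init__(self, selector: str) -> None:
        super().__init__()
        # "div.product-description-content-text" -> ("div", "product-description-content-text")
        self.target_tag, _, self.target_class = selector.partition(".")
        self.depth = 0  # > 0 while inside a matching element
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != self.target_tag:
            return
        if self.depth:
            self.depth += 1  # nested element with the same tag name
        elif self.target_class in (dict(attrs).get("class") or "").split():
            self.depth = 1  # entered a matching element

    def handle_endtag(self, tag):
        if tag == self.target_tag and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def count_in_selector(html: str, selector: str, phrase: str) -> int:
    """Case-insensitive phrase count restricted to the selected element(s)."""
    parser = SelectorTextCollector(selector)
    parser.feed(html)
    parser.close()
    text = " ".join(" ".join(parser.chunks).split())  # normalise whitespace
    return text.casefold().count(phrase.casefold())


page = """
<div class="nav">Free delivery on all goggles</div>
<div class="product-description-content-text">
  <p>Anti-fog goggles with a wide field of view.</p>
  <div class="spec">Goggles lens: photochromic</div>
</div>
"""
print(count_in_selector(page, "div.product-description-content-text", "goggles"))  # 2
```

The mention in the navigation `div` is ignored, while the nested `div.spec` is still counted because it sits inside the selected element.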

Advanced Content Search

Advanced Content Search rules give you more granular control over the word, text, or regex patterns you want Sitebulb to match. At the top of the Advanced tab, you'll find two boxes that allow you to set these rules:

[Image: Advanced Setup]

  • Words, text, or regex patterns to find - in this box, you can provide a list of terms for Sitebulb to find, one per line. Sitebulb will treat these as an OR statement and match URLs that contain ANY of the terms in this list.

  • And does not contain these words, text, or regex patterns - In this box, you can optionally add search terms to exclude. Sitebulb will provide a count of URLs that match ANY of the words, text, or regex patterns in the first box and DO NOT match ANY of the terms in this box.
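The OR and exclusion logic described above can be sketched in Python as follows. This is a hypothetical model, not Sitebulb's code: each line is treated as a regular expression (plain words are valid patterns too), matches of all find patterns are summed into a single count, and a page that matches any exclusion pattern is reported here as 0, which is an assumption about how excluded pages are recorded.

```python
import re
from collections.abc import Sequence


def advanced_rule_count(text: str, find: Sequence[str],
                        exclude: Sequence[str] = (),
                        ignore_case: bool = True) -> int:
    """Hypothetical model of an Advanced rule applied to one page's text."""
    flags = re.IGNORECASE if ignore_case else 0
    # AND NOT: any exclusion match disqualifies the page (reported as 0 here)
    if any(re.search(pattern, text, flags) for pattern in exclude):
        return 0
    # OR: every match of every find pattern adds to the same single count
    return sum(len(re.findall(pattern, text, flags)) for pattern in find)


find = ["skiing", "snowboarding", "ice skating"]
exclude = ["france", "spain", "italy", "austria"]

us_page = ("Skiing in Colorado: skiing, skiing and more skiing. "
           "Snowboarding too, plus snowboarding lessons.")
eu_page = "Skiing and ice skating holidays in Austria."

print(advanced_rule_count(us_page, find, exclude))  # 6 (4 + 2 + 0)
print(advanced_rule_count(eu_page, find, exclude))  # 0 (mentions Austria)
```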

The rest of the features work exactly like they do in the Basic search tab. Please refer to the content above for more details.

URL matching

In the URLs tab, you can enter inclusion or exclusion patterns so that Sitebulb will only perform the content search analysis on specific pages.

By default, Sitebulb will perform the content search on every single page on the website. This means you are asking Sitebulb to do more work in terms of processing, and more data will be stored once the audit data has been collected. For smaller websites, the size and scale of the additional resource requirements is negligible.

However, Sitebulb can handle websites with millions of pages, and at this sort of scale, you might want to reduce the amount of processing work Sitebulb has to do while crawling, to cut crawl time and resource use.

Let's look at how to add inclusion and exclusion rules in the URLs tab.

Adding URL exclusion patterns

To exclude particular directories from the Content Search, enter the paths to exclude preceded by a minus (-) sign.

For example, if we wanted to find pages that mention 'crawler' but didn't want to perform the search on any of our /documentation/ pages, we would enter the /documentation/ path with a minus (-) sign ahead of it:

  • -/documentation/

[Image: Exclude docs pages]

In the resulting Content Search URL list, the /documentation/ pages are listed as 'Not Set', so you can differentiate pages where Sitebulb simply did not perform the search from the legitimate zeroes (pages where Sitebulb searched and did not find a match).

[Image: Documentation pages not set]
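A minimal Python sketch of how an exclusion pattern separates 'Not Set' from a genuine zero. The function name is hypothetical, and treating a pattern as a substring of the URL path is an assumption; Sitebulb's exact pattern-matching rules may differ.

```python
from typing import Optional


def content_search(url_path: str, text: str, phrase: str,
                   patterns: list[str]) -> Optional[int]:
    """Return the match count, or None ('Not Set') if the URL is excluded.

    Hypothetical: patterns starting with '-' are exclusions, matched as
    substrings of the URL path.
    """
    excluded = [p[1:] for p in patterns if p.startswith("-")]
    if any(p in url_path for p in excluded):
        return None  # search never performed on this URL
    return text.casefold().count(phrase.casefold())  # may be a real zero


patterns = ["-/documentation/"]
pages = {
    "/blog/how-crawlers-work/": "A crawler follows links. Our crawler is fast.",
    "/pricing/": "Plans for every team.",
    "/documentation/crawler-settings/": "Configure the crawler here.",
}
for path, text in pages.items():
    result = content_search(path, text, "crawler", patterns)
    print(path, "Not Set" if result is None else result)
# /blog/how-crawlers-work/ 2
# /pricing/ 0
# /documentation/crawler-settings/ Not Set
```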

Adding inclusion patterns

Alternatively, you can add inclusion patterns to limit the content search to certain directories. For example, if we only wanted to check for the word on our product and feature pages, we could limit the search to /product/ and /features/ pages by listing those directories:

  • /product/

  • /features/

[Image: Only product or features URLs]

[Image: Results true zeroes]

The URL matching works for both the Basic and Advanced rules, and it is defined individually for each new rule you add, so you can get super specific with your setup.
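Because URL patterns are defined per rule, one page can be searched for one rule and 'Not Set' for another. The hypothetical `Rule` class below sketches that, combining inclusion patterns (the path must contain at least one) with exclusion patterns (prefixed with '-'). Substring matching on the path is again an assumption, not documented Sitebulb behaviour.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Rule:
    """Hypothetical rule model: a phrase plus its own URL patterns."""
    name: str
    phrase: str
    patterns: list[str] = field(default_factory=list)

    def in_scope(self, path: str) -> bool:
        includes = [p for p in self.patterns if not p.startswith("-")]
        excludes = [p[1:] for p in self.patterns if p.startswith("-")]
        # With inclusion patterns, the path must contain at least one of them
        if includes and not any(p in path for p in includes):
            return False
        return not any(p in path for p in excludes)

    def run(self, path: str, text: str) -> Optional[int]:
        if not self.in_scope(path):
            return None  # 'Not Set' for this rule only
        return text.casefold().count(self.phrase.casefold())


rules = [
    Rule("Crawler (products/features)", "crawler", ["/product/", "/features/"]),
    Rule("Crawler (everywhere)", "crawler"),
]
pages = {
    "/product/sitebulb-desktop/": "A desktop crawler.",
    "/features/js-rendering/": "Render pages before analysis.",
    "/blog/news/": "New crawler release.",
}
for path, text in pages.items():
    print(path, [rule.run(path, text) for rule in rules])
# /product/sitebulb-desktop/ [1, 1]
# /features/js-rendering/ [0, 0]
# /blog/news/ [None, 1]
```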

Adding Content Search rules in bulk

If you have a list of words or phrases that you want to search for, use the 'Add Multiple Rules' button in order to add them in bulk. This allows you to add multiple basic search rules at once.

To add them, enter your words, text, or regex patterns one per line, or copy and paste them into the box.

The rest of the settings work exactly like the single 'Basic' configuration above.

While this feature does not give you the granularity to configure each word differently, it allows you to bulk upload hundreds or thousands of phrases all at once.

When the report is complete, each rule will display as if you had entered it one by one. This is different from the advanced search setup, where all instances of the search terms listed in your advanced rule are counted against the same column in the report.

[Image: Bulk Content Search]
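To illustrate the difference between bulk-added rules and a single Advanced rule, here is a small Python sketch. The function names are hypothetical, and for simplicity it treats each entry as a literal, case-insensitive phrase rather than a regex pattern.

```python
def bulk_columns(text: str, phrases: list[str]) -> dict[str, int]:
    """'Add Multiple Rules' style: one rule, and one report column, per phrase."""
    lowered = text.casefold()
    return {phrase: lowered.count(phrase.casefold()) for phrase in phrases}


def advanced_column(text: str, rule_name: str, phrases: list[str]) -> dict[str, int]:
    """Advanced style: all phrases count towards one shared, named column."""
    lowered = text.casefold()
    return {rule_name: sum(lowered.count(p.casefold()) for p in phrases)}


page_text = "Skiing, skiing and snowboarding."
phrases = ["skiing", "snowboarding", "ice skating"]
print(bulk_columns(page_text, phrases))
# {'skiing': 2, 'snowboarding': 1, 'ice skating': 0}
print(advanced_column(page_text, "Winter Sports", phrases))
# {'Winter Sports': 3}
```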

A note on scale

While the 'Add Multiple Rules' feature allows you to add thousands of search rules at once, the resulting Content Search report will only load 50 columns at a time. That means analysing the data within the tool would involve working through it in batches, adding and removing columns as you go.

If you add a large number of rules in bulk, the best way to access the data is to use the green Export All Search Data button at the top of your report to analyse the data outside of Sitebulb.
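Once exported, the data can be analysed with any spreadsheet or scripting tool. The Python sketch below reads a CSV export and lists the URLs with the most matches for one rule. The column names ('URL' plus one column per rule) and the 'Not Set' cell value are assumptions, so check the headers in your own export before adapting it.

```python
import csv
import io


def top_urls(csv_file, rule_column: str, limit: int = 5) -> list[tuple[str, int]]:
    """Return (URL, count) pairs with the most matches for one rule column.

    Assumes hypothetical headers: a 'URL' column plus one column per rule.
    Cells that are not whole numbers (e.g. 'Not Set' or blank) are skipped.
    """
    rows = []
    for row in csv.DictReader(csv_file):
        value = (row.get(rule_column) or "").strip()
        if value.isdigit():
            rows.append((row["URL"], int(value)))
    rows.sort(key=lambda pair: pair[1], reverse=True)  # most matches first
    return rows[:limit]


# In-memory stand-in for an exported file
export = io.StringIO(
    "URL,ski goggles,out of stock\n"
    "https://example.com/a,3,0\n"
    "https://example.com/b,7,1\n"
    "https://example.com/docs,Not Set,Not Set\n"
)
print(top_urls(export, "ski goggles"))
# [('https://example.com/b', 7), ('https://example.com/a', 3)]
```

With a real export you would pass a file opened with `open(path, newline="")` (adjusting the encoding to match the file) instead of the in-memory sample.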

Viewing Content Search data

Once your audit is complete, you will find the Content Search report in the left-hand menu of your audit.

On the Content Search overview, you'll find a list of your content search rules alongside key data:

The two data columns tell you slightly different things:

  • Total Found refers to the total number of instances of the search term(s) for each rule that Sitebulb found, including multiple instances on the same page.

  • Found on URLs represents the number of unique URLs that Sitebulb found the phrase on.

[Image: Content search overview]
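The relationship between the two overview columns can be shown with a short Python sketch, assuming a hypothetical mapping of URL to match count for one rule, where None stands for a 'Not Set' page that was never searched.

```python
from typing import Optional


def overview(per_url_counts: dict[str, Optional[int]]) -> tuple[int, int]:
    """Return (Total Found, Found on URLs) for one rule."""
    searched = [c for c in per_url_counts.values() if c is not None]  # skip 'Not Set'
    total_found = sum(searched)                        # every instance, repeats included
    found_on_urls = sum(1 for c in searched if c > 0)  # unique URLs with a match
    return total_found, found_on_urls


counts = {"/a/": 3, "/b/": 0, "/c/": 1, "/documentation/x/": None}
print(overview(counts))  # (4, 2)
```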

To analyse the data, we can move on to the URLs tab, where we'll find a list of all the URLs crawled. You will see a data column for each search rule with data on how many instances of the corresponding search term were found on each page.

[Image: Content Search URL List]

We can quickly sort this data by clicking the column heading for any rule we want to sort by.

[Image: Sort URL List Data]

As always with URL Lists, you can add or remove columns so that you can easily combine technical crawl data with your content search data. You can also create filters on the data to gain additional insights.

[Image: Advanced filter on content search]

That is the basic setup, and this simple process will allow you to easily set up content searches and view the data in your results.

Final caveat - crawl with Chrome when necessary

The final thing to point out is that on some sites, content is loaded in via JavaScript. That means Sitebulb will need to render the JS in order to see this content and search within it.
If the website you are crawling renders content using JS, make sure to select Chrome Crawler in your audit settings.

Content Search use cases and examples

Some examples of key cases where the content search feature can help include:

  • Check if e-commerce product pages contain 'out of stock' messaging.

  • Check which product pages reference a particular brand name.

  • Understand which pages mention certain target keywords (for building internal links).

Element to Search Example

Let's say we wanted to point some more internal links at our JavaScript crawling page. If we search for the phrase 'javascript crawling' in the entire <html> or entire <body>, this will catch all the links in our top navigation panel:

[Image: JavaScript Crawling in header]

So literally every single page would get flagged. Not helpful at all.

This is when a setting like In the <body> but not <a> comes in handy. This specific option means that Sitebulb would search in the <body> section only, but it would not include any anchor (<a>) elements. In other words, search the body content but don't include any links.

If we instead choose '<body> but not <a>' then this would only pick up the instances where the phrase 'javascript crawling' is present in the non-link <body> elements.

Very helpful indeed.

Advanced Search example

Imagine we are auditing a travel website. We want to identify pages that talk about specific winter sports, so we could set it up like this:

[Image: Winter sports]

Once this rule is applied, Sitebulb would search for any pages that contain either 'skiing', 'snowboarding' or 'ice skating' (or any combination of the three).

When we take a look at the results, you can see the value in adding a rule name:

[Image: Advanced Results]

In this case, the numbers returned in the 'Winter Sports' column reflect the total number of matches. So a result of '6' might mean that 'skiing' is mentioned 4 times, 'snowboarding' 2 times and 'ice skating' not at all.

Now, imagine we wanted to identify pages that talk about specific winter sports, but only for certain countries. We could rule out specific countries by adding them in the right-hand 'does not contain' box, e.g.:

[Image: Winter sports not Europe]

Once this rule is applied, Sitebulb would search for any pages that contain either 'skiing', 'snowboarding' or 'ice skating' (or any combination of the three) AND ALSO contain none of 'france', 'spain', 'italy' and 'austria.'

This surfaces the pages about the USA and Canada instead of Europe, as we wanted:

[Image: Canada USA winter sports]

Using this combination approach allows you to do things like categorise pages based on topic, or group them based on a set of target keywords - which could then be used for content audits or internal linking strategies.

Not got Sitebulb yet?

Don't worry! You can register for a free Sitebulb trial here and get started straight away.
