How to find broken links on a website
Updated over 5 months ago

In most website audits, including recommendations to fix broken links is a basic requirement: if they are present, you pretty much always want to report on them.

This is because broken links tick a bunch of boxes:

  • They are bad for SEO

  • They are bad for user experience

  • They are pretty easy to understand for clients/bosses/developers

So it's reasonably straightforward to get buy-in for fixing them. For background information on what broken links are and why they are important, check out our Ultimate Guide to Broken Links.

What are broken links?

Broken links are NOT pages that return a 404 response. Broken links are the links that point to these 404 pages (or 410 pages, which we'll come on to).

The difference is subtle but important. If Page-A returns a 404 'not found' status code, and Page-B contains a link that points at Page-A, we can describe this individual link as a 'broken link.'

How does Sitebulb define broken links?

Sitebulb classes a broken link as any internal link that points at a page with an HTTP status code of 404 or 410. Many other crawling tools group all 4XX errors together, but we don't believe this promotes the right mindset for tackling broken links.

  • 404 status means 'Not Found'

  • 410 status means 'Gone'

In essence, both of these mean that the resource is not available - the server is responding to say 'the content is not here.' This means that if a user lands on the page, they will not see any meaningful content.

By comparison, all other 4XX status codes do not mean 'the content isn't here', and in fact a lot of them mean 'you aren't allowed to view this content' (check here for a complete list).

Consider an example: a 403 'Forbidden' status.

403 Forbidden Status

With a 403, it is often the case that a normal user visiting the page in a browser can actually view the content, because 403 does not mean 'the content is not here.' Instead, 403 relates to an authorization issue, which can arise simply because the page is being accessed by a crawler rather than a browser.

This is why Sitebulb restricts the definition of 'broken links' to only be links to URLs which return 404 or 410.
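The rule above can be expressed as a small filter over crawl data. This is a minimal Python sketch, assuming each link is a (referring URL, target URL) pair and that you already have a mapping of crawled URLs to HTTP status codes; the names `BROKEN_STATUSES`, `is_broken_status` and `find_broken_links` are hypothetical illustrations, not part of Sitebulb.

```python
from typing import Dict, Iterable, List, Tuple

# Only 'Not Found' and 'Gone' count as broken under the definition above.
BROKEN_STATUSES = {404, 410}

Link = Tuple[str, str]  # (referring URL, target URL)


def is_broken_status(status: int) -> bool:
    """Return True for 404/410; other 4XX codes (e.g. 403) are not treated as broken."""
    return status in BROKEN_STATUSES


def find_broken_links(links: Iterable[Link], statuses: Dict[str, int]) -> List[Link]:
    """Return every link whose target URL responded with 404 or 410."""
    # Targets missing from the status map are skipped rather than assumed broken.
    return [
        (referring, target)
        for referring, target in links
        if target in statuses and is_broken_status(statuses[target])
    ]


statuses = {
    "https://example.com/": 200,
    "https://example.com/old-page": 404,
    "https://example.com/removed": 410,
    "https://example.com/members": 403,
}
links = [
    ("https://example.com/", "https://example.com/old-page"),
    ("https://example.com/", "https://example.com/removed"),
    ("https://example.com/", "https://example.com/members"),
]
print(find_broken_links(links, statuses))
```

The link to the 403 page is left out, matching the reasoning above: it may well work for a normal visitor.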

How to find broken links

Firstly, you'll need to crawl the website, so start a new website audit or project. Sitebulb collects links by default, so any broken links will automatically be picked up in your audit.

Links data is accessible in many different areas of the Sitebulb interface, allowing you to analyse widespread issues and then dig down into specific details. In this section we'll cover how to explore broken links within the Sitebulb interface, and further below we'll cover how to export the links for spreadsheet analysis.

Find all broken links to a single URL

If you want to investigate a single 404/410 URL, you can see the incoming links that point to it in the 'URL Details' view.

From a URL List (e.g. the list of 'Broken internal URLs'), click the burger menu alongside the URL to open the URL Details view, then navigate to Incoming Links.

URL Details Broken URLs

This page shows the incoming links pointing to the broken URL in more detail:

Incoming Links

Note: You can also navigate straight to the URL Details page for a single URL by pasting it into the 'Search URLs' box in the top right:

Find Specific URL

Find all broken links across a website

For this, you want to head to the 'Links' report, using the left-hand navigation.

Scroll down to see the 'Internal Link Status' table, as shown below, where you can find the 'Broken (404 or 410)' row:

Broken Links dashboard

You may have noticed the two columns above, 'All' and 'Unique'. The 'All' column counts every single link found, whereas 'Unique' counts links with a unique combination of anchor text, target URL and link location (i.e. a templated header link that appears on 500 pages counts as only 1 unique link). We'll cover the significance of this further down.
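The 'All' vs 'Unique' distinction can be illustrated by counting the same links two ways. This is a minimal Python sketch, assuming each link record carries its anchor text, target URL and location; the `LinkRecord` type and its field names are hypothetical, and Sitebulb's own de-duplication may differ in detail.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class LinkRecord:
    referring_url: str
    target_url: str
    anchor_text: str
    location: str  # e.g. "Header", "Footer", "Navigation", "Content"


def count_all_and_unique(links: Iterable[LinkRecord]) -> Tuple[int, int]:
    """Return (all, unique); the unique key ignores which page the link sits on."""
    records = list(links)
    unique_keys = {(link.anchor_text, link.target_url, link.location) for link in records}
    return len(records), len(unique_keys)


# A templated header link repeated on 500 pages: 500 'All', but only 1 'Unique'.
header_links = [
    LinkRecord(f"https://example.com/page-{i}", "https://example.com/old-offer", "Offers", "Header")
    for i in range(500)
]
print(count_all_and_unique(header_links))
```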

Click either of these values to view the link data in the Sitebulb interface, using the Link Explorer:

Broken Links in Link Explorer

This list will show you every referring-target URL pair:

  • Referring URL - the page with an outgoing link to the target URL

  • Target URL - the page with an incoming link from the referring URL

In the case of broken links, the referring URL will be status 200, and the target URL will be status 404/410.

Due to the nature of internal links, you will normally see some URLs repeated in both the referring and target URL columns (you can see this in the image above). This is because the same page can link to a given target URL more than once, and any page can (and usually does) have incoming links from multiple other URLs on the same website.
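Because the same target appears in many rows, it often helps to group the referring-target pairs by target URL, which gives a per-URL view similar to the Incoming Links list. This is a minimal Python sketch using only the standard library; the function name `group_by_target` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_target(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each broken target URL to the sorted, de-duplicated pages linking to it."""
    referrers = defaultdict(set)
    for referring, target in pairs:
        referrers[target].add(referring)
    return {target: sorted(pages) for target, pages in referrers.items()}


pairs = [
    ("https://example.com/a", "https://example.com/gone"),
    ("https://example.com/b", "https://example.com/gone"),
    ("https://example.com/a", "https://example.com/gone"),  # same page links twice
    ("https://example.com/b", "https://example.com/missing"),
]
for target, pages in group_by_target(pairs).items():
    print(target, len(pages), pages)
```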

At this point, your understanding of 'where broken links live' within the website is pretty limited. However, the Link Explorer allows you to... explore the data further, for instance applying URL or path filters on the Target URL.

Filter on target URLs

In the example above, I've restricted the list to show only target URLs in the /Blog/post/2020/ subfolder, and can see there are 4 different links across 3 different URLs.
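The same subfolder filter can be reproduced outside Sitebulb, for instance on exported data. This is a minimal Python sketch, assuming a list of (referring URL, target URL) pairs; the function name is hypothetical, and it matches the URL path case-sensitively, which may not be exactly how the Link Explorer filter behaves.

```python
from typing import Iterable, List, Tuple
from urllib.parse import urlsplit


def filter_by_target_path(pairs: Iterable[Tuple[str, str]], prefix: str) -> List[Tuple[str, str]]:
    """Keep only pairs whose target URL path starts with the given folder prefix."""
    return [(ref, tgt) for ref, tgt in pairs if urlsplit(tgt).path.startswith(prefix)]


pairs = [
    ("https://example.com/", "https://example.com/Blog/post/2020/old-post"),
    ("https://example.com/about", "https://example.com/Blog/post/2021/new-post"),
    ("https://example.com/contact", "https://example.com/Blog/post/2020/another"),
]
matches = filter_by_target_path(pairs, "/Blog/post/2020/")
print(len(matches), "links across", len({ref for ref, _ in matches}), "referring URLs")
```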

How to export broken links

SEOs usually want to get broken links into a spreadsheet, because this is the most straightforward thing to give to a developer as a 'list of things to fix.'

From the Link Explorer, hit the green Export button, and then from the dropdown select either CSV or Google Sheets.

Export broken links

At a minimum, sharing a links spreadsheet like this should be enough for a developer to get started on the fixes:

Fix broken links
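If you want to post-process the CSV export in code before handing it over, the standard library `csv` module is enough. This is a sketch under assumptions: the column names 'Referring URL', 'Target URL', 'Anchor Text' and 'Location' are placeholders, so check the header row of your own export and adjust them; an in-memory sample stands in for the real file.

```python
import csv
import io
from typing import Dict, List

# Hypothetical column names; check these against the header row of your actual export.
REQUIRED_COLUMNS = ["Referring URL", "Target URL", "Anchor Text", "Location"]


def load_links(csv_text: str) -> List[Dict[str, str]]:
    """Parse exported link rows, failing early if an expected column is missing."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [col for col in REQUIRED_COLUMNS if col not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"Export is missing columns: {missing}")
    # Keep only the columns we rely on, in a predictable order.
    return [{col: row[col] for col in REQUIRED_COLUMNS} for row in reader]


sample = (
    "Referring URL,Target URL,Anchor Text,Location\n"
    "https://example.com/,https://example.com/old,Old page,Footer\n"
    "https://example.com/blog,https://example.com/gone,Read more,Content\n"
)
rows = load_links(sample)
print(len(rows), rows[0]["Location"])
```

For a real file, read it with `open(path, newline="", encoding="utf-8-sig")` (the `-sig` variant also strips a byte-order mark if one is present) and pass the contents to `load_links`.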

How to report broken links

Whilst a list like the one above is 'ok', it is not particularly helpful for the client/developer/content editor who has to go through all these pages and fix the broken links. This is because it does not differentiate between the three types of broken link:

  1. Broken links in a template navigation element (e.g. header/footer) - where fixing a single template would fix many broken links.

  2. Broken links in a content navigation element (e.g. sidebar) - where cleaning up the database which 'feeds' sidebar links would fix many broken links.

  3. Broken links within the page content itself - where each fix would need to be carried out manually one by one.

Analysing broken links

Applying some analysis to the data and presenting recommendations in a clear, prioritised format will go a long way to:

  • Make it easier for your client to understand what's broken.

  • Make it easier for the developer to understand what needs to be fixed.

  • Make it more likely that fixes actually get implemented.

Together, these make it more likely that you will demonstrate your own value as an SEO or consultant.

As such, splitting out broken links into the 3 categories above will improve the quality of your audit report. Sitebulb has a column which aids in this analysis, the 'Location' column:

Location Column

This allows you to separate out your broken links into 4 groups: Header, Footer, Navigation and Content. Header and Footer links will normally be template based (and typically lead to lots of broken links), Content links will normally be one-off instances, and Navigation links can vary.
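Splitting exported rows into these groups is a simple bucketing step. This is a minimal Python sketch, assuming each row is a dict with a 'Location' value of Header, Footer, Navigation or Content (the column name and the bucket labels are assumptions). Header and Footer both map to the template fix type, mirroring the three types of broken link listed earlier, while Navigation gets its own bucket because it can go either way.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Hypothetical mapping from the Location value to one of the three fix types.
FIX_TYPE = {
    "Header": "template",
    "Footer": "template",
    "Navigation": "navigation",  # may be templated or content-driven; review manually
    "Content": "content",
}


def split_by_fix_type(rows: Iterable[Dict[str, str]]) -> Dict[str, List[Dict[str, str]]]:
    """Bucket link rows by fix type; unrecognised locations go to 'unclassified'."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[FIX_TYPE.get(row.get("Location", ""), "unclassified")].append(row)
    return dict(buckets)


rows = [
    {"Target URL": "https://example.com/old", "Location": "Footer"},
    {"Target URL": "https://example.com/old", "Location": "Header"},
    {"Target URL": "https://example.com/gone", "Location": "Content"},
    {"Target URL": "https://example.com/x", "Location": "Navigation"},
]
for fix_type, group in split_by_fix_type(rows).items():
    print(fix_type, len(group))
```

Each bucket can then be written to its own worksheet for the report.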

As we mentioned earlier, Sitebulb's Links report includes a table which differentiates 'All' vs 'Unique' links. Analysing this table can help you understand whether most of your broken links are templated (e.g. Header/Footer) or one-offs (e.g. Content). In the table below, we can see that the site has almost 15,000 broken links, but only 82 unique links. This suggests that at least some of those broken links are template-based.

Lots of broken links low uniques
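The same comparison can be broken down per location to show where the repetition comes from. This is a minimal Python sketch, assuming rows with 'Anchor Text', 'Target URL' and 'Location' keys (hypothetical column names); a high all-to-unique ratio is only a hint that links are templated, not proof.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def all_vs_unique_by_location(rows: Iterable[Dict[str, str]]) -> Dict[str, Tuple[int, int, float]]:
    """Return {location: (all, unique, all/unique)} for the given link rows."""
    totals: Dict[str, int] = defaultdict(int)
    uniques = defaultdict(set)
    for row in rows:
        loc = row["Location"]
        totals[loc] += 1
        # Location is already the grouping key, so anchor text + target completes the unique key.
        uniques[loc].add((row["Anchor Text"], row["Target URL"]))
    return {loc: (totals[loc], len(uniques[loc]), totals[loc] / len(uniques[loc])) for loc in totals}


rows = [
    {"Anchor Text": "Old offer", "Target URL": "https://example.com/offer", "Location": "Footer"}
    for _ in range(300)
] + [
    {"Anchor Text": f"Post {i}", "Target URL": f"https://example.com/post-{i}", "Location": "Content"}
    for i in range(3)
]
for loc, (all_links, unique_links, ratio) in all_vs_unique_by_location(rows).items():
    print(f"{loc}: {all_links} all, {unique_links} unique, {ratio:.0f} per unique link")
```

Applied to the totals above, almost 15,000 links over 82 unique links averages out at roughly 180 occurrences per unique link, which is why templated links are the likely cause.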

Prioritising broken link fixes

With the analysis above to hand, you can present your recommendations in a clear and logical manner, prioritising based on the most efficient work/value ratio. For instance, if you can fix 1 broken link in the footer template which is used on every single page, you can literally fix thousands of broken links with a very small amount of work.
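The work/value idea can be approximated as 'broken links resolved per fix'. This is a minimal Python sketch under loose assumptions: each group is summarised as (all broken links, distinct fixes needed), and the priority labels are simply assigned by rank; the function name and labelling rule are hypothetical, and a real prioritisation will also weigh page importance.

```python
from typing import Dict, List, Tuple

# Hypothetical priority labels, assigned by rank (best ratio first).
LABELS = ["HIGH PRIORITY", "MEDIUM PRIORITY", "LOW PRIORITY"]


def prioritise(summary: Dict[str, Tuple[int, int]]) -> List[Tuple[str, float, str]]:
    """Rank groups by broken links resolved per fix, highest first.

    `summary` maps a group name to (all broken links, distinct fixes needed).
    """
    ranked = sorted(
        # max(..., 1) guards against a group accidentally reporting zero fixes.
        ((group, all_links / max(fixes, 1)) for group, (all_links, fixes) in summary.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    return [
        (group, per_fix, LABELS[min(i, len(LABELS) - 1)])
        for i, (group, per_fix) in enumerate(ranked)
    ]


# Figures from the example report: one footer link, four sidebar targets, 27 one-off links.
summary = {"Footer": (15000, 1), "Sidebar navigation": (153, 4), "Content": (27, 27)}
for group, per_fix, label in prioritise(summary):
    print(f"{group}: ~{per_fix:.0f} broken links resolved per fix ({label})")
```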

An example report might look like:

Broken Link in footer - HIGH PRIORITY

You have a broken link in the footer, which is contributing to 15,000 broken links across the site. Fixing this link in the footer template should resolve all of these broken links.

You can see a screenshot of the link in question below, and the full list of links in the worksheet 'Broken footer links.'

Broken link in footer

Broken Links in sidebar navigation - MEDIUM PRIORITY

You have a number of broken links in the sidebar navigation, which are contributing to 153 broken links across the site. There are only 4 unique 404 pages which have incoming links, so if you can remove or fix the links to these 4 pages, it should resolve all these broken links.

You can see a screenshot of an example link below, and the full list of links in the worksheet 'Broken sidebar links.'

Broken links in navigation


Broken Links in content - LOW PRIORITY

You have 27 broken links within the text content of individual pages. Since these links are not template-driven or dynamically generated, you will need to fix them individually.

You can see a screenshot of an example link below, and the full list of links in the worksheet 'Broken content links.' I have pre-sorted the pages based on URL Rank of the referring page, so this list is already in a rough priority order, with the more important pages first.

Link in content
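The pre-sorting by URL Rank described above can be done on the exported rows before building the worksheet. This is a minimal Python sketch, assuming each row has a numeric 'Referring URL Rank' value (a hypothetical column name; check what your export calls it) and that a higher URL Rank means a more important page.

```python
from typing import Dict, List


def sort_by_referring_rank(
    rows: List[Dict[str, str]], rank_column: str = "Referring URL Rank"
) -> List[Dict[str, str]]:
    """Return rows ordered by the referring page's URL Rank, highest first; blanks sort last."""

    def rank(row: Dict[str, str]) -> float:
        value = row.get(rank_column, "")
        # CSV values arrive as strings; a missing rank is pushed to the end.
        return float(value) if value else float("-inf")

    # sorted() is stable, so rows with equal rank keep their original order.
    return sorted(rows, key=rank, reverse=True)


rows = [
    {"Referring URL": "https://example.com/minor", "Referring URL Rank": "12"},
    {"Referring URL": "https://example.com/", "Referring URL Rank": "100"},
    {"Referring URL": "https://example.com/no-rank", "Referring URL Rank": ""},
    {"Referring URL": "https://example.com/guide", "Referring URL Rank": "57.5"},
]
for row in sort_by_referring_rank(rows):
    print(row["Referring URL"])
```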
