
Crawling Shopify websites with Sitebulb

Authenticate Sitebulb to crawl Shopify websites successfully


Occasionally, you'll find that your Sitebulb Audits do not return the expected results. When crawling Shopify websites, this is often because Sitebulb received a 429 'Too Many Requests' response, as Shopify rate-limits incoming requests.

Sitebulb stops crawling once it receives a 429 response from the server, since no data can be collected for the URLs returning this HTTP status. To overcome these errors and successfully crawl your website, you will need to whitelist Sitebulb or otherwise authenticate the crawler on the website server.

In the case of Shopify websites, that means providing an HTTP message signature in the HTTP headers of each request, allowing Shopify to verify that the request is coming from an authorized crawler.

If you are trying to crawl a Shopify site but your client is unable or unwilling to provide the HTTP message signature, the only course of action is to reduce the crawl speed as much as possible in the Crawler Settings.
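To illustrate why a lower crawl speed helps with rate limiting, here is a minimal Python sketch of request throttling. It is not how Sitebulb implements its speed limit (that is configured in the Crawler Settings); the `Throttle` class and its parameters are hypothetical names used only for this example.

```python
import time

class Throttle:
    """Hypothetical helper: space out calls to at most `max_per_second`."""

    def __init__(self, max_per_second: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second  # seconds between requests
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time the previous request was allowed

    def wait(self) -> float:
        """Block until the next request is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self._last + self.min_interval - now)
            if delay:
                self._sleep(delay)
        self._last = now + delay
        return delay

# One request every 2 seconds (0.5 per second).
throttle = Throttle(max_per_second=0.5)
```

Calling `throttle.wait()` before each request keeps the request rate below the chosen ceiling, which is the same idea as lowering the crawl speed so the server sees fewer requests per second.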

Follow the steps below to set up the necessary authorization to crawl Shopify websites successfully.

Create your HTTP message signatures in Shopify

Follow the steps provided in the Shopify documentation to create your signatures:

  1. In your Shopify admin, go to Online Store > Preferences.

  2. In the Signatures section, click Create signature.

  3. In the Name field, enter a descriptive name for your signature.

  4. In the Domain field, select the domain that you want to use this signature for.

  5. In the Valid for section, select an expiration period for your signature.

  6. Click Create.

Each signature consists of three HTTP headers, each of which is a pair: 'Header name' and a corresponding 'Header value.'
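If you want to see how these three name/value pairs travel with a request outside of Sitebulb, the following Python sketch attaches them to a single request object. The header names are the ones used in this article; the values, the `build_signed_request` helper, and the example URL are placeholders for illustration, not real Shopify output.

```python
from urllib.request import Request

# Header names as listed in this article. Every value is a placeholder to be
# replaced with what Shopify shows after you click Create.
SHOPIFY_SIGNATURE_HEADERS = {
    "Signature-Agent": "<value from Shopify admin>",
    "Signature-Input": "<value from Shopify admin>",
    "Signature": "<value from Shopify admin>",
}

def build_signed_request(url: str, headers: dict) -> Request:
    """Hypothetical helper: attach all three signature headers to one request."""
    missing = [name for name in SHOPIFY_SIGNATURE_HEADERS if not headers.get(name)]
    if missing:
        raise ValueError(f"missing signature headers: {', '.join(missing)}")
    return Request(url, headers=headers)

# Builds the request object only; nothing is sent over the network here.
request = build_signed_request("https://example.com/", SHOPIFY_SIGNATURE_HEADERS)
```

The check for missing headers reflects the point above: the signature is only complete when all three headers are present with a value.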

Add Shopify signature to your Audit Settings

Once you have created the signatures, add the resulting HTTP headers to your audit settings so that Sitebulb includes them in the requests it sends to your website server.

Navigate to Authentication.

Scroll to find the Shopify Settings area, and you will see three settings boxes, which correspond to the three headers you created in the Shopify admin area.

The header names are already set for you, and the value for 'Signature-Agent' is also already set, so you only need to enter the corresponding values for the 'Signature' and 'Signature-Input' headers.

You can now go ahead and start your crawl. Sitebulb will send the custom HTTP headers with every request, ensuring that your Shopify website can be crawled successfully.

Signature Expiration

The Shopify signatures have a maximum expiry of 3 months, and will no longer work if they have expired. You will need to create a new signature and update the details in Sitebulb's project settings in order to crawl properly again.
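To check when a signature stops working without waiting for a crawl to fail, you can read the expiry from the 'Signature-Input' value. The sketch below assumes the value carries an `expires` parameter holding a UNIX timestamp, as defined for HTTP Message Signatures in RFC 9421; whether Shopify's value includes it is an assumption, and the example string and function names are made up for illustration.

```python
import re
from datetime import datetime, timezone
from typing import Optional

def signature_expiry(signature_input: str) -> Optional[datetime]:
    """Return the `expires` time from a Signature-Input value, if present."""
    match = re.search(r";\s*expires=(\d+)", signature_input)
    if match is None:
        return None
    return datetime.fromtimestamp(int(match.group(1)), tz=timezone.utc)

def is_expired(signature_input: str, now: Optional[datetime] = None) -> bool:
    """True if the signature's `expires` time has passed."""
    expiry = signature_expiry(signature_input)
    if expiry is None:
        return False  # no expires parameter, so the header alone cannot tell us
    return (now or datetime.now(timezone.utc)) >= expiry

# Made-up value for illustration only; not real Shopify output.
example = (
    'sig1=("@authority" "signature-agent");'
    'created=1735689600;expires=1743465600;keyid="example"'
)
expiry = signature_expiry(example)
```

If `is_expired` returns `True` for the value stored in your project, create a new signature in Shopify and update the Sitebulb settings before crawling.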

Troubleshooting

If you have followed all the steps above and still see audits stopping early due to Sitebulb hitting a 429 HTTP response, check the steps below:

  • Crawl Type: Some sites require you to crawl with the Chrome Crawler, and will return a 429 response when crawled with the HTML Crawler even if you have added the correct Shopify authentication. In this case, the crawl typically hits a 429 on the first URL, and you can fix this by changing to the Chrome Crawler in the Audit Settings.

  • Wait before recrawl: If you hit a 429 error, apply the suggestions above, and then immediately crawl the same website again, you may still receive a 429 response (as the Shopify server has noticed unnatural behaviour originating from your IP address). Wait 15-30 minutes and try again.

  • Any other issue: Shopify have a troubleshooting section in their documentation, so check through those steps, or contact their support.
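For the 'Wait before recrawl' point, a small Python sketch can make the waiting rule concrete. It uses the standard HTTP `Retry-After` header when a 429 response includes one, and otherwise falls back to the 15-minute lower bound mentioned above. Whether Shopify sends `Retry-After` is an assumption, and `seconds_to_wait` is a hypothetical helper, not part of Sitebulb.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional

DEFAULT_WAIT_SECONDS = 15 * 60  # lower end of the 15-30 minute suggestion

def seconds_to_wait(status: int, retry_after: Optional[str],
                    now: Optional[datetime] = None) -> int:
    """Hypothetical helper: how long to pause before recrawling."""
    if status != 429:
        return 0  # not rate limited, no need to wait
    if retry_after is None:
        return DEFAULT_WAIT_SECONDS
    value = retry_after.strip()
    if value.isdigit():
        return int(value)  # Retry-After given as a number of seconds
    try:
        when = parsedate_to_datetime(value)  # Retry-After given as an HTTP date
    except (TypeError, ValueError):
        return DEFAULT_WAIT_SECONDS  # unparseable header, use the default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0, int((when - now).total_seconds()))
```

For example, `seconds_to_wait(429, None)` falls back to the 15-minute default, while a response that is not a 429 needs no wait at all.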
