When to ignore Search Console indexing issues for Shopify stores

By Ilana Davis

If nothing else, Search Console is consistent about emailing you when it finds potential issues on your site. For many merchants, the page indexing emails are perhaps the biggest cause for concern.

If your goal is to have ZERO issues in Search Console, I'm sorry to say that won't be possible. Search Console is Google's way of communicating with you. Think of this report as a heads-up. In some cases, action may be required, but in others, it's an FYI.

JSON-LD for SEO, and structured data in general, doesn't impact indexing. Yet I'm often asked what to do with these issues and how to resolve them.

Shopify and your theme handle indexing. Shopify itself handles the robots.txt file and sitemap. Your theme handles the rest of the code (e.g. meta tags and canonical links). You are in charge of keeping the page quality high so Google will want to come back and index the pages. That includes proper linking, content, and all the other things that go into on-site SEO.

The higher the quality, the more likely Google will index your pages.

Even after doing everything right, you may still see page indexing issues in Google Search Console. So let's talk about each one: what it means and what action, if any, is necessary from you.

Shopify specifics

First, some context that will explain why Shopify stores may see these issues.

Shopify uses (or used, depending on your theme) collection-aware URLs (e.g. /collections/collection-name/products/product-name). That means that when a customer clicks through your site to a product, the collection name remains in the URL.

However, your product-only URL (e.g. /products/product-name) is the preferred URL (aka the canonical). That means search engines will most likely index the product-only URL and not your collection-aware URL.
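You can see this for yourself by viewing the source of any product page. Most themes render the canonical tag using Shopify's canonical_url Liquid object, something like this (a simplified sketch; your theme's exact markup may differ):

  {% comment %} In theme.liquid, inside <head>. canonical_url always
     points at the product-only URL, even when the visitor arrived
     through a collection-aware URL. {% endcomment %}
  <link rel="canonical" href="{{ canonical_url }}">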

Additionally, Shopify manages your default robots.txt file, which tells search engines which pages they should and shouldn't crawl.

You can make edits to the robots.txt file if you know what you're doing, but it comes with some big risks.

Many SEO gurus recommend editing the robots.txt file because other platforms don't manage it for you, so they're used to modifying it. This is where things can get dicey. I've seen SEOs wipe an entire website from Google due to a faulty edit.

In most cases, I do not recommend editing the robots.txt file unless you are familiar with the risks and truly know what you're doing.
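For context, edits are made by adding a robots.txt.liquid template to your theme. The safest pattern, and the one Shopify documents, keeps every default rule and only appends to them. Here's a minimal sketch (the Disallow: /*?q=* line is a hypothetical extra rule, included only for illustration):

  {% comment %} templates/robots.txt.liquid — keep all of Shopify's
     defaults, then append one custom rule to the catch-all group {% endcomment %}
  {% for group in robots.default_groups %}
    {{- group.user_agent }}

    {%- for rule in group.rules -%}
      {{ rule }}
    {%- endfor -%}

    {%- if group.user_agent.value == '*' -%}
      {{ 'Disallow: /*?q=*' }}
    {%- endif -%}

    {%- if group.sitemap != blank %}
      {{ group.sitemap }}
    {%- endif %}
  {% endfor %}

Deleting or rewriting the default rules, instead of appending to them, is where stores get into trouble.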

Google's Page Indexing Issues

Let's go through the most common indexing issues from Google Search Console.

Google's support documentation is a great tool to reference as we go through. You'll also want to open each section in Search Console to investigate the links.

Page with redirect

This is a list of pages that redirect to another page. Even though a page redirects, Google may still find the old URL from another source, such as a backlink. Google does recognize that you're redirecting the page. It's their way of saying: at one time we saw this page and indexed it, but since you're redirecting it, we may not index it anymore.

Action: No action is needed.

Excluded by 'noindex' tag

When you request that a page not be indexed, you'll see it show up here. This can be done by adding a noindex tag in your theme code or by using Shopify's seo.hidden metafield.
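The metafield itself is set in the Shopify admin (or via the API), not in theme code: create a metafield with the namespace seo, the key hidden, and an integer value of 1. Shopify then excludes that URL from your sitemap and outputs a noindex tag in the page's head. If you view source on such a page, you should see something roughly like this (illustrative):

  <!-- rendered automatically by Shopify when seo.hidden = 1 -->
  <meta name="robots" content="noindex">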

For Shopify stores, I see this a lot right now on URLs with wpm or web-pixel-shopify-custom-pixel in them. Those come from Shopify's web pixel JavaScript file. A few months ago, Shopify included a disallow in the robots.txt file because Google was indexing all of these URLs. That's not good; we do not want search engines to index these.

The unfortunate impact is that many of these become 404 errors. I think their plan was to remove the disallow from the robots.txt file and add a noindex instead. If you look at your robots.txt file, it should say Disallow: /cdn/wpm/*.js, which is how they are attempting to handle this going forward. To view your robots.txt file, append /robots.txt to your domain (e.g. example.com/robots.txt).
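Here's a simplified excerpt of what that section of a default Shopify robots.txt looks like (your file will contain many more rules):

  User-agent: *
  Disallow: /cart
  Disallow: /checkout
  Disallow: /search
  Disallow: /cdn/wpm/*.js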

This article from Shopify helps to explain the situation. Ultimately, you can ignore the wpm URLs in your links. Over time, they should go away, though you may also choose to escalate the problem with Shopify to let them know you're not happy.

Action: If the URLs should not be indexed, no action is required. Any wpm-related URLs can be ignored. Look at the other URLs in this report to make sure they are correctly assigned a noindex. If URLs incorrectly have a noindex, you'll need to investigate further.

Not found (404)

A 404 error appears when Google sees a link but the link doesn't work. The broken link could exist for a variety of reasons, such as:

  • removing a product from a collection, especially if you have collection-aware URLs
  • hiding, deleting, or archiving a page without setting up a redirect
  • a broken link to your page from another website

Search engines will try to crawl the page over time to see if the page has been fixed. Eventually, they will crawl it less often.

Broken links are common on any website, but they are not good and should be fixed when possible. Not only are broken links bad for SEO, they also make for a poor customer experience.

Action: Attempt to resolve all 404 errors and fix any broken links as quickly as possible, either by restoring the page or by pointing the old URL at the closest live page with Shopify's built-in URL redirects.

Alternate page with proper canonical tag

See my notes above about Shopify's collection-aware URLs. The collection-aware product URL is an alternate page to the product-only URL. Your product-only URL is the canonical (or preferred) URL and should be indexed; the collection-aware URL should not.

If Google were to index the collection-aware URL as well, you'd run into duplicate content issues because the pages are identical. Only the URLs are different.

Another example comes from the way Shopify appends parameters to the URL. For example, you may see ?pr_prod_strat=, which comes from Shopify's recommended product links and is used for tracking purposes. The same goes for variant URLs with ?variant= in them. Though Google can see the parameters in the URL, it recognizes that the canonical page is the one you want indexed.
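As an illustration, all of the following hypothetical URLs should resolve to the same canonical:

  /products/linen-shirt?variant=40012345678901   → /products/linen-shirt
  /products/linen-shirt?pr_prod_strat=copurchase → /products/linen-shirt
  /collections/summer/products/linen-shirt       → /products/linen-shirt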

Action: There is no action required. Google actually says in their docs there is nothing you need to do.

Blocked by robots.txt

This report is saying that Google saw these pages, but the robots.txt file asked them not to crawl them.

Shopify manages your robots.txt file and has it set so that some pages are not crawled, which protects your SEO.

For example, you don't want your search, cart, or checkout pages to be indexed. What sort of experience would a potential customer have landing on an empty cart page?

You may also see sort parameters such as ?sort_by=price. Sorting may be helpful to your customer on a collection page, but it adds no value to search engines.

Action: I would say 99% of the time, this entire report can be ignored. The exception would be if you modified your robots.txt file or a page is showing up that shouldn't be.

If other pages show up in this report, check whether name="robots" content="noindex,nofollow" appears in the source code. Right-click on the page, select View Source, and search for noindex. If the noindex code is there but shouldn't be, you'll need to determine where the noindex is being set and remove it.
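If the tag is coming from your theme, it's usually a conditional in theme.liquid or one of its snippets. A common pattern looks something like this (the handle here is hypothetical):

  {% comment %} In theme.liquid <head> — a typical theme-level noindex rule {% endcomment %}
  {% if template contains 'search' or handle == 'internal-faq' %}
    <meta name="robots" content="noindex, nofollow">
  {% endif %}

Searching your theme files for noindex is usually the fastest way to find the rule.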

Crawled - currently not indexed

Just because Google has seen your page (crawled it) doesn't mean they are showing it in search results. It's up to Google and other search engines to determine which pages appear in search results.

Google is more likely to index pages that are kept high quality through regular SEO work (e.g. titles, content, backlinks). In general, higher-quality pages are visited more frequently and rewarded more by Google. Resubmitting the URL will not speed up this process.

Action: Aside from normal SEO efforts to improve your site, no action is needed.

Discovered - currently not indexed

This means that Google found the page but has not yet been able to crawl it. Google will attempt to crawl the page again at a later date. They may delay crawling for a variety of reasons, but it's not something you can control.

Action: No action is required. Be patient and Google will crawl the site when they can.

Duplicate without user-selected canonical

The best way to explain this one is with duplicate content. In a nutshell, Google thinks the URL has more or less the same content as another page. Since you didn't set one or the other as the canonical, Google is choosing for you.

Google doesn't penalize for duplicate content. They simply choose to show one page over the other. This is so that the search results don't show up with multiple links from the same website all saying the same thing.

Action: If you think Google chose the wrong URL as the preferred page, you can do one of three things. I recommend the first as the best option.

  1. Update the content so that the pages are unique. Aim to have around 80% of the content on the page be unique, though there is no hard and fast rule here.
  2. Set the canonical in your theme code (see the sketch after this list).
  3. Ignore the report, as it's not an error and Google is doing what it should be doing.
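For option 2, the canonical tag lives in your theme's head section. Here's a sketch of what a hardcoded override for a single page might look like (the handle and URL are hypothetical, and most stores should leave Shopify's default in place):

  {% comment %} In theme.liquid, replacing the default canonical tag {% endcomment %}
  {% if handle == 'summer-lookbook-2023' %}
    <link rel="canonical" href="https://example.com/pages/summer-lookbook">
  {% else %}
    <link rel="canonical" href="{{ canonical_url }}">
  {% endif %}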

Duplicate, Google chose different canonical

In this report, you've marked a page as the canonical, but Google thinks a different URL is more appropriate to use. This is Google's way of helping you and attempting to resolve a potential issue.

You may see a few different types of URLs, such as:

  • the collection-aware URLs
  • junk URLs (spam) that shouldn't exist
  • pagination in the URL parameter such as ?page=8

Action: Look at the URLs in this report and use the INSPECT URL link. Check the "User-declared canonical" and the "Google-selected canonical". These URLs may be different, but as long as the one Google selected is okay, you should be good to go.

Dig into the reports

Perhaps the most common thing site owners do when they receive emails from Search Console is panic. Please don't panic!

Instead, take time to click through Search Console and investigate the issues. From Search Console's Overview page, select Full Report under Indexing. That's where you'll see all the potential issues.

Under "Why pages aren't indexed" you'll see the full list of reasons. Click on any one of those to look at the URLs.

Then use this article to help decide if action is needed from you or not. For other issues in the page indexing section that are not covered here, use Google's support documentation.

As you can see from the sections above, in many cases, no action is required from you. It's always best to dig into the report just to be sure.
