When to use robots.txt or noindex
Perhaps one of the most confusing and misunderstood parts of technical SEO is knowing when to use the robots.txt file and when to set a noindex tag.
Though JSON-LD for SEO has no impact on indexing or the robots file, I still get this question a lot.
Let's make sure we understand what each of them does and when you should use them.
What is the robots.txt for?
The robots.txt file tells search engines not to crawl specific directories or files.
The robots.txt file is a recommendation, not a directive, so some URLs may still sneak through. This is especially the case if Google discovers a URL by other means, such as a social media post.
The robots.txt file is best used for specific areas of the site you don't want Google to crawl.
I discuss the robots.txt file for Shopify stores in more detail if that's helpful to you.
What to include in the robots.txt file
You should include anything you don't want search engines to crawl.
If you allowed Google to crawl pages like these, it would be a huge waste of crawl resources. Those resources could be better spent on more important pages, like a product or collection.
Let's use your checkout URL as an example.
Customers are not searching for your checkout page. Any content on the checkout page that would be helpful to customers should be all over your site, not just the checkout. If you have a promotion like "spend $50 for free shipping," that message should be all over your website.
The good news is that Shopify already handles these pages for you in the robots.txt file.
Though you can edit your robots.txt file in Shopify, I'm personally not a fan of doing so. Most who edit their robots.txt file don't know what they are doing and end up hurting more than helping.
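For reference, the kinds of rules Shopify sets up for you look something like this (an abbreviated, illustrative sketch, not the full file; the domain is a placeholder):

```text
User-agent: *
# Shopify blocks crawling of checkout, cart, and admin areas by default
Disallow: /checkout
Disallow: /cart
Disallow: /admin

Sitemap: https://example.com/sitemap.xml
```

Each Disallow line tells crawlers to skip that path, which is exactly the "areas you don't want crawled" case robots.txt is designed for.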
What not to include in the robots.txt file
Where many folks start to panic is when they see indexing issues for URLs with ?pr_prod_strat= or ?srsltid= appended to valid URLs. Everything after the question mark is called a parameter.
Parameter URLs aren't something to stress about.
It's OK for search engines to see the parameter URLs. Often those URLs are used for tracking purposes by both Shopify and Google.
Blocking these parameters in the robots.txt file will not impact your rankings for better or for worse. It's also rare for your parameter URLs to actually show up in search results. Remember, parameter URLs showing up in the Search Console Indexing Report does not mean those URLs are in search results. It means Google found those URLs somewhere.
As long as you have the canonical set to the main URL without the parameter, Google will usually ignore the parameters.
Yes, really!
Parameter URLs in Search Console are Google's way of telling you that it sees those parameters but is going to ignore them.
In other words, the canonical is enough. You don't need to add more to the robots.txt file.
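For example, if /products/widget is the main URL (a placeholder name), both the clean URL and its parameter variants render the same canonical tag, which is what lets Google consolidate them:

```html
<!-- The same tag appears on /products/widget?srsltid=abc123,
     pointing Google back to the clean URL -->
<link rel="canonical" href="https://example.com/products/widget">
```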
NoIndex
Setting a noindex tag prevents a URL from being indexed and included in search results.
In Shopify, you can set a noindex tag on URLs without editing your code.
When to use noindex
You should add noindex meta tags to specific URLs that you don't want showing up in search results. In other words, noindex blocks visibility in search results.
If, for example, you have a private product that is only available to subscribed members, then you may benefit from setting a noindex tag.
Unlike the robots.txt file, where you may still see blocked URLs showing up in search results, Google usually follows a noindex directive.
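The tag itself is a single line in the page's head. A sketch of what the rendered HTML contains once a page is set to noindex:

```html
<!-- Tells search engines not to include this URL in search results -->
<meta name="robots" content="noindex">
```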
As it relates to the parameter URLs, do not attempt to use the noindex tag to block parameter URLs.
Shopify uses Liquid code, and you'll see lots of code examples attempting to remove parameters like pr_prod_strat or seq=uniform.
Liquid doesn't give you access to URL parameters, so these solutions will never work to remove only the parameters from indexing. Even if they did, they'd be incredibly risky: you run the risk of noindexing the entire URL, which is the last thing you want.
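As an illustration of why these snippets fail, here's a sketch of the kind of Liquid you'll see suggested (hypothetical, and it does not work). Liquid renders on Shopify's servers without access to the query string, and the canonical_url object never contains parameters, so the condition below is never true:

```liquid
{% if canonical_url contains 'pr_prod_strat' %}
  {% comment %} This branch never runs: canonical_url has no parameters {% endcomment %}
  <meta name="robots" content="noindex">
{% endif %}
```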
Can I use robots.txt and noindex?
I would avoid blocking a URL from being crawled in the robots.txt file AND setting a noindex meta tag.
Google needs to crawl a page to see its noindex meta tag. If you block search engines from crawling the page, they will never see the noindex tag, which can cause them to index the very URL you don't want showing up in search results.
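A sketch of the problematic combination (the URL is a placeholder):

```text
# robots.txt
User-agent: *
Disallow: /pages/private    # crawling blocked

# Meanwhile, /pages/private's HTML contains:
#   <meta name="robots" content="noindex">
# Google never fetches the page, so it never sees the noindex tag,
# and the URL can still end up indexed from links alone.
```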
Think of search engine robots as bugs crawling through your site: if you don't want bugs crawling somewhere, use the robots.txt file to block them.
The noindex tag tells you in its name that it's about indexing. You can think of it as "do not index" or "do not include in search results."