The myth of duplicate structured data

By Ilana Davis

or: How you can learn to stop worrying about structured data and love the Schema

Today I want to clear up some confusion about how structured data works when it’s duplicated. I’ve written about it before (here and here), but given some recent comments from other apps that provide structured data, I think there’s some confusion going around.

Caution: be careful what you read and listen to

First, I want to start by warning you to be careful who you listen to about structured data and how it relates to SEO.

Most people these days know that SEO is full of lies, half-truths, and misinformation.

It’s also full of opinions that haven’t actually been tested.

Unfortunately, structured data overlaps with SEO enough that some of that misinformation spills over into structured data.

Some people mean well and have your best interests at heart, but structured data is such a deep, technical topic that it’s really easy for people to get out of their depth.

It doesn’t help that Google is very vague about structured data and how it works.

But listen to me

Yes, I can appreciate the irony in telling you to be careful with who you listen to and then right away saying you should listen to me.

I get it.

Here’s my experience with structured data. Use it to see how deep I’ve gone and decide for yourself how much you should trust me.

(And if you already trust me, feel free to skip this section.)

I own JSON-LD for SEO, which automatically creates a set of structured data for Shopify stores.

I’m living in and working with structured data every day, some days for hours on end.

There are thousands of stores running the app. Of those thousands, I’ve personally reviewed, checked, and talked with at least a few hundred stores about their structured data. I’ve even learned enough to train someone non-technical to help me evaluate stores’ structured data.

You don’t have to take just my word for it, though: the app has over 400 5-star reviews on the Shopify App Store. All of those reviews are from actual customers, and since the app has always been a paid app, they aren’t reviews from people who got a free copy in exchange for a review.

I have deep experience with structured data and how it works in Google, deeper than many generalists would have.

That’s no fault of the generalists.

Structured data is very complex and nuanced. Now combine that with Google’s written and unwritten rules about how they use structured data for their Rich Snippets and you can start to see the problems.

Without enough expertise, structured data is an easy topic to get confused about.

That’s why JSON-LD for SEO isn’t a general-purpose SEO tool. It’s designed to go deep and provide the best, safest, and most effective structured data for Shopify stores.

(One side note: this deep expertise allowed me to write a custom, hand-coded set of structured data for JSON-LD for SEO’s app page. Within days of publishing the page, Google had already given it Rich Results. That’s quite a bit faster than the 8-13 weeks that’s common with non-optimized structured data.)

With that out of the way, let’s dig into how Google uses structured data.

The myths of duplicate Structured Data

I’ve seen reports from a few Shopify SEO apps and themes saying duplicated structured data is bad.

That’s 100% incorrect.

Google doesn’t care if you have duplicate sets of the same structured data on a page

Unlike duplicated content, duplicated structured data is the norm.

If you do what I do and enter a bunch of random sites into Google’s Structured Data Testing tool, you’ll quickly find that even the best sites with the most search enhancements have duplicated, incorrect, or even missing structured data.

The claim that duplicated structured data is bad is a myth. It’s a misunderstanding of how everything works, most likely from not having the deep expertise that structured data needs.

How structured data is actually used by Google

So how is structured data actually used then?

What I’ve seen from my hundreds (thousands?) of tests is that Google follows a pattern in how they use structured data, especially product data.

  1. They detect all Product data on a specific url.
  2. They analyze that data to find the best and most relevant set of data.
  3. That data gets added to their database which their algorithms might use for the search results (resulting in Rich Snippets or other search enhancements).

For sites with valid, well-formed, and non-duplicated structured data, that’s the entire process. Simple.
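Step 1 is the easiest part to picture. Here’s a rough Python sketch of collecting every Product block from a page’s HTML. It’s my simplified illustration, not Google’s crawler: it only reads script tags of type application/ld+json, skips blocks that aren’t valid JSON, and ignores @graph wrappers and Microdata, which real pages also use.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> tag."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._inside = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.blocks[-1] += data


def find_products(html):
    """Step 1: return every top-level Product object in the page's JSON-LD."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    products = []
    for raw in collector.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # invalid JSON can't be read at all, so it's skipped
        items = data if isinstance(data, list) else [data]
        products += [i for i in items if isinstance(i, dict) and i.get("@type") == "Product"]
    return products


page = """
<script type="application/ld+json">{"@type": "Product", "name": "Red ball"}</script>
<script type="application/ld+json">[{"@type": "Product", "name": "Blue ball"},
  {"@type": "BreadcrumbList"}]</script>
"""
print([p["name"] for p in find_products(page)])  # ['Red ball', 'Blue ball']
```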

How duplicated structured data is treated by Google

With duplicated structured data, the second step is the key to the confusion. That’s the step where Google ranks multiple sets of the same structured data.

What follows are the different levels of analysis Google uses on your structured data.

They start at the top with Level 1 and work downward until they find something good.

Level 1. First they reject all product data that is for different products than the one shown on the page.

It’s super common with Shopify themes and related product apps to have some small sets of structured data added to the product page. This results in polluting the page for “Red ball” with the related items “Blue ball” and “Red wagon”.

When that happens, Google just rejects that irrelevant data right away.

One very clear signal of relevance is whether the page url matches the product data’s url.

When Google visits the specific pages for the other products like the Red wagon, then it’ll use their data.

If you understand how the canonical url tag works with SEO, this is very similar.
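Here’s a minimal Python sketch of that Level 1 filter, assuming the url is the only relevance signal. Google surely uses more than that, and the normalize_url rule (ignore the scheme, query string, fragment, and trailing slash) is my own assumption for the example, not a documented Google rule.

```python
from urllib.parse import urlsplit


def normalize_url(url):
    """Reduce a URL to host + path (assumption: scheme, query and fragment don't matter)."""
    parts = urlsplit(url)
    return parts.netloc.lower() + parts.path.rstrip("/")


def relevant_products(products, page_url):
    """Level 1: keep only Product data whose url points at the page being crawled.

    A Product with no url is treated as not matching in this sketch.
    """
    page = normalize_url(page_url)
    return [p for p in products if p.get("url") and normalize_url(p["url"]) == page]


# The "Red ball" page, polluted with related-product data from the theme.
found = [
    {"name": "Red ball", "url": "https://example.com/products/red-ball"},
    {"name": "Blue ball", "url": "https://example.com/products/blue-ball"},
    {"name": "Red wagon", "url": "https://example.com/products/red-wagon"},
]
kept = relevant_products(found, "https://example.com/products/red-ball/?variant=1")
print([p["name"] for p in kept])  # ['Red ball']
```

When Google later crawls the Red wagon page, the same filter would keep the Red wagon data instead.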

Level 2. They then look for a set of Product data that is the most complete and includes embedded reviews

Now Google starts to look for the best data they can find.

Currently, that means Product data with several specific fields and reviews embedded in it using the aggregateRating field. I like to call these “linked” reviews because the product data is linked to the review data (and the LD in JSON-LD stands for Linked Data).

Getting the reviews linked properly and recognized is a very difficult thing to do, which is why I’ve invested so much time into adding integrations between JSON-LD for SEO and various review apps.

Without a Product with linked reviews, Google moves on to the next level.

Level 3. Next they look for an incomplete set of Product data that includes embedded reviews (aggregateRating) and is at least error-free
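To show what I mean by “linked,” here’s a short Python sketch that builds Product JSON-LD with the review summary nested inside it through aggregateRating. The store, price, and rating values are made up, and it’s trimmed to a handful of fields; a real Level 2 set carries more of them.

```python
import json


def linked_product(name, url, price, currency, rating, review_count):
    """Build Product JSON-LD whose reviews are "linked": the AggregateRating
    is nested inside the Product instead of sitting in its own block."""
    return {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "url": url,
        "offers": {"@type": "Offer", "price": price, "priceCurrency": currency},
        "aggregateRating": {
            "@type": "AggregateRating",
            "ratingValue": rating,
            "reviewCount": review_count,
        },
    }


# Hypothetical store data, for illustration only.
data = linked_product("Red ball", "https://example.com/products/red-ball", "19.99", "USD", "4.8", 27)
print('<script type="application/ld+json">' + json.dumps(data, indent=2) + "</script>")
```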

Right now Google wants to show review Rich Snippets so they’ve optimized looking for data that includes reviews, even if the product data is incomplete.

This is the type of structured data the majority of Shopify review apps provide.

A problem can occur when the missing data is critical to the Rich Snippets. The classic result is that your product only gets reviews in the Rich Snippet: no price, no availability, nothing else. That’s because the reviews app is only providing the product name, url, and reviews.

The incomplete data can also make the Rich Snippets very inconsistent. They’ll show up one day and disappear the next. Flip-flopping like a politician near an election.
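Here’s a simplified Python sketch of why a review app’s block only gets you stars. The mapping from fields to snippet pieces (reviews from aggregateRating, price and availability from offers) is my simplification for illustration, and it assumes offers is a single object rather than a list.

```python
def snippet_parts(product):
    """List the Rich Snippet pieces a Product block could supply.

    Simplified mapping (an assumption for illustration): reviews come from
    aggregateRating; price and availability come from a single offers object.
    """
    parts = []
    if "aggregateRating" in product:
        parts.append("reviews")
    offers = product.get("offers") or {}
    if "price" in offers:
        parts.append("price")
    if "availability" in offers:
        parts.append("availability")
    return parts


# A typical review-app block: name, url and reviews, but no offers.
review_app_block = {
    "@type": "Product",
    "name": "Red ball",
    "url": "https://example.com/products/red-ball",
    "aggregateRating": {"@type": "AggregateRating", "ratingValue": "4.8", "reviewCount": 27},
}
print(snippet_parts(review_app_block))  # ['reviews']
```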

Level 4. Then they will look for a complete set of Product data that is error-free

Since Google couldn’t find a Product with the review data, they look for the most complete set of Product data. There are a dozen or two fields needed at this stage, but the important ones include the url, price, currency, name, and description.

Out of the box, JSON-LD for SEO provides this level of structured data if you don’t have any review apps. If you have a review app that’s linked, Level 2 above would have taken effect.

This one could get you Rich Snippets showing the product prices and availability.
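A completeness check along these lines is easy to sketch in Python. The IMPORTANT_FIELDS list below is hypothetical: it only covers the five important fields named above (with the currency stored in offers.priceCurrency), while the real requirements run a dozen or two fields and change over time.

```python
# Hypothetical minimal field list, built from the five fields named above.
IMPORTANT_FIELDS = ["name", "url", "description", "offers.price", "offers.priceCurrency"]


def get_path(data, path):
    """Follow a dotted path like 'offers.price' through nested dicts."""
    for key in path.split("."):
        if not isinstance(data, dict) or key not in data:
            return None
        data = data[key]
    return data


def missing_fields(product):
    """Return the important fields this Product block is missing or leaves empty."""
    return [f for f in IMPORTANT_FIELDS if get_path(product, f) in (None, "")]


def is_complete(product):
    return not missing_fields(product)


theme_block = {
    "@type": "Product",
    "name": "Red ball",
    "url": "https://example.com/products/red-ball",
    "offers": {"@type": "Offer", "price": "19.99"},
}
print(missing_fields(theme_block))  # ['description', 'offers.priceCurrency']
print(is_complete(theme_block))     # False
```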

Level 5. Then they start looking at incomplete sets of Product data that are error-free

Now we’re getting near the bottom of the barrel in structured data quality.

If there is a set of Product data that is missing important fields but is error-free, Google will try to use whatever data it includes.

At this point, Google will even allow data that is misformatted and reported as a warning in the Structured Data Testing Tool.

The majority of Shopify themes are included in this level. Either they are missing data or they have the price data misformatted resulting in a warning.

Luckily, Google will still sometimes give Rich Snippets for these products; otherwise Rich Snippets would be even rarer for Shopify stores. But those Rich Snippets will be even more inconsistent and feel like they are always disappearing or changing.
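The misformatted price is usually a currency symbol or thousands separator inside price. Schema.org’s guidance is to put a plain number there, with “.” as the decimal point, and to put the currency in priceCurrency. Here’s a rough Python check for that, assuming a simple pattern; it’s not the exact rule Google’s testing tool applies.

```python
import re

# Assumed pattern based on Schema.org's guidance for price: digits with an
# optional "." decimal part, no currency symbols and no thousands separators.
PRICE_PATTERN = re.compile(r"\d+(\.\d+)?")


def price_warning(price):
    """Return a warning for a misformatted price, or None if it looks fine."""
    text = str(price)
    if PRICE_PATTERN.fullmatch(text):
        return None
    return f"price {text!r} should be a plain number like '19.99'; put the currency in priceCurrency"


for value in ["19.99", 20, "$19.99", "1,299.00"]:
    print(value, "->", price_warning(value) or "ok")
```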

Level 6. (optional) Sometimes Google will look for a standalone AggregateRating instead of a Product

This is a different data type than Product, but Google will sometimes fall back to using it if there isn’t any good Product data on the page. Sometimes it might even take priority over the Product levels. It’s the most inconsistently applied level I’ve found.

This data type will only get review Rich Snippets because that’s all of the data it includes.

Some review apps, like Shopify’s own Product Reviews app, only provide this data. Sometimes it will link with a theme’s product data, but a lot of the time the theme doesn’t have its data set up to create that link.
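Here’s a minimal Python sketch of telling the two apart on a parsed page: a rating nested in a Product is linked, while a top-level AggregateRating is standalone. It’s a simplification that ignores linking through @id references or itemReviewed, which is one way a standalone rating can still be tied to a Product.

```python
def classify_ratings(items):
    """Split rating data into linked (nested in a Product) and standalone.

    items: top-level JSON-LD objects already parsed from a page.
    @id / itemReviewed links are not followed in this sketch.
    """
    linked, standalone = [], []
    for item in items:
        if item.get("@type") == "Product" and "aggregateRating" in item:
            linked.append(item["aggregateRating"])
        elif item.get("@type") == "AggregateRating":
            standalone.append(item)
    return linked, standalone


items = [
    {"@type": "Product", "name": "Red ball"},  # theme data, no reviews
    {"@type": "AggregateRating", "ratingValue": "4.8", "reviewCount": 27},  # reviews app
]
linked, standalone = classify_ratings(items)
print(len(linked), len(standalone))  # 0 1
```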

Errors in the structured data make it worthless

One thing you might not have noticed: if a set of structured data has an error, Google will flat out ignore that entire set for this process.

So if you have three sets of duplicated Product data but two have errors, only the error-free set will be used, even if the other two are more complete.

I have an interesting proof of that.

I know of three Shopify SEO apps that had bugs in their Product data for at least six months, which prevented their customers from getting Rich Snippets consistently. The product data was pretty good quality and would have reached at least Level 4 above, but the errors caused it to be ignored.

Luckily, some of the stores were “saved” by their theme’s limited structured data (Level 5).

I know about this because I got a lot of support emails from new customers asking about those errors, and I had to reply that another app was causing them.

The interesting thing about those issues: they all came from the same code. Either one app had the issue and the other apps copied that app’s code, or three distinct companies all developed the same problematic code.
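Here’s the error rule from the three-set example as a short Python sketch. The candidate summaries (an error count and a count of filled-in fields) are hypothetical stand-ins for what a validator would report.

```python
def usable_sets(candidates):
    """Drop every set with an error, however complete it is, then rank the
    rest by how many fields they fill in (most complete first)."""
    error_free = [c for c in candidates if c["errors"] == 0]
    return sorted(error_free, key=lambda c: c["fields"], reverse=True)


# Three duplicated Product sets; the field counts are made-up examples.
candidates = [
    {"source": "SEO app", "errors": 1, "fields": 20},
    {"source": "review app", "errors": 2, "fields": 12},
    {"source": "theme", "errors": 0, "fields": 6},
]
print([c["source"] for c in usable_sets(candidates)])  # ['theme']
```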

Duplicated structured data is fine

Looking at that flow above, you can see that Google doesn’t care about duplicated Product structured data at all. They, being a software company all about algorithms, have a process and algorithm to evaluate all of the structured data on a page and pick the best one to use.

The order of the rules above might change as Google tinkers with their algorithms but their process is clear:

Duplicate structured data is expected by Google and your Shopify store is not hurt by it at all
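To pull the levels together, here’s a Python sketch of the order described above. Treat it as a model of what I’ve observed, not Google’s actual code: the Candidate fields are hypothetical summaries of a structured data set, and it always ranks the standalone AggregateRating last even though in practice Level 6 is applied inconsistently.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical summary of one set of structured data found on a page."""
    label: str
    kind: str            # "Product" or "AggregateRating"
    matches_page: bool   # Level 1: is it about the product on this page?
    has_errors: bool     # any error rules the whole set out
    complete: bool       # has the important Product fields
    has_reviews: bool    # aggregateRating nested inside ("linked")


def level(c):
    """Return the level a set falls into, or None if it's ignored.

    Warnings aren't modeled: in Search they don't rule a set out.
    """
    if not c.matches_page or c.has_errors:
        return None
    if c.kind == "AggregateRating":
        return 6  # applied inconsistently in practice; kept last here for simplicity
    if c.kind != "Product":
        return None
    if c.has_reviews:
        return 2 if c.complete else 3
    return 4 if c.complete else 5


def pick_best(candidates):
    """Pick the set with the lowest (best) level, or None if nothing qualifies."""
    usable = [c for c in candidates if level(c) is not None]
    return min(usable, key=level, default=None)


page = [
    Candidate("theme", "Product", True, False, False, False),
    Candidate("review app", "Product", True, False, False, True),
    Candidate("SEO app with a bug", "Product", True, True, True, True),
    Candidate("related item", "Product", False, False, True, True),
]
print(pick_best(page).label)  # review app
```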

That’s a good thing for you as the store owner. It’s also a good thing for Google’s customers, as they can find the sites that better match their searches.

Non-product data

In the above I wrote about the Product structured data type. But Google uses other types too (and Schema.org defines another 500+ that Google doesn’t use).

From what I’ve seen in my research, all of the types of structured data follow similar rules. The fields that define what makes a specific data type “complete” are different and some data types cannot have reviews, but for the most part those rules still apply.

That’s great because it keeps things simple. There aren’t different rules for Products, Organizations, Articles, or any other types that you have to memorize. Just remember (or refer back to) the order above.

Google Merchant Center

Google Merchant Center is a little bit different than Google Search as described above. For the full details, make sure to read my article about Shopify, Google Merchant Center, and JSON-LD for SEO.

For the impatient though, the key differences in Merchant Center are:

  1. all Product data must be warning-free and error-free.
  2. all Product data must be complete and have the fields that Merchant Center requires.

If either of those isn’t true, Merchant Center will raise hell and stop your ads.
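The difference from Search boils down to one stricter check. Here’s a simplified Python sketch comparing the two, assuming a validator has already counted a set’s errors and warnings and listed its missing required fields.

```python
def used_by_search(errors, warnings, missing_fields):
    """Search: only errors rule a set out; warnings and gaps just lower its level."""
    return errors == 0


def accepted_by_merchant_center(errors, warnings, missing_fields):
    """Merchant Center: the set must be error-free, warning-free, and complete."""
    return errors == 0 and warnings == 0 and not missing_fields


# A typical theme block: no errors, one price-format warning, no description.
theme = {"errors": 0, "warnings": 1, "missing_fields": ["description"]}
print(used_by_search(**theme))               # True
print(accepted_by_merchant_center(**theme))  # False
```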

The solution is usually to fix your problem structured data or remove it. More and more customers have decided it’s easier to remove the structured data from their theme and rely on JSON-LD for SEO for all of their structured data. That option is fine with me but there are other options if you want to keep your theme’s structured data.

Tie this all together

Hopefully this finally puts an end to the confusion and misinformation about how structured data is used by Google.

Like I said in the introduction, it makes sense that people don’t understand how it works, because structured data is complex and requires a deep investment of time and energy to do right.

That’s the whole reason JSON-LD for SEO exists. I don’t want Shopify stores to even have to worry about their structured data. I want to make it “just work” for them. From what I’ve seen and heard from customers, it does. Very well.
