A Brief History of Google and Content – SEO Monday

By Nik

A history of Google & content that will change the way you see SEO forever!

Today’s post is part of a webinar I recorded back in February. This section goes through a brief historical recap of the changes Google has made over the past several years in response to the shifting content habits of the SEO world. I hope you find it insightful and that it brings you a greater sense of why so many types of content seem to “not work” in Google, even when it sounds like everybody’s telling you to simply add content to your site.

The Way It Used to Be

Let’s start with the way it used to be. How far back? Pre-2008, by my account. It wasn’t until 2008 that I personally noticed the first change in Google, so unless others spotted earlier moves, I say putting a stake in the ground at 2008 makes sense. So what happened before 2008?

Well, the way it used to be was simple. When Google looked at your site, it would make very rudimentary judgment calls. For example, if you had a page with no body content, your page was placed in the supplemental index. If your page had content, then it was put in the primary index.

Wait… what’s that?

  1. Primary Index: The primary index is where Google stores what it considers to be the most suitable pages for search results. In other words, if it is going to serve up any pages for a particular search query, the preference is to serve them from this index. So if Google can scrounge up 50 results from the primary index for a particular query, the top spot you can get while sitting in the supplemental index is 51. You can identify which of your pages are in the primary index by simply doing a Google query: site:yoursite.com/* (there’s a quick back-of-the-envelope sketch of this check right after the list).
  2. Supplemental Index: The supplemental index is where Google keeps pages that it doesn’t want to serve up immediately. Then why keep those pages at all? Simple – to fill out queries that would otherwise return a vanishingly small set of results, especially incredibly long-tail terms. Can you imagine getting only one result for a term? Or none at all? The supplemental index plays a major role in what Google is all about. For many websites with low quality content that “live in the long-tail”, this historically has been a great source of traffic… ooohhh, foreshadowing…
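
If you want to put a rough number on that split, here’s a minimal sketch in Python – purely my own illustration, nothing from Google. You run the two site: queries manually in a browser, note the approximate result counts Google reports, and plug them in; the example counts below are made up.

```python
# A quick sketch of the back-of-the-envelope check described in item 1 above.
# It does NOT call Google: you run the two site: queries yourself, note the
# approximate result counts, and compare them here.

def supplemental_share(total_indexed: int, primary_indexed: int) -> float:
    """Estimated fraction of indexed pages sitting outside the primary index.

    total_indexed   -- approximate count reported for  site:yoursite.com
    primary_indexed -- approximate count reported for  site:yoursite.com/*
    """
    if total_indexed <= 0:
        return 0.0
    supplemental = max(total_indexed - primary_indexed, 0)
    return supplemental / total_indexed


if __name__ == "__main__":
    # e.g. 12,400 pages indexed overall, 9,100 of them showing in the primary index
    print(f"{supplemental_share(12400, 9100):.1%} of indexed pages look supplemental")
```

The bigger that share, the more of your site is living in the “we’ll only show this when we’re desperate” bucket – which is exactly where the story below starts.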

2008: Duplicate Content

Something happened around 2008. We work with a lot of Yahoo! Stores at Exclusive Concepts. Around 2008, we noticed that these sites were suffering major drops in their rankings – the type of drops you’d see if your page was knocked out of the top results (primary real estate) and put into significantly lower results (supplementary real estate). We couldn’t identify the issue. These pages had content on them! Great content at that – straight from the manufacturer.

We started querying quoted text from these sites, and the top results were not these stores – the results were shopping engines like TheFind. In fact, it was just as common to see this happen even when the store had created its own unique content. Shopping engines had just started using catalog feeds to populate SEO content on their sites, and we were witnessing a displacement of our clients’ rankings because the stores were seen as copying content from other sites, primarily the shopping engines consuming their feeds.

It soon became clear that Google was splitting identical (duplicate) content between primary and supplemental based on which copy it considered the source and which the duplicating party. How did it make this judgment? Simple – whichever was spidered first. Unluckily for webstores, shopping engines have a high crawl frequency.

So now that good quality content (albeit duplicate) was being put into the supplemental index, and those who tested creating unique content for their sites were seeing better results, a new trend was born in the industry: create unique content, no matter how bad it is, and you’ll get amazing results. Lo and behold, the internet started becoming a bit spammier – thanks to Google’s insistence on unique content.

2009: Google Caffeine

As websites started investing in cheap, sweatshop-quality content, their results started booming – and thanks to Google’s rollout of Caffeine in December of 2009, crap content got the shot of sustenance it was always looking for! All of a sudden, Google was spidering the internet faster and making judgment calls on the uniqueness of content faster, and your pages would end up with a stronghold in the primary index – with superpowers – simply for being unique. The writing could be as poor as you wanted. So why pay your copywriter the average of $30-40 a page, right? Overseas, you could get 40-50 pages written for the same cost. And so the internet got spammier and spammier. This was the great depression of onsite content – there was hardly a page on the internet that was readable, yet every ranking page seemed to have some form of content.

2010: Mayday

Mayday! Mayday! I can only imagine how long Google’s team spent – months, weeks, days, hours, or maybe just a few minutes – discussing how to combat the decreasing quality of their results. Somebody’s brilliant idea was (imaginary paraphrase): “just keep drilling: maybe Google’s not doing enough to combat duplicate content?” I know what you’re asking yourself: “wait, duplicate content? I thought the issue was quality.” You’re right, and they were wrong.

Mayday, in May of 2010, created a new tier of penalization for duplicate content. Let’s say many pages being crawled by Google had the same content. The first few would go into the primary index. The next few would go into the supplemental index. And now, the rest would simply be dropped from the index altogether. Yikes!

This created a race to rewrite content on websites that were penalized by Mayday. The original content was often very well written, with precise specs and brand details from the manufacturer – and now it was being rewritten by somebody trying to work a few SEO keywords into 50 words of copy that didn’t need to make any sense at all.

Although Mayday may have reduced some server usage on Google’s part, it also fueled more low quality content creation.

2011: Panda

Then in 2011, Panda came about. Panda (named after an engineer at Google) recognized that low quality content was getting out of hand and targeted the loophole that SEOs and websites were exploiting: low quality content. It went even further than penalizing page-by-page: it effectively made it mandatory for whole sites to clean up their act, or they weren’t going to win any brownie points with Google.

The algorithmic aspects were simple – use a few signals to determine quality. They seem to include:

  1. Comprehensiveness of topic – does the content include semantically broad terminology in regard to a topic? I.e., does it reflect other top results, “related searches”, the “topic corpus”, results from Google Insights, etc.? There are a multitude of internal systems Google can utilize to ontologically determine the depth of a piece of content based on the topic at hand.
  2. Uniqueness of keyword footprint – does the content have the same keyword usage as all the other pages trying to rank for the same keyword? (There’s a rough sketch of this check right after the list.)
  3. Quality assurance – is the content properly edited and readable?
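
To make signal 2 a bit more concrete, here’s a minimal sketch of one way you could approximate a “keyword footprint” comparison yourself – just an illustration using plain term-frequency cosine similarity, not Google’s actual signal.

```python
# Illustrative only: compare the term distribution of your page against a page
# already ranking for the same keyword. A footprint that looks nearly identical
# to everyone else's is the kind of sameness signal 2 describes.
import math
import re
from collections import Counter


def term_vector(text: str) -> Counter:
    """Lowercased word counts; a crude stand-in for real tokenization."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def footprint_overlap(a: Counter, b: Counter) -> float:
    """Cosine similarity of two term-frequency vectors (0 = unrelated, 1 = identical)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    my_page = "blue widget blue widget buy blue widget cheap blue widget sale"
    competitor = "our blue widgets come in three sizes and ship free within two days"
    score = footprint_overlap(term_vector(my_page), term_vector(competitor))
    print(f"Footprint overlap with competitor: {score:.2f}")
</pre>
```

If your page scores high against most of the pages it competes with, your keyword footprint isn’t unique – and that’s exactly the sameness this signal is looking for.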

These simple signals make it effectively impossible to write content on the cheap, or in a rush. That’s the point.

At the same time, this algorithm judges on a page-by-page basis – but it penalizes on a sitewide basis. Basically, if the bad signals on your site outweigh the good in an unhealthy proportion, you could be penalized by Panda. You don’t need to rush to create good content to get back in Google’s good graces; rather, take the time to add good content to the site periodically. To offset this prolonged calendar of content writing, you simply need to request the de-indexation of low quality pages from Google’s index. Then, when Google recalculates the proportion of high quality indexed content to low quality indexed content, the ratios are healthier and favored by Google. That’s how you lift the Panda penalty.
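
As a back-of-the-envelope illustration of that ratio, here’s how you might estimate how many low quality pages to de-index. The 80% “healthy” target is my own assumption for the example – Google has never published a number.

```python
# Illustrative arithmetic only: how many low quality pages would need to be
# de-indexed so that high quality pages make up a target share of what remains
# indexed. The 80% target is an assumption, not a published Google threshold.
import math


def pages_to_deindex(good_pages: int, bad_pages: int, target_good_share: float = 0.8) -> int:
    """Low quality pages to remove so good pages are >= target_good_share of the index."""
    total = good_pages + bad_pages
    if total == 0 or good_pages / total >= target_good_share:
        return 0
    # Solve good / (good + bad - x) >= target for x
    allowed_bad = good_pages * (1 - target_good_share) / target_good_share
    return max(0, math.ceil(bad_pages - allowed_bad))


if __name__ == "__main__":
    # e.g. 300 strong pages and 700 thin ones currently indexed
    print(pages_to_deindex(good_pages=300, bad_pages=700))  # -> 625 pages to de-index
```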

2012: Penguin

Just when you thought nothing else could be mixed into Google’s lethal injection cocktail, they rolled out Penguin.

It’s really too early to comment confidently on Penguin, but there are already some noticeable content-related stances that Google has taken. Penguin seems to focus on some of the old-school tenets of Google, which were aimed at identifying over-optimization. While it seems apparent that even simple over-optimization of keywords can land you negative Penguin points for the keywords you are trying too hard to target, the rabbit hole goes a bit deeper than that.

Compare your inbound anchor text to your on-page keywords and you may be able to identify your next steps with Penguin (a rough sketch of the first two checks follows this list):

  1. KW Densities & Permutations: If your on-page densities are unnatural (noticeable bad territory: over 8% including anchor text from your nav), then rework your content. You may even want to consider reworking your navigation.
  2. Inbound to On-Page KW Matching: If your inbound anchor text focuses on words a, b, c, and d – but your on-page focuses on a, b, x, y and z, you’re a likely candidate for Penguin according to our research. Make adjustments accordingly – focusing your on-page to match inbound might be the easiest route here.
  3. Inbound Anchor Text Densities & Permutations: If your inbound anchor text is way too unnatural, for example if you have 20 links using 1 phrase, and no other links at all, you should focus on creating a more natural distribution. Right now, this is a Penguin (algorithmic) issue, but it could also serve as a red flag for manual penalization if you don’t address it now.
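
Here’s a minimal sketch of checks 1 and 2 from the list above – simple counting in Python, not a reverse-engineered Penguin formula. The ~8% danger zone is the number mentioned in item 1; the overlap test is my own simplification of item 2.

```python
# Illustrative checks only. keyword_density() approximates check 1 (on-page
# density for a target phrase); anchor_onpage_mismatch() approximates check 2
# (inbound anchor words that never appear in the page copy).
import re
from collections import Counter


def keyword_density(page_text: str, keyword: str) -> float:
    """Share of on-page words accounted for by occurrences of the keyword phrase."""
    words = re.findall(r"[a-z0-9']+", page_text.lower())
    kw = keyword.lower().split()
    if not words or not kw:
        return 0.0
    hits = sum(1 for i in range(len(words) - len(kw) + 1) if words[i:i + len(kw)] == kw)
    return hits * len(kw) / len(words)


def anchor_onpage_mismatch(inbound_anchors: list[str], page_text: str) -> set[str]:
    """Anchor-text words that never appear anywhere in the page copy."""
    page_terms = set(re.findall(r"[a-z0-9']+", page_text.lower()))
    anchor_terms = Counter(w for a in inbound_anchors for w in a.lower().split())
    return {w for w in anchor_terms if w not in page_terms}


if __name__ == "__main__":
    copy = "Our blue widgets are hand-built and ship free. Browse all blue widget sizes."
    print(f"Density for 'blue widget': {keyword_density(copy, 'blue widget'):.1%}")
    print("Anchor terms missing on-page:", anchor_onpage_mismatch(["cheap gadgets", "blue widget"], copy))
```

If the density check lands anywhere near that 8% territory, or the mismatch set keeps growing as you acquire links, that’s your cue to rework the copy or the anchor distribution before Penguin (or a manual reviewer) does it for you.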

Best of luck and we’ll keep you updated on our research with regards to Penguin – sign up to the blog to stay in the loop.

Finally: The Cheese Stands Alone

In the end, there’s only one breed of content that survives the gauntlet of algorithmic changes that Google has built over the past few years. Stick to it:

  1. Keep it unique
  2. Research and be comprehensive
  3. Use natural keywords and unique ones too
  4. Double-check your content
  5. Don’t even attempt to stuff your keywords
  6. Make sure your backlinks are as natural as your content

De-index the rest if you need to. Keep your head above water and best of luck! Get an SEO Audit from Exclusive Concepts if you want our team to help you get up-to-date on your SEO.
