Google Penguin Recovery Method - The Orca Technique
Today I am proud to present a tutorial and case study on how to recover a domain that still has many good links from a Penguin drop, using redirects, ScrapeBox, and Link Detox Boost.
Our Certified LRT Xpert, Bartosz Góralewicz, again shows and tells everything, and you should not miss this.
Enjoy & Learn
Christoph C. Cemper
Table of contents
- How does Penguin work?
- How to diagnose Google Penguin problems?
- Starting over with a new website?
- Is Penguin really a penalty?
- How do you recover from Penguin?
- What do you need to recover?
- Expected outcome
- Recrawling of Links
- What should I expect now?
- What next?
- TL;DR (summary steps and infographic)
For 2 years, Penguin was released quite regularly. The SEO community and webmasters got used to that. Everyone was expecting a Penguin update during the Spring (around May) and then again in the Autumn (around October). Right now, it’s already September 2014, and it has been more than 300 days since the last Penguin update (October 4th 2013). It is keeping thousands of websites under water and making them slowly lose hope that they will ever recover. Not to mention the income loss for them.
So far, the only solution published is to start over with a new domain.
UPDATE: In the last Google Webmaster Hangout, John Mueller said that the next Penguin update is still not around the corner!
How does Penguin work?
Once you are hit, you have got to wait. This is actually the worst part.
The Google Penguin Update can decrease your ranking when it rolls out, but you can gain the lost ranking back only during the next update/rollout. Previously, this wasn’t such a huge problem. If you got penalized in May, you could hire an SEO professional to clean up your link profile and you knew you would recover in early October. In less than 5 months, you got a clean slate and your SPAM sins were forgiven.
But 2014 is different. The last update was October 4th 2013. That was (in my opinion) the most significant Google Update to happen so far. This was reflected by a few factors:
- The Black Hat community took a huge hit. This is one of the unspoken (or at least rarely mentioned) facts. Many black hats actually left SEO or changed their approach after Penguin 2.1
- Penguin 2.1 was really susceptible to negative SEO.
- Penguin 2.1 (and Panda 4.0) also influenced GOOG stock prices a lot
How to diagnose Google Penguin problems?
Simply look at the visibility (or organic Google traffic in Google Analytics) and compare the date of the drop with the Penguin dates. You can find all confirmed algorithm updates here
In the screenshot above we can see a drop between April 22nd and April 29th 2012. Comparing this with the Google update dates, I easily diagnosed Penguin 1.0 (April 24th 2012).
Starting over with a new website?
I know that many online business owners are going out of business after their sites have been under water for more than 300 days now. John Mueller said in one of Google’s hangouts that sometimes it is not worthwhile to save the website, and he simply recommends a new website and a fresh, clean start for some owners. But you don’t want to do it, do you?
Is Penguin really a penalty?
There are also many articles about penalties being transferred to the new website.
Also, John Mueller mentioned the possibility of a penalty going after you (~minute 23.00)
But while focusing on all that technical stuff, the SEO community appears to have missed one important thing…
According to Googlers, Panda and Penguin are not penalties. Yeah – it may sound crazy, but this is a fact. I wrote a little bit more about that here – ex-Googlers Panda & Penguin workshop. Googlers clearly claim (and I have heard it a few times already) that a penalty = a Manual Penalty. All the rest are algorithms. And you would be surprised how obsessed they are with calling things by their proper names. During the workshop with Googlers, I watched them correct each and every person who asked about “the penalty,” stressing that it be labeled correctly.
To make this point even clearer, let me quote Matt Cutts about the Penguin algorithm here as well:
“It does demote web results, but it’s an algorithmic change, not a penalty. It’s yet another signal among over 200 signals we look at.”
Now when you consider that Google is pushing up to 6 updates per day, it really is logical for them. Imagine that a website went from position #3 to position #11 after a Pigeon update. Would you consider this website penalized?
For me personally, the answer to that is not so clear, as Penguin keeps websites demoted till they clean up their act, which sounds like a penalty to me, but that’s not the point here.
How do you recover from Penguin?
As you see, your website is not being penalized. I keep on getting this strong feeling that during some of the hangouts John Mueller is trying to “scare” webmasters into thinking that there is nothing you can do but clean up your act and wait, or even start afresh with a new domain. It is not true.
You can recover from Penguin with a 301 redirect if you do it right.
What do you need to recover?
- A manual link audit with e.g. Link Detox
- A new domain (no history at all, no backlinks, no content in archive.org, etc.)
- Webmaster Tools access (owner level) for both the new and old domains
- Link Detox Boost
I have been doing this for quite a while for my customers. I came up with this solution after some of my affiliate sites got hit really badly by Penguin 2.1, and as I had nothing to lose, I was running many tests.
The outcome depends on the number of good links you have left. If you disavowed 90% of your backlinks, don’t expect to recover; you will probably rank even worse. For most of the websites I work with, the ratio of bad links is between 15% and 40%. In my opinion, that is the maximum ratio at which a 301 redirect is still worth considering.
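That rule of thumb can be expressed as a tiny helper. This is only a sketch of the article’s heuristic – the 40% cutoff is the author’s experience-based threshold, not a published formula:

```python
def worth_redirecting(total_links, disavowed_links, max_bad_ratio=0.40):
    """Heuristic from this article: if more than ~40% of your backlinks
    had to be disavowed, a 301 redirect is unlikely to pay off."""
    if total_links <= 0:
        raise ValueError("total_links must be positive")
    return disavowed_links / total_links <= max_bad_ratio
```

For example, a profile with 1,000 links and 300 disavowed (30% bad) passes, while one with 900 disavowed (90% bad) does not.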
This is only one of the websites I’ve been working on. After the redirect, they gained ~3,000 more visitors daily (the base number was around 5,000 per day) within the first month, then really skyrocketed later (the website had been partially hit by Panda in May), with around 12,000 more visitors daily 2 months after the redirect.
I saw many attempts to recover from Penguin, and most of them failed because of the link audit. This is usually the hardest part of the process. It requires a clear idea of what we are targeting and what we need to accomplish.
To clean up your link profile, you need to really deep dive into your backlinks. Most audits I see simply apply the default rules from Link Detox (all Suspicious and Toxic links disavowed).
While Link Detox is the most complex link audit tool on the market, just like Google, it may sometimes produce a false positive or a false negative.
Some factors are basically impossible to rate with any algorithm. This is why Google has their Webspam Team and Human Raters (Search Quality Raters) to double check the links before penalizing a website.
So our first step is to run a full Link Detox of our website.
Go to your toolkit!
Scroll down to the Link Detox tool
And start the Link Detox report.
Remember to either connect the Google Search Console (Google Webmaster Tools) account to LinkResearchTools or upload the links from GWT manually.
Nofollow vs Dofollow Evaluation Mode
This is a really sensitive subject in the SEO community. I will not go into that discussion here; I will simply present my personal point of view.
The Penguin 2.1 algorithm is still a huge mystery in the SEO industry. Google never published any technical information about it. Therefore, I cannot ignore nofollow links. If I were a Google engineer, I would consider all the data I could find to detect spammy link profiles, including nofollow links, which are a huge part of the link profile – in my opinion, way too huge to ignore. The Penguin algorithm is not made to determine my rankings (using dofollow links); it is targeted strictly at detecting a spammy, unnatural link profile. My understanding of a “link profile” = dofollow + nofollow links.
After you select to evaluate nofollow links, scroll to the bottom of the page:
Untick “Remove Dropped Links.”
Usually you don’t need to worry about those, but in this case, we have to make sure that every single link is re-crawled before the 301 redirect. If you ran Link Detox before with dropped links removed, you will probably be shocked at how different the results are between those two options.
Remember to look for the redirected sites as well. Even the ones with a 302 redirect can cause problems. You can read more about this in a really great case study done by Derek Devlin - Double Manual Google Penalty Recovery + 302 Redirects Hurt Site
Now we need to go through all the backlinks manually using Link Detox Screener™.
For all the “LinkNotFound” websites, simply check whether you would LIKE to have a link from the domain where it was found. A link may be missing for many reasons, but 95% of those backlinks will be scrapers, expired domains, etc., and we can safely disavow those at the domain level.
Tip: Sometimes it is good to search for your domain with a “mydomainname” site:DomainFromTheReport.com command.
During the link audit, disavow all of the bad links on a domain level.
After we are done with the link audit, we need to download the disavow file and submit it to our Google Search Console (Google Webmaster Tools) account. You can do it by going to:
Once we have all that data, we can go ahead with the next step.
Forcing the Recrawling of Links
If we want to redirect the domain, we can’t do it without making sure that each of our links was re-crawled by Google. We should do this around 24–48 hours after submitting the disavow file. It is best to start a Link Detox Boost just after submitting the disavow file and then keep it running for around 3 days.
Here is how it is done:
First, export all the backlinks from the Link Detox report using your favorite format (XLSX or CSV)
Now copy all your backlinks (FROM URL tab) to the clipboard.
Now that we have all the backlinks exported, we can start a Boost campaign.
Now we need to set up the Link Detox Boost:
Copy all the backlinks from the downloaded XLSX file and paste them here, in the Disavowed URLs: field.
Tip: Remember to paste all of your backlinks, not only the ones in the disavow file.
Now scroll down to fill in the rest of the settings:
All you’ve got to do now is tick the box saying that your disavow file has been uploaded in your Google Search Console (Google Webmaster Tools), and then scroll down to agree to the Terms and Conditions. You can upload your disavow file here if you like, but this is not a must. There is no need for that with what we are trying to accomplish.
OK, we are finished and we can run the Boost now.
After we run the Link Detox Boost, we can monitor the results by going to the report’s page. We can see if the URL was boosted, and if the Google Bot actually visited our URL. I think this is the only tool on the market that actually checks for Google Bots and shows the exact date of the crawl.
When we are 100% sure that all the backlinks have been crawled, we can proceed with the redirect.
Disavow file upload to the new domain
Before redirecting the old domain to new domain, remember to upload the disavow file made for Olddomain.com to the Google Search Console (Google Webmaster Tools) of NewDomain.com!
It all changed in 2013 with Google going after 301 redirects made to keep a site “penalty-safe.” Black Hat SEOs were using satellite sites redirected to the Money Site and building spammy links to those satellites. This way, it was much harder to get the Money Site penalized. That loophole is now closed.
301 redirect’s “transparency.”
Now, if you redirect site A to site B and go to the Google Search Console (Google Webmaster Tools) of site B, you will see backlinks pointing to site A as direct backlinks to your site (site B). This is why so many people failed with their attempts at 301 redirects after being hit with Penguin.
To prevent any spammy/unnatural links from hurting your new website, use your disavow for both the old AND new domains.
I will not explain how to redirect a domain step by step here, as I think that is the topic for a separate article altogether. Creating a 301 redirect is a simple and basic thing for any webmaster. If you have somebody taking care of your hosting/CMS, they will definitely be able to do it for you. I will only list the steps necessary to benchmark their work.
- Using any crawler (e.g. free XENU or paid Screaming Frog) list all of the indexable URLs within your website. Alternate method – if you are 100% sure that your Sitemap is good, export the URLs from the Sitemap using the Scrapebox Sitemap Scraper or any other solution to export the URLs from the sitemap.
- Export all the URLs to a txt file. This will help you diagnose if the 301 redirect is done correctly.
- After you have your website scraped and listed, you can redirect it to a new address.
- Check if all the URLs are pointing to the right URLs on the new domain, e.g. olddomain.com/page223 should point to newdomain.com/page223, and not to newdomain.com
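The verification in the last step can be scripted. Below is a minimal sketch; `www.newdomain.com` is a placeholder, and the helper only interprets a response’s status code and Location header – fetching those with whatever HTTP client you prefer is left out:

```python
import urllib.parse

def redirect_ok(path, status_code, location, new_host="www.newdomain.com"):
    """True only for a 301 pointing at the SAME path on the new domain,
    so a blanket redirect to the new homepage is flagged as wrong."""
    if status_code != 301:
        return False
    target = urllib.parse.urlparse(location)
    return target.netloc == new_host and target.path == path
```

For example, a response `301 → http://www.newdomain.com/page223` for `/page223` passes, while a 302, or a 301 that dumps everything on the homepage, fails the check.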
If you are 100% sure that your redirect is done well, you can skip the ScrapeBox section below.
Automated redirect check of all the URLs with ScrapeBox
If you want to check it all automatically, use the ScrapeBox Alive Check add-on. If you don’t own ScrapeBox, it is not too expensive to buy for a one-time payment of $57 (instead of $97) using this link: http://www.scrapebox.com/bhw (BlackHatWorld discount).
Now if you open ScrapeBox it looks something like this:
Few people know, however, that it is not only the ultimate comment-spam machine. It is a tool I can’t imagine not having in my SEO toolbox (just like LinkResearchTools).
Even more power is available when you go to Add-ons:
As you see there are quite a lot of useful tools you can use for many SEO objectives you may have. All you need is some creativity, and of course some knowledge of how to use the tool to its full potential.
Now go to ScrapeBox Alive Check:
Now when the tool opens, load the URLs from the olddomain.com that you would like to check.
Now click on OPTIONS and set up the Alive Check to follow 301 redirect and to report the URLs that are not redirected correctly.
Use the settings above. This way, all the URLs that are redirected with a 301 and return code 200 on the target domain (“Follow relocation” ticked) are marked as ALIVE.
Tip: with these settings, URLs that are not redirected at all will also be reported as alive (false positives). Make sure that your .htaccess file / PHP redirect is uploaded and configured correctly before proceeding.
And now we can start the Alive Check by simply clicking START.
All you’ve got to do now is simply export all of the Dead URLs if there are any. If not – congratulations! Your redirect is perfect.
Tip: It is worthwhile to check if your website’s 404 page really returns a 404 response code to the bot. You can do it with web-sniffer.net
Below you can see an example of the analysis done with web-sniffer.net (non-www to www redirect for http://linkresearchtools.com).
Now that your website is properly redirected, both disavows are uploaded, and links are re-indexed, we can move on to the next part.
Recrawling of Links
After the 301 is finished, we can (and should) speed up the redirect.
I know that John Mueller has said many times that Google will eventually re-crawl everything and push all the Page Rank to your new website. I also know that they will release Penguin 3.0 ☺
While this is all probably true, time is money and my goal is to make money for my customers, so let’s speed it up.
What are we trying to accomplish here?
This whole part of the article is dedicated to speeding up the redirect’s indexing. Of course, we could skip this point and wait for 1-3 months till Google re-indexes most of our pages (yeah, not all of them). Personally, I am not that patient when it comes to my customers, as I usually try to show them results as quickly as possible.
After the redirect is done, there are always some things that I don’t like. I am always pursuing the perfect scenario, so this is what we are going to do this time as well.
Things I don’t like after the redirect:
- Duplicate results in Google for your content queries.
- Content indexed on olddomain.com which is not on newdomain.com yet.
- The OldDomain ranking higher than NewDomain.
- NewDomain not ranking and OldDomain ranking
- site:olddomain.com returning more results than site:newdomain.com
All the problems above are caused by Page Rank not being fully transferred to the new domain.
For example, if olddomain.com/article2453 is indexed in Google and newdomain.com/article2453 is not, then as I understand it, Google is still keeping the PageRank and all of the other signals from this page on the old domain. If so, your new domain ranks lower, as not all of its content is indexed.
The best way to get rid of this problem is to make Google re-crawl all of the redirected URLs.
Sitemap scrape (the easy way)
Take all the URLs that you scraped or exported from the sitemap (in the Redirect section) and start a Link Detox Boost with those old-domain URLs.
Google Index Scrape (the harder way)
You would be surprised which URLs you will find indexed in Google when scraping your website’s index from Google. This is not something that should be ignored. 90% of SEOs look at the Google index (you can check it by googling site:domain.com) before looking at any other factors. Keeping your index clean of all the unwanted/duplicated pages is also a Panda factor.
After the 301 redirect is done, you need to scrape (extract) all the URLs of the old domain from Google. As Google doesn’t let you extract the indexed URLs, you need to use automated scrapers to do the work for you.
Scraping Google’s index with ScrapeBox
There are a lot of tools to perform automated Google Searches (to scrape Google). ScrapeBox is one of the easiest to use though, and it is also a tool that you can use for almost any other off-page or on-page SEO work.
If you’ve already got some experience with ScrapeBox (or similar tools), you need to know that scraping Google’s index is different than any other scrapes. It took me a lot of time to get my scrapes really close to the number of results in Google. The answer to the question about how to achieve this is actually quite simple.
To scrape Google’s index we need 2 things:
- the site:domain.com command
- a list of website-related keywords
This way our search looks like this: “keyword site:domain.com”. Using simply site:domain.com will return a maximum of 100 unique results, which is not enough for most websites (for sites with fewer than 100 indexed URLs, I recommend simply copying the results from Google manually).
This seems simple enough, but the choice of keywords tends to be quite difficult. With generic keywords like “click here, a, the, post, next, previous, etc.” we get really limited results as well. To scrape a website with 1,000 indexed URLs we need around 100 – 150 website-related keywords. Where can we get those from?
Google Search Console (Google Webmaster Tools).
After running many tests, I figured that the best keywords to scrape are the ones you are ranking for.
To get a list of the keywords that your domain is ranking for, simply go to Google Search Console (Google Webmaster Tools), go to your domain’s dashboard, then to “Search Traffic” and “Search Queries”:
By simply clicking “Download this table” you get all of the queries (keywords). If you want to get more keywords, simply change the dates to more than 30 days (up to 90 days’ worth of history is available in Google Search Console (Google Webmaster Tools)).
Now we can use the keywords extracted from Google Search Console (Google Webmaster Tools) in ScrapeBox. This way we’ve got really good keywords related to our website (according to Google). We want to use them for our “keyword site:domain.com” command.
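Building those footprints from the exported keyword file is easy to script. A sketch, assuming the keyword sits in the first column of the CSV export (the exact column layout of the GWT download can vary):

```python
import csv
import io

def build_queries(csv_text, domain):
    """Turn a 'Search Queries' CSV export into 'keyword site:domain' footprints."""
    rows = csv.reader(io.StringIO(csv_text))
    next(rows, None)  # skip the header row
    return [f"{row[0].strip()} site:{domain}"
            for row in rows if row and row[0].strip()]
```

For example, an export containing the queries “penguin recovery” and “link detox” yields the footprints `penguin recovery site:olddomain.com` and `link detox site:olddomain.com`, ready to paste into ScrapeBox.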
Of course, we also need some proxies to use with ScrapeBox. You can find proxy sources quite easily with just a few minutes of Googling. If you are really lazy, you can get them with e.g. BuyProxies.org, but private proxies are not the best idea for scraping. If you’ve really got a huge problem finding a proxy for ScrapeBox, feel free to contact me directly; I will send you a fresh batch ☺.
Now, when you have your Google proxies working, all you have to do is click “Start Harvesting.”
After the harvesting session is finished, we need to remove all of the unwanted results and de-duplicate the URL list.
Enter your domain address and click OK.
Now you should filter out all of the unwanted results from your list. All we need to do to finish the scrape is de-duplicate the URLs. We don’t want more than 1 unique URL from our domain.
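The same filtering and de-duplication can be done outside ScrapeBox in a few lines of code (a sketch; `olddomain.com` is a placeholder for your domain):

```python
import urllib.parse

def clean_harvest(urls, domain="olddomain.com"):
    """Keep only harvested URLs on our domain (www or bare),
    drop duplicates, and preserve the original order."""
    seen = set()
    kept = []
    for raw in urls:
        url = raw.strip()
        host = urllib.parse.urlparse(url).netloc.lower()
        if host not in (domain, "www." + domain):
            continue  # a result from some other site slipped into the harvest
        if url not in seen:
            seen.add(url)
            kept.append(url)
    return kept
```

This leaves exactly one copy of each of your domain’s URLs, which is what we want to feed into Link Detox Boost.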
Now that we have the whole index scraped and filtered, all we need to do is submit it to Link Detox Boost.
This way, all the URLs still in Google’s index get crawled again, which speeds up the redirect’s indexation.
After all this hard work, we are now finished.
What should I expect now?
Remember not to open your Dom Pérignon too early. It often happens that for the first few days your website will rank twice for one keyword. This can cause a traffic spike that will eventually settle a little lower. In my case, it played out differently in every niche: sometimes the redirect and the traffic spike finish in 4 days, sometimes in 2 weeks. Be patient and expect the traffic to go up.
Watch out for Panda
Remember that we fixed all of the off-page problems with the right disavow, but the website’s content is still the same. If content was the reason for your problems in the past, they will surely come back to haunt you again.
Now is the time to monitor your traffic really closely. For example, if you get a really huge spike and after 5 weeks your traffic suddenly drops, I would look into potential Panda issues. A redirect may help with Panda (though not always), but user-experience or on-page problems will always come back until they are fixed completely.
With the Penguin issues fixed, we are only halfway there. You can’t just leave it like that now.
To continue what you’ve started with the redirect, you need to do some more hard work to build the authority of the new domain.
A redirect always “kills” some of the link juice your website gained over time. Now is a good time to start getting (not building) some new natural links. With only the redirect done, your rankings will slowly decrease over time due to the lack of positive off-site signals.
Monitor your backlinks
A 301 redirect does not make the old domain unimportant. You still need to monitor the backlinks pointing to that domain and the risk related to its link profile. It is best to review new links manually every few weeks (depending on the number of new links per month). You can also use Link Alerts from LinkResearchTools.
The only thing that comes to mind when I think about a smart summary of such a wide topic is:
Be brave or die waiting 😉
There is no short and easy way to sum up the scale of Google’s indifference toward Penguin victims. I have personally witnessed a few companies go bankrupt because of Penguin, and dozens of people laid off while their employers were (and are) waiting for recovery.
We have reached the moment where many webmasters are becoming desperate. Those with some “Google sins” cleaned up their act months ago and are quietly waiting and hoping for Google’s mercy. Unfortunately, there is one more group: negative SEO victims.
For years, negative SEO was an excuse for most black hat SEOs. With any manual penalty or algorithm update, they would say, “It wasn’t me – it was negative SEO.” Recently, though, negative SEO became a real thing.
For those doubting that negative SEO is real: just contact me, and I will be happy to give you a few thousand (!) real-life examples.
The solution I explained here is not the easiest one.
Fortunately, if done right, the whole thing can be accomplished in as little as 7 days. With Google’s Penguin updates becoming unpredictable, basing your income and business on the next update date may be a destructive strategy. If you can change your domain address, then this is the best (and only) solution out there.
John Mueller has said many times that sometimes it is better to start with a new domain than to try to fix the existing one. In a somewhat complicated way, we are doing exactly what he advised. ☺
For those of you that only want to know the general idea, here is a recap on the strategy to recover:
- Perform a link audit
- Upload the disavow to both (old and new) domains
- Help Google re-crawl all the backlinks pointing to the old domain
- Redirect old domain to new domain with a 301 redirect
- Scrape all the URLs from Google’s index
- Re-crawl all the indexed URLs
Enjoy the recovery!
Share this Penguin Recovery Infographic On Your Site
Why is it called "Orca Technique"?
"Sharks and killer whales (orca whales) are also primary predators for penguins when they dive into deep waters. This is also true when the whales are in their migration season, because they will be much closer to the coastline. While the killer whales are looking for large marine mammals during the migration, a quick snack of penguin is too good to pass up." source: http://www.penguins-world.com/penguin-predators/
This tutorial was written by Bartosz Góralewicz, CEO at Elephate, and proud user of LinkResearchTools and Link Detox.
It makes us happy when we're able to provide resources like this to our readers. Clear strategies and how-to guides simplify what can be a confusing process.
Bartosz demonstrated his expertise in this guide by showing how to use three SEO tools to implement six strategies in order to recover from a Penguin hit. Therefore, I’m very happy to publish his research on our site.
I look forward to Bartosz’s future work, and I personally recommend working with him whenever you get the opportunity!