5 Steps to Take If Your Blog Posts Are Being Stolen
Modern marketing is all about content. Getting a blog up and writing lengthy, valuable posts is the lifeblood of thousands of businesses. Many people work full-time creating the excellent content people read and share online every day.
It's understandably devastating, then, to find someone else online has simply stolen your content. The copy and paste function is right there, and they just yoinked the whole thing, word for word, and published it on their own site. So what can you do about it?
Step 1: Identify the Details
The first thing you should do, after you get over your panic, anger, and depression over your content being stolen, is investigate. With the proper investigation, you can determine the appropriate course of action. Here are the details you should look for.
Is the stolen version indexed on Google? It probably is, or else you wouldn't have found it in the first place. However, if some other webmaster or a reader happened to follow a link chain and ended up on some shady blog that stole your content and thought to notify you, they might not have thought to check Google.
It's easy enough to check Google. Simply take a relatively unique sentence from the post and Google it in quotes. This will show you every page Google knows of that uses that exact phrase. Yours will come up (unless you're having indexing issues), and you'll see any site that quotes you, any that syndicates you, and any that has stolen from you.
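If you check a lot of posts this way, it can help to generate the search link rather than typing it out each time. Here's a minimal sketch of that quoted-phrase search as a URL. The function name and the sample sentence are made up for illustration; the only thing it relies on is Google's ordinary search URL with the phrase wrapped in quotes.

```python
from urllib.parse import quote_plus


def exact_phrase_search_url(sentence: str) -> str:
    """Build a Google search URL that looks for the sentence as an exact phrase."""
    # Drop stray whitespace and any quotes already around the sentence.
    phrase = sentence.strip().strip('"')
    # Wrapping the phrase in double quotes asks for an exact match.
    return "https://www.google.com/search?q=" + quote_plus(f'"{phrase}"')


# Hypothetical sentence lifted from one of your own posts.
print(exact_phrase_search_url("purple widgets outsell green widgets every winter"))
```

Open the printed link in a browser and look at which sites show up alongside yours.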
If the stolen version isn't indexed in Google, great! Feel free to end the process here. It literally doesn't matter if it's not indexed; the site stealing your content isn't likely to be making money off it and there's no real risk of your site being harmed by it.
This is the first filter I always run, because otherwise, tracking down every single instance of copied content and going through this entire process is just going to turn into a full-time job. It ends up being more trouble than it's worth.
Does the stolen version out-rank yours (or rank at all)? This is where things start to get dangerous. If the content thief is indexed, you'll want to see if their content ranks higher than yours. The easiest way is with that same quoted-phrase search; if their content is listed above yours, they might be out-ranking you.
Alternatively, you can look for the primary keywords for that piece of content. Check two or three different keyword phrases where you know your content is – or should be – ranking, and look for where their URL stands. If it's not in the first 2-3 pages of Google search results, it might as well not exist. If it's out-ranking your content, it's a much larger issue to deal with.
Did they canonicalize your version? The rel="canonical" attribute goes on a link tag in a page's head section, and it tells Google which URL is the original version of a piece of content. Content syndication sites use it to acknowledge the original publisher of a piece of content. That way they aren't penalized as content thieves, and the original publisher gets more value from it than they otherwise would.
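You can check for this by viewing the copied page's source and looking for the canonical link, or you can automate it. The sketch below pulls the canonical URL out of a page's HTML using only Python's standard library. The class name, function name, and sample markup are assumptions for illustration; you would feed it the saved source of the page that copied you.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> tag seen."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel can hold several space-separated values, so split before checking.
        rel_values = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attr.get("href")


def find_canonical(html: str) -> Optional[str]:
    """Return the canonical URL declared in the HTML, or None if there isn't one."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Hypothetical source of a page that republished your post.
sample = (
    '<html><head><link rel="canonical" '
    'href="https://example.com/original-post"></head><body></body></html>'
)
print(find_canonical(sample))
```

If the result is your URL, they've credited you as the original. If it's missing or points at their own site, they haven't.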
Some novice bloggers read about content syndication and curation, and think they can get away with copying posts if they just acknowledge the original publisher. It's possible that they're trying to do so. It's still not a good thing to do, but you may be able to turn it into an opportunity rather than an adversarial battle.
Did the thief change names/attribution/links? This is another indication that the content thief might not be on the up and up. If they changed any of your internal links to links to their site, or just removed them, and if they changed the author name and bio, chances are they're trying to pass the content off as their own.
Now, this doesn't mean the site owner necessarily did so. Some companies hire cheap freelancers to produce content for them, and sometimes those freelancers steal content rather than write unique content. The site owner might not know, and that means they're more likely to be responsive to your requests for removal in the next steps of this process.
Did the thief back-date the content to appear earlier than yours? This is another sign that the thief is trying to pass the content off as their own, and is more of a sign of intentional theft. If their publication date for the content is earlier than your own, you know it's stolen.
Thankfully, this isn't as much of an issue as you might think. Google knows back-dating is a thing, and they don't care about the date you list as the publication date in either the URL or the content of the page. The only thing they really care about is the date they indexed the post. Since they undoubtedly found your post first, the backdating doesn't hurt you. It's just more evidence of theft, really.
Take screenshots, links, and records. While you're doing all of this, gather evidence. A screenshot of the post that shows the URL of the page, a copy of the URL, the WHOIS information for the site, any other stolen content on the site from your site, and other evidence can be useful later on down the road. Screenshots are necessary in case the person takes the content down, but later steals other content; you can build a case against them with clear evidence of a history of theft in that case. Ideally you won't need it, but you never know.
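If you'd rather keep that evidence organized than scattered across a desktop folder, a small log helps. This is one possible sketch, not a required format: it records the two URLs, a UTC timestamp, the screenshot filename, and a SHA-256 fingerprint of the saved page source so you can show later that your copy hasn't been altered. Every field name, URL, and filename here is hypothetical.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass
class EvidenceRecord:
    infringing_url: str
    original_url: str
    captured_at: str       # ISO 8601 timestamp in UTC
    html_sha256: str       # fingerprint of the saved page source
    screenshot_file: str
    notes: str = ""


def make_record(infringing_url, original_url, page_html: bytes,
                screenshot_file, notes=""):
    """Create one evidence entry, stamped with the current UTC time."""
    return EvidenceRecord(
        infringing_url=infringing_url,
        original_url=original_url,
        captured_at=datetime.now(timezone.utc).isoformat(timespec="seconds"),
        html_sha256=hashlib.sha256(page_html).hexdigest(),
        screenshot_file=screenshot_file,
        notes=notes,
    )


# Hypothetical values; page_html would be the bytes of the page you saved.
record = make_record(
    "https://thief.example/copied-post",
    "https://example.com/original-post",
    b"<html>saved copy of the infringing page</html>",
    "thief-example-copied-post.png",
    notes="Author name and internal links were changed.",
)
print(json.dumps(asdict(record), indent=2))
```

Append each record to a file as you go, and you'll have a dated history if the same site steals from you again.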
Step 2: Reach Out and Request Removal
The second step in the process is to reach out to the person who runs the site that stole your content and request that the content be removed.
"But they're a thief, why would they remove it just because I asked them?" You might ask, and that's a valid question. There are a few reasons why they might acknowledge you and remove it.
- The site owner didn't know it was stolen. If it was given to them by a writer they pay for, and they didn't check it first, they might just have been scammed. They'll remove it and go after their writer, potentially with legal action.
- The thief doesn't want to fight you and would rather remove it than risk being sued. Obviously legal action is dangerous and expensive, so they're more likely to want to avoid it if at all possible.
- They've gotten their value and are willing to cut and run. Unfortunately, many content thieves are private blog networks, where any given site in the network is disposable. They'll just kill the site and spring up elsewhere, which is why it's so hard to fight them.
And, of course, there are cases where you can't reach the website owner, or they ignore your communications, thinking that you can't do anything against them. This is especially prevalent if the thief is based in a country or region where you aren't likely to succeed with legal action, like China or the Middle East.
How can you contact the site owner? You have a couple of options. The one most people turn to first is the WHOIS search. This is a search of the registration information attached to the site, and will tell you the name, address, contact information, and other relevant information attached to a site. Unfortunately, more and more content thieves are using WHOIS protection, so you won't be able to find valid information when you search.
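Most people run WHOIS through a website, but the lookup itself is simple enough to script. The sketch below sends a plain WHOIS query over TCP port 43 and picks a few useful lines out of the reply. Treat the details as assumptions: the default server (whois.iana.org) typically only points you to the registry that holds the full record, field labels vary between registries, and the sample reply is invented so the parser can be shown without a network call.

```python
import socket


def query_whois(domain: str, server: str = "whois.iana.org",
                timeout: float = 10.0) -> str:
    """Send one WHOIS query: connect to port 43, send the name, read the reply."""
    with socket.create_connection((server, 43), timeout=timeout) as conn:
        conn.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def extract_fields(whois_text: str,
                   wanted=("registrar", "registrar abuse contact email",
                           "name server")):
    """Collect the values of the wanted 'Label: value' lines, keyed by label."""
    found = {}
    for line in whois_text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip().lower()
        if sep and key in wanted and value.strip():
            found.setdefault(key, []).append(value.strip())
    return found


# Invented reply in a common registry layout; real replies differ.
SAMPLE_WHOIS = """Domain Name: THIEF.EXAMPLE
Registrar: Example Registrar, Inc.
Registrar Abuse Contact Email: abuse@registrar.example
Name Server: NS1.HOST.EXAMPLE
Name Server: NS2.HOST.EXAMPLE
"""
print(extract_fields(SAMPLE_WHOIS))
```

To run it for real, pass the output of query_whois("thief.example", server=...) into extract_fields, using whichever WHOIS server the registry points you to. Even when the owner's details are hidden, the registrar's abuse contact and the name servers are often still listed.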
You can also look on the site itself, in their About section or footer, and see if they have any contact information. Sometimes they will, though many content thieves will have a simple P.O. Box address, an invalid phone number, or no information at all beyond their brand name.
If you can't reach the site owner through those means, you can check to see if there's a contact form on their site. It's a long shot, but submitting a complaint through their contact form might work. Alternatively, you can start leaving public comments on their blog. Those generally won't show up, though; they'll be held in moderation and ignored. That's assuming they have blog comments enabled at all.
Step 3: Send Their Host a DMCA
If the site owner does not respond or is not contactable, your next alternative is to start issuing legal takedown notices. Thankfully, you don't need to involve a lawyer in this.
At the very least, you're going to want to identify the web host of the site. This is generally available in the WHOIS information even if the contact information for the site owner is not. You can also use a stand-alone checker like HostingChecker to look up the web host, DNS IP address, nameservers, and other such information.
At this point, you will want to reach out to the web host and submit a takedown notification for the content. Most web hosts publish a page that goes over their copyright and takedown policies. You can also find third-party guides that will walk you through the exact process for submitting a DMCA takedown notice to that host.
If you need help actually writing a DMCA takedown notice, complete with the information necessary to prove copyright infringement, it's pretty easy. You can find sample letters online to customize for your usage. You can also talk to a lawyer if you want to draft up a letter you can use repeatedly; it should be cheap enough for a one-time thing.
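If you end up sending more than one of these, it's worth templating the letter. The sketch below fills in the pieces a takedown notice is generally expected to contain: what the original work is, where the copy lives, your contact details, a good-faith statement, an accuracy statement made under penalty of perjury, and a signature. The wording is my own illustration rather than a vetted sample letter, and the names and URLs are hypothetical.

```python
def build_dmca_notice(owner_name: str, owner_email: str,
                      original_url: str, infringing_url: str) -> str:
    """Assemble a plain-text takedown notice from the details you gathered."""
    lines = [
        "To the designated copyright agent:",
        "",
        f"I am the copyright owner of the article published at {original_url}.",
        f"The same article has been copied without my permission at {infringing_url}.",
        "Please remove or disable access to the infringing copy.",
        "",
        "I have a good faith belief that the use described above is not "
        "authorized by the copyright owner, its agent, or the law.",
        "The information in this notice is accurate, and under penalty of "
        "perjury, I am the owner of the copyright that is allegedly infringed.",
        "",
        f"Contact: {owner_name}, {owner_email}",
        f"Signed: /s/ {owner_name}",
    ]
    return "\n".join(lines)


# Hypothetical details for illustration.
notice = build_dmca_notice(
    "Jane Blogger",
    "jane@example.com",
    "https://example.com/original-post",
    "https://thief.example/copied-post",
)
print(notice)
```

Swap in the host's preferred wording or any extra fields their takedown page asks for before you send it.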
Some web hosts or other sites will have their own specialized takedown process. For example, if the content thieves have set up a Facebook page impersonating you, you can submit an intellectual property dispute through Facebook's own reporting form.
I say "at the very least" for submitting a DMCA to the website host, because ideally that's all you need to do. The web host doesn't want to be sued for hosting infringing content, so they will often take down the infringing site. However, if the web host is foreign or otherwise doesn't respond to your takedown, you may want to go to Google (and Bing, and DuckDuckGo, and whatever other search engines you want to target) and submit a DMCA there.
After all, if the content is no longer indexed on Google, it's not going to do you any harm, right? And it will be of no benefit to the site owner, so they won't need to bother keeping it up. Google has an entire process for submitting a DMCA request.
Step 4: Disable Potential Scrapers
It's one thing to get the copied content taken down, but if you don't do anything to try to prevent other people from copying your content, it becomes an endless uphill battle to secure your intellectual property. At least unlike a trademark, you can't lose your copyright by not defending it!
Content theft happens in one of three ways, typically.
- The site owner (or a low-paid freelance thief) finds the content and copies it manually.
- The site owner has an automatic scraper that visits a site and copies all of its content.
- The site owner monitors RSS feeds and copies content from there.
You can take some steps to help minimize these routes for content theft.
The first step I recommend you take is to adjust your RSS feed. If you don't use RSS, and you're pretty sure none of your followers use RSS, you can disable it entirely. Tutorials for doing this in WordPress are easy to find, and you can find similar tutorials for other content platforms.
If you do use RSS, or just want to keep it around, I highly recommend changing the format from "full content" to "summary". This means that the RSS will only show a summary or short snippet of the content of the post, rather than the entire post. As an added bonus, your RSS readers will need to visit your page to read the full content, so this technically drives up your traffic.
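Once you've changed the setting, it's worth confirming the feed really stopped handing out whole posts. Here's a rough check, assuming a WordPress-style RSS 2.0 feed where the full post body travels in a content:encoded element. Some platforms put the full text in the description instead, so treat a "False" as a good sign rather than proof. The sample feeds are invented.

```python
import xml.etree.ElementTree as ET

# Namespace of the RSS "content" module, which defines content:encoded.
CONTENT_NS = "http://purl.org/rss/1.0/modules/content/"


def feed_exposes_full_content(rss_xml: str) -> bool:
    """Return True if any item carries a non-empty content:encoded element."""
    root = ET.fromstring(rss_xml)
    for item in root.iter("item"):
        encoded = item.find(f"{{{CONTENT_NS}}}encoded")
        if encoded is not None and (encoded.text or "").strip():
            return True
    return False


# Invented feeds: one with the full post body, one with only a teaser.
FULL_FEED = """<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <item>
      <title>Hypothetical post</title>
      <description>Short teaser.</description>
      <content:encoded><![CDATA[<p>The entire article body.</p>]]></content:encoded>
    </item>
  </channel>
</rss>"""

SUMMARY_FEED = """<rss version="2.0">
  <channel>
    <item>
      <title>Hypothetical post</title>
      <description>Short teaser.</description>
    </item>
  </channel>
</rss>"""

print(feed_exposes_full_content(FULL_FEED))
print(feed_exposes_full_content(SUMMARY_FEED))
```

Save your real feed's XML to a string and pass it in to see which kind you're publishing.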
That takes care of one of the three avenues of content theft, but what about the others? Next, you can try to address scrapers.
Now, you don't want to disable web crawlers entirely. Search engines use those crawlers to index pages, so blocking them means you'll be handicapping yourself in organic search results.
You can use robots.txt directives and .htaccess tweaks to help keep scrapers and bots off your site, but you aren't going to get them all. Robots.txt is only a request, so well-behaved bots honor it and a determined scraper simply ignores it. On top of that, many content thieves run scrapers from their own computers, so blocking known bot agents or IP addresses doesn't help much.
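Before you upload a new robots.txt, make sure the rules block what you meant to block and nothing else, since a typo here can lock out Google. The sketch below runs a set of rules through Python's built-in robots.txt parser. "ExampleScraperBot" is a made-up user agent standing in for whatever bot you want to turn away, and keep in mind these rules only affect bots that choose to obey them.

```python
from urllib.robotparser import RobotFileParser

# Rules you might put in robots.txt: turn away one named bot, allow the rest.
# "ExampleScraperBot" is a hypothetical user agent.
ROBOTS_TXT = """\
User-agent: ExampleScraperBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the rules let this user agent fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


post_url = "https://example.com/blog/my-post"
print(is_allowed(ROBOTS_TXT, "ExampleScraperBot", post_url))
print(is_allowed(ROBOTS_TXT, "Googlebot", post_url))
```

The first check should come back blocked and the second allowed. If Googlebot ever comes back blocked, fix the rules before they go live.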
You can also set up a service like Cloudflare's DNS-level security to help. Cloudflare offers a free tier, and it can filter incoming traffic and throw up a wall between you and a non-trusted scraper.
Step 5: Set Up Monitoring
The final step is to set up monitoring, so you find out about future content theft as soon as it happens. There are a few ways you can do this.
First, you can set up Google Alerts for your content. Set up alerts for reasonably unique sentences for your content, and monitor those alerts periodically.
You can also use Copyscape's Copy Sentry. Copyscape is an online plagiarism detector that looks for copies of your content around the web. You can scan manually from time to time, or you can pay for active monitoring and alerts to be notified the moment your content is spotted.
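When an alert does come in, you'll want a quick way to tell a short quote from a wholesale copy. This is a rough local check, and not how Copyscape works under the hood as far as I know: it breaks your post into overlapping five-word runs and reports what fraction of them also appear in the suspect page's text. The function names and sample text are made up for illustration.

```python
import re


def shingles(text: str, size: int = 5) -> set:
    """Return the set of overlapping runs of `size` consecutive words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def copied_fraction(original: str, suspect: str, size: int = 5) -> float:
    """Fraction of the original's word runs that also appear in the suspect text."""
    source = shingles(original, size)
    if not source:  # original is shorter than one run
        return 0.0
    return len(source & shingles(suspect, size)) / len(source)


# Hypothetical texts: your post, a page that lifted it whole, and an unrelated page.
original = "the quick brown fox jumps over the lazy dog near the river bank"
wholesale_copy = "Welcome to our blog. " + original + " Thanks for reading."
unrelated = "a completely different article about gardening tools and soil"

print(copied_fraction(original, wholesale_copy))
print(copied_fraction(original, unrelated))
```

A result near 1.0 means nearly your whole post is sitting on their page, while a small fraction usually points to a quote or an excerpt.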
The DMCA site itself also offers up to two free scans of its index to look for copied content. You can also pay a small fee for an unlimited number of scans, if you want to scan more actively.
This allows you to actively take care of any content theft before it becomes a problem. Between these monitors and the automatic blocking options you have, you should dramatically cut down on content theft – and the time you have to spend dealing with it.