12 Ways to Prevent Your Blog Posts from Being Stolen

Written by James Parsons on June 28th, 2020 in Blogging

Content theft is a surprisingly common problem with content marketing. For every legitimate content marketer out there, there are dozens of spammers who would love to just steal your content, pack it with affiliate links or spam ads, pull in a few dollars off it until their site is reported, and repeat with a new target the next month.

Now, having your content stolen isn’t always a bad thing. It’s never a good thing, but a lot of the time, it won’t actively harm your site. On rare occasions, you can even get a minor SEO boost, though that’s exceedingly uncommon.

There are a few ways you can minimize blog post theft and a few that people recommend but that you should avoid. Let’s talk about it!

Identifying Content Theft

First up, let’s talk about identifying when your content has been stolen. See, 99% of the time, when your content is stolen, the person stealing it is not going to be ranking on Google. Their copy might not be in the first ten pages, and it will almost definitely not outrank yours. Google crawls websites constantly and also takes each website’s trust into consideration. If your website has no history of stealing content, and the person stealing your content has been stealing content for years, it’s not going to be very hard for Google to determine who the original creator is.

It’s only if a big name site steals your content that ranking becomes an issue. Most of the time, it’s simply content stolen for PBN usage or other spam. Since Google is so effective at finding spam, these sites are difficult to find in the search results!

With that, these first four tips are to help you find when your content is being stolen so that you can take action to do something about it.

1. Set Up Google Alerts

Google Alerts is a service Google offers that takes advantage of its crawling and indexing power. You set up alerts about names, keywords, or key phrases, and when Google detects a new post using that keyword or name, they will alert you that the post exists. You can use this for a lot of different purposes, including inspiration, following the news, and watching for brand mentions, but you can also use it for content theft monitoring.

What you need to do is identify a unique phrase or sentence you use in your content. You can either seed a particular phrase into all of your content – something like “we here at BrandName” that other companies wouldn’t use – or you can pick a unique sentence from each new post you write. Create an alert, and let it run. When Google finds new content that matches the alert, you’ll get a notification, and can check to see if it’s stolen content.
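If you'd rather not hand-pick a sentence for every post, here's a rough Python sketch of one way to do it. It assumes that the longest sentence in a post is distinctive enough to use as a fingerprint, which won't always be true. The function name is made up for this example, and the output is just a quoted phrase you would paste into the alert box yourself.

```python
import re


def pick_fingerprint_sentence(post_text: str, min_words: int = 8) -> str:
    """Return the longest sentence in the post as a quoted exact-match phrase.

    Longer sentences are less likely to show up elsewhere by coincidence.
    """
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", post_text.strip())
    candidates = [s for s in sentences if len(s.split()) >= min_words]
    if not candidates:
        raise ValueError("no sentence is long enough to use as a fingerprint")
    longest = max(candidates, key=lambda s: len(s.split()))
    # Drop trailing punctuation and any inner double quotes, then wrap the
    # phrase in quotes so the search treats it as an exact match.
    phrase = longest.rstrip(".!?").replace('"', "")
    return f'"{phrase}"'
```

Treat the result as a suggestion. If the sentence it picks is a common saying or a quote from someone else, choose a different one by hand.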

2. Use a Plagiarism Detector

There are a handful of tools out there that help look for instances of plagiarism. They’re usually meant to check writing you buy or commission to make sure it’s not stolen, but you can use them proactively as well.

There are several options to choose from.

  • Copysentry, by Copyscape, is the best-known tool. They scan and monitor the web and look for instances of duplicate content. They also have tools to help you deal with any stolen content results that they find.
  • On that note, you can use Copyscape by itself without a subscription to test individual pages one at a time.
  • Grammarly is generally used to check spelling and grammar errors in your text, but it will also run a scan to check for plagiarism. You can run your old content through the tool and see what copies come up.
  • Plagium is a free or paid tool with varying levels of searching for copies, and it can scan potential content theft inside PDFs and other files, not just web-indexed content.

You can also just run a Google search for a unique phrase from your posts, wrapped in quotation marks. It’s more manual, but it works well enough.
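Once a search turns up a suspicious page, you can get a rough sense of how much of your post it copied. The sketch below is a generic word-overlap check, not how Copyscape or any other tool above actually works. The function names and the five-word window are assumptions chosen for illustration.

```python
import re


def shingles(text: str, size: int = 5) -> set:
    """Break text into overlapping runs of `size` consecutive words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def copied_fraction(original: str, suspect: str, size: int = 5) -> float:
    """Fraction of the original's word runs that also appear in the suspect text."""
    original_runs = shingles(original, size)
    if not original_runs:
        return 0.0
    return len(original_runs & shingles(suspect, size)) / len(original_runs)
```

A result near 1.0 suggests a wholesale copy, while a result near 0.0 suggests the pages only share a topic. Anything in between deserves a manual read.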

3. Use Internal Links

Linking from one blog post to another, like this, is generally good SEO practice. It helps keep users on your site (when they click from one post to another). It helps Google index every page on your site. Internal links aren’t really a link juice powerhouse or anything, but they’re a useful tool.

So how does this help with content theft? Well, a lot of content thieves simply take the content from another site without editing it. I’ve even seen people who steal an entire site, design and all. It’s happened to me before, on one of my older sites.

When you have internal links in the content, those links become external backlinks when someone steals your content and posts it as-is on another domain. You can see those links using a backlink monitoring tool, or even just using WordPress’s default trackback/pingback system, or using Google Analytics and looking at new referrers. This means that when a WordPress site steals your content (and most “autoblogs” that steal content are built on WordPress), you’ll be notified as soon as it happens by WordPress.

Whenever you see a new referring domain, check to see if it’s spam or stolen content, and take appropriate action as necessary.
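If you have access to your raw server logs, you can pull out unfamiliar referring domains yourself. This is a minimal sketch that assumes your server writes the common "combined" access log format, with the referrer and user agent in quotes at the end of each line. The function name and the idea of keeping a list of known domains are mine, for illustration.

```python
import re
from urllib.parse import urlparse

# Matches the tail of a combined-format log line:
# "REQUEST" STATUS BYTES "REFERER" "USER-AGENT"
LOG_TAIL = re.compile(r'"[^"]*" \d{3} \S+ "(?P<referer>[^"]*)" "[^"]*"$')


def new_referring_domains(log_lines, own_domain, known_domains=()):
    """Return referring hostnames that are neither your own site nor already known."""
    known = {d.lower() for d in known_domains}
    found = set()
    for line in log_lines:
        match = LOG_TAIL.search(line.strip())
        if not match:
            continue  # Line is not in the assumed format.
        host = urlparse(match.group("referer")).hostname
        if not host:
            continue  # Empty referrer, usually logged as "-".
        host = host.lower()
        if host == own_domain or host.endswith("." + own_domain) or host in known:
            continue
        found.add(host)
    return sorted(found)
```

Each domain this returns is only a lead. Visit the referring page to see whether it's a legitimate link, spam, or a copy of your post.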

4. Use Reverse Image Search

Your blog posts aren’t the only content that can be stolen, and indeed, another piece of content is likely to be stolen much more often: your images. A lot of people seem to think that anything they find on Google Image Search is fair game to use when many images on there are not. So here’s what you do.

First, determine if your images are yours. If you bought a stock photo license to use or if you’ve been using creative commons images, ignore them. Those images can be used freely by other people, so searching for them won’t do you any good. The same goes for screenshots; anyone can take a screenshot of the same thing, so unless it’s something proprietary like your internal analytics, it’s not something you can pursue.

If you license assets through a service like Canva, keep in mind that other people can use those assets too, so while the exact composition is yours, other people can make similar images and not violate your copyright.

On the other hand, if you pay for unique graphic design, take your own photos, or otherwise produce unique images, those are your copyright and you can defend them. Use Google’s reverse image search or a service like TinEye to see if other people have used your images.

Dealing with Current Theft

If you have identified blog posts, content, site design, or images that have been stolen and are definitely your copyright, you can take action to get the stolen content removed.

1. Ask the Webmaster to Stop

In some cases, the stolen content wasn’t willfully stolen. For example, I’ve seen instances where a blogger hires a freelancer for cheap to produce a piece of content for them. The freelancer “produces” a piece of content and the blogger publishes it, without ever checking to see if the content is original. Turns out, that was your content! Now, this other blogger has unwittingly published your content. This is actually the worst-case scenario, because that blogger might actually out-rank you for your own content, and that’s bad.

In these cases, you can often just send an email to the site owner and notify them of the theft. As long as you can prove it’s your content (generally by linking your own version, though you may have to prove publication dates as well) the blogger will likely be apologetic and remove it.

If they aren’t, or if the blog is willfully stolen, or if they simply don’t respond, you can move on to playing hardball.

2. Issue a DMCA Takedown Notice

Hardball, in this case, is a takedown. There are a few ways you can go about this, in increasing severity.

First, issue a takedown notice to the site owner. You can find numerous guides on how to draft a DMCA notice online, but the main thing is it’s basically just a legal threat. Either they take down the content in compliance with your notice, or you can pursue legal action.
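If you send these often, a small template saves time. The sketch below fills in the elements a DMCA notice generally needs: the original work, the infringing copy, your contact details, a good-faith statement, and a statement of accuracy made under penalty of perjury. The wording is illustrative rather than vetted legal language, and the function and parameter names are placeholders I made up.

```python
def draft_dmca_notice(owner_name, owner_email, original_url, infringing_url):
    """Fill in a plain-text takedown notice covering the usual required elements."""
    return "\n".join([
        "DMCA Takedown Notice",
        "",
        f"1. Copyrighted work: my original article, published at {original_url}",
        f"2. Infringing material: the unauthorized copy at {infringing_url}",
        f"3. Contact information: {owner_name}, {owner_email}",
        "4. I have a good-faith belief that the use described above is not "
        "authorized by the copyright owner, its agent, or the law.",
        "5. The information in this notice is accurate, and under penalty of "
        "perjury, I am the copyright owner or am authorized to act on the "
        "owner's behalf.",
        "",
        f"Signed: {owner_name}",
    ])
```

Many hosts and search engines have their own forms that ask for the same details, so the text this produces is mostly useful for emailing a site owner directly.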

If the site owner themselves doesn’t remove the content, use a service like Who Is Hosting This to identify the web host of the offending site. Send them the notice – they likely have their own DMCA process and form you can fill out – and they should remove the offending content.

If the web host doesn’t (and there are some shady hosts that largely ignore legal threats), you can file the DMCA with Google. Google has its own process for removing content from one of its services, found here. If the content isn’t indexed on Google (and you can do the same with Bing), the spammer is likely not going to keep it up much longer.

Preventing Future Theft

All of the above is about finding and dealing with current theft, but what about preventing theft in the future? There are some steps you can take to prevent future content theft.

1. Make Your RSS Display Summaries

Most blogs have RSS feeds built into them. Many bloggers don’t even know they have an RSS feed until they check. That’s what a lot of content thieves prey upon; they use a bot to scrape the RSS feed, which is by default sharing the full text of the blog post in an easy to scrape format.

It’s easy enough to change this to just a summary mode, which will only show either your meta description or the first paragraph or so of the post. In WordPress, all you have to do is go to your admin console, go to the Reading section, find the feed, and change “full text” to “summary” under the appropriate option. It’s detailed here. For other blog platforms, you may have to take different steps or use a third party RSS management tool, but it’s still going to be pretty easy.
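Not sure whether your feed is currently exposing full posts? Here's a small Python sketch that inspects a copy of your feed's XML. It assumes an RSS 2.0 feed where full text, if present, appears either in a `content:encoded` element or as a very long `description`. The 100-word cutoff and the function name are arbitrary choices for the example.

```python
import xml.etree.ElementTree as ET

# Namespaced tag used by the RSS "content" module for full post bodies.
CONTENT_ENCODED = "{http://purl.org/rss/1.0/modules/content/}encoded"


def feed_items_with_full_text(rss_xml: str, max_summary_words: int = 100):
    """Return titles of feed items that appear to carry the whole post."""
    root = ET.fromstring(rss_xml)
    flagged = []
    for item in root.iter("item"):
        title = item.findtext("title", default="(untitled)")
        full_body = item.findtext(CONTENT_ENCODED, default="")
        description = item.findtext("description", default="")
        # Flag the item if it ships a full body, or if its "summary" is
        # long enough that it is probably the entire post.
        if full_body.strip() or len(description.split()) > max_summary_words:
            flagged.append(title)
    return flagged
```

Save your feed to a file, read it into a string, and pass it in. An empty list back means the feed looks like it's already in summary mode.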

2. Use Cloudflare’s Content Protection

Using third-party tools can work pretty well to help prevent content scraping bots, though very little can fully prevent a content scraper from doing it manually. Cloudflare, for example, offers “content scraping protection” as part of all of their plans, including their free plan. You can talk to them about setting up this protection, as well as the DDoS protection and other benefits that Cloudflare can bring to the table.

Their Scrape Shield setting itself doesn’t actually prevent bots from stealing your content, but it does have a few extra features like email and hotlink protection. The standard Cloudflare firewall settings do most of the heavy lifting of preventing bot requests from ever hitting your servers.

There are, of course, other tools you can use to do the same thing. Radware lets you control bots, for example, though it’s less automatic. If you don’t like Cloudflare, you can always find an alternative that works well for you.

3. Use a Feed Delay

I mentioned up above that most of the time, scraped content isn’t a big deal. There are two reasons for this. First, 99% of the time, the site that’s stealing your content will never out-rank you, so you don’t have to worry about it splitting your audience. Second, though, Google is very good at catching scraped content these days.

So how do they determine which content is the original and which is scraped? Primarily, they simply look at when they discovered it. If they find your content today and a scraped copy next week, chances are they’ll trust your content more. Now, other factors do go into consideration here, like the relative quality levels of the sites and so forth, but generally, Google can identify when content is stolen versus syndicated versus backdated or whatever.

So, simply add a delay to when a bot can scrape your content. Keeping the RSS method of scraping in mind, you can set a delay on your RSS feed so that posts only show up there some time after they’re published. The snippet below, added to the functions.php file in your theme, holds each post back for 10 minutes; raise the $wait value or change the unit if you want a longer delay, such as a day:

function publish_later_on_feed($where) {
    global $wpdb;

    // Only alter the query when WordPress is building a feed.
    if ( is_feed() ) {
        // Current time in GMT, in MySQL datetime format.
        $now = gmdate('Y-m-d H:i:s');

        // How long to hold posts back, as a whole number of $unit.
        $wait = '10';

        // Unit for TIMESTAMPDIFF: MINUTE, HOUR, DAY, WEEK, MONTH, or YEAR.
        // http://dev.mysql.com/doc/refman/5.0/en/date-and-time-functions.html#function_timestampdiff
        $unit = 'MINUTE';

        // Extend the default WHERE clause so only posts older than the wait appear.
        $where .= " AND TIMESTAMPDIFF($unit, $wpdb->posts.post_date_gmt, '$now') > $wait ";
    }
    return $where;
}

add_filter('posts_where', 'publish_later_on_feed');

This gives Google time to index your content before the scrapers get to it.
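If the SQL in that filter is hard to follow, here's the same rule sketched in plain Python: a post only appears in the feed once it is older than the wait period. This is an illustration of the logic, not something you'd install on a WordPress site. The function name and the `published_gmt` key are hypothetical, and it compares exact durations, whereas MySQL's TIMESTAMPDIFF counts whole units.

```python
from datetime import datetime, timedelta, timezone


def posts_visible_in_feed(posts, wait=timedelta(minutes=10), now=None):
    """Keep only the posts that were published longer ago than `wait`."""
    now = now or datetime.now(timezone.utc)
    return [post for post in posts if now - post["published_gmt"] > wait]
```

Passing `wait=timedelta(days=1)` models the longer delay, where a post stays out of the feed for its first day.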

4. Watermark Your Blog Images

Blog images are stolen far more often than blog content, so go ahead and watermark them. Watermarks come in a variety of forms, from barely-visible patterns to clearly visible logos to artist elements added to the designs to digital watermarks.

Watermarking doesn’t specifically prevent your images from being stolen, but it makes it more obvious when they are. You can point to a watermark to prove that it’s yours, and people who are aware that they’re stealing may have to put the work in to remove the watermark. Since they don’t want to have to do that, they’ll be more likely to leave your images alone and look elsewhere for their theft needs.

5. Add a Copyright Notice

While it might not seem like adding a copyright notice to your website would stop scrapers, it works for some of them. There are some people out there who seem to believe that if there’s no copyright notice, the content is fair game to take. That’s 100% not true – once you publish something original, you own the copyright to it – but since copyright is a huge and complex topic, I can understand the confusion.

Adding a copyright notice to your site is easy enough. For WordPress, all you need to do is add a block of text to your footer that says something like this:

Copyright © 2020 SiteName, All Rights Reserved.

You can manually edit this once a year, or you can use <?php echo date('Y'); ?> to automatically pull the current year. You can also use “1999-2020”, or whatever year you founded your site, to make sure everyone knows that you’ve had the copyright the whole time. It doesn’t really matter how exactly you phrase it, as long as you’re stating that your website is copyrighted in a place that is visible to your users.

6. Block Scraper Bot IPs

Once you recognize that your content is being stolen, you can look for the IP addresses of the bots that are stealing and scraping the content. Bots have to access your site, which means they have IP addresses, and you can block those. A robots.txt rule can ask a bot to stay away, but it targets user agents rather than IP addresses, and shady bots tend to ignore robots.txt directives anyway. You can block IPs forcefully through .htaccess edits, as described here. You can also use plugins to help you do it, like this.

To block an IP in your .htaccess, simply find the .htaccess file in the root directory of your website and add this line to it (remember to replace the example IP with the scraper’s IP address):

Deny from 123.123.123.123

You should be extremely cautious here; don’t block IP addresses that are too broad, that are associated with ISPs, or that are associated with good bots like Google or Bing. Blocking those can have devastating effects on your search traffic.
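One way to build that caution into your process is to generate the block lines with a script that refuses risky entries. The sketch below uses Python's standard `ipaddress` module. The function name, the /24 cutoff for "too broad", and the allowlist idea are assumptions for illustration, and the addresses in the usage note are reserved documentation ranges rather than real bots.

```python
import ipaddress


def deny_rules(addresses, allowlist=(), min_prefix=24):
    """Turn IPv4 addresses or CIDR ranges into .htaccess "Deny from" lines.

    Raises ValueError for ranges broader than /min_prefix, and silently
    skips anything overlapping a range you have chosen to protect.
    """
    protected = [ipaddress.ip_network(entry) for entry in allowlist]
    rules = []
    for raw in addresses:
        network = ipaddress.ip_network(raw, strict=False)
        if network.version != 4:
            raise ValueError(f"{raw} is not IPv4; this sketch only handles IPv4")
        if network.prefixlen < min_prefix:
            raise ValueError(f"{raw} is broader than /{min_prefix}; refusing to block it")
        if any(network.overlaps(p) for p in protected if p.version == 4):
            continue  # Never block something on the allowlist.
        # Write single hosts as a bare IP and real ranges in CIDR form.
        target = network.network_address if network.prefixlen == 32 else network
        rules.append(f"Deny from {target}")
    return rules
```

For example, `deny_rules(["203.0.113.7", "198.51.100.9"], allowlist=["198.51.100.0/24"])` produces a single line for 203.0.113.7 and leaves the protected range alone. You would fill the allowlist with the published ranges of the search engine crawlers you care about.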

What Not to Do

In the process of researching this topic, you may have come across one or two recommendations for things to do that can stop manual scrapers. In fact, most of what I’ve written above is meant to stop automatic scraper bots, not people just copying and pasting your content or saving your images. There’s a reason for that: stopping manual copying is nearly impossible.

If you want to stop someone from copying and pasting your content, you can disable right-clicks or disable text highlighting. There are plugins and scripts to do that. I highly recommend not doing that, however.

Why? Three reasons. First, it’s hugely disruptive to normal users. Some people highlight as they read to mark their place. Some people want to copy and paste a snippet to save a quote or share a snippet with a friend. Some people use their right-click menu for other purposes. It’s a huge usability issue.

Second, it’s a huge hit to your social media. People love quoting and sharing posts they read, but if you disable the ability to copy a line to share, they’re never going to do it. You lose out on all of that benefit.

Third, and most importantly, it doesn’t work. Disabling right-click or disabling highlighting text has to be done with a script, and it’s trivially easy to block scripts from the client-side. There are even browser extensions that can do it for you automatically.

Anyone who cares enough to copy your content can do so, and blocking right-clicks will stall them out for, at most, 30 seconds. Heck, they don’t even have to block the script, they can just press Ctrl+S on their keyboard (or ⌘+S for us macOS folks) and save the entire webpage to the desktop, along with the images you’re trying to protect. If there’s content on your site, people can steal it, and scripts will only slow them down and hurt user experience.

Stick with more proactive ways of blocking scraping, and deal with scraping aggressively when it occurs. That’s all you really need to do.

Written by James Parsons

James Parsons is the founder and CEO of Content Powered, a content creation company. He’s been a content marketer for over 10 years and writes for Forbes, Entrepreneur, Inc, and many other publications on blogging and website strategy.
