How to Perform a Blog Content Audit (The Right Way)
Performing a full content audit on your blog (or a client’s blog) is a great way to get an overview of the blog as a whole. You can harvest the metrics of each blog post, you can analyze what needs to be improved or removed to boost your site’s performance, and you can use the audit results as a historical record moving forward.
There are a bunch of different ways marketers offer to perform a content audit. There’s not necessarily one right or wrong way to do it, so long as you know your objectives and the method you use meets them.
What I’m about to offer you is just one of many possible ways to audit your content. I’ve found it useful, but you may want to remove some steps and add others to suit your needs.
Step 1: Download Post Data
Performing a content audit is a lot easier if all of your data is stored locally, in one easy-to-access place. I like working in Excel, but any spreadsheet program can work for you – LibreOffice, Google Sheets, what have you – so long as you can use cell formatting with colors.
There are a few different ways you can handle downloading all of your data. If you have access to the site’s database directly, you may be able to export it, either as a backup file you can edit or through a SQL query. If you don’t feel comfortable with queries or a direct download, or you don’t have direct access, I recommend Screaming Frog.
Screaming Frog is a web scraper that can crawl an entire website and harvest data about it. I like it for two main reasons. First, it works on any site, and you don’t need internal access to use it. Second, it harvests a bunch of extra data about the sites it scrapes, which you can configure. Some of that data will be useful to both this audit and future audits, such as broken link data and duplicate pages.
Screaming Frog is free for small usage, but if you want extra forms of data like spelling checks, direct integration into Google Analytics, or a crawl limit of more than 500 URLs, you need to buy a premium version. Access for a year costs about $200, but it’s well worth it in my opinion, especially if you want to use it on a large site or multiple client sites.
Run your scraper and download your data. You should be left with a spreadsheet that has basic data, including the blog post title, the URL, the metadata, the HTTP status code, and so on. This is all part of the general report that Screaming Frog puts out.
For a content audit you won’t need all of that, so hide (or delete, but I prefer hiding) columns you don’t need.
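If you would rather trim the export with a script than hide columns by hand, here is a minimal Python sketch. The column names (Address, Title 1, Status Code) are assumptions about what your crawl export calls them, so check the header row of your own file first.

```python
import csv
import io

# Assumed column names; replace them with the headers in your own export.
KEEP = ["Address", "Title 1", "Status Code"]

def trim_columns(rows, keep=KEEP):
    """Return each row reduced to the columns named in `keep`."""
    return [{name: row.get(name, "") for name in keep} for row in rows]

# A tiny in-memory stand-in for the exported CSV file.
sample = io.StringIO(
    "Address,Title 1,Status Code,Content Type\n"
    "https://example.com/post-a,Post A,200,text/html\n"
    "https://example.com/post-b,Post B,404,text/html\n"
)

posts = trim_columns(csv.DictReader(sample))
for post in posts:
    print(post)
```

To run it on a real export, open the file with open("your-export.csv", newline="", encoding="utf-8") and pass that to csv.DictReader instead of the sample.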
Step 2: Harvest Post Data
Once you have the overall post list for your site set up in a spreadsheet, you need to harvest more useful data about it. Here are the pieces of data I’ve found useful to have, but you can add any you feel are useful and remove those you don’t need.
1. Word count. You can harvest word counts however you like. The WordPress plugin WP Word Count gives you a bunch of interesting data, including word counts per post, per author, per month, and other displays. Per post is what you’re after. Sites like WordCounter allow you to paste in your blog posts and get a word count to fill in. And, of course, your favorite word processor will have a word counting feature as well. Just make sure you use the same tool for every post since different tools can count words differently.
To assist with analyzing this data, color-code the boxes. Any post with under 1,000 words, color red. Anything with 1,000 words or more, mark green. If you want more granularity, you can set a middle range (such as 1,000 to 1,499 words) to yellow, with anything 1,500 or over as green.
You can do this manually, or you can apply an automatic rule. Conditional formatting in Excel is pretty easy to set up and will adjust automatically if you change the value in the cell (such as when you edit a post to add more content, increasing its word count).
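If your posts are available as HTML, the counting and color-coding can also be scripted. The sketch below is one rough way to do it in Python, not a replacement for the tools above: it strips tags with the standard library, counts word-like tokens, and applies the red, yellow, and green bands described earlier. The function names are my own, and the count will differ a little from other tools, so stick to one method for every post.

```python
import re
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collects the text between tags and ignores the tags themselves."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

def word_count(html):
    """Count word-like tokens in an HTML fragment (rough estimate)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    # A word is a run of letters or digits, optionally joined by ' or -.
    return len(re.findall(r"\w+(?:['’-]\w+)*", text))

def word_count_color(count):
    """Red under 1,000 words, yellow from 1,000 to 1,499, green from 1,500."""
    if count < 1000:
        return "red"
    if count < 1500:
        return "yellow"
    return "green"

body = "<p>Hello <b>world</b>, this is a test.</p>"
count = word_count(body)
print(count, word_count_color(count))
```

Note that this simple extractor also counts any text inside script or style tags, so feed it the post body only, not the whole page.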
2. Organic traffic in the past 30 days. For this, you need access to your site’s Google Analytics, or whichever analytics platform you are using. Unfortunately, you need access to historical data, so you can’t swoop in and install Analytics and get your data right then. You will need to install it and wait at least 30 days to get this data.
Here, you can set your thresholds. Using a similar color-coding scheme as above, pick a range of monthly visitors that you consider to be worthwhile. For a small site, anything over 10 visitors per month may be worth keeping. For a mid-sized site, anything over, say, 50 is good. For a larger site, anything over 100 or 1,000 may be good – it depends a lot on your benchmarks. The point here is to remove posts that have very little to zero traffic.
Using 50 as an example, set any cell that has under 50 visitors in the past 30 days to the color red and anything with 50 or more to green. This can give you a quick at-a-glance overview of which posts on your site are drawing in the most traffic.
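If you export your analytics data, you can apply the threshold with a few lines of Python instead of by hand. This is only a sketch: it assumes you have already turned the export into a dictionary mapping each URL to its visits for the past 30 days, and the names flag_traffic and sessions_by_url are made up for the example. Posts that are missing from the analytics data are treated as having zero traffic.

```python
def flag_traffic(post_urls, sessions_by_url, threshold=50):
    """Map each post URL to (visits, color) using a single threshold."""
    flags = {}
    for url in post_urls:
        # A post missing from the analytics export had no recorded visits.
        sessions = sessions_by_url.get(url, 0)
        color = "green" if sessions >= threshold else "red"
        flags[url] = (sessions, color)
    return flags

post_urls = [
    "https://example.com/post-a",
    "https://example.com/post-b",
    "https://example.com/post-c",
]
sessions_by_url = {
    "https://example.com/post-a": 212,
    "https://example.com/post-b": 50,
}

traffic_flags = flag_traffic(post_urls, sessions_by_url)
for url, (sessions, color) in traffic_flags.items():
    print(url, sessions, color)
```

Change the threshold argument to match the size of your site, for example 10 for a small blog.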
3. Total backlinks. To harvest the total number of backlinks pointing at a post, you’ll need to use a tool.
Google Search Console is often the best choice if you have access to it, since its Links report shows which pages link to your site, but it also might be incomplete or missing some historical data. Other backlink checkers rely on their own indexes and may vary depending on what data they’ve been able to harvest.
For the most part, your data here will just be a numeral. If a post has zero backlinks, mark the cell red. If it has one or more, mark it green.
You can also do a more detailed link audit, now or later. Go through your backlinks and audit them based on their quality. Is the link from a legitimate site or a spam domain? Is the content relevant or is it just tossed in? Is it an English site (or whatever your primary language is) or a foreign site? If the link is questionable, mark the cell yellow or red depending on whether it’s simply questionable or worth disavowing.
4. Total social shares. This is a metric that you may or may not care about enough to harvest. I like to use something like Social Warfare to count the social shares of your posts. The trouble is, a lot of social networks have discontinued their APIs for social share counts, so the number isn’t going to be very accurate. You could have a ton of shares on Facebook, but since you can’t just pull that data from Facebook directly, it’s inconsistent whether or not they’re counted. The same goes for Twitter.
There’s also the fact that social shares aren’t long-term meaningful the way backlinks are. Backlinks (even if they don’t actively refer traffic) are still a factor for SEO. Social shares are not. If you choose to ignore social shares as a relevancy metric in your content audit, that’s fine. I won’t tell anyone.
If you do choose to use it, like the other metrics, set a threshold for what counts as meaningful. Maybe it’s 1, 10, 100, or whatever else. Anything under it, mark red. Anything at or over it, mark green.
5. Total comments. Blog comments are generally more meaningful than social shares and can be an additional value to your posts. Some people, like Neil Patel, argue that blog comments add to the overall word count of a post. Others argue that comments are generally not likely to be valuable. A lot of it depends on whether or not your blog generally gets comments like “thanks for the post” or more insightful, meaningful discussions.
For a simple numerical indicator, just make it a binary; if the post has one or more comments on it (that you didn’t leave yourself), mark it green. If it has zero, mark it red.
If you frequently get comments but most of them aren’t meaningful, you can audit those comments. Tally up not just the number of comments, but the number of them that are meaningful, based on whatever metrics you want to judge them. Again, mark posts that fail to have meaningful comments in red, and those that have some in green.
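If you want a first pass before reading comments yourself, a script can do the tally. The Python sketch below skips comments left by the site's own authors and uses a deliberately crude stand-in for "meaningful": a comment counts if it has at least 15 words. That cut-off, the function name, and the data layout are all assumptions for illustration, so swap in whatever rule matches your own judgment.

```python
def audit_comments(comments, site_authors, min_words=15):
    """Return (total, meaningful, color) for one post's comments.

    Comments written by anyone in `site_authors` are ignored. A comment is
    treated as meaningful when it has at least `min_words` words, which is
    only a rough proxy for real discussion.
    """
    total = 0
    meaningful = 0
    for comment in comments:
        if comment["author"] in site_authors:
            continue
        total += 1
        if len(comment["text"].split()) >= min_words:
            meaningful += 1
    color = "green" if meaningful > 0 else "red"
    return total, meaningful, color

comments = [
    {"author": "me", "text": "Glad it helped!"},
    {"author": "reader1", "text": "Thanks for the post"},
    {"author": "reader2", "text": (
        "I tried the color coding on a 300 post blog and the yellow band "
        "caught a lot of posts I would have cut too early."
    )},
]

print(audit_comments(comments, site_authors={"me"}))
```

A word-count rule will let some long but empty comments through and reject some short but useful ones, so treat the result as a shortlist to review rather than a verdict.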
6. Quality score. No, I’m not talking about the usual quality score metrics, like what you see with Google ads. In this case, the quality score is a little more subjective, and it’s something you have to judge manually. You can either do this yourself or enlist the help of someone you trust to be relatively impartial and objective.
Pick a scale. I like 1-10, but some people like 1-5. Read through the post. Think about it. Did the post help you understand a topic or solve a problem (if you had a hypothetical problem)? Did it satisfy you? Was it boring, rambling, off-topic, or disjointed? Assign the post a value on your scale.
You can also choose to regiment this. For example, you might come up with a few categories to judge the post on, such as:
- On a scale of 1-10, how well did the post satisfy your curiosity about the subject as a hypothetical reader?
- On a scale of 1-10, how well did the post solve a problem you had as a hypothetical reader?
- On a scale of 1-10, how easy or difficult was the post to read, with 1 being very dense and difficult?
- On a scale of 1-10, how valuable would you say the post is, in general?
Average out the values to get one overall value for the post, to go in your spreadsheet. Usually, six is the threshold for a 1-10 scale; anything below 6 should be marked red, and anything 6 or above can be marked green. Again, use yellow for a middle range if you want more nuance.
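The averaging is simple arithmetic, but it is easy to get inconsistent when several people are scoring. Here is a small Python sketch of the calculation, using the four example questions as categories. The category keys are labels I made up for the example.

```python
from statistics import mean

def quality_score(scores):
    """Average the per-category scores (1-10 each) into one value."""
    return round(mean(scores.values()), 1)

def quality_color(score, threshold=6):
    """Green at or above the threshold, red below it."""
    return "green" if score >= threshold else "red"

# One reviewer's scores for a single post.
scores = {
    "curiosity": 7,
    "problem_solving": 6,
    "readability": 8,
    "overall_value": 5,
}

overall = quality_score(scores)
print(overall, quality_color(overall))
```

With these sample scores the average is 6.5, which lands in green. If several people score the same post, average each person's result first or pool all of their scores into one dictionary per category, but pick one approach and keep it for the whole audit.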
Yes, this step will take a lot of time, especially if you have a very large site with hundreds or thousands of articles. The larger your site is, the more helpful it is if you establish an objective scale and enlist the help of a few trustworthy people to audit.
7. Copyscape results. In comparison to the previous metric, this one is easy. Run your posts through Copyscape. Anything that has zero duplicate results, mark green. Anything that finds results, mark red.
Keep in mind that sometimes Copyscape can come up with false positives. If you quote a passage that’s a sentence too long, Copyscape might flag it as duplicated content. Likewise, content syndication throws it off something fierce. As you analyze your posts, just check what kind of copied content comes up. If it’s legitimate, like a quote, mark the post green anyway. If it’s scraped or stolen content, mark it yellow or otherwise flag it to handle that issue later. See my post on how to handle stolen content for more details. And, of course, if your post has stolen content from elsewhere, flag it red.
You may also want to put an additional marker or use an additional color for real instances of copied content on your site. Those are a priority to get rid of, and even if the post has other positive metrics for it, if it’s stolen, it’s a potentially huge landmine waiting to go off (if it hasn’t already).
8. Grammarly results. This is another relatively easy one. Run your posts through Grammarly and see what comes up.
Keep in mind that Grammarly has a lot of less-than-perfect suggestions. It loves to make style suggestions that aren’t worthwhile, and it occasionally just gets things wrong. This is doubly true if you use industry terms or jargon that have other meanings outside of the industry. Talking about agility as a business procedure will confuse its algorithm since the word “agile” has a defined meaning that wouldn’t necessarily make sense in context.
In any case, analyze the results and give your post a rating. Red for if the post has a bunch of legitimate spelling and grammar errors, and green if it’s fine, even if Grammarly wants to scream about stylistic choices.
9. Google Indexation status. Another easy one. Take the URL of the post and search for it on Google. The first result should be your post. If you can’t find the post in Google’s search results, flag the cell red. If you can, flag it green. Indexation issues are important to deal with, but they’re also somewhat independent of other auditing metrics, so you might consider flagging them yellow instead of red as a priority to sort out.
Step 3: Analyze the Results
Now you have a spreadsheet with plenty of data on your posts, nicely color-coded to indicate if a post is in a danger zone or if it’s doing fine.
First, go through this and look for any issues you flagged as priority issues, such as copied content or indexation issues.
If a post has copied content on it, consider deleting it immediately. Google penalizes sites for copied content, so even if the post had backlinks, comments, and traffic, it’s possible that none of those metrics were benefitting your site. If the other metrics are beneficial enough that you want to keep the post, identify the copied sections, and rewrite them to be unique.
If the post is not indexed, look into why. Check for robots.txt or other bot directives that might be blocking it, or noindex flags in the metadata. Make sure it has other internal links pointing at it, and that it’s in your sitemap. Make sure there are no errors on the page. There are a lot of possible issues and solutions.
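The first two checks, robots.txt rules and noindex meta tags, are easy to script. The Python sketch below uses only the standard library and works on text you have already downloaded, so it runs without touching the network. The function names are my own, and it covers only these two causes. A page can also be kept out of the index in other ways, such as an X-Robots-Tag response header, which this does not look at.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

def blocked_by_robots(robots_txt, url, user_agent="Googlebot"):
    """True if the given robots.txt text disallows `url` for `user_agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)

class _RobotsMetaFinder(HTMLParser):
    """Looks for <meta name="robots" content="... noindex ...">."""
    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        content = (attr.get("content") or "").lower()
        if name in ("robots", "googlebot") and "noindex" in content:
            self.noindex = True

def has_noindex(html):
    """True if the page's HTML carries a noindex robots meta tag."""
    finder = _RobotsMetaFinder()
    finder.feed(html)
    finder.close()
    return finder.noindex

robots_txt = "User-agent: *\nDisallow: /private/\n"
page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'

print(blocked_by_robots(robots_txt, "https://example.com/private/post"))
print(blocked_by_robots(robots_txt, "https://example.com/blog/post"))
print(has_noindex(page))
```

To use it on a live site, download the robots.txt file and the post's HTML with whatever tool you prefer and pass the text in.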
Once the priority issues are solved, look at the rest of the metrics for the posts. Posts that are mostly green are fine and can be kept around with no issues. Posts that have a few red cells might be worth buffing up and improving with better content, or they might be worth removing, in the case of time-sensitive content that is no longer relevant.
Posts that are mostly red should be examined. Is there any value there that is worth saving? A post with zero traffic and zero other beneficial metrics, but with a few backlinks, might still be worth saving. You can consider merging, expanding, or otherwise improving the post. Otherwise, it might just be better off if it was deleted.
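To sort a long spreadsheet into piles, you can encode the rules above as a rough triage function. This Python sketch is only one possible reading of them: it treats the Copyscape and indexation columns as priority issues whenever they are not green, calls a post "mostly red" when more than half of its cells are red, and gives mostly red posts with backlinks a second look. Those cut-offs and the column names are my assumptions, and the output is a suggestion for a human to review, not a decision.

```python
def triage(flags, priority=("copyscape", "indexed")):
    """Suggest a next step for one post from its column colors.

    `flags` maps a column name to "red", "yellow", or "green".
    """
    # Priority columns come first: anything not green needs attention.
    urgent = [name for name in priority if flags.get(name, "green") != "green"]
    if urgent:
        return "fix first: " + ", ".join(urgent)

    reds = sum(1 for color in flags.values() if color == "red")
    if reds == 0:
        return "keep"
    if reds <= len(flags) / 2:
        return "improve"
    # Mostly red: backlinks are the main reason to salvage the post.
    if flags.get("backlinks") == "green":
        return "review: merge or expand"
    return "review: consider deleting"

post = {
    "words": "red",
    "traffic": "red",
    "comments": "red",
    "quality": "red",
    "backlinks": "green",
    "copyscape": "green",
    "indexed": "green",
}
print(triage(post))
```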
Before deleting a post, especially a post with a couple of green boxes, you might also consider checking the historical traffic for the past 1-3 years. If the post performed well but its traffic dropped off suddenly, there might be a good reason for that. There are some explanations, like Google algorithm updates, that you can address with simple changes to get back in Google’s good graces. Others, like new competition, might be beatable if you expand the content to exceed the competition. Still others might not be worth fighting for – ultimately that’s a judgment that you need to make for yourself after reading each post and determining if the post is actually helpful or just full of fluff.
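If you have monthly traffic numbers for a post, a short script can point out where a sudden drop happened so you know which date to investigate. The Python sketch below flags the first month where traffic fell to less than half of the previous month, and it ignores months where the previous figure was under 100 visits. Both numbers are arbitrary assumptions for the example, and the function name is made up.

```python
def sudden_drop(monthly_sessions, min_previous=100, drop_ratio=0.5):
    """Return the index of the first month with a sudden drop, or None.

    A drop is a month whose traffic is below `drop_ratio` times the month
    before it, provided that earlier month had at least `min_previous` visits.
    """
    for i in range(1, len(monthly_sessions)):
        previous = monthly_sessions[i - 1]
        current = monthly_sessions[i]
        if previous >= min_previous and current < previous * drop_ratio:
            return i
    return None

history = [400, 420, 390, 150, 140]  # oldest month first
print(sudden_drop(history))
```

For the sample history the function returns 3, the month where traffic fell from 390 to 150. That is the point to compare against known algorithm updates or a competitor's publish date.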
Overall, a content audit ends up being a lot of work, most of which is subjective. There are a few tools out there that can help you, but there’s no substitute for human judgment. Analyze your posts, decide if they’re worth keeping, figure out what’s working the best for your site, and adjust accordingly.