I've recently seen a lot of WordPress plugins aimed at taking care of the duplicate content issue in search engines. Don't get me wrong, those plugins do well what they were created for: adding a meta tag to some pages so that Googlebot and its friends don't index them. The problem is that doing so is, in my very humble opinion, a bad idea.
Duplicate Content? Duplicate Content?
First of all, if you don't know what's wrong with duplicate content, read this nicely illustrated article from SEOmoz. In short, when a search engine bot sees the same content on different pages, let alone different websites, it doesn't like it: it tries to identify the original source and puts a penalty on the others. Duplicate content is a real cross-site issue, often synonymous with spam, splogs and content stealing.
Inside your own blog, duplicate content might be a problem too. You have original content on your post page, the very same content in the daily, monthly and yearly archives, and again the same in each category archive page under which you filed your post. Four, five, six times the same content? This has to be a bit confusing for our friend Googlebot, and it might put a penalty on any of these pages.
Note the emphasis on might. The real problem here is that you want to have control over which page is the most important, and not let search engines decide for you.
What to do, then?
At this point, you have three possibilities:
- Do nothing
Let search engines index the same content six times and decide which copy is best. That could work. Or it could painfully decrease your visibility in search engine result pages, since you can't decide which page should be prominent over the others. Not an option for those who want to fine-tune things.
- Use a "don't index this" plugin
Basically, those plugins simply add something like <meta name="robots" content="noindex,follow" /> to some pages, so that search engine bots will follow the links but won't index what they've seen on those pages.
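For illustration, here is a hypothetical sketch of what such a plugin boils down to. The plugin and function names are made up, but `is_archive()`, `add_action()` and the `wp_head` hook are standard WordPress APIs:

```php
<?php
/*
Plugin Name: Noindex Archives (hypothetical example)
Description: Illustrative sketch of what "don't index this" plugins do.
*/

// Print a robots meta tag on archive pages only, so that bots
// follow the links but don't index the duplicated content.
function hypothetical_noindex_archives() {
	if ( is_archive() ) {
		echo '<meta name="robots" content="noindex,follow" />' . "\n";
	}
}
add_action( 'wp_head', 'hypothetical_noindex_archives' );
```

A dozen lines, hooked into the page header. Which is exactly why I think a plugin is the wrong place for this logic, as explained below.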
As I've said in the intro, this seems like a dumb idea to me. Why? Mainly for two reasons. First, it's much better for your site to have 6 pages indexed instead of 1, or 6,000 instead of 1,000. Second, simply because it's up to you not to serve duplicate content in the first place.
- The right way: just give different content
Reason #771 why WordPress is great is that you can customize just about everything to suit your needs. Like having a smart theme with smart archive pages: instead of displaying each post in its entirety, display an excerpt of it, and get each page indexed with its own content.
The Right Way™ to do it
"The right way" purposely emphasized: as usual and as with most things, There's More Than One Way To Do It. But at least that's my way to do it, and judging by my referrer visits, it's not working too bad.
Keeping your site safe from the duplicate content issue, and more generally getting things optimized for search engines, should not be a plugin's job: it must be your theme's job, and your theme must be coded and designed for it. Just like looking good, using good markup, being cross-browser friendly, etc.
WordPress themes have a template page specifically made for those pesky archive pages, be they year, month, day, category or author archives. You need at least one file, located in wp-content/themes/yourtheme/, named archive.php.
If your theme does not have an archive.php, it is incomplete. Create one, for example by duplicating and editing the archive.php you'll find in the default theme's directory. If your theme has an archive.php but displays whole posts, hack it. An example (and rather minimalist) template displaying post excerpts would be:
<?php get_header(); ?>
<div id="content">
<?php if ( have_posts() ) : ?>
<?php while ( have_posts() ) : the_post(); ?>
	<div class="post">
		<h2><?php the_title(); ?></h2>
		<div class="storycontent">
			<?php
			// Print the excerpt, appending '[...]' if WordPress didn't already
			$short = get_the_excerpt();
			if ( strpos( $short, '[...]' ) === false ) {
				$short .= ' [...]';
			}
			echo $short;
			?>
			<p><strong><a href="<?php the_permalink(); ?>">Read more</a></strong></p>
		</div> <!-- storycontent -->
	</div> <!-- post -->
<?php endwhile; ?>
<?php endif; ?>
</div> <!-- content -->
<?php get_sidebar(); ?>
<?php get_footer(); ?>
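As a side note, since the same archive.php serves every kind of archive, you can use WordPress conditional tags to give each type its own heading, which makes those pages look even less like duplicates of each other. A sketch, assuming the standard conditional tags (`is_category()`, `is_day()` and friends); note that in the classic default theme, a snippet like this sits inside the loop, right after the first the_post() call, so that the date functions work:

```php
<?php
// Give each archive type a distinct heading
if ( is_category() ) {
	echo '<h2>Archive for the "' . single_cat_title( '', false ) . '" category</h2>';
} elseif ( is_day() ) {
	echo '<h2>Archive for ' . get_the_time( 'F jS, Y' ) . '</h2>';
} elseif ( is_month() ) {
	echo '<h2>Archive for ' . get_the_time( 'F Y' ) . '</h2>';
} elseif ( is_year() ) {
	echo '<h2>Archive for ' . get_the_time( 'Y' ) . '</h2>';
} elseif ( is_author() ) {
	echo '<h2>Author archive</h2>';
}
?>
```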
Does that really matter?
My theme serves archive pages using the example above. While most of my incoming search engine visitors land on post pages directly, I do have visitors landing on category archives. Oh, not many, about 2 or 3% of them. Just checking the past hour's referrers, I've found people coming from Google looking for:
french lolcats (ho hai i'm fwench!)
php bitwise gd function
So, why would I want to cut my visitor count by 2 or 3 percent with a noindex directive for bots? Whether that makes 3 visitors a day or 3 visitors an hour, that's still 3 potential readers, 3 potential ad clickers, 3 potential bloggers who will like and link to my site. It'd be silly to tell Google not to send those fine people my way.
Do stay away from the duplicate content problem. But it's definitely a theme issue, not a plugin's business.