Recently I've seen a lot of WordPress plugins aimed at taking care of the duplicate content issue in search engines. Don't get me wrong: those plugins do well what they were created for, which is adding a meta tag to some pages so that Googlebot and its friends don't index them. The problem is that doing so is, in my very humble opinion, a bad idea.
Duplicate Content?
First of all, if you don't know what's wrong with duplicate content, read this nicely illustrated article from SEOmoz. In short, when a search engine bot sees the same content on different pages, let alone on different websites, it doesn't like it: it tries to identify the original source and penalizes the others. Duplicate content is a real cross-site issue, often synonymous with spam, splogs and content theft.
Inside your own blog, duplicate content might be a problem. You have the original content on your post page, the very same content in the date-based archives for that day, month and year, and again in each category archive page you filed the post under. Four, five, six times the same content? This has to be a bit confusing for our friend Googlebot, and it might penalize any or all of these pages.
Note the emphasis on "might". The real problem here is that you want control over which page is the most important, rather than letting search engines decide for you.
What to do, then?
At this point, you have three options:
- Do nothing
 Let search engines index the same content six times and decide which copy is best. That could work. Or it could painfully reduce your visibility in search results, since you can't decide which page should be more prominent than the others. Not an option for those who want to fine-tune things.
- Use a "don't index this" plugin
 Basically, those plugins simply add something like <meta name="robots" content="noindex,follow" /> to some pages, so that search engine bots will follow the links but won't index what they've seen on those pages.
 As I said in the intro, this seems like a dumb idea to me. Why? Mainly for two reasons. First, it's much better for your site to have 6 pages indexed instead of 1, or 6,000 instead of 1,000. Second, it's up to you not to serve duplicate content in the first place.
- The right way: just serve different content
 Reason #771 why WordPress is great: you can customize just about everything to suit your needs, like having a smart theme with smart archive pages. Instead of displaying each post in its entirety, display an excerpt of it, and get your page indexed.
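For reference, the noindex approach of the second option boils down to very little code. The sketch below is hypothetical and standalone: a real plugin would hook into WordPress's wp_head action and use conditional tags such as is_archive(), whereas here, to keep it runnable outside WordPress, the page type is passed in as a plain string. The function name robots_meta_for() and the list of archive types are assumptions made for illustration.

```php
<?php
// Hypothetical, standalone sketch of what a "noindex archives" plugin does.
// A real plugin would run on the wp_head action and call WordPress
// conditional tags (is_category(), is_date(), ...) instead of taking a string.
function robots_meta_for(string $page_type): string
{
    // Archive views such plugins typically hide from search engines (assumed list).
    $archive_types = ['category', 'tag', 'author', 'date'];

    if (in_array($page_type, $archive_types, true)) {
        // Bots may follow the links on this page but should not index it.
        return '<meta name="robots" content="noindex,follow" />';
    }

    // Single posts and pages get no robots tag, so they stay indexable.
    return '';
}

echo robots_meta_for('category'), "\n";
```

Every page type in that list simply drops out of the index, along with any search traffic it could have brought.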
The Right Way™ to do it
I purposely emphasized "The right way": as usual, and as with most things, There's More Than One Way To Do It. But at least it's my way of doing it, and judging by my referrer visits, it's not working too badly.
Keeping your site safe from the duplicate content issue, and more generally getting things optimized for search engines, should not be a plugin's job. It is your theme's job: the theme must be coded and designed for it, just as it must look good, use good markup, be cross-browser friendly, and so on.
WordPress themes have a template file made specifically for those pesky archive pages, be they year, month, day, category, author or any other archives. You need at least one file, named archive.php, located in wp-content/themes/yourtheme/.
If your theme does not have an archive.php, it sucks, er, is incomplete. Create one, for example by duplicating and editing the file you'll find in the default theme directory. If your theme has an archive.php that displays whole posts, hack it. An example (rather minimalist) template that displays post excerpts would be:
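Why archive.php matters: when WordPress renders an archive view, it looks for the most specific template file first and falls back to more generic ones, ending with index.php, which in many themes displays full posts. The function below is a simplified, hypothetical model of that lookup, not WordPress's actual code (real WordPress checks more candidates than this); pick_template() and its parameters are made up for illustration.

```php
<?php
// Simplified, hypothetical model of WordPress's template lookup for archive
// views: try the most specific file first, then archive.php, then index.php.
function pick_template(string $archive_type, array $theme_files, int $cat_id = 0): string
{
    $candidates = [];
    if ($archive_type === 'category') {
        $candidates[] = "category-{$cat_id}.php"; // template for one specific category
        $candidates[] = 'category.php';
    } elseif ($archive_type === 'date') {
        $candidates[] = 'date.php';
    } elseif ($archive_type === 'author') {
        $candidates[] = 'author.php';
    }
    // Generic fallbacks shared by all archive types.
    $candidates[] = 'archive.php';
    $candidates[] = 'index.php';

    foreach ($candidates as $file) {
        if (in_array($file, $theme_files, true)) {
            return $file;
        }
    }
    return 'index.php'; // every theme is expected to ship index.php
}

echo pick_template('category', ['index.php', 'archive.php'], 3), "\n";
```

With a theme that only ships index.php, every archive falls through to it, and if that file prints whole posts, the archives end up duplicating the post pages.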
<?php get_header(); ?>
<div id="content">
<?php if ( have_posts() ) : ?>
<?php while ( have_posts() ) : the_post(); ?>
<div class="post">
<h2><?php the_title(); ?></h2>
<div class="storycontent">
<p><?php
// Show only the excerpt (hand-written or auto-generated),
// never the full post, so this page doesn't duplicate it.
$short = get_the_excerpt();
// Make sure the teaser visibly ends with [...]
if ( strpos( $short, '[...]' ) === false ) $short .= '[...]';
echo $short;
?>
→ <strong><a href="<?php the_permalink(); ?>">Read more</a></strong></p>
</div>
</div> <!-- post -->
<?php endwhile; ?>
<?php endif; ?>
</div> <!-- content -->
<?php get_sidebar(); ?>
<?php get_footer(); ?>
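If a post has no hand-written excerpt, get_the_excerpt() falls back to an automatic one. In the WordPress versions of this era, that meant stripping the markup, keeping the first 55 words and appending [...], which is the marker the template above checks for. The standalone function below is a rough approximation of that behaviour for experimenting outside WordPress; make_excerpt() is a hypothetical name, the real implementation does more (for instance it applies content filters), and newer versions end the excerpt with a different marker.

```php
<?php
// Hypothetical, standalone approximation of WordPress's automatic excerpt:
// strip HTML tags, keep the first $max_words words, and mark the cut with [...].
function make_excerpt(string $content, int $max_words = 55): string
{
    $text  = trim(strip_tags($content));
    $words = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);

    if (count($words) > $max_words) {
        // Post is longer than the limit: truncate and flag it as a teaser.
        return implode(' ', array_slice($words, 0, $max_words)) . ' [...]';
    }

    // Short posts are returned whole, without a marker.
    return implode(' ', $words);
}

$post = '<p>Duplicate content is a theme issue, not a plugin\'s business.</p>';
echo make_excerpt($post, 4), "\n"; // prints "Duplicate content is a [...]"
```

Short posts come back whole and unmarked, which is the case the template's strpos() check covers by appending [...] itself.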
Does that really matter?
My theme serves archive pages using the example above. While most of my visitors from search engines land directly on post pages, some do land on category archives. Oh, not many: about 2 or 3% of them. Checking just the past hour's referrers, I found people coming from Google and looking for:
lolcat + javascript (go figure)
french lolcats (ho hai i'm fwench!)
php bitwise gd function
So they came here, and they probably didn't find what they were looking for: I've never written anything about javascripted lolcats. But I've certainly posted about lolcats, and about javascript. I've never written anything about bitwise operations in gd, but I've posted about gd, and about bitwise operators. Yet Google showed them results that made these people think they would find what they were looking for on my site, and they came. That would never have happened if I had built my site with this noindex meta tag on archive pages.
So why would I want to cut my visitor count by 2 or 3 percent with a noindex directive for bots? Whether that's 3 visitors a day or 3 visitors an hour, that's still 3 potential readers, 3 potential ad clickers, 3 potential bloggers who will like and link to my site. It would be silly to tell Google not to send those fine people my way.
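The mechanism is easy to see with a toy model. Assume, very crudely, that a page matches a query only if it contains every query term; real search engines rank far more subtly, and the two post texts below are made up for illustration, as is the matches() helper. No single post matches "lolcat javascript", but an archive page listing excerpts of both does, and that is exactly the kind of page a noindex tag removes from the results.

```php
<?php
// Toy model (hypothetical): a page matches only if it contains every term.
function matches(string $page_text, array $terms): bool
{
    $text = strtolower($page_text);
    foreach ($terms as $term) {
        if (strpos($text, strtolower($term)) === false) {
            return false; // one missing term is enough to rule the page out
        }
    }
    return true;
}

// Two made-up posts, each about only one of the topics.
$posts = [
    'lolcat-post'     => 'A gallery of lolcat pictures',
    'javascript-post' => 'Tips for writing cleaner javascript',
];

// An archive page shows excerpts of both posts on a single URL.
$archive = implode(' ', $posts);
$query   = ['lolcat', 'javascript'];

foreach ($posts as $slug => $text) {
    echo $slug, ': ', matches($text, $query) ? 'match' : 'no match', "\n";
}
echo 'archive: ', matches($archive, $query) ? 'match' : 'no match', "\n";
```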
Summary
Do stay away from the duplicate content problem. But it's definitely a theme issue, not a plugin's business.


 
		
[…] This is the best way to do SEO. For friends who like reading English, I suggest reading this article by Ozh. Like this post? Subscribe to catch the digital flow and get all the great articles in one place. […]
Hey ozh, I have a question. Do you think using a post teaser avoids the redundancy between post and archive?
The post teaser does avoid displaying the whole post on the main page. As long as the display is different in the archives, we'd then have different content, right?
ps: long live Japanese.
Armouf » yeah, probably. It's not as good as my approach, though, in my opinion :)
[…] Once you've tried the Admin drop down menu, you won't be able to do without it. You'll find clever thoughts about blogging and the internet. But you'll also find some useless but so necessary […]
[…] WordPress, Duplicate Content, and Wrong SEO Plugins […]
[…] about all the fuss over duplicated content. WordPress, Duplicate Content, and Wrong SEO Plugins by Planet Ozh will help set you straight. Keeping your site safe from the duplicate content issue, and more […]
Thank you so much! I've been trying to find a plugin or something to do this; this is exactly what I needed!
[…] http://planetozh.com/blog/2007/06/wordpress-duplicate-content-and-wrong-seo-plugins/ […]
I've got three sites, all with very well known themes, and I've done everything I know to do and still have supplemental @ ARUGH. I've been messing with this for two months and can't get it fixed. Ideas?
I agree that duplicate content could be a theme issue, but I disagree that it's not a plugin's business. I've found a way, which does involve some work, to make each page view of a WordPress blog page unique, and I implemented it in a WordPress plugin. Each page view comes out at least 80% unique. When each page view of a page is unique, it doesn't really matter where and how many times the content appears on the blog.
I have always tried to stay away from duplicate content. It's just not worth it.
Thanks, good work.
[…] WordPress, Duplicate Content, and Wrong SEO Plugins « planetOzh (tags: wordpress seo optimization duplicate-content) […]
It's just what I was looking for.
A little confused:
First, why would wordpress.org, as a blogging engine, create duplicates within one blog? WordPress should know that it's not a good or practical approach.
Perhaps some themes are coded wrongly; then I'd understand.
Permalinks: my_blogname.com/year/month/date/post_title
The original post will always have the same URL, even if the post (not page) is filed under dozens of categories or can be found by navigating the archives. So where do the duplicate posts come from?
I hope what I have written above is not utter nonsense, as I'm a newbie at all this.
I'm asking because my new theme is under development, and soon the blog will be hosted entirely on my own wordpress.ORG install; it is currently on wordpress.COM. I simply want to be sure that I don't end up with "duplicate content" within a single blog.
Thank you, and I'm looking forward to your explanation.
It is very important to make your site as free of duplicate content as possible. However, it is very time-consuming and there are different ways to do it. Currently, I manage duplicate content in the robots.txt file and with a WordPress plugin, which is working better for me at the moment. But I'm still a long way from perfection, that's for sure.
I haven't seen this method of implementing it in the WordPress theme much; this is new to me. I'm going to have to send this to my coder and get some input from him.
I'll have to analyze your site a bit and see what you've got going on.
Thanks for sharing…
I'd rather get relevant traffic than people who come to my blog searching for something that isn't there.
Just as we do section targeting for Google ads to ensure better conversion, the same holds for getting relevant traffic from search engines. I'm better off with relevant traffic than wasting bandwidth on traffic with a high bounce rate.
Hi there, thanks for a super useful and timely post.
The one thing that often grates on me when using a CMS for site development is the duplicate content factor, especially with WP and Joomla, but thanks to posts like these it's becoming manageable.
[…] plus post titles, and so forth, with no additional content. This is a better method. Here’s a good post on the issue, which also advocates using smart […]
Great tips, I'll follow some of these steps now.
Thanks for sharing.
[…] in the "I hate SEO plugins" category (why), and tagged as "but some are better than others", there's an article on Urban […]
Hi –
I can't get excited about duplicate content on my WordPress blogs: I see Google showing the same content several times on the same page of SERPs, e.g. under the site URL, under the URL with the post name, and again under the category and the tag.
I was thinking there must be some solution to the WordPress duplicate content problem other than using noindex. Thank you very much for the info, only problem is I'm just starting and hacking pages is still a lot of guesswork for me. Your example will help a lot though. Thanks!
Will your archive template above show excerpts when none are provided in the post/page itself? Or does it simply create its own excerpt from the first N characters/words?
Forgive me if I am mistaken, because I am not too tech-savvy. But what I understand is that I just copy and paste the archive code given above into the archive.php file of my current WordPress theme. Is that right?