On: 2010 / 09 / 06 Viewed: 35613 times
Shorter URL for this post: http://ozh.in/tf

Lately, every time I manually flagged a comment as spam, I had the impression that it had been posted on the same post, namely an old announcement for my plugin "Absolute Comments".

An SQL query later, my impression was confirmed: this blog post *is* a spam magnet.

I suspect that some particular keywords in page titles act like pheromones to spammers. These keywords probably include "comments", "guestbook", "feedback" and anything that will make them think there's a form they can spam.

The SQL query I used to list blog posts sorted by the number of spam comments they get is:

    SELECT COUNT(wp_posts.ID) AS spam_count, wp_posts.ID, wp_posts.post_title
    FROM wp_posts, wp_comments
    WHERE wp_comments.comment_approved = 'spam' AND wp_comments.comment_post_ID = wp_posts.ID
    GROUP BY wp_posts.ID
    ORDER BY spam_count DESC
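
To try the query without touching a live blog, here is a minimal sketch in Python that runs it against an in-memory SQLite database. Everything around the query is an assumption made for illustration: the two tables are cut down to the columns the query needs, the sample rows are invented, the helper names `build_sample_db` and `spam_counts` are hypothetical, and SQLite stands in for MySQL (like MySQL, it accepts selecting post_title without listing it in GROUP BY).

```python
import sqlite3

# The query from the post, unchanged apart from layout.
SPAM_MAGNET_QUERY = """
SELECT COUNT(wp_posts.ID) AS spam_count, wp_posts.ID, wp_posts.post_title
FROM wp_posts, wp_comments
WHERE wp_comments.comment_approved = 'spam'
  AND wp_comments.comment_post_ID = wp_posts.ID
GROUP BY wp_posts.ID
ORDER BY spam_count DESC
"""


def build_sample_db():
    """Create an in-memory database holding made-up posts and comments."""
    db = sqlite3.connect(":memory:")
    # Simplified stand-ins for the real WordPress tables (assumption).
    db.executescript("""
        CREATE TABLE wp_posts (ID INTEGER PRIMARY KEY, post_title TEXT);
        CREATE TABLE wp_comments (
            comment_ID INTEGER PRIMARY KEY,
            comment_post_ID INTEGER,
            comment_approved TEXT
        );
    """)
    db.executemany("INSERT INTO wp_posts VALUES (?, ?)", [
        (1, "Absolute Comments"),
        (2, "Cool Guestbooks"),
        (3, "Holiday pictures"),  # invented title, receives no spam
    ])
    # comment_approved holds 'spam' for flagged comments, '1' for approved ones.
    db.executemany(
        "INSERT INTO wp_comments (comment_post_ID, comment_approved) VALUES (?, ?)",
        [(1, "spam"), (1, "spam"), (1, "spam"), (2, "spam"), (2, "1"), (3, "1")],
    )
    return db


def spam_counts(db):
    """Return (spam_count, post ID, post title) rows, most-spammed first."""
    return db.execute(SPAM_MAGNET_QUERY).fetchall()


if __name__ == "__main__":
    for count, post_id, title in spam_counts(build_sample_db()):
        print(f"{count:>3}  #{post_id}  {title}")
```

Posts that never received spam do not show up in the result at all, because the join only keeps posts with at least one matching comment row.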

If you want to play with it, I made a quick standalone script that runs the query and outputs a pretty interactive pie chart. Download spam-magnet.txt, rename it with a .php extension and put it in your blog root (the directory where wp-load.php is). Live demo: spam-magnets.php.

Edit: per request, I've made this a plugin too.

If you run it and notice a pattern on your blogs too, share your analysis! I've always enjoyed useless fun stats :)

Metastuff

This entry "Spam Magnet Blog Posts" was posted on 06/09/2010 at 5:18 pm.

25 Blablas

  1. 1
    Rarst Ukraine »
    commented, on 06/Sep/10 at 7:03 pm # :

    I discovered very high counts (tens of times larger than average) on some posts when doing research for my antispam plugin.

    Since then I no longer retain spam older than one month. Checking now, it has pretty much leveled out, with several mild spikes.

    So I suspect it's more likely a case of some stupid bot getting obsessed with a post, rather than intelligent targeting of keywords.

  2. 2
    W-Shadow Latvia »
    said, on 06/Sep/10 at 7:16 pm # :

    At first glance, my stats reinforce the keyword theory. The post with the highest number of spam comments has words like "send", "message" and "forum" in the title. However, the second most-spammed post is one that discusses how to write drivers in Delphi. Huh?

    Overall, the only real pattern that I can see is that old, popular posts take the brunt of the comment spam.

  3. 3
    Ozh France »
    commented, on 06/Sep/10 at 7:34 pm # :

    Actually, it's not the first time I've noticed the keyword magnetism. Before I closed comments on it, my most attractive post for spammers was Cool Guestbooks, much more than any other.

  4. 4
    James United States »
    thought, on 07/Sep/10 at 8:55 am # :

    Thanks to your awesome script, I found a few spam magnets that shouldn't have even had comments open in the first place. :)

  5. 5
    Waldar France »
    replied, on 07/Sep/10 at 12:22 pm # :

    The query can be improved:
    Using table aliases makes it shorter and somewhat easier to read.
    Using ANSI joins clearly separates the filter from the join predicate.

    Also, note that selecting post_title this way, without listing it in the GROUP BY, works on MySQL but is rejected by most other RDBMSs.

        SELECT COUNT(*) AS spam_count, ps.ID, ps.post_title
          FROM wp_posts AS ps
               INNER JOIN wp_comments AS cm
                 ON cm.comment_post_ID = ps.ID
         WHERE cm.comment_approved = 'spam'
        GROUP BY ps.ID
        ORDER BY spam_count DESC;

  6. 6
    mrmist United Kingdom »
    commented, on 08/Sep/10 at 4:07 pm # :

    I think my results buck the trend, as my top result by an order of magnitude is "mouldy bread". But, then, my site is a tad off-the-wall.

  7. 7
    BG! United Kingdom »
    thought, on 08/Sep/10 at 6:24 pm # :

    That's a useful script – thanks, Ozh!

    Could it be made into a plugin?

  8. 8
    Jen United States »
    commented, on 08/Sep/10 at 9:06 pm # :

    I don't have a wp-upload.php? Can this still be run without that file?

  9. 9
    rob United States »
    wrote, on 08/Sep/10 at 9:21 pm # :

    I've never heard of wp-load.php. Where is that? Do I need it?

  10. 10
    Ozh France »
    thought, on 08/Sep/10 at 9:31 pm # :

    You all have a wp-load.php file: it's the file WP uses to load everything it needs. Don't do anything with this file; it's only mentioned to explain which directory you should drop the script in. Dear god…

  11. 11
    Ike United States »
    thought, on 08/Sep/10 at 10:30 pm # :

    Outstanding, Ozh.

    I'll upload this as soon as I get a chance.

    It would be fun to compare all the Spam Magnets from around the blogosphere, and check the word clouds for common ground.

    Might there be a way to export those in a uniform manner, so you can build a scalable and useful data set?

  12. 12
    Jen United States »
    replied, on 08/Sep/10 at 10:36 pm # :

    I'm still a bit confused. If I put this file in the root, in that file it's calling for the wp-upload as required. I do not have that wp-upload.php file at all.

  13. 13
    Richard United States »
    thought, on 09/Sep/10 at 3:21 am # :

    I'm getting the following error when I run the script.

    Result of expression 'this.ka.a[0]' [undefined] is not an object.

  14. 14
    Ozh France »
    replied, on 09/Sep/10 at 7:54 am # :

    Jen » Sorry you don't qualify to use this script.

  15. 15
    Ozh France »
    wrote, on 09/Sep/10 at 7:58 am # :

    Ike » I don't think a global list would be very valuable. The results on each blog are either worthless (no particular keyword), very predictable (keywords such as "comments", "guestbook"), or specific keywords tightly tied to the blog topic, if any. Collecting the data would require collecting much more stuff like blog topic, blog age, and a lot of other statistical stuff I have no clue about.

  16. 16
    BJ Johnson United States »
    replied, on 09/Sep/10 at 11:17 pm # :

    I'm getting this, too, though it's not exactly the same as Richard's error.

    'this.ka.a[0]' is undefined

    And a search for that string, and sub-sets of it, in the WP directory tree turns up zip.

    WP 3.0.1, MySQL 5

  17. 17
    Ozh France »
    wrote, on 09/Sep/10 at 11:28 pm # :

    BJ Johnson » This is not a WP error, this is generated by the Google chart. Dunno. Can't fix.

  18. 18
    AskApache United States »
    replied, on 10/Sep/10 at 1:46 am # :

    Ozh, dude I just tried a couple of your plugins, wow you got a real gift! Thanks!

  19. 19
    RavanH France »
    wrote, on 10/Sep/10 at 11:38 am # :

    This little script is just great! And it works on WP3.0 in Multi-Site mode too :)

    Anybody made this into a small plugin yet? I'm thinking about putting the chart on each dashboard in my network so siteowners can benefit from the knowledge too..

  20. 20
    Ozh France »
    replied, on 10/Sep/10 at 12:56 pm # :

    Peeps, I've packaged this as a plugin: http://wordpress.org/extend/plugins/ozh-spam-magnet-checker/

  21. 21
    BG! United Kingdom »
    said, on 10/Sep/10 at 4:47 pm # :

    The plugin works a treat, Ozh. Thank you.

  22. 22
    Muskie Canada »
    replied, on 11/Sep/10 at 11:55 am # :

    Thanks for this. It is a nice little plugin. I've gotten way too much spam, which I've fought for years. I switched to DISQUS and now I seem to get less, but I also did some other drastic things over the years, like permanently deleting thousands of spam comments at some point, so my historical graph only considers the last one or two thousand spam comments I've gotten.

    What bothers me is they leave the same comment on so many posts…

  23. 23
    RavanH France »
    commented, on 13/Sep/10 at 2:18 pm # :

    "Peeps, I've packaged this as a plugin: http://wordpress.org/extend/plugins/ozh-spam-magnet-checker/"

    You ROCK !

  24. 24
    laztec France »
    said, on 24/Sep/10 at 2:15 pm # :

    Hi, maestro! (your super doodle still has pride of place on our home page)
    And many thanks for this pretty new toy; as you already know, I'm collecting yours.

    Thanks to "various spam plugins on patrol" as well, I don't even see most of this crap content, just the attempt stats. Fortunately I've got two of them in my spam comment queue today, so I'm happy to share useless fun stats with you:

    Spam Magnet Blog Posts:

    familles 50%
    WPtouch avec W3 Total Cache ou WP SuperCache 50%

    Note that "familles" is in this case simply the title of a static page (in the WordPress sense), which makes me think that the "Blog Posts" part of your nice plugin's name is perhaps a tad restrictive.

    I had also noticed the correlation between spam and tags/keywords, and even though I removed those words at the source, they keep attracting flies like sh…ugar:
    how else to explain, for example, that every visitor arriving via the tag "agglomérat" is inevitably listed as a spammer when you check them on the appropriate sites? Well, on the one hand, it's handy: it speeds up spotting them and taking action…

  25. 25
    José Luís Brazil »
    thought, on 08/Oct/10 at 4:01 pm # :

    Great plugin!

    However, its strong point is also its weakness: since it does not create a table, it only works if you do not delete your comment spam. So, unfortunately, I am not eligible to use it.

    Anyway, I have a question/suggestion: why don't you create a plugin to serve as an interface to dynamic Google Graphics?
