Lately, every time I manually flagged a comment as spam, I started to have the impression that they all were posted on the same post, namely an old announcement for my plugin "Absolute Comments".
An SQL query later, my impression was confirmed: this blog post *is* a spam magnet:
I suspect that some particular keywords in page titles act like pheromones to spammers. These keywords probably include "comments", "guestbook", "feedback" and anything that will make them think there's a form they can spam.
The SQL query I used to list blog posts sorted by the number of spam comments they get is:
SELECT COUNT(wp_posts.ID) AS spam_count, wp_posts.ID, wp_posts.post_title
FROM wp_posts, wp_comments
WHERE wp_comments.comment_approved = 'spam' AND wp_comments.comment_post_ID = wp_posts.ID
GROUP BY wp_posts.ID
ORDER BY spam_count DESC
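To try the query without touching a live blog, here is a minimal sketch that runs it against a throwaway in-memory database. It assumes SQLite as a stand-in for MySQL, and the two posts and their comments are made-up sample rows, not real data. Only the columns the query touches are created.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the WordPress MySQL
# database. Only the columns used by the query are modelled here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE wp_posts (ID INTEGER PRIMARY KEY, post_title TEXT);
CREATE TABLE wp_comments (
    comment_ID INTEGER PRIMARY KEY,
    comment_post_ID INTEGER,
    comment_approved TEXT
);
-- Hypothetical sample rows: post 1 has three spam comments, post 2 has one.
INSERT INTO wp_posts VALUES (1, 'Absolute Comments'), (2, 'Holiday photos');
INSERT INTO wp_comments (comment_post_ID, comment_approved) VALUES
    (1, 'spam'), (1, 'spam'), (1, 'spam'), (1, '1'),
    (2, 'spam'), (2, '1');
""")

# The query from the post, unchanged apart from whitespace.
SPAM_MAGNET_QUERY = """
SELECT COUNT(wp_posts.ID) AS spam_count, wp_posts.ID, wp_posts.post_title
FROM wp_posts, wp_comments
WHERE wp_comments.comment_approved = 'spam'
  AND wp_comments.comment_post_ID = wp_posts.ID
GROUP BY wp_posts.ID
ORDER BY spam_count DESC
"""

rows = conn.execute(SPAM_MAGNET_QUERY).fetchall()
for spam_count, post_id, title in rows:
    print(f"{spam_count:>4}  #{post_id}  {title}")
```

Approved comments (stored as '1' in comment_approved) are ignored, so only the spam rows are counted per post.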
If you want to play with it, I made a quick standalone script that will run the query and output a pretty interactive pie chart. Download spam-magnet.txt, rename as .php and put it in your blog root (where wp-load.php is). Live demo: spam-magnets.php.
Edit: per request, I've made this a plugin too.
If you run it and notice a pattern on your blogs too, share your analysis! I've always enjoyed useless fun stats :)
I discovered very high counts (ten times larger than average) on some posts when doing research for my antispam plugin.
Since then I no longer retain spam older than one month. Checking now, it has pretty much leveled out, with several mild spikes.
So I suspect it's more likely a case of some stupid bot getting obsessed with a post, rather than intelligent targeting of keywords.
At first glance, my stats reinforce the keyword theory. The post with the highest number of spam comments has words like "send", "message" and "forum" in the title. However, the second most-spammed post is one that discusses how to write drivers in Delphi. Huh?
Overall, the only real pattern that I can see is that old, popular posts take the brunt of the comment spam.
It's actually not the first time I've noticed the keyword magnetism. Before I closed comments on it, my most attractive blog for spammers was Cool Guestbooks, much more than any other.
Thanks to your awesome script, I found a few spam magnets that shouldn't have even had comments open in the first place. :)
The query can be upgraded:
Using table aliases makes it shorter and somewhat easier to read.
Using ANSI joins makes a clear separation between the filter and the join predicate.
Also, note that selecting post_title this way, without listing it in GROUP BY, works on MySQL but is rejected by many other RDBMSs.
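A minimal sketch of a query that follows the three points above, not necessarily the commenter's exact rewrite: the aliases p and c, the function name spam_magnets, and the sample rows are all assumptions made for illustration, and SQLite again stands in for MySQL.

```python
import sqlite3

# Table aliases (p, c), an explicit ANSI JOIN ... ON for the join predicate,
# WHERE reserved for the spam filter, and post_title listed in GROUP BY so the
# statement does not depend on MySQL's lenient grouping rules.
ANSI_QUERY = """
SELECT COUNT(p.ID) AS spam_count, p.ID, p.post_title
FROM wp_posts AS p
INNER JOIN wp_comments AS c ON c.comment_post_ID = p.ID
WHERE c.comment_approved = 'spam'
GROUP BY p.ID, p.post_title
ORDER BY spam_count DESC
"""

def spam_magnets(conn):
    """Return (spam_count, post_id, post_title) rows, most-spammed post first."""
    return conn.execute(ANSI_QUERY).fetchall()

# Hypothetical sample data, for demonstration only.
demo = sqlite3.connect(":memory:")
demo.executescript("""
CREATE TABLE wp_posts (ID INTEGER PRIMARY KEY, post_title TEXT);
CREATE TABLE wp_comments (
    comment_ID INTEGER PRIMARY KEY,
    comment_post_ID INTEGER,
    comment_approved TEXT
);
INSERT INTO wp_posts VALUES (1, 'Guestbook feedback'), (2, 'Delphi drivers');
INSERT INTO wp_comments (comment_post_ID, comment_approved) VALUES
    (1, 'spam'), (1, 'spam'), (2, 'spam'), (2, '1');
""")

result = spam_magnets(demo)
print(result)
```

Because every selected non-aggregate column appears in GROUP BY, the same statement should be accepted by stricter engines as well, while returning the same rows as the original on MySQL.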
I think my results buck the trend, as my top result by an order of magnitude is "mouldy bread". But, then, my site is a tad off-the-wall.
That's a useful script – thanks, Ozh!
Could it be made into a plugin?
I don't have a wp-upload.php? Can this still be run without that file?
I've never heard of wp-load.php. Where is that? Do I need it?
You all have a wp-load.php file, it's the file WP uses to load everything needed. Don't do anything with this file, it's just mentioned to explain in which directory you should drop the file. dear god…
Outstanding, Ozh.
I'll upload this as soon as I get a chance.
It would be fun to compare all the Spam Magnets from around the blogosphere, and check the word clouds for common ground.
Might there be a way to export those in a uniform manner, so you can build a scalable and useful data set?
I'm still a bit confused. If I put this file in the root, the file itself requires wp-upload, and I do not have that wp-upload.php file at all.
I'm getting the following error when I run the script.
Result of expression 'this.ka.a[0]' [undefined] is not an object.
Jen » Sorry you don't qualify to use this script.
Ike » I don't think a global list would be very valuable. The results on each blog are either worthless (no particular keyword), very predictable (keywords such as "comments", "guestbook"), or specific keywords tightly tied to the blog topic, if any. Collecting the data would require collecting much more stuff like blog topic, blog age, and a lot of other statistical stuff I have no clue about.
I'm getting this, too, though it's not exactly the same as Richard's error.
'this.ka.a[0]' is undefined
And a search for that string, and sub-sets of it, in the WP directory tree turns up zip.
WP 3.0.1, MySQL 5
BJ Johnson » This is not a WP error, this is generated by the Google chart. Dunno. Can't fix.
Ozh, dude I just tried a couple of your plugins, wow you got a real gift! Thanks!
This little script is just great! And it works on WP3.0 in Multi-Site mode too :)
Anybody made this into a small plugin yet? I'm thinking about putting the chart on each dashboard in my network so siteowners can benefit from the knowledge too..
Peeps, I've packaged this as a plugin: http://wordpress.org/extend/plugins/ozh-spam-magnet-checker/
The plugin works a treat, Ozh. Thank you.
Thanks for this. It is a nice little plugin. I've gotten way too much spam, which I've fought for years. I switched to DISQUS and now I seem to get less, but I also did some other drastic things over the years, like permanently deleting thousands of spam comments at some point, so my historical graph only covers the last one or two thousand spam comments I've gotten.
What bothers me is they leave the same comment on so many posts…
You ROCK !
Hi there, artist! (your super doodle still has pride of place on our home page)
and many thanks for this pretty new toy; as you already know, I'm collecting yours.
Thanks to "various spam plugins on patrol" as well, I don't even see most of this crap content, just attempt stats. Fortunately I've got two of them in my spam comment queue today, so I'm happy to share useless fun stats with you:
Spam Magnet Blog Posts:
familles 50%
WPtouch avec W3 Total Cache ou WP SuperCache 50%
Note that "familles" is, in this case, simply the title of a static page (in the WordPress sense), which makes me think that the "Blog Posts" part of your nice plugin's name may be a tad restrictive.
I too had noticed the correlation between spam and tags/keywords, and even though I removed those words at the source, they keep attracting flies like h…oney:
how do you explain, for example, that every visitor arriving via the tag "agglomérat" is inevitably listed as a spammer when you check them on the appropriate sites? Well, on the one hand it's handy: it speeds up spotting them and deciding what measures to take…
Great plugin!
However, its strong point is also its weakness: since it does not create a table, it only works if you do not delete your comment spam. So, unfortunately, I am not eligible to use it.
Anyway, I have a question/suggestion: why don't you create a plugin to serve as an interface to dynamic Google Graphics?