http:BL is a blacklist of all the suspicious IPs that were trapped by the honey pots run for Project Honey Pot. The service has a simple API, allowing anyone to check an IP against this blacklist. Here is a simple, commented example script showing how to use this API.
Honey pot ?
Simply put, a honey pot is a webpage that tricks malicious visitors (as in "robots") into doing something that allows collecting data about them, while real users (as in "humans") won't notice or do anything in particular.
For example, this is a honey pot. Open this page in your browser, and you'll see nothing but uninteresting legalish text. Now have a look at the source of the page, and you will spot a hidden <div> containing a form and emails. So, typically, if anything inputs text in that form, or sends anything to those emails, it's not human. It's a program spidering the web in search of spam food.
Using http:BL
First, you'll need to create an account on Project Honey Pot (PHPot) in order to be given an access key. Don't worry, it's free, and I even suspect it's a spam-free service :P Your access key will be a random string, like ab234fghijkl
Testing an IP with http:BL is a simple DNS query. For example, to check IP 12.13.14.15 you will need to query the following domain :
ab234fghijkl.15.14.13.12.dnsbl.httpbl.org.
The first label (ab234fghijkl) is your access key; the four numbers that follow are the IP with its octets in reverse order (12.13.14.15 becomes 15.14.13.12).
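As a minimal sketch of how that name could be assembled: the function name `httpbl_lookup_name()` is my own invention, and the IPv4 check is an assumption based on the four-octet format described here.

```php
<?php
// Hypothetical helper (not part of http:BL): builds the DNS name to query.
// Returns null when $ip is not a valid IPv4 address, since the
// reversed-octet format above only describes four-octet addresses.
function httpbl_lookup_name(string $accessKey, string $ip): ?string
{
    if (filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) === false) {
        return null;
    }
    // 12.13.14.15 -> 15.14.13.12
    $reversed = implode('.', array_reverse(explode('.', $ip)));
    return $accessKey . '.' . $reversed . '.dnsbl.httpbl.org';
}

echo httpbl_lookup_name('ab234fghijkl', '12.13.14.15'), "\n";
// prints "ab234fghijkl.15.14.13.12.dnsbl.httpbl.org"
```

Rejecting anything that is not IPv4 up front avoids sending a meaningless query for, say, an IPv6 visitor.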
The DNS query response will be something like 127.3.5.1 with the following meaning :
- 127: the first octet is always 127. If it's not 127, the query has failed for some reason.
- 3: the second octet is the number of days since the last activity of the checked IP
- 5: the third octet represents a threat score for the IP. The greater this number, the more dangerous the IP is.
- 1: the last octet defines the type of visitor
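The four octets can be turned into named fields with a few lines of PHP. This is only a sketch: the function name and the array keys are made up for illustration, they are not defined by the http:BL API.

```php
<?php
// Hypothetical helper: splits an http:BL answer such as "127.3.5.1"
// into named fields. Returns null when the answer is not a well-formed
// dotted quad starting with 127 (i.e. the query failed).
function httpbl_parse_response(string $response): ?array
{
    if (filter_var($response, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) === false) {
        return null;
    }
    $octets = explode('.', $response);
    if ($octets[0] !== '127') {
        return null;
    }
    return [
        'days_since_activity' => (int) $octets[1],
        'threat_score'        => (int) $octets[2],
        'visitor_type'        => (int) $octets[3],
    ];
}

$parsed = httpbl_parse_response('127.3.5.1');
echo $parsed['threat_score'], "\n"; // prints 5
```

PHP's gethostbyname() returns the unmodified host name when the lookup fails, so passing its result straight to this function yields null for IPs that are not listed.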
For more detailed information refer to the API documentation. For now, we'll just make a simple example script to check whether an IP is threatening or not.
- // your http:BL key
- $apikey = 'abcdefghijkl';
- // IP to test : your visitor's
- $ip = $_SERVER['REMOTE_ADDR'];
- // build the lookup DNS query
- // Example : for '127.9.1.2' you should query 'abcdefghijkl.2.1.9.127.dnsbl.httpbl.org'
- $lookup = $apikey . '.' . implode('.', array_reverse(explode ('.', $ip ))) . '.dnsbl.httpbl.org';
- // check query response
- $result = explode( '.', gethostbyname($lookup));
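- if ($result[0] == 127) {
- // query successful !
- $activity = (int) $result[1];
- $threat = (int) $result[2];
- $type = (int) $result[3];
- // start with an empty string to avoid an "undefined variable" notice
- $typemeaning = '';
- // 0 is a plain value, not a bit flag : "$type & 0" would always be false
- if ($type == 0) $typemeaning .= 'Search Engine, ';
- if ($type & 1) $typemeaning .= 'Suspicious, ';
- if ($type & 2) $typemeaning .= 'Harvester, ';
- if ($type & 4) $typemeaning .= 'Comment Spammer, ';
- $typemeaning = trim($typemeaning,', ');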
- echo "IP seems to belong to a $typemeaning ($type) with threat level $threat";
- }
This simple snippet checks an IP against PHPot's blacklist, and if anything suspicious is detected, it outputs its verdict. How simple was that ?
Now, what to do with this ? You can for example prevent your email address from appearing on a contact form if an email harvester is detected, or disable comment posting for all comment spammers. We will log and block malicious users :
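- // Our blocking policy ($type and $threat come from a successful query)
- $block = false; // default : let the visitor in
- if (
- ($type >= 4 && $threat > 1) // Comment spammer : block even at a very low threat level
- ||
- ($type < 4 && $threat > 40) // Other types, with threat level greater than 40
- ) {
- $block = true;
- }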
- if ($block) {
- logme($block,$ip,$type,$threat,$activity);
- blockme();
- die();
- }
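The milder measures mentioned earlier (hiding your email address from harvesters, closing comments to comment spammers) only need to test the relevant bit of the type octet. A minimal sketch, in which the constant and function names are my own and not part of http:BL:

```php
<?php
// Bit values of the visitor type octet, as used in the script above
const HTTPBL_SUSPICIOUS      = 1;
const HTTPBL_HARVESTER       = 2;
const HTTPBL_COMMENT_SPAMMER = 4;

// Hypothetical helper: false when the visitor is flagged as a harvester
function may_show_email(int $type): bool
{
    return ($type & HTTPBL_HARVESTER) === 0;
}

// Hypothetical helper: false when the visitor is flagged as a comment spammer
function may_accept_comments(int $type): bool
{
    return ($type & HTTPBL_COMMENT_SPAMMER) === 0;
}

// 6 = harvester (2) + comment spammer (4)
echo may_show_email(6) ? 'show email' : 'hide email', "\n"; // prints "hide email"
```

Because the type is a bit field, a visitor can be flagged as several things at once, hence the bitwise tests rather than plain comparisons.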
The logme() function would simply write to a log file, collecting some data for further analysis : IP, requested page, user agent, etc…
The blockme() function would be a nice "403 Forbidden" screen, explaining that unfortunately the IP was flagged as malicious and therefore access to the page is not granted.
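A possible logme() could look like the sketch below. The tab-separated format, the logme_format() helper and the log file path are all assumptions made for the example; adapt them to your own setup.

```php
<?php
// Hypothetical helper: formats one tab-separated log line.
// $server is normally $_SERVER, $now a Unix timestamp.
function logme_format($block, $ip, $type, $threat, $activity, array $server, int $now): string
{
    $fields = [
        gmdate('Y-m-d H:i:s', $now),          // timestamp, in UTC
        $block ? 'blocked' : 'allowed',
        $ip,
        (int) $type,
        (int) $threat,
        (int) $activity,
        $server['REQUEST_URI'] ?? '-',        // requested page
        $server['HTTP_USER_AGENT'] ?? '-',    // user agent
    ];
    // Replace tabs and line breaks so a crafted user agent cannot forge log entries
    $clean = [];
    foreach ($fields as $field) {
        $clean[] = preg_replace('/[\t\r\n]+/', ' ', (string) $field);
    }
    return implode("\t", $clean) . "\n";
}

// Same arguments as the call in the blocking code; the default path is an assumption
function logme($block, $ip, $type, $threat, $activity, $logfile = '/tmp/httpbl.log')
{
    $line = logme_format($block, $ip, $type, $threat, $activity, $_SERVER, time());
    // FILE_APPEND adds to the file, LOCK_EX avoids interleaved writes
    file_put_contents($logfile, $line, FILE_APPEND | LOCK_EX);
}
```

Keeping the formatting separate from the file writing makes the log line easy to check without touching the disk.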
What about false positives and legitimate users ?
That's a good one. I believe every blocking measure should give a second chance to real users. It could be some harmless and innocent reader using a popular open proxy to read your site from behind their corporate firewall, after all.
A simple yet effective way to give real humans a chance to see your page is giving them a JavaScript redirection link. 99.9% of the infrequent false positives should have a real browser with JavaScript enabled. We will also set a cookie that will tell the checking script not to annoy this user and to let them access pages.
- function blockme() {
- header('HTTP/1.0 403 Forbidden');
- echo <<<HTML
- <script type="text/javascript">
- function setcookie( name, value, expires, path, domain, secure ) {
- // set time, it's in milliseconds
- var today = new Date();
- today.setTime( today.getTime() );
- if ( expires ) {
- expires = expires * 1000 * 60 * 60 * 24;
- }
- var expires_date = new Date( today.getTime() + (expires) );
- document.cookie = name + "=" + encodeURIComponent( value ) +
- ( ( expires ) ? ";expires=" + expires_date.toUTCString() : "" ) +
- ( ( path ) ? ";path=" + path : "" ) +
- ( ( domain ) ? ";domain=" + domain : "" ) +
- ( ( secure ) ? ";secure" : "" );
- }
- function letmein() {
- setcookie('notabot','true',1,'/', '', '');
- location.reload(true);
- }
- </script>
- <h1>Forbidden</h1>
- <p>Sorry. You are using a suspicious IP.</p>
- <p>If you <strong>ARE NOT</strong> a bot of any kind, please <a href="javascript:letmein()">click here</a> to access the page. Sorry for this !</p>
- <p>Otherwise, please have fun with <a href="http://planetozh.com/smelly.php">this page</a></p>
- HTML;
- }
Now, before checking an IP, we'll first check for any cookie named 'notabot' with value 'true'. If there is one, don't bother making any check against the blacklist, and let the user access the page.
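That cookie test is short enough to sketch here. The function name is hypothetical; the cookie name and value are the ones set by letmein() above.

```php
<?php
// Hypothetical helper: true when the visitor carries the whitelisting cookie.
// $cookies is normally the $_COOKIE superglobal.
function has_notabot_cookie(array $cookies): bool
{
    return isset($cookies['notabot']) && $cookies['notabot'] === 'true';
}

// At the top of the checking script, this would typically read:
//   if (!has_notabot_cookie($_COOKIE)) { /* http:BL lookup and blocking policy */ }
echo has_notabot_cookie(['notabot' => 'true']) ? 'skip the check' : 'check the IP', "\n";
// prints "skip the check"
```

Keep in mind that any client can send this cookie, so it is a convenience for false positives and not a security measure.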
Wrapping it up
Here is the final script : it checks for a whitelisting cookie, otherwise checks the IP and decides whether or not to block the user, logging malicious attempts accordingly. It also logs people clicking on the "I'm a human, not a bot" link so that you can measure how tight or loose your blocking policy is.
- httpbl.php (highlighted code, cut and paste)
- httpbl.txt (raw text, save as .php)
To use the script, you would include it at the very top of your pages, i.e. :
- <?php require('/home/you/blog/httpbl.php'); ?>
Disclaimer and stuff
This script is a rather simple example serving as a basic http:BL tutorial for PHP. There has to be room for some improvements, such as better logging, or giving alternate javascript-free access to legitimate users.
Project Honey Pot is an awesome initiative to which you can contribute by setting up your own honey pots. Not only is it as easy as 1-2-3, it's also kind of rewarding : the day I had my first honey pot installed, it identified a new, previously unseen harvester :) Installing a honey pot is an easy way of making the web a cleaner place. Or at least of contributing to it.
Links to Project Honey Pot include my referral number. I'm not earning anything but, maybe, satisfaction. What are you waiting for ? Install your honey pots.
[…] in writing their own script, there is already an http:BL WordPress Plugin waiting for you. […]
Great stuff, Ozh – merci!
However, I installed the httpBL Apache2-module last night, AFAIK when that is enabled it makes this stuff in WordPress kind of obsolete, right? It would double check every connection to the weblog: once through httpd and second in WP.
You'd get a double check, but I don't think there's much point as the service uses DNS. The DNS query/result would be cached by your host, so both apache & WP would get the same result.
( I think ) ;)
BOK » indeed I too think having both the apache module and a PHP script is redundant in principle. I don't know how this apache module works, but the 2 solutions could be compared on 2 points :
– performance : I *guess* the module should be faster than a PHP script, although it should depend a lot on apache config & building
– flexibility : how fine-tunable and configurable is the module ? A PHP script is easy to configure (for example modify the threat level for a particular type of bot). How about the module ?
Sorry I'm a little late, was AFK / in Antwerp during the weekend…
– flexibility: (and configuration) I have taken over some stuff from the PDF-documentation in my httpd.conf that looks like this
HTTPBLDefaultAction allow
# allow all search engines
HTTPBLRBLReqHandler 255:0-255:0-255:0 allow
# deny any other listed IPs with any "score" that have been active in the
# last 30 days
HTTPBLRBLReqHandler 255:0-30:0-255:255 deny
It's still a bit cryptic to me, but as it's still in a testing-phase I guess the documentation will improve the coming weeks (Howto's, wiki, etc.).
[…] Honey Pot & http:BL Simple PHP Script « planetOzh A useful resource on how to write a honey pot http:BL php script (tags: security PHP honeypot anti-spam) […]
Hey, thanks for the script. It works great. I've modified it a little to fit my needs better (mostly with the logging), but love it!
In case anyone wants to run this from perl/cgi, add this to the beginning of your perl script:
my $httpbl = `/usr/bin/php /path/to/script/httpbl.php`;
if ($httpbl) {
print "Content-Type: text/html\n\n";
print "$httpbl";
exit();
}
Hey, thanks for the script. Works great!
Error "typemeaning" undefined …
Add the line " $typemeaning = (""); " as shown below and the error disappears. Once it is defined as an empty variable, the matching labels from the list are appended to it.
if ($result[0] == 127) {
// query successful !
$activity = $result[1];
$threat = $result[2];
$type = $result[3];
$typemeaning = ("");
Sorry, I have some lead in to my questions…
You see, I was visiting another site that decided, inexplicably and out of the blue, that I might be a robot! This was a surprise to me because I had stubbed my toe earlier today and it hurt like…well you know, and I'd have sort of thought robots wouldn't feel throbbing pain. But who am I to know these things? Can I truly know that I'm not a robot? What does it mean to 'know' something about one's self, when I'm always biased by my experience, or possibly my programming? Alas, any further scrutiny of these questions meanders down a dark, slippery, and winding path of existential exploration that will either bore or alarm you, the gentle reader, and which with your kindest interest at heart, I shall skip.
Anyway, the site that claimed I was probably a robot gave me a JavaScript link to click to continue browsing there, but since I happened to have JavaScript disabled at the time (as I usually do), and since I wasn't really enthusiastic to turn it on, the link was mostly useless. It also offered a more tantalizing and very slightly hidden link that it told me I ABSOLUTELY SHOULDN'T CLICK. Naturally, I had no choice but to click it.
Really, put a big red button marked "DO NOT PRESS" in a public place and see what happens! What did they expect? Arghhhh!
Right, so the verboten link brought me to your page "planetozh.com/smelly.php" which I read with keen interest. Noting that you expressly forbade use of any email addresses that might appear on the page, yet also noting that none were readily apparent, I inspected the page source to see what I might be missing. And there it was, the forbidden email address! An address which I won't repost here because it's worth at least $50 and I'm not feeling very rich, and even if I were I'd prefer to spend my money on a bottle of ibuprofen for the damaged toe, or perhaps on some hydraulic fluid or something if it turns out I am a robot.
While investigating this hidden yet enticing email address I also noticed your page had a form, which I completed as follows: "Sbj", a brief question "who reads this stuff?", a sort of throwaway email address that I check about every four months, and my guest name which I recorded as "Some Guest." After submitting the form, the data I entered were summarily reproduced at the bottom of your page, to my slight disappointment. I was kind of hoping for something a bit more momentous for all the effort, like maybe some red text on a black background wrapped in Geocities-style blink tags warning me that I submitted the form. Really, it was a let down.
OK, with that background out of the way, a few questions:
1) Who reads the stuff submitted with that form? I saw it copied back on the bottom of the page but hoped something more interesting might happen.
2) Is my IP address permanently or temporarily blacklisted someplace? What happens to the email address I submitted?
3) What's the "Sbj" field in the form for?
4) Can you change the response after submitting the form to make it do something more interesting?
Casual follow-up through Google on some of these led me here, where I was pleased to find I could post my questions directly! Thank you in advance for any additional insight or clarification.
Now if only I could remember what site I was trying to use before it inexplicably decided I was a robot…
Cheers,
Some Guest
Is this project honey pot httpbl list still being actively screened — I think the project has either been scrapped, or isn't actively monitored anymore. Can you clarify??
Thanks for this, it was very useful. Managed to get rid of some stupid comment spammer using your script.
greetings
tina
http://geovoyagers
Hi. Is project honey pot still maintained?
Project HoneyPot is awesome. I put an http:BL implementation on my blog and spam hits decreased PROFOUNDLY.
YOURLS is awesome.
YOURLS + http:BL = naturally great