Friday, December 9, 2011

Switch to Google Safe Browsing v2

Google maintains a list of malicious URLs and phishing sites, distributed through its Google Safe Browsing API. On December 12, version 1 was deprecated in favor of version 2. The API for version 2 works quite differently from version 1.

Importance of Google Safe Browsing

Google Safe Browsing is part of most popular web browsers, including Firefox, Chrome, Safari and Opera. Internet Explorer uses its own list, Microsoft SmartScreen. This makes the Google Safe Browsing lists the most widely used security filter among web users.

The Google Safe Browsing lists are also very extensive. There are currently about 460,000 entries in the lists and they are updated every 30 minutes. You can refer to "Google Safe Browsing v2: Implementation Notes" for more detailed numbers.

Coverage

I was curious to see the overlap between Google Safe Browsing v2 and a few other security blacklists.
Of the Alexa top 1,000,000 sites, 250 (0.025%) are blocked by Google Safe Browsing v2.

Google Safe Browsing v2 libraries

The Google Safe Browsing v2 API is fairly complex, at least more so than version 1. There are a number of libraries available, but not all of them implement the complete API. Here is a list of the libraries available for Google Safe Browsing v2:

| Language | Name | Missing features | Comment |
|----------|------|------------------|---------|
| Python | google-safe-browsing | none | Reference implementation from Google |
| Perl | Net::Google::SafeBrowsing2 | none | Several back-ends available for storage: MySQL, Sqlite, DBI, etc. |
| PHP | phpgsb | MAC | Helpful statistics for testing |
| PHP | gsb4u | MAC | Storage: MySQL, Sqlite |
| C# | google-safebrowse-v2-client-csharp | MAC; Back-off mechanism ?; Save full hashes, discard them after 45 minutes | Storage: data file |
| C# | Google-Safe-Browsing-API-2.0-C-p | MAC | Storage: SQL server |
| Java | jGoogleSafeBrowsing | ??? | Not finished? |
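
One of the features listed above is saving full hashes and discarding them after 45 minutes. Here is a minimal sketch in Python of what that could look like. The class name, the in-memory dictionary and the injectable clock are all assumptions made for illustration; none of the libraries listed is implemented this way as far as I know, and a real client would persist the hashes in its storage back-end.

```python
import time

# 45 minutes, the expiry mentioned in the list above.
FULL_HASH_TTL_SECONDS = 45 * 60


class FullHashCache:
    """Hypothetical in-memory cache of full hashes with a fixed lifetime."""

    def __init__(self, ttl=FULL_HASH_TTL_SECONDS, clock=time.time):
        self._ttl = ttl
        self._clock = clock  # injectable so that expiry can be tested
        self._entries = {}   # full hash (bytes) -> (list name, time stored)

    def store(self, full_hash, list_name):
        """Remember a full hash returned by the server for a given list."""
        self._entries[full_hash] = (list_name, self._clock())

    def lookup(self, full_hash):
        """Return the list name for a cached hash, or None if unknown or expired."""
        entry = self._entries.get(full_hash)
        if entry is None:
            return None
        list_name, stored_at = entry
        if self._clock() - stored_at >= self._ttl:
            # Too old: discard it so the client asks the server again.
            del self._entries[full_hash]
            return None
        return list_name
```

A lookup that returns None means the client has to request the full hashes from Google again before giving a verdict.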

Lookup API

If you need to check fewer than 10,000 URLs a day, you can use the much simpler Lookup API. This API allows you to send URLs directly to Google and receive the classification.
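
To give an idea of how simple it is, here is a rough Python sketch of building a Lookup API request and reading the answer. The endpoint, the parameter names (client, apikey, appver, pver, url) and the response codes (200 with a body such as "malware" or "phishing,malware", 204 for no match) are written from my recollection of Google's documentation and should be treated as assumptions to check against the official reference. The function names are made up for this example, and the sketch does not perform the HTTP request itself.

```python
from urllib.parse import urlencode

# Assumed endpoint of the Lookup API; verify against Google's documentation.
LOOKUP_ENDPOINT = "https://sb-ssl.google.com/safebrowsing/api/lookup"


def build_lookup_url(url, api_key, client="demo-app", appver="1.0"):
    """Build the GET request URL for checking a single URL (hypothetical helper)."""
    params = {
        "client": client,
        "apikey": api_key,
        "appver": appver,
        "pver": "3.0",  # assumed protocol version
        "url": url,     # urlencode() percent-encodes it for us
    }
    return LOOKUP_ENDPOINT + "?" + urlencode(params)


def classify_response(status, body):
    """Turn an HTTP status and body into a list of labels (hypothetical helper)."""
    if status == 200:
        # Assumed body format: comma-separated labels, e.g. "phishing,malware".
        return [label for label in body.strip().split(",") if label]
    if status == 204:
        # Assumed meaning: the URL is not on any list.
        return []
    raise RuntimeError("Unexpected Lookup API status: %d" % status)
```

You would fetch the URL returned by build_lookup_url() with any HTTP client and pass the status code and body to classify_response().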

I've made a Perl library for the Lookup API, Net::Google::SafeBrowsing2::Lookup, and I'm working on Ruby and Python implementations.

3 comments:

Simon Kelly said...

Hi

I have created an implementation of the API in Scala. It is based on your Perl code.

The source can be found on GitHub: https://github.com/snopoke/google-safebrowsing2

Julien Sobrier said...

@Simon Thank you, I have seen your library on the GSB mailing list.

Edward Chu said...

@Simon

I am trying to use your project in my Java project with MySQL.

I am using "gumblar.cn" as a test case. After the database is in sync, calling sb2.jLookup("http://gumblar.cn/", null, false) still returns null because the host key ("29d8ce97") does not exist in the database.

Google's diagnostic tool (http://google.com/safebrowsing/diagnostic?site=gumblar.cn/) determines that the site is suspicious.

I have also used GSB Toolkit (http://gsbtool.beaver6813.com/hlookup.php) as a reference: the hash ("29d8ce97") has a match in chunk number 80880.

However, the data stored in the database has a different host key:
mysql> SELECT * FROM viglink.gsb2_addchunks where iAddChunkNum = 80880;
+----------+----------+--------------+---------------------+
| sHostkey | sPrefix  | iAddChunkNum | sList               |
+----------+----------+--------------+---------------------+
| CEB11987 | 1828D855 |        80880 | goog-malware-shavar |
+----------+----------+--------------+---------------------+

Sorry for this long message, but I would like to see if I am missing anything important to make it work. Thanks!