Google maintains a list of malicious URLs and phishing sites distributed through their Google Safe Browsing API. On December 12, version 1 was deprecated in favor of version 2. The API for version 2 works quite differently from version 1.
Importance of Google Safe Browsing
Google Safe Browsing is part of most popular web browsers, including Firefox, Chrome, Safari and Opera. Internet Explorer uses its own list, Microsoft SmartScreen. This makes the Google Safe Browsing lists the most widely used security filter among web users.
The Google Safe Browsing lists are also very extensive. There are currently about 460,000 entries in the lists and they are updated every 30 minutes. You can refer to "Google Safe Browsing v2: Implementation Notes" for more detailed numbers.
Coverage
I was curious to see the overlap between Google Safe Browsing v2 and a few other security blacklists:
- Malware Domain List: 18,670 blocked / 71,352 entries (26%)
- Clean-MX Phishing: 540 blocked / 1,820 entries (30%)
- PhishTank: 1,318 blocked / 5,665 entries (23%)
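An overlap figure like these comes down to checking every entry of a blacklist against Google Safe Browsing and counting the hits. A minimal sketch, where `coverage` and the `is_listed` callback are hypothetical names: `is_listed` could be backed by the Lookup API or by a local v2 database, and the example run uses a toy stand-in rather than a real Safe Browsing check.

```python
from typing import Callable, Iterable


def coverage(entries: Iterable[str], is_listed: Callable[[str], bool]) -> tuple[int, int, float]:
    """Return (blocked, total, percent blocked) for one blacklist."""
    blocked = total = 0
    for entry in entries:
        total += 1
        if is_listed(entry):
            blocked += 1
    # Guard against an empty blacklist instead of dividing by zero.
    percent = 100.0 * blocked / total if total else 0.0
    return blocked, total, percent


if __name__ == "__main__":
    # Toy data: a set lookup stands in for a real Safe Browsing verdict.
    sample = ["http://bad.example/", "http://fine.example/", "http://phish.example/"]
    flagged = {"http://bad.example/", "http://phish.example/"}
    blocked, total, percent = coverage(sample, flagged.__contains__)
    print(f"{blocked:,} blocked / {total:,} entries ({percent:.0f}%)")
```

Passing the verdict function in keeps the counting independent of how the lookup is done, so the same loop works for any of the libraries listed below.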
Google Safe Browsing v2 libraries
The Google Safe Browsing v2 API is fairly complex, at least more so than version 1. There are a number of libraries available, but not all of them implement the complete API. Here is a list of the libraries available for Google Safe Browsing v2:
| Language | Name | Missing features | Comment |
|---|---|---|---|
| Python | google-safe-browsing | none | Reference implementation from Google |
| Perl | Net::Google::SafeBrowsing2 | none | Several back-ends available for storage: MySQL, SQLite, DBI, etc. |
| PHP | phpgsb | MAC | Helpful statistics for testing |
| PHP | gsb4u | MAC | Storage: MySQL, SQLite |
| C# | google-safebrowse-v2-client-csharp | MAC; back-off mechanism (?); saving full hashes and discarding them after 45 minutes | Storage: data file |
| C# | Google-Safe-Browsing-API-2.0-C-p | MAC | Storage: SQL server |
| Java | jGoogleSafeBrowsing | ??? | Not finished? |
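Much of the v2 complexity sits on the client side: instead of sending URLs to Google, a client hashes several host/path combinations of each URL with SHA-256 and looks up the first 32 bits of each digest in its local copy of the lists. The Python sketch below generates those combinations and their 32-bit prefixes. It assumes the host and path are already canonicalized, and it leaves out canonicalization itself, IP-address hosts, host keys, and the full-hash confirmation request; the function names are hypothetical and not taken from any library in the table.

```python
import hashlib


def host_suffixes(host: str) -> list[str]:
    """Exact host, then suffixes built from its last five components."""
    parts = host.split(".")
    suffixes = [host]
    # Drop leading components one at a time, starting from the last five,
    # and stop before reaching the bare top-level domain.
    for i in range(max(len(parts) - 5, 1), len(parts) - 1):
        candidate = ".".join(parts[i:])
        if candidate not in suffixes:
            suffixes.append(candidate)
    return suffixes


def path_prefixes(path: str, query: str = "") -> list[str]:
    """Exact path with query, exact path, then up to four prefixes from the root."""
    prefixes = [f"{path}?{query}"] if query else []
    prefixes.append(path)
    current = "/"
    candidates = [current]
    # Directory components only: the last element is the file name (or empty).
    for component in path.split("/")[1:-1][:3]:
        current += component + "/"
        candidates.append(current)
    for candidate in candidates:
        if candidate not in prefixes:
            prefixes.append(candidate)
    return prefixes


def lookup_expressions(host: str, path: str, query: str = "") -> list[str]:
    """Every host/path combination the client hashes and looks up."""
    return [h + p for h in host_suffixes(host) for p in path_prefixes(path, query)]


def prefix32(expression: str) -> str:
    """First 32 bits of the SHA-256 digest, as 8 lowercase hex characters."""
    return hashlib.sha256(expression.encode("utf-8")).hexdigest()[:8]


if __name__ == "__main__":
    for expr in lookup_expressions("a.b.c", "/1/2.html", "param=1"):
        print(prefix32(expr), expr)
```

A local prefix match only means "possibly listed"; a complete client then asks Google for the full 256-bit hashes before reporting a URL as malicious.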
Lookup API
If you need to check fewer than 10,000 URLs a day, you can use the much simpler Lookup API. This API allows you to send URLs directly to Google and receive the classification.
I've made a Perl library for the Lookup API, Net::Google::SafeBrowsing2::Lookup, and I'm working on Ruby and Python implementations.
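For comparison with the local-database approach, here is a minimal Python sketch of a batch lookup. It assumes the Lookup API conventions of the v2 era: the `https://sb-ssl.google.com/safebrowsing/api/lookup` endpoint, the `client`, `apikey`, `appver` and `pver` query parameters, a POST body with the URL count on the first line followed by one URL per line, and an HTTP 204 response when none of the URLs is listed. Treat the endpoint, parameter values and client name as assumptions to check against Google's documentation; the helper names are hypothetical, and this is not the interface of Net::Google::SafeBrowsing2::Lookup.

```python
import urllib.request
from typing import Optional
from urllib.parse import urlencode

# Assumed endpoint of the v2-era Lookup API.
LOOKUP_ENDPOINT = "https://sb-ssl.google.com/safebrowsing/api/lookup"


def build_request_url(api_key: str, client: str = "demo-client", appver: str = "1.0",
                      pver: str = "3.0", url: Optional[str] = None) -> str:
    """Request URL for a single GET lookup, or for a batch POST when url is None."""
    params = {"client": client, "apikey": api_key, "appver": appver, "pver": pver}
    if url is not None:
        params["url"] = url
    return f"{LOOKUP_ENDPOINT}?{urlencode(params)}"


def build_batch_body(urls: list[str]) -> str:
    """Assumed batch format: the URL count on the first line, then one URL per line."""
    return "\n".join([str(len(urls)), *urls])


def parse_batch_response(status: int, body: str, urls: list[str]) -> dict[str, str]:
    """Map each URL to its verdict ("ok", "malware", "phishing", ...)."""
    if status == 204:  # assumed: no content means none of the URLs are listed
        return {u: "ok" for u in urls}
    if status != 200:
        raise RuntimeError(f"Lookup API returned HTTP {status}")
    verdicts = body.strip().splitlines()
    if len(verdicts) != len(urls):
        raise ValueError(f"expected {len(urls)} verdicts, got {len(verdicts)}")
    return dict(zip(urls, verdicts))


def lookup_batch(api_key: str, urls: list[str]) -> dict[str, str]:
    """POST a batch of URLs and return the verdict for each one."""
    request = urllib.request.Request(
        build_request_url(api_key),
        data=build_batch_body(urls).encode("utf-8"),
        method="POST",
    )
    # urlopen raises HTTPError for 4xx/5xx responses before we parse anything.
    with urllib.request.urlopen(request, timeout=10) as response:
        return parse_batch_response(response.status, response.read().decode("utf-8"), urls)
```

Keeping request building and response parsing as separate functions makes them testable without network access; only `lookup_batch` talks to Google.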
3 comments:
Hi
I have created an implementation of the API in Scala. It is based on your Perl code.
The source can be found on GitHub: https://github.com/snopoke/google-safebrowsing2
@Simon Thank you, I have seen your library on the GSB mailing list.
@Simon
I am trying to use your project in my java project with MySQL.
I am using "gumblar.cn" as a test case. After the database is in sync, calling sb2.jLookup("http://gumblar.cn/", null, false) still returns null because the host key ("29d8ce97") does not exist in the database.
Google's diagnostic tool (http://google.com/safebrowsing/diagnostic?site=gumblar.cn/) determines that the site is suspicious.
I have also used GSB Toolkit (http://gsbtool.beaver6813.com/hlookup.php) as reference, the hash ("29d8ce97") has a match in chunk number 80880.
However, the data stored in the database has a different host key:
```
mysql> SELECT * FROM viglink.gsb2_addchunks where iAddChunkNum = 80880;
+----------+----------+--------------+---------------------+
| sHostkey | sPrefix  | iAddChunkNum | sList               |
+----------+----------+--------------+---------------------+
| CEB11987 | 1828D855 |        80880 | goog-malware-shavar |
+----------+----------+--------------+---------------------+
```
Sorry for this long message, but I would like to see if I am missing anything important to make it work. Thanks!