Friday, May 15, 2009

Those who know where and when you surf

Nobody likes the idea of a big brother watching over them as they surf. Yet as browser technology evolves, we are starting to have some privacy-violating "byproducts" of innocuous features. Now, we already generally accept that our immediate/upstream Internet providers can possibly snoop on our traffic—they can see what sites we try to access (whether by IP destination or monitoring DNS queries) and can peek into plaintext content. But the ability for an ISP to monitor traffic is limited to only the traffic that passes through it—namely, its customers. What if there were central organizations that could monitor user traffic (to a certain degree) across any/all ISPs?

I'm going to go ahead and overlook the obvious companies/services where you install a toolbar into your browser so they can watch and tabulate everywhere you go (like Alexa). That is an opt-in situation where you willingly give all your info to a third party. I'm also going to overlook any web proxy services that you might be using, because those are also opt-in situations with the same ramifications. And of course, any internal monitoring within an organization is also exempt; I want to focus on who, outside of your local network, your organization, and your ISP, is receiving your surfing data without you explicitly opting in to give it to them.

So let's start with HTTPS, as implemented in common web browsers. Most common browsers include certificate revocation checking to ensure an HTTPS site certificate hasn't been revoked (e.g. because it was compromised). When you access an HTTPS site, the browser receives the server certificate and runs off to the designated CRL and OCSP servers to query whether the certificate has been revoked. Those CRL and OCSP servers are operated by the certificate issuers, such as Thawte, Verisign, etc. The key here is that many SSL certificates have a one-to-one relationship with a specific web site (i.e. traditional certificates and not wildcard certificates); so a query to an OCSP responder for a web site certificate essentially tells the certificate issuer that you are in the process of accessing that site. Now, certificate issuers can only track accesses to sites they issue certificates for. But in 2007, Verisign and subsidiaries were deemed to have issued over 57% of the SSL certificates in use, giving them the theoretical capability to track access to over half of the HTTPS sites on the Internet.

Fortunately, none of the OCSP responders I briefly reviewed set HTTP browser cookies, which means they are not tracking unique individuals (at least, not through simple browser means). That is a good thing; the best tracking granularity they can achieve is generally per source IP address (ignoring browser header/request and TCP protocol fingerprinting to distinguish browsers/systems behind the same IP). This can be problematic if you're a home user and have a one-to-one relationship with your IP address, but tolerable if you're behind a NAT or proxy that serves many users (giving it a many-to-one relationship with the IP address). [Aside/favor: If you happen to see a public OCSP responder setting browser cookies, let me know!]
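
To make the per-IP granularity concrete, here is a minimal sketch of what a responder operator could, in theory, tabulate from its own request log. Everything in it is hypothetical: the log entries, the serial-to-site mapping, and the function name are invented for illustration, and I have no evidence that any issuer actually does this.

```python
from collections import defaultdict

# Hypothetical mapping an issuer holds: certificate serial -> site it was issued for.
SERIAL_TO_SITE = {
    "0A1B": "bank.example",
    "0C2D": "mail.example",
}

# Hypothetical responder log entries: (source IP, certificate serial queried).
OCSP_LOG = [
    ("203.0.113.7", "0A1B"),
    ("203.0.113.7", "0C2D"),
    ("198.51.100.9", "0A1B"),
]

def sites_by_source_ip(log, serial_to_site):
    """Group the sites implied by OCSP queries under the IP that asked."""
    seen = defaultdict(set)
    for source_ip, serial in log:
        site = serial_to_site.get(serial)
        if site is not None:
            seen[source_ip].add(site)
    return dict(seen)

profile = sites_by_source_ip(OCSP_LOG, SERIAL_TO_SITE)
```

Note that the key is the source IP, not a person: everyone behind the same NAT or proxy collapses into a single entry.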

Default browser start pages are another privacy-leaking avenue. Every time you start your web browser, Microsoft, Apple, Opera, or Mozilla hears about it through the various default browser start page request(s):

http://en-us.start2.mozilla.com/firefox?client=firefox-a&rls=org.mozilla:en-US:official
http://www.apple.com/startpage/
http://portal.opera.com
http://go.microsoft.com/fwlink/?LinkId=74005
http://runonce.msn.com/runonce3.aspx


The start pages themselves may seem innocuous, but they can chain to more exposure. For example, Apple's start page includes a request to metrics.apple.com, which is really a pointer to Omniture (resolving metrics.apple.com returns a DNS CNAME to appleglobal.112.2o7.net); the request collects and sends a lot of identifying browser information, including screen resolution and all installed browser plugins. I've seen occasions where responses to metrics.apple.com forward to Doubleclick.net too. Now, sure, it's common for web pages to include metrics/analytics and advertising links, but such elements on browser start pages mean these syndication/service providers also have access to realtime information about when you start your browser. In addition to Apple's use of Omniture and Doubleclick, Opera's portal page includes resource links to Google Analytics, and Mozilla redirects to Google. Microsoft's portal page uses Webtrends.com for metrics.

But more importantly, can any of these providers uniquely identify you? Microsoft relies on m.webtrends.com, and that site sets a unique cookie for you--so Webtrends has the ability to track you no matter where you go. Ditto for Apple, Omniture, and Doubleclick. The fact that these metric and advertising elements are hosted on browser start pages means the services have the opportunity to plant their unique cookie right at the beginning of your surfing session. Plus, look at where and how the start pages are hosted. Microsoft redirects from Microsoft.com to MSN.com, where the start page actually lives. By placing the start page on MSN, all of the MSN cookies are now available--and if you are logged in, that means Microsoft knows who you are and can individually identify you if they so choose. Since Mozilla redirects to Google, the same case generally applies to Google using its own cookies to identify/track you.

But fortunately this is only a minor information exposure; knowing you started your web browser is not exactly an earth-shattering privacy violation, and it only occurs at the very beginning of your surfing session (not when you open a new tab, etc.). Plus, this immediate information doesn’t expose where you actually wind up surfing to...just that you opened a browser. The previously mentioned OCSP issue reveals far more information regarding where you are surfing; but it is not the only feature that does so...

Many browsers have recently added various anti-phishing and safe surfing features which essentially query the host/URL you are accessing in a database to see if it's a known offender and thus should be blocked. Think about that for a second: for every URL/host you visit while surfing the Internet, a lookup is done to ensure the URL is safe. So how is that lookup being done?

Some of the browser vendors have designed their features so that databases of the known offenders are downloaded and stored on the user's local system, so they can be queried locally. This is ideal, as it means the lookups are fast (just look inside the downloaded file) and it doesn't expose the lookups that are actually occurring. However, note that I began this paragraph with the term "SOME of the browser vendors"...
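
As a rough illustration of the local approach, here is a minimal sketch in which the downloaded database is just a set of hostname digests. The storage format, the use of SHA-256, and the function names are my own assumptions for the example; each browser vendor uses its own format, and this is not any vendor's actual implementation.

```python
import hashlib

def host_digest(host: str) -> str:
    """Normalize a hostname and return its SHA-256 hex digest."""
    return hashlib.sha256(host.strip().lower().encode("utf-8")).hexdigest()

def build_local_db(bad_hosts):
    """Stand-in for the offender list a browser would download periodically."""
    return {host_digest(h) for h in bad_hosts}

def is_known_offender(host: str, local_db) -> bool:
    """Check a host against the local copy; no network request is made."""
    return host_digest(host) in local_db

# Hypothetical offender list, for illustration only.
local_db = build_local_db(["phish.example", "malware.example"])
```

The point is that `is_known_offender` never leaves your machine, so the list provider only learns that you fetched the list, not which hosts you checked against it.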

It turns out the biggest (arguable) privacy offender is Opera. Their SiteCheck function, enabled by default and meant to warn you when you try to access a malware site, sends a real-time request to sitecheck2.opera.com for every new host you surf to. The transaction looks something like:

GET /?host=www.google.com&hdn=naLWSHPy7ud1pACYor32hg== HTTP/1.1
User-Agent: Opera/9.64 (Windows NT 5.2; U; en) Presto/2.1.1
Host: sitecheck2.opera.com
...

HTTP/1.1 200 OK
Date: Fri, 15 May 2009 15:17:11 GMT
Content-Type: text/xml
...
<?xml version="1.0" encoding="utf-8"?>
<operatrust version="1.1">
<action type="searchresponse">
<host></host>
<ce>14400</ce>
<w>1</w>
</action>
</operatrust>


That means Opera hears about every web site host you access, and can keep a historical record if they really wanted to. Now, the SiteCheck queries are not utilizing cookies to track individual users, so tracking granularity is limited to per-source-IP resolution. But it's still a notable privacy exposure. Other browser features like IE8's Suggested Sites likely expose the same level of information; I have not confirmed it personally, but their level of privacy warnings and opt-in requirements seems to suggest so.
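
For the curious, here is a small sketch that pulls apart the captured transaction above using only the Python standard library. It shows that the hostname travels in the clear in the query string, and that the hdn value decodes to 16 raw bytes. I don't know what hdn, ce, or w actually mean, so the code deliberately reports them without interpreting them; the function names are mine.

```python
import base64
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit, parse_qs

# Request target and response body copied from the capture above.
REQUEST_TARGET = "/?host=www.google.com&hdn=naLWSHPy7ud1pACYor32hg=="

RESPONSE_BODY = """<?xml version="1.0" encoding="utf-8"?>
<operatrust version="1.1">
<action type="searchresponse">
<host></host>
<ce>14400</ce>
<w>1</w>
</action>
</operatrust>"""

def describe_request(target):
    """Return the plaintext host and the byte length of the decoded hdn value."""
    query = parse_qs(urlsplit(target).query)
    host = query["host"][0]
    hdn_raw = base64.b64decode(query["hdn"][0])
    return host, len(hdn_raw)

def response_fields(body):
    """Return the child elements of <action> as a tag -> text mapping."""
    root = ET.fromstring(body.encode("utf-8"))
    action = root.find("action")
    return {child.tag: (child.text or "") for child in action}

host, hdn_length = describe_request(REQUEST_TARGET)
fields = response_fields(RESPONSE_BODY)
```

Whatever the extra parameter is for, the host parameter alone is enough for the server to know where you are headed.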

So now that we've gone through all of that, what can you do about it? Well, the good news is that you can generally turn off all of the above-mentioned behaviors (you'll have to dig around in the Advanced Configuration areas of your browser)...but you might not want to. Sure, you can change your browser start page to something else without ramifications (actually, if you change it to a blank page, you'll find your browser starts faster because it doesn't have to initially load an external page). But OCSP checking and SiteCheck features are security protections that actually help keep you safe--so by turning that stuff off, you are opening yourself to more risk. So you need to decide what’s more important to you: your security, or your privacy.

Happy deciding,
- Jeff

4 comments:

Mike A. said...

Any website with javascript enabled can know where you surf.

http://startpanic.com/

Jeff Forristal said...

Mike,

While that is true, it's a bit more limited in form overall compared to the widespread things I was mentioning. First, history query attacks in browsers act like an oracle...you have to query them for the explicit sites you are interested in, and you will only learn as much as you ask. Second, such a setup/site can only nab users who somehow get to the site in the first place. And unless those users keep revisiting, you won't get updates/trends.
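
To illustrate the oracle point, here is a toy simulation (not an actual browser attack). The victim history, the candidate list, and the function names are all hypothetical; the `oracle` function stands in for whatever visited-or-not signal the attacking page can extract from the browser.

```python
# Hypothetical victim history, hidden from the attacker.
_victim_history = {
    "http://bank.example/",
    "http://news.example/",
    "http://obscure.example/",
}

def oracle(url):
    """Answer only yes/no for one explicit URL, like a history query does."""
    return url in _victim_history

def probe_history(candidates, is_visited):
    """Ask the oracle about each candidate; unasked URLs stay unknown."""
    return [url for url in candidates if is_visited(url)]

# The attacker has to guess which URLs are worth asking about.
candidates = [
    "http://bank.example/",
    "http://shop.example/",
    "http://news.example/",
]
learned = probe_history(candidates, oracle)
```

The visit to obscure.example never shows up, because nobody thought to ask about it.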

Plus, direct history attacks are only the tip of the iceberg; you can mash that up with other data in realtime to figure out new things. For example, this site uses your history along with Quantcast data to estimate your gender based on your previous browsing habits:
http://www.mikeonads.com/2008/07/13/using-your-browser-url-history-estimate-gender/

Fun stuff.
- Jeff

Anonymous said...

FYI Microsoft claims to strip any PII from what is sent to their anti-phishing web services:
http://blogs.msdn.com/ie/archive/2008/07/02/ie8-security-part-iii-smartscreen-filter.aspx

Jeff Forristal said...

Anonymous -

Actually, the wording in the blog post says it's sent (encrypted) over HTTPS (so, nothing stripped there) and "the data is not stored with a user's IP address or other personally identifiable information." This is really an on-your-honor statement like any other privacy policy affirmation. While I'll give Microsoft the benefit of the doubt, it's not exactly a guarantee.

Plus, things like this are subject to word games, such as the blog post statement "no personally identifiable information is retained or used for purposes OTHER THAN IMPROVING ONLINE SAFETY" (emphasis mine). That can easily mean that the information is retained for certain purposes (i.e. improving online safety), not stripped. Does "improving online safety" include tracking individual users to come up with aggregated trends in the surfing habits of a generalized user profile, which is then re-applied as a baseline back to the security protection mechanisms? In other words, you could likely get away with tracking users as long as it is the means and not the end, and the end winds up being something related to improving security as stated.

- Jeff