Sunday, January 30, 2011

Egypt ... now just gyped

There are a number of good references discussing the recent events in Egypt (protest timeline) and the subsequent Egyptian-government ordered Internet shutdown (ISPs in Egypt have withdrawn their routes via BGP- timeline). This shutdown includes cellphone SMS/MMS/data networks. The premise for the government-ordered shutdown was to avoid what recently happened in Tunisia where social networks (Facebook in particular) and blogs helped to strengthen and organize protests (reference). In Egypt, a video was recorded of a protester being shot in the head, point-blank by police - in an effort to prevent this and other information from going viral and escalating protests, the Egyptian government (allegedly) ordered this Internet-shutdown.

In a recent report, it was shown that over 50% of Egypt's Internet users are the youth of the nation (18-34 years old) and that the "social media scene has quickly gained ground" among its users. Zscaler was servicing a number of web transactions for customers from Egypt before the routes went dark. I wanted to share some stats on what we were seeing in terms of web usage up until the plug was pulled on January 27 at 22:34 UTC.

The following chart shows the daily percentage of the week's web transactions from/to Egypt clients/servers that traversed our cloud from January 24th - 28th. The y-axis is the percentage of transactions from all Egypt transactions we observed from Jan. 24- 28.
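The percentages plotted in a chart like this are each day's share of the total for the period. A minimal sketch of that calculation follows; the function name and the daily counts are hypothetical placeholders, not Zscaler data.

```python
def daily_share(counts):
    """Return each day's share (in percent) of all transactions in `counts`."""
    total = sum(counts.values())
    if total == 0:
        # No traffic at all: every day's share is zero.
        return {day: 0.0 for day in counts}
    return {day: 100.0 * n / total for day, n in counts.items()}


# Hypothetical counts for illustration only (not the observed numbers).
example_counts = {
    "Jan 24": 200,
    "Jan 25": 250,
    "Jan 26": 350,
    "Jan 27": 190,
    "Jan 28": 10,
}
shares = daily_share(example_counts)
```

Because every value is divided by the same total, the shares always sum to 100%, which is what makes the drop on the final day easy to read off the chart.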

Egypt's Web Transactions Preceding Shutdown

Our data showed a 68% increase in transactions to Egypt web servers on January 26th - the spike was most noticeable in the News/Media category of web servers, as people used Egyptian news sources to obtain information on the protests. Then we see the decline and eventual drop to (near) zero on January 28th for Egyptian web transactions (client and server). Taking a look at the web server transactions for the 28th showed www.egyptse.com, the Egyptian Stock Exchange, (217.139.183.2 - NOOR network) remaining live and visited by customers - as others have noted, this remains the only live Egypt network.

Egyptian websites that were visited on the 27th but are no longer accessible include:
  • *.masrawy.com (41.178.51.93)
  • *.ahram.org.eg
  • *.arabia.msn.com (41.178.51.12)
  • www.egynews.net -> productnews.link.net (41.178.51.29)
  • egypt.usaid.gov (196.219.223.215)
  • algomhuria.net.eg
  • ahram.org.eg
  • *.gov.eg sites
The .eg domains no longer resolve due to the Egyptian nameservers being inaccessible from the Internet outage. There are stories that discuss this DNS outage (here).

On January 27th, prior to the shutdown, this is the breakdown in web surfing activity that was being seen from client traffic originating in Egypt.
The above chart illustrates the Egyptian Internet usage during the protests and leading up to their Internet shutdown. Social media and news related web pages accounted for roughly 65% of the web browsing that was done from Egyptian client IPs through our cloud.

The top sites visited by Egypt web clients on the 27th include:
  • Facebook related (42.02% total)
    • *.facebook.com (25.36%)
    • *.fbcdn.net (16.66%)
  • *.aljazeera.net (6.63%)
  • Google (6.96%)
The Internet remains dark for nearly all of Egypt to (allegedly) stop what these stats show - the ability to stay up to date with news/events and communicate and share ideas with friends ... after all, information is power. I leave it to the reader to decide whether the end justifies the means, whether this is an acceptable form of "censorship" and whether these measures should ever be permissible. In any case, the world is watching and learning from these historical events.

Tuesday, January 25, 2011

Google Safe Browsing v2: Implementation notes

I wanted to share what I learned while implementing Net::Google::SafeBrowsing2, a Perl library for Google Safe Browsing v2. I have put together "Google Safe Browsing v2 API: implementation notes", a collection of notes and real-world numbers about the API. This is intended for people who want to learn more about the API, whether as a user or to make their own implementation.

This is not another description of the API. Rather, it provides information about what you should expect from the API:
  • how many updates does it take to get the full database initially
  • how many updates there are per day on average
  • how many add chunks and sub chunks you should expect
  • how to test a library
  • key differences between version 1 and 2
  • etc.
The DOC and PDF versions can be downloaded from our website:



-- Julien

Web Transactions Per User Per Day

I searched the web recently looking for statistics on the average number of web transactions that end-users make per day, and equations for estimating end-user web transactions. I couldn't find anything that was worth repeating, so I ran my own numbers from Zscaler data.

First, a quick note on what I'm defining as a web transaction- all HTTP(S) client requests / server responses. Some web pages have a large number of web transactions associated with a single web page, for example, the cnn.com homepage has 127 transactions:
Others, like Google's search results pages, contain only a few transactions:
I took a random sampling of 100,000 users (excluding group accounts and inactive users) for a 24-hour period on a non-holiday mid-week day. These users are from a global population of enterprise users that typically work an 8-10 hour day. I set a minimum threshold of 500 transactions for an account to be considered an active user - those with fewer than 500 transactions could be system-driven versus user-driven transactions (e.g., Windows Update), part-time workers, or temporary/test accounts.

The average for this data sample was:
3343.80227 web transactions per user per day

So, within a small organization with 1,000 active users - there are roughly 3.34 million web transactions from the organization's user population during a workday. Note: when I originally ran the numbers I included accounts with <500 transactions within the randomly selected user population- these accounted for roughly 14% of the randomly selected user population and brought the average down to 2251 web transactions per user per day. Because it is unlikely that an organization's entire user-population will be active at once, it may be important to consider approximately 10% of your organization's user-population as inactive.

The maximum number of web transactions from a single user account was 597,064 (this was an outlier). The median (50th percentile) was 1912 web transactions per user per day.
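For anyone who wants to run the same summary on their own logs, here is a minimal sketch. It assumes you already have a list of per-user daily transaction counts; the function name and the sample numbers in the usage line are hypothetical.

```python
from statistics import mean, median

ACTIVE_THRESHOLD = 500  # minimum daily transactions for an "active" user


def summarize(per_user_counts, threshold=ACTIVE_THRESHOLD):
    """Summarize daily transaction counts for users at or above the threshold."""
    active = [c for c in per_user_counts if c >= threshold]
    if not active:
        raise ValueError("no users met the activity threshold")
    return {
        "active_users": len(active),
        "mean": mean(active),
        "median": median(active),
        "max": max(active),
    }


# Hypothetical counts; the 100-transaction account is filtered out.
stats = summarize([100, 500, 1500, 4000])
```

Reporting the median next to the mean matters here because a single outlier account can pull the mean well above what a typical user generates.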


The above plot is the number of users (Y-axis) with a web transaction count between 500 and 4,000 (X-axis) for a 24-hour period. There are a number of users with <1000 transactions in a day, but there is a long "tail" to the right of highly active users. Plotting the aggregated user count for web transaction ranges in the thousands (e.g., data point 2 contains the transaction range 2000-2999) with the transaction range in reverse order shows that the curve is roughly exponential (inverse natural log function).

The black trendline that was generated in Excel from our data was:

y = 18293 * e^(-0.267x)
Where,
e is the mathematical constant, 2.71828
x is the transaction range in thousands (e.g., 2 = 2000-2999 transactions)
y is the user population (based on a 100,000 user population) that fall into the "x" transaction range

To estimate your user population for a specific transaction range, modify the equation to be:

y = total_user_population * 0.18293 * e^(-0.267x)

Using this function it is possible to roughly estimate the number of users that have a specific range of transactions (e.g., power web users) within an organization within a 24-hour period.
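A minimal sketch of that estimate, assuming the trendline scales linearly with population (that is, the fitted coefficient 18293 is divided by the 100,000-user sample size and multiplied by your own user count). The function and constant names are hypothetical.

```python
import math

TRENDLINE_COEFF = 18293   # coefficient fitted on the 100,000-user sample
SAMPLE_SIZE = 100_000     # size of the sample the trendline was fitted on
DECAY_RATE = 0.267        # exponent constant from the trendline


def estimated_users(total_user_population, range_in_thousands):
    """Estimate how many users fall in a transaction range.

    range_in_thousands follows the post's convention:
    2 means 2000-2999 transactions in a 24-hour period.
    """
    scale = TRENDLINE_COEFF / SAMPLE_SIZE  # 0.18293
    return total_user_population * scale * math.exp(-DECAY_RATE * range_in_thousands)


# Example: estimated users in the 2000-2999 range for a 5,000-user organization.
estimate = estimated_users(5_000, 2)
```

The result is a rough curve fit, so treat it as an order-of-magnitude guide rather than a head count.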

A relatively small organization of 300 users must deal with an estimated 1 million web transactions in a 24-hour period for their user population. Some estimates of the 2011 Federal Government payroll state a work-force of 1.35 million Federal civilian employees -- this is an estimated 4.5 billion web transactions a day (this does not include Federal civilian contractors).
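The organization-wide figures above are simply the active-user count multiplied by the sample average. A minimal sketch, with hypothetical names:

```python
AVG_TX_PER_ACTIVE_USER = 3343.80227  # average from the sample in this post


def daily_transactions(active_users, avg=AVG_TX_PER_ACTIVE_USER):
    """Estimate total daily web transactions for a number of active users."""
    return active_users * avg


small_org = daily_transactions(300)          # roughly 1 million
federal = daily_transactions(1_350_000)      # roughly 4.5 billion
```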

These types of numbers are good for an organization to be aware of when considering scaling solutions for enforcing web security / policy and for storing and analyzing transactions.

Monday, January 24, 2011

Alexa Illustrates Web Security Risks (part 2)

I wanted to circle back and close the loop from my original post on this. First- not surprisingly I’m not the only one to have taken note of malicious sites landing in Alexa (reference sucuri.net blog).

I wrote some scripts to check a number of the domains listed in the Alexa top 1 million against Google SafeBrowsing (GSB), SURBL, and to cross-reference with MalwareDomainsList (MDL). In the previous post, I mentioned a few of my findings related to GSB and SURBL lookups - particularly FakeAV. Additionally, a number of the sites listed included porn sites that were listed in SURBL due to their advertisements within spam links. Snippet of some of the results.
While the GSB and SURBL lookups for 1 million sites aren't very quick repeatable processes, it is a fairly quick process to do the cross-reference with the MDL (MDL list here). The results from today's Alexa and MDL intersection include 87 sites. However, several of the listed sites are overly aggressive listings on MDL's part- for example: hotfile.com, rapidshare.com, and stashbox.org are free file hosting services that are listed. Free file hosting services are frequently abused to store malware- however, the sites themselves are legitimate and should not be blocked at the domain level.
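The cross-reference itself is a set intersection. The sketch below is not the script I used; it assumes the Alexa list is available as "rank,domain" CSV text and that the blocklist has already been reduced to one domain per entry. The function names, the allowlist parameter, and the ranks in the usage example are hypothetical.

```python
import csv
import io


def load_alexa(csv_text):
    """Parse 'rank,domain' rows into a {domain: rank} dictionary."""
    ranks = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) >= 2 and row[0].strip().isdigit():
            ranks[row[1].strip().lower()] = int(row[0])
    return ranks


def cross_reference(alexa_ranks, blocklist_domains, allowlist=()):
    """Return sorted (rank, domain) pairs present in both lists.

    Domains in `allowlist` (e.g. file hosting services that are abused
    but legitimate) are skipped.
    """
    listed = {d.strip().lower() for d in blocklist_domains if d.strip()}
    skip = {d.strip().lower() for d in allowlist}
    hits = [
        (rank, domain)
        for domain, rank in alexa_ranks.items()
        if domain in listed and domain not in skip
    ]
    return sorted(hits)


# Hypothetical ranks, for illustration only.
sample_csv = "1,google.com\n500,hotfile.com\n900,bulletproof-web.com\n"
hits = cross_reference(
    load_alexa(sample_csv),
    ["hotfile.com", "bulletproof-web.com", "unlisted.example"],
    allowlist=["hotfile.com"],
)
```

Keeping the allowlist separate from the blocklist makes it easy to review which listings were suppressed and why.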

Some of the more interesting sites listed, include:
  • bulletproof-web.com - as the name suggests, it's a bullet-proof hosting provider
  • bloggoogle.info, domaingoogle.info, hostinggoogle.info, datagoogle.info, businessgoogle.info - NeoSploit exploit kit (reference example)
  • gdfgdfgdgdfgdfg.in.ua - FakeAV drive-by redirect related to Twitter spam campaign (reference example)
  • protect-pc-2011.co.cc, multy-protect.co.cc, fastperot.co.cc - TDSS rootkit / FakeAV
Seeing these Alexa results further illustrates the threat of FakeAV and the recent come-back of NeoSploit in 2011 that others have highlighted with the release of its version 4 (reference example).

Using TheBrain to Visualize Web Transaction Logs

For those unfamiliar with TheBrain - it is a highly interactive mind-mapping software, which includes a free edition called the PersonalBrain. I have used this software on-and-off for a few years, and really like it for organizing and interacting with my thoughts when I start a new project. I recently took a look at using the software to organize and visualize web transaction logs for analysis (specifically to extract suspicious / malicious transactions). Below details my experiences - I'm curious to know what others have found to work.

After exchanging a few emails with TheBrain support (they are very responsive) - they shared a 5-page document on their supported XML formats. A DTD file is available, and it is extremely easy to convert your data into "Thoughts" and "Links" (just make sure to properly handle any XML entities within your data). Thoughts are basically nodes within your mind-map and links can be parent/child or "jump" relationships (I think of these as bi-directional "lookup" relationships or cross-references). In my case I wrote a Perl script to extract and convert portions of my data into the supported XML format and then imported it into PersonalBrain.
The above is data from 5 transactions imported into the PersonalBrain with these relationships:
  • ServerIP, Domain, URLPath, RequestType, Country, ASN, Score, AnalyticCheck, and Transactions are all children of Data, and each has related data under each Thought. For example, 1.2.3.4 falls under ServerIP and China falls under Country.
  • ServerIP data has jump links to related ASN, Country, and Domain data and vice versa (the links are bi-directional)
  • Domain data has jump links to related URLPaths and vice versa
  • Transaction data has related jump links to ServerIP, Domain, URLPath, RequestType, Country, ASN, Score, and AnalyticCheck and vice versa
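On the XML entity point: URL paths are full of characters such as & and < that must be escaped before import. The sketch below shows one way to let a library do the escaping. It is not TheBrain's actual schema; the element and attribute names (Brain, Thought, Link, id, from, to, type) are hypothetical stand-ins, so map them to whatever the DTD you receive specifies.

```python
import xml.etree.ElementTree as ET


def build_xml(thoughts, links):
    """Serialize thoughts and links to XML, escaping entities automatically.

    thoughts: iterable of (id, name)
    links:    iterable of (from_id, to_id, kind), kind e.g. "child" or "jump"
    Element and attribute names are hypothetical placeholders.
    """
    root = ET.Element("Brain")
    for thought_id, name in thoughts:
        node = ET.SubElement(root, "Thought", {"id": str(thought_id)})
        node.text = name  # ElementTree escapes &, < and > on output
    for from_id, to_id, kind in links:
        ET.SubElement(
            root, "Link", {"from": str(from_id), "to": str(to_id), "type": kind}
        )
    return ET.tostring(root, encoding="unicode")


xml_text = build_xml(
    [(1, "URLPath"), (2, "/search?q=a&b=<c>")],
    [(1, 2, "child")],
)
```

Building the document through an XML library rather than string concatenation means a stray ampersand in a logged URL cannot break the import file.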

By having the data in this visual format, it is easy to quickly drill-down and view transactions with higher "suspicious" scores and then cross-reference their transaction information with other transactions. You have the ability to view both primary and secondary relationships, similar to what is displayed in the above graph, or a more concise view with just the primary relationships:


Zeus/SpyEye and other bots are often configured to use an IP lookup service (a possible example above) and provide their resolvable IP to the C&C. These types of inter-related transactions can become apparent when correlating botnet web transactions.

I found TheBrain's format very easy to convert and import data into, and its interface user-friendly, interactive, and fun for working with imported data. TheBrain includes the ability to "forget" and "remember" thoughts - i.e., the ability to remove and restore thoughts/links, so while you are conducting your analysis you can clear your brain of any data that is in the way.

What I did find TheBrain woefully inadequate for is large data-sets. When I exported all of the data that I wanted to review for a day, the XML file was 1.66GB for all of the Thoughts and Links. When I tried to import into PersonalBrain on my MacBook Pro (4GB, 2.53GHz Core 2 Duo) I ended up letting it attempt the import overnight ... after about 20 hours of waiting for it to import the data I "force quit" the application (though it never said "not responding").

If your data-set is of relatively small size, TheBrain may be a good free tool to add to your arsenal. Feel free to share other good, interactive, free visualization tools for analyzing web transaction logs, particularly if they scale to larger data-sets.

Thursday, January 20, 2011

Exploit in the wild for MS06-014 – a five year old vulnerability

Although 0day vulnerabilities receive all the attention, it’s not unusual to see attackers still taking advantage of old vulnerabilities to attack end users. Here's such an example, where the vulnerability used was MS06-014 – a five year old vulnerability! hxxp://www.win0day.com/win/6.htm delivers an obfuscated JavaScript exploit for this attack. More information about this vulnerability can be found here. Back in 2006 Metasploit released exploit code for this vulnerability.

Let's look at the obfuscated JavaScript used:

The de-obfuscated code looks like this:

The exploit takes advantage of the vulnerable ActiveX object “RDS.DataControl”, which has the classid “BD96C556-65A3-11D0-983A-00C04FC29E36”. The exploit is designed to download executable files, which are then stored on the victim's machine. The executable file path in the exploit is as follows:

This in turn decodes to:

Virustotal results indicate that 21/43 AV engines have protection against this Trojan – a concerning statistic considering the age of the exploit used to deploy the malware. Virustotal’s URL submission indicates that the malware URL was submitted on 2010-11-18 and is still active. Why would attackers continue to leverage such an old vulnerability? Sadly, as we have shown in our quarterly reports, nearly one in five corporate users still employ Internet Explorer 6, a nine-year-old web browser.
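On the defensive side, one simple check is to look for the RDS.DataControl classid in page content. The sketch below is only an illustration with hypothetical names; it works on content that has already been de-obfuscated, since an obfuscated page will not contain the classid in plain text.

```python
import re

# classid of the RDS.DataControl ActiveX object abused by MS06-014 exploits
RDS_DATACONTROL_CLSID = "BD96C556-65A3-11D0-983A-00C04FC29E36"
_CLSID_PATTERN = re.compile(re.escape(RDS_DATACONTROL_CLSID), re.IGNORECASE)


def references_rds_datacontrol(page_source):
    """Return True if the (de-obfuscated) page source mentions the classid."""
    return bool(_CLSID_PATTERN.search(page_source))


flagged = references_rds_datacontrol(
    '<object classid="clsid:bd96c556-65a3-11d0-983a-00c04fc29e36"></object>'
)
```

A match is a reason to look closer, not proof of an exploit.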

Pradeep

Blackhat SEO numbers for December 2010 (Part II)

This is a follow up to the numbers I presented in Part I, which discussed malicious spam pages in Google results and the malicious sites that they redirect to.

Google warnings

The spam pages flagged by Google represent only about 44% of all spam identified by Zscaler. If we look at spam pages redirecting to malware, 57% are flagged. These numbers are about the same as what we saw in March 2010 (53% flagged).


52% of the malicious spam links are flagged by Google

Distributions of spam links per page

Spammers are still able to elevate their links to the first page of search results. However, compared to March 2010, there are fewer spam links on the first page than there used to be.


Number of spam links on each result page in Google

In general, more search terms contain Blackhat SEO spam links, but there are fewer such links per search, when compared to March 2010.

Number of spam links per poisoned search



Overall, Google's Blackhat spam SEO situation has improved: there are fewer spam links on the first page and fewer search terms had more than 50% of links returned as malicious. However, Google still struggles to clean their index, or at least to warn users about real threats.

-- Julien

Wednesday, January 12, 2011

High profile websites hijacked to lead to fake stores

Recently, a lot of high profile .EDU and .GOV sites were hijacked to redirect users to fake online stores. Google searches related to buying software ("buy windows 7 key", "where to buy microsoft", "purchase microsoft word", "buy microsoft office", etc.) contain a long list of websites running on non-standard ports: www.kidsforkidsfestival.org:8080, en.jurispedia.org:4444, www.notiuno.com:4577, etc. These links redirect users to online stores which claim to sell software at a discounted price.
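Search results pointing at non-standard ports are easy to flag programmatically. A minimal sketch using Python's standard URL parser follows; the function name is hypothetical, and it treats anything other than 80 for HTTP and 443 for HTTPS as non-standard.

```python
from urllib.parse import urlsplit

STANDARD_PORTS = {"http": 80, "https": 443}


def non_standard_port(url):
    """Return the port if the URL uses a non-standard one, otherwise None."""
    # Results are often listed without a scheme; assume HTTP in that case.
    parts = urlsplit(url if "://" in url else "http://" + url)
    port = parts.port
    if port is None or port == STANDARD_PORTS.get(parts.scheme):
        return None
    return port


results = [
    "www.kidsforkidsfestival.org:8080",
    "en.jurispedia.org:4444",
    "http://www.example.com/",
]
suspicious = [u for u in results if non_standard_port(u) is not None]
```

A non-standard port on an otherwise reputable domain is not malicious by itself, but in search results it is unusual enough to be worth a second look.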

Spam results for buying Windows

Major websites hijacked

The list of hijacked sites includes:
  • Harvard (Alexa rank in US: 875, cxc.harvard.edu)
  • MIT (Alexa rank in US: 963, petar.blog.lcs.mit.edu, fig.scripts.mit.edu, hlt.media.mit.edu)
  • Stanford (rank 782, mentalhealth.stanford.edu, yuba.stanford.edu, assu.stanford.edu)
  • Fandango (rank 236, www.summermovies.fandango.com)
There are also governmental sites in the list, from US, China and other countries:
  • openworld.gov
  • paceflorida.gov
  • fpa.tas.gov.au
  • ezhouinvest.gov.cn
  • perak.gov.my
  • misiones.gov.ar
  • etc.
Fake stores

The fake stores use multiple domain names, and each site looks slightly different: softsupreme.com, softsupreme.net, buysupreme.net, software-supreme.com, softbuy-download.net, softbuy-download.com, sacon.org, topoemdownloads.net, etc. I've seen more than 75 different domains so far.

Fake store

Multiple languages and other spams

Unlike the usual Blackhat spam SEO coming from the Google Hot Trends, this type of spam is targeted at multiple languages: English, French ("achat windows"), German ("Microsoft kaufen"), etc.

Hijacked sites on non-standard ports are also used for other types of spam: US student visa, Viagra, etc.

Once again spammers have managed to poison search results for popular searches. This specific spam was reported a month ago, but it still shows up in the first page of results for multiple searches.

-- Julien

Tuesday, January 11, 2011

Alexa Illustrates Web Security Risks (part 1)

I recently needed to look at some Alexa data related to their tracking of the top web domains visited for a side project that I was working on.


During my investigation of their data, I found it interesting to see a number of suspicious / malicious domains included in their daily top 1M list.

In this first blog section, I want to show that FakeAV / scareware malware has infiltrated the top websites according to Alexa. To begin with, there are 150 domains in the top list that contain the string "virus." This illustrates the popularity and the potential profitability of distributing software that cleans (or claims to clean) infected systems.
It could be inferred, then, that there are a lot of systems on the Internet that users are trying to clean and/or protect from infection. Unfortunately, looking at the domains / sites in the list, it is difficult to determine if the wares being peddled on the site are legitimate or malicious. From my experience, most legit A/V products don't include the word "virus" within their domain name. The volume and sometimes "pushy" nature of anti-virus related sites further adds to the confusion over which are real and which are fake / malicious. Many of these sites appear to be affiliate sites (whether authorized or not), but there are malicious sites sprinkled in the results as well...

For example, a top scareware site in Alexa is hxxp://antivirus-defender.ru/. This site shows the typical scareware scanning screen (in Russian):

But with one twist- after the fake scanning is completed to scare the victim into purchasing / downloading / installing the wares, they are presented with a screen to enter a code that they purchase over SMS in order to download:


This translates to English as:

Unlike other scareware campaigns where the install is allowed first, and then pop-ups and warnings entice the victim to pay- this campaign demands payment before installation, and the payment is made over SMS, which is unusual.

There are a handful of other malicious A/V sites within the Alexa results as well- e.g., antivirus-scanonline.com (is listed in Alexa and Google Safe Browsing) and virus-scanonline.com (a known malware site which is now dead). Looking up other key strings within Alexa, such as "scann", uncovered a few more malicious results: onlinescannerxp.com, best-guardinscanner.in, thebestscan-scanner.com, best-scan-scanner.in, smart-securityscanner.net, etc.
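The string searches described above amount to a substring filter over the Alexa list. A minimal sketch, assuming the list has already been loaded as (rank, domain) pairs; the function name and the ranks in the example are hypothetical.

```python
def domains_containing(alexa_rows, needle):
    """Return the (rank, domain) pairs whose domain contains `needle`."""
    needle = needle.lower()
    return [(rank, domain) for rank, domain in alexa_rows if needle in domain.lower()]


# Hypothetical ranks, for illustration only.
rows = [
    (1, "google.com"),
    (2, "antivirus-defender.ru"),
    (3, "onlinescannerxp.com"),
]
virus_hits = domains_containing(rows, "virus")
scann_hits = domains_containing(rows, "scann")
```

A substring match only produces candidates; each hit still needs to be checked by hand or against a reputation source.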

FakeAV was just one example of malware within the Alexa list. Doing SURBL and Google SafeBrowsing lookups of the Alexa domains showed a number of other results. For example, the domain freefilesoft.net is listed at position number 3378 in Alexa, but is also listed in SURBL.

It appears to offer up a Fake Codec that installs Adware.Hotbar software:

(hxxp://www.freefilesoft.net/xvid_dl/)
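For readers who have not done a SURBL lookup before: it is a DNS query. As I understand it, the domain is prepended to the multi.surbl.org zone, and an answer in the 127.0.0.0/8 range means the domain is listed, while a non-existent name means it is not. The sketch below is an assumption-laden illustration with hypothetical function names, not the script I used; check SURBL's usage policy before running bulk lookups.

```python
import socket

SURBL_ZONE = "multi.surbl.org"


def surbl_query_name(domain, zone=SURBL_ZONE):
    """Build the DNS name to query for a registered domain."""
    return domain.strip().strip(".").lower() + "." + zone


def is_listed(domain):
    """Return True if the DNS answer indicates a listing.

    A failed lookup is treated as "not listed" here, which also hides
    resolver errors; a production script should tell the two apart.
    """
    try:
        answer = socket.gethostbyname(surbl_query_name(domain))
    except socket.gaierror:
        return False
    return answer.startswith("127.")


query = surbl_query_name("FreeFileSoft.net.")
```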

In the next section I will analyze the results from my scans of the top 1M sites and identify other threats / drive-by-downloads that are included within the most popular sites according to Alexa.

Wednesday, January 5, 2011

Blackhat SEO numbers for December 2010 (Part I)

Blackhat spam SEO was very prevalent in 2010 and it is not likely to disappear in 2011. I've compiled a few statistics on Blackhat spam SEO pages found in Google search results during December 2010:
  • Number of spam pages:  4,814
  • Number of spam domains: 428
  • Number of malicious sites: 483
I usually limit my Google scans to the first 10 pages of results, so there are likely many more spam pages in Google's full index.

Malicious sites

Fake AV pages are still the most popular type of attack, accounting for 85% of all malicious sites. Next in line are fake software stores, with 6% of the sites. I'll give more details about this type of attack in a future blog post.

5% of the malicious sites were unreachable, and could not be classified.

Types of malicious sites: mostly fake AV

44% of the malicious sites use a .IN domain name. 25% use a .COM extension, and 16% use an IP address without a domain name. .CC domains represent only 4% of all malicious domains. .CO.CC used to be the most popular TLD for fake AV pages, but it is now .IN.

Malicious sites by domain extension


Spam pages

I found 428 legitimate sites hosting 4,814 spam pages in Google search results. That's an average of 11 spam links per domain within the top ranks for popular searches.

The spam sites are found all over the world: 31 different TLDs were found amongst spam sites. The international .COM extension was found in 58% of the sites, .ORG in 8% and .NET in 6%. The .EDU TLD represents 10% of the total. Hijacked college websites were mostly used to lead to fake software stores.
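The extension breakdowns in this post come down to tallying hosts by their last label. A minimal sketch with hypothetical names follows. It counts bare IP addresses as their own bucket, and it is deliberately simple: a host under .CO.CC is counted as .CC.

```python
from collections import Counter
import ipaddress


def extension_of(host):
    """Return 'IP' for an IP address, else the upper-cased last label."""
    try:
        ipaddress.ip_address(host)
        return "IP"
    except ValueError:
        return "." + host.rstrip(".").rsplit(".", 1)[-1].upper()


def extension_shares(hosts):
    """Return each extension's share of `hosts`, in percent."""
    counts = Counter(extension_of(h) for h in hosts)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {ext: 100.0 * n / total for ext, n in counts.items()}


# Hypothetical hosts, for illustration only.
shares_by_ext = extension_shares(["a.in", "b.in", "c.com", "1.2.3.4"])
```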

Spam sites by domain extension




Most dangerous searches

356 Google searches contained at least one malicious spam link in December 2010.

The most dangerous searches relate to buying software online, and lead to a fake store. The most dangerous popular search (shown in Google Hot Trends) was for "sherwood blount" with 63 spam links amongst the first 100 search results!

Top-10 most dangerous Google searches in December 2010

I am still compiling the numbers and will do another post on the topic shortly. It looks like malicious Blackhat spam SEO will still be a major threat, if not the most significant threat to users in 2011.

-- Julien