I showed in Spam SEO trends & statistics (Part II) that the volume of spam SEO in Google searches can vary greatly, with anywhere from 0 to 90% of the search results being malicious. As a data miner, I'd love to know which hot searches I should focus on, and after how many days. With minimal resources, I'd like to quickly find the trends that would give me the largest number of infected links, in order to find more malicious pages.
Trends by search rank
First, let's try to see if the rank of a search term turns out to be a factor. Hot trends are ranked from 1 to 20. A search term can appear on several days in hot trends, with a different rank each time. The graph below shows the average number of spam links for each trend rank:
The average count of spam links varies from five to twelve. It looks like ranks two through thirteen give the highest numbers, but I am not sure the graph will look the same over time.
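For readers who want to reproduce this kind of breakdown, here is a minimal Python sketch of the per-rank average. The record layout `(term, rank, spam_links)`, the function name `average_spam_by_rank`, and the sample values are all assumptions made for illustration; they are not the data or the tooling behind the graph.

```python
from collections import defaultdict

# Hypothetical scan records: (search term, hot-trend rank, spam links found).
# The values are made up for illustration only.
scans = [
    ("term a", 1, 4),
    ("term b", 1, 8),
    ("term c", 2, 12),
    ("term a", 2, 10),  # the same term can show up again with another rank
    ("term d", 3, 0),
]


def average_spam_by_rank(records):
    """Return {rank: mean spam-link count} over all scans seen at that rank."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for _term, rank, spam_links in records:
        totals[rank] += spam_links
        counts[rank] += 1
    return {rank: totals[rank] / counts[rank] for rank in sorted(totals)}


averages = average_spam_by_rank(scans)
for rank, avg in averages.items():
    print(f"rank {rank}: {avg:.1f} spam links on average")
```

Because a term can appear on several days with a different rank each time, each (term, rank) observation is counted separately here rather than once per term.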
I generated the same type of graph, but based on the percentage of searches that contain at least one spam link:
The variations are smaller: at each rank, 44% to 65% of the searches contain at least one SEO spam link.
The distribution of spam across hot trend ranks is pretty flat, so rank is not a good indicator of which search is more likely to be interesting for security research.
Trends by days
Next I checked the distribution of spam over time. One researcher told me that there seems to be a spike of spam five days after a term first appears in the hot trends: a search is more likely to contain spam, or to contain more spam results, at that point. Here is what I get from my data:
The graph shows a peak at day six: about nine spam SEO links on average are shown for searches scanned six days after they first appeared in Google hot trends. Most of the spam links can be found after five to twelve days.
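The day-by-day view can be sketched in Python as follows: compute how many days separate each scan from the term's first appearance in hot trends, then average the spam counts per day. The names `first_seen`, `scans` and `average_spam_by_age`, as well as the dates and counts, are hypothetical and only serve the example.

```python
from collections import defaultdict
from datetime import date

# Hypothetical inputs: the date each term first appeared in hot trends,
# and scan results as (term, scan date, spam links found). Made-up values.
first_seen = {
    "term a": date(2010, 3, 1),
    "term b": date(2010, 3, 3),
}
scans = [
    ("term a", date(2010, 3, 2), 1),
    ("term a", date(2010, 3, 7), 9),
    ("term b", date(2010, 3, 4), 3),
    ("term b", date(2010, 3, 9), 7),
]


def average_spam_by_age(scans, first_seen):
    """Return {days since first appearance: mean spam-link count}."""
    per_day = defaultdict(list)
    for term, scan_date, spam_links in scans:
        age = (scan_date - first_seen[term]).days
        per_day[age].append(spam_links)
    return {age: sum(v) / len(v) for age, v in sorted(per_day.items())}


by_age = average_spam_by_age(scans, first_seen)
peak_day = max(by_age, key=by_age.get)  # day with the highest average
print(by_age, "peak at day", peak_day)
```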
The graph below shows similar information: the number of searches that contain at least one spam result. As I did not scan the same number of searches each day, I also included the number of search terms scanned that day:
On day eleven, that is, eleven days after a trend made it to the top 20, 76% of the 132 searches I scanned contained at least one spam link.
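Since the number of searches scanned differs from day to day, it helps to keep the sample size next to each percentage. A minimal Python sketch of that bookkeeping, assuming each scan is reduced to a hypothetical `(days_since_first_seen, spam_links)` pair with made-up values:

```python
def spam_share_by_day(scans):
    """Return {day: (searches scanned, share with at least one spam link)}."""
    scanned = {}
    with_spam = {}
    for day, spam_links in scans:
        scanned[day] = scanned.get(day, 0) + 1
        if spam_links > 0:
            with_spam[day] = with_spam.get(day, 0) + 1
    return {
        day: (scanned[day], with_spam.get(day, 0) / scanned[day])
        for day in sorted(scanned)
    }


# Hypothetical scans: (days since the term first appeared, spam links found).
scans = [(1, 0), (1, 2), (1, 0), (1, 0), (11, 3), (11, 0), (11, 5), (11, 1)]

shares = spam_share_by_day(scans)
for day, (count, share) in shares.items():
    print(f"day {day}: {share:.0%} of {count} searches had spam")
```

Reporting the count alongside the share makes it easier to tell a real spike from a percentage computed on only a handful of searches.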
Conclusion
How can I reduce the number of security scans required, while maximizing the potential number of spam links I find? I cannot rely on the rank of a trend, as the distribution is flat, but I can skip trends that first appeared less than five days ago. I'll probably scan the search results five to ten days after they show up in Google hot trends.
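The scanning rule above can be expressed as a small filter. This is only a sketch under stated assumptions: the five-to-ten-day window comes from the conclusion, while `terms_to_scan`, `first_seen`, and the dates are hypothetical names and values chosen for the example.

```python
from datetime import date

# Scan window taken from the conclusion: five to ten days after a term
# first shows up in hot trends. Adjust if the data shifts over time.
MIN_AGE_DAYS = 5
MAX_AGE_DAYS = 10


def terms_to_scan(first_seen, today, min_age=MIN_AGE_DAYS, max_age=MAX_AGE_DAYS):
    """Return the terms whose age in days falls inside the scan window."""
    return sorted(
        term
        for term, seen in first_seen.items()
        if min_age <= (today - seen).days <= max_age
    )


# Hypothetical first-appearance dates (made up for illustration).
first_seen = {
    "term a": date(2010, 3, 1),   # 9 days old: scan
    "term b": date(2010, 3, 8),   # 2 days old: too recent
    "term c": date(2010, 2, 20),  # 18 days old: past the window
    "term d": date(2010, 3, 5),   # 5 days old: scan
}
today = date(2010, 3, 10)

queue = terms_to_scan(first_seen, today)
print(queue)
```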
-- Julien