Spam Web Page Detection – a New Approach
Web spam is the flood of mostly unsolicited advertising messages or links sent across a wide array of media, including social networking websites, instant messaging applications, online newsgroups and forums, mobile phones, and blogs. Web spam has even been found in the results pages of popular search engines like Google. While the majority of web spam is benign, certain campaigns rely on web pages disguised to look as though they contain valuable information. In reality, these spam web pages are often littered with irrelevant and meaningless content, sometimes inappropriate in nature, or, worse yet, they are used to host and spread malware.
Spam web pages filled with advertisements typically link to other websites for the sole purpose of boosting traffic to promote products, most commonly pharmaceuticals. More traffic results in more sales and thus more money in the spammers’ pockets. A number of link exploitation techniques are deployed to trick users into clicking through. The ads on these spam web pages are rarely related to the page's apparent topic. For example, a spam page disguised to sell iPads may contain ads that link to pharma-fraud sites, gambling portals, or other tech gadget hubs.
No single defining profile encompasses all types of spam websites, which makes spam web page detection extremely difficult.
In response to the growing volume of spammy web pages identified by zveloLABS, much of which is attributed to users increasingly mixing personal and work life while surfing the Web, zvelo has rolled out an enhanced spam web page detection feature.
The feature combines extensive link and content analysis that takes into account a number of factors, such as the number of external links and page reputation. Content category variance is also measured: the more unrelated content a page contains, the more likely it is to be spam. Pages are also compared against a list of commonly spam-targeted URL categories such as finance, porn, and blogs. A point system is employed to determine whether a site is spam based on these and other criteria. The results are dynamically stored within the zveloDB® URL database so that any subsequent hits to these spam web pages can be accounted for and blocked by zvelo’s technology partners.
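To make the idea of a point system concrete, here is a minimal Python sketch of how such signals could be combined. It is not zvelo's implementation: the `PageFeatures` fields, the point values, the cut-offs for link count and reputation, and the decision threshold are all hypothetical, chosen only to illustrate the factors named above (external links, page reputation, content category variance, and commonly spam-targeted URL categories).

```python
from dataclasses import dataclass
from typing import List

# Categories named in the article as commonly spam-targeted.
SPAM_TARGETED_CATEGORIES = {"finance", "porn", "blogs"}


@dataclass
class PageFeatures:
    """Hypothetical feature record for one page (not a zvelo data structure)."""
    external_links: int            # number of links pointing off-site
    reputation: float              # assumed scale: 0.0 (bad) to 1.0 (good)
    content_categories: List[str]  # categories detected in the page content
    url_category: str              # category assigned to the URL itself


def spam_score(page: PageFeatures) -> int:
    """Add up points for each spam signal; all weights are illustrative."""
    score = 0
    if page.external_links > 50:   # assumed cut-off for "many" external links
        score += 2
    if page.reputation < 0.3:      # assumed cut-off for "poor" reputation
        score += 3
    # Category variance: one point per distinct category beyond the first.
    distinct = len(set(page.content_categories))
    score += max(0, distinct - 1)
    if page.url_category in SPAM_TARGETED_CATEGORIES:
        score += 1
    return score


def is_spam(page: PageFeatures, threshold: int = 5) -> bool:
    """Flag the page when its score reaches the (assumed) threshold."""
    return spam_score(page) >= threshold


# A page that looks like the iPad example: many links, unrelated ad topics.
suspect = PageFeatures(120, 0.1, ["tablets", "pharma", "gambling", "gadgets"], "blogs")
# A page with few links, good reputation, and a single topic.
clean = PageFeatures(5, 0.9, ["news"], "news")

print(spam_score(suspect), is_spam(suspect))
print(spam_score(clean), is_spam(clean))
```

In a sketch like this, a single weak signal does not condemn a page. Several signals have to accumulate before the threshold is reached, which suits a problem where no single profile covers every kind of spam page.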
Here are three examples of spam web pages detected and categorized by zvelo:
Spam web page detection example 1: A forum page
Spam web page detection example 2: A page with garbled content
Spam web page detection example 3: A page with hidden hyperlinks