ISPs, telcos, device manufacturers, and security vendors go to great lengths to protect their customers from malicious and objectionable online content (adult content, pornography, hate speech, terrorism, cryptocurrency mining, etc.). The industry’s best web filtering (and DNS filtering) and parental controls are powered by a global network of more than 1 billion end users, providing unmatched coverage and accuracy of active web traffic and websites. zvelo provides 99.9% coverage and over 99% accuracy for the ActiveWeb. That’s a best-in-class URL classification database for OEMs and device manufacturers.
Given the dynamic nature of the majority of today’s websites, categorization at the full-path URL level, rather than at the base domain, is superior and now required. The parts of a website address include the top-level domain (.com, .org, etc.), the base domain (example.com), the sub-domain (subdomain.example.com), and the sub-path (example.com/page). When categorizing content, it is important to recognize exactly which part of a website is being classified, because content can differ dramatically across full-path URLs.
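As a rough illustration of the parts named above, the following Python sketch splits a URL into its top-level domain, base domain, sub-domain, and path. The function name url_parts is hypothetical, and the base domain is assumed to be the last two host labels, which does not hold for suffixes such as .co.uk; a production system would more likely consult the Public Suffix List. It is not a description of how zvelo performs categorization.

```python
from urllib.parse import urlsplit


def url_parts(url):
    """Break a URL into top-level domain, base domain, sub-domain, and path."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    labels = host.split(".")
    # Assumption: the base domain is the last two labels of the host name.
    # This is wrong for multi-label suffixes such as .co.uk.
    base_domain = ".".join(labels[-2:])
    subdomain = ".".join(labels[:-2])
    return {
        "top_level_domain": "." + labels[-1],
        "base_domain": base_domain,
        "subdomain": subdomain,
        "path": parts.path,
        "full_path_url": host + parts.path,
    }


if __name__ == "__main__":
    for name, value in url_parts("https://subdomain.example.com/page").items():
        print(f"{name}: {value}")
```

Two URLs that share a base domain can still differ in the full_path_url field, which is the level at which the text above argues categorization should happen.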
What is a URL parameter? Quite simply, it is a string of characters, also called a query string, that is appended to a URL and carries data. This data is passed to a web application, which uses it to find the appropriate content and return it to the user’s web browser, which then renders the page. The query string can also serve other purposes, such as identifying a user’s session or looking up information about your online bank account after you have logged in. URLs with parameters are used by many types of websites; however, online shopping, auction, and banking sites are probably the most prevalent.
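To make the idea of a query string concrete, here is a minimal Python sketch that extracts the parameters from a URL using the standard library. The example URL, its parameter names (category, size, session), and the helper names are made up for illustration.

```python
from urllib.parse import urlsplit, parse_qs


def query_parameters(url):
    """Return the URL's query string as a dict of parameter name -> list of values."""
    return parse_qs(urlsplit(url).query)


def without_query(url):
    """Return the same URL with its query string removed."""
    return urlsplit(url)._replace(query="").geturl()


if __name__ == "__main__":
    # Hypothetical shopping-site URL with three parameters.
    url = "https://shop.example.com/search?category=shoes&size=9&session=abc123"
    print(query_parameters(url))
    print(without_query(url))
```

Values come back as lists because a parameter name may legally appear more than once in a query string.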
Manually classifying the content on a single web page takes only a few seconds. Analyzing the keywords – words or phrases – used and the number of instances of each – keyword density – is one way to go about it. When billions of web pages need to be classified at a time, however, the task becomes far too large for any human to handle. In this scenario, only an automated content classification engine can succeed.
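Keyword density as described above can be sketched in a few lines of Python. This is a simplified illustration that counts single words only, not phrases, and it should not be read as the method used by any particular classification engine; the function name and the sample text are assumptions.

```python
import re
from collections import Counter


def keyword_density(text, top_n=3):
    """Return (word, count, share of all words) for the top_n most frequent words."""
    # Lowercase the text and keep runs of letters and apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    if total == 0:
        return []
    counts = Counter(words)
    return [(word, count, count / total) for word, count in counts.most_common(top_n)]


if __name__ == "__main__":
    sample = "Casino bonus offers: play casino poker and casino slots"
    for word, count, density in keyword_density(sample):
        print(f"{word}: {count} occurrences, density {density:.2f}")
```

A real engine would at least filter out common stop words and weigh phrases as well as single words before drawing any conclusion about a page’s category.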
zvelo has received many requests from its technology partners who are in the web filtering and parental control sectors to institute and support a new category that can be used to identify websites that promote self-harm behaviors. As a result of such demand, a new “Self Harm” category has been added to the zveloDB® URL database.
Anatomy of a Dynamic Website
Of the hundreds of billions of URL queries zvelo received for website categorization in 2013, an estimated 27% were classified as dynamic (see image 1). Dynamic categories in this data sample included Social Networking, News, Search Engines, Personal Pages & Blogs, Community Forums, Technology (General), and Chat.…
In mid-2013, British Prime Minister David Cameron began a push to block pornographic material on the Web in UK households. Under the new legislation, porn would be filtered by default and citizens would have to opt in to view such adult content. Enforcing such an ambitious initiative comes with many content categorization and technical challenges, not just in the UK, but within any internet service provider infrastructure.
Wi-Fi hotspots commonly found in many American coffee shops, restaurants, and other popular after-school hangouts are providing kids with what they demand – free Internet access. This may help keep them connected with family and friends, in addition to sparing parents from costly data plan overages, but an independent AdaptiveMobile study showed that the complimentary Web access comes with a catch. The adult, dating, extremist, drug, gambling, and other similarly objectionable content typically blocked at home by some type of parental controls solution is easily accessible by kids at these Wi-Fi locations.
I got my hands on a copy of a Northwestern University research paper titled “Evaluating Android Anti-malware against Transformation Attacks.” After digging into it, my zveloLABS colleagues and I decided to conduct an experiment of our own based on the information provided in the research paper.
In early 2013, zvelo deployed a new approach to detecting spam web pages. These web pages have little value and consist mostly of meaningless content and links, sometimes objectionable in nature, or, worse yet, they can be used to host and spread malware. Spam web pages continue to sprout online, and zveloLABS has mapped out some interesting trends in the types of web content spammers are targeting.
Web spam is the bombardment of mostly unsolicited advertising messages or links sent across a wide array of media, including social networking websites, instant messaging applications, online newsgroups or forums, mobile phones, and blogs. Web spam has even been found stuffed within the results pages of popular search engines like Google. While the majority of web spam is benign, certain campaigns are tied to particular types of web pages disguised to look as though they contain valuable information. In actuality, these spam web pages are often littered with irrelevant and meaningless content, sometimes inappropriate in nature, or, worse yet, they can be used to host and spread malware.