Given the dynamic nature of most of today’s websites, categorizing at the full-path URL level rather than at the base domain is more accurate and is now required. Parts of a website address include the top-level domain (.com, .org, etc.), the base domain (example.com), the sub-domain (subdomain.example.com), and the sub-path (example.com/page). When categorizing content, it is important to recognize exactly what is being classified within a website, because content can differ dramatically across full-path URLs.
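To make the parts named above concrete, here is a minimal sketch that splits a URL into them using Python's standard library. The function name `url_parts` and the rule "base domain = last two host labels" are illustrative assumptions, not zvelo's method; that rule fails for suffixes such as .co.uk, where a public-suffix list would be needed.

```python
from urllib.parse import urlsplit


def url_parts(url):
    """Split a URL into the parts discussed above (hypothetical helper)."""
    # urlsplit only recognizes the host when a scheme or "//" prefix is present.
    parts = urlsplit(url if "://" in url else "//" + url)
    host = parts.hostname or ""
    labels = host.split(".")
    has_domain = len(labels) >= 2
    return {
        # Last label of the host, e.g. ".com"
        "top_level_domain": "." + labels[-1] if has_domain else "",
        # ASSUMPTION: base domain is the last two labels (wrong for .co.uk etc.)
        "base_domain": ".".join(labels[-2:]) if has_domain else host,
        # Full host, reported only when it has more labels than the base domain
        "sub_domain": host if len(labels) > 2 else "",
        # Everything after the host, e.g. "/page"
        "sub_path": parts.path,
    }


if __name__ == "__main__":
    print(url_parts("http://subdomain.example.com/page"))
```

Two URLs that share a base domain can differ in `sub_domain` and `sub_path`, which is why a classification keyed only on the base domain can miss what a given page actually contains.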
The Internet Watch Foundation works to remove online videos and images of child sexual abuse, and its 2013 Annual & Charity Report highlighted significant milestones achieved and a big year of change.
What is a URL parameter? Quite simply, it is a string of characters containing data, known as a query string, that is appended to a URL. This data is passed to predefined web applications, which find the appropriate content and return it to the user’s web browser, which then generates the entire web page. The query string can also be used for other purposes, such as identifying a user’s session or looking up information about your online bank account after you have logged in. URLs with parameters are used by many types of websites; however, online shopping, auction, and banking sites are probably the most prevalent.
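As a small illustration of what a query string carries, the sketch below extracts the parameters from a URL with Python's standard `urllib.parse` module. The example URL and its parameter names (`category`, `page`) are made up for illustration.

```python
from urllib.parse import urlsplit, parse_qs


def query_parameters(url):
    """Return the URL's query-string parameters as a dict of value lists."""
    # The query is everything between "?" and an optional "#fragment".
    query = urlsplit(url).query
    # parse_qs maps each parameter name to a list, since names may repeat.
    return parse_qs(query)


if __name__ == "__main__":
    # Hypothetical shopping-site URL
    print(query_parameters("https://shop.example.com/search?category=books&page=2"))
```

The same path (`/search`) can return entirely different content depending on these values, which is what makes parameterized URLs harder to categorize than static pages.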
Manually classifying the content on a single web page takes but a few seconds to accomplish. Analyzing the keywords – words or phrases – used and the number of instances of each – keyword density – is one way to go about it. When needing to classify the content on billions of web pages at a time, however, the task becomes overwhelmingly daunting for any human eye to handle. In this scenario, only an automated content classification engine can succeed.
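The keyword-density idea mentioned above can be sketched in a few lines. This is a simplified illustration, not zvelo's classification engine: it assumes single-word keywords only (no phrases), a crude regular-expression tokenizer, and density defined as keyword occurrences divided by total word count.

```python
import re
from collections import Counter


def keyword_density(text, keywords):
    """Return each keyword's share of all words in the text (0.0 to 1.0)."""
    # ASSUMPTION: words are runs of letters, digits, or apostrophes, lowercased.
    words = re.findall(r"[a-z0-9']+", text.lower())
    total = len(words)
    counts = Counter(words)
    # Guard against empty text to avoid division by zero.
    return {kw: (counts[kw.lower()] / total if total else 0.0) for kw in keywords}


if __name__ == "__main__":
    sample = "Buy shoes online. Shoes on sale."
    print(keyword_density(sample, ["shoes", "sale", "hat"]))
```

A score like this is cheap to compute for one page; the difficulty described above comes from applying it, and far richer signals, across billions of pages.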
Prior to this blog post, zveloLABS published a phishing URL alert about fake Apple account verification websites. Now, zvelo’s team of engineers and researchers has unearthed a new phishing attack campaign using fraudulent Facebook log-in sites.
Instances of large-scale compromises of both private industry and public institutions in 2013 prompted a flurry of activity among security researchers to identify emerging and established threats. Commonly identified as Advanced Persistent Threats (APTs), this phenomenon is expected to continue well into the foreseeable future. Fundamental to the spread of these threats is one of their foremost methods of propagation: a water hole attack.
zvelo has received many requests from its technology partners who are in the web filtering and parental control sectors to institute and support a new category that can be used to identify websites that promote self-harm behaviors. As a result of such demand, a new “Self Harm” category has been added to the zveloDB® URL database.
zveloLABS discovered a phishing website masquerading as an account verification page for Apple IDs, as depicted in the following screenshot and explained in this blog post.
Anatomy of a Dynamic Website
Of the hundreds of billions of URL queries zvelo received for website categorization in 2013, an estimated 27% were classified as dynamic (see image 1). Dynamic categories in this data sample included Social Networking, News, Search Engines, Personal Pages & Blogs, Community Forums, Technology (General), and Chat.…
How does zvelo provide the most accurate content categorization service and the best URL database available? The approach is two-fold. While a substantial chunk of the workload is handled by zvelo’s line-up of machine learning and artificial intelligence-based categorization processes and systems, the quality assurance and other daily efforts of its human Web Analysts can never be discounted.