Given the dynamic nature of the majority of today’s websites, categorizing at the full-path URL level rather than only at the base domain is superior and now required. The parts of a website address include the top-level domain (.com, .org, etc.), the base domain (example.com), the sub-domain (subdomain.example.com), and the sub-path (example.com/page). When categorizing content, it is highly important to recognize exactly what is being classified within a website, because content can differ dramatically across full-path URLs.
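The parts listed above can be pulled out of a URL with Python's standard library. This is only a minimal sketch, not zvelo's method: the function name `url_parts` and its dictionary keys are made up for illustration, and it naively assumes a single-label top-level domain such as .com. Multi-label suffixes like .co.uk would need a public suffix list.

```python
from urllib.parse import urlsplit


def url_parts(url):
    """Split a URL into the parts named in the text.

    Assumption: the top-level domain is the last dot-separated label,
    which is wrong for suffixes such as .co.uk.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    labels = host.split(".")
    return {
        "top_level_domain": "." + labels[-1] if len(labels) > 1 else "",
        "base_domain": ".".join(labels[-2:]),
        "sub_domain": host if len(labels) > 2 else "",
        "sub_path": parts.path,
        # The unit a full-path categorizer would classify.
        "full_path_url": host + parts.path,
    }


print(url_parts("https://subdomain.example.com/page"))
```

Two pages that share the same `base_domain` can still differ in `full_path_url`, which is why the level chosen for classification matters.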
zvelo is the provider of the most advanced URL Database for Web Categorization and Malicious Detection, designed for OEMs, device manufacturers, and Network Security vendors. zvelo’s content categorization engines power web filtering and parental controls, whitelists and blacklists for anti-virus companies, home network protection devices, and much more. By categorizing content into topic-based, objectionable, and malicious category groupings, zveloDB provides the most advanced malicious detection for advanced threat intelligence and cybersecurity.
What is a URL parameter? Quite simply, it is a string of characters, also called a query string, that is appended to a URL and carries data. This data is passed to predefined web applications, which find the appropriate content and return it to the user’s web browser, which then generates the entire web page. The query string can also serve other purposes, such as identifying a user’s session or looking up information about your online bank account after you have logged in. URLs with parameters are used by many types of websites; however, online shopping, auction, and banking sites are probably the most prevalent.
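To make the idea concrete, here is a small sketch that reads the parameters out of a query string with Python's standard library. The URL, the parameter names (`category`, `page`, `session`), and the helper `query_parameters` are invented for illustration.

```python
from urllib.parse import urlsplit, parse_qs


def query_parameters(url):
    """Return the query-string parameters of a URL as a dict.

    parse_qs maps each name to a list of values; this sketch keeps
    only the first value for each name.
    """
    query = urlsplit(url).query
    return {name: values[0] for name, values in parse_qs(query).items()}


# Hypothetical shopping URL with three parameters.
url = "https://shop.example.com/search?category=books&page=2&session=abc123"
print(query_parameters(url))
```

Everything after the `?` is the query string. The web application on the server decides what each parameter means, for example which product category to show or which session the request belongs to.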
Manually classifying the content on a single web page takes but a few seconds to accomplish. Analyzing the keywords – words or phrases – used and the number of instances of each – keyword density – is one way to go about it. When needing to classify the content on billions of web pages at a time, however, the task becomes overwhelmingly daunting for any human eye to handle. In this scenario, only an automated content classification engine can succeed.
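Keyword density, as described above, is the number of times a word appears divided by the total number of words on the page. The following is a minimal sketch of that calculation only. It is not zvelo's classification engine, and the function name `keyword_density` and the sample text are assumptions made for illustration.

```python
import re
from collections import Counter


def keyword_density(text, top_n=3):
    """Return (word, count, density) for the top_n most frequent words.

    Density is count divided by the total number of words. Common words
    such as "the" are not filtered out in this sketch.
    """
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    counts = Counter(words)
    return [(word, count, count / total)
            for word, count in counts.most_common(top_n)]


page_text = "Soccer scores and soccer news: the latest soccer scores."
for word, count, density in keyword_density(page_text, top_n=2):
    print(f"{word}: {count} occurrences, density {density:.2f}")
```

A real engine would also need to handle phrases, common filler words, and pages in many languages, which is part of why the task does not scale to manual review.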
Anatomy of a Dynamic Website
Of the hundreds of billions of URL queries zvelo has received for website categorization in 2013, an estimated 27% have been classified as being dynamic (see image 1). Dynamic categories in this data sample included Social Networking, News, Search Engines, Personal Pages & Blogs, Community Forums, Technology (General), and Chat.…
How does zvelo provide the most accurate content categorization service and the best URL database available? The approach is two-fold: while a substantial chunk of the workload is handled by zvelo’s line-up of machine learning and artificial intelligence-based categorization processes and systems, the quality assurance and other daily efforts put forth by its human Web Analysts can never be discounted.
Static HTML websites are becoming increasingly rare, and nowadays sites pack quite the punch. We’ve grown accustomed to photo and video slideshows, widgets, feeds, social network integrations, and other dynamic elements. Websites come overloaded with media and are more interactive, and their content can vary dramatically from page to page and differ even more between end-users or browsing sessions. Much of the content is pulled in dynamically from external sources, and most of us fuel the Internet’s growth by creating and uploading content of our own every day, at extremely high rates. Making sense of it all can be quite the challenge for technology vendors “needing to know.” The following are insights into zvelo’s content categorization approach.
Ad blocking has gained wide consumer acceptance over the past couple of years, and a PageFair report suggests it could be costing web-based businesses hundreds of thousands of dollars in lost advertising revenue. In some instances, ad blocking has hurt websites so badly that they are no longer online. With the use of ad blocking software on the rise, the ad-tech market has a significant need to make the most of the ad placements that do make the cut. In other words, it’s more important than ever for ad units to be in context with the content on web pages, no matter how deep within a website the placements land.
In early 2013, zvelo deployed a new approach to detect spam web pages. These web pages have little value and consist mostly of meaningless content and links, sometimes objectionable in nature, or worse yet they can be used to host and spread malware. Spam web pages continue to sprout online and following are some interesting trends about the types of web content spammers are targeting, which zveloLABS has mapped out.
Web spam is the bombardment of mostly unsolicited advertising messages or links sent across a wide array of media, including social networking websites, instant messaging applications, online newsgroups or forums, mobile phones, and blogs. Web spam has even been found stuffed within the results pages of popular search engines like Google. While the majority of web spam is benign, certain campaigns are tied to particular types of web pages disguised to contain valuable information. In actuality, these spam web pages are often littered with irrelevant and meaningless content, sometimes inappropriate in nature, or worse yet they can be used to host and spread malware.
Imagine for a second you were presented with a superhuman baby having the ability to learn and retain vast amounts of information. We’ll make it a girl super baby as a tribute to the fem-heroes of comic books past. Now, what if the opportunity rests on your shoulders to raise her and teach her the sum of all human knowledge that ever existed? Like every good mentor, you watch her closely, making sure her misunderstandings and confusions are always kept in check, corrected, and resolved. You take pride in how accurate she becomes and are quick to reply “Bring it!” to anyone who wants to test her knowledge. Here at zvelo this what-if situation is a reality, and I’d like to share with you the experience of training and working with an intelligent being day after day.