Content Categorization of a Dynamic Website
Of the hundreds of billions of URL queries zvelo received for categorization in 2013, an estimated 27% were classified as dynamic (see image 1). Dynamic categories in this data sample included Social Networking, News, Search Engines, Personal Pages & Blogs, Community Forums, Technology (General), and Chat.
Other categories can certainly be deemed dynamic, and are dictated by zvelo technology partners for whatever application is being developed or enhanced. A parental controls offering may want to account for “Sex Education & Pregnancy,” for instance, to ensure content is age-appropriate. An advertising network, on the other hand, may require “Online Ads” to improve ad targeting or “Botnets” detection to help combat ad fraud.
The terms “static” and “dynamic” can have different meanings depending on the context. A blog post, for example, is most often served by a dynamically generated web page and can be driven by scripting languages and content management systems (CMS). As such, it’s not a static HTML page. That blog post can then be featured in a slideshow on a website’s home page, in which case it is also dynamically displayed (with animation or interaction). So, while the blog post may look static, it is dynamically generated by a server.
Because of the wide variety of content, relying on the category values of the Top-Level Domain (TLD) is not enough, especially since content can vary drastically between sub-pages. Categorization requires a unique approach. To visualize zvelo’s content categorization methodology, let’s start by analyzing the dynamic elements of a popular news home page (see image 2).
The infographic highlights structured areas of dynamically-generated content. With each refresh, the headlines, images, ads and other media can change. Text areas or widgets can stem from a CMS, images from a third-party photo gallery like Picasa, and banner and video ads from an ad network or exchange. The difficulty of categorizing all of the content on a single page of this type becomes quite evident.
When queried, the zveloDB® URL database considers the entire page (URL) and can return multiple category values. For some of zvelo’s customers, this is sufficient. For others, however, additional granularity is required. This is where the zveloCAT® content + context categorization engine can help supplement the process.
zveloCAT can categorize individual or multiple streams of short text. Many of zvelo’s technology partners conduct queries in this fashion. A “stream” can be either the text extracted from a page’s HTML constructs or the raw HTML document itself. For high-volume applications, like those tied to placing online ads or analytics, a tiered solution could be extremely beneficial.
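To make the idea of a “stream” concrete, the following is a minimal sketch of pulling short-text streams out of a raw HTML document using only Python’s standard library. It is an illustration, not zvelo’s extraction logic: the class name `StreamExtractor`, the helper `extract_streams`, and the choice to treat headings and image alternate text as streams are all assumptions made for this example.

```python
from html.parser import HTMLParser


class StreamExtractor(HTMLParser):
    """Hypothetical helper: collects heading text and image alt text as streams."""

    HEADING_TAGS = {"h1", "h2", "h3"}

    def __init__(self):
        super().__init__()
        self.streams = []         # extracted short-text streams, in document order
        self._in_heading = False  # True while inside an h1/h2/h3 element
        self._buffer = []         # text fragments of the current heading

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADING_TAGS:
            self._in_heading = True
            self._buffer = []
        elif tag == "img":
            # Alternate text from an image becomes its own stream.
            alt = dict(attrs).get("alt")
            if alt and alt.strip():
                self.streams.append(alt.strip())

    def handle_data(self, data):
        if self._in_heading:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag in self.HEADING_TAGS and self._in_heading:
            # Collapse runs of whitespace so each headline is one clean line.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.streams.append(text)
            self._in_heading = False


def extract_streams(html):
    """Return the list of short-text streams found in an HTML string."""
    parser = StreamExtractor()
    parser.feed(html)
    parser.close()
    return parser.streams
```

Each returned string could then be submitted for categorization on its own, instead of submitting the page as a single unit.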
In a tiered integration, an application first queries the zveloDB with URLs to obtain the initial set of categories. All dynamic URLs identified can then be stripped down into multiple streams and pushed through zveloCAT for classification. The news headlines, or blocks of short text, may be one stream or a series of streams. Other streams can include individual tweets from a Twitter feed integration, alternate text from images, and metadata from embedded videos. zveloCAT can also gauge sentiment towards named entities: persons, places or things.
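The tiered flow can be sketched roughly as follows. This is a toy model under stated assumptions, not the zveloDB or zveloCAT API: `lookup_url`, `classify_stream`, and `categorize` are hypothetical names, the URL database is a plain dictionary, and the second tier is a simple keyword match standing in for a real categorization engine. The set of dynamic categories is the one listed for the 2013 data sample.

```python
# Categories treated as dynamic, taken from the 2013 data sample in the article.
DYNAMIC_CATEGORIES = {
    "Social Networking", "News", "Search Engines", "Personal Pages & Blogs",
    "Community Forums", "Technology (General)", "Chat",
}


def lookup_url(url, url_db):
    """Tier 1 (hypothetical): whole-page categories from a URL database.

    Here url_db is just a dict mapping a URL to a list of category names.
    """
    return url_db.get(url, [])


def classify_stream(text, keyword_map):
    """Tier 2 (hypothetical stand-in): categorize one short-text stream.

    A keyword match is used only to keep the sketch runnable; a real
    engine would be far more sophisticated.
    """
    words = set(text.lower().split())
    return sorted({cat for kw, cat in keyword_map.items() if kw in words})


def categorize(url, streams, url_db, keyword_map):
    """Query the URL tier first; classify streams only for dynamic pages."""
    page_categories = lookup_url(url, url_db)
    result = {"url": url, "page_categories": page_categories, "streams": []}
    if DYNAMIC_CATEGORIES.intersection(page_categories):
        for text in streams:
            result["streams"].append(
                {"text": text, "categories": classify_stream(text, keyword_map)}
            )
    return result
```

The point of the design is that the cheaper URL lookup handles every request, while the per-stream classification runs only for pages whose categories mark them as dynamic, which matters for high-volume applications.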
As a result, and in contrast to evaluating the content in aggregate, zvelo’s technology partners can choose to process the dynamic elements of the same page separately, providing a much finer level of granularity, accuracy and flexibility.
As stated, truly static HTML web pages are almost obsolete. To identify web usage behaviors and trends, and to account effectively and accurately for the dynamic content that may be served to a single user regardless of device, browser or app, technology vendors must leverage a URL database and content categorization engine capable of making sense of that variety.