Full Path URL Categorization and Content Distribution Networks (CDNs)

By Eric Watkins, Senior Malicious Detection Researcher at zvelo

Earlier this month, I came across a use case that capitalizes on the value of full path content categorization. Before discussing it in detail, let’s define a content distribution network (CDN) and highlight a few key strengths of full path URL categorization.

A content distribution network (CDN) is designed to optimize web usage by distributing the content from a single web server to multiple data centers spread across the Internet. This design places the content in a data center as close as possible to the user. As a result, the content loads more quickly and is more responsive to interaction, which enhances the user experience. CDNs can also protect content against Distributed Denial of Service (DDoS) attacks. When content is hosted in a single location, a DDoS attack can block users from accessing it for the duration of the attack.

Full path URL categorization goes beyond categorizing www.cnn.com at the domain level as “News” and categorizes the entire URL. For example, www.cnn.com/sports/ is “News, Sports.” Full path URL categorization provides finer granularity and allows different categorizations for different paths within the same site. If a given path contains malware but the rest of the site is clean, zvelo’s full path URL categorization method can accurately flag the malware on that specific page within the domain.
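The lookup described above can be sketched as a longest-prefix match on the URL path. This is only a minimal illustration, not zvelo’s implementation: the CATEGORIES table, its entries, and the categorize function are hypothetical names invented for this example.

```python
from urllib.parse import urlsplit

# Hypothetical category table keyed by (host, path prefix).
# The entries are illustrative and do not reflect zvelo's data or API.
CATEGORIES = {
    ("www.cnn.com", "/"): ["News"],
    ("www.cnn.com", "/sports/"): ["News", "Sports"],
}


def categorize(url):
    """Return the categories of the longest matching path prefix, or None."""
    # Accept URLs written without a scheme, e.g. "www.cnn.com/sports/".
    parts = urlsplit(url if "://" in url else "//" + url)
    host = (parts.hostname or "").lower()
    path = parts.path or "/"

    best_prefix, best_categories = None, None
    for (entry_host, prefix), categories in CATEGORIES.items():
        if entry_host != host or not path.startswith(prefix):
            continue
        # Prefer the most specific (longest) prefix that matches.
        if best_prefix is None or len(prefix) > len(best_prefix):
            best_prefix, best_categories = prefix, categories
    return best_categories
```

With this table, a URL under /sports/ picks up the more specific entry, while any other path on the same host falls back to the domain-level category.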

When you benefit from a CDN but neglect to use full path URL categorization, issues may arise because the CDN provider mingles diverse content from its different customers.

For example, when www.customerA.com uploads legitimate, non-malicious content to the CDN and www.customerB.com then uploads malware or objectionable content, both customers’ content may end up hosted on the same servers inside the CDN infrastructure. Because the content shares a server, a domain-only content categorization engine would find malware inside the CDN and designate the entire CDN host as malicious. Unfortunately, customer A’s service would then be affected as well, because consumers of the categorization feed filter content based on that designation.

Without full path URL categorization, this scenario can lead to serious issues for any customer residing on the CDN. With full path content categorization, the CDN URL hosting the malicious content is designated correctly, while the rest of the domain continues to be labeled appropriately.
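The difference between the two approaches can be sketched in a few lines. This is a simplified illustration under stated assumptions: the host cdn.example.net, the paths, the FINDINGS list, and both verdict functions are hypothetical and stand in for whatever a real categorization engine records.

```python
from urllib.parse import urlsplit

# Hypothetical scan results for one shared CDN host: (host, path, label).
FINDINGS = [
    ("cdn.example.net", "/customerA/app.js", "clean"),
    ("cdn.example.net", "/customerB/payload.exe", "malicious"),
]


def domain_only_verdict(url):
    """Label the whole host malicious if any URL on it is malicious."""
    host = urlsplit(url).hostname
    labels = [label for h, _path, label in FINDINGS if h == host]
    if not labels:
        return "unknown"
    return "malicious" if "malicious" in labels else "clean"


def full_path_verdict(url):
    """Label only the exact host and path that was scanned."""
    parts = urlsplit(url)
    for host, path, label in FINDINGS:
        if host == parts.hostname and path == parts.path:
            return label
    return "unknown"
```

In this sketch, customer A’s clean file is reported as malicious by the domain-only function because it shares a host with customer B’s file, while the full path function reports each URL on its own findings.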

Clearly, there are many important benefits to using a CDN to host your content worldwide and protect yourself from DDoS attacks. Just remember to choose a full path URL categorization solution over a domain-only content categorization engine.

Learn more about zvelo’s content dataset with 99.9% coverage of the active web.