Base domain vs full-path URLs: What’s the Difference?
For the average web surfer, the URL bar provides an easy search bar and “fuzzy” match tool—revealing the treasures of the internet at the stroke of ‘enter’. For those of us in IT roles, software development, or network administration—we know it gets much more complicated. We quickly slip down the rabbit hole and into OSI, DNS, TLS, HTTPS, subdomains… ok. Focus. For our partners in network security, web filtering, ad tech, and more—there is a critical distinction that we make between two basic types of URLs: base domain URLs vs full-path URLs.
A Quick Refresher on URLs
As a brief refresher, a URL (Uniform Resource Locator) is a specific subtype of URI (Uniform Resource Identifier)—along with URN (Uniform Resource Name). If you’re confused—you’re not alone. It’s complicated, and that’s probably why you’re here. So here’s a quick breakdown of the elements within a URL:
Elements of a URL Include:
- Protocol (Scheme)
- Subdomain
- Domain Name
- Top Level Domain (TLD)
- Path/File (w/ extension)
- Parameters
For a more detailed explanation, check out the notes and references links at the end of this blog.
Base Domain URLs
The base domain is the umbrella under which an entire site resides. A base domain consists of only two of the URL elements:
- Domain Name
- Top Level Domain (TLD)
When a user enters a base domain in the URL bar (i.e. google.com, amazon.com, etc.), DNS directs them to the server IP location for the homepage of the respective website (a specific address such as /index.htm or /home.html).
For categorization purposes, the base domain is assigned an appropriate category value that is representative of all pages and files contained within the website. The base domain and it’s respective category value can then be committed to a URL database, which can be cached on a device—in a data center, on a computer, or accessible via the cloud.
In the real world, by implementing a URL classification database, all internet traffic can be filtered, protected, or analyzed in real-time by referencing the cached URL database. But what level of category granularity do you need?
Full-Path URLs
In contrast, a full-path URL refers to an EXACT location (i.e. page, article, file, etc.) and allows for a highly specific analysis and categorization of web content at the specified address. A full-path URL not only includes the domain and TLD, it must also include the protocol (aka scheme), subdomain (i.e. blog, support, etc.), path/destination, and potentially a file extension as well as parameters.
That is, it can include all of the following:
- Protocol
- Subdomain
- Domain Name
- Top Level Domain (TLD)
- Path/File
- Parameters (optimal)
Only with all of these elements can a comprehensive analysis be made to identify the most relevant category for the page, post, etc. With nearly two (2) billion websites on the internet—each capable of including millions of indexed pages—that’s would require a fairly hefty data storage requirement (to put it lightly).
For URL categorization purposes, each individual page visited would need to be analyzed and categorized with a high degree of accuracy. Sound fairly complicated? It is.
In the real world, web content changes regularly. Websites are put up, are retired, content changes, and the world keeps on spinning—posing a challenge for high accuracy at the page/post/file level. That means that each page must be analyzed and categorized regularly (if not AS it is visited).
zvelo accomplishes real-time, full-path URL classifications by leveraging zvelo’s advanced AI platform for highly accurate web content categorizations at the page, post, and/or article level. We help our partners identify the important sites, behaviors, and categories for their applications—and help outline a workflow and deployment infrastructure that suits their needs.
Want more information? Contact us.
Let’s Break It Down
Now that we’ve explored the distinctions between a base domain URL vs full-path URL, let’s delve into what that actually means.
Let’s take a base domain, for example, cnn.com. CNN’s website could be categorized as ‘International News’ and ‘Streaming & Downloadable Video’ (which it is by our systems). But that only provides a high-level classification. CNN has millions of pages and articles about everything ranging from Tennis and Sports, to Politics, to Technology.
One category (the base domain) compared to millions (the individual pages and articles)? That’s an important distinction—especially if you’re looking to classify web content at a highly granular level—whether for cyber threat intelligence for malicious detection, phishing detection, or web filtering to protect children from objectionable content, or an ad publisher seeking brand safety solutions to display advertisements for deodorant only on fitness, sports, and other relevant pages.
It becomes even more difficult for the behemoth social networks, platforms, and search engines like facebook, reddit, or google. To achieve a higher level of granularity and accuracy for the specific content on each page/file—you need to look at the full-path.
Other Considerations
For many applications, base domain provides an effective high-level and cacheable (high-speed) solution for basic web filtering (whitelisting/blacklisting) capabilities. For others, particularly those in cybersecurity, network devices (routers, gateways, etc.), and online advertising—contextual relevance is critical to providing protection and understanding user behavior and intent.
Adding to the level of complexity, the internet is constantly changing—both the content on webpages and technologies connecting users to them. Even existing content is subject to change and updates.
What’s more—malicious and objectionable content (the types of content that cybersecurity and advertising really want to have some control over) are the content types that are most likely to change with new domains and pages popping up for short periods of time to serve a specific purpose—before being decommissioned to cover the tracks of bad actors or because it no longer serves a purpose.
Stay tuned for more on full-path URLs and real-time categorization. If you’re interested in more information about URL categorization for any the following applications, please see the links below. If you’re interested in speaking to representative or scheduling an evaluation, visit our Contact Page.
Applications that benefit from control and analysis at the full-path include:
- Cyber Threat Intelligence
- Phishing Threat Detection
- Malicious Threat Detection
- Web & DNS Filtering
- Subscriber Analytics
- Brand Safety & Contextual Targeting
Want to test drive some URLs of your own with zveloLIVE?