In a previous blog, we explored the differences between base domains and full path URLs and their importance in web filtering—particularly how full path URL categorization is critical to malicious detection and phishing detection tools that identify and protect users against malicious websites and sources.
In this post, we wanted to take a step back and cover the basics—the individual structural elements of a URL (Uniform Resource Locator). We hope that you’ll find this resource helpful—whether as a refresher or as material to add to your ongoing training processes for employees. After all, understanding how a URL is structured is an important step to identifying malicious and unsafe sites.
Basics: What is a URL?
A URL, or Uniform Resource Locator, is effectively a unique web address. It represents the “location” of a specific resource on the internet. URLs are a subset of Uniform Resource Identifiers (URIs): in addition to identifying a web resource, a URL also provides the means of locating it.
All webpages are resources online and all have a URL. But not all URLs point to webpages. Every resource on the internet has a URL including: webpages, files, images, media, web applications, services, etc.
Depending on the URL, it may contain some or all of the following:
- Domain Name (always present)
- Top Level Domain (TLD) (always present)
- Protocol (always present, not always visible)
- Path/File (always present, not always visible)
- HTML Anchors
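To see how these pieces fit together, here is a minimal Python sketch that splits a URL into its parts using the standard library's `urllib.parse.urlsplit`. The URL itself is a hypothetical one we assembled for illustration (only the `tools.zvelo.com` hostname appears elsewhere in this post); the path, parameter, and anchor are assumptions.

```python
from urllib.parse import urlsplit

# Hypothetical URL, assembled only so that every element is present at once.
url = "https://tools.zvelo.com/some/page.html?key=value#section"

parts = urlsplit(url)

print(parts.scheme)    # protocol
print(parts.hostname)  # subdomain + domain name + TLD
print(parts.path)      # path/file
print(parts.query)     # parameters
print(parts.fragment)  # HTML anchor
```

Each attribute maps onto one of the elements covered in the rest of this post.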
So let’s jump in.
The Hostname (TLD, Domain Name, Subdomain)
Let’s start with the three elements that are the foundation of every URL: Domain Name, Top Level Domain (TLD), and Subdomain.
Top Level Domain
A top-level domain, or TLD, is—as its name suggests—one of the domains at the highest level of the Domain Name System hierarchy. It is the last labelled part in a fully qualified domain name.
Management and responsibility for TLDs is delegated to organizations by the Internet Corporation for Assigned Names and Numbers (ICANN) and the Internet Assigned Numbers Authority (IANA), which maintains the DNS root zone.
Examples of TLDs include:
- .com, .net, .org, .edu
- .us, .uk, .de
- And more...
A domain name is the registered identification “string” (or word/phrase) used by the Domain Name System to define a specific area of control and autonomy (aka a website location). A domain name sits one level below the TLD. A single TLD may contain hundreds of thousands or even millions of individual “second-level” domains, which is why a domain name may also be referred to as a second-level domain.
Examples of domain names are (in bold):
Subdomains are a bit more nuanced: they express a relative dependence and represent part of a higher-level domain. For instance, in zvelo.com, zvelo is technically a subdomain of the .com domain. As another example, our zveloLIVE tool is located on the ‘tools’ subdomain of the zvelo.com domain. The latter example is the more commonly referenced use of subdomain.
In this way, subdomains are commonly used to divide up domains into smaller segments for communication purposes, content type, internationalization (language translation), or for other reasons.
Some of the most common examples of subdomains include (in bold):
NOTE: A URL can omit the subdomain, so the hostname may not include one. This is referred to as a “naked domain”. For example, our website does not show a subdomain:
Together, these three elements of a URL are the most commonly understood and recognized.
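As a rough illustration of how the three hostname elements relate, the sketch below splits a hostname on its dots. The helper `split_hostname` is our own hypothetical function, and it assumes the TLD is a single label such as `com`. Multi-label suffixes such as `co.uk` would need a public suffix list, which this sketch deliberately leaves out.

```python
def split_hostname(hostname: str) -> dict:
    """Naively split a hostname into subdomain, domain name, and TLD.

    Assumption: the TLD is a single label (for example "com").
    Suffixes like "co.uk" are NOT handled correctly by this sketch.
    """
    labels = hostname.lower().rstrip(".").split(".")
    if len(labels) < 2:
        raise ValueError(f"expected at least a domain and a TLD: {hostname!r}")
    return {
        "subdomain": ".".join(labels[:-2]),  # empty string for a naked domain
        "domain": labels[-2],
        "tld": labels[-1],
    }


print(split_hostname("tools.zvelo.com"))
print(split_hostname("zvelo.com"))
```

For the naked domain `zvelo.com`, the subdomain comes back as an empty string, which matches the note above.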
Protocol, Path/File, and More…
Whether you’re familiar with them or not, the following URL elements are ubiquitous across the internet and have significant impact on your daily browsing life, privacy, and computer security.
The protocol (also referred to as the transfer protocol or scheme) in a URL determines how data is transferred between the host and a web browser (or other client). HTTP and HTTPS (secure) are the two protocols you’ll find in most URLs, though other schemes such as FTP also appear. Many other internet protocols, such as DNS, DHCP, IMAP, and SMTP, work behind the scenes and are not normally seen at the start of a URL in your browser.
You’ll see the communication protocol (also known as scheme) before any subdomain in a URL. The protocol is followed by a colon and two forward slashes separating it from the hostname.
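For readers who want to check the protocol programmatically, here is a small sketch. The helper name `uses_https` is our own invention; it simply reads the scheme that `urllib.parse.urlsplit` extracts from the part before the colon and two forward slashes.

```python
from urllib.parse import urlsplit


def uses_https(url: str) -> bool:
    """Return True when the URL's scheme is https (case-insensitive)."""
    return urlsplit(url).scheme.lower() == "https"


print(uses_https("https://zvelo.com"))
print(uses_https("http://zvelo.com"))
```

A secure scheme alone does not make a site trustworthy, so treat this as one signal among several.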
The path refers to the exact location of a page, post, file, or other asset. It is often analogous to the underlying file structure of the website. The path follows the hostname, and its segments are separated by “/” (forward slash). The path/file also includes any asset file extension, such as those for images (.jpg or .png, etc.), documents (.pdf or .docx), and more.
But not all URLs will display a path.
For instance, when you visit the homepage of many modern websites, you may not see a path or file name. This is because these sites can “rewrite” URLs (like the homepage) for simplicity and elegance, such as by omitting the typical “index.html”.
In some cases, websites use a naming structure that incorporates the date (month/day/year) or categories, or other separators to organize content or just the URL itself.
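The sketch below pulls the path out of a URL and breaks it into segments and a file extension. Both the `describe_path` helper and the `example.com` URLs are hypothetical, chosen only to show a date-based path and a homepage with no visible path.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def describe_path(url: str) -> dict:
    """Return the path, its non-empty segments, and any file extension."""
    path = urlsplit(url).path
    return {
        "path": path,
        "segments": [segment for segment in path.split("/") if segment],
        "extension": PurePosixPath(path).suffix,  # "" when there is none
    }


# Hypothetical URLs used purely for illustration.
print(describe_path("https://example.com/files/2020/05/report.pdf"))
print(describe_path("https://example.com"))
```

The second call returns an empty path, which is what a homepage URL without “index.html” looks like to a parser.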
HTML Anchors (or fragments) are used on websites to implement “bookmarks” and internal page navigation elements. These can be used to provide links to specific locations within a page. When present, an anchor is introduced by a “#” and sits at the very end of the URL, after the path and any parameters.
For example, here is an internally linked URL with an HTML anchor that would take you to the next section:
https://zvelo.com/anatomy-of-full-path-url-hostname-protocol-path-more#parameters
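As a quick sketch, Python's `urllib.parse.urldefrag` separates the anchor from the rest of the URL shown above. The variable names are our own.

```python
from urllib.parse import urldefrag

url = "https://zvelo.com/anatomy-of-full-path-url-hostname-protocol-path-more#parameters"

# urldefrag returns the URL without its fragment, plus the fragment itself.
page_url, anchor = urldefrag(url)

print(page_url)
print(anchor)
```

The anchor is handled by the browser after the page loads. It is not sent to the web server as part of the request.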
Parameters are found after the path (and before any anchor), or within the path itself, depending on the implementation. URL parameters are represented as key/value pairs: the list begins with a ‘?’ and the pairs are separated by an ampersand ‘&’. They can also be dynamically set in the path as values separated by slashes and other characters (depending on the system being used and how it’s implemented). Parameters are commonly used for tracking and analytics, as well as for encoding specific information for use within websites and applications.
For example, a URL with parameters:
Alternatively, you might find a URL with a Google campaign tracking parameter such as:
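To make the key/value structure concrete, here is a minimal sketch using `urllib.parse.parse_qs`. The URL is hypothetical and not taken from a real page. Its `utm_source` and `utm_campaign` keys follow the common Google campaign tracking convention, and the values are made up.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical URL: one ordinary parameter plus two campaign tracking parameters.
url = "https://example.com/blog?page=2&utm_source=newsletter&utm_campaign=spring"

# parse_qs maps each key to a list, because a key may appear more than once.
params = parse_qs(urlsplit(url).query)
print(params)

# Keys that start with "utm_" are the campaign tracking parameters.
tracking_keys = sorted(key for key in params if key.startswith("utm_"))
print(tracking_keys)
```

Everything after the ‘?’ is split on ‘&’ into pairs, and each pair is split on ‘=’ into a key and a value.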
These URL elements instruct your web browser how to connect to a web host, acquire resources, interact with back-end systems, and display web content for you. Knowing what to look for, particularly with transfer protocol and hostname, can help web users spot and identify scams and malicious websites before falling victim.
With web filtering in mind, most “basic” and “consumer” solutions do not support full-path URL categorization, and therefore do not account for path/file, parameters, or protocol. These offerings rely on DNS filtering to keep users and network traffic away from “undesirable” websites. This is limiting because it only allows classifications or objectionable/malicious identifiers to be assigned at the site level (the base domain, aka hostname).
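The difference can be sketched in a few lines of Python. Everything here is hypothetical: the category tables, the `example.com` entries, and both function names are invented for illustration and do not describe how any particular filtering product works.

```python
from urllib.parse import urlsplit

# Hypothetical category data, invented for this illustration only.
HOST_CATEGORIES = {"example.com": "news"}
FULL_PATH_CATEGORIES = {"example.com/downloads/invoice.exe": "malicious"}


def categorize_by_host(url: str) -> str:
    """Site-level lookup: only the hostname is considered."""
    return HOST_CATEGORIES.get(urlsplit(url).hostname, "uncategorized")


def categorize_by_full_path(url: str) -> str:
    """Full-path lookup: hostname plus path, falling back to the site level."""
    parts = urlsplit(url)
    key = f"{parts.hostname}{parts.path}"
    return FULL_PATH_CATEGORIES.get(key, categorize_by_host(url))


bad_url = "https://example.com/downloads/invoice.exe"
print(categorize_by_host(bad_url))
print(categorize_by_full_path(bad_url))
```

With a hostname-only lookup, the single bad file inherits the category of the whole site. The full-path lookup can flag it while leaving the rest of the site alone.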
What happens when a site is compromised and that otherwise “clean”, high-traffic site begins to disseminate a malicious file from a single source or page? For that, check out our blog on the differences between base domains and full-path URLs. Or check out our other solutions for Cyber Threat Intelligence and Network Security.
Below are a few articles to help you learn more about why it’s crucial for malicious detection and cyber threat intelligence solutions to have visibility and blocking capabilities on full-path URLs, so that you achieve maximum protection as well as value.
- In-House Cyber Threat Intelligence Begs a Multi-Million Dollar Question
- Whitelisted Sites Are Being Used to Deliver Malware
- Cyber Threat Intelligence and Network Security
We hope you’ve found this helpful! Now, go forth and browse safely!