Any web or DNS filtering vendor pursuing market leadership needs a clear vision of its target market, a plan for differentiating its offering in that market, the ability to focus on execution, and technology partners that share its commitment to excellence.
Not all filtering vendors aim to be market leaders. Some organizations trail the market, offering a low price, a “checkbox” set of features and functions, and poor customer service, and hope that this strategy keeps them afloat. This article is not for those organizations, and they can stop reading now.
For any organization that is committed to being a web or DNS filtering market leader, whether you are a UTM vendor, a service provider/ISP/MSSP, an anti-virus vendor, a CASB provider, or some other type of network security provider, you understand the commitment it takes to focus, every single day, on providing your customers with the most effective, safe, and secure web access.
For these companies, protecting hardware AND users (both employees and customers) is central to the success of the business, ultimately affecting the value of products and services as well as customer trust. If your company safeguards sensitive data or works in the cybersecurity space, you understand the constantly evolving nature of online malicious threats such as malware, phishing, and botnets, as well as the staggering amount of new objectionable content published online every minute of the day. You understand what it takes to be successful and to deliver a product that exceeds expectations.
A critical element of any successful filtering solution is the URL database and classification technology that powers it. Because providing the highest-quality database is so demanding, most leading web filtering vendors conclude that they need to partner with a third party that focuses day and night on website classification, malicious and phishing detection, updates, and more.
URL Database Evaluation Criteria
The goal of this blog post is to provide important considerations and criteria for evaluating URL database and classification technologies and for finding a technology partner that shares your commitment to success. It walks you through the most important criteria for grading a URL database and classification technology, and how to prepare for and test the following:
- Coverage
- Accuracy
- Speed & Performance
- Protection / Malicious Detection
- Ease of Integration
We hope this saves you time and helps you perform a more effective evaluation, giving you confidence that the web filtering database you select will provide the appropriate protection, coverage, and accuracy.
What are your Business Goals and Technical Requirements For a URL Database and Classification Technology?
First step? Define your business goals and requirements.
Before getting underway with an evaluation, we recommend clearly defining your goals, expectations, and requirements for database and classification technology. This includes how and where it will be implemented, general performance goals (queries per second, etc.), hardware requirements (such as storage space), and so on.
Outlining your goals and requirements up front can significantly improve communication and shared understanding among the executive, technical, and business personnel involved in the evaluation.
Below, we’ve outlined some common questions that will help you along the path to identifying the best technology for you:
What are your primary business goals for URL Database and Classification technologies?
- Power a market-leading Web Filtering, Parental Controls, or DNS filtering solution
- Identify a partner that is committed to market excellence
- Identify a partner with a culture of responsiveness
- Secure a pricing arrangement that shares in success and allows for strong margins
What are your technical requirements for URL Database and Classification Technologies?
- Excellent classification accuracy
- Excellent coverage for our target market and geographies
- Excellent detection of malicious, phishing, objectionable and dangerous content
- Fast and flexible deployment options: local, cloud, and hybrid models
- Fast time to categorize
- Support at the domain, sub-domain and/or full-path level
- Support for IPs
- Real-time updates as new URLs are classified
- Specific taxonomies for the various target markets
- Fast response times for customer service and support
By settling these details up front, each member of your team will be prepared to answer the questions specific to their role in the evaluation and implementation.
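Some teams find it helps to turn the measurable parts of these requirements into explicit pass/fail thresholds before testing begins. The Python sketch below is one way to do that; the `Requirements` class, the `unmet` helper, and every threshold and measured value are hypothetical placeholders, not recommended targets or real results.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Requirements:
    """Hypothetical pass/fail thresholds; replace with your own targets."""
    min_coverage_pct: float
    min_accuracy_pct: float
    min_qps: float
    max_p95_latency_ms: float


def unmet(req: Requirements, measured: Dict[str, float]) -> List[str]:
    """Return the names of requirements the measured results fail to meet."""
    failures = []
    if measured["coverage_pct"] < req.min_coverage_pct:
        failures.append("coverage")
    if measured["accuracy_pct"] < req.min_accuracy_pct:
        failures.append("accuracy")
    if measured["qps"] < req.min_qps:
        failures.append("throughput")
    if measured["p95_ms"] > req.max_p95_latency_ms:
        failures.append("latency")
    return failures


# Illustrative values only; not recommended targets or real results.
req = Requirements(min_coverage_pct=97.0, min_accuracy_pct=97.0,
                   min_qps=5000.0, max_p95_latency_ms=5.0)
measured = {"coverage_pct": 98.2, "accuracy_pct": 96.5, "qps": 7200.0, "p95_ms": 3.1}
print(unmet(req, measured))  # ['accuracy']
```

Writing thresholds down this way also makes it easy to apply the same bar to every vendor you evaluate.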
What Criteria Should Be Used To Measure a URL Database and Classification Technology?
Now that you’ve defined requirements and expectations for an evaluation, let’s cover the important criteria to use in evaluating a URL database and classification technology for a market-leading filtering solution.
Coverage
Coverage is the number of tested URLs that return a category, expressed as a share of the total number of URLs tested.
Coverage % = (# of Categorized URLs / Total URLs Tested) × 100
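As a minimal illustration of the formula, here is a Python sketch that assumes you have recorded, for each tested URL, the category the database returned (or `None` when it returned none). The function name and the sample URLs are hypothetical.

```python
from typing import Mapping, Optional


def coverage_percent(results: Mapping[str, Optional[str]]) -> float:
    """Coverage % = (categorized URLs / total URLs tested) * 100.

    `results` maps each tested URL to the category the database returned,
    or None when no category was returned (uncategorized).
    """
    if not results:
        raise ValueError("no URLs were tested")
    categorized = sum(1 for category in results.values() if category is not None)
    return 100.0 * categorized / len(results)


results = {
    "example.com": "Business",
    "news.example.org/world/article-1": "News",
    "unknown-site.example": None,
    "shop.example.net": "Shopping",
}
print(coverage_percent(results))  # 75.0
```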
When evaluating a URL category database, Coverage is one of the most critical quality indicators. A high coverage rate indicates that the technology or service provider has, and maintains, systems that continuously monitor, analyze, and categorize new sites and pages. For protection against malicious threats, a high coverage rate is paramount. Web filtering and categorization services do not actively remove viruses or quarantine malicious code; they identify and block threats before you connect to them. So if a threat hasn’t been found and identified as malicious, the filtering solution itself offers no protection against it.
GOAL: A Coverage rate in the upper 90s is a sign of a high-quality web filtering solution. zvelo proudly maintains over 99.9% Coverage of the ActiveWeb.
Accuracy
Accuracy is defined as the percentage of tested URLs that are verified as correctly classified.
Accuracy % = (# of Accurately Categorized URLs / Total URLs Tested) × 100
Above all others, this indicator is what separates the best URL Database and Classification technology from the rest. Accuracy should be measured using human verification of the categories returned for your test corpus of URLs. Uncategorized URLs and miscategorizations should both be counted as inaccurate. Accuracy may vary based on the source language of web content, as well as other factors.
GOAL: As with Coverage, an Accuracy rate in the upper 90s indicates both quality and protection. For example, an Accuracy of 99% demonstrates that the web filtering technology’s systems and processes are finely tuned and managed to return correct classifications.
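The sketch below applies the Accuracy formula in Python, assuming a hypothetical `verified` mapping that holds the category a human reviewer judged correct for each URL. Following the rule above, a URL counts as accurate only when the returned category matches the verified one, so uncategorized URLs and miscategorizations both lower the score.

```python
from typing import Mapping, Optional


def accuracy_percent(returned: Mapping[str, Optional[str]],
                     verified: Mapping[str, str]) -> float:
    """Accuracy % = (accurately categorized URLs / total URLs tested) * 100.

    `returned` maps each tested URL to the category the database returned
    (None if uncategorized). `verified` maps each URL to the category a human
    reviewer judged correct. Uncategorized and miscategorized URLs both count
    as inaccurate.
    """
    if not returned:
        raise ValueError("no URLs were tested")
    accurate = sum(
        1
        for url, category in returned.items()
        if category is not None and category == verified.get(url)
    )
    return 100.0 * accurate / len(returned)


returned = {
    "example.com": "Business",
    "news.example.org/world/article-1": "News",
    "unknown-site.example": None,
    "shop.example.net": "Shopping",
}
verified = {
    "example.com": "Business",
    "news.example.org/world/article-1": "News",
    "unknown-site.example": "Gambling",
    "shop.example.net": "Travel",
}
print(accuracy_percent(returned, verified))  # 50.0
```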
Speed & Performance
The speed and performance of a URL Database and Classification Technology are critical and must meet the demands of a web filtering vendor that has its eyes set on market leadership, making this another of the most important evaluation criteria.
In many cases, it is prudent to perform shorter, focused tests to determine the overall viability of a URL Database and Classification technology (i.e., for Coverage and Accuracy). Once those are complete, we recommend running longer tests with an API, local SDK, or other implementation on a network with real-world traffic in order to measure performance. Some important test metrics and considerations include:
- Identify peak resource usage
- Identify maximum number of queries per second
- Identify any bottlenecks
- Measure latency and calculate the time to return a URL category
- Measure CPU and disk usage
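For the latency and throughput items in this list, a simple harness like the following Python sketch can time each lookup. It assumes a hypothetical `lookup` callable standing in for whatever API or SDK call the vendor provides. It runs queries one at a time, so its queries-per-second figure describes a single caller rather than the maximum the system can sustain, and it does not measure disk usage.

```python
import statistics
import time
from typing import Callable, Dict, Iterable, Optional


def benchmark(lookup: Callable[[str], Optional[str]],
              urls: Iterable[str]) -> Dict[str, float]:
    """Time each lookup sequentially and summarize latency, throughput, CPU time.

    `lookup` is a hypothetical stand-in for the vendor's API or SDK call.
    CPU time covers this Python process only, so it reflects an in-process
    SDK but not the server side of a remote API.
    """
    latencies_ms = []
    cpu_start = time.process_time()
    wall_start = time.perf_counter()
    for url in urls:
        t0 = time.perf_counter()
        lookup(url)
        latencies_ms.append((time.perf_counter() - t0) * 1000.0)
    wall_s = time.perf_counter() - wall_start
    cpu_s = time.process_time() - cpu_start
    if len(latencies_ms) < 2:
        raise ValueError("need at least two URLs to compute percentiles")
    # 99 cut points: index 49 = p50, 94 = p95, 98 = p99. The "inclusive"
    # method interpolates between observed values, so it never exceeds the max.
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {
        "queries": float(len(latencies_ms)),
        "qps": len(latencies_ms) / wall_s if wall_s > 0 else float("inf"),
        "p50_ms": cuts[49],
        "p95_ms": cuts[94],
        "p99_ms": cuts[98],
        "max_ms": max(latencies_ms),
        "cpu_s": cpu_s,
    }


# Usage with an in-memory dict standing in for a real database.
fake_db = {f"site{i}.example": "Business" for i in range(1000)}
report = benchmark(fake_db.get, list(fake_db))
print(f"{report['qps']:.0f} q/s, p95 {report['p95_ms']:.4f} ms")
```

To find a maximum queries-per-second figure, you would run several such callers concurrently against the real implementation and increase the load until latency degrades.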
Other Considerations
Coverage, Accuracy, and Performance are among the most important aspects to evaluate—but depending on your application or use case, you may wish to take a closer look at the following:
Number of Categories Supported: A greater number of unique categories supports increased precision and filtering capabilities based on domain/URL classifications.
Full Path URL Support: Full path refers to the complete URL (Uniform Resource Locator), identifying the specific page, article, or file on the site. This includes the base domain as well as the protocol, subdomain, path, file, and any parameters included in the URL. Full path URL support is especially critical for malicious sources, which can reside in just one file or on a single page of a website. If you’d like to know more about the difference between full path and base domain, check out our blog post on the Anatomy of a Full-Path URL.
Malicious Detection: High Coverage and Accuracy marks generally indicate that a URL Database and Classification technology has strong malicious detection capabilities. The lifespan of online threats varies significantly, especially for URLs used in phishing attacks, which requires continuous analysis and re-evaluation of compromised and malicious URLs to keep up with changes in their status.
Language Support: The internet is global, so effective URL Database and Classification technologies must support categorization of all websites and pages, regardless of language. zvelo’s categorization services support nearly 200 languages worldwide, providing the highest level of coverage regardless of native language.
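To make the distinction concrete, the Python sketch below uses the standard library's `urllib.parse.urlsplit` to break a full-path URL into its parts and derive candidate lookup keys, from the full path down to parent domains. The `lookup_keys` helper and its most-specific-first order are assumptions for illustration, not a description of how any particular product matches URLs.

```python
from typing import List
from urllib.parse import urlsplit


def lookup_keys(url: str) -> List[str]:
    """Hypothetical lookup keys, most specific (full path) to least (parent domain).

    Query strings and fragments are ignored. Stops before the final label (the
    TLD) but does not consult the Public Suffix List, so for a host such as
    'a.example.co.uk' it would also emit 'co.uk'.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""  # .hostname is already lowercased
    keys = []
    if parts.path not in ("", "/"):
        keys.append(host + parts.path)  # full path: host plus path
    labels = host.split(".")
    for i in range(len(labels) - 1):
        keys.append(".".join(labels[i:]))  # host, then each parent domain
    return keys


print(lookup_keys("https://files.example.com/downloads/setup.exe?id=42"))
# ['files.example.com/downloads/setup.exe', 'files.example.com', 'example.com']
```

A database with only base-domain entries can answer just the last key; full path support is what allows a verdict on the first one, the single file that may be malicious.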
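Because results can vary by language, it may be worth breaking Coverage down per language rather than reporting a single number. This Python sketch assumes each test URL has been tagged with a language code by your own team; the tags, function name, and sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple


def coverage_by_language(
        rows: Iterable[Tuple[str, str, Optional[str]]]) -> Dict[str, float]:
    """Coverage % per language from (url, language_tag, category_or_None) rows."""
    totals: Dict[str, int] = defaultdict(int)
    categorized: Dict[str, int] = defaultdict(int)
    for _url, lang, category in rows:
        totals[lang] += 1
        if category is not None:
            categorized[lang] += 1
    return {lang: 100.0 * categorized[lang] / totals[lang] for lang in totals}


rows = [
    ("news.example.com/article", "en", "News"),
    ("blog.example.net/post", "en", None),
    ("shop.example.jp/item", "ja", "Shopping"),
]
print(coverage_by_language(rows))  # {'en': 50.0, 'ja': 100.0}
```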
How to Prepare URLs For Testing a URL Database and Classification Technology
Understanding how to build a corpus of URLs for testing can save you a significant amount of time and energy. It will also ensure you are prepared for questions that will arise and allow you to better compare results between multiple vendors/solutions.
Here are test-specific recommendations:
Prepare a Testing Set for Coverage
Be sure to include URLs that are good examples of your actual traffic and needs. This may mean working with your network administrator to pull recent traffic logs and, together, identifying the full range of expected traffic. By doing this, the Coverage Rate(s) you see during an evaluation will closely match what you can expect once the solution is fully implemented.
- Pull URLs that are representative of your network’s traffic, as well as URLs from popular “known” sites.
- Include a combination of domain-only and full path URLs.
- Remove any duplicates from your test corpus.
- Remove any unwanted parameters or Personally Identifiable Information (PII) from URLs.
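The corpus-preparation steps in this list can be scripted. This Python sketch normalizes hostnames, drops query strings and fragments, and removes duplicates while preserving order. Dropping the entire query string (along with credentials and ports) is a deliberately blunt, assumed policy for removing unwanted parameters and PII; adjust it if certain parameters matter to your traffic. The `prepare_corpus` name and sample URLs are hypothetical.

```python
from typing import Iterable, List
from urllib.parse import urlsplit, urlunsplit


def prepare_corpus(raw_urls: Iterable[str]) -> List[str]:
    """Normalize URLs, strip query strings/fragments, and de-duplicate in order.

    Assumed policy: the whole query string, the fragment, any user:password
    credentials, and any explicit port are dropped.
    """
    seen = set()
    corpus = []
    for raw in raw_urls:
        raw = raw.strip()
        if not raw:
            continue
        if "://" not in raw:
            raw = "http://" + raw  # assume http for scheme-less log entries
        try:
            parts = urlsplit(raw)
        except ValueError:  # e.g. malformed IPv6 brackets
            continue
        host = parts.hostname or ""  # .hostname is lowercased, no credentials/port
        if not host:
            continue
        if ":" in host:
            host = f"[{host}]"  # re-bracket IPv6 literals
        url = urlunsplit((parts.scheme, host, parts.path or "/", "", ""))
        if url not in seen:
            seen.add(url)
            corpus.append(url)
    return corpus


raw = [
    "https://Example.com/page?email=user@example.com",
    "https://example.com/page",
    "example.org",
    "   ",
    "https://shop.example.net/cart#top",
]
print(prepare_corpus(raw))
# ['https://example.com/page', 'http://example.org/', 'https://shop.example.net/cart']
```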
Additionally, you’ll want to run a Coverage test of this same corpus of URLs on several occasions to see how coverage changes/improves over time. Be sure to track your test results.
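One lightweight way to track repeated runs is to append each result to a CSV file. In this Python sketch the file layout, function names, vendor label, dates, and coverage values are all made up for illustration.

```python
import csv
import datetime as dt
import tempfile
from pathlib import Path
from typing import List, Optional, Tuple

FIELDS = ["date", "vendor", "coverage_pct"]


def record_run(history: Path, vendor: str, coverage_pct: float,
               run_date: Optional[dt.date] = None) -> None:
    """Append one coverage result to a CSV file, writing a header if it is new."""
    is_new = not history.exists()
    with history.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow({
            "date": (run_date or dt.date.today()).isoformat(),
            "vendor": vendor,
            "coverage_pct": f"{coverage_pct:.2f}",
        })


def coverage_trend(history: Path, vendor: str) -> List[Tuple[str, float]]:
    """Return (date, coverage %) pairs for one vendor, oldest first."""
    with history.open(newline="") as f:
        rows = [r for r in csv.DictReader(f) if r["vendor"] == vendor]
    rows.sort(key=lambda r: r["date"])  # ISO dates sort correctly as strings
    return [(r["date"], float(r["coverage_pct"])) for r in rows]


# Made-up values for illustration only.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "coverage_history.csv"
    record_run(path, "vendor-a", 97.4, dt.date(2024, 1, 8))
    record_run(path, "vendor-a", 98.1, dt.date(2024, 2, 5))
    trend = coverage_trend(path, "vendor-a")
print(trend)  # [('2024-01-08', 97.4), ('2024-02-05', 98.1)]
```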
Prepare a Testing Set For Accuracy
For measuring accuracy, you can use the same corpus of URLs from your coverage testing to verify categories. Because different vendors use different names for the same category, you may wish to define up front which categories are considered accurate, and have the same team perform accuracy verification across all of your tests.
Pay careful attention to malicious and objectionable categories; you may wish to build a secondary test corpus of URLs for them. Because of the nature of malicious and objectionable content, it is critical to test URLs that are very recent and up to date. You may wish to pull a list of malicious/objectionable URLs from email quarantine or an up-to-date malicious feed.
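If you do define your own set of accepted categories, a simple mapping from each vendor's category names to that common taxonomy lets you score every vendor against the same human-verified labels. The mapping and category names in this Python sketch are hypothetical; names the mapping does not recognize are reported so you can review them rather than silently counting them as wrong.

```python
from typing import Dict, Mapping, Optional, Set, Tuple

# Hypothetical mapping from one vendor's category names to your own taxonomy.
VENDOR_A_TO_COMMON: Dict[str, str] = {
    "News and Media": "News",
    "Online Shopping": "Shopping",
    "Malware Sites": "Malicious",
}


def normalize(returned: Mapping[str, Optional[str]],
              mapping: Mapping[str, str]
              ) -> Tuple[Dict[str, Optional[str]], Set[str]]:
    """Translate vendor categories into the common taxonomy.

    Returns the translated results plus the set of vendor category names the
    mapping does not cover. Those URLs are set to None here, so review the
    unmapped names and extend the mapping before scoring, or they will count
    as inaccurate.
    """
    normalized: Dict[str, Optional[str]] = {}
    unmapped: Set[str] = set()
    for url, category in returned.items():
        if category is None:
            normalized[url] = None
        elif category in mapping:
            normalized[url] = mapping[category]
        else:
            normalized[url] = None
            unmapped.add(category)
    return normalized, unmapped


returned = {
    "a.example": "News and Media",
    "b.example": "Online Shopping",
    "c.example": "Cryptocurrency",
    "d.example": None,
}
normalized, unmapped = normalize(returned, VENDOR_A_TO_COMMON)
print(normalized)  # {'a.example': 'News', 'b.example': 'Shopping', 'c.example': None, 'd.example': None}
print(unmapped)    # {'Cryptocurrency'}
```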
NOTE: If you test objectionable content (e.g. child pornography, terrorist related, etc.), be aware of any legal or compliance requirements—and follow appropriate and applicable local laws. In many countries, it is illegal to even possess a list of these URLs.
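When pulling malicious URLs from a feed or from email quarantine, a simple age filter helps keep the secondary corpus recent. This Python sketch assumes a hypothetical feed of `(url, first_seen)` pairs with timezone-aware timestamps, and a two-day window chosen only for illustration. The sample URLs use the reserved `.example` domain.

```python
import datetime as dt
from typing import Iterable, List, Tuple


def recent_urls(feed: Iterable[Tuple[str, dt.datetime]], now: dt.datetime,
                max_age_days: float = 2.0) -> List[str]:
    """Keep URLs first seen within `max_age_days` of `now`, without duplicates.

    Timestamps and `now` should both be timezone-aware; comparing naive and
    aware datetimes raises TypeError.
    """
    cutoff = now - dt.timedelta(days=max_age_days)
    seen = set()
    result = []
    for url, first_seen in feed:
        if first_seen >= cutoff and url not in seen:
            seen.add(url)
            result.append(url)
    return result


utc = dt.timezone.utc
now = dt.datetime(2024, 3, 10, 12, 0, tzinfo=utc)
feed = [
    ("http://login-verify.example/a", dt.datetime(2024, 3, 10, 9, 0, tzinfo=utc)),
    ("http://old-phish.example/", dt.datetime(2024, 2, 1, 0, 0, tzinfo=utc)),
    ("http://payload.example/x.exe", dt.datetime(2024, 3, 9, 0, 0, tzinfo=utc)),
]
print(recent_urls(feed, now))
# ['http://login-verify.example/a', 'http://payload.example/x.exe']
```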
Testing For Performance
When testing for performance you may wish to use network activity monitors, “sniffers”, and other diagnostic tools to determine overall performance. You should run a variety of tests on relevant networks and devices—and for a variety of durations to determine any limitations or shortcomings of the technology.
Final Thoughts & Considerations
At the time of writing this blog, there are nearly 2 billion websites on the internet and counting. It’s important to remember that no web filtering technology will achieve 100% accuracy—AND that content and malicious threats are constantly changing. Importantly, finding a URL Database and Classification technology partner that shares your vision, demonstrates the commitment to excellence you require, and provides a business model for shared success will be critical to becoming and remaining a web filtering market leader. The result? Peace of mind.
For more information about the zveloDB URL Database and our other data solutions, contact our support team. If you’re ready to schedule an evaluation, contact us.