CTI: Collection and Processing



In a blog post published earlier this month, we introduced the concept of the Cyber Threat Intelligence (CTI) Process Feedback Loop, outlining key requirements and outcomes for the CTI Planning and Direction stage. This post will focus on the CTI Collection and Processing stages of the feedback loop and demonstrate how zvelo leverages each stage to advance its Cyber Threat Intelligence strategies and solutions.

CTI Planning and Direction: Key Outcomes
As a brief recap, an organization must establish three key requirements during the first stage of the process, Planning and Direction:
Priority Cyber Intelligence Requirements (PCIR).
Friendly Cyber Information Requirements (FCIR).
C-Suite Critical Cyber Information Requirements (C3IR).


The Roles of the Collection and Processing Stages in the CTI Cycle

The CTI Collection and Processing stages are crucial to the Analysis stage, where the focus is to produce actionable intelligence. In addition to providing the necessary foundation for the following stages of the process, the CTI Collection and Processing stages reveal vast Intelligence opportunities hidden in ‘Big Data’. With that in mind, let’s explore the CTI Collection and Processing stages in greater detail.

CTI: Collection

From a Military Intelligence perspective, Collection can be defined as gathering raw data from across the Operational Environment (OE) in various disciplines, such as Signals Intelligence, Open-Source Intelligence, and Geospatial Intelligence. The data collection strategy can vary depending on the data’s intended purpose, as well as the Intelligence discipline used. From an Enterprise or Organizational CTI perspective, Collection works the same way as it does in Military Intelligence, except that CTI providers gather raw data about an organization’s Cyber Operating Environment (COE) rather than the OE.

Collection Methods/Sources

In general, there are three ways to conduct CTI Collection: third-party data, self-sourced data, or a combination of the two.

Third-Party Data. Organizations can procure third-party data feeds and information. These feeds can be open source or free, but may come with the risk of a higher rate of False Positives (FPs). Alternatively, when accuracy is critical, third-party data may be purchased from a trusted vendor at a premium.

Self-Sourced Data. Organizations often decide to gather data on their own from various internal sources, sensors, honeypots/nets, crowdsourcing, etc. The downside of this method is that the data collected must be stored, which can be costly.

Combination of Third-Party and Self-Sourced Data. Many organizations employ a mix of feeds plus their own proprietary data. In theory, this may be the most effective approach to maximize CTI coverage. Research by the zvelo Cybersecurity Team shows both gaps and differences across the third-party feeds available, so no single feed can be relied on alone. And, while combining data sources is unlikely to result in 100% CTI coverage, organizations should draw on as many sources as is practical.
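To make the idea of gaps and differences between feeds concrete, here is a minimal sketch of a coverage comparison. The feed names and indicators are made up for illustration, and this is not a description of how zvelo measures coverage. It assumes each feed has already been reduced to a set of domain strings.

```python
from itertools import combinations


def coverage_report(feeds: dict[str, set[str]]) -> dict:
    """Summarize what each feed contributes to the combined indicator set."""
    combined = set().union(*feeds.values())
    report = {"combined_total": len(combined), "unique_to": {}, "pairwise_overlap": {}}

    # Indicators that only one feed reports: these are lost if that feed is dropped.
    for name, indicators in feeds.items():
        others = set().union(*(v for k, v in feeds.items() if k != name))
        report["unique_to"][name] = sorted(indicators - others)

    # Jaccard similarity per feed pair: shared indicators / all indicators in the pair.
    for a, b in combinations(sorted(feeds), 2):
        union = feeds[a] | feeds[b]
        report["pairwise_overlap"][(a, b)] = (
            len(feeds[a] & feeds[b]) / len(union) if union else 0.0
        )
    return report


# Hypothetical feeds, for illustration only.
example_feeds = {
    "open_source_feed": {"bad.example", "phish.example", "old.example"},
    "vendor_feed": {"bad.example", "malware.example"},
    "honeypot": {"malware.example", "scanner.example"},
}

if __name__ == "__main__":
    print(coverage_report(example_feeds))
```

A low pairwise overlap suggests two feeds see different parts of the threat landscape, while a long "unique_to" list shows what would be missed without that feed.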

The Secret to Collection — Volume, Visibility, and Location

Volume. It takes billions of raw data points to produce just a few hundred pieces of actionable intelligence. According to Cisco, zettabytes of data (one zettabyte is roughly 1 billion terabytes) transit the internet every year. Since collecting data from the whole of the internet is untenable, CTI providers like zvelo target and gather the raw data crucial to specific needs, which have been previously defined and documented in the Priority Cyber Intelligence Requirements (PCIR). The figure below shows the relationship between data, information, and intelligence.

Visibility. Visibility is crucial because you must be able to “see” what you want to collect. Unfortunately, Malicious Cyber Actor (MCA) obfuscation techniques, such as inspecting browser headers to identify and redirect standard bots, often hinder visibility for CTI.

Location. Geographical location matters. For example, it is entirely possible that a site may appear offline in North America but perfectly functional in Europe. That difference is information which may result in actionable intelligence later in the process.
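The location point can be sketched as a simple comparison of observations taken from different regions. The example below is an illustration only: the region names, URLs, and the shape of the observation tuples are assumptions, and it works on already-collected results rather than making any network requests.

```python
from collections import defaultdict


def regional_discrepancies(observations):
    """Return URLs whose reachability differs between collection regions.

    observations: iterable of (url, region, reachable) tuples, where
    reachable is True if the site responded normally from that region.
    """
    by_url = defaultdict(dict)
    for url, region, reachable in observations:
        by_url[url][region] = reachable

    # Keep only URLs where the regions disagree with each other.
    return {
        url: regions
        for url, regions in by_url.items()
        if len(set(regions.values())) > 1
    }


# Hypothetical observations, for illustration only.
example_observations = [
    ("http://shop.example/login", "north-america", False),
    ("http://shop.example/login", "europe", True),
    ("http://news.example/", "north-america", True),
    ("http://news.example/", "europe", True),
]

if __name__ == "__main__":
    for url, regions in regional_discrepancies(example_observations).items():
        print(url, regions)
```

A URL that shows up in this output is not proven malicious. It is simply flagged as behaving differently by region, which is the kind of signal worth carrying forward into Processing and Analysis.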

zvelo’s Role in CTI Collection

zvelo’s collection coverage spans the ProActiveWeb, ActiveWeb, and InActiveWeb, gathering the raw data needed to generate actionable intelligence. Specific to the ActiveWeb, zvelo’s supported coverage includes 650 million users across the globe, who supply the volume, visibility, and location data required to observe MCAs and potential threats as early as possible. These potential early detections provide key insights as the data is processed, analyzed, and disseminated as actionable CTI derived from zvelo’s collection network.

CTI: Processing

Once an organization has amassed the necessary raw data about the COE, the next step is to process it. Joint Pub 2-0: Intelligence defines Processing as “a system of operations designed to convert raw data into useful information.” The key words in this statement are ‘useful information’. What makes information useful goes all the way back to the Planning and Direction stage, where the PCIR were defined. As with Collection, where it is not possible to gather every single piece of information in the COE, processing all of the raw data with the same degree of granularity would take so much time that the information produced would no longer be actionable.

CTI Processing starts with data aggregation and normalization. An organization must aggregate all the data into a single information set which can then be placed into a common schema. This is a critical step as the CTI Analysis stage is dependent upon aggregation and normalization for the pattern analysis, assessment, and scoring which will ultimately produce actionable intelligence from the raw data.
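As a rough illustration of aggregation and normalization, the sketch below maps records from two differently shaped feeds into one common schema. The feed layouts, field names, and the Indicator schema are all assumptions made for this example. Real deployments often use an established schema such as STIX instead of a custom one.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Indicator:
    """Hypothetical common schema shared by all sources."""
    domain: str
    category: str
    source: str
    first_seen: datetime  # always timezone-aware UTC


def _domain_of(value: str) -> str:
    """Extract a lowercase hostname from a URL or a bare domain."""
    value = value.strip()
    host = urlsplit(value if "//" in value else "//" + value).hostname
    return (host or "").rstrip(".")


def normalize_feed_a(rec: dict) -> Indicator:
    # Assumed layout: full URL, 'type' label, ISO 8601 timestamp with offset.
    seen = datetime.fromisoformat(rec["seen"]).astimezone(timezone.utc)
    return Indicator(_domain_of(rec["url"]), rec["type"].lower(), "feed_a", seen)


def normalize_feed_b(rec: dict) -> Indicator:
    # Assumed layout: bare host, 'label' field, Unix epoch seconds.
    seen = datetime.fromtimestamp(rec["ts"], tz=timezone.utc)
    return Indicator(_domain_of(rec["host"]), rec["label"].lower(), "feed_b", seen)


def aggregate(feed_a: list[dict], feed_b: list[dict]) -> list[Indicator]:
    """Combine both feeds into a single list in the common schema."""
    return [normalize_feed_a(r) for r in feed_a] + [normalize_feed_b(r) for r in feed_b]


if __name__ == "__main__":
    a = [{"url": "http://Phish.Example:8080/login", "type": "Phishing",
          "seen": "2024-01-05T10:00:00+00:00"}]
    b = [{"host": "BAD.example.", "label": "Malware", "ts": 1704448800}]
    for indicator in aggregate(a, b):
        print(indicator)
```

Once every record shares the same fields, casing, and time zone, later steps such as deduplication, pattern analysis, and scoring can treat all sources uniformly.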

Processing Methodologies

There are multiple ways to process data in CTI. The three most common are Rules-based, Artificial Intelligence/Machine Learning (AI/ML), and Manual Analysis.

Rules-Based. Rules-based processing can be viewed as simple “business-logic”. If a URL/domain meets specific criteria, certain actions are performed.

Artificial Intelligence/Machine Learning (AI/ML). AI/ML is increasingly popular in the CTI arena — largely because humans just cannot keep up with processing billions of data points. However, unsupervised AI/ML engines are potentially subject to biases and False Positives (FPs), so it is best to deploy an AI/ML engine which incorporates human supervised Machine Learning.

Manual Analysis. As the most tedious and resource-intensive method, manual analysis is used for edge cases when neither the rules-based nor the AI/ML engines can reliably process an input. Manual analysis can be complicated and time-consuming, so on its own it cannot keep pace with today’s MCAs.
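One way the three methodologies can fit together is as a triage chain: rules first, then a model score, with manual review reserved for whatever remains uncertain. The sketch below is an assumption about how such a chain could look, not zvelo’s implementation. The rules, the allowlist, the thresholds, and the score_fn stand-in for a trained model are all hypothetical.

```python
from typing import Callable, Optional

ALLOWLIST = {"intranet.example"}  # hypothetical known-good domains


def rule_allowlisted(record: dict) -> Optional[str]:
    """Rule: allowlisted domains are benign. Returns None if the rule does not apply."""
    return "benign" if record["domain"] in ALLOWLIST else None


def rule_new_login_lookalike(record: dict) -> Optional[str]:
    """Rule: domains under a day old with 'login' in the name are suspicious."""
    is_new = record.get("domain_age_days", 9999) < 1
    return "suspicious" if is_new and "login" in record["domain"] else None


RULES = [rule_allowlisted, rule_new_login_lookalike]


def triage(record: dict, score_fn: Callable[[dict], float],
           rules=RULES, low: float = 0.2, high: float = 0.8):
    """Return (stage, verdict) for a record.

    score_fn stands in for an AI/ML model that returns a maliciousness
    score between 0 and 1. The thresholds are illustrative, not tuned.
    """
    # 1. Rules-based: cheap, deterministic business logic.
    for rule in rules:
        verdict = rule(record)
        if verdict is not None:
            return ("rules", verdict)

    # 2. AI/ML: accept the model only when it is confident either way.
    score = score_fn(record)
    if score >= high:
        return ("model", "malicious")
    if score <= low:
        return ("model", "benign")

    # 3. Manual analysis: edge cases neither engine can settle.
    return ("manual", "needs_review")


if __name__ == "__main__":
    print(triage({"domain": "unknown.example"}, score_fn=lambda r: 0.5))
```

Ordering the stages from cheapest to most expensive keeps the manual queue small, which matters because manual review is the slowest of the three.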

The Secret to Processing — Data Validation

Data validation is a crucial step necessary to derive actionable intelligence from the raw data. Unfortunately, it’s also frequently skipped. Just because one source (whether an internal source, a third-party feed, or crowdsourced input) claims something is malicious, phishing, or otherwise evil does not mean the claim is accurate. The old saying “trust but verify!” is particularly true when it comes to CTI processing.
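A simple form of "trust but verify" is to require corroboration from more than one independent source before treating a claim as confirmed. The sketch below shows that idea only. The claim format, the source names, and the two-source threshold are assumptions, and real validation typically also involves re-checking the indicator directly.

```python
from collections import defaultdict


def corroborate(claims, min_sources: int = 2):
    """Split indicators into confirmed and unverified sets.

    claims: iterable of (indicator, source, verdict) tuples.
    An indicator is confirmed when at least min_sources distinct
    sources report it as malicious. Repeated claims from the same
    source count once.
    """
    sources_by_indicator = defaultdict(set)
    for indicator, source, verdict in claims:
        if verdict == "malicious":
            sources_by_indicator[indicator].add(source)

    confirmed = {
        indicator
        for indicator, sources in sources_by_indicator.items()
        if len(sources) >= min_sources
    }
    unverified = set(sources_by_indicator) - confirmed
    return confirmed, unverified


# Hypothetical claims, for illustration only.
example_claims = [
    ("phish.example", "open_source_feed", "malicious"),
    ("phish.example", "honeypot", "malicious"),
    ("shop.example", "crowdsourced", "malicious"),
    ("shop.example", "crowdsourced", "malicious"),  # same source repeated
    ("news.example", "vendor_feed", "benign"),
]

if __name__ == "__main__":
    confirmed, unverified = corroborate(example_claims)
    print("confirmed:", confirmed)
    print("needs verification:", unverified)
```

Indicators in the unverified set are not discarded. They are candidates for further checks before they are reported as actionable.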

zvelo’s Role in CTI Processing

zvelo’s methodology for CTI processing uses a balanced mix of Rules-Based, Human Supervised AI/ML, and Manual Analysis by its expert Cybersecurity Team. This blended approach to CTI Processing ensures that our web content categorization, curated CTI datasets, and other solutions deliver results with high veracity and accuracy and low FP rates.

Where zvelo Fits…

zvelo supports the entire CTI cycle, starting with Planning and Direction, which is driven by our partners (external) and deep subject matter expertise (internal). zvelo collects billions of data points across the ProActiveWeb, ActiveWeb, and InActiveWeb, in addition to integrating multiple proprietary sources, so your organization does not have to. zvelo then processes the raw data into information bins for Analysis in Web Content Categorization, Suspicious New Domain Registrations, Malicious Detections, and Phishing Detections. The zvelo Cybersecurity Team processes all this information and prepares it for Analysis and Dissemination, which will be covered in an upcoming blog post. If you want to learn more about zvelo and its expanding CTI offerings, please contact us!

Links to additional articles on how zveloCTI supports the entire process feedback loop for our partners across the areas of web content categorization, malicious detection, and phishing may be found below. And, as always, please let us know if, or how, we can assist with your organization’s threat intelligence needs.

Part 1 – zveloCTI: Planning and Direction

Part 3 – zveloCTI: Analysis, Dissemination, & Feedback
