Malicious Detection in Depth

Today’s threat actors are masters of evasion, constantly developing new versions of malware to outmaneuver cyber defenders and forging new paths to expand the threat landscape. As global cybercrime continues to escalate and plague both organizations and individuals, becoming the target of a cyber attack at one point or another is no longer a question of ‘if’, but ‘when’. The importance of effective malicious threat detection cannot be overstated. From malware, to phishing, to social engineering, to ransomware, and beyond, this article explores the various types of malicious threats, use cases for malicious threat detection, as well as some of the most common security tools and methodologies used for detection.

What is Malicious Detection?

Malicious detection refers to the identification of potentially malicious URLs and IPs known to be associated with threats and exploits like malware, phishing, social engineering, etc. Once detected, the malicious URLs and IPs are used in security tools and applications to protect networks, endpoints, and users from domains, web pages, or IPs classified as malicious, phishing, fraud, botnet or one of the other types of exploits listed below.

Types of Malicious Threats

Ad Fraud. Sites used to commit fraudulent online display advertising transactions through ad impression boosting techniques including, but not limited to, ad stacking, iframe stuffing, and hidden ads. Also includes sites with high non-human web traffic and rapid, large, unexplained changes in traffic.
Botnet. Botnets are made up of a large number of compromised machines running software that has been installed by hackers to send spam and launch phishing and denial of service attacks. In many cases, these computers (called bots or zombies) have been infected unbeknownst to their owners.
Compromised Pages & Links To Malware. Compromised web pages are pages that appear to be legitimate, but house malicious code or link to malicious websites hosting malware. These sites have been compromised by someone other than the site owner. If Firefox blocks a site as malicious, use this category. Examples include sites that have been defaced or tagged with a “hacked by” message.
Cryptocurrency Mining. Websites that use cryptocurrency mining (“cryptojacking”) technology without seeking the user’s permission.
Malware Command and Control (C2). Malware C2 is when infected machines report information back to a particular URL or check a URL for updates. Attackers use C2 internet servers to communicate with, and remotely manipulate infected machines to execute cyber attacks like Distributed Denial of Service (DDoS), Intellectual Property (IP) data exfiltration, ransomware, etc. Examples of C2 channels used today include TCP, HTTP, HTTPS, DNS, DoH, ICMP, FTP, IMAP, MAPI, or SMB (great reference: https://www.thec2matrix.com/).
Malware Distribution Point. Web pages that host viruses, exploits, and other malware are considered Malware Distribution Points. Web Analysts may use this category if their anti-virus program triggers on a particular website.
Phishing/Fraud. Web pages that impersonate other web pages, usually with the intent of stealing passwords, credit card numbers, or other information. Also includes web pages that are part of scams such as a “419” scam, where a person is convinced to hand over money with the expectation of a big payback that never comes. Examples include cons, hoaxes, and scams.
Spam URLs. URLs that frequently occur in spam messages.
Spyware & Questionable Software. Software that reports information back to a central server, such as spyware or keystroke loggers. Also includes software that may have legitimate purposes, but that some people may object to having on their system.
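
To make the Malware C2 category more concrete: infected machines that check a URL for updates often do so on a fairly regular schedule, so one simple heuristic is to look for near-constant gaps between connections to the same destination. The sketch below is illustrative only. The function name, the thresholds, and the regular-interval assumption itself are hypothetical and are not taken from any particular product.

```python
from statistics import mean, pstdev

def looks_like_beacon(timestamps, min_events=5, max_jitter_ratio=0.1):
    """Return True when connection times (in seconds) are suspiciously regular.

    Hypothetical heuristic: the spread of the gaps between connections must be
    small compared with the average gap.
    """
    if len(timestamps) < min_events:
        return False  # too few observations to judge
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    average_gap = mean(gaps)
    if average_gap <= 0:
        return False
    return pstdev(gaps) / average_gap <= max_jitter_ratio

# Check-ins roughly every 60 seconds versus irregular human browsing.
regular = [0, 60, 120, 181, 240, 300]
irregular = [0, 5, 200, 230, 900, 905]
print(looks_like_beacon(regular), looks_like_beacon(irregular))
```

Real C2 traffic may deliberately randomize its timing, so a check like this would only ever be one signal among many.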

Tools Used for Malicious Threat Detection

Just as there is a broad range of use cases for malicious threat detection, there are also numerous options for security tools that can offer varying types of protection throughout the different access points of an organization’s network. Below are some of the most commonly used to address the primary security layers.

Firewalls. Since malicious attacks begin with the threat actors gaining access to a network, a firewall is often the first line of defense. The firewall blocks traffic on certain ports from reaching the network, and uses behavioral and/or rule-based detections to stop an attacker from gaining access.
Network Intrusion Detection System (NIDS). NIDS provide another layer of security by using behavioral and rule-based detection for potential malicious threats at the network level. Additionally, NIDS are equipped to provide the data granularity necessary for cyber analysts to detect an attack so that they can respond with a targeted approach to block an attacker from accessing the network.
Endpoint Detection and Response (EDR). EDRs provide crucial visibility for network security teams to detect threats, like ransomware, if the attackers manage to evade the other security layers and penetrate the network. EDRs provide host-based detection, investigation, and remediation against malware to contain threats before the nefarious actions of an attacker can be fully executed.
Logging. Logging network traffic enables security teams to monitor the network for known Indicators of Compromise (IOCs) and activities that may indicate potential malicious threats, like abnormal network activity, sudden spikes in traffic, or suspicious traffic patterns. Traffic logs can also help identify new IOCs — including malicious URLs, domains, or IPs — or suspicious activity on endpoints to aid security teams with threat hunting activities. Additionally, logging helps identify the scope of an attack as it can be used to reconstruct the events leading up to an incident.
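
As a rough illustration of the logging item above, the sketch below scans traffic logs for known IOCs and flags minutes with a sudden spike in requests. The log format (timestamp, source, destination separated by spaces), the indicator values, and the spike factor are all assumptions made for this example.

```python
from collections import Counter
from statistics import median

# Hypothetical indicators; a real deployment would load these from a threat feed.
KNOWN_IOCS = {"203.0.113.7", "bad.example.net"}

def find_ioc_hits(log_lines, iocs=KNOWN_IOCS):
    """Return (line_number, indicator) pairs for lines that mention a known IOC."""
    hits = []
    for number, line in enumerate(log_lines, start=1):
        fields = line.split()
        for indicator in sorted(iocs):
            if indicator in fields:
                hits.append((number, indicator))
    return hits

def find_traffic_spikes(log_lines, factor=3.0):
    """Return the minutes whose request count exceeds `factor` times the median."""
    # The first 16 characters of an ISO timestamp identify the minute.
    per_minute = Counter(line.split()[0][:16] for line in log_lines if line.strip())
    if not per_minute:
        return []
    baseline = median(per_minute.values())
    return sorted(minute for minute, count in per_minute.items() if count > factor * baseline)

SAMPLE_LOG = [
    "2024-05-01T10:00:05 10.0.0.4 example.org",
    "2024-05-01T10:01:10 10.0.0.4 example.org",
    "2024-05-01T10:02:00 10.0.0.9 bad.example.net",
    "2024-05-01T10:02:01 10.0.0.9 203.0.113.7",
    "2024-05-01T10:02:02 10.0.0.9 203.0.113.7",
    "2024-05-01T10:02:03 10.0.0.9 203.0.113.7",
]
print(find_ioc_hits(SAMPLE_LOG))
print(find_traffic_spikes(SAMPLE_LOG))
```

The same line numbers that produce hits can later be used to reconstruct the sequence of events around an incident.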

Malicious Threat Detection Methodologies

Signature-Based Detection

Signature-based malware detection is the most commonly used cybersecurity method. It involves scanning files and comparing their attributes to a database of known malware signatures. When a match is found, the file is identified as malicious and can be blocked or removed. While it is reliable for detecting known malware, signature-based detection has limited effectiveness against emerging threats and fileless attacks, which require additional behavioral analysis.
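
A minimal sketch of the idea, assuming the simplest possible signature: a SHA-256 hash of the whole file. The signature database and the family name are made up for illustration, and real products typically also match byte patterns rather than whole-file hashes alone.

```python
import hashlib

def sha256_of(data: bytes) -> str:
    """Hex digest used here as the file's signature."""
    return hashlib.sha256(data).hexdigest()

# Hypothetical signature database mapping a hash to a malware family name.
KNOWN_SAMPLE = b"pretend this is a known malware sample"
SIGNATURES = {sha256_of(KNOWN_SAMPLE): "Demo.Family.A"}

def scan(data: bytes, signatures=SIGNATURES):
    """Return the matched family name, or None when no signature matches."""
    return signatures.get(sha256_of(data))

print(scan(KNOWN_SAMPLE))         # known sample matches
print(scan(KNOWN_SAMPLE + b"!"))  # one extra byte and the hash no longer matches
```

The second call shows the limitation described above: a trivially modified sample has a different hash, so a pure signature lookup misses it.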

Heuristic-Based Detection

Heuristic-based malicious detection examines code for suspicious properties and commands that could indicate malicious intent. This approach uses a variety of rules and/or algorithms to identify potentially malicious behavior rather than relying on specific malware signatures. Heuristic-based scanning is used to identify and detect new and emerging malware that is not already a known threat. Heuristic-based detection is commonly used along with signature-based detection to further enhance detection rates and decrease gaps in security. However, it should be noted that along with enhanced detection rates, heuristic-based detection has the potential for reporting false positives and often misses malware that uses obfuscation techniques to evade detection in the first place.
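
The rule-driven approach can be sketched as a weighted checklist applied to page or script content. The rules, weights, and threshold below are hypothetical and chosen only to show the mechanism, not to represent a production rule set.

```python
import re

# Hypothetical rules: (description, pattern, weight).
RULES = [
    ("dynamic code execution", re.compile(r"\beval\s*\("), 3),
    ("long base64-like blob", re.compile(r"[A-Za-z0-9+/]{80,}={0,2}"), 2),
    ("hidden iframe", re.compile(r"<iframe[^>]*display\s*:\s*none", re.I), 3),
    ("write of unescaped content", re.compile(r"document\.write\s*\(\s*unescape", re.I), 2),
]

def heuristic_score(content, rules=RULES):
    """Return (total_weight, names_of_matched_rules) for the given content."""
    matched = [(name, weight) for name, pattern, weight in rules if pattern.search(content)]
    return sum(weight for _, weight in matched), [name for name, _ in matched]

def is_suspicious(content, threshold=4):
    """Flag content whose combined rule weight reaches the threshold."""
    score, _ = heuristic_score(content)
    return score >= threshold

page = '<iframe style="display:none" src="http://x.example/"></iframe><script>eval(payload)</script>'
print(heuristic_score(page), is_suspicious(page))
```

Because no single rule reaches the threshold on its own, a legitimate page that merely calls eval() is not flagged, which is one simple way to trade detection rate against false positives.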

Behavior-Based Detection

Behavior-based malicious detection evaluates an object’s intended actions before it executes that behavior to identify malicious or suspicious activity. Behavior-based detection may examine requests to access specific files, processes, connections, or services — including low-level code hidden by rootkits. This detection methodology includes Static Malware Analysis and Dynamic Malware Analysis.

Static Malware Analysis

Static malware analysis is a methodology for evaluating malware by examining its code and structure without executing it. This approach is useful for identifying dangerous capabilities within the code of a malware sample and can help malware analysts gain insights into the functionality of the sample. However, it has limitations, since malware authors often obfuscate their code to bypass the static detection and emulation technologies used by security products.

Depending on the sophistication of the malware, static analysis may be all that an analyst needs to determine the functionality of a particular sample. It is important to note that static analysis is not just reviewing the code. If the malware author has packed, encoded, or otherwise obfuscated their code, the analyst must solve the puzzle with static analysis. Crowdsourced platforms like VirusTotal, Malpedia and MalShare are often used by malware analysts to aid in the assessment.
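
Two common first steps in static analysis can be sketched without ever running the sample: pulling out printable strings and measuring byte entropy, since packed or encrypted content tends to look close to random. The helper names and the 7.0 bits-per-byte threshold are assumptions for illustration.

```python
import math
import re
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Bits per byte: 0.0 for a single repeated byte, 8.0 for evenly spread bytes."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())

def printable_strings(data: bytes, min_length=6):
    """Extract runs of printable ASCII, similar in spirit to the Unix strings tool."""
    pattern = re.compile(rb"[\x20-\x7e]{" + str(min_length).encode() + rb",}")
    return [match.decode("ascii") for match in pattern.findall(data)]

def looks_packed(data: bytes, threshold=7.0):
    """Hypothetical rule of thumb: very high entropy hints at packing or encryption."""
    return shannon_entropy(data) >= threshold

blob = b"\x00\x01http://c2.example/gate.php\x00\xffabc\x00"
print(printable_strings(blob))
print(looks_packed(bytes(range(256)) * 4), looks_packed(b"A" * 1000))
```

High entropy is only a hint that the analyst has more unpacking work to do before the code itself can be reviewed.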

Dynamic Malware Analysis

If the static analysis does not yield results, the next step is dynamic analysis which is performed to examine malicious behavior as it executes. Dynamic analysis is performed in two stages: basic and advanced.

Basic Dynamic Malware Analysis

Basic dynamic analysis assesses how malware interacts with the victim system by performing a controlled detonation on a comparable system. Depending on information gleaned from static analysis, this could be a virtual machine (VM) or physical hardware, as some malware is built to check for VMs. As with static analysis, the answers to the malware riddle may come from basic dynamic analysis; if not, the malware analyst must go deeper.
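
One simple way to observe how a sample interacts with a system is to compare the file system before and after the detonation. The sketch below is a hypothetical helper for use inside an isolated analysis machine; it only records and compares file hashes and does not run anything itself.

```python
import hashlib
import os

def snapshot(root):
    """Map each file path under `root` (relative to it) to the SHA-256 of its contents."""
    state = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            with open(path, "rb") as handle:
                digest = hashlib.sha256(handle.read()).hexdigest()
            state[os.path.relpath(path, root)] = digest
    return state

def diff_snapshots(before, after):
    """List the files created, deleted, or modified between two snapshots."""
    return {
        "created": sorted(set(after) - set(before)),
        "deleted": sorted(set(before) - set(after)),
        "modified": sorted(p for p in before.keys() & after.keys() if before[p] != after[p]),
    }
```

In practice an analyst would call snapshot() on the monitored directory, detonate the sample, call snapshot() again, and pass both results to diff_snapshots(). File changes are only one part of the picture, as registry, process, and network activity would need their own monitoring.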

Advanced Dynamic Malware Analysis

In cases where the malware sample is well written and evades basic dynamic analysis techniques, advanced dynamic malware analysis is used. As an example, a malware sample may use the Domain Name System (DNS) to check whether there is an “active” internet connection and then self-delete if it cannot reach 8.8.8.8 (Google DNS). The analyst would then need to set up a simulation infrastructure to “fool” the sample into thinking it is actually on the internet so that it gives up its secrets. This technique is valuable when the analyst is looking to understand behavioral aspects/heuristics of the malware as a whole, not just on the victim system but in the network as well. While dynamic analysis may seem easy, it takes a specialized skill set to prevent a live malware sample from getting out of hand.

Machine Learning Based Detection

Machine learning based malicious detection refers to the use of supervised and/or unsupervised machine learning algorithms to detect and identify malware. This approach involves training a model using pre-categorized malicious websites and their corresponding features, which can include low-level data such as images or text. The model is then tested against new malware samples to predict the likelihood of the sample being malicious. Human supervised machine learning is essential to constantly evaluate and improve the model’s accuracy.

Supervised Machine Learning

Supervised machine learning involves collecting pre-categorized websites (training labels) along with the corresponding HTML and images for those websites (training features). We then “train” a model to create a mapping from the large number of features to the labels. Feedback is provided to the supervised model in the form of a loss function, where the model is penalized for incorrect answers and rewarded for correct answers. In this way, the machine learning algorithm slowly gets better as more and more labeled data enters the model.
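
The train-against-a-loss-function loop can be sketched with a tiny logistic regression written in plain Python. The two numeric features (whether a page has a password form, and the fraction of its links that point off-site) and the six labeled rows are invented for illustration. A real model would use far richer features and far more data.

```python
import math

def predict(weights, bias, features):
    """Estimated probability that a site is malicious (sigmoid of a weighted sum)."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(weights, bias, rows, labels):
    """Average penalty: confident wrong answers cost the most."""
    eps = 1e-12
    total = 0.0
    for x, y in zip(rows, labels):
        p = min(max(predict(weights, bias, x), eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(rows)

def train(rows, labels, epochs=2000, learning_rate=0.5):
    """Gradient descent on the log loss, starting from all-zero weights."""
    weights, bias = [0.0] * len(rows[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for x, y in zip(rows, labels):
            error = predict(weights, bias, x) - y
            grad_w = [g + error * xi for g, xi in zip(grad_w, x)]
            grad_b += error
        weights = [w - learning_rate * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(rows)
    return weights, bias

# Hypothetical training set: [has_password_form, fraction_of_external_links].
ROWS = [[1, 0.9], [1, 0.8], [1, 0.7], [0, 0.2], [0, 0.1], [1, 0.1]]
LABELS = [1, 1, 1, 0, 0, 0]  # 1 = malicious, 0 = benign

weights, bias = train(ROWS, LABELS)
print(log_loss([0.0, 0.0], 0.0, ROWS, LABELS), log_loss(weights, bias, ROWS, LABELS))
```

The two printed numbers are the loss before and after training, and the drop between them is the “slowly gets better” effect described above.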

Unsupervised Machine Learning

On the other hand, unsupervised learning uses only the training features, without labels, to determine useful trends and “clusters” in the data. This method can work well if you have a large amount of data and need a place to start; however, models will be much less accurate. And yes, it is exhaustive work to analyze and label the number of websites required to achieve a state-of-the-art model.
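
Clustering without labels can be illustrated with a small k-means implementation. The feature pairs below (number of redirects, fraction of obfuscated script) are invented, and picking evenly spaced points as the starting centroids is a simplification that keeps the example deterministic.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=20):
    """Group points into k clusters; returns (assignments, centroids)."""
    step = max(k - 1, 1)
    # Simplified, deterministic start: evenly spaced points from the input.
    centroids = [list(points[i * (len(points) - 1) // step]) for i in range(k)]
    assignments = []
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids

# Hypothetical unlabeled sites: (number_of_redirects, fraction_of_obfuscated_script).
POINTS = [(0, 0.05), (1, 0.10), (0, 0.00), (5, 0.90), (6, 0.80), (5, 0.95)]
assignments, centroids = kmeans(POINTS, k=2)
print(assignments)
```

The algorithm separates the two groups, but nothing tells it which cluster is the malicious one. That interpretation still requires a human or labeled data.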

Human Supervised Machine Learning (HS/ML)

Machine learning models must be constantly evaluated against human judgment, and vice versa, to make sure that they are always up to date and accurate. In HS/ML, when a human finds that the algorithm has made a mistake, the data is automatically incorporated back into the system so that the model in question can be retrained to avoid such mistakes in the future. A constant process of monitoring, flagging, and retraining is key to building a high degree of accuracy and minimizing the false positives that can plague many security tools.

Detections Using Natural Language Processing

Natural Language Processing (NLP) is an area within Artificial Intelligence that is specifically focused on interactions between machines and humans, using software that can interact with human language to extract details like sentiment, named people or places, intent, topics, etc. NLP is leveraged for malicious detection to analyze large volumes of data and detect patterns that can be missed by using traditional security tools. The NLP algorithms analyze the language used in email communications, in website content or on social media platforms to look for signs of malicious activity — like botnets, spam, fake accounts, etc. — to identify patterns and trends that may indicate emerging threats or new attack vectors.

Neural Machine Translation (NMT) is a subfield of NLP that focuses on using artificial neural networks to translate text from one language to another. As the threat landscape is not bound by language or geographical regions, the use of NMT to analyze and translate large volumes of data into multiple languages is critical to malicious detection. Security analysts can leverage NMT to translate communications between attackers, website content, social media posts and other digital content to identify patterns in language that may be indicative of malicious activity.

Crowdsourced Detection

Crowdsourced malicious detection is exactly what it sounds like. It involves leveraging the greater community of individuals and cybersecurity professionals to collectively examine or evaluate suspicious files for malicious activity to help identify and respond to potentially malicious threats. Crowdsourced malicious detection can include everything from encouraging individual end users to report suspicious activity, to leveraging technical details about known malware shared through platforms like VirusTotal, Malpedia and MalShare.

Hybrid Detection

Hybrid malicious detection is a combination of some or all of the different types of methodologies mentioned above. As each methodology has pros and cons, a hybrid approach can maximize coverage and accuracy when it comes to protection against malicious threats. The degree to which different security vendors choose to hybridize their approach can vary greatly so it’s absolutely critical to understand exactly which of the above methodologies are used.
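
One way to picture a hybrid approach is as a weighted combination of the scores produced by each methodology. The sketch below is an assumption about how such a combination could look, not a description of how any vendor does it. The method names, weights, threshold, and the rule that a signature hit decides the verdict on its own are all hypothetical.

```python
# Hypothetical relative weights for each detection methodology.
DEFAULT_WEIGHTS = {"signature": 2, "heuristic": 1, "behavior": 1, "ml": 1}

def hybrid_verdict(scores, weights=DEFAULT_WEIGHTS, threshold=0.5):
    """Return (combined_score, is_malicious) from per-method scores in [0, 1]."""
    # A signature hit identifies known malware, so it decides the verdict alone.
    if scores.get("signature", 0.0) >= 1.0:
        return 1.0, True
    # Only average over the methods that actually produced a score.
    used = [name for name in weights if name in scores]
    total_weight = sum(weights[name] for name in used)
    if total_weight == 0:
        return 0.0, False
    combined = sum(scores[name] * weights[name] for name in used) / total_weight
    return combined, combined >= threshold

print(hybrid_verdict({"signature": 0.0, "heuristic": 0.5, "behavior": 1.0, "ml": 1.0}))
print(hybrid_verdict({"heuristic": 0.25, "ml": 0.25}))
```

In the first call no signature matches, yet the behavioral and machine learning scores are strong enough to push the combined result to the threshold, which is the kind of coverage gap a hybrid approach is meant to close.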

Use Cases for Malicious Threat Detection

Malicious threat detection is critical for cloud service providers, businesses, and security vendors because these are the threats that can compromise networks, leading to data breaches, ransomware attacks, malware infections, etc.

DNS and Web Filtering. Block high risk or potentially dangerous DNS connections to malicious, phishing, and non-sanctioned content domains. Premium solutions offer filtering with protection and support at the domain, page, and full-path URL levels.
Enrich Security Tools. Enrich and automate SIEM, SOAR, and other security platforms with malicious threat intelligence data to improve efficiency and speed of response.
SWG, FWaaS, CASBs. Secure cloud native environments with full-path URL phishing and malicious datasets to protect users, networks and devices in the modern hybrid workforce.
SASE. Whether you use a URL database, threat intelligence feeds, or both, having full-path malicious URL visibility and blocking capabilities powers maximum protection for SASE security solutions.
XDR/MDR. Augment in-house threat intelligence ingestion, aggregation and curation with better and faster threat detections to power your XDR/MDR offerings.
Endpoint Security. Stop threats at the endpoints and IoT devices with premium phishing and malicious URL datasets.
Browser Security/Remote Browser Isolation. Allow users at any location to safely browse the internet while blocking access to malicious or compromised websites.
Email/SMS Security. Malicious threat prevention for endpoint, email, and text message security.
Cyber Threat Intelligence. Threat intelligence data on malicious URLs and IOCs enable defenders to block adversaries at the initial access point for comprehensive malicious threat protection.
Threat Research. Understand the contextual relevance of potential threats with key metadata that map to the malicious threat signals in your environment. Malicious threat intelligence data can be used for research, forensic analysis, historical lookback, and more.
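
The difference between domain, page-level, and full-path URL filtering mentioned in several of the use cases above can be sketched as three lookups of increasing precision. The blocklist entries and the function name are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical blocklist entries at three levels of granularity.
BLOCKED_DOMAINS = {"malware.example"}
BLOCKED_PAGES = {"shared-host.example/login.html"}
BLOCKED_FULL_PATHS = {"cdn.example/files/get?id=123"}

def classify_url(url):
    """Return which level of the blocklist, if any, matches the URL."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    page = host + parts.path
    full_path = page + ("?" + parts.query if parts.query else "")
    # Domain level: check the host and each parent domain.
    labels = host.split(".")
    for i in range(len(labels)):
        if ".".join(labels[i:]) in BLOCKED_DOMAINS:
            return "blocked: domain"
    # Full-path level: host, path, and query string must all match.
    if full_path in BLOCKED_FULL_PATHS:
        return "blocked: full path"
    # Page level: host and path match, regardless of the query string.
    if page in BLOCKED_PAGES:
        return "blocked: page"
    return "allowed"

for url in (
    "http://www.malware.example/anything",
    "https://shared-host.example/login.html?ref=1",
    "https://shared-host.example/index.html",
    "https://cdn.example/files/get?id=123",
    "https://cdn.example/files/get?id=124",
):
    print(url, "->", classify_url(url))
```

Page-level and full-path matching make it possible to block a single malicious page or file on an otherwise legitimate host without blocking the whole domain.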

AV-TEST Institute registers over 450,000 new malicious programs (malware) and potentially unwanted applications (PUA) every day. As we stated in the beginning of this post, becoming the target of a cyber attack at one point or another is no longer a question of ‘if’, but ‘when’. And when you become the target, your ability to defend yourself or your organization is entirely dependent upon the quality — accuracy, coverage, veracity — of a security vendor’s malicious detection capabilities. Understanding the basics of malicious detection will help you find the right level of protection. While the best malicious threat protection comes at a cost, it is, without question, a vital investment.