ML technologies enable the creation of intelligent systems that can adapt to new types of attacks and learn from past incidents. This improves the efficiency of security teams, enables rapid threat response, and minimizes potential risks. However, cybercriminals also actively use machine learning, which requires appropriate defense methods.


Machine learning in cybersecurity
Machine learning has revolutionized cybersecurity. In the past, cybersecurity relied on rule-based protection systems and analysts. However, with the advent of machine learning, security incident detection and response have become much more effective. By analyzing vast amounts of data and learning from it, ML algorithms can identify patterns and anomalies that indicate potential threats and take measures to prevent or mitigate them.
ML at Positive Technologies
We strive to ensure our products automatically prevent, detect, and respond to threats. ML models in Positive Technologies products continuously learn from our expertise and from user data, including through self-learning. Thanks to machine learning, security teams are freed from repetitive tasks, analysts gain valuable insights for threat hunting, and managers can effectively prioritize fixing infrastructure weaknesses.
We have developed ML models that detect hackers' most dangerous tactics:
Why we use ML technologies in products
Protection systems begin by collecting raw data, such as logs, traffic, and executable files. This information must be standardized to detect attacks, identify security incidents, and conduct investigations. Machine learning should be applied at every stage, from working with raw data to creating incident reports.
Key vectors of ML development at Positive Technologies
ML in Positive Technologies products
MaxPatrol SIEM
Expert rules in SIEM systems help track suspicious behavior. However, many attack scenarios cannot be described or detected this way. ML models handle this task effectively.
The BAD (Behavioral Anomaly Detection) module in MaxPatrol SIEM acts as a "second-opinion" system that improves attack detection effectiveness through alternative event analysis methods and by assessing each trigger’s reliability on a 100-point scale. BAD also independently detects targeted attacks, serving as a second layer of defense.
The 49 ML models are divided into several types and subtypes:
- Process activity:
  - Process execution activity
  - Network process activity
  - Process access to local pipes
  - Relationships between processes on different hosts
- Access activity:
  - Network share access
  - Network pipe access
The BAD module incorporates ML model verdicts and correlation rules, enabling security teams to make prompt and accurate decisions when analyzing triggers.
In MaxPatrol SIEM, the BAD module:
- Independently detects targeted attacks and previously unknown anomalies.
- Collects event and user data, assigns risk scores, and provides alternative assessments based on its algorithms.
- Helps analysts make faster security decisions.
- Enables rapid detection of previously unknown threats that are invisible in fragmented data streams.
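The 100-point reliability scoring described above can be illustrated with a minimal sketch. This is not the actual BAD implementation: the detector verdicts, the weights, and the aggregation rule are all invented for the example.

```python
# Illustrative sketch (not the actual BAD algorithm): combine several
# hypothetical model verdicts (each in [0, 1]) into a single 0-100 score
# that an analyst can use to prioritize a SIEM trigger.

def risk_score(verdicts, weights=None):
    """Weighted average of model verdicts, scaled to a 100-point scale."""
    if weights is None:
        weights = [1.0] * len(verdicts)
    total = sum(w * v for w, v in zip(weights, verdicts))
    return round(100 * total / sum(weights))

# Three hypothetical detectors examined the same trigger; the first one
# is trusted twice as much as the others.
score = risk_score([0.9, 0.4, 0.7], weights=[2.0, 1.0, 1.0])
print(score)  # a high score means the trigger deserves prompt attention
```

Aggregating several independent "second opinions" into one bounded score is what lets analysts rank triggers instead of reviewing every raw verdict.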
PT NAD
In PT NAD, machine learning helps:
- Detect anomalous node activity using profiling rules.
- Identify applications hiding from network traffic analysis systems.
User profiling rules (UPR)
UPRs allow you to configure filters and monitor the behavior of network participants within the traffic of interest. Machine learning identifies anomalies in traffic and automates the decision-making process for identifying malicious activity. You can create your own filters or use baseline rules developed in collaboration with the PT Expert Security Center (PT ESC).
In each filter, you can specify a single feature (such as the number of bytes sent or unique connections), group data by object (client, server, or client-server pair), or select data for the entire network, as well as define a time interval. An anomaly is defined as exceeding a specific threshold in one or multiple time series. The ML model has three sensitivity levels (low, medium, and high) and can trigger on both minor and significant deviations.
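The thresholding idea behind these filters can be sketched as follows. The k-sigma rule and the mapping of sensitivity levels to multipliers are assumptions made for the example, not PT NAD's actual algorithm.

```python
# Hedged sketch of threshold-based anomaly detection over one traffic
# time series (e.g., bytes sent per interval), in the spirit of the UPR
# mechanism described above. The sensitivity-to-threshold mapping is an
# illustrative assumption.
import statistics

SENSITIVITY = {"high": 2.0, "medium": 3.0, "low": 4.0}  # stdevs above mean

def find_anomalies(series, sensitivity="medium"):
    mean = statistics.fmean(series)
    stdev = statistics.pstdev(series)
    threshold = mean + SENSITIVITY[sensitivity] * stdev
    return [i for i, value in enumerate(series) if value > threshold]

bytes_sent = [120, 130, 110, 125, 135, 128, 950, 122]  # one obvious spike
print(find_anomalies(bytes_sent, "high"))  # -> [6]
```

A "high" sensitivity corresponds to a lower threshold, so the model triggers on smaller deviations; "low" sensitivity requires a larger spike.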
PT Sandbox
The ML model in PT Sandbox performs part of the behavioral analysis of files. Dynamic analysis involves running files in a virtual environment, logging their behavior, and analyzing the resulting log. Each running process leaves behind a sequence of system calls (a trace) through which it interacts with the operating system. The ML team at Positive Technologies has analyzed numerous malicious and clean traces to identify sequences characteristic of malware, including network requests to the internet, file operations, and registry accesses. These calls are reduced to a final feature vector that is processed by the ML model, which then classifies the behavior as "bad" or "good."
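The reduction of a system-call trace to a feature vector can be sketched with call n-grams. The trace, the call names, and the vocabulary below are made up for illustration; the real PT Sandbox pipeline uses a far richer feature set.

```python
# Illustrative sketch: turn a system-call trace into a fixed-length
# feature vector by counting sliding n-grams of calls. Vocabulary and
# trace contents are hypothetical examples.
from collections import Counter

def trace_to_features(trace, n=2, vocabulary=None):
    """Count n-grams of consecutive calls; projecting onto a fixed
    vocabulary makes every trace map to a vector of the same length."""
    ngrams = Counter(tuple(trace[i:i + n]) for i in range(len(trace) - n + 1))
    if vocabulary is None:
        return ngrams
    return [ngrams.get(g, 0) for g in vocabulary]

trace = ["CreateFile", "WriteFile", "RegSetValue", "CreateFile", "WriteFile"]
vocab = [("CreateFile", "WriteFile"), ("WriteFile", "RegSetValue")]
print(trace_to_features(trace, vocabulary=vocab))  # -> [2, 1]
```

A vector like this is what a downstream classifier consumes to label the behavior "bad" or "good".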
These ML models are built on a specific technology stack: PT Sandbox itself is written in Python, the trained model is serialized in the ONNX format, and MLflow is used for experiment tracking and as an artifact repository. The model is trained on a daily stream of examples plus a reference dataset that excludes false positives, which delivers highly accurate detection results.
Tasks the ML model in PT Sandbox helps handle:
- Detecting anomalous subprocess chains. A large number of branching process sequences can be legitimate on its own. However, the number of nodes, the nesting depth, and the repetition or uniqueness of process names can be analyzed effectively only by the ML model.
- Detecting non-standard values of call parameters. In most cases, analysts focus on significant function parameters when searching for malware. The ML model effectively analyzes the remaining parameters.
- Investigating atypical sequences of function calls. Sometimes individual functions or combinations of functions may appear benign, but their sequence is not found in legitimate software. An analyst would need extensive experience to notice such a pattern manually. The ML model detects these patterns through classification using features that were not predefined as indicators of maliciousness.
The main task of ML in PT Sandbox is to continuously improve the accuracy of verdicts when determining if an object is malicious. By analyzing over 8,500 features of object behavior, the ML model ensures high detection quality that is unattainable for systems that use standard malware detection methods.
MaxPatrol VM
Assessing the trending potential of vulnerabilities (CVEs) based on the number of mentions in databases (a statistical approach) has a significant drawback: there is a risk that a vulnerability will be recognized as trending only when it is already being actively exploited.
The machine learning approach includes the following stages:
- The database of publications about CVEs is updated regularly.
- Once a day, the model computes predictions for vulnerabilities based on a dozen parameters, including publication time, number of comments, reposts, likes, post text, and reactions.
- The top 20 predicted CVEs are sent to experts for analysis.
The ML model is trained on both textual (post content) and quantitative features (such as number of subscribers, reactions, and vulnerability mentions) and predicts trending vulnerabilities before the number of their mentions exceeds a threshold value. Experts provide the final evaluation of the model’s performance using quality metrics.
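The ranking step can be sketched as scoring each CVE from its quantitative signals and sending the top entries to experts. The signal names, weights, and logistic scoring below are invented for the example; the real model also uses the post text itself.

```python
# Toy sketch of trending-CVE ranking: score each vulnerability from
# hypothetical quantitative signals and pick the top N for expert review.
import math

WEIGHTS = {"subscribers": 1e-5, "reactions": 0.01, "mentions": 0.5}

def trend_score(signals):
    """Logistic squashing of a weighted signal sum into (0, 1)."""
    z = sum(WEIGHTS[name] * value for name, value in signals.items())
    return 1 / (1 + math.exp(-z))

cves = {
    "CVE-2024-0001": {"subscribers": 50_000, "reactions": 120, "mentions": 9},
    "CVE-2024-0002": {"subscribers": 3_000, "reactions": 4, "mentions": 1},
}
top = sorted(cves, key=lambda c: trend_score(cves[c]), reverse=True)[:20]
print(top[0])  # the most likely trending CVE goes to experts first
```

The point of a learned score, as opposed to a raw mention count, is that it can rank a CVE highly before its mention count crosses any fixed threshold.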
Using the ML model in MaxPatrol VM allows experts to efficiently and promptly determine which CVEs require attention and to rapidly deliver information about trending vulnerabilities into the product.
PT Application Firewall
Products that analyze HTTP traffic receive a large volume of payloads, which may include command shells for remote web server management. In PT Application Firewall, ML models that detect web shells separate legitimate data from malicious data. One model prevents illegitimate scripts from loading, while another detects web shell activity. These models are trained using data on web shells from open sources and examples encountered during Standoff cyberbattles. This diversity increases detection coverage and makes it possible to detect new web shells that cannot be identified using a rule-based approach.
To evaluate detection accuracy, the system uses holdout sets prepared by experts. The initial quality assessment occurs during CI/CD. After model training, a continuous machine learning (CML) process is launched, allowing developers to see the difference in model performance on holdout data within a merge request.
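The holdout comparison surfaced in a merge request can be sketched as evaluating the previous and candidate models on the same expert-prepared set and reporting the metric delta. The models and samples below are toy stand-ins, not the actual PT Application Firewall pipeline.

```python
# Sketch of a CML-style holdout check: compare old and new models on one
# expert-prepared holdout set. Labels: 1 = web shell, 0 = benign.

def accuracy(model, samples):
    return sum(model(x) == y for x, y in samples) / len(samples)

holdout = [("benign-1", 0), ("shell-1", 1), ("shell-2", 1), ("benign-2", 0)]
old_model = lambda x: 1 if "shell" in x else 0                      # baseline
new_model = lambda x: 1 if "shell" in x or x == "benign-2" else 0   # candidate

delta = accuracy(new_model, holdout) - accuracy(old_model, holdout)
print(f"accuracy delta: {delta:+.2f}")  # negative -> candidate regressed
```

Surfacing the delta directly in the merge request lets developers catch a regression before the retrained model ships.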
What tasks does ML perform in PT Application Firewall?
- Detecting malicious web shells in requests and responses. The ML model determines the probability that an uploaded file is malicious by comparing it with a threshold value; a convolutional neural network (CNN) is used for classification.
- Detecting shellcode generated by the Metasploit Framework in various formats and encodings. These models are trained on payloads created with the Metasploit Framework and on data from the Microsoft Malware Prediction competition.
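The probability-versus-threshold decision described above can be sketched as follows. The stub classifier and the threshold value are assumptions for illustration only, not the CNN actually used in PT Application Firewall.

```python
# Minimal sketch of the blocking decision: compare the model's estimated
# probability that a payload is a web shell against a threshold.

THRESHOLD = 0.8  # illustrative cut-off, not the product's real setting

def predict_webshell_probability(payload: bytes) -> float:
    """Stand-in stub for the real classifier: flags eval() in payloads."""
    return 0.95 if b"eval(" in payload else 0.05

def should_block(payload: bytes) -> bool:
    return predict_webshell_probability(payload) >= THRESHOLD

print(should_block(b"<?php eval($_POST['cmd']); ?>"))  # -> True
print(should_block(b"<html>hello</html>"))             # -> False
```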
Thinking about the best way to protect your company?
Contact us.
During the consultation we'll propose a solution precisely tailored to your organization.


