Monotonic Malware Classifiers
Microsoft has deployed their first monotonic classifier to detect malware attacks since late 2018. The attacker cannot evade the model by adding redundant features. It is inspired by the paper from Incer et al. on robust malware detection using monotonic classifier, based on the following idea:
If we select features that capture malicious functionalities well, it is useful to train a monotonic classification function that has increasing (or nondecreasing) prediction scores as the feature values increase. Let’s say that the prediction score falls between 0 and 1, and any score > 0.5 is malicious. To generate an evasive malware variant from a given malware seed with > 0.5 score, the attacker will have to decrease feature values to decrease the score, which can potentially remove malicious functionalities.
Wouldn’t it be great to enforce that the malware author has to delete some malicious functionality to successfully evade the classifier? To study the robustness of monotonic malware classifiers, I did some experiments on monotonic PDF malware detection in the USENIX Security ’20 paper.
Training monotonic PDF malware classifier
I used the monotonic constraints from XGBoost to train tree ensemble models. Here are some figures from the XGBoost documentation to visualize the monotonic prediction function for a feature.
I specified every feature to be monotonically increasing, and trained monotonic PDF malware classifier. The monotonic PDF malware classifier has 99% accuracy and 1.78% false positive rate, while achieving monotonicity for all classification features. Previously, I wrote another post on how XGBoost enforces such monotone constraint. In short, the monotonicity property is provably correct. Along every feature dimension, the output of the malware classifier looks like the above figure on the right side when the feature value changes. Specifically, the classification features are the path to every node in a parsed PDF tree. For example, a PDF malware may have the following features:
Based on the training set, I have a total of 3514 such paths as features. Every feature is a binary value to indicate whether the path exists in the PDF. If we increase an arbitrary path from 0 to 1, and fix the values for the rest 3513 paths, the monotonic classifier should have the same or higher prediction score, i.e., ∀i, f(x₁, x₂, …, xi = 1, … ) ≥ f(x₁, x₂, …, xi = 0, …).
Is the monotonic PDF malware classifier more robust against evasion attacks?
To evaluate how robust the monotonic classifier really is, I used EvadeML to generate real evasive PDF malware variants against the classifier. EvadeML uses a genetic evolutionary algorithm to evade the classifier. It uses mutation operations including insertion, deletion, and replacement of PDF objects, guided by a fitness function (the higher the classification score, the lower the fitness), to evolve a population of malware variants that retain malicious functionalities. The evolutionary attack succeeds when any of the malicious variants is predicted as benign. I ran the EvadeML attack over the benchmark set of 500 malware seeds with malicious network signatures.
The first thing I noticed is, EvadeML is very effective at adapting the attack strategy to target the monotonicity property using deletion operations. Whenever the mutation chooses to insert a PDF object, the fitness score decreases. Thus, the variants with higher fitness scores are those generated with only deletion operations, and these variants are more likely to survive many generations of the evolutionary process. As long as deletion does not remove the network signature, the attack can keep deleting things until it succeeds. However, by running EvadeML, I could only evade half of the 500 malware seeds. For the other half of the PDF malware, deletion eventually removes the network signature, effectively removing the malicious functionality.
So I thought, wow, if the monotonic classifier is robust for 50% of the PDF malware, even for a toy dataset, this is huge!
Can we do better to evade the monotonic PDF malware classifier?
I thought about this problem for a long time. I tried to answer the following question:
If deletion removes the malicious behavior, can we do deletion but keep the malicious behavior somewhere else in the PDF?
Move Exploit Attack: So I came up with a new strategy to mutate the other half of the PDF malware seeds that are deletion-resistant: the move exploit mutator. The move exploit mutator can move the exploit object to different trigger points in the PDF. Essentially, we can view it as first doing deletion along a feature dimension, and then do insertion along a different feature dimension to keep the exploit in the PDF. For example, deleting the exploit object from /Root/OpenAction/JS (the path feature changes from 1 to 0) may decrease the classification score from 0.9 to 0.3, which also removes the malicious behavior. If we move the exploit from /Root/OpenAction/JS to /Root/Pages/Kids/AA, equivalent to doing an insertion after deletion, the classification score may increase from 0.3 to 0.4, resulting in a successful evasion. Note that for each feature dimension, the monotonicity property holds. By exploring the prediction function along different feature dimensions, the move exploit attack can successfully evade all the deletion-resistant PDF malware.
- Monotonicity property is very meaningful for malware classification tasks, which eliminates insertion-only attacks.
- For some datasets, it is possible to achieve high accuracy and low false positive rate to train the monotonic malware classifiers.
- It is very hard to evade the monotonic malware classifier by doing deletion-only attacks, which may remove malicious behavior. In my experiments, it removes malicious behavior 50% of the time.
- However, it is possible to design new attacks that utilize the deletion operation while keeping the malicious behavior. Using PDF malware as the example, I designed a new move exploit attack to evade the monotonic classifier.