Monotonic Malware Classifiers

Training a monotonic PDF malware classifier

I used the monotonic constraints in XGBoost to train tree ensemble models whose malicious score can never decrease when a feature value increases. The figures below, from the XGBoost documentation, visualize a monotonic prediction function for a single feature.

From the XGBoost documentation. Left: fitting the data without enforcing a monotonic constraint. Right: fitting the data with a monotonically increasing constraint enforced.
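For binary features, such as whether a given PDF path is present, an increasing monotonic constraint means that switching any feature from 0 to 1 can never lower the score. In XGBoost the constraint is set through the monotone_constraints training parameter. The sketch below does not use XGBoost; it assumes a toy additive ensemble of depth-1 trees (stumps) and brute-force checks the property over a small feature space. The names Stump, ensemble_score, and is_monotonic_increasing are hypothetical.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Sequence


@dataclass(frozen=True)
class Stump:
    """Hypothetical depth-1 tree that splits on one binary feature."""
    feature: int
    low: float   # leaf value when the feature is 0
    high: float  # leaf value when the feature is 1

    def predict(self, x: Sequence[int]) -> float:
        return self.high if x[self.feature] else self.low


def ensemble_score(stumps: Sequence[Stump], x: Sequence[int]) -> float:
    # As in a boosted tree ensemble, the raw score is the sum of the leaf values.
    return sum(s.predict(x) for s in stumps)


def is_monotonic_increasing(score: Callable[[Sequence[int]], float], n_features: int) -> bool:
    """Exhaustively check that switching any feature from 0 to 1 never lowers the score."""
    for x in product((0, 1), repeat=n_features):
        for i in range(n_features):
            if x[i] == 0:
                y = list(x)
                y[i] = 1
                if score(y) < score(x):
                    return False
    return True


# Monotonic: for every stump, the "feature present" leaf is >= the "absent" leaf.
monotonic = [Stump(0, -0.5, 1.0), Stump(1, 0.0, 0.3), Stump(2, -0.2, -0.1)]
# Unconstrained: an extra stump makes feature 2 push the score toward benign.
unconstrained = monotonic + [Stump(2, 0.0, -2.0)]

print(is_monotonic_increasing(lambda x: ensemble_score(monotonic, x), 3))      # True
print(is_monotonic_increasing(lambda x: ensemble_score(unconstrained, x), 3))  # False
```

XGBoost enforces the constraint while it builds the trees; this check only verifies the resulting property on a toy model, and its exhaustive loop is practical only for a handful of features.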
Left: the parsed tree structure of a real PDF malware sample. Right: the path to every node in the tree.
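Each path from the root of the parsed tree to a node can serve as one binary feature. A minimal sketch, assuming the parsed PDF is already available as nested Python dicts (a hypothetical stand-in for a real parser's output; a real extractor must also handle arrays, indirect references, and reference cycles, which this ignores). The vocabulary list is a made-up example of the paths a classifier might be trained on.

```python
from typing import Any, List, Sequence, Set


def structural_paths(node: Any, prefix: str = "") -> Set[str]:
    """Collect the path from the root to every node of a tree of nested dicts."""
    paths: Set[str] = set()
    if isinstance(node, dict):
        for key, child in node.items():
            path = f"{prefix}/{key}"
            paths.add(path)
            paths |= structural_paths(child, path)
    return paths


def to_feature_vector(paths: Set[str], vocabulary: Sequence[str]) -> List[int]:
    """Binary features: 1 if the path occurs in this PDF, else 0."""
    return [1 if p in paths else 0 for p in vocabulary]


# Hypothetical parsed PDF whose /OpenAction runs JavaScript (non-dict values are leaves).
pdf_tree = {
    "Type": "Catalog",
    "Pages": {"Type": "Pages", "Count": 1},
    "OpenAction": {"S": "JavaScript", "JS": "app.alert('hi')"},
}

paths = structural_paths(pdf_tree)
print(sorted(paths))
# ['/OpenAction', '/OpenAction/JS', '/OpenAction/S', '/Pages', '/Pages/Count', '/Pages/Type', '/Type']

vocabulary = ["/Names/JavaScript", "/OpenAction", "/OpenAction/JS", "/Pages"]
print(to_feature_vector(paths, vocabulary))  # [0, 1, 1, 1]
```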

Is the monotonic PDF malware classifier more robust against evasion attacks?

To evaluate how robust the monotonic classifier really is, I used EvadeML to generate real evasive PDF malware variants against it. EvadeML uses a genetic algorithm to evade the classifier. It applies mutation operations, namely insertion, deletion, and replacement of PDF objects, guided by a fitness function (the higher the classification score, the lower the fitness), to evolve a population of malware variants that retain their malicious functionality. The attack succeeds when any of the malicious variants is classified as benign. I ran the EvadeML attack on a benchmark set of 500 malware seeds with malicious network signatures.
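The evolutionary loop described above can be sketched as follows. This is a simplified illustration, not EvadeML's code: a variant is modeled as its set of structural paths, the classifiers are hypothetical logistic models, donors is a made-up list of paths available for insertion, and keeps_malicious is a hypothetical stand-in for the oracle that re-checks the malicious behavior (in this setup, the malicious network signatures). All parameter values are assumptions.

```python
import math
import random
from typing import Callable, Dict, FrozenSet, List, Optional, Sequence

Variant = FrozenSet[str]  # a PDF variant modeled as its set of structural paths


def mutate(v: Variant, donor_paths: Sequence[str], rng: random.Random) -> Variant:
    """Apply one random insertion, deletion, or replacement of a path."""
    paths = set(v)
    op = rng.choice(("insert", "delete", "replace"))
    if op in ("delete", "replace") and paths:
        paths.remove(rng.choice(sorted(paths)))
    if op in ("insert", "replace"):
        paths.add(rng.choice(donor_paths))
    return frozenset(paths)


def evolve(seed: Variant, score: Callable[[Variant], float],
           keeps_malicious: Callable[[Variant], bool], donor_paths: Sequence[str],
           threshold: float = 0.5, pop_size: int = 20, generations: int = 50,
           rng_seed: int = 0) -> Optional[Variant]:
    """Return the first variant that stays malicious and scores below threshold."""
    rng = random.Random(rng_seed)
    population: List[Variant] = [seed] * pop_size
    for _ in range(generations):
        offspring = [mutate(v, donor_paths, rng) for v in population]
        # Variants that lost the malicious behavior are discarded.
        viable = [v for v in offspring if keeps_malicious(v)]
        for v in viable:
            if score(v) < threshold:  # classified as benign: evasion succeeded
                return v
        # Lower score means higher fitness, so keep the lowest-scoring variants.
        population = sorted(population + viable, key=score)[:pop_size]
    return None


def linear_classifier(weights: Dict[str, float], bias: float) -> Callable[[Variant], float]:
    """Hypothetical toy classifier: logistic score over the paths present."""
    def score(v: Variant) -> float:
        raw = bias + sum(weights.get(p, 0.0) for p in v)
        return 1.0 / (1.0 + math.exp(-raw))
    return score


def keeps_malicious(v: Variant) -> bool:
    # Toy oracle: the exploit works as long as its JavaScript path is present.
    return "/OpenAction/JS" in v


seed = frozenset({"/OpenAction", "/OpenAction/JS"})
donors = ["/Type", "/Pages", "/Pages/Count", "/OpenAction"]
# Unconstrained: the benign-looking path /Pages/Count has a negative weight.
unconstrained = linear_classifier({"/OpenAction": 1.0, "/OpenAction/JS": 2.0, "/Pages/Count": -4.0}, 0.0)
# Monotonic: every weight is non-negative, so insertions cannot lower the score.
monotonic = linear_classifier({"/OpenAction": 1.0, "/OpenAction/JS": 2.0, "/Pages/Count": 0.5}, 0.0)

print(evolve(seed, unconstrained, keeps_malicious, donors))  # some variant containing /Pages/Count
print(evolve(seed, monotonic, keeps_malicious, donors))      # None
```

In this toy setup the unconstrained model is evaded by inserting a benign-looking path, while the monotonic one is not, because every viable variant keeps the heavily weighted exploit path. It only illustrates the mechanism and is not a reproduction of the experiment.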

Can we do better to evade the monotonic PDF malware classifier?

I thought about this problem for a long time. I tried to answer the following question:


  • Monotonicity is a very meaningful property for malware classification because it eliminates insertion-only attacks: adding features can never lower the classification score.
  • For some datasets, monotonic malware classifiers can be trained to achieve high accuracy and a low false positive rate.
  • It is very hard to evade the monotonic malware classifier with deletion-only attacks, because deleting objects may also remove the malicious behavior. In my experiments, deletion removed the malicious behavior 50% of the time.
  • However, it is possible to design new attacks that use the deletion operation while keeping the malicious behavior. Using PDF malware as an example, I designed a new move exploit attack to evade the monotonic classifier.
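The idea of moving an exploit can be illustrated on the path representation: the exploit subtree is deleted from its original location and re-attached under a different path, so the payload is still present while the features the classifier associates with it disappear. This is a toy illustration of the idea, not the attack's implementation. The tree, the non-negative classifier weights, and the choice of Names/JavaScript as the destination are all hypothetical, and the sketch ignores whether a real PDF reader would still execute the moved exploit.

```python
import copy
import math
from typing import Any, Dict, List, Sequence, Set


def structural_paths(node: Any, prefix: str = "") -> Set[str]:
    """Collect the path from the root to every node of a tree of nested dicts."""
    paths: Set[str] = set()
    if isinstance(node, dict):
        for key, child in node.items():
            path = f"{prefix}/{key}"
            paths.add(path)
            paths |= structural_paths(child, path)
    return paths


def find_js(node: Any) -> List[Any]:
    """Return every JavaScript payload (the value of a JS key) anywhere in the tree."""
    found: List[Any] = []
    if isinstance(node, dict):
        for key, child in node.items():
            if key == "JS":
                found.append(child)
            found.extend(find_js(child))
    return found


def move_subtree(tree: Dict[str, Any], src: Sequence[str], dst: Sequence[str]) -> Dict[str, Any]:
    """Return a copy of tree with the subtree at key path src re-attached at dst."""
    new = copy.deepcopy(tree)
    parent = new
    for key in src[:-1]:
        parent = parent[key]
    subtree = parent.pop(src[-1])  # deletion: the old paths disappear
    parent = new
    for key in dst[:-1]:
        parent = parent.setdefault(key, {})
    parent[dst[-1]] = subtree  # insertion: the same payload under new paths
    return new


# Hypothetical monotonic linear classifier: all weights are non-negative, and
# paths it never learned (for example under /Names) contribute nothing.
WEIGHTS = {"/OpenAction": 2.0, "/OpenAction/S": 1.0, "/OpenAction/JS": 2.0, "/Pages": 0.1}
BIAS = -3.0


def score(tree: Dict[str, Any]) -> float:
    raw = BIAS + sum(WEIGHTS.get(p, 0.0) for p in structural_paths(tree))
    return 1.0 / (1.0 + math.exp(-raw))


pdf_tree = {
    "Type": "Catalog",
    "Pages": {"Count": 1},
    "OpenAction": {"S": "JavaScript", "JS": "app.alert('hi')"},
}
moved = move_subtree(pdf_tree, ("OpenAction",), ("Names", "JavaScript"))

print(find_js(moved) == find_js(pdf_tree))                # True: the payload survives
print(round(score(pdf_tree), 2), round(score(moved), 2))  # 0.89 0.05
```

Because the classifier is monotonic, the insertion half of the move can only raise the score or leave it unchanged; the move lowers the score here only because the removed paths carry more weight than the new ones, which these toy weights make true by construction.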


