Robust Trees for Security

Tree models are widely used for security, such as detecting malicious autonomous system, social engineering, malware distribution, phishing emails, advertising resources for ad blocker, and online scams, etc. Despite their popularity, the robustness of tree models has not been thoroughly studied in the context of security applications. In this post, I will show how to train robust trees to detect Twitter spam. Our most exciting result is that we can increase the feature manipulation cost for adaptive attackers to evade the robust tree ensemble by 10.6X.

Classify malicious URLs on Twitter

We used the dataset from Kwon et al. and re-extracted 25 features. The features capture the spammers tendency to:

We trained a baseline classifier with 99.38% accuracy and 0.89% false positive rate, using XGBoost Gradient Boosted Decision Trees (GBDT).

Cost-aware Threat Model

How can we make the Twitter spam detector more robust? Can we use the existing L_inf norm threat model to capture attacker’s capability?

Based on feature semantics, we observe that L_inf norm threat model is not suitable in many security applications including spam detection. First, the feature manipulation cost is different across features. For example, it is cheaper for attackers to register a few more domain names, than to rent new servers to increase attack agility. Second, there are different cost to increase or decrease a feature value. For instance, it is more expensive to increase the number of bulletproof hosting servers than to decrease it. If we use L_inf norm to capture attacker’s capability, the model may not be robust against realistic adversaries. In the following two figures, we demonstrate that we can obtain better model performance and cost-aware robustness if we use cost-aware threat model in robust training.

We assume that it is easier to increase feature 1 than to decrease it. The dashed lines are the classification boundary. The left figure shows that robust training using L_inf-norm bound (solid square box) achieves 75% accuracy. Under the cost-aware perturbation (dashed red rectangular box), the model has only 50% accuracy under attack. The right figure shows that using cost-aware threat model, we can achieve 100% accuracy with and without attack.

Therefore, we proposed a cost modeling method to capture the domain knowledge of different feature manipulation cost. For Twitter spam detection, we used high-dimensional box to capture how much an attacker is able to increase and decrease each feature, by four categories: negligible, low, medium, and high cost. This can be generalized to arbitrary constraint-based threat model, as described in our paper. Our cost model has the same expressiveness as the attack action rule-based model in TREANT. However, our approach directly maps each feature value to perturbed ranges, which can be easily integrated in the robust training.

Robust Training Algorithm

Given the cost model of perturbing data points with lj and hj changes, we know that x4 and x5 will cross the split η. The split is 100% accurate without attacks, but only 66.6% accurate under attacks. Therefore, we need to find a better split.

We designed a new training algorithm that is more effective at finding robust splits in trees compared to the state-of-the-art. Briefly, when growing the trees, we evaluate the quality of each split xʲ < η using some gain metric such as information gain, Gini impurity or loss reduction.

A better split here is η’. The split is always robust,
but has a 83.3% accuracy.

The cost model tells us the set of data points that are likely to be perturbed to cross the split, which decreases the gain, and then we need to find the best robust split with the highest gain. This is formulated as the max min problem here.

Since it is intractable to enumerate all the possible robust splits, we designed and implemented a greedy algorithm to find the best robust split, which can be generally applied to different types of tree ensembles and different gain metrics. Our algorithm works with random forest in scikit-learn and GBDT in XGBoost. We have open sourced our implementation here.

Adaptive Attacks

What if the attacker knows the cost model we use to train the robust Twitter spam classifier? Can we still improve the cost-aware robustness against adaptive attackers?

We proposed a new adaptive attack objective in the strongest whitebox attack (MILP) to directly target the trained cost model. The adaptive attacker minimizes the weighted sum of feature value differences. Each weight represents the unit cost (e.g., some dollar amount) for the attacker to increase the feature or to decrease it. If we allow increasing a low-cost feature to be twice the amount of a medium-cost feature during training, we set the weights for the adaptive attacker such that the cost of changing one unit of a medium cost feature is equivalent to changing two units of a low-cost feature.

We extensively evaluated our cost-aware robust training algorithms, by training 19 different configurations of the cost models and studying how they can increase the adaptive attack cost compared to baseline models. Our results show that, compared to regularly trained model, our robust model increases the adaptive attack cost to evade them by 10.6×.

Our paper “Cost-Aware Robust Tree Ensembles for Security Applications” will appear at USENIX Security 2021, and our code is available on github.

Security Researcher