Robust Trees for Security

Classify malicious URLs on Twitter

We used the dataset from Kwon et al. and re-extracted 25 features. The features capture spammers' tendency to:

  • Reuse underlying hosting infrastructure (e.g., reuse URL redirectors and bulletproof hosting servers, register many domains hosted on each IP)
  • Have heterogeneous resources (e.g., compromised machines tend to spread over larger geographical distances than benign ones)
  • Prefer flexibility of the spam campaigns (e.g., use many different initial URLs to make the posts look distinct)
  • Spread URLs to many users (e.g., use many @-mentions and hashtags)
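As a rough illustration of the "spreading" features above, a minimal sketch of counting @-mentions, hashtags, and distinct initial URLs might look like this. The function name and exact definitions are illustrative, not the paper's actual 25-feature set:

```python
import re

def extract_spread_features(tweet_text, initial_urls):
    # Illustrative features only: mention/hashtag counts and how many
    # distinct initial URLs the campaign uses to make posts look different.
    return {
        "mention_count": len(re.findall(r"@\w+", tweet_text)),
        "hashtag_count": len(re.findall(r"#\w+", tweet_text)),
        "distinct_initial_urls": len(set(initial_urls)),
    }

features = extract_spread_features(
    "Win a prize now! @alice @bob #free #deal",
    ["http://sho.rt/a", "http://sho.rt/a"],
)
print(features)  # {'mention_count': 2, 'hashtag_count': 2, 'distinct_initial_urls': 1}
```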

Cost-aware Threat Model

How can we make the Twitter spam detector more robust? Can the existing L_inf-norm threat model capture the attacker's actual capability?

We assume that it is easier for the attacker to increase feature 1 than to decrease it. In the figures, the dashed lines are the classification boundary. The left figure shows that robust training with an L_inf-norm bound (solid square box) achieves 75% accuracy; under the cost-aware perturbation (dashed red rectangular box), the model drops to only 50% accuracy under attack. The right figure shows that with the cost-aware threat model, we can achieve 100% accuracy both with and without attack.
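The difference between the two threat models can be sketched as the perturbation region each one allows around a point. A minimal sketch, with illustrative numbers not taken from the paper:

```python
# Symmetric L_inf ball vs. a cost-aware box with asymmetric,
# per-feature bounds. All numbers below are illustrative.
def linf_box(x, eps):
    # every feature may move eps in either direction
    return [(xi - eps, xi + eps) for xi in x]

def cost_aware_box(x, low, high):
    # feature j may decrease by up to low[j] and increase by up to high[j],
    # reflecting that some changes are cheaper for the attacker than others
    return [(xi - lj, xi + hj) for xi, lj, hj in zip(x, low, high)]

x = [4.0, 2.0]
print(linf_box(x, 1.0))                           # [(3.0, 5.0), (1.0, 3.0)]
print(cost_aware_box(x, [0.0, 1.0], [3.0, 0.0]))  # [(4.0, 7.0), (1.0, 2.0)]
```

Here feature 1 is cheap to increase (bound 3.0) but cannot be decreased, which the square L_inf ball cannot express.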

Robust Training Algorithm

Given the cost model of perturbing data points by up to l_j (decrease) and h_j (increase) along feature j, we know that x4 and x5 can be pushed across the split η. The split is 100% accurate without attacks, but only 66.6% accurate under attack, so we need to find a better split. A better split here is η'. That split is always robust, but has only an 83.3% accuracy.
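The trade-off above can be sketched by scoring each candidate split under the worst-case perturbation: a point counts as robustly correct only if its entire perturbation interval stays on the correct side of the threshold. The data values, bounds, and thresholds below are illustrative stand-ins for x1..x6, η, and η':

```python
# Cost model sketch: the attacker may decrease a malicious point's
# feature by up to l to evade detection (increases do not help evasion
# for this split direction, so they are ignored here).
def accuracy_under_attack(points, labels, threshold, l):
    correct = 0
    for x, y in zip(points, labels):
        if y == 0:
            correct += x <= threshold        # benign points are not perturbed
        else:
            correct += (x - l) > threshold   # worst case: attacker decreases x
    return correct / len(points)

points = [1.0, 2.0, 3.0, 4.0, 4.5, 6.0]      # x1..x6 on one feature
labels = [0, 0, 0, 1, 1, 1]                  # 0 = benign, 1 = malicious
l = 1.5                                      # attacker can decrease by up to 1.5

print(accuracy_under_attack(points, labels, 3.5, 0.0))  # split "eta", clean: 1.0
print(accuracy_under_attack(points, labels, 3.5, l))    # under attack: ~0.667
print(accuracy_under_attack(points, labels, 2.4, l))    # split "eta'": ~0.833
```

Robust training then simply prefers the candidate threshold with the best worst-case score rather than the best clean score.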

Adaptive Attacks

What if the attacker knows the cost model we use to train the robust Twitter spam classifier? Can we still improve the cost-aware robustness against adaptive attackers?
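One way to picture such an adaptive attacker: knowing the defender's cost model, they look for the cheapest feasible change that crosses the decision boundary. A minimal single-split sketch, with an illustrative bound and unit cost that are not the paper's setup:

```python
# Adaptive attacker sketch: evade 'x[feature] > threshold -> malicious'
# by decreasing that feature, if the needed change is within the
# attacker's budgeted bound. Returns (perturbed_x, cost) or None.
def cheapest_evasion(x, feature, threshold, max_decrease, unit_cost):
    v = x[feature]
    if v <= threshold:
        return list(x), 0.0                  # already classified benign
    needed = v - threshold                   # move exactly onto the split
    if needed > max_decrease:
        return None                          # robust split: out of reach
    perturbed = list(x)
    perturbed[feature] = threshold
    return perturbed, needed * unit_cost

print(cheapest_evasion([4.0, 7.0], 0, 3.5, max_decrease=1.5, unit_cost=2.0))
# ([3.5, 7.0], 1.0): the weak split is evaded cheaply
print(cheapest_evasion([4.0, 7.0], 0, 2.4, max_decrease=1.5, unit_cost=2.0))
# None: the robust split forces a change beyond the attacker's bound
```

An attacker with a larger budget than the defender assumed could still cross the robust split, which is why evaluating against a range of attacker cost budgets matters.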
