Robust Trees for Security

Classify malicious URLs on Twitter

Attackers who spread malicious URLs on Twitter tend to:

  • Reuse underlying hosting infrastructure (e.g., reuse URL redirectors and bulletproof hosting servers, register many domains hosted on each IP)
  • Have heterogeneous resources (e.g., compromised machines tend to spread over larger geographical distances than benign ones)
  • Favor flexible spam campaigns (e.g., use many different initial URLs to make the posts look distinct)
  • Spread URLs to many users (e.g., use many @mentions and #hashtags)
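These behaviors suggest aggregate features computed over a group of posts. The sketch below is illustrative only: the `UrlPost` record, the feature names, and the use of the largest pairwise great-circle distance as the "geographic spread" are hypothetical choices, not the feature set used in the original work.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class UrlPost:
    """Hypothetical record for one tweet that carries a URL."""
    initial_url: str   # URL as posted, before any redirects
    final_domain: str  # domain reached after following redirects
    ip: str            # IP address hosting final_domain
    lat: float         # geolocation of the machine involved (assumed available)
    lon: float
    text: str          # tweet text


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def campaign_features(posts: List[UrlPost]) -> Dict[str, float]:
    """Hypothetical aggregate features for a non-empty group of posts."""
    domains_per_ip: Dict[str, set] = {}
    for p in posts:
        domains_per_ip.setdefault(p.ip, set()).add(p.final_domain)
    locations = {(p.lat, p.lon) for p in posts}
    return {
        # Shared hosting: many domains served from the same IP
        "max_domains_per_ip": max(len(d) for d in domains_per_ip.values()),
        # Heterogeneous resources: largest pairwise distance between machines
        "geo_spread_km": max(
            haversine_km(a[0], a[1], b[0], b[1]) for a in locations for b in locations
        ),
        # Campaign flexibility: many distinct initial URLs
        "distinct_initial_urls": len({p.initial_url for p in posts}),
        # Reach: average number of @mentions and #hashtags per post
        "mean_mentions": sum(p.text.count("@") for p in posts) / len(posts),
        "mean_hashtags": sum(p.text.count("#") for p in posts) / len(posts),
    }
```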

Cost-aware Threat Model

We assume that it is easier for an attacker to increase feature 1 than to decrease it. The dashed lines are the classification boundaries. The left figure shows that robust training with an L∞-norm bound (solid square box) achieves 75% accuracy. Under the cost-aware perturbation (dashed red rectangular box), however, that model has only 50% accuracy under attack. The right figure shows that by training with the cost-aware threat model, we can achieve 100% accuracy both with and without attack.
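The two boxes in the figure can be written as perturbation sets. A sketch, assuming $\epsilon$ is the L∞ budget and $l_j$, $h_j$ are the largest decrease and increase an attacker can apply to feature $j$:

```latex
% L_inf threat model: every feature moves by at most epsilon in either direction (square box)
B_\infty(x) = \{\, x' : x_j - \epsilon \le x'_j \le x_j + \epsilon \ \text{for all } j \,\}

% Cost-aware threat model: per-feature, possibly asymmetric bounds (rectangular box)
B_{\mathrm{cost}}(x) = \{\, x' : x_j - l_j \le x'_j \le x_j + h_j \ \text{for all } j \,\}

% "Easier to increase feature 1 than to decrease it"
h_1 > l_1
```

A model trained to be robust inside the square box can still be fooled by points that move further along the cheap direction of feature 1, which is why its accuracy drops from 75% to 50% under the cost-aware attack.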

Robust Training Algorithm

Given a cost model in which an attacker can decrease feature j by at most l_j and increase it by at most h_j, data points x4 and x5 can be perturbed across the split η. This split is 100% accurate without attacks, but only 66.7% (4 of 6 points) accurate under attack. Therefore, we need to find a better split.
A better split here is η′. It is robust, since its accuracy does not drop under attack, but it is only 83.3% (5 of 6 points) accurate.
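The trade-off between η and η′ can be illustrated with a decision stump on a single feature. This is a simplified sketch, not the authors' training algorithm: it assumes the stump predicts class 1 when x ≥ η, that every training point may be moved anywhere in [x − l, x + h], and it scores candidate splits by worst-case accuracy rather than by an impurity measure. The function names and the six data points are hypothetical.

```python
from typing import List, Sequence


def stump_accuracy(xs: Sequence[float], ys: Sequence[int], eta: float,
                   l: float, h: float, attack: bool = True) -> float:
    """Worst-case accuracy of the stump 'predict 1 if x >= eta else 0'.

    Under attack, each point may move anywhere in [x - l, x + h].
    """
    correct = 0
    for x, y in zip(xs, ys):
        lo, hi = (x - l, x + h) if attack else (x, x)
        if y == 1:
            # Stays classified as 1 only if its lowest reachable value is >= eta.
            correct += int(lo >= eta)
        else:
            # Stays classified as 0 only if its highest reachable value is < eta.
            correct += int(hi < eta)
    return correct / len(xs)


def best_robust_split(xs: Sequence[float], ys: Sequence[int],
                      l: float, h: float) -> float:
    """Threshold with the highest worst-case accuracy (ties -> smallest threshold)."""
    # Worst-case accuracy can only change at x - l (label 1) or x + h (label 0).
    breaks: List[float] = sorted(
        {x - l for x, y in zip(xs, ys) if y == 1}
        | {x + h for x, y in zip(xs, ys) if y == 0}
    )
    mids = [(a + b) / 2 for a, b in zip(breaks, breaks[1:])]
    candidates = sorted(set(breaks) | set(mids) | {breaks[0] - 1, breaks[-1] + 1})
    return max(candidates, key=lambda eta: stump_accuracy(xs, ys, eta, l, h))


if __name__ == "__main__":
    xs = [1.0, 2.0, 4.5, 5.5, 8.0, 9.0]  # hypothetical feature values
    ys = [0, 0, 0, 1, 1, 1]              # hypothetical labels
    l, h = 1.0, 2.0                      # hypothetical cost model
    print(stump_accuracy(xs, ys, 5.0, l, h, attack=False), stump_accuracy(xs, ys, 5.0, l, h))
    eta2 = best_robust_split(xs, ys, l, h)
    print(eta2, stump_accuracy(xs, ys, eta2, l, h, attack=False), stump_accuracy(xs, ys, eta2, l, h))
```

On these made-up points, the split at 5.0 behaves like η (100% clean, 66.7% under attack), while the search returns 4.25, which behaves like η′: 83.3% accuracy both with and without attack.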

Adaptive Attacks




Security Researcher

Yizheng Chen
