How XGBoost Enforces Global Monotonicity for Features

  1. Local monotonic split: Suppose x₁ carries an increasing monotone constraint. Whenever training evaluates a candidate split x₁ < threshold at a node, it computes the optimal left and right leaf values as if this were the final split. If left leaf > right leaf, the split is discarded and the algorithm moves on to other candidate splits. Only splits with left leaf ≤ right leaf can be chosen.
  2. Local monotonic tree: If either child is split further, every leaf under the left child stays ≤ the midpoint (left leaf + right leaf)/2 of the current split, and every leaf under the right child stays ≥ it. This preserves the local monotonic split deeper in the tree. XGBoost enforces these value bounds in every tree whenever a monotone constraint is specified. As a result, within a single tree, every leaf under the left branch of an x₁ split is ≤ every leaf under its right branch.
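  3. Global monotonic sum-ensemble: XGBoost's final prediction is the sum of the leaf values the input reaches in each tree, plus a constant base score. (For objectives such as binary:logistic, this sum is a raw margin passed through the increasing sigmoid function, which keeps the ordering.) Summing preserves monotonicity across the ensemble.
  • Let fᵢ denote tree i, which maps an input to a concrete leaf value. By step 2, raising x₁ can only move an input from the left branch of an x₁ split to its right branch, so ∀ x₁ < x₁′: fᵢ(x₁, x₂, …, xₙ) ≤ fᵢ(x₁′, x₂, …, xₙ).
  • Summing over all trees, Σᵢ fᵢ(x₁, x₂, …, xₙ) ≤ Σᵢ fᵢ(x₁′, x₂, …, xₙ).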
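Step 1 can be sketched in a few lines of Python. This is a simplified illustration, not XGBoost's internal code (which lives in C++; in XGBoost itself you request the constraint through the `monotone_constraints` training parameter). It assumes the standard second-order leaf weight w* = -G/(H + λ) and split gain, and rejects a split on x₁ by giving it a gain of -∞ whenever left leaf > right leaf. The names `SplitStats`, `leaf_weight`, `split_gain`, `best_split` and the gradient statistics are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class SplitStats:
    grad_sum: float  # G: sum of first-order gradients of the rows in this child
    hess_sum: float  # H: sum of second-order gradients (hessians) in this child


def leaf_weight(s: SplitStats, reg_lambda: float = 1.0) -> float:
    # Optimal leaf value under the second-order objective: w* = -G / (H + lambda)
    return -s.grad_sum / (s.hess_sum + reg_lambda)


def leaf_score(s: SplitStats, reg_lambda: float = 1.0) -> float:
    # G^2 / (H + lambda): the structure-score term used in the split gain
    return s.grad_sum ** 2 / (s.hess_sum + reg_lambda)


def split_gain(left: SplitStats, right: SplitStats, reg_lambda: float = 1.0,
               gamma: float = 0.0, increasing: bool = True) -> float:
    w_left = leaf_weight(left, reg_lambda)
    w_right = leaf_weight(right, reg_lambda)
    # Monotone check for an increasing constraint: discard the split if left > right
    if increasing and w_left > w_right:
        return -math.inf
    parent = SplitStats(left.grad_sum + right.grad_sum, left.hess_sum + right.hess_sum)
    return 0.5 * (leaf_score(left, reg_lambda) + leaf_score(right, reg_lambda)
                  - leaf_score(parent, reg_lambda)) - gamma


def best_split(candidates):
    """candidates: iterable of (threshold, left_stats, right_stats) for splits on x1."""
    best_threshold, best_gain = None, -math.inf
    for threshold, left, right in candidates:
        gain = split_gain(left, right)
        if gain > best_gain:  # a rejected split (gain -inf) can never win
            best_threshold, best_gain = threshold, gain
    return best_threshold, best_gain


# Hypothetical gradient statistics for two candidate thresholds on x1
candidates = [
    (0.5, SplitStats(-4.0, 2.0), SplitStats(4.0, 2.0)),  # left 4/3 > right -4/3: rejected
    (1.5, SplitStats(4.0, 2.0), SplitStats(-4.0, 2.0)),  # left -4/3 <= right 4/3: kept
]
print(best_split(candidates))  # threshold 1.5 wins, with gain 16/3
```

If every candidate violates the constraint, `best_split` returns `(None, -inf)`, i.e. the node is not split on x₁ at all.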
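The bound propagation in step 2 can be sketched as a [lower, upper] range carried by every node. In this sketch, a split on the constrained feature x₁ hands the midpoint (left leaf + right leaf)/2 down as the ceiling of the left subtree and the floor of the right subtree, while a split on an unconstrained feature passes the parent's range down unchanged. Clamping candidate leaf values into the parent's range before taking the midpoint is an assumption of this illustration, and `Bounds` and `child_bounds` are hypothetical names, not XGBoost's API.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Bounds:
    """Allowed range for every leaf value in a subtree (hypothetical helper)."""
    lower: float = -math.inf
    upper: float = math.inf

    def clamp(self, w: float) -> float:
        return min(max(w, self.lower), self.upper)


def child_bounds(parent: Bounds, w_left: float, w_right: float,
                 split_on_constrained: bool) -> tuple[Bounds, Bounds]:
    """Bounds handed to the left and right children after a split.

    split_on_constrained is True when the split feature (x1) has an increasing
    constraint; splits on unconstrained features just pass the parent's bounds down.
    """
    if not split_on_constrained:
        return Bounds(parent.lower, parent.upper), Bounds(parent.lower, parent.upper)
    # Clamp both leaf values into the parent's range, then use their midpoint as
    # the ceiling for the left subtree and the floor for the right subtree.
    mid = (parent.clamp(w_left) + parent.clamp(w_right)) / 2
    return Bounds(parent.lower, mid), Bounds(mid, parent.upper)


root = Bounds()
# Root splits on x1 with optimal leaves -1.0 (left) and 1.0 (right): midpoint 0.0
left, right = child_bounds(root, -1.0, 1.0, split_on_constrained=True)
# Each child then splits on an unconstrained feature x2 and inherits its bounds
ll, lr = child_bounds(left, -2.0, 0.5, split_on_constrained=False)
rl, rr = child_bounds(right, -0.3, 3.0, split_on_constrained=False)
leaves_left = [ll.clamp(-2.0), lr.clamp(0.5)]    # [-2.0, 0.0]: 0.5 capped at 0.0
leaves_right = [rl.clamp(-0.3), rr.clamp(3.0)]   # [0.0, 3.0]: -0.3 raised to 0.0
assert max(leaves_left) <= min(leaves_right)
```

Passing bounds through splits on other features is what makes the guarantee hold for the whole subtree, not just for the immediate children of the x₁ split.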
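Step 3 can be checked numerically on a toy ensemble. The trees below are hand-built with invented leaf values (not trained by XGBoost), each already monotone in x₁ as steps 1 and 2 guarantee, and `base_score=0.5` is an assumed constant. Sweeping x₁ over a grid while holding x₂ fixed shows the summed prediction never decreases. A grid check is only evidence; the inequalities in step 3 are the actual argument. `make_stump`, `depth2_tree`, `ensemble_predict` and `is_nondecreasing_in` are hypothetical helpers.

```python
from typing import Callable, List, Sequence

Tree = Callable[[Sequence[float]], float]  # maps a feature vector to a leaf value


def make_stump(feature: int, threshold: float, left: float, right: float) -> Tree:
    # A one-split tree: x[feature] < threshold goes left, otherwise right
    def tree(x: Sequence[float]) -> float:
        return left if x[feature] < threshold else right
    return tree


def depth2_tree(x: Sequence[float]) -> float:
    # Root splits on unconstrained x2 (index 1); both branches split on x1 (index 0)
    # with left leaf <= right leaf.
    if x[1] < 0.0:
        return -0.5 if x[0] < 1.5 else -0.1
    return 0.2 if x[0] < 0.5 else 0.6


def ensemble_predict(trees: List[Tree], x: Sequence[float], base_score: float = 0.5) -> float:
    # Raw prediction: constant base score plus the sum of each tree's leaf value
    return base_score + sum(tree(x) for tree in trees)


def is_nondecreasing_in(f: Tree, feature: int,
                        base_x: Sequence[float], grid: Sequence[float]) -> bool:
    # Sweep one feature over a sorted grid, holding the others fixed at base_x
    values = []
    for v in grid:
        x = list(base_x)
        x[feature] = v
        values.append(f(x))
    return all(a <= b for a, b in zip(values, values[1:]))


trees = [
    make_stump(0, 1.0, -0.2, 0.3),   # increasing in x1
    make_stump(1, 0.0, 0.4, -0.1),   # depends only on x2, so constant in x1
    depth2_tree,
]


def ensemble(x: Sequence[float]) -> float:
    return ensemble_predict(trees, x)


grid = [i * 0.25 for i in range(-8, 17)]  # x1 from -2.0 to 4.0
for x2 in (-1.0, 1.0):
    print(x2, is_nondecreasing_in(ensemble, 0, [0.0, x2], grid))  # True for both
```

Note that the stump on x₂ decreases in x₂; that is allowed because only x₁ is constrained, and it contributes a constant along any x₁ sweep.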

Security Researcher http://surrealyz.github.io/

Yizheng Chen