# How XGBoost Enforces Global Monotonicity for Features

--

XGBoost can train monotonicity for user-specified features, in the Gradient Boosted Decision Trees model. I have used linear programming to verify that the trained monotonicity is sound. This is how they enforce it.

**Preliminary:** To learn a node in a decision tree of the ensemble, the training procedure enumerates all possible splits for all features. Each split is in the form: *feature j< threshold*, and the training enumerates all *j* and all *threshold* values*. *Then, it* *picks the split with the best score, which is the *gain metric*.

WLOG, let’s assume that we specify the first feature to be monotonically increasing. The goal is to achieve:

∀ x₁≤x₁’, f(x₁, x₂, …, xₙ) ≤ f(x₁’, x₂, …, xₙ)

XGBoost does the following.

**Local monotonic split:**Whenever the training considers a node:*x₁ < some threshold*, it computes the optimal left leaf and right leaf values, as if that is the final split.**The split is only taken if the left leaf ≤ right leaf**, otherwise it is discarded and the algorithm considers other splits.**Local monotonic tree:**If there are further children splits, any leaves of the left (right) descendants will be smaller (greater) than the current optimal (left leaf + right leaf)/2, so the local monotonic split is preserved down the road. This value constraint is enforced throughout the trees if the monotone constraint is specified. So within a decision tree, left descendant leaves of*x₁*< right descendant leaves of*x₁*.**Global monotonic sum-ensemble:**XGBoost uses sum of all leaves as the final prediction of the ensemble. This preserves monotonicity for the ensemble.

- If we use fᵢ to represent tree i: ∀ x₁<x₁’, fᵢ(x₁, x₂, …, xₙ) ≤ fᵢ(x₁’, x₂, …, xₙ), where fᵢ maps the input to a concrete value.
- Therefore, we have Σᵢ fᵢ(x₁, x₂, …, xₙ) ≤ Σᵢ fᵢ(x₁’, x₂, …, xₙ).

For code, see src/tree/split_evaluator.h.