
A Gradient Boosting Machine

GBDT = Gradient Boosting + Decision Tree
other names:

  • MART (Multiple Additive Regression Trees)
  • GBRT (Gradient Boosted Regression Trees)
  • TreeNet

Ensemble learning

why ensemble?

  • statistical: with limited data, many different hypotheses may fit the training set equally well; averaging several of them reduces the risk of choosing a wrong one
  • computational: learners that rely on local search (e.g. greedy tree induction) can get stuck in local optima; combining models trained from different starting points gives a better approximation
  • representational: the true function may lie outside the hypothesis space of any single learner; a combination of learners can expand the space of representable functions

ensemble types

bagging

Bootstrap AGGregatING
train weak learners in parallel, each on a bootstrap resample of the training data
aggregating (averaging) their predictions mainly reduces variance
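
To make this concrete, here is a minimal sketch of bagging in Python (the function names and the choice of sklearn's `DecisionTreeRegressor` as the weak learner are illustrative assumptions, not part of the original):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def bagging_fit(X, y, n_estimators=50, seed=0):
    """Train each weak learner on an independent bootstrap resample;
    the fits do not depend on each other, so they could run in parallel."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_estimators):
        idx = rng.integers(0, len(X), size=len(X))  # sample rows with replacement
        tree = DecisionTreeRegressor()
        tree.fit(X[idx], y[idx])
        models.append(tree)
    return models

def bagging_predict(models, X):
    """AGGregatING: average the predictions, which mainly reduces variance."""
    return np.mean([m.predict(X) for m in models], axis=0)
```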

boosting

train weak learners sequentially: each new learner focuses on the mistakes of the current ensemble
this mainly reduces bias

  • AdaBoost
  • gradient boosting (a residual-fitting sketch follows this list)
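
A minimal sketch of the boosting loop, assuming squared loss so that "the mistakes of the current ensemble" are plain residuals; the shrinkage factor `lr` and the tree depth are illustrative choices:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boost_fit(X, y, n_estimators=100, lr=0.1):
    """Train weak learners iteratively: each tree is fit to what the
    current ensemble still gets wrong (squared-loss residuals here)."""
    pred = np.zeros(len(y))
    trees = []
    for _ in range(n_estimators):
        residual = y - pred                  # errors of the current ensemble
        tree = DecisionTreeRegressor(max_depth=2)
        tree.fit(X, residual)
        pred += lr * tree.predict(X)         # each step chips away at the bias
        trees.append(tree)
    return trees

def boost_predict(trees, X, lr=0.1):
    return lr * np.sum([t.predict(X) for t in trees], axis=0)
```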

Gradient boosting

optimization in parameter space

optimization goal:

$$P^* = \arg\min_P \Phi(P), \qquad \Phi(P) = E_{y,x}\, L\big(y, F(x; P)\big)$$

solution for the parameters in the form:

$$P^* = \sum_{m=0}^{M} p_m$$

where $p_0$ is an initial guess and $\{p_m\}_1^M$ are successive increments ("steps" or "boosts")

steepest-descent

define the increments $\{p_m\}_1^M$ as follows:

  1. the current gradient $g_m$ is computed:
     $$g_m = \left\{ \left[\frac{\partial \Phi(P)}{\partial P_j}\right]_{P = P_{m-1}} \right\}$$
     where $P_{m-1} = \sum_{i=0}^{m-1} p_i$
  2. the step length $\rho_m$ is found by line search:
     $$\rho_m = \arg\min_{\rho}\, \Phi(P_{m-1} - \rho\, g_m)$$
  3. the step is taken to be
     $$p_m = -\rho_m\, g_m$$
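
A minimal numeric sketch of these three steps on a toy quadratic objective $\Phi(P) = \lVert AP - b \rVert^2$ (the objective and the crude grid line search are illustrative assumptions; any 1-D minimizer would do):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(20, 3))
b = rng.normal(size=20)

def phi(P):                       # Phi(P) = ||A P - b||^2
    return np.sum((A @ P - b) ** 2)

def grad(P):                      # gradient of Phi at P
    return 2 * A.T @ (A @ P - b)

P = np.zeros(3)                   # p_0: initial guess
for m in range(50):
    g = grad(P)                                            # 1. current gradient g_m
    rhos = np.linspace(0.0, 1.0, 200)                      # 2. grid line search for rho_m
    rho = rhos[np.argmin([phi(P - r * g) for r in rhos])]
    P = P - rho * g                                        # 3. step p_m = -rho_m * g_m
print(phi(P))  # close to the least-squares minimum
```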

optimization in function space

we consider $F(x)$ evaluated at each point $x$ to be a "parameter" and seek to minimize

$$\Phi(F) = E_{y,x}\, L\big(y, F(x)\big) = E_x\Big[ E_y\big( L(y, F(x)) \big) \,\big|\, x \Big]$$

we take the solution to be

$$F^*(x) = \sum_{m=0}^{M} f_m(x)$$

where $f_0(x)$ is an initial guess, and $\{f_m(x)\}_1^M$ are incremental functions

current gradient:

$$g_m(x) = \left[\frac{\partial \Phi(F(x))}{\partial F(x)}\right]_{F(x) = F_{m-1}(x)} = E_y\!\left[\frac{\partial L\big(y, F(x)\big)}{\partial F(x)} \,\Big|\, x\right]_{F(x) = F_{m-1}(x)}$$

and

$$F_{m-1}(x) = \sum_{i=0}^{m-1} f_i(x)$$

the multiplier $\rho_m$ is given by the line search

$$\rho_m = \arg\min_{\rho}\, E_{y,x}\, L\big(y,\; F_{m-1}(x) - \rho\, g_m(x)\big)$$

and the steepest-descent step is:

$$f_m(x) = -\rho_m\, g_m(x)$$
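
As a sanity check (a standard special case, not spelled out above): with squared loss $L(y, F(x)) = \tfrac{1}{2}\big(y - F(x)\big)^2$ the gradient becomes

$$g_m(x) = E_y\!\left[ F(x) - y \,\big|\, x \right]_{F = F_{m-1}} = F_{m-1}(x) - E[\,y \mid x\,]$$

so the negative gradient is the expected residual, and each steepest-descent step moves $F_{m-1}$ toward the regression function $E[\,y \mid x\,]$.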

finite data

the above nonparametric approach breaks down when the joint distribution of $(y, x)$ is represented by a finite data sample $\{y_i, x_i\}_1^N$; smoothness must be imposed, so we parameterize the incremental functions as $\beta\, h(x; a)$ and take a greedy stagewise approach:

$$(\beta_m, a_m) = \arg\min_{\beta, a} \sum_{i=1}^{N} L\big(y_i,\; F_{m-1}(x_i) + \beta\, h(x_i; a)\big)$$

train the gradient model by fitting $h(x; a)$ to the pseudo-residuals with least squares:

$$\tilde{y}_{im} = -\left[\frac{\partial L\big(y_i, F(x_i)\big)}{\partial F(x_i)}\right]_{F(x) = F_{m-1}(x)}, \qquad a_m = \arg\min_{a, \beta} \sum_{i=1}^{N} \big[\tilde{y}_{im} - \beta\, h(x_i; a)\big]^2$$

get the step length $\rho_m$ by line search and update the ensemble:

$$\rho_m = \arg\min_{\rho} \sum_{i=1}^{N} L\big(y_i,\; F_{m-1}(x_i) + \rho\, h(x_i; a_m)\big), \qquad F_m(x) = F_{m-1}(x) + \rho_m\, h(x; a_m)$$
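
Putting the finite-data recipe together, here is a minimal sketch assuming absolute-error loss $L(y, F) = |y - F|$, sklearn's `DecisionTreeRegressor` as $h(x; a)$, and a crude grid line search for $\rho_m$ (all illustrative choices, not the paper's exact algorithm):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gbdt_fit(X, y, n_stages=100):
    """Gradient boosting with absolute-error loss L(y, F) = |y - F|."""
    f0 = np.median(y)                      # f_0: initial guess minimizing L1 loss
    F = np.full(len(y), f0)
    stages = []
    for _ in range(n_stages):
        pseudo = np.sign(y - F)            # pseudo-residuals: -dL/dF at F_{m-1}
        tree = DecisionTreeRegressor(max_depth=3)
        tree.fit(X, pseudo)                # least-squares fit of h(x; a_m)
        h = tree.predict(X)
        rhos = np.linspace(0.0, 2.0, 100)  # grid line search for rho_m
        rho = rhos[np.argmin([np.abs(y - (F + r * h)).sum() for r in rhos])]
        F = F + rho * h                    # F_m = F_{m-1} + rho_m * h(x; a_m)
        stages.append((rho, tree))
    return f0, stages

def gbdt_predict(f0, stages, X):
    F = np.full(len(X), f0)
    for rho, tree in stages:
        F = F + rho * tree.predict(X)
    return F
```

For L1 loss the pseudo-residuals are just the signs of the current residuals, which is why each tree is fit to $\mathrm{sign}(y_i - F_{m-1}(x_i))$.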

conclusion

GBDT is steepest descent in function space with regression trees as the incremental functions: each stage fits a tree to the negative gradient of the loss (the pseudo-residuals), finds a step length by line search, and adds the scaled tree to the ensemble.

Reference

  • Friedman, J. H. (2001). Greedy Function Approximation: A Gradient Boosting Machine. The Annals of Statistics, 29(5), 1189-1232.