This article considers a measure of variable importance frequently used in variable selection methods based on decision trees and tree-based ensemble models. These models include CART, random forests, and gradient boosting machine. The measure of variable importance is defined as the total heterogeneity reduction produced by a given covariate on the response variable when the sample space is recursively partitioned. Despite its popularity, some authors have shown that this measure is biased to the extent that, under certain conditions, there may be dangerous effects on variable selection. Here we present a simple and effective method for bias correction, focusing on the easily generalizable case of the Gini index as a measure of heterogeneity.
A bias correction algorithm for the Gini variable importance measure in classification trees
SANDRI, Marco;ZUCCOLOTTO, Paola
2008-01-01
Abstract
This article considers a measure of variable importance frequently used in variable selection methods based on decision trees and tree-based ensemble models. These models include CART, random forests, and gradient boosting machine. The measure of variable importance is defined as the total heterogeneity reduction produced by a given covariate on the response variable when the sample space is recursively partitioned. Despite its popularity, some authors have shown that this measure is biased to the extent that, under certain conditions, there may be dangerous effects on variable selection. Here we present a simple and effective method for bias correction, focusing on the easily generalizable case of the Gini index as a measure of heterogeneity.File | Dimensione | Formato | |
---|---|---|---|
A Bias Correction Algorithm for the Gini Variable Importance Measure in Clasification Trees - JCGS - Sandri Zuccolotto.pdf
gestori archivio
Tipologia:
Full Text
Licenza:
NON PUBBLICO - Accesso privato/ristretto
Dimensione
107.73 kB
Formato
Adobe PDF
|
107.73 kB | Adobe PDF | Visualizza/Apri Richiedi una copia |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.