A bias correction algorithm for the Gini variable importance measure in classification trees

SANDRI, Marco;ZUCCOLOTTO, Paola
2008-01-01

Abstract

This article considers a measure of variable importance frequently used in variable selection methods based on decision trees and tree-based ensemble models, including CART, random forests, and gradient boosting machines. The measure is defined as the total heterogeneity reduction that a given covariate produces on the response variable when the sample space is recursively partitioned. Despite its popularity, some authors have shown that this measure is biased, and that under certain conditions the bias can seriously distort variable selection. Here we present a simple and effective method for bias correction, focusing on the Gini index as the measure of heterogeneity, a case that generalizes easily.
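To make the quantity discussed in the abstract concrete, the following is a minimal sketch of the Gini heterogeneity reduction produced by one covariate at a single node. It is an illustration only, not the authors' bias correction algorithm: the function names (`gini`, `gini_reduction`, `best_split_reduction`) and the toy data are assumptions introduced here. In a full tree, the importance of a covariate would be the sum of such reductions over all nodes that split on it.

```python
from collections import Counter


def gini(labels):
    """Gini index 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_reduction(x, y, threshold):
    """Heterogeneity reduction from splitting the node on x <= threshold."""
    left = [yi for xi, yi in zip(x, y) if xi <= threshold]
    right = [yi for xi, yi in zip(x, y) if xi > threshold]
    n = len(y)
    # Parent impurity minus the size-weighted impurity of the two children.
    return gini(y) - (len(left) / n) * gini(left) - (len(right) / n) * gini(right)


def best_split_reduction(x, y):
    """Largest reduction over all candidate thresholds of one covariate."""
    values = sorted(set(x))
    # The largest value is skipped: it would leave the right child empty.
    candidates = values[:-1]
    return max((gini_reduction(x, y, t) for t in candidates), default=0.0)


# Hypothetical toy data: one covariate that separates the classes, one that does not.
y = [0, 0, 1, 1]
x_informative = [1, 2, 3, 4]
x_noise = [1, 3, 2, 4]

print(best_split_reduction(x_informative, y))  # perfect split: reduction 0.5
print(best_split_reduction(x_noise, y))        # smaller, but still positive
```

Note that the uninformative covariate still attains a positive reduction, because the best threshold is chosen after looking at the data. This is one intuition for why raw totals of this kind can mislead variable selection.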
Files in this item:
A Bias Correction Algorithm for the Gini Variable Importance Measure in Clasification Trees - JCGS - Sandri Zuccolotto.pdf

Type: Full Text
License: NOT PUBLIC - Private/restricted access
Size: 107.73 kB
Format: Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/11379/22151
Citations
  • PubMed Central: n/a
  • Scopus: 110
  • ISI: 96