Statistical Methods for Reinforcement Learning Policy Comparison

Casesa, Romeo

This work is set in the context of Reinforcement Learning (RL) and analyzes the implications of statistically informed decisions when comparing different intelligent systems, i.e. different algorithms or policies. To this end, we introduce the concepts of statistical blocking, inferential confidence intervals (ICIs) and the Skillings-Mack test. Although these approaches are not new in the statistical literature, their application to reinforcement learning is innovative. The use of statistical blocking stems from the intuition that policy classification is often performed across a multitude of tasks: this added source of variability can be removed with statistical blocking, leading to more powerful tests. This is shown with the Skillings-Mack test procedure, which is first applied to a sample dataset from the reinforcement learning literature and then compared, through synthetic data, against other state-of-the-art policy comparison methods. Our results show that the Skillings-Mack test performs better than currently available state-of-the-art methods, providing statistically significant results at lower sample sizes; in addition, the procedure outperforms other methods when tasks with different mean scores are compared. We further propose the use of inferential confidence intervals within the field of RL. These confidence intervals are calculated using a correction factor that scales them so that two intervals overlap only when the corresponding policies are not statistically significantly different; this allows inference "by eye". A novel correction factor for ICIs is introduced, which is well behaved (non-singular) even when the two confidence intervals are very close.
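The effect of blocking described above can be illustrated with a minimal sketch. It is not the procedure used in this work: SciPy does not provide a Skillings-Mack test, so the Friedman test stands in for it here (the Skillings-Mack test generalizes the Friedman test to blocks with missing observations). The data are synthetic, and the function name, task offsets and policy effects are hypothetical values chosen only for illustration.

```python
import numpy as np
from scipy import stats


def compare_with_and_without_blocking(scores):
    """Return (blocked p-value, unblocked p-value).

    scores: array of shape (n_tasks, n_policies), one score per task/policy pair.
    """
    scores = np.asarray(scores, dtype=float)
    columns = [scores[:, j] for j in range(scores.shape[1])]
    # Blocked: ranks are computed within each task (row), so task-level
    # offsets cancel out. Friedman is a stand-in for Skillings-Mack here.
    blocked = stats.friedmanchisquare(*columns)
    # Unblocked: all scores are ranked together, so differences between
    # tasks add to the variability of each policy's scores.
    unblocked = stats.kruskal(*columns)
    return blocked.pvalue, unblocked.pvalue


# Hypothetical synthetic data: tasks differ a lot, policies differ a little.
rng = np.random.default_rng(0)
n_tasks, n_policies = 12, 3
task_offset = rng.normal(0.0, 10.0, size=(n_tasks, 1))  # between-task variability
policy_effect = np.array([0.0, 1.0, 2.0])               # between-policy differences
noise = rng.normal(0.0, 0.2, size=(n_tasks, n_policies))
scores = task_offset + policy_effect + noise

p_blocked, p_unblocked = compare_with_and_without_blocking(scores)
print(f"Friedman (blocked by task):   p = {p_blocked:.4g}")
print(f"Kruskal-Wallis (no blocking): p = {p_unblocked:.4g}")
```

In this setup the blocked test should yield a much smaller p-value than the unblocked one, because the task offsets dominate the pooled ranks but vanish when ranking within each task.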

Statistical Methods for Reinforcement Learning Policy Comparison / Casesa, Romeo. - (2024 Jul 11).