Talks and presentations

Approximate Leave-one-out CV for Regression with L1 regularizers

May 01, 2024

Talk, AISTATS 2024, Valencia, Spain

The out-of-sample error (OO) is the main quantity of interest in risk estimation and model selection. Leave-one-out cross validation (LO) offers a (nearly) distribution-free yet computationally demanding method to estimate OO. Recent theoretical work showed that approximate leave-one-out cross validation (ALO) is a computationally efficient and statistically reliable estimate of LO (and OO) for generalized linear models with twice differentiable regularizers. For problems involving non-differentiable regularizers, despite significant empirical evidence, a theoretical understanding of ALO’s error is still lacking. In this paper, we present a novel theory for a wide class of problems in the generalized linear model family with the non-differentiable L1 regularizer. We bound the error $|\mathrm{ALO}-\mathrm{LO}|$ in terms of intuitive metrics such as the size of leave-$i$-out perturbations in active sets, sample size $n$, number of features $p$, and signal-to-noise ratio (SNR). As a consequence, for L1-regularized problems, we show that $|\mathrm{ALO}-\mathrm{LO}|\to 0$ as $n,p\to\infty$ while $n/p$ and SNR remain bounded.
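As a rough illustration of the quantities compared in this abstract, the sketch below computes LO and ALO for the squared-loss lasso, a special case of the L1-regularized problems discussed. It is not the paper's code. It assumes the commonly used lasso form of ALO, in which each residual is rescaled by $1/(1-H_{ii})$ with $H$ the projection onto the active columns. The function names (`lasso_cd`, `alo_lasso`, `lo_lasso`), the simulated data, and the value of `lam` are all hypothetical choices made for the example.

```python
import numpy as np

def lasso_cd(X, y, lam, beta0=None, sweeps=500, tol=1e-10):
    """Coordinate descent for 0.5 * ||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p) if beta0 is None else beta0.copy()
    col_sq = (X ** 2).sum(axis=0)
    resid = y - X @ beta
    for _ in range(sweeps):
        max_change = 0.0
        for j in range(p):
            old = beta[j]
            # Correlation of column j with the partial residual (coordinate j removed).
            rho = X[:, j] @ resid + col_sq[j] * old
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            if new != old:
                resid -= X[:, j] * (new - old)
                beta[j] = new
                max_change = max(max_change, abs(new - old))
        if max_change < tol:
            break
    return beta

def alo_lasso(X, y, beta):
    """ALO for the lasso (assumed form): residuals rescaled by 1 / (1 - H_ii)."""
    active = np.flatnonzero(beta)
    resid = y - X @ beta
    if active.size == 0:
        return np.mean(resid ** 2)
    XA = X[:, active]
    # Diagonal of the projection matrix onto the active columns.
    h_diag = np.einsum("ij,ji->i", XA, np.linalg.pinv(XA))
    return np.mean((resid / (1.0 - h_diag)) ** 2)

def lo_lasso(X, y, lam, beta_full):
    """Exact leave-one-out error: refit n times, warm-started at the full fit."""
    n = len(y)
    errs = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        b = lasso_cd(X[keep], y[keep], lam, beta0=beta_full)
        errs[i] = (y[i] - X[i] @ b) ** 2
    return errs.mean()

# Hypothetical simulated data: sparse signal plus unit-variance Gaussian noise.
rng = np.random.default_rng(0)
n, p, lam = 60, 15, 10.0
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = 2.0
y = X @ beta_true + rng.standard_normal(n)

beta_hat = lasso_cd(X, y, lam)
alo = alo_lasso(X, y, beta_hat)
lo = lo_lasso(X, y, lam, beta_hat)
print(f"ALO = {alo:.4f}, LO = {lo:.4f}, |ALO - LO| = {abs(alo - lo):.4f}")
```

In this squared-loss setting the rescaled residual matches the exact leave-one-out residual whenever removing observation $i$ leaves the active set and the coefficient signs unchanged. Any gap between the two printed values therefore comes from leave-$i$-out perturbations of the active set, the quantity the bound above is stated in terms of.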

The bias of Gini coefficient

October 16, 2022

Talk, INFORMS 2022, Indianapolis, IN, USA

The Gini coefficient is a key statistical measure used widely across many fields. Interest in its properties is underscored by the fact that every year the World Bank uses it to rank countries by level of income inequality. When calculating the coefficient, it is common practice to model the distribution of individual incomes in a given population as a Gamma distribution. The asymptotic behavior of the sample Gini coefficient for Gamma populations is well documented in the literature. However, its finite-sample bias has not been studied, owing to the difficulty posed by the denominator of the estimator. This study fills that gap by demonstrating that the sample Gini coefficient is an unbiased estimator of the population Gini coefficient when the population follows a Gamma distribution. Furthermore, our findings quantify the expected downward bias due to grouping when group sizes are equal.
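The unbiasedness claim can be checked numerically with a small Monte Carlo sketch. The abstract does not say which normalisation of the sample Gini coefficient is meant, so the code assumes the U-statistic form, $\hat G = \sum_{i<j}|x_i-x_j| / ((n-1)\sum_i x_i)$. It compares the Monte Carlo mean of $\hat G$ with the closed-form Gini coefficient of a Gamma distribution with shape $k$, namely $\Gamma(k+1/2)/(\sqrt{\pi}\,\Gamma(k+1))$. The function names, the seed, and the simulation settings are illustrative choices.

```python
import math
import numpy as np

def sample_gini(x):
    """Sample Gini coefficient per row of x, using the n(n-1) (U-statistic) normalisation."""
    x = np.sort(np.atleast_2d(np.asarray(x, dtype=float)), axis=1)
    n = x.shape[1]
    # For sorted data, sum_{i<j} (x_(j) - x_(i)) = sum_k (2k - n - 1) * x_(k), k = 1..n.
    weights = 2 * np.arange(1, n + 1) - n - 1
    return (x @ weights) / ((n - 1) * x.sum(axis=1))

def gamma_gini(shape):
    """Population Gini coefficient of a Gamma distribution (independent of scale)."""
    return math.gamma(shape + 0.5) / (math.sqrt(math.pi) * math.gamma(shape + 1))

# Illustrative settings: small samples, where finite-sample bias would be most visible.
rng = np.random.default_rng(0)
shape, n, reps = 2.0, 10, 20000
draws = rng.gamma(shape, 1.0, size=(reps, n))

mc_mean = sample_gini(draws).mean()
population = gamma_gini(shape)
print(f"Monte Carlo mean of sample Gini: {mc_mean:.4f}")
print(f"Population Gini (Gamma, k={shape}): {population:.4f}")
```

With only ten observations per sample, the two printed values should agree up to Monte Carlo error. Dividing by $n^2$ instead of $n(n-1)$ would shrink every estimate by the factor $(n-1)/n$.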