Approximate Certified Data Removal in High Dimensional R-ERM
Talks, Minghui Yu Memorial Conference, Columbia University, New York, NY, USA
Slides can be found here.
Talks, Minghui Yu Memorial Conference, Columbia University, New York, NY, USA
Slides can be found here.
Talks, Student Seminar, Department of Statistics, Columbia University, New York, NY, USA
Talk, AISTATS 2024, Valencia, Spain
The out-of-sample error (OO) is the main quantity of interest in risk estimation and model selection. Leave-one-out cross validation (LO) offers a (nearly) distribution-free yet computationally demanding method to estimate OO. Recent theoretical work showed that approximate leave-one-out cross validation (ALO) is a computationally efficient and statistically reliable estimate of LO (and OO) for generalized linear models with twice-differentiable regularizers. For problems involving non-differentiable regularizers, despite significant empirical evidence, a theoretical understanding of ALO's error is still lacking. In this paper, we present a novel theory for a wide class of problems in the generalized linear model family with the non-differentiable L1 regularizer. We bound the error $|\mathrm{ALO}-\mathrm{LO}|$ in terms of intuitive metrics such as the size of leave-$i$-out perturbations in active sets, the sample size $n$, the number of features $p$, and the signal-to-noise ratio (SNR). As a consequence, for L1-regularized problems, we show that $|\mathrm{ALO}-\mathrm{LO}|\to 0$ as $n,p\to\infty$ while $n/p$ and SNR remain bounded.
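To make the two quantities in the abstract above concrete, here is a minimal sketch for the simplest member of the family: squared loss with an L1 penalty (the lasso). It is an illustration under stated assumptions, not the construction used in the paper. The function names (`lasso_cd`, `lo_error`, `alo_error`) are hypothetical, the solver is a plain coordinate descent, and the ALO step uses the leverage-based shortcut commonly quoted for the lasso, in which each residual is rescaled by $1/(1-H_{ii})$ with $H$ the projection onto the columns in the active set.

```python
import numpy as np


def soft_threshold(z, t):
    """Scalar soft-thresholding operator."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_iter=500, tol=1e-10):
    """Minimise 0.5 * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    p = X.shape[1]
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    r = y.astype(float)  # residual y - X b, starting from b = 0 (astype copies)
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            rho = X[:, j] @ r + col_sq[j] * b[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            if new != b[j]:
                r -= X[:, j] * (new - b[j])
                max_step = max(max_step, abs(new - b[j]))
                b[j] = new
        if max_step < tol:
            break
    return b


def lo_error(X, y, lam):
    """Exact leave-one-out squared error: refit n times, once per held-out row."""
    n = X.shape[0]
    errs = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        b = lasso_cd(X[keep], y[keep], lam)
        errs[i] = (y[i] - X[i] @ b) ** 2
    return errs.mean()


def alo_error(X, y, lam):
    """Approximate leave-one-out squared error from a single fit.

    Assumption: the leverage shortcut restricted to the active set,
    resid_i / (1 - H_ii), with H = X_A (X_A^T X_A)^+ X_A^T.
    """
    b = lasso_cd(X, y, lam)
    resid = y - X @ b
    active = np.flatnonzero(b)
    if active.size == 0:
        h = np.zeros(len(y))
    else:
        XA = X[:, active]
        # Diagonal of H without forming the full n x n matrix.
        h = np.einsum("ij,ij->i", XA @ np.linalg.pinv(XA.T @ XA), XA)
    return np.mean((resid / (1.0 - h)) ** 2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 30, 15  # illustrative sizes only
    X = rng.standard_normal((n, p))
    beta = np.zeros(p)
    beta[:3] = 2.0
    y = X @ beta + rng.standard_normal(n)
    lam = 5.0
    print("LO :", lo_error(X, y, lam))
    print("ALO:", alo_error(X, y, lam))
```

The point of the comparison is cost: `lo_error` solves $n$ optimization problems, while `alo_error` solves one and then does a single linear-algebra step. The shortcut divides by $1-H_{ii}$, so it presumes the active set is smaller than $n$ and that $H_{ii}<1$ for every observation.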
Talks, Minghui Yu Memorial Conference, Columbia University, New York, NY, USA
Talk, INFORMS 2022, Indianapolis, IN, USA
The Gini coefficient is a statistical measure used widely across many fields. Interest in its properties is underscored by the fact that the World Bank uses it every year to rank countries by level of income inequality. To calculate the coefficient, it is common practice to model the distribution of individual incomes in a given population with a Gamma distribution. The asymptotic behavior of the sample Gini coefficient for Gamma-distributed populations is well documented in the literature. However, its finite-sample bias has not been studied, owing to the challenge posed by the denominator. This study fills that gap by demonstrating that the sample Gini coefficient is an unbiased estimator of the population Gini coefficient for a population with a Gamma distribution. Furthermore, our findings give the expected downward bias due to grouping when group sizes are equal.
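The unbiasedness claim above can be checked numerically with a small Monte Carlo sketch. Two assumptions are made here that the abstract does not spell out: the sample Gini coefficient is taken in its $(n-1)$-normalized form, $\sum_{i<j}|x_i-x_j| / \big((n-1)\sum_i x_i\big)$, and the population value for a Gamma distribution with shape $k$ is taken from the standard closed form $\Gamma(k+1/2)/(\sqrt{\pi}\,\Gamma(k+1))$. The function names are hypothetical.

```python
import math
import random


def sample_gini(xs):
    """Sample Gini coefficient, (n - 1)-normalised form (an assumption here):
    sum_{i<j} |x_i - x_j| / ((n - 1) * sum_i x_i)."""
    n = len(xs)
    s = sorted(xs)
    # For sorted data, sum_{i<j} (s_j - s_i) = sum_i (2i - n + 1) * s_i (0-indexed).
    pair_sum = sum((2 * i - n + 1) * x for i, x in enumerate(s))
    return pair_sum / ((n - 1) * sum(s))


def gamma_gini(shape):
    """Population Gini coefficient of a Gamma distribution (it is scale-free)."""
    return math.gamma(shape + 0.5) / (math.sqrt(math.pi) * math.gamma(shape + 1.0))


def mc_mean_gini(shape, n, reps, seed=0):
    """Average the sample Gini over `reps` simulated Gamma samples of size n."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        total += sample_gini([rng.gammavariate(shape, 1.0) for _ in range(n)])
    return total / reps


if __name__ == "__main__":
    shape, n = 2.0, 5  # deliberately small n, where finite-sample bias would show
    print("population Gini :", gamma_gini(shape))
    print("mean sample Gini:", mc_mean_gini(shape, n, reps=20000))
```

A small sample size is used on purpose: asymptotic results say nothing at $n=5$, so agreement between the two printed numbers up to Monte Carlo noise is a finite-sample check rather than a large-sample one.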