Publications

You can also find my articles on my Google Scholar profile.

Under Submission


A Complete Error Analysis of the K-fold Cross Validation for Regularized Empirical Risk Minimization in High Dimensions.

Work in progress, 2025

This paper studies the error of k-fold cross validation (k-CV) in estimating the out-of-sample error of regularized empirical risk minimization (R-ERM) in the proportional high-dimensional regime, where the number of observations $n$ and the number of parameters $p$ both go to infinity proportionally. We provide a stochastic bound for the MSE of k-CV under mild assumptions. In contrast with the common belief that the MSE keeps decreasing as the number of folds $k$ increases, we find that, with $n, p$ fixed, it stops decreasing once $k$ exceeds a certain threshold. The manuscript will be finished and submitted soon.
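As a purely illustrative instance of the setup (not the paper's estimator or bound), the sketch below runs k-fold CV for ridge regression, a simple R-ERM example, on synthetic data in a proportional regime with $p/n$ fixed:

```python
import numpy as np

rng = np.random.default_rng(0)

def ridge_fit(X, y, lam):
    """Closed-form ridge solution, an R-ERM instance with squared loss + L2 penalty."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def kfold_cv_error(X, y, lam, k):
    """k-fold CV estimate of the out-of-sample squared error."""
    n = X.shape[0]
    idx = rng.permutation(n)
    errs = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        beta = ridge_fit(X[train], y[train], lam)
        errs.append(np.mean((y[fold] - X[fold] @ beta) ** 2))
    return np.mean(errs)

n, p = 200, 100  # proportional regime: p/n fixed at 1/2
X = rng.standard_normal((n, p)) / np.sqrt(n)
beta0 = rng.standard_normal(p)
y = X @ beta0 + 0.5 * rng.standard_normal(n)
err = kfold_cv_error(X, y, lam=1.0, k=5)
print(err)
```

The paper's MSE bound concerns how far estimates of exactly this kind sit from the true out-of-sample error as $k$ varies.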

A Scalable Formula for the Moments of a Family of Self-Normalized Statistics

Submitted to the Journal of Applied Probability

Following Student's t-statistic, normalization has been a widely used technique in statistics and other disciplines, including economics, ecology and machine learning. We focus on statistics taking the form of a ratio over (some power of) the sample mean, whose probabilistic features remain unknown. We develop a unified formula for the moments of these self-normalized statistics with non-negative observations, yielding closed-form expressions for several important cases. Moreover, the complexity of our formula does not scale with the sample size $n$. Our theoretical findings, supported by extensive numerical experiments, reveal novel insights into their bias and variance, and we propose a debiasing method illustrated with applications such as the odds ratio, the Gini coefficient and the squared coefficient of variation.
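To make the object of study concrete, here is an illustration (not a formula from the paper) of one such self-normalized statistic, the plug-in squared coefficient of variation $s^2/\bar{x}^2$. For Exponential(1) observations the true value is 1, while a direct computation via the Dirichlet decomposition gives expectation $n/(n+1)$, the kind of finite-sample bias a debiasing method would correct:

```python
import numpy as np

rng = np.random.default_rng(0)

def scv(samples):
    """Plug-in squared coefficient of variation, row-wise:
    sample variance over the squared sample mean."""
    return samples.var(axis=1, ddof=1) / samples.mean(axis=1) ** 2

# Monte Carlo check of the downward bias for Exponential(1) data.
n, reps = 20, 100_000
estimates = scv(rng.exponential(1.0, size=(reps, n)))
print(estimates.mean())  # close to n/(n+1) = 20/21 ≈ 0.952, not 1
```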

Journal Articles


Certified Machine Unlearning under High Dimensional Regime

Published in Journal of Machine Learning Research (JMLR), 2025

Machine unlearning focuses on the computationally efficient removal of specific training data from trained models, ensuring that the influence of forgotten data is effectively eliminated without the need for full retraining. Despite advances in low-dimensional settings, where the number of parameters $p$ is much smaller than the sample size $n$, extending similar theoretical guarantees to high-dimensional regimes remains challenging. We study an unlearning algorithm that starts from the original model parameters and performs a theory-guided sequence of $T \in \{1, 2\}$ Newton steps. After this update, carefully scaled isotropic Laplacian noise is added to the estimate to ensure that any (potential) residual influence of the forgotten data is completely removed. We show that when both $n, p \to \infty$ with a fixed ratio $n/p$, significant theoretical and computational obstacles arise due to the interplay between the complexity of the model and the finite signal-to-noise ratio. Finally, we show that, unlike in low-dimensional settings, a single Newton step is insufficient for effective unlearning in high-dimensional problems; two steps, however, are enough to achieve the desired certifiability. We provide numerical experiments to support the theoretical claims of the paper.
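The update described above can be sketched for ridge-regularized logistic regression (an illustrative loss choice; the `noise_scale` below is a placeholder, whereas the paper derives the calibrated scale required for certified unlearning):

```python
import numpy as np

def grad_hess(theta, X, y, lam):
    """Gradient and Hessian of ridge-regularized logistic loss (labels in {0, 1})."""
    n = len(y)
    p = 1.0 / (1.0 + np.exp(-(X @ theta)))
    g = X.T @ (p - y) / n + lam * theta
    H = (X.T * (p * (1.0 - p))) @ X / n + lam * np.eye(len(theta))
    return g, H

def unlearn(theta_full, X_keep, y_keep, lam, T=2, noise_scale=0.0, seed=0):
    """Sketch of the scheme: starting from the full-data solution, take
    T in {1, 2} Newton steps on the retained data, then add isotropic
    Laplace noise (noise_scale is a hypothetical placeholder)."""
    theta = theta_full.astype(float).copy()
    for _ in range(T):
        g, H = grad_hess(theta, X_keep, y_keep, lam)
        theta -= np.linalg.solve(H, g)
    rng = np.random.default_rng(seed)
    return theta + noise_scale * rng.laplace(size=theta.size)
```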

Unbiased Estimation of the Gini Coefficient.

Published in Statistics and Probability Letters, 2025

This paper establishes the unbiasedness of the classical Gini coefficient for the Gamma distribution, with applications to data grouping.
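For reference, a minimal sketch of the classical sample Gini coefficient (here in the common pairwise form with $n(n-1)$ normalization; the paper's exact estimator and its grouping application may differ):

```python
import numpy as np

def gini(x):
    """Sample Gini coefficient: mean absolute difference between pairs,
    normalized by n*(n-1), divided by twice the sample mean."""
    x = np.asarray(x, dtype=float)
    n = x.size
    mad = np.abs(x[:, None] - x[None, :]).sum() / (n * (n - 1))
    return mad / (2.0 * x.mean())

print(gini([1.0, 1.0, 1.0]))  # 0.0: perfect equality
print(gini([0.0, 1.0]))       # 1.0: all mass on one observation
```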

Recommended citation: Baydil, A., de la Peña, V., Zou, H. and Yao, H. (2025). "Unbiased Estimation of the Gini Coefficient." Statistics and Probability Letters. 222:110376.
Download Paper

Prediction and Estimation of Random Variables with Infinite Mean or Variance

Published in Communications in Statistics - Theory and Methods, 2024

This paper proposes a transformation-based estimator for distributions with infinite mean or variance and constructs the corresponding confidence intervals.

Recommended citation: de la Peña, V., Gzyl, H., Mayoral, S., Zou, H., and Alemayehu, D. (2024). "Prediction and Estimation of Random Variables with Infinite Mean or Variance." Communications in Statistics - Theory and Methods. 1(1).
Download Paper

Conference Papers


Newfluence: Boosting Model Interpretability and Understanding in High Dimensions

Published in ICML 2025 Workshop: Assessing World Models, Methods and Metrics for Evaluating Understanding, 2025

The increasing complexity of machine learning (ML) and artificial intelligence (AI) models has created a pressing need for tools that help scientists, engineers, and policymakers interpret and refine model decisions and predictions. Influence functions, originating from robust statistics, have emerged as a popular approach for this purpose. However, the heuristic foundations of influence functions rely on low-dimensional assumptions where the number of parameters p is much smaller than the number of observations n. In contrast, modern AI models often operate in high-dimensional regimes with large p, challenging these assumptions. In this paper, we examine the accuracy of influence functions in high-dimensional settings. Our theoretical and empirical analyses reveal that influence functions cannot reliably fulfill their intended purpose. We then introduce an alternative approximation, called Newfluence, that maintains similar computational efficiency while offering significantly improved accuracy. Newfluence is expected to provide more accurate insights than many existing methods for interpreting complex AI models and diagnosing their issues. Moreover, the high-dimensional framework we develop in this paper can also be applied to analyze other popular techniques, such as Shapley values.
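For context, the classical low-dimensional heuristic that the paper revisits can be sketched as follows: a one-step influence-function approximation to leave-one-out retraining, shown here for ridge-regularized logistic regression (illustrative only; Newfluence itself is not reproduced here):

```python
import numpy as np

def grad_hess(theta, X, y, lam):
    """Gradient and Hessian of ridge-regularized logistic loss."""
    n = len(y)
    p = 1.0 / (1.0 + np.exp(-(X @ theta)))
    g = X.T @ (p - y) / n + lam * theta
    H = (X.T * (p * (1.0 - p))) @ X / n + lam * np.eye(len(theta))
    return g, H

def fit(X, y, lam, iters=25):
    """Newton's method from zero; adequate for this small, well-conditioned problem."""
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        g, H = grad_hess(theta, X, y, lam)
        theta -= np.linalg.solve(H, g)
    return theta

rng = np.random.default_rng(0)
n, d = 500, 5  # the low-dimensional regime (p << n) where the heuristic is accurate
X = rng.standard_normal((n, d))
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X[:, 0]))).astype(float)
lam = 0.1

theta = fit(X, y, lam)
_, H = grad_hess(theta, X, y, lam)

# Influence-function approximation to dropping observation j, vs. exact retraining.
j = 0
pj = 1.0 / (1.0 + np.exp(-(X[j] @ theta)))
theta_if = theta + np.linalg.solve(H, X[j] * (pj - y[j])) / n
theta_exact = fit(np.delete(X, j, axis=0), np.delete(y, j, axis=0), lam)
print(np.linalg.norm(theta_if - theta_exact))  # small when p << n
```

The paper's point is that this accuracy degrades once $p$ is comparable to $n$, motivating the corrected approximation.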

Download Paper

Approximate Leave-one-out Cross Validation for Regression with L1 Regularizers

Published in Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, 2024

Selected for oral presentation.

Recommended citation: Auddy, A., Zou, H., Rahnama Rad, K. and Maleki, A. (2024). "Approximate Leave-one-out Cross Validation for Regression with L1 Regularizers." Proceedings of The 27th International Conference on AISTATS. 238:2377-2385.
Download Paper