Working Paper 2022-616
The measurement of income inequality is affected by missing observations, especially if they are concentrated in the tails of the income distribution. This paper conducts an experiment to test how the different correction methods proposed in the statistical, econometric and machine learning literature address biases in inequality measurement due to item non-response. We take a baseline survey and artificially corrupt the data using several alternative non-linear functions that simulate patterns of income non-response, and show how biased inequality statistics can be when item non-response is ignored. The comparative assessment of correction methods indicates that most methods are able to partially correct for missing data biases. Sample reweighting based on probabilities of non-response produces inequality estimates quite close to true values in most simulated missing data patterns. Matching and Pareto corrections can also be effective in correcting for selected missing data patterns. Other methods, such as single and multiple imputation and machine learning methods, are less effective. A final discussion provides some elements that help explain these findings.
Authors: Paolo Brunori, Pedro Salas-Rojo, Paolo Verme.
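
The experimental logic described in the abstract can be illustrated with a minimal sketch. It is not the authors' code or data: the lognormal incomes, the logistic non-response function, its parameters, and the helper name `gini` are all hypothetical choices made for illustration. The sketch also assumes the true response probabilities are known, whereas in practice they would have to be estimated from observed covariates.

```python
import numpy as np


def gini(x, w=None):
    """Gini index from the (weighted) Lorenz curve, using the trapezoid rule."""
    x = np.asarray(x, dtype=float)
    w = np.ones_like(x) if w is None else np.asarray(w, dtype=float)
    order = np.argsort(x)
    x, w = x[order], w[order]
    # Cumulative population and income shares, each starting at 0.
    pop = np.concatenate(([0.0], np.cumsum(w) / w.sum()))
    inc = np.concatenate(([0.0], np.cumsum(x * w) / np.sum(x * w)))
    area = np.sum((pop[1:] - pop[:-1]) * (inc[1:] + inc[:-1]) / 2.0)
    return 1.0 - 2.0 * area


rng = np.random.default_rng(0)

# Hypothetical "baseline survey": simulated lognormal incomes.
n = 50_000
income = rng.lognormal(mean=10.0, sigma=0.8, size=n)
true_gini = gini(income)

# Hypothetical non-linear non-response function: the probability of a
# missing income rises with log income and is capped at 0.8.
p_miss = 0.8 / (1.0 + np.exp(-1.5 * (np.log(income) - 11.0)))
responded = rng.random(n) > p_miss

# Ignoring non-response: Gini computed on respondents only.
naive = gini(income[responded])

# Reweighting: each respondent is weighted by the inverse of the response
# probability (assumed known here; estimated from covariates in practice).
weights = 1.0 / (1.0 - p_miss[responded])
reweighted = gini(income[responded], weights)

print(f"full sample: {true_gini:.3f}")
print(f"respondents only: {naive:.3f}")
print(f"reweighted: {reweighted:.3f}")
```

Because non-response in this sketch is concentrated in the upper tail, the respondents-only estimate understates inequality, while inverse-probability weights restore the representation of high incomes. Capping the non-response probability keeps the weights bounded, which is a design choice of the sketch rather than a feature of the paper.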