Working Paper 2023-642
Economists use poverty prediction models to address missing data problems in a variety of contexts, including poverty profiling, targeting with proxy-means tests, cross-survey imputations such as poverty mapping, and vulnerability analyses. Based on the models used in this literature, this paper conducts an experiment that artificially corrupts data with different patterns and shares of missing incomes. It then compares the capacity of classic econometric and machine learning models to predict poverty under these scenarios. It finds that the quality of predictions and the choice of the optimal prediction model depend on the distribution of observed and unobserved incomes, the poverty line, the choice of objective function and policy preferences, and various other modeling choices. Logistic and random forest models are found to be more robust than other models to variations in these features, but no model invariably outperforms all others. The paper concludes with some reflections on the use of these models for predicting poverty.
Authors: Paolo Verme.