Maria Karlsson

Analysis of demand equations using mixtures of Tobit models

The aim of the proposal is to develop statistical methods useful for inference on systems of demand equations. Such a system describes how household expenditure (or demand) is distributed over goods and services and how this distribution varies with household income and household demographics. Understanding and measuring consumer behavior is essential for planning and policy making. Estimates of demand functions are necessary inputs to these kinds of analyses, as they provide the link between theoretical models of consumer behavior and consumers' actual behavior.
This project focuses in particular on methods for estimating demand equations using data from Statistics Sweden's annual survey of household expenditures. Estimation is complicated by households that report zero expenditure on one or more goods and/or services. This is called censoring, and it biases standard regression estimators. Moreover, a system of demand equations should be estimated jointly rather than one equation at a time, since the demand for one good or service is linked to the demand for other goods and services through the budget restriction.
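To illustrate why censoring biases standard regression, the sketch below simulates a single latent-variable regression in which negative latent demand is recorded as zero expenditure. It then fits ordinary least squares (OLS) to the observed data. This is a minimal illustration under assumed values (standard normal regressor and error, true slope 1, censoring at zero). The names `simulate_censored` and `ols_slope` are hypothetical and not part of the project's software.

```python
import random

def simulate_censored(n, beta0, beta1, sigma, seed):
    """Draw (x, y) with latent y* = beta0 + beta1*x + e and observed y = max(0, y*)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y_star = beta0 + beta1 * x + rng.gauss(0.0, sigma)
        xs.append(x)
        ys.append(max(0.0, y_star))  # zero expenditure whenever latent demand is <= 0
    return xs, ys

def ols_slope(xs, ys):
    """Slope of a simple least-squares regression of ys on xs (with intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Assumed illustration values: true slope 1, about half of the observations censored
xs, ys = simulate_censored(n=20000, beta0=0.0, beta1=1.0, sigma=1.0, seed=1)
share_zero = sum(y == 0.0 for y in ys) / len(ys)
slope = ols_slope(xs, ys)
print(f"share of zeros: {share_zero:.2f}, OLS slope: {slope:.2f} (true slope 1.0)")
```

With roughly half of the observations at zero, the OLS slope comes out well below the true value. The zeros pull the fitted line towards the horizontal axis.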
We propose a finite mixture of multivariate Tobit models, which addresses both censoring and the budget restriction. This approach will be less sensitive to distributional assumptions than previously suggested methods. The project provides theoretical and empirical results on the properties of the estimator and compares it with previously suggested estimators.
Final report

PROJECT AIM AND DEVELOPMENT

The aim of the project was to develop statistical methods for the analysis of demand equations using data from Statistics Sweden's Household Budget Survey (HBS). Demand equations describe how household expenditure (demand) is distributed over goods/services and how this distribution varies with household characteristics. Estimation of demand equations is complicated by households with zero expenditure on some goods/services. This problem is called censoring, and as a consequence standard estimation methods give biased results. Estimation is further complicated if systems of demand equations, rather than one demand equation at a time, are to be analysed.

In the project plan, four sub-projects were described: 1) Estimation of a single demand equation under censoring, 2) Estimation of systems of demand equations under censoring, 3) Analysis of HBS data, and 4) Implementation of proposed methods in an R-package.

In empirical research, it is important that statistical methods are robust against model misspecification. The Tobit model [17] is a regression model often used when the response, e.g., household expenditure on a particular product, is censored. The Tobit model is estimated by maximum likelihood (ML), and the ML estimator is not robust against non-normal error terms.
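As a point of reference, consider a response censored from below at c (here c = 0, i.e., zero expenditure). Its Tobit log-likelihood combines a normal censoring probability for the zeros with a normal density for the positive observations. The sketch below is a minimal version under stated assumptions: homoskedastic normal errors, plain Python lists, and hypothetical function names. ML estimation maximizes this function over beta and sigma, typically with a numerical optimizer.

```python
import math

def norm_logpdf(z):
    """Log of the standard normal density."""
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * z * z

def norm_logcdf(z):
    """Log of the standard normal CDF, Phi(z) = 0.5 * erfc(-z / sqrt(2)).

    The floor avoids log(0) far in the lower tail.
    """
    return math.log(max(0.5 * math.erfc(-z / math.sqrt(2.0)), 1e-300))

def tobit_loglik(beta, sigma, X, y, c=0.0):
    """Log-likelihood of y = max(c, x'beta + e), e ~ N(0, sigma^2)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    ll = 0.0
    for xi, yi in zip(X, y):
        mu = sum(b * v for b, v in zip(beta, xi))
        if yi <= c:
            # censored observation: P(y* <= c) = Phi((c - mu) / sigma)
            ll += norm_logcdf((c - mu) / sigma)
        else:
            # uncensored observation: normal density of y* evaluated at yi
            ll += norm_logpdf((yi - mu) / sigma) - math.log(sigma)
    return ll

# Usage: one censored and one positive observation, intercept-only model
print(tobit_loglik([0.0], 1.0, X=[[1.0], [1.0]], y=[0.0, 1.0]))
```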

In the project, a finite mixture model (FMM) in which a finite number of Tobit models are mixed, i.e., a finite mixture of Tobit models (FMT), was proposed. It is well known that an FMM of normal distributions can approximate any continuous distribution arbitrarily well [e.g., 15]. This means that the FMT estimator (a weighted ML estimator of the FMT) is potentially robust against model misspecification.
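The FMT likelihood replaces the single normal error distribution by a weighted sum of normal components. Each component has its own regression coefficients and standard deviation. The following minimal sketch assumes censoring from below at zero and one (weight, beta, sigma) triple per component. The parametrization and estimation algorithm used in [9] may differ, and the function names are hypothetical. Maximizing this function, for example with the EM algorithm or a general-purpose optimizer, would give an FMT estimate.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def fmt_loglik(components, X, y, c=0.0):
    """Log-likelihood of a finite mixture of Tobit models.

    components: list of (weight, beta, sigma); the weights must sum to one.
    Each observation's likelihood is the weighted sum of the component Tobit
    likelihoods: a censoring probability if y <= c, otherwise a normal density.
    """
    if abs(sum(w for w, _, _ in components) - 1.0) > 1e-9:
        raise ValueError("mixing weights must sum to one")
    ll = 0.0
    for xi, yi in zip(X, y):
        lik = 0.0
        for w, beta, sigma in components:
            mu = sum(b * v for b, v in zip(beta, xi))
            if yi <= c:
                lik += w * norm_cdf((c - mu) / sigma)       # P(y* <= c) in this component
            else:
                lik += w * norm_pdf((yi - mu) / sigma) / sigma  # density of y* at yi
        ll += math.log(max(lik, 1e-300))  # floor guards against log(0)
    return ll

# Usage: a two-component mixture with different intercepts and spreads (assumed values)
print(fmt_loglik([(0.3, [1.0], 1.0), (0.7, [-1.0], 2.0)], X=[[1.0], [1.0]], y=[0.0, 2.5]))
```

With a single component of weight one, `fmt_loglik` reduces to the ordinary Tobit log-likelihood. With two or more components, the implied error distribution can be skewed or heavy-tailed.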

In order to develop estimators for systems of demand equations, the FMM of multivariate regression models [16] and the FMM of SUR models [5] were modified for censored multivariate data. However, when evaluating these models, we encountered numerical problems with the optimization routines used: in too many cases they did not converge to a (correct) solution.

Non-response is a problem in the HBS, and Heckman's [7] two-stage estimator for sample selection bias has been studied as a means of correcting for non-response.

As recommended by one of the reviewers of the project application, the fourth sub-project has been postponed. There are, however, R packages for FMM, e.g., FlexMix [14,6], that could perhaps be modified for censored data as well.

IMPLEMENTATION

The implementation of a research project aimed at developing statistical methods can, in brief, be described as follows: 1) "invent" the method, 2) investigate its properties, 3) compare the properties of the method with those of existing methods, and 4) study how the method works in practice. For example, when developing estimation methods, properties such as bias and precision are considered, and when developing a test, the size and power of the test are of interest. These properties can be proved mathematically, and Monte Carlo simulations can also be used to investigate properties and to make comparisons between methods.

The implementation of this project is no exception to this general description. In Karlsson & Laitila [9] and Karlsson & Laitila [11], for example, the bias and mean square error of the FMT estimator and of other estimators for censored regression models are compared by means of simulation. The FMT estimator performs better than the other estimators, especially when the censoring rate is high. In Karlsson & Laitila [11], the FMT estimator is benchmarked against the ML estimator of correctly specified models. The aim is to investigate whether a finding for non-censored data also holds for censored data: namely, that two components in the FMM suffice for a good approximation in regression models [1]. An extended FMT estimator with heteroskedastic variance functions in the components (FMT.vf) and a model selection test between the FMT and the new FMT.vf are also suggested. The proposed methods are illustrated with data from the 2007 HBS.
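The simulation design described above can be sketched as follows. Draw many samples from a known censored model, apply each estimator, and summarize the bias and mean square error (MSE) against the true parameter. The FMT estimator requires numerical optimization, so this minimal sketch uses two simple stand-in estimators instead: OLS on all observations and OLS on the positive observations only. The data-generating values, sample size and number of replications are illustrative assumptions, not those used in [9] or [11], and all function names are hypothetical.

```python
import random

def simulate(n, rng, beta0=0.0, beta1=1.0, sigma=1.0):
    """One sample from y = max(0, beta0 + beta1*x + e), x ~ N(0, 1), e ~ N(0, sigma^2)."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [max(0.0, beta0 + beta1 * x + rng.gauss(0.0, sigma)) for x in xs]
    return xs, ys

def ols_slope(xs, ys):
    """OLS slope with intercept, using all observations (zeros included)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def ols_positive_only(xs, ys):
    """OLS slope after dropping the zero observations (a truncated sample)."""
    pairs = [(x, y) for x, y in zip(xs, ys) if y > 0.0]
    return ols_slope([p[0] for p in pairs], [p[1] for p in pairs])

def monte_carlo(estimators, true_value, n, reps, seed):
    """Return {name: (bias, mse)} over `reps` simulated samples of size n."""
    rng = random.Random(seed)
    draws = {name: [] for name in estimators}
    for _ in range(reps):
        xs, ys = simulate(n, rng)
        for name, est in estimators.items():
            draws[name].append(est(xs, ys))
    results = {}
    for name, values in draws.items():
        bias = sum(values) / reps - true_value
        mse = sum((v - true_value) ** 2 for v in values) / reps
        results[name] = (bias, mse)
    return results

results = monte_carlo({"OLS, all obs": ols_slope, "OLS, y > 0 only": ols_positive_only},
                      true_value=1.0, n=500, reps=200, seed=2024)
for name, (bias, mse) in results.items():
    print(f"{name}: bias = {bias:.3f}, MSE = {mse:.3f}")
```

Both stand-ins are biased towards zero in this setting. That is the kind of pattern a censored-data estimator such as Tobit ML or FMT is designed to avoid.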

In Karlsson & Laitila [10], three different covariance matrix estimators are derived and evaluated in a simulation study. A Hessian-based estimator works best. In light of Boldea & Magnus [3], this is an unexpected result.
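For an ML estimator, a Hessian-based covariance estimate is the inverse of the negative Hessian of the log-likelihood, evaluated at the estimate. The estimators derived in [10] are not reproduced here. This minimal sketch assumes a central finite-difference Hessian; analytic derivatives are usually preferable in practice. It is checked on a normal sample, where the answer is known: var(mu_hat) = sigma^2/n and var(sigma_hat) = sigma^2/(2n). Function names are hypothetical. Other common choices are the outer product of gradients and the sandwich estimator, which combines the two.

```python
import math

def numerical_hessian(f, theta, h=1e-4):
    """Central finite-difference approximation of the Hessian of f at theta."""
    k = len(theta)

    def f_shift(shifts):
        t = list(theta)
        for idx, step in shifts:
            t[idx] += step
        return f(t)

    f0 = f(list(theta))
    H = [[0.0] * k for _ in range(k)]
    for i in range(k):
        H[i][i] = (f_shift([(i, h)]) - 2.0 * f0 + f_shift([(i, -h)])) / h ** 2
        for j in range(i + 1, k):
            H[i][j] = H[j][i] = (
                f_shift([(i, h), (j, h)]) - f_shift([(i, h), (j, -h)])
                - f_shift([(i, -h), (j, h)]) + f_shift([(i, -h), (j, -h)])
            ) / (4.0 * h ** 2)
    return H

def invert(A):
    """Gauss-Jordan inversion with partial pivoting (small matrices only)."""
    n = len(A)
    M = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("matrix is singular")
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]

def hessian_covariance(loglik, theta_hat):
    """Covariance estimate: inverse of the negative Hessian at the ML estimate."""
    H = numerical_hessian(loglik, theta_hat)
    return invert([[-v for v in row] for row in H])

# Check on a normal sample with parameters (mu, sigma), where the result is known
data = [1.0, 2.0, 3.0, 4.0, 5.0]

def normal_loglik(theta):
    mu, sigma = theta
    return sum(-0.5 * math.log(2.0 * math.pi) - math.log(sigma)
               - (x - mu) ** 2 / (2.0 * sigma ** 2) for x in data)

n = len(data)
mu_hat = sum(data) / n
sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in data) / n)  # ML estimate, sigma_hat^2 = 2
cov = hessian_covariance(normal_loglik, [mu_hat, sigma_hat])
print(cov)
```

For this sample (sigma_hat^2 = 2, n = 5), the result should be close to a diagonal matrix with entries 0.4 and 0.2.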

In Karlsson & Laitila [12], two new likelihood ratio (LR) tests of the normality assumption in the Tobit model are proposed: a modification of the test in Caudill & Mixon Jr [4] and a test based on an FMT. The size and power of the tests are studied and compared with those of the test in Bera et al. [2], which Holden [8] had previously shown to work best. One of the new LR tests has good size properties and high power. Overall, the new test is an alternative to Bera et al. [2] and may in some situations be easier to compute.
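An LR test of normality based on an FMT compares two maximized log-likelihoods: that of the Tobit model (the restricted, normal model) and that of an FMT (the unrestricted model). The minimal sketch below only computes the statistic and an empirical p-value from draws of the statistic under the null, obtained for example from a parametric bootstrap. That choice of reference distribution is an assumption. The usual chi-square approximation can fail for mixture models, and the reference distribution used in [12] is not described here. Function names and all numbers in the usage lines are hypothetical.

```python
def lr_statistic(loglik_tobit, loglik_fmt):
    """LR = 2 * (maximized FMT log-likelihood - maximized Tobit log-likelihood)."""
    return 2.0 * (loglik_fmt - loglik_tobit)

def empirical_p_value(observed, null_draws):
    """Proportion of null draws at least as large as the observed statistic (+1 correction)."""
    exceed = sum(1 for t in null_draws if t >= observed)
    return (exceed + 1) / (len(null_draws) + 1)

# Hypothetical values: maximized log-likelihoods and statistics simulated under the Tobit null
lr = lr_statistic(loglik_tobit=-1234.5, loglik_fmt=-1228.0)
p = empirical_p_value(lr, null_draws=[2.1, 4.7, 0.8, 13.9, 6.2, 3.3, 1.5, 9.4, 5.0, 2.8])
print(f"LR = {lr:.1f}, p = {p:.3f}")
```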

Heckman's sample selection model [7] is adapted in Laitila [13] for design-based inference in sample surveys with non-response. The results show that the Heckman estimator is directly applicable if the model's distributional assumptions are taken to hold for the distribution over the finite population. This result opens the way for a consistent treatment of non-response in both design-based estimation of population totals and model-based inference on regression models.
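For reference, the standard (model-based) Heckman two-step estimator works in two stages. It first fits a probit model for selection (here, response). It then adds the inverse Mills ratio as an extra regressor in an OLS regression on the selected units. The design-based adaptation in [13] is not reproduced here. This is a minimal sketch under assumed normal errors, an assumed exclusion restriction (a variable w that affects response but not the outcome) and simulated data. Function names are hypothetical. Plain-Python linear algebra is used only to keep the example self-contained.

```python
import math
import random

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small systems only)."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def ols(X, y):
    """Least squares via the normal equations X'X b = X'y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

def probit(Z, d, iters=25):
    """Probit ML estimate of the selection equation by Fisher scoring."""
    k = len(Z[0])
    g = [0.0] * k
    for _ in range(iters):
        score = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for z, di in zip(Z, d):
            eta = sum(a * b for a, b in zip(g, z))
            p = min(max(norm_cdf(eta), 1e-12), 1.0 - 1e-12)
            dens = norm_pdf(eta)
            resid = dens * (di - p) / (p * (1.0 - p))  # d log-likelihood / d eta
            weight = dens * dens / (p * (1.0 - p))     # expected information contribution
            for i in range(k):
                score[i] += resid * z[i]
                for j in range(k):
                    info[i][j] += weight * z[i] * z[j]
        step = solve(info, score)
        g = [a + s for a, s in zip(g, step)]
    return g

def heckman_two_step(Z, d, X, y):
    """Step 1: probit of d on Z. Step 2: OLS of y on [X, inverse Mills ratio] where d == 1."""
    g = probit(Z, d)
    X2, y2 = [], []
    for z, di, x, yi in zip(Z, d, X, y):
        if di == 1:
            eta = sum(a * b for a, b in zip(g, z))
            X2.append(list(x) + [norm_pdf(eta) / norm_cdf(eta)])  # append the inverse Mills ratio
            y2.append(yi)
    coef = ols(X2, y2)
    return coef[:-1], coef[-1]  # outcome coefficients, coefficient on the inverse Mills ratio

# Simulated example (assumed values): true beta = [1, 2], corr(e, u) = 0.5, sd(e) = 1
rng = random.Random(7)
Z, d, X, y = [], [], [], []
for _ in range(4000):
    x, w, u = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
    e = 0.5 * u + math.sqrt(0.75) * rng.gauss(0, 1)
    Z.append([1.0, x, w])                  # w enters the response equation only
    d.append(1 if 0.2 + 0.5 * x + 1.0 * w + u > 0 else 0)
    X.append([1.0, x])
    y.append(1.0 + 2.0 * x + e)            # only used where d == 1 (observed outcome)
beta, theta = heckman_two_step(Z, d, X, y)
print("outcome coefficients:", [round(b, 2) for b in beta], "IMR coefficient:", round(theta, 2))
```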

MOST IMPORTANT RESULTS

The main contribution is a new modelling tool for analysing censored regression models that can handle non-normality and heteroskedasticity. Karlsson & Laitila [9] already has nine citations (Google Scholar, 20 September 2017), indicating that it meets a need for flexible modelling of censored data. The contribution also includes a new test of the normality assumption in the Tobit model, a test that can be further developed for testing the number of components in FMMs. In addition, there are results on the choice of method for estimating standard errors. Together, these parts contribute to a more complete methodology that does not rely on a normality assumption.

FURTHER RESEARCH
 
An interesting new question is how to develop the LR test of normality in the Tobit model into a general test of the number of components in FMMs. The test is simple in design and has the potential for better properties than previously suggested criteria.

It is also of interest to better understand the properties of FMT-based estimates, e.g., their robustness against outliers and the consequences of the distribution of the explanatory variables. Answers to these questions could, e.g., explain observed deviations from theory in the distributions of statistics, help resolve the difficulties of locating the optimum of the likelihood function with numerical search algorithms, and lead to faster optimization routines.

INTERNATIONAL DIMENSIONS

Cooperation with Professor Myong-jae Lee, Korea University, was initiated in connection with his visit at the start of the project period (January 2013). The collaboration was planned within the third sub-project, but theoretical results were needed before the data analysis could begin. These results have been delayed mainly because of unforeseen numerical problems (see above).

DISSEMINATION OF RESULTS AND COOPERATION WITH THE SURROUNDING SOCIETY

Results have been published in peer-reviewed journals and presented at conferences. The project members, Maria Karlsson and Thomas Laitila, have held seminars at, e.g., the School of Technology and Business Studies, Dalarna University (2014), the Department of Statistics, Uppsala University (2015), and the Department of Economics, Umeå University (2016). Karlsson was also a guest blogger on RJ's blog, where some of the posts were related to the project.

During the project period, Karlsson was elected to the European Regional Committee of the Bernoulli Society and has been a board member of the Swedish Statistical Society and Cramérsällskapet. Laitila has been elected to the Baltic-Nordic-Ukrainian Network on Survey Statistics and is a member of Statistics Sweden's scientific council.

REFERENCES

[1] Bartolucci, F, Scaccia, L (2005). The use of mixtures for dealing with non-normal regression errors. Comput Stat Data An 48, 821-834.
[2] Bera, AK, Jarque, CM, Lee, L-F (1984). Testing the normality assumption in limited dependent variable models. Int Econ Rev 25, 1055-1063.
[3] Boldea, O, Magnus, JR (2009). Maximum likelihood estimation of the multivariate normal mixture model. J Am Stat Assoc 104, 1539-1549.
[4] Caudill, SB, Mixon Jr, FG (2009). More on testing the normality assumption in the Tobit model. J Appl Stat 36, 1345-1352.
[5] Galimberti, G, Scardovi, E, Soffritti, G (2016). Using mixtures in seemingly unrelated linear regression models with non-normal errors. Stat Comput 26, 1025-1038.
[6] Grün, B, Leisch, F (2008). FlexMix Version 2: Fitting mixtures with concomitant variables and varying and constant parameters. J Stat Softw 28, 1-35.
[7] Heckman, J (1979). Sample selection bias as a specification error. Econometrica 47, 153-161.
[8] Holden, D (2004). Testing the normality assumption in the Tobit model. J Appl Stat 31, 521-532.
[9] Karlsson, M, Laitila, T (2014). Finite mixture modelling of censored regression models. Stat Pap 55, 627-642.
[10] Karlsson, M, Laitila, T (2017). Computation of covariance matrix estimates for the FMT estimator. Submitted.
[11] Karlsson, M, Laitila, T (2017). Finite mixture of Tobit models with heteroskedastic components. Submitted.
[12] Karlsson, M, Laitila, T (2017). Likelihood ratio tests of the normality assumption in the Tobit model. Submitted.
[13] Laitila, T (2017). Heckman’s sample selection model and quasi-randomization. Manuscript.
[14] Leisch, F (2004). FlexMix: A general framework for finite mixture models and latent class regression in R. J Stat Softw 11, 1-18.
[15] McLachlan, GJ, Peel, D (2000). Finite mixture models. Wiley, Chichester.
[16] Soffritti, G, Galimberti, G (2011). Multivariate linear regression with non-normal errors: a solution based on mixture models. Stat Comput 21, 523-536.
[17] Tobin, J (1958). Estimation of relationships for limited dependent variables. Econometrica 26, 24-36.

Grant administrator
Umeå University
Reference number
P12-1086:1
Amount
SEK 2,800,000
Funding
RJ Projects
Subject
Probability Theory and Statistics
Year
2012