High-dimensional variable selection and covariance matrix estimation. The sample covariance matrix (SCM) is an unbiased and efficient estimator of the covariance matrix if the space of covariance matrices is viewed as an extrinsic convex cone in R^{p×p}. Regularized Estimation of High-Dimensional Covariance Matrices, by Yilun Chen: a dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy, Electrical Engineering. High-Dimensional Covariance Estimation provides accessible and comprehensive. High-dimensional inverse covariance matrix estimation via. Note that this is also equivalent to recovering the underlying graph structure of a sparse Gaussian Markov random field (GMRF). Estimating high-dimensional covariance matrices and its. Robust covariance estimation for approximate factor models. In Section 2 the problem formulation is introduced. However, the sample covariance matrix is an inappropriate estimator in high-dimensional settings.
The ultra-high-dimensional setting, where the dimension p_n exceeds the sample size n, is important due to many contemporary applications. Jun 27, 2014. We propose two fast covariance smoothing methods and associated software that scale up linearly with the number of observations per function. Perhaps the most natural candidate for estimating the covariance matrix is the empirical sample covariance matrix, but this is known to behave poorly in high-dimensional settings. Spatial data are encountered in a wide range of disciplines. Estimating the structure of this p-by-p matrix usually comes at a computational cost of O(p^3) time and O(p^2) memory for solving a nonsmooth log-determinant minimization problem, thus for large p both storage and computation. The abundance of high-dimensional data is one reason for the interest in the problem. Finally, we present a novel method for estimating higher moments of multivariate elliptical distributions. Matlab software for disciplined convex programming, version 2. Advances in high-dimensional covariance matrix estimation. Estimating a high-dimensional covariance matrix and. To achieve reliable estimation in the high-dimensional setting, an effective technique is to exploit the intrinsic structure of the covariance matrix, e. High-dimensional covariance matrix estimation in approximate. The minimax upper bound is obtained by constructing a class of tapering esti. Inspired by ideas of random matrix theory, we also suggest a change of point of view when thinking about estimation of high-dimensional vectors.
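The tapering estimators mentioned above can be illustrated with a minimal sketch. The source does not give the weights, so the trapezoidal weights below (1 near the diagonal, decaying linearly to 0 at bandwidth k) are an assumption taken from one common construction; the function names and the bandwidth parameter `k` are hypothetical.

```python
import numpy as np

def tapering_weights(p, k):
    """Trapezoidal taper (assumed form): weight 1 for |i-j| <= k/2,
    linear decay to 0 at |i-j| = k, and 0 beyond that."""
    idx = np.arange(p)
    d = np.abs(idx[:, None] - idx[None, :])  # distance from the diagonal
    return np.clip(2.0 - 2.0 * d / k, 0.0, 1.0)

def tapering_estimator(X, k):
    """Entrywise product of the taper with the sample covariance of X (n x p)."""
    S = np.cov(X, rowvar=False)
    return tapering_weights(S.shape[0], k) * S

W = tapering_weights(6, 4)
print(W[0])  # weights along the first row, from the diagonal outward
```

The bandwidth k controls the trade-off: a small k discards more of the noisy far-off-diagonal entries but biases the estimate when the true covariance decays slowly.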
Methods for estimating sparse and large covariance matrices. Covariance and correlation matrices play fundamental roles in every aspect of the analysis of multivariate data collected from a variety of fields including business and economics, health care, engineering, and environmental and physical sciences. Many applications require precise estimates of high-dimensional covariance matrices. Estimation of covariance matrices then deals with the question of how to approximate the actual covariance matrix on the basis of a sample from the multivariate distribution. Existing estimators typically require a good estimate of the precision matrix, which imposes strict structural assumptions on the covariance or the precision matrix when data is high dimensional. Estimation of covariance, correlation and precision. High-dimensional multilevel covariance estimation and kriging. High-dimensional covariance matrix estimation in approximate factor models. Article in The Annals of Statistics 39(6). Robust estimation of high-dimensional covariance and precision matrices, by Marco Avella-Medina, Sloan School of Management, Massachusetts Institute of Technology, 30 Memorial Drive, Cambridge, Massachusetts 02142, U.S.A. Simple cases, where observations are complete, can be dealt with by using the sample covariance matrix. Software for computing a covariance shrinkage estimator is available in R. Testing and estimating changepoints in the covariance matrix.
High-Dimensional Covariance Estimation provides accessible and comprehensive coverage of. We examine covariance matrix estimation in the asymptotic framework that the dimensionality p tends to infinity as the sample size n increases. Estimating a high-dimensional covariance matrix and its inverse, the precision matrix, is becoming a crucial problem in many applications including functional magnetic resonance imaging, analysis of gene expression arrays, risk management and portfolio allocation. In this paper, we study the problem of high-dimensional covariance matrix estimation with missing observations. Regularized estimation of precision matrix for high. It has many real applications including statistical tests and information theory. Many applications of modern science involve a large number of parameters. Minimax rates of convergence for estimating several classes of structured covariance and precision matrices, including bandable, Toeplitz, sparse, and sparse spiked covariance matrices as well as. Our approach transforms high-dimensional ill-conditioned covariance matrices to numerically stable multilevel covariance matrices without compromising accuracy. High-dimensional sparse inverse covariance estimation using greedy methods. Ali Jalali, Chris Johnson, Pradeep Ravikumar. Abstract. High-dimensional covariance matrix estimation using a factor model.
The standard estimator is the sample covariance matrix, which is conceptually simple, fast to compute and has favorable properties in the limit of infinitely many observations. High-dimensional inverse covariance matrix estimation via linear programming. High-dimensional sparse inverse covariance estimation using. High dimensionality comparable to sample size is common in many statistical problems. Classical multivariate statistics are based on the assumption that the number of parameters is fixed and the number of observations is large. In this paper, we describe and study a class of linear shrinkage estimators of the covariance matrix that is well suited for high-dimensional matrices, has a rather wide domain of applicability, and is rooted in the Gaussian conjugate framework of Chen (1979). Rate optimal estimation for high dimensional spatial. Robust high-dimensional volatility matrix estimation for high-frequency factor model. It is invertible and extensively used in linear models and time series analysis. In this paper we consider the task of estimating the nonzero pattern of the sparse inverse covariance matrix of a zero-mean Gaussian random vector from a set of iid samples. Popular regularization methods that directly exploit sparsity are not directly applicable to many financial problems. Estimating a covariance or precision matrix is more challenging in the multivariate case because of the positive-definiteness constraint on the covariance matrix and the high dimensionality, where the number of parameters now grows quadratically with the number of outcomes and time points.
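As a rough illustration of linear shrinkage, the sketch below pulls the sample covariance toward a scaled identity target. This is not the specific class of estimators described above: the identity target and the fixed, user-chosen intensity `alpha` are assumptions made for illustration, and the function name is hypothetical.

```python
import numpy as np

def linear_shrinkage(X, alpha):
    """Convex combination of the sample covariance S and the target mu * I,
    where mu is the average sample variance. alpha in [0, 1] is a fixed,
    user-chosen intensity (an assumption; data-driven choices exist)."""
    S = np.cov(X, rowvar=False)
    p = S.shape[0]
    mu = np.trace(S) / p
    return (1.0 - alpha) * S + alpha * mu * np.eye(p)

# With fewer observations than variables the sample covariance is singular,
# while the shrunk estimate is positive definite for any alpha > 0.
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 50))  # n = 20 observations, p = 50 variables
S = np.cov(X, rowvar=False)
S_shrunk = linear_shrinkage(X, alpha=0.3)
print(np.linalg.eigvalsh(S).min(), np.linalg.eigvalsh(S_shrunk).min())
```

Each eigenvalue of the shrunk matrix equals (1 - alpha) times a sample eigenvalue plus alpha * mu, which is why invertibility is guaranteed whenever alpha > 0 and mu > 0.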
The assumed framework allows for a large class of multivariate linear processes, including vector autoregressive moving average (VARMA) models of growing dimension and spiked covariance models. The dissertation makes contributions in two main areas of covariance estimation. With the increasing complex data model being investigated, for example in climate science, Benestad et al. Most available methods and software cannot smooth covariance matrices of dimension J > 500. High-dimensional covariance matrix estimation using a factor. Inverse covariance estimation for high-dimensional data in linear time and space.
While under the high-dimensional covariance matrices estimation framework. We propose a simple procedure that is computationally tractable in high dimension and does not require imputation of the missing data. Spectrum estimation for large dimensional covariance. In this paper, we consider the specific high-dimensional problem of recovering the covariance matrix of a zero-mean Gaussian random vector, under the low-dimensional structural constraint of sparsity of the inverse covariance, or concentration matrix. We propose a novel framework to first estimate the initial joint covariance matrix of the observed data and the factors, and then use it to recover the covariance matrix of the observed data.
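To make the factor-based idea concrete, here is a minimal sketch of a covariance estimate built from observed factors. It is not the framework proposed above: it assumes a strict factor model (diagonal residual covariance) and ordinary least squares loadings, and the function name and arguments are hypothetical.

```python
import numpy as np

def factor_covariance(Y, F):
    """Covariance of Y (n x p) from observed factors F (n x k), assuming
    Y = F B' + noise with a diagonal noise covariance (an assumption)."""
    n, k = F.shape
    Yc = Y - Y.mean(axis=0)
    Fc = F - F.mean(axis=0)
    # Least-squares loadings; coef has shape (k, p)
    coef, *_ = np.linalg.lstsq(Fc, Yc, rcond=None)
    resid = Yc - Fc @ coef
    cov_f = np.atleast_2d(np.cov(Fc, rowvar=False))
    # Residual variances with a degrees-of-freedom correction for k slopes
    # and the intercept
    resid_var = (resid ** 2).sum(axis=0) / (n - k - 1)
    return coef.T @ cov_f @ coef + np.diag(resid_var)

rng = np.random.default_rng(1)
F = rng.standard_normal((30, 3))       # 3 observed factors
B_true = rng.standard_normal((60, 3))  # loadings for 60 variables
Y = F @ B_true.T + rng.standard_normal((30, 60))
Sigma_hat = factor_covariance(Y, F)
print(Sigma_hat.shape)
```

Because the estimate is a positive semidefinite low-rank part plus a positive diagonal, it stays invertible even though p = 60 exceeds n = 30 here.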
For sparsity regularization, the lasso penalty is popular and convenient due to its convexity but has a bias problem. Discussion of large covariance estimation by thresholding principal orthogonal complements. Mar 27, 2018. The following proposition lays the foundations for the analysis of high-dimensional covariance or precision matrix estimation with infinite kurtosis. Problem (3) can therefore be reformulated as a linear program just like the.
Fast and positive definite estimation of large covariance. Fast covariance estimation for high-dimensional functional. Covariance and precision matrices provide a useful summary of such structure, yet the performance of popular matrix estimators typically hinges upon a sub-Gaussianity assumption. Taking advantage of the connection between multivariate linear regression and entries of the inverse covariance matrix, we propose an estimating procedure that can effectively exploit such sparsity. High-Dimensional Covariance Estimation focuses on the methodologies based on shrinkage, thresholding, and penalized likelihood with applications to Gaussian graphical models, prediction, and mean-variance portfolio management. Dissertation, Department of Mathematics, Princeton University. In statistics, sometimes the covariance matrix of a multivariate random variable is not known but. By Jianqing Fan, Yuan Liao and Martina Mincheva, Princeton. The limitations of the sample covariance matrix are discussed. This paper considers the problem of estimating a high-dimensional inverse covariance matrix that can be well approximated by sparse matrices. Its eigenvalues are well behaved and good estimators of their population counterparts. High-dimensional covariance estimation by minimizing. However, with the increasing abundance of high-dimensional datasets, the fact that the number of parameters to estimate grows with the square of the dimension suggests that it is important to have robust alternatives to the standard sample covariance matrix estimator. Another relation can be made to the method by Rutimann.
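The quadratic growth in the number of parameters noted above is easy to make concrete: a symmetric p-by-p covariance matrix has p(p + 1)/2 free entries. The helper below is a hypothetical illustration, not part of any cited method.

```python
def covariance_free_parameters(p):
    """Free entries of a symmetric p x p matrix: p variances on the
    diagonal plus p * (p - 1) / 2 distinct covariances."""
    return p * (p + 1) // 2

for p in (10, 100, 1000):
    print(p, covariance_free_parameters(p))
```

With p = 1000 variables there are already about half a million parameters, far more than the sample sizes typical of the applications listed in this text.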
Systems, in the University of Michigan, 2011. Doctoral committee. Due to the statistical and computational challenges with high dimensionality, little work has been proposed in the literature for estimating the determinant of high-dimensional. Robust estimation of high-dimensional covariance and. Estimating structured high-dimensional covariance and. Nuclear-norm penalization and optimal rates for noisy low-rank matrix completion. Koltchinskii, Vladimir, Lounici, Karim, and Tsybakov, Alexandre B. Another reason is the ubiquity of the covariance matrix in data analysis tools. The minimax risk of estimating the covariance matrix over the class p. Estimating high-dimensional covariance matrices is intrinsically challenging.
Aggregation of nonparametric estimators for volatility matrix. In this paper, we study robust covariance estimation under the approximate factor model with observed factors. Journal of the American Statistical Association, 1, 1268-1283. R^p, we study the problem of estimating both its covariance matrix, and its inverse covariance or concentration matrix. Covariance and precision matrices play a central role in summarizing linear relationships among variables. Sample covariance matrix estimator, low-dimensional setting. High-dimensional covariance estimation based on Gaussian. The structure of a Gaussian graphical model is directly connected to the sparsity of its p × p inverse covariance matrix. By Jianqing Fan, Yingying Fan and Jinchi Lv, Princeton University, August 12, 2006. High dimensionality comparable to sample size is common in many statistical problems.
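The link between graph structure and the sparsity of the inverse covariance matrix can be seen in a small worked example. The chain graph, the numerical values, and the helper name below are assumptions chosen for illustration; no estimation from data is performed.

```python
import numpy as np

# Precision matrix of a chain graph 0-1-2-3-4: only neighbouring
# variables have nonzero entries (tridiagonal, diagonally dominant).
p = 5
Theta = 2.0 * np.eye(p) - 0.8 * (np.eye(p, k=1) + np.eye(p, k=-1))
Sigma = np.linalg.inv(Theta)  # the covariance matrix is dense

def edges_from_precision(Theta, tol=1e-8):
    """Edges (i, j), i < j, where the precision entry is nonzero."""
    q = Theta.shape[0]
    return [(i, j) for i in range(q) for j in range(i + 1, q)
            if abs(Theta[i, j]) > tol]

# Inverting the dense covariance recovers the sparse graph.
print(edges_from_precision(np.linalg.inv(Sigma)))
```

Every entry of Sigma is nonzero even though only four pairs of variables are directly connected, which is why sparsity is usually imposed on the precision matrix rather than on the covariance matrix in graphical models.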
High-dimensional inverse covariance matrix estimation via linear. In many cases, the number of parameters, p, exceeds the number of observations, n. Robust high-dimensional volatility matrix estimation for high-frequency factor model. Random matrix theory predicts that in this context, the eigenvalues of the sample covariance matrix are not good estimators of the eigenvalues of the population covariance. High-dimensional covariance matrix estimation using a. Abstract: The determinant of the covariance matrix for high-dimensional data plays an important role in statistical inference and decision. Covariance estimation for high-dimensional data vectors using. Large scale inverse covariance estimation, Center for Big. High-dimensional data are often most plausibly generated from distributions with complex structure and leptokurtosis in some or all components. Estimating covariance structure in high dimensions. Ashwini Maurya, Michigan State University, East Lansing, MI, USA. Thesis director. Sparse covariance matrix estimation in high-dimensional deconvolution. Belomestny, Denis, Trabs, Mathias, and Tsybakov, Alexandre B.
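The random matrix theory prediction above can be checked numerically. In the sketch below the population covariance is the identity, so every population eigenvalue equals 1; the sample sizes, dimensions, and function name are assumptions chosen for illustration.

```python
import numpy as np

def sample_covariance_eigenvalues(n, p, seed=0):
    """Eigenvalues of the sample covariance of n draws from N(0, I_p)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    return np.linalg.eigvalsh(np.cov(X, rowvar=False))

eig_low = sample_covariance_eigenvalues(n=2000, p=20)   # p/n = 0.01
eig_high = sample_covariance_eigenvalues(n=100, p=80)   # p/n = 0.8
print("p/n = 0.01:", eig_low.min(), eig_low.max())
print("p/n = 0.8: ", eig_high.min(), eig_high.max())
```

When p is small relative to n the sample eigenvalues cluster near 1, while for p comparable to n they spread widely on both sides of 1, so the largest are overestimated and the smallest underestimated.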
However, in the high-dimensional setting, including too many or irrelevant controlling variables may distort the results. In statistics, sometimes the covariance matrix of a multivariate random variable is not known but has to be estimated. An overview on the estimation of large covariance and. Estimating covariance matrices is an important part of portfolio selection, risk management, and asset pricing. Clifford Lam, Department of Statistics, London School of Economics and Political Science. The book relies heavily on regression-based ideas and interpretations to connect and unify many existing methods and. Estimating high-dimensional covariance matrices and its applications. Estimating structured high-dimensional covariance and precision matrices. Law of log determinant of sample covariance matrix and. This estimator shrinks S toward the covariance matrix implied by the CAPM. A state space model approach to integrated covariance matrix estimation with high-frequency data. High-dimensional covariance estimation by minimizing L1-penalized log-determinant divergence. Pradeep Ravikumar, Martin Wainwright, Bin Yu, Garvesh Raskutti. Abstract.
High-dimensional covariance matrix estimation in. Optimal rates of convergence for covariance matrix estimation. Minimax rates of convergence for estimating several classes of. It is a common practice in high-dimensional statistical inference, including compressed sensing and covariance matrix estimation, to impose structural assumptions such as sparsity on the target in order to effectively estimate the quantity of.
For the semidefinite program (24), Yuan and Lin (2007) solved the problem using interior. Beijing University of Chinese Medicine. A thesis submitted for the degree of Doctor of Philosophy, Department of Statistics and Applied Probability, National University of Singapore, 2017. Supervisor. Sparse estimation of high-dimensional covariance matrices. High-dimensional covariance matrix estimation, LSE Statistics. Battey, Department of Mathematics, Imperial College London, 545 Huxley Building. Our focus is on estimating these matrices when their dimension is large relative to the number of observations. R^p, estimate both its covariance matrix, and its inverse covariance or concentration matrix.
For example, in portfolio allocation and risk management, the number of stocks p, which is typically of the same order as the sample size n, can well be in the order of hundreds. High-dimensional covariance estimation by minimizing L1. Covariance estimation for high-dimensional data vectors using the sparse matrix transform. Guangzhi Cao, Charles A. The variance-covariance matrix plays a central role in the inferential theories of high-dimensional factor models in finance and economics. High-dimensional sparse inverse covariance estimation.