This monograph is an attempt to provide a mathematical treatment for the procedure known as kriging, which is a popular method for interpolating spatial data. Kriging is superficially just a special case of optimal linear prediction applied to random processes in space or random fields. However, optimal linear prediction requires knowing the covariance structure of the random field. When, as is generally the case in practice, the covariance structure is unknown, what is usually done is to estimate this covariance structure using the same data that will be used for interpolation. The properties of interpolants based on an estimated covariance structure are not well understood and it is common practice to ignore the effect of the uncertainty in the covariance structure on subsequent predictions. My goal in this monograph is to develop the mathematical tools that I believe are necessary to provide a satisfactory theory of interpolation when the covariance structure is at least partially unknown. This work uses these tools to prove a number of results, many of them new, that provide some insight into the problem of interpolating with an unknown covariance structure. However, I am unable to provide a complete mathematical treatment of kriging with estimated covariance structures. One of my hopes in writing this book is that it will spur other researchers to take on some of the unresolved problems raised here.
I would like to give a bit of personal history to help explain my devotion to the mathematical approach to kriging I take here. It has long been recognized that when interpolating observations from a random field possessing a semivariogram, the behavior of the semivariogram near the origin plays a crucial role (see, for example, Matheron (1971, Section 2-5)). In the mid 1980s I was seeking a way to obtain an asymptotic theory to support this general understanding. The asymptotic framework I had in mind was to take more and more observations in a fixed and bounded region of space, which I call fixed-domain asymptotics. Using this approach, I suspected that it should generally be the case that only the behavior of the semivariogram near the origin matters asymptotically for determining the properties of kriging predictors. Unfortunately, I had no idea how to prove such a result except in a few very special cases. However, I did know of an example in which behavior away from the origin of the semivariogram could have an asymptotically nonnegligible impact on the properties of kriging predictors. Specifically, as described in 3.5, the semivariograms corresponding to exponential and triangular autocovariance functions have the same behavior near the origin, but optimal linear interpolants under the two models do not necessarily have similar asymptotic behavior. I believed that there should be some mathematical formulation of the problem that would exclude the "pathological" triangular autocovariance function and would allow me to obtain a general theorem on asymptotic properties of kriging predictors.

Soon after arriving at the University of Chicago in the fall of 1985, I was browsing through the library and happened upon Gaussian Random Processes by Ibragimov and Rozanov (1978). I leafed through the book and my initial reaction was to dismiss it as being too difficult for me to read and in any case irrelevant to my research interests. Fortunately, sitting among all the lemmas and theorems and corollaries in this book was a single figure on page 100 showing plots of an exponential and triangular autocovariance function. The surrounding text explained how Gaussian processes corresponding to these two autocovariance functions could have orthogonal measures, which did not make an immediate impression on me. However, the figure showing the two autocovariance functions stuck in my mind and the next day I went back to the library and checked out the book. I soon recognized that equivalence and orthogonality of Gaussian measures was the key mathematical concept I needed to prove results connecting the behavior of the semivariogram at the origin to the properties of kriging predictors. Having devoted a great amount of effort to this topic in subsequent years, I am now more firmly convinced than ever that the confluence of fixed-domain asymptotics and equivalence and orthogonality of Gaussian measures provides the best mathematical approach for the study of kriging based on estimated covariance structures. I would like to thank Ibragimov and Rozanov for including that single figure in their work.

This monograph represents a synthesis of my present understanding of the connections between the behavior of semivariograms at the origin, the properties of kriging predictors and the equivalence and orthogonality of Gaussian measures. Without an understanding of these connections, I believe it is not possible to develop a full appreciation of kriging. Although there is a lot of mathematics here, I frequently discuss the repercussions of the mathematical results on the practice of kriging. Readers whose main interests are in the practice of kriging should consider skipping most of the proofs on a first reading and focus on the statements of results and the related discussions. Readers who find even the statements of the theorems difficult to digest should carefully study the numerical results in Chapters 3 and 6 before concluding that they can ignore the implications of this work. For those readers who do plan to study at least some of the proofs, a background in probability theory at the level of, say, Billingsley (1995) and some familiarity with Fourier analysis and Hilbert spaces should be sufficient. The necessary second-order theory of random fields is developed in Chapter 2 and results on equivalence and orthogonality of Gaussian measures in Chapter 4. Section 1.3 provides a brief summary of the essential results on Hilbert spaces needed here.

In selecting topics for inclusion, I have tried to stick to topics pertinent to kriging about which I felt I had something worthwhile to say. As a consequence, for example, there is little here about nonlinear prediction and nothing about estimation for non-Gaussian processes, despite the importance of these problems. In addition, no mention is made of splines as a way of interpolating spatial data, even though splines and kriging are closely related and an extensive literature exists on the use of splines in statistics (Wahba 1990). Thus, this monograph is not a comprehensive guide to statistical approaches to spatial interpolation. Part I of Cressie (1993) comes much closer to providing a broad overview of kriging.

This work is quite critical of some aspects of how kriging is commonly practiced at present. In particular, I criticize some frequently used classes of models for semivariograms and describe ways in which empirical semivariograms can be a misleading tool for making inferences about semivariograms. Some of this criticism is based on considering what happens when the underlying random field is differentiable and measurement errors are negligible. In some areas of application, nondifferentiable random fields and substantial measurement errors may be common, in which case one could argue that my criticisms are not so relevant to those areas. However, what I am seeking to accomplish here is not to put forward a set of methodologies that will be sufficient in some circumscribed set of applications, but to suggest a general framework for thinking about kriging that makes sense no matter how smooth or rough the underlying random field is and whether or not there is nonnegligible measurement error. Furthermore, I contend that the common assumption that the semivariogram of the underlying random field behaves linearly in a neighborhood of the origin (which implies the random field is not differentiable) is often made out of habit or ignorance and not because it is justified.

For those who want to know what is new in this monograph, I provide a summary here. All of 3.6 and 3.7, which study the behavior of predictions with evenly spaced observations in one dimension as the spacing between neighboring observations tends to 0, are new. Section 4.3 mixes old and new results on the asymptotic optimality of best linear predictors under an incorrect model. Theorem 10 in 4.3, which shows such results apply to triangular arrays of observations and not just a sequence of observations, is new. So are Corollaries 9 and 13, which extend these results to cases in which observations include measurement error of known variance. The quantitative formulations of Jeffreys's law in 4.4 and the plausible approximations in 6.8 giving asymptotic frequentist versions of Jeffreys's law are published here for the first time, although some of these ideas appeared in an NSF grant proposal of mine many years ago. Section 6.3, which points out an important error in Matheron (1971), is new, as is 6.7 on the asymptotic behavior of the Fisher information matrix for a periodic version of the Matérn model. Finally, the extensive numerical results in 3.5, 6.6 and 3.8 and the simulated example in 6.9 are new.

This work grew out of notes for a quarter-long graduate class in spatial statistics I have taught sporadically at the University of Chicago. However, this book now covers many more topics than could reasonably be addressed in a quarter or even a semester for all but the most highly prepared students. It would be a mistake not to get to Chapter 6, which has a much greater focus on practical aspects of kriging than the preceding chapters. I would recommend not skipping any sections entirely but instead judiciously omitting proofs of some of the more technical results. The proofs in 3.6 and 6.7 depend critically on evenly spaced observations and do not provide much statistical insight; they are good candidates for omission. Other candidates for omission include the proofs of Theorem 1 and Theorems 10-12 in Chapter 4 and all proofs in 5.3 and 5.4. There are exercises of highly varying difficulty at the end of most sections. Many ask the reader to fill in details of proofs. Others consider special cases of more general results or address points not raised in the text. Several ask the reader to do numerical calculations similar to those done in the text. All numerical work reported on here, unless noted otherwise, was done in S-Plus.

There are many people to thank for their help with this work. Terry Speed pointed out the connection between my work and Jeffreys's law (see 4.4) and Wing Wong formulated the Bayesian version of this law described in 4.4. Mark Handcock calculated the predictive densities given in 6.10 using programs reported on in Handcock and Wallis (1994). Numerous people have read parts of the text and provided valuable feedback, including Stephen Stigler, Mark Handcock, Jian Zhang, Seongjoo Song, Zhengyuan Zhu, Ji Meng Loh, and several anonymous reviewers. Michael Wichura provided frequent and invaluable advice on using TeX; all figures in this text were produced using his PiCTeX macros (Wichura 1987). Mitzi Nakatsuka typed the first draft of much of this work; her expertise and dedication are gratefully acknowledged. Finally, I would like to gratefully acknowledge the National Science Foundation (most recently, through NSF Grant DMS 95-04470) for supporting my research on kriging throughout my research career.

I intend to maintain a Web page containing comments and corrections regarding this book. This page can be reached by clicking on the book's title in my home page http://galton.uchicago.edu/faculty/stein.html.

Chicago, Illinois

December 1998