Flexible hierarchical Bayesian modeling of massive data is challenging due to poorly scaling computations in large sample size. This work is motivated by spatial process models for analyzing geostatistical data, which typically entail computations that become prohibitive as the number of spatial locations becomes large. We propose a divide-and-conquer strategy called “Distributed Kriging” (DISK) within the Bayesian paradigm to achieve massive scalability for spatial process models. We partition the data into a large number of subsets, apply a readily available Bayesian spatial process model on each subset in parallel, and combine the posterior distributions estimated across all the subsets into a pseudo posterior distribution. The combined posterior is used for inference on the model parameters and the residual spatial surface, as well as predicting the responses at arbitrary locations.
Theoretically, we provide the posterior convergence rate of DISK for estimating the true residual spatial surface, and explore a new bias-variance tradeoff phenomenon related to the data partition. A variety of simulations and a geostatistical analysis of the Pacific Ocean sea surface temperature data validate our theoretical results, where DISK outperforms state-of-the-art competitors in inference on the model parameters, prediction of the spatial surface, and computational scalability.