**Geostatistics**

The geostatistics midterm and final exams are both written. The following is a compilation of sample questions for both exams. Try to answer them on your own first, then check your work against the answers posted below.

### Questions

- What are the steps in the cross-validation procedure?
- Cross validation produces a set of residuals. Explain various ways you can use the residuals to evaluate the quality of your kriging estimates.
- Cross validation produces a set of kriging variances. Explain how you can use this output to evaluate the quality of your variogram model and estimates of uncertainty.
- What is meant by sample bias?
- Explain what is meant by clustering and outline one method of “declustering”.
- What is block kriging used for?
- What is meant by “anisotropy” in spatial correlation? Give an example of a scenario that would cause anisotropy.
- What assumption allows us to use the variogram model to estimate the covariance between variables at different spatial locations?
- What is meant by the “hole effect”?
- Why do we need the principle of stationarity, and what does the assumption allow us to do?
- Explain what the R² value means in terms of variability or variance.
- What is the significance of a linear regression R² value?
- Why is kriging described as a best linear estimate?
- In what sense is kriging “best”?
- In kriging, what is the definition of an unbiased estimate?
- What is meant by a conditionally unbiased estimate in kriging?
- What is the significance of variogram range and what does range represent?
- We often have to deal with data that have a trend. What are the various options that we can consider to deal with trends?
- How is the search ellipsoid used in the kriging process?
- What is the property of a kriged map that leads to the need for stochastic simulations?
- What are the desired attributes of good conditional stochastic simulations?
- What are the advantages of using indicator geostatistics?
- What is the difference between estimation and simulation?
- What is the difference between kriging and cokriging?
- Briefly define simple kriging, ordinary kriging and indicator kriging.
- What is meant by equiprobable models in stochastic simulations?
- What is a residual in statistics?
- What are the properties of a residual?
- Define second-order stationarity.
- What is the difference between parametric and non-parametric distribution?
- What are the advantages and disadvantages of Boolean indicators?
- What is staged simulation in the Boolean method?
- What is the difference between unconditional and conditional stochastic simulations?
- What is collocated cokriging?
- What is coregionalization?
- What is the difference between “hard data” and “soft data”? Provide some examples.
- What is the purpose of the Lagrange multiplier in kriging?
- What are the three fundamental concepts that lead to the kriging equations?
- What is a nugget?
- What is semivariance?
- What is the coefficient of variation?

**Credits: Based on the excellent class notes provided by Dr. Larry Bentley and student contributors Ariana Pumo and Richard De Hoop during Fall 2015.**

### Answers

- Steps in cross validation:

1. Eliminate a measurement point from the data set

2. Approximate the value of the missing point by kriging

3. Compare the value of the estimate with the measured value

4. Calculate the estimation variance with (7)

5. Put the measurement point back in the data set

6. Choose another point and go back to step 1

7. Repeat for all measurement points
- There are three fundamental methods of residual analysis:

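– Histogram: a frequency plot of the estimation errors. It shows the variance or spread of the errors, and therefore how much uncertainty there is; a smaller variance means better estimation.

– Map of the residuals: shows their spatial correlation, to look for trends or groups of points with consistently high or low estimates.

– Cross-plot of estimated vs. measured values: reveals conditional bias, heterogeneity and any other anomalies.
- The ratio of the squared cross-validation residual to the kriging variance, averaged over all points, should be close to 1, because that indicates the kriging variances are consistent with the actual estimation errors. As the variogram model becomes less appropriate, the ratio shifts away from 1. If it is < 1, the actual errors are smaller than the model predicts (the algorithm performs better than the statistics suggest); if it is > 1, the actual errors are larger than predicted (the kriged output is poor compared to the statistics).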
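
The cross-validation steps and the two diagnostics above (residuals and kriging variances) can be sketched in a few lines of Python. This is a minimal illustration, not the course's implementation: the exponential variogram model, its parameters, the synthetic data and all function names (`exp_variogram`, `ordinary_krige`, `cross_validate`) are assumptions made for the example.

```python
import numpy as np

def exp_variogram(h, nugget=0.1, sill=1.0, a_range=20.0):
    """Exponential variogram model (assumed for illustration); gamma(0) = 0."""
    h = np.asarray(h, dtype=float)
    gamma = nugget + (sill - nugget) * (1.0 - np.exp(-3.0 * h / a_range))
    return np.where(h > 0, gamma, 0.0)

def ordinary_krige(xy, v, target, variogram):
    """Ordinary kriging estimate and kriging variance at one target point."""
    n = len(v)
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    # Kriging matrix: variogram between data points, bordered by ones
    # (the border row/column carries the unbiasedness constraint).
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = variogram(d)
    A[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = variogram(np.linalg.norm(xy - target, axis=1))
    sol = np.linalg.solve(A, b)
    weights, mu = sol[:n], sol[n]
    estimate = weights @ v
    variance = weights @ b[:n] + mu
    return estimate, variance

def cross_validate(xy, v, variogram):
    """Leave-one-out cross validation: residuals and kriging variances."""
    n = len(v)
    residuals = np.empty(n)
    variances = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i            # step 1: remove point i
        est, var = ordinary_krige(xy[keep], v[keep], xy[i], variogram)
        residuals[i] = est - v[i]           # step 3: estimate minus measured
        variances[i] = var                  # step 4: estimation variance
    return residuals, variances

# Synthetic data, purely for demonstration.
gen = np.random.default_rng(0)
xy = gen.uniform(0.0, 50.0, size=(30, 2))
v = np.sin(xy[:, 0] / 10.0) + 0.1 * gen.normal(size=30)

residuals, variances = cross_validate(xy, v, exp_variogram)
mean_error = residuals.mean()                    # near 0 suggests no global bias
msse = np.mean(residuals**2 / variances)         # near 1 suggests consistent variances
print(f"mean error = {mean_error:.3f}, mean squared standardized error = {msse:.3f}")
```

The residuals can then go into a histogram, a map or a cross-plot, while the standardized ratio `msse` is the quantity that is compared against 1.
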
- Sample bias occurs when sampling systematically represents one area more than another, or preferentially targets locations of high or low values, so that the resulting statistics are not representative of the entire population. In other words, the sample is collected in such a way that some members of the intended population are less likely to be included than others.
- Sampling is often unevenly distributed in space because of logistical considerations such as access, or because of the sampling strategy. An uneven spatial distribution of samples (“clustering”) is problematic because areas with clusters of points have a greater influence on the final average value than sparsely sampled regions.

__Method 1__: Polygonal declustering – each sample is weighted by the area of its polygon of influence. Clustered (dense) samples have small polygons and receive less weight; sparse samples have large polygons and receive more weight.

__Method 2__: Cell declustering – the weight of each sample is inversely proportional to the number of samples that fall within the same cell. Clustered (dense) samples receive less weight, while sparse samples receive more.
- Block kriging is used to estimate the average value over an area (a block) rather than at a point, as in point kriging. The block average is estimated from (1) a set of measurements and (2) a model variogram. The method produces the same value as would be obtained by point kriging at a series of points within the block and then taking a linear average of the estimated values. Block kriging differs from point kriging in how the weights and the kriging variance are calculated.
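
As a rough illustration of cell declustering, the sketch below assigns each sample a weight inversely proportional to the number of samples sharing its cell, then normalizes the weights to sum to one. The function name `cell_decluster_weights`, the square cells anchored at the minimum coordinates and the toy data are all assumptions for the example; in practice the cell size and grid origin are usually varied.

```python
import numpy as np

def cell_decluster_weights(xy, cell_size):
    """Weights inversely proportional to the sample count in each cell."""
    xy = np.asarray(xy, dtype=float)
    # Integer cell index of every sample (grid anchored at the minimum corner).
    cells = np.floor((xy - xy.min(axis=0)) / cell_size).astype(int)
    _, inverse, counts = np.unique(
        cells, axis=0, return_inverse=True, return_counts=True
    )
    inverse = inverse.ravel()
    weights = 1.0 / counts[inverse]
    return weights / weights.sum()

# Three clustered samples with high values and one isolated low value.
xy = np.array([[0.0, 0.0], [0.1, 0.1], [0.2, 0.0], [5.0, 5.0]])
values = np.array([10.0, 10.0, 10.0, 2.0])

w = cell_decluster_weights(xy, cell_size=1.0)
naive_mean = values.mean()
declustered_mean = np.sum(w * values)
print(naive_mean, declustered_mean)
```

The clustered samples share one cell, so together they carry the same total weight as the single isolated sample, which pulls the declustered mean below the naive mean.
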
- In geological settings, the most prominent form of anisotropy is a strong contrast in ranges in the (stratigraphically) vertical and horizontal directions, with the vertical semivariogram reaching the sill in a much shorter distance than the horizontal semivariogram.

– Some directions have greater ranges, or more spatial correlation, than other directions.

– Example: a layered sedimentary sequence, where correlation is greater along bedding than across bedding because the length of the beds is greater than their thickness.
- Second-order stationarity (see the answer defining second-order stationarity below for more information).
- The hole effect is the situation in which the semivariogram shows alternating highs and lows that repeat periodically with a constant wavelength.
- Stationarity is a property of the model, not of reality. Under a stationary model, additional samples always improve the estimate regardless of their distance from the point being estimated, although the same is not necessarily true in reality. Assuming stationarity lets us combine data pairs from different locations to build a variogram model that explains reality.
- An R² value of 1 indicates that the regression line fits the data perfectly, while an R² of 0 indicates that the regression line does not fit the data at all. In terms of variance, R² is the fraction of the variance in the dependent variable that is explained by the regression. An R² near 0 means that the data are more nonlinear than the curve used for the analysis, or that the data are random.
- Refer to the previous answer on R².
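
To make the variance interpretation of R² concrete, here is a small sketch that fits a straight line and computes R² as one minus the ratio of residual to total sum of squares. The data values are made up for the example.

```python
import numpy as np

# Hypothetical, nearly linear data.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

slope, intercept = np.polyfit(x, y, 1)     # least-squares straight line
y_hat = slope * x + intercept

ss_res = np.sum((y - y_hat) ** 2)          # variability left unexplained
ss_tot = np.sum((y - y.mean()) ** 2)       # total variability about the mean
r_squared = 1.0 - ss_res / ss_tot
print(f"R^2 = {r_squared:.4f}")
```

For a straight-line fit with an intercept, this value equals the square of the correlation coefficient between `x` and `y`.
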
- To be updated
- To be updated
- To be updated
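- An estimate is conditionally unbiased when the expected true value, given the estimate, equals the estimate over the whole range of values. In practice, the kriging estimate can be too high in certain locations and too low in others, which makes the estimates conditionally biased.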
- To be updated
- To be updated
- A search ellipsoid is used because the validity of the estimate becomes more uncertain, and the appropriateness of the stationary random function model more doubtful, as we move farther away from the estimation point. The search ellipsoid ensures that the data points closest to the estimation point are used. Its orientation is dictated by the anisotropy of spatial continuity seen in the sample variogram.
- Kriged fields/maps are smoothed versions of reality, and their reduced variability leads them to understate flow and transport effects. Furthermore, kriged maps do not reproduce realistic high and low values and do not contain enough heterogeneity to show geologic structures. In other words, kriged maps have less variability and greater spatial correlation than the true field.
- Attributes of a good conditional stochastic simulation:

1. Honours the histogram of the univariate data

2. Honours the measured data

3. Honours the spatial variability of the probability model (variogram)

These attributes require:

1. Data values, 2. Histogram, 3. Probability model (i.e., variogram)
- To be updated
- To be updated
- Kriging is a statistical interpolation method that estimates a variable at unsampled locations from measurements of that same variable. Cokriging uses the same principles as kriging but also uses control points of a correlated secondary variable to help estimate the values at unknown primary data points.
- Simple kriging – The global mean is known (or can be supplied by the user) and is held constant over the entire area of interpolation.

Ordinary kriging – The mean is not known in advance; the local mean varies and is re-estimated on the basis of nearby (or local) data values.

Indicator kriging – Estimates the probability of a discrete attribute at each grid node (e.g., lithology, productivity) and requires a binary coding of the attribute.
- Stochastic simulations honor the data and the statistical model and produce multiple possible outcomes (realizations). Each of these realizations is equally likely to be the closest model to reality.
- A residual is the difference between the estimated value and the measured value. In other words, the residual is the error of the estimate. In linear regression, it is the deviation of a data point from the regression line.
- Residual properties of interest:

– r_{i} = v̂_{i} – v_{i} (estimate minus measured value)

– If r_{i} > 0, then the estimate is too large

– If r_{i} < 0, then the estimate is too small
- Second-order stationarity has the following conditions:

1. Mean is same everywhere in space

2. Variogram and covariance function are the same everywhere in space

3. Spatial correlation is the same everywhere

4. Conditions 1 and 2 allow us to build semivariogram models
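
In symbols, the conditions above are commonly written as follows. The notation (random function Z, constant mean m, covariance C, semivariogram γ, separation vector h) is introduced here for illustration and is not taken from the class notes. The last line is the relation that lets a variogram model stand in for the covariance between values at different locations.

```latex
\begin{aligned}
E[Z(x)] &= m \quad \text{for all } x, \\
\operatorname{Cov}[Z(x),\, Z(x+h)] &= C(h) \quad \text{(depends only on the separation } h\text{)}, \\
\gamma(h) = \tfrac{1}{2}\, E\big[(Z(x+h) - Z(x))^2\big] &= C(0) - C(h).
\end{aligned}
```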

- To be updated
- Advantages and disadvantages of Boolean indicators

__Advantages__

– Treats geologic units as coherent features

– input parameters are easier to understand

– relatively fast

__Disadvantages__

– not suitable for rock properties

– difficult to integrate soft data and related information

– difficult to honour well information if the proportion of geometric shapes is high

– limited to the geometric shapes that have been designed

– difficult to generate distributions of channel shapes.
- Staged simulation in the Boolean method first builds a Boolean framework, which is then populated with properties such as porosity and permeability using geostatistics.
- Unconditional and conditional stochastic simulations:

– Unconditional simulation honors the statistical model, but does not produce models of random fields that match the data at the measurement points.

– Conditional simulation is conditioned on the data: the simulated values are the same as the hard data at the measurement points, so the simulation honors the data.
- To be updated
- Coregionalization refers to the fact that we must develop a statistical model for two random functions and their joint probability distribution. In other words, we need a model that specifies the spatial correlation of the primary variable, the spatial correlation of the secondary variable, and the relation between them.
- “Hard data” – data that are direct measurements of the primary variable.

“Soft data” – measurements of a secondary variable that is correlated with the primary variable.

__Examples__

– Hard: formation porosity measured in a borehole. Soft: reflection amplitude that increases with increasing porosity. (Primary: formation porosity. Secondary: reflection amplitude.)

– Hard: total meq/l measured in a groundwater sample. Soft: electrical conductivity measured with an EM 31. (Primary: meq/l. Secondary: electrical conductivity.)
- The Lagrange multiplier is a mathematical device used to enforce the unbiasedness condition while the estimation variance is minimized. More generally, Lagrange multipliers are used to find the local maxima and minima of a function subject to equality constraints.
- The three fundamentals behind the kriging equations are the intrinsic hypothesis, the minimum estimation variance criterion and the unbiasedness criterion.
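
As a sketch of how these three ingredients combine, the ordinary kriging system in its usual textbook form is shown below. The symbols are introduced here for illustration and are not from the class notes: λ_i are the kriging weights, μ is the Lagrange multiplier, γ is the semivariogram supplied by the intrinsic hypothesis, and x_0 is the estimation location. The first line is the linear estimator, the second comes from minimizing the estimation variance, the third is the unbiasedness constraint, and the fourth is the resulting kriging variance.

```latex
\begin{aligned}
\hat{Z}(x_0) &= \sum_{i=1}^{n} \lambda_i Z(x_i), \\
\sum_{j=1}^{n} \lambda_j\, \gamma(x_i - x_j) + \mu &= \gamma(x_i - x_0), \qquad i = 1, \dots, n, \\
\sum_{i=1}^{n} \lambda_i &= 1, \\
\sigma_{OK}^2 &= \sum_{i=1}^{n} \lambda_i\, \gamma(x_i - x_0) + \mu .
\end{aligned}
```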
- A nugget is the height of the jump of the semivariogram at the discontinuity at the origin. It encompasses micro-scale variation (at distances shorter than the sample spacing) and measurement error.
- Semivariance is half the average squared difference between pairs of values separated by a given lag distance. A larger semivariance indicates that values separated by that distance are less similar to each other.
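
In geostatistics, the experimental semivariance at a lag h is usually computed as half the mean squared difference between all pairs of values whose separation is approximately h. The sketch below shows that calculation; the function name `experimental_semivariance`, the lag tolerance `tol` and the transect data are assumptions made for the example.

```python
import numpy as np

def experimental_semivariance(xy, v, lag, tol):
    """Half the mean squared difference of pairs separated by lag +/- tol."""
    xy = np.asarray(xy, dtype=float)
    v = np.asarray(v, dtype=float)
    i, j = np.triu_indices(len(v), k=1)          # every pair counted once
    d = np.linalg.norm(xy[i] - xy[j], axis=1)
    mask = np.abs(d - lag) <= tol
    if not mask.any():
        return np.nan, 0                         # no pairs at this lag
    gamma = 0.5 * np.mean((v[i][mask] - v[j][mask]) ** 2)
    return gamma, int(mask.sum())

# Five equally spaced samples along a transect (hypothetical values).
xy = np.column_stack([np.arange(5.0), np.zeros(5)])
v = np.array([1.0, 3.0, 2.0, 5.0, 4.0])

g1, n1 = experimental_semivariance(xy, v, lag=1.0, tol=0.1)
g2, n2 = experimental_semivariance(xy, v, lag=2.0, tol=0.1)
print(g1, n1, g2, n2)
```

Repeating the calculation over a series of lags gives the experimental semivariogram to which a model (with its nugget, sill and range) is fitted.
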
- The coefficient of variation (CV) is a standardized measure of the dispersion of a probability distribution or frequency distribution, defined as the ratio of the standard deviation to the mean. Distributions that contain only positive values will have a large CV if they have a long positive tail. A large CV may also indicate some large erratic values in the dataset, or a spread that is large compared to the center.
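
A short sketch of the coefficient of variation, computed as the sample standard deviation divided by the mean, shows how a single erratic high value inflates it. The function name and the data are hypothetical.

```python
import numpy as np

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    values = np.asarray(values, dtype=float)
    return values.std(ddof=1) / values.mean()

well_behaved = [1.0, 2.0, 2.0, 3.0]
with_outlier = [1.0, 2.0, 2.0, 3.0, 50.0]   # one erratic high value

cv_clean = coefficient_of_variation(well_behaved)
cv_outlier = coefficient_of_variation(with_outlier)
print(f"CV without outlier: {cv_clean:.2f}, with outlier: {cv_outlier:.2f}")
```
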