Statistically Valid Interpolation with R

This example uses R to spatially interpolate an input dataset and then plot it in Maptitude as both a point layer and a gridded layer.

R is a popular scripting language for statistical computing and data mining. Whilst lacking the general purpose scripting abilities of Python, it has many built-in features that aid with data cleaning and analysis. It also has a large library of statistical processing packages. These include the sp (spatial processing) and gstat (geospatial statistics) packages which we use below. We also use the rgdal package for coordinate transformations, and RDCOMClient for COM communication with Maptitude.

Microsoft recently released their own version of R (Microsoft R Open) and their R Tools for Visual Studio. Both were used for this article. The Winwaed Blog has an article describing how to use these with RDCOMClient to communicate with a COM server.

Maptitude is already capable of geospatial interpolation and gridding of arbitrary point locations. This is through the three step process of triangulation, interpolation within each triangle, followed by smoothing. This can be effective and is quick. However the basic algorithm cannot extrapolate beyond the input data’s convex hull, and there is no measure of the estimated errors (or residuals) of the interpolation.

In this example we use the gstat package in R to krige an input point dataset. Kriging has the advantage that it can calculate the residuals which estimate the errors of the final model. There are different kinds of kriging, but we will use ordinary kriging. gstat also supports universal kriging which can interpolate a variable which has a known spatial trend. Another popular geospatial interpolation algorithm is that of Inverse Distance Weighting (IDW), and this is also supported by gstat.

For this example, we are using 2013 USGS well data for the Ogwalla Aquifer (data downloads are at http://ne.water.usgs.gov/ogw/hpwlms/data.html ). This is a major aquifer for the High Plains of the USA, and its long-term depletion is of concern. Rather than aquifer thickness, the data records the depth to the top of the aquifer. The USGS’s own maps use this data combined with a historic thickness measurement, to derive a modern aquifer thickness and hence track depletion over time. Unfortunately the historic measurements are not available, but a simple plot of depth is sufficient for this example. It is also tempting to combine the depth with altitude, to give a “Height of aquifer top above sea level”, but altitude varies much more than depth-to-aquifer.

Before we get started with the actual script, we need to install some R packages (libraries):

install.packages("devtools")
install.packages("RDCOMClient")

install.packages("sp")
install.packages("gstat")

install.packages("RColorBrewer")   # optional, useful for diagnostic plots
install.packages("plotrix")        # optional, useful for diagnostic plots

install.packages("rgdal")    # May require ogr/gdal and proj4 installing first (eg. on Linux)

Next is the main R script. This works with an existing Maptitude instance, but loads an existing map. For this sample, this is just a map of the US, but it could contain your own data if you wished. By starting Maptitude yourself, you ensure Maptitude is visible throughout processing. Here is the script:

library(devtools)
library(RDCOMClient)

library(sp)
library(gstat)

library(rgdal) # used for coord transforms

library(RColorBrewer)
library(plotrix)

# File locations and names for layers and themes
mapFileName <- "C:\\Data\\howto\\maptitude\\aquifer\\aquifermap.map"
layer_name <- "OgallaPoints"
layer_nameDB <- "OgallaPointsDB"
grid_layer_name <- "AquiferGrid"
geogFileName <- "C:\\Data\\howto\\maptitude\\aquifer\\OgallaPoints.dbd"
dbFileName <- "C:\\Data\\howto\\maptitude\\aquifer\\OgallaPointsDB.ffa"
gridFileName <- "C:\\Data\\howto\\maptitude\\aquifer\\aquifergrid.dbd"
grid_theme_name <- "OgallaGridPointTheme"

# Start our Maptitude connection and open the Map file
mapt <- COMCreate("Maptitude.AutomationServer")
mapt$Function("OpenMap", mapFileName, list("Auto Project", "True"))
mapname <- mapt$Function("GetMap")

# Delete any existing aquifer data
layers <- mapt$Function("GetLayerNames")
if (layer_name %in% layers) {
    mapt$Function("DropLayer", mapname, layer_name)
    mapt$Function("DeleteDatabase", geogFileName)
    mapt$Function("DropLayer",gridFileName, grid_layer_name)
}


# Load the Ogalla Aquifer data into memory and Krige it
# Data is downloaded from: http://ne.water.usgs.gov/ogw/hpwlms/data.html

# First Define Some Coordinate Systems

# Geographic, NAD83, GRS80 (as used by the data and Maptitude)
geoNAD83_def <- CRS("+proj=longlat +datum=NAD83 +ellps=GRS80") # +towgs84=0,0,0")

#EPSG:102003 USA_Contiguous_Albers_Equal_Area_Conic  (used for Kriging)
albers_def <- CRS("+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=37.5 +lon_0=-96 +x_0=0 +y_0=0 +datum=NAD83 +ellps=GRS80 +units=m +no_defs")

# Load the aquifer data
myframe <- read.table("C:\\Data\\howto\\maptitude\\aquifer\\MOF.used.wl13.all.txt", sep = "\t", header = TRUE, skip = 66, na.strings = c("-99", "-999"), quote = '"')

# Depth to top of water is in the lev_va_ft field => only use points where this field is set
dframe <- myframe[!(is.na(myframe$lev_va_ft)),]

# convert the points to a geographic (gstat) format.
# Ie. define the coordinates and the projection system
coordinates(dframe) <- cbind(dframe$long_dd_NAD83, dframe$lat_dd_NAD83)
proj4string(dframe) <- geoNAD83_def

# Get the data extents. We use this to define out grid size
bb <- bbox(dframe)
x.range <- c(bb[1, 1], bb[1, 2])
y.range <- c(bb[2, 1], bb[2, 2])

# create the required grid in NAD83 space; 0.1deg increment
# we will also use this information when we create the Maptitude DEM
grd <- expand.grid(lng = seq(from = x.range[1], to = x.range[2], by = 0.1), lat = seq(from = y.range[1], to = y.range[2], by = 0.1))
coordinates(grd) <- cbind(grd$lng, grd$lat)
proj4string(grd) <- geoNAD83_def
num_lng = length(seq(from = x.range[1], to = x.range[2], by = 0.1) )
num_lat = length(seq(from = y.range[1], to = y.range[2], by = 0.1))

# transform the grid to albers coords
# (geographic coords are non-linear so tend to be poor for interpolation)
grd_albers <- spTransform(grd, albers_def)

# transform the input data to albers coords
in_df <- spTransform(dframe, albers_def)

# remove duplicate input locations (these break the kriging)
in_df <- in_df[ - zerodist(in_df)[, 1],]

# This is how we use the IDW (Inverse Distance Weight)  Algorithm
#idw.out <- idw(formula = lev_va_ft ~ 1, locations = in_df, newdata = grd_albers)

# Fit a variogram for Ordinary Kriging
# gstat supports multiple functions for the variogram, and we can specify a list to try
v <- variogram(lev_va_ft ~ 1, in_df)
v.fit <- fit.variogram(v, vgm(c("Sph", "Exp", "Wav", "Gau", "Exc")))

# Perform the Ordinary Kriging using this variogram
krig.out <- krige(formula = lev_va_ft ~ 1, locations = in_df, newdata = grd_albers, v.fit)

# Transform the results back to geog NAD83
out_nad <- spTransform(krig.out, geoNAD83_def)

# Diagnostic plot (if required)
#plot(out_nad, col = plotrix:::rescale(out_nad$var1.pred, c(1, 6)), pch = 19)

# Calculate and display the residuals (ie. error estimates)
#cv155 <- krige.cv(formula = lev_va_ft ~ 1, in_df, v.fit, nfold = 5, verbose = FALSE)
#summary(cv155)

# Create the new Maptitude layer
mapt$Function("CreateDatabase", geogFileName, "Point", list(list("Layer Name", layer_name)))

ly <- mapt$Function("AddLayer", mapname, layer_name, geogFileName, layer_name, list(list("Shared", "True")))

# The layer will be joined to a table with the lev_va_ft (depth) field
field_defs <- list(list("ID", "Integer", 8L, 0L, "Yes"),
                  list("lev_va_ft", "Real", 10L, 4L, "No")
                  )

view_name <- mapt$Function("CreateTable", layer_nameDB, dbFileName, "FFA", field_defs)
mapt$Function("SetLayer", ly)
mapt$RunUIMacro("apply_method", "gis_ui", "G30 new layer default settings", ly)

# write the data out. Use the frame's row number as the Maptitude ID 
for (pp in 1:nrow(out_nad)) {
    cd <- mapt$Function("Coord", as.integer(out_nad$coords.x1[pp] * 1000000), as.integer(out_nad$coords.x2[pp] * 1000000))
    new_id <- mapt$Function("AddPoint", cd, pp) # For literals, ID must be forced to integer with L suffix

    recval <- list(list("ID", new_id), list("lev_va_ft", out_nad$var1.pred[pp]))
    mapt$Function("AddRecord", view_name, recval)
}

mapt$Function("SetLayer", ly)

# Join the table and the layer

joined_name <- paste(ly, view_name, sep = ".")
db_id <- paste("[", view_name, "].ID", sep = "")
ly_id <- paste("[", ly, "].ID", sep = "")
new_view <- mapt$Function("JoinViews", joined_name, ly_id, db_id, list(list( "I", 0L)) )

# Triangulate these points
tin <- mapt$Function("CreateTriangulation", ly, list(list("Smooth", 0L), list("Field", "lev_va_ft"), list("Units", "Feet")))

# Create a gridded layer from the triangulation
layer_scope <- mapt$Function("GetLayerScope", ly)
mapt$Function("GenerateDEMGrid", tin, gridFileName, layer_scope, num_lng, num_lat, list(list("Layer Name", grid_layer_name), list("CDF", "No")))

# Add the gridded layer to the map
grid_ly <- mapt$Function("AddLayer", mapname, grid_layer_name, gridFileName, grid_layer_name)

# Create a sample colored dot theme for the grid layer (you can also contour it or create a surface)
mapt$Function("SetIcon", paste(grid_ly, "|", sep = ""), "Font Character", "Caliper Cartographic|10", 36L)

col1 <- mapt$Function("ColorRGB", 65000L, 0L,0L)
col2 <- mapt$Function("ColorRGB", 0L, 32500L, 0L)
colpal <- mapt$Function("GeneratePalette", col1, col2, 20L, list(list("method", "HSV"), list("spiral", "FORWARD")))

theme_name <- mapt$Function("CreateTheme", grid_theme_name, paste(grid_ly,"Value",sep="."), "Equal Steps", 20L, list(list("Pretty Values", "True"), list("Sort Order", "Descending")))
mapt$Function("SetThemeIconColors", theme_name, colpal)
mapt$Function("ShowTheme", "", theme_name)

mapt$Function("SetLayerVisibility", paste(mapname, ly, sep = "|"), "False")
mapt$Function("RedrawMap", mapname)

 

The residual calculation is commented out in the above code, but this is what the residual summary looks like:

Object of class SpatialPointsDataFrame
Coordinates:
                min       max
coords.x1 -749844.6 -26200.92
coords.x2 -611133.8 688756.20
Is projected: NA 
proj4string : [NA]
Number of points: 8230
Data attributes:
   var1.pred           var1.var       observed         residual              zscore               fold      
 Min.   :  0.7675   Min.   :1043   Min.   : -0.38   Min.   :-229.09027   Min.   :-5.028187   Min.   :1.000  
 1st Qu.: 45.6939   1st Qu.:1423   1st Qu.: 33.40   1st Qu.: -16.19370   1st Qu.:-0.405964   1st Qu.:2.000  
 Median : 90.5128   Median :1559   Median : 91.99   Median :  -0.18089   Median :-0.004859   Median :3.000  
 Mean   :116.4700   Mean   :1614   Mean   :116.54   Mean   :   0.07431   Mean   : 0.001694   Mean   :3.011  
 3rd Qu.:158.5859   3rd Qu.:1737   3rd Qu.:165.81   3rd Qu.:  15.94353   3rd Qu.: 0.408931   3rd Qu.:4.000  
 Max.   :546.6920   Max.   :6257   Max.   :563.00   Max.   : 287.52655   Max.   : 6.252231   Max.   :5.000

The above code plots the gridded layer using a colored dot theme. The result looks like this:

A section of the gridded data (Texas and Oklahoma Panhandles). Each grid point is plotted with a colored circle according to depth. Click for larger view.

A section of the gridded data (Texas and Oklahoma Panhandles). Each grid point is plotted with a colored dot according to depth in feet (red is deeper). Click for larger view.

Once you have a gridded layer, you can then create a 3d surface, create a contour map, or perform other 3d analyses. Here is a contour map of the entire gridded data:

Contour map of the kriged Ogwalla Aquifer Depth data. Click for larger view.

Contour map of the kriged Ogwalla Aquifer Depth data. Click for larger view.

One important thing to note here is that we interpolated the data to a rectangular shape, when the aquifer is actually a very irregular shape. Ideally we would create a mask shape of the aquifer’s extents, and use this to limit the plot. For example, the aquifer does not in fact reach Lawton, Wichita Falls, or Abilene, let alone the Dallas-Fort Worth Metroplex!

Some notes about using R with Maptitude:

  • R usually assumes a number without a decimal point is a float point number. This can cause problems with the COM interface. Use the L suffix for all integers that will be passed to Maptitude. E.g. “20L”
  • Although NULL is defined in R, RDCOMClient refused to accept it as a parameter. Maptitude’s interface uses NULL a lot for default settings. Therefore all parameters should be explicitly stated. Sometimes an empty string (“”) or zero is sufficient as an empty parameter placeholder.
  • Maptitude OptionArrays should be passed as nested lists. A vector ( “c(…)” notation ) might seem more logical, but RDCOMClient only accepts lists. This is because lists will accept mixed types.
  • String processing is ugly in R. What string processing exists is geared towards the parsing and generation of text data files (eg. CSV); and to be fair, R is very good at this.  Simple text concatenation requires the use of paste with an empty separator, e.g. db_id <- paste(“[“, view_name, “].ID”, sep = “”)

 

Further Reading

Leave a Reply