Skip to contents

Performance matters because spatial Markov workflows can touch tract panels with thousands of units and many periods. This vignette records the benchmark shape. It is evaluated for pkgdown and skipped during package checks.

Synthetic grid panel

A 60 by 60 queen-contiguity grid over ten time periods exercises the same classification and spatial-lag code paths a tract-scale panel would.

make_panel <- function(nx = 60, ny = 60, years = 2010:2019) {
  cells <- st_make_grid(st_bbox(c(xmin = 0, ymin = 0, xmax = nx, ymax = ny)), n = c(nx, ny))
  grid <- st_sf(id = seq_along(cells), geometry = cells)

  panel <- tidyr::crossing(id = grid$id, year = years) |>
    left_join(st_drop_geometry(grid), by = "id") |>
    mutate(value = id + as.integer(factor(year)) + rnorm(n())) |>
    left_join(select(grid, id, geometry), by = "id") |>
    st_as_sf()

  list(grid = grid, panel = panel)
}

fx <- make_panel()

geom <- fx$grid |>
  mutate(
    nb = st_contiguity(geometry),
    wt = st_weights(nb)
  )

listw <- nb2listw(geom$nb, glist = geom$wt, style = "W")

microbenchmark(
  classify = classify_dynamics(fx$panel, id, year, value, k = 5),
  markov = {
    cls <- classify_dynamics(fx$panel, id, year, value, k = 5)
    markov_dynamics(cls, id, year, class)
  },
  spatial = spatial_markov(fx$panel, id, year, value, geometry = geom, k = 5),
  times = 5
)
## Warning in microbenchmark(classify = classify_dynamics(fx$panel, id, year, :
## less accurate nanosecond times to avoid potential integer overflows
## Unit: milliseconds
##      expr       min        lq      mean    median        uq       max neval cld
##  classify  17.12053  17.41852  18.02057  17.60466  18.25673  19.70239     5  a 
##    markov 133.28432 138.18427 163.54087 142.03380 184.17532 220.02662     5   b
##   spatial 173.58408 178.38026 195.48954 194.51568 200.62181 230.34587     5   b

Comparison against estdaR

The next chunk runs only when estdaR is installed. It calls griddy::spatial_markov() and estdaR::sp.mkv() on the same wide panel, matched class breaks, and identical weights. Any difference at this point reflects implementation overhead, not differing inputs.

library(estdaR)
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 4.5.2
## 
## Attaching package: 'estdaR'
## The following objects are masked from 'package:spdep':
## 
##     geary, moran
wide_panel <- fx$panel |>
  st_drop_geometry() |>
  pivot_wider(id_cols = id, names_from = year, values_from = value) |>
  arrange(id)

wide_mat <- as.matrix(wide_panel |> select(-id))

microbenchmark(
  griddy = spatial_markov(fx$panel, id, year, value, geometry = geom, k = 5),
  estdaR = estdaR::sp.mkv(wide_mat, listw, classes = 5, fixed = TRUE),
  times = 5
)
## Unit: milliseconds
##    expr       min        lq      mean    median        uq       max neval cld
##  griddy 175.54281 175.93436 183.38720 179.90948 184.50869 201.04067     5  a 
##  estdaR  31.93428  32.24962  32.73781  32.63485  33.23419  33.63611     5   b

griddy carries the long-format reshape, classification, and label preservation overhead that estdaR::sp.mkv() skips by working directly on a unit-by-period matrix. The expected pattern is that griddy is slower at this panel size; how much slower is the relevant question. If the gap exceeds an order of magnitude on tract-scale panels it should be revisited.