Performance Benchmarks • griddy

Performance matters because spatial Markov workflows can touch tract panels with thousands of units and many periods. This vignette records the benchmark setup and one reference set of timings. Its chunks are evaluated when pkgdown builds the site and skipped during package checks.
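One way to get this behaviour is to gate chunk evaluation in a setup chunk. The sketch below is an assumption about how such a gate could look, not necessarily this vignette's actual setup. It relies on pkgdown setting the `IN_PKGDOWN` environment variable to `"true"` while it builds the site; the names `in_pkgdown` and `run_estdaR` are hypothetical.

```r
# TRUE only while pkgdown is building the site (pkgdown sets IN_PKGDOWN = "true").
in_pkgdown <- identical(Sys.getenv("IN_PKGDOWN"), "true")

# Hypothetical flag for the optional comparison chunk: it needs both a
# pkgdown build and an installed estdaR.
run_estdaR <- in_pkgdown && requireNamespace("estdaR", quietly = TRUE)

# Apply the gate to every chunk, but only if knitr is available.
if (requireNamespace("knitr", quietly = TRUE)) {
  knitr::opts_chunk$set(eval = in_pkgdown)
}
```

An individual chunk could then use `eval = run_estdaR` in its header, so the comparison is skipped without failing when the package is missing.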

library(griddy)
library(microbenchmark)
library(sf)
library(sfdep)
library(spdep)
library(dplyr)
library(tidyr)

Synthetic grid panel

A 60 by 60 queen-contiguity grid over ten time periods (3,600 units, 36,000 unit-period rows) exercises the same classification and spatial-lag code paths a tract-scale panel would.

make_panel <- function(nx = 60, ny = 60, years = 2010:2019) {
  # Regular nx-by-ny grid of unit squares, one polygon per cell.
  cells <- st_make_grid(st_bbox(c(xmin = 0, ymin = 0, xmax = nx, ymax = ny)), n = c(nx, ny))
  grid <- st_sf(id = seq_along(cells), geometry = cells)

  # Long panel: one row per cell-year. The value trends with id and period
  # index, plus standard normal noise (unseeded, so values differ per run).
  panel <- tidyr::crossing(id = grid$id, year = years) |>
    mutate(value = id + as.integer(factor(year)) + rnorm(n())) |>
    left_join(select(grid, id, geometry), by = "id") |>
    st_as_sf()

  list(grid = grid, panel = panel)
}

fx <- make_panel()

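# Neighbours and weights on the grid. st_contiguity() uses queen contiguity
# by default, and st_weights() row-standardises by default.
geom <- fx$grid |>
  mutate(
    nb = st_contiguity(geometry),
    wt = st_weights(nb)
  )

# spdep listw built from the same neighbours and weights, used by estdaR below.
listw <- nb2listw(geom$nb, glist = geom$wt, style = "W")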

microbenchmark(
  classify = classify_dynamics(fx$panel, id, year, value, k = 5),
  # Repeats the classification step, so this timing includes classify.
  markov = {
    cls <- classify_dynamics(fx$panel, id, year, value, k = 5)
    markov_dynamics(cls, id, year, class)
  },
  spatial = spatial_markov(fx$panel, id, year, value, geometry = geom, k = 5),
  times = 5
)

## Warning in microbenchmark(classify = classify_dynamics(fx$panel, id, year, :
## less accurate nanosecond times to avoid potential integer overflows

## Unit: milliseconds
##      expr      min       lq     mean   median       uq      max neval cld
##  classify 16.92902 17.07187 17.73266 17.37990 17.51819 19.76430     5 a  
##    markov 30.41376 32.34408 32.63432 32.65810 33.51340 34.24226     5  b 
##   spatial 45.71049 46.94279 49.82277 47.84593 50.93086 57.68380     5   c

Comparison against estdaR

The next chunk runs only when estdaR is installed. It calls griddy::spatial_markov() on the long panel and estdaR::sp.mkv() on the unit-by-period matrix derived from that same panel, with the same number of classes and the same neighbours and weights. With inputs held equal in this way, any timing difference reflects implementation overhead rather than differing data.

library(estdaR)

## Loading required package: ggplot2

## 
## Attaching package: 'estdaR'

## The following objects are masked from 'package:spdep':
## 
##     geary, moran

wide_panel <- fx$panel |>
  st_drop_geometry() |>
  pivot_wider(id_cols = id, names_from = year, values_from = value) |>
  arrange(id)

wide_mat <- as.matrix(wide_panel |> select(-id))
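The comparison is only fair if row i of the matrix refers to the same unit as element i of the neighbour list, because a listw object is positional. The following base-R sketch shows the kind of alignment check that could be run on a toy panel; the data and the names `long`, `wide`, and `unit_order` are hypothetical and not part of either package.

```r
# Toy long panel, deliberately stored in descending id order.
long <- data.frame(
  id    = rep(3:1, each = 2),
  year  = rep(c(2010, 2011), times = 3),
  value = c(30, 31, 20, 21, 10, 11)
)

# Reshape to a unit-by-period matrix. Each id-year cell holds exactly one
# value, so sum() simply returns it. Rows come out sorted by id.
wide <- tapply(long$value, list(id = long$id, year = long$year), FUN = sum)

# Order of units assumed for the neighbour list (here: ascending id).
unit_order <- sort(unique(long$id))

# Fail loudly if the matrix rows and the neighbour order disagree.
stopifnot(identical(as.integer(rownames(wide)), unit_order))
```

In the vignette's pipeline the `arrange(id)` step plays this role, since the grid ids were created with `seq_along()` and are already in ascending order.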

microbenchmark(
  griddy = spatial_markov(fx$panel, id, year, value, geometry = geom, k = 5),
  estdaR = estdaR::sp.mkv(wide_mat, listw, classes = 5, fixed = TRUE),
  times = 5
)

## Unit: milliseconds
##    expr      min       lq     mean  median       uq       max neval cld
##  griddy 44.69004 45.61812 66.50568 46.7794 47.48686 147.95399     5   a
##  estdaR 29.66768 30.87497 33.18220 33.3640 35.66041  36.34396     5   a

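griddy carries the long-format reshape, classification, and label-preservation overhead that estdaR::sp.mkv() skips by working directly on a unit-by-period matrix. Transition tabulation and spatial-lag computation are vectorized internally (no per-unit or per-period grouped operations), so the remaining gap reflects that bookkeeping. In the run above the medians are about 46.8 ms for griddy and 33.4 ms for estdaR, a ratio of roughly 1.4. The griddy mean (66.5 ms) is inflated by a single 148 ms evaluation, so with only five evaluations the median is the better summary. The expected pattern is that griddy is modestly slower at this panel size; if the gap exceeds an order of magnitude on tract-scale panels, the implementation should be revisited.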
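To illustrate what vectorized transition tabulation means on a unit-by-period matrix, here is a minimal base-R sketch. It is not griddy's or estdaR's implementation; the class matrix `cls` and all other names are made up for illustration. The idea is to pair every period's classes with the next period's classes in one step and cross-tabulate them, with no loop over units or periods.

```r
# Hypothetical class matrix: 4 units (rows) by 3 periods (columns), k = 3 classes.
cls <- rbind(
  c(1, 1, 2),
  c(2, 3, 3),
  c(1, 2, 2),
  c(3, 3, 1)
)
k <- 3

# Drop the last column for origins and the first column for destinations,
# so position j in `from` lines up with position j in `to`.
from <- as.vector(cls[, -ncol(cls), drop = FALSE])
to   <- as.vector(cls[, -1, drop = FALSE])

# One cross-tabulation counts every transition at once. Fixing the factor
# levels keeps the table k-by-k even if a class is never observed.
counts <- table(
  from = factor(from, levels = seq_len(k)),
  to   = factor(to, levels = seq_len(k))
)

# Row-normalise counts into transition probabilities.
probs <- prop.table(counts, margin = 1)
```

A spatial Markov variant would additionally condition on the class of each unit's spatial lag, which amounts to adding a third factor to the same `table()` call.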