Performance matters because spatial Markov workflows can touch tract panels with thousands of units and many periods. This vignette records the benchmark shape. It is evaluated for pkgdown and skipped during package checks.
library(griddy)
library(microbenchmark)
library(sf)
library(sfdep)
library(spdep)
library(dplyr)
library(tidyr)Synthetic grid panel
A 60 by 60 queen-contiguity grid over ten time periods exercises the same classification and spatial-lag code paths a tract-scale panel would.
make_panel <- function(nx = 60, ny = 60, years = 2010:2019) {
cells <- st_make_grid(st_bbox(c(xmin = 0, ymin = 0, xmax = nx, ymax = ny)), n = c(nx, ny))
grid <- st_sf(id = seq_along(cells), geometry = cells)
panel <- tidyr::crossing(id = grid$id, year = years) |>
left_join(st_drop_geometry(grid), by = "id") |>
mutate(value = id + as.integer(factor(year)) + rnorm(n())) |>
left_join(select(grid, id, geometry), by = "id") |>
st_as_sf()
list(grid = grid, panel = panel)
}
fx <- make_panel()
geom <- fx$grid |>
mutate(
nb = st_contiguity(geometry),
wt = st_weights(nb)
)
listw <- nb2listw(geom$nb, glist = geom$wt, style = "W")
microbenchmark(
classify = classify_dynamics(fx$panel, id, year, value, k = 5),
markov = {
cls <- classify_dynamics(fx$panel, id, year, value, k = 5)
markov_dynamics(cls, id, year, class)
},
spatial = spatial_markov(fx$panel, id, year, value, geometry = geom, k = 5),
times = 5
)## Warning in microbenchmark(classify = classify_dynamics(fx$panel, id, year, :
## less accurate nanosecond times to avoid potential integer overflows
## Unit: milliseconds
## expr min lq mean median uq max neval cld
## classify 17.12053 17.41852 18.02057 17.60466 18.25673 19.70239 5 a
## markov 133.28432 138.18427 163.54087 142.03380 184.17532 220.02662 5 b
## spatial 173.58408 178.38026 195.48954 194.51568 200.62181 230.34587 5 b
Comparison against estdaR
The next chunk runs only when estdaR is installed. It
calls griddy::spatial_markov() and
estdaR::sp.mkv() on the same wide panel, matched class
breaks, and identical weights. Any difference at this point reflects
implementation overhead, not differing inputs.
library(estdaR)## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 4.5.2
##
## Attaching package: 'estdaR'
## The following objects are masked from 'package:spdep':
##
## geary, moran
wide_panel <- fx$panel |>
st_drop_geometry() |>
pivot_wider(id_cols = id, names_from = year, values_from = value) |>
arrange(id)
wide_mat <- as.matrix(wide_panel |> select(-id))
microbenchmark(
griddy = spatial_markov(fx$panel, id, year, value, geometry = geom, k = 5),
estdaR = estdaR::sp.mkv(wide_mat, listw, classes = 5, fixed = TRUE),
times = 5
)## Unit: milliseconds
## expr min lq mean median uq max neval cld
## griddy 175.54281 175.93436 183.38720 179.90948 184.50869 201.04067 5 a
## estdaR 31.93428 32.24962 32.73781 32.63485 33.23419 33.63611 5 b
griddy carries the long-format reshape, classification,
and label preservation overhead that estdaR::sp.mkv() skips
by working directly on a unit-by-period matrix. The expected pattern is
that griddy is slower at this panel size; how much slower
is the relevant question. If the gap exceeds an order of magnitude on
tract-scale panels it should be revisited.