
Creates a model that normalizes observations by their expected standard error, accounting for varying sample sizes. This addresses "sampling error bias", in which regions with small sample sizes show artificially high variability.

Usage

bs_model_funnel(
  sample_size,
  target_rate = NULL,
  type = c("count", "proportion"),
  formula = c("paper", "poisson"),
  control_limits = c(2, 3),
  name = NULL
)

Arguments

sample_size

Numeric vector of sample sizes (e.g., population)

target_rate

Target rate/proportion. If NULL, estimated from data.

type

Type of data: "count" (Poisson) or "proportion" (binomial)

formula

Formula for likelihood computation:

  • "paper" (default): Uses the funnel score from the paper's unemployment data (dM = Z * sqrt(pop_frac)) and converts it to a two-tailed normal tail probability.

  • "poisson": Uses Poisson-based standard error and converts the resulting z-score to a two-tailed normal tail probability.

control_limits

Numeric vector of control limits for the funnel plot, in standard deviations (SDs). Default is c(2, 3): 2 SDs for the warning limits and 3 SDs for the control limits.

name

Optional name for the model

Value

A bs_model_funnel object

Details

The de Moivre funnel model uses the insight that sampling variability decreases with sample size according to de Moivre's equation: $$SE = \sigma / \sqrt{n}$$
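As a rough illustration of that scaling in base R (the rate `p` and the sample sizes `n` are invented for this sketch and are not taken from the package):

```r
# Hypothetical values chosen only for illustration
p <- 0.05                      # assumed underlying rate
n <- c(100, 1000, 10000)       # assumed sample sizes

sigma <- sqrt(p * (1 - p))     # SD of a single Bernoulli observation
se <- sigma / sqrt(n)          # de Moivre: SE = sigma / sqrt(n)

# Each tenfold increase in n shrinks the SE by a factor of sqrt(10)
ratio <- se[1] / se[2]
```

The smallest sample has the widest standard error, which is why small regions tend to dominate both ends of a raw rate ranking.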

With formula = "paper": The model uses the formula that matches the paper's unemployment reference data: $$Z = (rate - mean\_rate) / stddev\_rate$$ $$dM = Z \times \sqrt{population / total\_population}$$ $$P(D|M) = 2 \times \Phi(-|dM|)$$
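The three "paper" equations can be sketched in base R as follows. This is not the package's internal code: the input values are hypothetical, and the use of an unweighted `mean()` and the sample standard deviation `sd()` for mean_rate and stddev_rate is an assumption.

```r
# Hypothetical inputs, not package data
rate <- c(0.08, 0.05, 0.06, 0.03)
population <- c(10000, 50000, 100000, 25000)

# Z-score of each rate (unweighted mean and sample SD are assumptions)
z <- (rate - mean(rate)) / sd(rate)

# Scale by the square root of each region's share of the total population
dm <- z * sqrt(population / sum(population))

# Two-tailed normal tail probability: P(D|M) = 2 * Phi(-|dM|)
p_d_given_m <- 2 * pnorm(-abs(dm))
```

Because each population share is at most 1, `abs(dm)` never exceeds `abs(z)`, so small regions are pulled toward zero the most.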

With formula = "poisson": For count data (Poisson), the model computes z-scores as: $$z = (observed - expected) / \sqrt{expected}$$

For proportion data (binomial), with expected proportion p and sample size n: $$z = (observed - expected) / \sqrt{p(1-p)/n}$$
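A minimal base R sketch of both z-scores, followed by the two-tailed tail probability described for formula = "poisson". The observed counts and the variable names are hypothetical, and the package's internals may differ.

```r
# Hypothetical inputs, not package data
population <- c(10000, 50000, 100000, 25000)
observed_count <- c(18, 45, 110, 20)
target_rate <- 0.001           # assumed known target rate

# Count data: the Poisson standard error is sqrt(expected)
expected_count <- target_rate * population
z_count <- (observed_count - expected_count) / sqrt(expected_count)

# Proportion data: the binomial standard error is sqrt(p * (1 - p) / n)
observed_prop <- observed_count / population
se_prop <- sqrt(target_rate * (1 - target_rate) / population)
z_prop <- (observed_prop - target_rate) / se_prop

# Two-tailed normal tail probability for each z-score
p_count <- 2 * pnorm(-abs(z_count))
p_prop <- 2 * pnorm(-abs(z_prop))
```

For a small target rate the two z-scores nearly coincide, since they differ only by the factor sqrt(1 - p).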

Observations with large absolute z-scores (far from the expected value after accounting for sample size) are genuinely surprising, while extreme rates in small regions are discounted as expected sampling variation.

This model is essential for:

  • De-biasing per-capita rate maps

  • Creating funnel plots

  • Identifying genuine outliers vs. sampling noise
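For the funnel-plot use case, the control limits can be sketched as the target rate plus or minus k standard errors. This assumes proportion data with a binomial standard error; how the package actually draws its limits is not stated here, and all names and values below are hypothetical.

```r
# Hypothetical values for illustration
target_rate <- 0.05
control_limits <- c(2, 3)              # warning and control limits, in SDs
n <- seq(100, 10000, by = 100)         # grid of sample sizes for the x-axis

# Binomial standard error at each sample size
se <- sqrt(target_rate * (1 - target_rate) / n)

# One column per control limit; lower limits are floored at zero
upper <- sapply(control_limits, function(k) target_rate + k * se)
lower <- sapply(control_limits, function(k) pmax(target_rate - k * se, 0))
colnames(upper) <- colnames(lower) <- paste0(control_limits, "SD")
```

Plotting `upper` and `lower` against `n` gives the funnel shape: the limits are wide for small samples and narrow toward the target rate as n grows.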

Examples

# Population sizes for regions
population <- c(10000, 50000, 100000, 25000)

# Funnel model using the paper's unemployment-reference formula
model <- bs_model_funnel(population, formula = "paper")

# Funnel model with known target rate
model <- bs_model_funnel(population, target_rate = 0.001)

# For proportion data with Poisson-based formula
model <- bs_model_funnel(population, type = "proportion", formula = "poisson")