Latent Variable Measurement Error Correction with lpmec

Introduction

This tutorial shows how to use the lpmec package to correct regression coefficients for measurement error in a latent predictor. The package estimates the latent variable from observed indicators, computes corrected regression coefficients, and uses bootstrap resampling to quantify their uncertainty.

Installation

First install the required dependencies and the lpmec package:

# Install lpmec from source (replace with appropriate installation method)
# devtools::install_github("cjerzak/lpmec-software", subdir = "lpmec")
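
Before loading the package, it can help to confirm that it is actually installed. The check below is a minimal sketch using base R's requireNamespace(); the variable name has_lpmec is only illustrative and is not part of the package.

```r
# Check whether lpmec is installed without attaching it.
# `has_lpmec` is an illustrative variable name, not part of lpmec.
has_lpmec <- requireNamespace("lpmec", quietly = TRUE)

if (!has_lpmec) {
  message("lpmec is not installed; see the installation command above.")
}
```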

Basic Usage

Data Simulation

Simulate data with a latent predictor and observed binary indicators:

set.seed(123)
n <- 1000  # Number of observations
d <- 10    # Number of observable indicators

# Generate latent variable and observed outcomes
x_true <- rnorm(n)
Yobs <- 0.4 * x_true + rnorm(n, sd = 0.35)

# Generate binary indicators of latent variable
ObservablesMat <- sapply(1:d, function(j) {
  p <- pnorm(0.5 * x_true + rnorm(n, sd = 0.5))
  rbinom(n, 1, p)
})
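
To see why a correction is needed, it may help to check the simulated data directly. The sketch below re-creates the simulation so that it runs on its own, verifies that the indicator matrix has the expected shape, and compares an "oracle" regression on the true latent variable with a regression on a crude proxy. The proxy (the standardized share of positive indicators) and the names naive_proxy, b_oracle, and b_naive are assumptions made for this illustration, not anything lpmec computes.

```r
set.seed(123)
n <- 1000
d <- 10

# Re-create the simulated data so this block runs on its own
x_true <- rnorm(n)
Yobs <- 0.4 * x_true + rnorm(n, sd = 0.35)
ObservablesMat <- sapply(1:d, function(j) {
  p <- pnorm(0.5 * x_true + rnorm(n, sd = 0.5))
  rbinom(n, 1, p)
})

# Structure checks: n rows, d columns, every entry 0 or 1
stopifnot(all(dim(ObservablesMat) == c(n, d)),
          all(ObservablesMat %in% c(0, 1)))

# Crude proxy for the latent variable (illustrative assumption)
naive_proxy <- as.numeric(scale(rowMeans(ObservablesMat)))

# Oracle slope uses the unobservable truth; the naive slope uses the noisy proxy
b_oracle <- unname(coef(lm(Yobs ~ x_true))["x_true"])
b_naive  <- unname(coef(lm(Yobs ~ naive_proxy))["naive_proxy"])

round(c(oracle = b_oracle, naive = b_naive), 3)
```

The oracle slope should land near the true value of 0.4, while the proxy-based slope is pulled toward zero. That attenuation is the problem the correction targets.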

Running the Analysis

Use lpmec to estimate corrected coefficients:

library(lpmec)

# Run analysis with bootstrap uncertainty estimates
results <- lpmec(
  Y = Yobs,
  observables = as.data.frame(ObservablesMat),
  n_boot = 10,      # Use 0 to disable bootstrap resampling
  n_partition = 5,  # Reduced for demonstration
  estimation_method = "em"
)

## Warning: n_boot < 199 gives coarse bootstrap confidence intervals; increase
## n_boot for interval estimation.

## {booti_ 1 of 11} -- {parti_ 1 of 5}

## {booti_ 1 of 11} -- {parti_ 2 of 5}

## ... (remaining progress messages omitted: one per partition, 5 partitions for each of 11 iterations) ...

## {booti_ 11 of 11} -- {parti_ 5 of 5}

Compare naive and corrected estimates:

print(results)

## Latent Predictor Measurement Error Correction (LPMEC) Model Results
## -------------------------------------------------------------------
## Resampling: n_out_of_n, m = 1000, CI = percentile
## Uncorrected Coefficient (OLS): 0.301 (SE: 0.022)
## Corrected Coefficient: 0.425 (SE: 0.027)
## Bayesian OLS (Outer): NA (SE: NA)
## Use summary() for detailed results.

summary(results)

## Latent Predictor Measurement Error Correction (LPMEC) Model Summary
## ====================================================================
## Resampling: n_out_of_n, m = 1000 (m/n = 1.000), CI = percentile, replace = TRUE
## Bootstrap success rate: 1.000
##                       Estimate         SE  CI_Lower  CI_Upper
## OLS                  0.3008490 0.02179329 0.2608019 0.3227255
## IV                   0.7411981 0.05831891 0.6598075 0.8314390
## Corrected IV         0.4250383 0.02723953 0.3916324 0.4690671
## Corrected OLS        0.4250383 0.02723953 0.3916324 0.4690671
## Bayesian OLS (Outer)        NA         NA        NA        NA
## Bayesian OLS (Inner)        NA         NA        NA        NA
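
The header of the summary reports n-out-of-n resampling with percentile intervals. As a generic illustration of that kind of interval, the sketch below bootstraps the slope of a toy regression in base R. It is not lpmec's internal code: the toy data, the 199 replicates, and the 95% level are assumptions chosen for the example.

```r
set.seed(123)
n <- 1000
x <- rnorm(n)
y <- 0.4 * x + rnorm(n, sd = 0.35)

# Point estimate on the full sample
slope_hat <- unname(coef(lm(y ~ x))[2])

B <- 199  # number of bootstrap replicates (assumed for this sketch)
boot_slopes <- replicate(B, {
  # n-out-of-n resample: draw n row indices with replacement
  idx <- sample.int(n, size = n, replace = TRUE)
  unname(coef(lm(y[idx] ~ x[idx]))[2])
})

# Bootstrap standard error and 95% percentile interval
boot_se <- sd(boot_slopes)
boot_ci <- quantile(boot_slopes, probs = c(0.025, 0.975), names = FALSE)

round(c(estimate = slope_hat, se = boot_se,
        lower = boot_ci[1], upper = boot_ci[2]), 3)
```

With only a handful of replicates, the tail quantiles rest on very few draws, which is why the warning printed earlier asks for a larger n_boot when intervals matter.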

You can visualize the relationship between split-half estimates:

plot(results)
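
The split-half idea can be illustrated with a few lines of base R. The sketch below is a textbook-style argument, not lpmec's implementation. It assumes that each half of the indicators yields a standardized score equal to rho times the (unit-variance) latent variable plus independent noise. Under that assumption, OLS on one score estimates roughly beta * rho, IV with the other score as instrument estimates roughly beta / rho, and their geometric mean recovers beta. The crude row-mean scores and all variable names are assumptions made for this example; lpmec fits a latent-variable model instead.

```r
set.seed(123)
n <- 1000
d <- 10
x_true <- rnorm(n)
Yobs <- 0.4 * x_true + rnorm(n, sd = 0.35)
ObservablesMat <- sapply(1:d, function(j) {
  p <- pnorm(0.5 * x_true + rnorm(n, sd = 0.5))
  rbinom(n, 1, p)
})

# Randomly split the indicators into two disjoint halves
half_a <- sample.int(d, size = d %/% 2)
half_b <- setdiff(seq_len(d), half_a)

# Crude latent scores: standardized row means within each half (assumption)
score_a <- as.numeric(scale(rowMeans(ObservablesMat[, half_a, drop = FALSE])))
score_b <- as.numeric(scale(rowMeans(ObservablesMat[, half_b, drop = FALSE])))

# OLS of the outcome on one half's score: attenuated toward zero
b_ols <- cov(Yobs, score_a) / var(score_a)

# IV estimate using the other half's score as the instrument: inflated
b_iv <- cov(Yobs, score_b) / cov(score_a, score_b)

# Geometric mean of the two: approximately the true slope under the assumption
b_corrected <- sqrt(b_ols * b_iv)

round(c(ols = b_ols, iv = b_iv, corrected = b_corrected), 3)
```

A single random split is noisy, which is one motivation for repeating the procedure over several partitions and aggregating the results.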

Inference for Median Partition Aggregation

By default, lpmec aggregates the partition-level estimates with the finite-partition median. Because the median is a nonsmooth statistic, the formal route to uncertainty quantification is subsampling or the m-out-of-n bootstrap with root-scaled intervals:

results_med <- lpmec(
  Y = Yobs,
  observables = as.data.frame(ObservablesMat),
  n_boot = 499L,
  n_partition = 10L,
  estimation_method = "em",
  partition_aggregation = "median",
  bootstrap_method = "subsampling",
  boot_m_rule = "power",
  boot_m_exponent = 0.70,
  boot_ci_type = "root",
  seed = 123
)

This path does three things. It holds the same realized partition set fixed across the original sample and all subsamples. It reruns the full latent-score and correction pipeline inside each subsample. And it rescales the empirical root statistic from the subsample size m to the rate of the original sample size n. In applied work, inspect boot_m, boot_m_ratio, valid_partitions_by_boot, and bootstrap_success_rate, and check sensitivity across a grid of boot_m values.
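
The root-scaling step can be sketched on a toy problem. The example below builds a subsampling interval for a plain sample median in base R. It is a generic illustration rather than lpmec's pipeline, and it rests on two labeled assumptions: that the power rule means m = floor(n^exponent), and that the estimator converges at the square-root rate. The package's exact rounding and scaling may differ.

```r
set.seed(123)
n <- 1000
z <- rnorm(n)              # toy data; the true median is 0
theta_hat <- median(z)     # full-sample estimate

# Assumed power rule for the subsample size (lpmec's rounding may differ)
m <- floor(n^0.70)
B <- 499

# Root statistic sqrt(m) * (theta_m - theta_hat), from subsamples
# of size m drawn without replacement
roots <- replicate(B, {
  idx <- sample.int(n, size = m, replace = FALSE)
  sqrt(m) * (median(z[idx]) - theta_hat)
})

# Rescale the root quantiles from the m rate to the n rate and invert them:
# the upper root quantile gives the lower limit, and vice versa
q <- quantile(roots, probs = c(0.025, 0.975), names = FALSE)
ci <- c(lower = theta_hat - q[2] / sqrt(n),
        upper = theta_hat - q[1] / sqrt(n))

c(m = m, round(ci, 3))
```

Re-running the block with a few different exponents is a simple way to carry out the kind of sensitivity check over subsample sizes recommended above.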

Advanced Features

Using Different Estimation Methods

Besides the EM method used above, the package supports Bayesian MCMC estimation with a choice of backends:

# The examples are wrapped in if (FALSE) so they are skipped when the
# tutorial is run; remove the wrapper to execute them.
if (FALSE) {
  # Bayesian MCMC estimation with the default pscl backend
  mcmc_results <- lpmec(
    Y = Yobs,
    observables = as.data.frame(ObservablesMat),
    estimation_method = "mcmc"
  )

  # To use the optional NumPyro backend, first run build_backend()
  # or provide your own Python environment with JAX and NumPyro.
  numpyro_results <- lpmec(
    Y = Yobs,
    observables = as.data.frame(ObservablesMat),
    estimation_method = "mcmc",
    mcmc_control = list(backend = "numpyro"),
    conda_env = "lpmec"
  )
}

Conclusion

The lpmec package corrects regression coefficients for measurement error in a latent predictor through:

Bootstrapped latent variable estimation
Multiple estimators (uncorrected OLS, IV, corrected OLS and IV, Bayesian OLS)
Flexible estimation backends (EM, MCMC)

Refer to package documentation for advanced configuration options.