| Title: | Measurement Error Analysis and Correction Under Identification Restrictions |
|---|---|
| Description: | Implements methods for analyzing latent variable models with measurement error correction, including Item Response Theory (IRT) models. Provides tools for various correction methods such as Bayesian Markov Chain Monte Carlo (MCMC), over-imputation, bootstrapping for robust standard errors, Ordinary Least Squares (OLS), and Instrumental Variables (IV) based approaches. Supports flexible specification of observable indicators and groupings for latent variable analyses in social sciences and other fields. Methods are described in a working paper (2025) <doi:10.48550/arXiv.2507.22218>. |
| Authors: | Connor Jerzak [aut, cre], Stephen Jessee [aut] |
| Maintainer: | Connor Jerzak <[email protected]> |
| License: | GPL-3 |
| Version: | 1.1.4 |
| Built: | 2026-06-06 13:22:43 UTC |
| Source: | https://github.com/cjerzak/lpmec-software |
Builds the backend environment for lpmec: a conda environment in which 'JAX', 'numpyro', and 'numpy' are installed. Alternatively, users can create their own conda environment with 'JAX' and 'numpy' installed.
build_backend(conda_env = "lpmec", conda = "auto")
conda_env |
(default = "lpmec") |
conda |
(default = "auto") |
Invisibly returns NULL; this function is used for its side effects
of creating and configuring a conda environment for lpmec.
This function requires an Internet connection.
To see which Python executable is currently on your PATH, run Sys.which("python").
## Not run:
# Create a conda environment named "lpmec"
# and install the required Python packages (jax, numpy, etc.)
build_backend(conda_env = "lpmec", conda = "auto")

# If you want to specify a particular conda path:
# build_backend(conda_env = "lpmec", conda = "/usr/local/bin/conda")
## End(Not run)
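Before building, it can be useful to check whether the target environment already exists. The following is a minimal sketch, assuming that the conda environment is managed through the reticulate package (an assumption, not something this page confirms); env_name and has_env are illustrative names.

```r
# Assumption: reticulate is the interface used to reach conda from R.
env_name <- "lpmec"
has_env <- FALSE

if (requireNamespace("reticulate", quietly = TRUE)) {
  # conda_list() errors when no conda binary is found, so guard the call.
  envs <- tryCatch(reticulate::conda_list()$name,
                   error = function(e) character(0))
  has_env <- env_name %in% envs
}

if (!has_env) {
  message("No conda environment named '", env_name,
          "' was found; build_backend() could be used to create it.")
}
```

The check only reads the list of environments; it does not create or modify anything.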
This helper analyzes observable indicators and returns a numeric vector
of 1 or -1 for use with the orientation_signs
argument in lpmec. Each sign is chosen so that the correlation
between the oriented indicator and either the outcome Y or the
first principal component of the indicators is positive.
infer_orientation_signs(Y, observables, method = c("Y", "PC1"))
Y |
Numeric outcome vector. Only used when method = "Y". |
observables |
A matrix or data frame of binary observable indicators. |
method |
Character string specifying how to orient the indicators.
Default is "Y". |
A numeric vector of length ncol(observables) containing
1 or -1.
set.seed(1)
Y <- rnorm(10)
obs <- data.frame(matrix(sample(c(0,1), 20, replace = TRUE), ncol = 2))
infer_orientation_signs(Y, obs)
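The sign rule that infer_orientation_signs describes can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: orient_signs_sketch is a hypothetical helper, and treating zero or undefined correlations as +1 is an assumption made here.

```r
# Hypothetical helper: one sign per column so that the oriented column
# correlates non-negatively with the chosen reference.
orient_signs_sketch <- function(Y, observables, method = c("Y", "PC1")) {
  method <- match.arg(method)
  X <- as.matrix(observables)
  # Reference: the outcome itself, or the first principal component scores.
  ref <- if (method == "Y") Y else prcomp(X, center = TRUE)$x[, 1]
  r <- as.numeric(cor(X, ref))
  # Assumption: ties and undefined correlations default to +1.
  ifelse(is.na(r) | r >= 0, 1, -1)
}

set.seed(1)
latent <- rnorm(200)
Y <- latent + rnorm(200)
obs <- cbind(
  a = as.integer(latent + rnorm(200) > 0),   # rises with the latent trait
  b = as.integer(-latent + rnorm(200) > 0)   # falls with the latent trait
)

signs_y  <- orient_signs_sketch(Y, obs, method = "Y")
signs_pc <- orient_signs_sketch(Y, obs, method = "PC1")
```

Because the sign of a principal component is arbitrary, the "PC1" variant only guarantees that the indicators are oriented consistently with one another, not that they point in any particular substantive direction.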
KnowledgeVoteDuty is a modified set of responses to a small set of questions from the American National Election Studies (ANES) 2024 Time Series Study. The data include only respondents with non-missing values on all included variables; respondents with one or more missing values were dropped.
data(KnowledgeVoteDuty)
A data frame with 3,059 observations and 5 variables:
Whether respondents feel that voting is a duty or a choice. Values range from 1 to 7, with 1 being "Very strongly a duty" and 7 being "Very strongly a choice," created based on variable V241218x.
Dummy variable (0 or 1) for whether respondent correctly stated the length of a U.S. Senate term. Created based on variable V241612.
Dummy variable (0 or 1) for whether respondent correctly identified "Foreign aid" from a list as the category the federal government spends the least on. Created based on variable V241613.
Dummy variable (0 or 1) for whether respondent correctly identified the political party that currently has the most members in the U.S. House of Representatives. Created based on variable V241614.
Dummy variable (0 or 1) for whether respondent correctly identified the political party that currently has the most members in the U.S. Senate. Created based on variable V241615.
American National Election Studies. 2024. ANES 2024 Time Series Study Full Release [dataset and documentation]. Available at electionstudies.org.
data(KnowledgeVoteDuty)
voteduty <- KnowledgeVoteDuty$voteduty
knowledge <- scale(rowMeans(KnowledgeVoteDuty[ , -1]))
summary(lm(voteduty ~ knowledge))
Implements latent variable models with measurement error correction
lpmec(
  Y,
  observables,
  observables_groupings = colnames(observables),
  orientation_signs = NULL,
  make_observables_groupings = FALSE,
  n_boot = 32L,
  n_partition = 10L,
  partition_aggregation = "median",
  partition_aggregation_probs = c(0.01, 0.99),
  boot_basis = 1:length(Y),
  return_intermediaries = TRUE,
  ordinal = FALSE,
  estimation_method = "em",
  latent_estimation_fn = NULL,
  mcmc_control = list(backend = "pscl", n_samples_warmup = 500L, n_samples_mcmc = 1000L,
    batch_size = 512L, chain_method = "parallel", subsample_method = "full",
    anchor_parameter_id = NULL, n_thin_by = 1L, n_chains = 2L,
    outcome_prior = list(calibration = "data"), joint2_prior = list()),
  conda_env = "lpmec",
  conda_env_required = FALSE
)
Y |
A vector of observed outcome variables |
observables |
A matrix of observable indicators used to estimate the latent variable |
observables_groupings |
A vector specifying groupings for the observable indicators. Default is column names of observables. |
orientation_signs |
(optional) A numeric vector of length equal to the number of columns in 'observables', containing 1 or -1 to indicate the desired orientation of each column. If provided, each column of 'observables' will be oriented by this sign before analysis. Default is NULL (no orientation applied). |
make_observables_groupings |
Logical. If TRUE, creates dummy variables for each level of the observable indicators. Default is FALSE. |
n_boot |
Non-negative integer. Number of bootstrap iterations. Use
|
n_partition |
Positive integer. Number of split-half partitions for each
bootstrap iteration. When |
partition_aggregation |
Aggregation strategy for combining estimates across
partitions within each bootstrap iteration. Default is "median". |
partition_aggregation_probs |
Numeric vector of length 2 used by
|
boot_basis |
Vector of indices or grouping variable for stratified bootstrap. Default is 1:length(Y). |
return_intermediaries |
Logical. If TRUE, returns intermediate results. Default is TRUE. |
ordinal |
Logical indicating whether the observable indicators are ordinal (TRUE) or binary (FALSE). |
estimation_method |
Character specifying the estimation approach. Options include:
|
latent_estimation_fn |
Custom function for estimating latent trait from |
mcmc_control |
A list indicating parameter specifications if MCMC used.
|
conda_env |
A character string specifying the name of the conda environment to use
via |
conda_env_required |
A logical indicating whether the specified conda environment
must be strictly used. If |
This function implements a latent variable analysis with measurement error correction.
It fits the original sample and, when n_boot >= 1, performs bootstrap
resampling for uncertainty estimates. Each original or bootstrap sample is
analyzed with one or more split-half partitions. For each partition,
it calls the lpmec_onerun function to estimate latent variables and apply various correction methods.
The results are then aggregated across partitions and bootstrap iterations to produce final estimates
and, when bootstrap draws are available, bootstrap standard errors.
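The nesting of partitions inside bootstrap samples can be sketched in base R. This is a simplified stand-in, not the package's code: standardized row means replace the latent-variable estimator, one_partition and one_sample are hypothetical helpers, and the simulated data and the small n_partition and bootstrap counts are for illustration only.

```r
set.seed(42)
n <- 500; k <- 10
x <- rnorm(n)                                            # unobserved latent trait
obs <- sapply(seq_len(k), function(j) as.integer(x + rnorm(n) > 0))
Y <- 0.5 * x + rnorm(n)

# One random split-half partition: half 2 instruments for half 1.
one_partition <- function(Y, obs) {
  idx <- sample(ncol(obs), floor(ncol(obs) / 2))
  x1 <- as.numeric(scale(rowMeans(obs[, idx, drop = FALSE])))
  x2 <- as.numeric(scale(rowMeans(obs[, -idx, drop = FALSE])))
  cov(Y, x2) / cov(x1, x2)                               # simple IV slope
}

# Aggregate several partitions within one sample (median, as in the default).
one_sample <- function(Y, obs, n_partition) {
  median(replicate(n_partition, one_partition(Y, obs)))
}

# Point estimate from the original sample.
est <- one_sample(Y, obs, n_partition = 5)

# Row bootstrap: resample rows, redo the partitions, and summarize the spread.
boot <- replicate(20, {
  i <- sample(n, replace = TRUE)
  one_sample(Y[i], obs[i, , drop = FALSE], n_partition = 5)
})
boot_se <- sd(boot)
```

Aggregating within each sample before bootstrapping keeps the bootstrap spread from being dominated by the randomness of any single split.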
A list containing various estimates and statistics (in snake_case):
Naive, IV, corrected IV, and corrected OLS estimates:
ols_*, iv_*, corrected_iv_*, and
corrected_ols_*. Bootstrap uncertainty summaries use suffixes
_se, _lower, _upper, and _tstat where
applicable.
var_est_split and var_est_split_se: Aggregated
split-half measurement-error variance and, when bootstrap draws are
available, its bootstrap standard error.
bayesian_ols_*_outer_normed and
bayesian_ols_*_inner_normed: MCMC coefficient summaries. The
*_parametric standard-error fields retain within-run posterior
uncertainty, while the non-parametric standard-error and interval fields
summarize bootstrap variation.
m_stage_1_erv* and m_reduced_erv*: Extreme robustness
values and bootstrap uncertainty summaries for the first-stage and
reduced-form regressions.
mcmc_joint2_*: NumPyro "mcmc_joint2" diagnostics,
including effective-sample-size percentages, maximum R-hat, divergent
transitions, mean accept probability, and orientation diagnostics.
x_est1 and x_est2: Split-half latent variable
estimates from the original sample.
Intermediary_*: Per-run original-sample and bootstrap
outputs, returned only when return_intermediaries = TRUE.
Jerzak, C. T. and Jessee, S. A. (2025). Attenuation Bias with Latent Predictors. arXiv:2507.22218 [stat.AP]. https://arxiv.org/abs/2507.22218
# Generate some example data
set.seed(123)
Y <- rnorm(1000)
observables <- as.data.frame(matrix(sample(c(0,1), 1000*10, replace = TRUE), ncol = 10))

# Run the bootstrapped analysis
results <- lpmec(Y = Y,
                 observables = observables,
                 n_boot = 10,    # small values for illustration only
                 n_partition = 5 # small for size
                 )

# Use a winsorized mean across partitions
results_winsorized <- lpmec(Y = Y,
                            observables = observables,
                            n_boot = 10,
                            n_partition = 5,
                            partition_aggregation = "winsorized_mean")

# View the corrected IV coefficient and its standard error
print(results)
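To make the "winsorized_mean" option concrete, here is a minimal base-R sketch of a winsorized mean: values outside the chosen quantiles are clamped to those quantiles before averaging. The helper winsorized_mean_sketch and the toy estimates are hypothetical, and the package's exact quantile convention may differ.

```r
# Hypothetical helper: clamp to the [probs[1], probs[2]] quantiles, then average.
winsorized_mean_sketch <- function(x, probs = c(0.01, 0.99)) {
  q <- quantile(x, probs = probs, na.rm = TRUE, names = FALSE)
  mean(pmin(pmax(x, q[1]), q[2]), na.rm = TRUE)
}

# Toy partition-level estimates with one wild value, as can happen when a
# split yields a weak first stage.
partition_estimates <- c(0.48, 0.52, 0.50, 0.47, 25)

plain_mean <- mean(partition_estimates)
wins_mean  <- winsorized_mean_sketch(partition_estimates, probs = c(0.2, 0.8))
```

With only a handful of partitions, probabilities as extreme as 0.01 and 0.99 clamp very little, which is why wider probabilities are used in this toy case.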
Runs lpmec_multivariate_onerun over repeated split-half
partitions and optional row bootstrap samples.
lpmec_multivariate(
  Y,
  observables,
  covariates = NULL,
  observables_groupings = NULL,
  make_observables_groupings = FALSE,
  n_boot = 32L,
  n_partition = 10L,
  partition_aggregation = "median",
  partition_aggregation_probs = c(0.01, 0.99),
  boot_basis = seq_along(Y),
  return_intermediaries = TRUE,
  estimation_method = "em",
  latent_estimation_fn = NULL,
  ordinal = FALSE,
  mcmc_control = list(backend = "pscl", n_samples_warmup = 500L, n_samples_mcmc = 1000L,
    batch_size = 512L, chain_method = "parallel", subsample_method = "full",
    anchor_parameter_id = NULL, n_thin_by = 1L, n_chains = 2L,
    outcome_prior = list(calibration = "data"), joint2_prior = list()),
  min_split_correlation = 0,
  conda_env = "lpmec",
  conda_env_required = FALSE
)
Y |
Numeric outcome vector. |
observables |
A list of matrices or data frames, one per latent predictor. |
covariates |
Optional matrix or data frame of observed covariates. |
observables_groupings |
Optional list of grouping vectors, one per latent predictor. Defaults to the column names of each observable matrix. |
make_observables_groupings |
Logical scalar or vector passed to
|
n_boot |
Non-negative integer number of row-bootstrap iterations. |
n_partition |
Positive integer number of split-half partitions per original or bootstrap sample. |
partition_aggregation |
Aggregation strategy across partitions. See
|
partition_aggregation_probs |
Quantile probabilities for winsorized or trimmed partition aggregation. |
boot_basis |
Optional vector of indices or strata for row bootstrap. |
return_intermediaries |
Logical. If |
estimation_method |
Character scalar or vector passed to
|
latent_estimation_fn |
Optional function or list of functions used when
|
ordinal |
Logical scalar or vector passed to |
mcmc_control |
List passed to |
min_split_correlation |
Minimum allowed split-half correlation. The correction is defined only for positive componentwise correlations. |
conda_env |
Character string naming the conda environment for MCMC methods. |
conda_env_required |
Logical indicating whether |
A list of class lpmec_multivariate containing aggregated
uncorrected OLS and corrected IV latent coefficients with bootstrap
uncertainty summaries when n_boot >= 1.
Implements the split-indicator multivariate IV correction for several latent predictors. Each latent predictor is estimated from its own indicator matrix.
lpmec_multivariate_onerun(
  Y,
  observables,
  covariates = NULL,
  observables_groupings = NULL,
  make_observables_groupings = FALSE,
  estimation_method = "em",
  latent_estimation_fn = NULL,
  ordinal = FALSE,
  mcmc_control = list(backend = "pscl", n_samples_warmup = 500L, n_samples_mcmc = 1000L,
    batch_size = 512L, chain_method = "parallel", subsample_method = "full",
    anchor_parameter_id = NULL, n_thin_by = 1L, n_chains = 2L,
    outcome_prior = list(calibration = "data"), joint2_prior = list()),
  min_split_correlation = 0,
  conda_env = "lpmec",
  conda_env_required = FALSE
)
Y |
Numeric outcome vector. |
observables |
A list of matrices or data frames, one per latent predictor. |
covariates |
Optional matrix or data frame of observed covariates. |
observables_groupings |
Optional list of grouping vectors, one per latent predictor. Defaults to the column names of each observable matrix. |
make_observables_groupings |
Logical scalar or vector passed to
|
estimation_method |
Character scalar or vector passed to
|
latent_estimation_fn |
Optional function or list of functions used when
|
ordinal |
Logical scalar or vector passed to |
mcmc_control |
List passed to |
min_split_correlation |
Minimum allowed split-half correlation. The correction is defined only for positive componentwise correlations. |
conda_env |
Character string naming the conda environment for MCMC methods. |
conda_env_required |
Logical indicating whether |
A list of class lpmec_multivariate_onerun containing
uncorrected OLS coefficients, uncorrected IV coefficients, corrected IV
coefficients, split-half correlations, first-stage diagnostics, and latent
score matrices.
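The idea of a split-indicator IV correction with several latent predictors can be sketched in base R as a just-identified IV regression, in which the second-half scores instrument for the first-half scores. This is an illustration under assumptions, not the package's implementation: standardized row means replace the latent estimator, the odd/even column split is arbitrary, and make_obs, halves, Xa, and Xb are hypothetical names.

```r
set.seed(7)
n <- 2000
z <- matrix(rnorm(n * 2), n, 2)                 # two unobserved latent predictors
Y <- as.numeric(z %*% c(0.5, -0.3) + rnorm(n))

# One binary indicator matrix per latent predictor.
make_obs <- function(latent, k = 8) {
  sapply(seq_len(k), function(j) as.integer(latent + rnorm(length(latent)) > 0))
}
observables <- list(make_obs(z[, 1]), make_obs(z[, 2]))

# Split each indicator matrix into odd and even columns and score each half.
halves <- lapply(observables, function(m) {
  odd <- seq_len(ncol(m)) %% 2 == 1
  list(a = as.numeric(scale(rowMeans(m[, odd]))),
       b = as.numeric(scale(rowMeans(m[, !odd]))))
})
Xa <- cbind(1, sapply(halves, function(h) h$a))  # regressors
Xb <- cbind(1, sapply(halves, function(h) h$b))  # instruments

ols <- solve(crossprod(Xa), crossprod(Xa, Y))      # attenuated by measurement error
iv  <- solve(crossprod(Xb, Xa), crossprod(Xb, Y))  # (Z'X)^{-1} Z'Y

# Componentwise split-half correlations; the correction needs these positive.
split_cor <- diag(cor(Xa[, -1], Xb[, -1]))
```

The coefficients are on the scale of the standardized proxy scores, so the sketch shows the direction of the correction rather than recovery of the simulated slopes.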
Implements analysis for latent variable models with measurement error correction
lpmec_onerun(
  Y,
  observables,
  observables_groupings = colnames(observables),
  make_observables_groupings = FALSE,
  estimation_method = "em",
  latent_estimation_fn = NULL,
  mcmc_control = list(backend = "pscl", n_samples_warmup = 500L, n_samples_mcmc = 1000L,
    batch_size = 512L, chain_method = "parallel", subsample_method = "full",
    n_thin_by = 1L, n_chains = 2L,
    outcome_prior = list(calibration = "data"), joint2_prior = list()),
  ordinal = FALSE,
  conda_env = "lpmec",
  conda_env_required = FALSE
)
Y |
A vector of observed outcome variables |
observables |
A matrix of observable indicators used to estimate the latent variable |
observables_groupings |
A vector specifying groupings for the observable indicators. Default is column names of observables. |
make_observables_groupings |
Logical. If TRUE, creates dummy variables for each level of the observable indicators. Default is FALSE. |
estimation_method |
Character specifying the estimation approach. Options include:
|
latent_estimation_fn |
Custom function for estimating latent trait from |
mcmc_control |
A list indicating parameter specifications if MCMC used.
|
ordinal |
Logical indicating whether the observable indicators are ordinal (TRUE) or binary (FALSE). |
conda_env |
A character string specifying the name of the conda environment to use
via |
conda_env_required |
A logical indicating whether the specified conda environment
must be strictly used. If |
This function implements a latent variable analysis with measurement error correction. It splits the observable indicators into two sets, estimates latent variables using each set, and then applies various correction methods including OLS correction and instrumental variable approaches.
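A single split can be sketched in base R to show why the correction matters. This is a simplified illustration, not the package's implementation: standardized row means replace the latent estimator, and dividing the naive slope by the split-half correlation is a textbook disattenuation that may differ from the package's corrected OLS. The variable names echo the returned fields for readability only.

```r
set.seed(123)
n <- 2000; k <- 10
x <- rnorm(n)                                   # unobserved latent trait
observables <- sapply(seq_len(k), function(j) as.integer(x + rnorm(n) > 0))
Y <- 0.5 * x + rnorm(n)

# Split the indicators into two sets and score each set separately.
half <- sample(k, k / 2)
x_est1 <- as.numeric(scale(rowMeans(observables[, half])))
x_est2 <- as.numeric(scale(rowMeans(observables[, -half])))

# Naive OLS on one noisy score is biased toward zero.
ols_coef <- unname(coef(lm(Y ~ x_est1))[2])

# IV: the second score instruments for the first, since their errors
# come from disjoint indicators.
iv_coef <- cov(Y, x_est2) / cov(x_est1, x_est2)

# Assumption: textbook disattenuation using the split-half correlation.
split_cor <- cor(x_est1, x_est2)
corrected_ols_coef <- ols_coef / split_cor
```

Both corrected slopes are larger in magnitude than the naive one because the split-half correlation is below one.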
A list containing various estimates and statistics:
Naive, IV, corrected IV, and corrected OLS estimates:
ols_*, iv_*, corrected_iv_*, and
corrected_ols_*. Split-specific estimates use suffixes
_a and _b.
var_est_split: Estimated split-half measurement-error
variance.
bayesian_ols_*_outer_normed and
bayesian_ols_*_inner_normed: MCMC coefficient summaries when an
MCMC method is used; otherwise NA.
m_stage_1_erv and m_reduced_erv: Extreme robustness
values for the first-stage and reduced-form regressions.
outcome_prior_*: Resolved outcome-prior values used by
NumPyro joint outcome models.
mcmc_joint2_*: NumPyro "mcmc_joint2" diagnostics,
including effective-sample-size percentages, maximum R-hat, divergent
transitions, mean accept probability, and orientation diagnostics.
x_est1 and x_est2: Split-half latent variable
estimates.
The following single-run standard errors and t-statistics are currently
returned as NA because their analytical derivation is not yet
implemented:
corrected_iv_se: Standard error for the corrected IV
coefficient
corrected_ols_se: Standard error for the corrected OLS coefficient
corrected_ols_tstat: T-statistic for the corrected OLS coefficient
corrected_ols_coef_alt: Alternative corrected OLS coefficient
For inference on these quantities, use the bootstrap approach via lpmec, which
provides valid confidence intervals and standard errors through resampling.
# Generate some example data
set.seed(123)
Y <- rnorm(1000)
observables <- as.data.frame(matrix(sample(c(0,1), 1000*10, replace = TRUE), ncol = 10))

# Run the analysis
results <- lpmec_onerun(Y = Y,
                        observables = observables)

# View the corrected estimates
print(results)
Creates visualizations of LPMEC model results. Can plot either the latent variable estimates or the bootstrap distribution of coefficients.
## S3 method for class 'lpmec'
plot(x, type = "latent", ...)
x |
An object of class lpmec. |
type |
Character string specifying the plot type. Either |
... |
No return value, called for side effects (creates a plot).
lpmec, summary.lpmec, print.lpmec
Creates a scatter plot comparing the two split-half latent variable estimates.
## S3 method for class 'lpmec_onerun'
plot(x, ...)
x |
An object of class lpmec_onerun. |
... |
Additional arguments passed to |
No return value, called for side effects (creates a plot).
lpmec_onerun, summary.lpmec_onerun, print.lpmec_onerun
Prints a concise summary of bootstrapped LPMEC model results.
## S3 method for class 'lpmec'
print(x, ...)
x |
An object of class lpmec. |
... |
Additional arguments (currently unused). |
The input object x, returned invisibly.
lpmec, summary.lpmec, plot.lpmec
Print method for lpmec_multivariate objects
## S3 method for class 'lpmec_multivariate'
print(x, ...)
x |
An object of class lpmec_multivariate. |
... |
Additional arguments (currently unused). |
The input object x, returned invisibly.
Print method for lpmec_multivariate_onerun objects
## S3 method for class 'lpmec_multivariate_onerun'
print(x, ...)
x |
An object of class lpmec_multivariate_onerun. |
... |
Additional arguments (currently unused). |
The input object x, returned invisibly.
Prints a concise summary of single-run LPMEC model results.
## S3 method for class 'lpmec_onerun'
print(x, ...)
x |
An object of class lpmec_onerun. |
... |
Additional arguments (currently unused). |
The input object x, returned invisibly.
lpmec_onerun, summary.lpmec_onerun, plot.lpmec_onerun
Provides a comprehensive summary of bootstrapped LPMEC model results including OLS, IV, corrected, and Bayesian coefficient estimates with confidence intervals.
## S3 method for class 'lpmec'
summary(object, ...)
object |
An object of class lpmec. |
... |
Additional arguments (currently unused). |
A data frame containing coefficient estimates, standard errors, and confidence intervals, returned invisibly. The data frame has rows for OLS, IV, Corrected IV, Corrected OLS, and Bayesian OLS estimates.
lpmec, print.lpmec, plot.lpmec
Summary method for lpmec_multivariate objects
## S3 method for class 'lpmec_multivariate'
summary(object, ...)
object |
An object of class lpmec_multivariate. |
... |
Additional arguments (currently unused). |
A data frame of aggregated latent-predictor coefficient estimates.
Summary method for lpmec_multivariate_onerun objects
## S3 method for class 'lpmec_multivariate_onerun'
summary(object, ...)
object |
An object of class lpmec_multivariate_onerun. |
... |
Additional arguments (currently unused). |
A data frame of latent-predictor coefficient estimates.
Provides a summary of single-run LPMEC model results including OLS, IV, and corrected coefficient estimates.
## S3 method for class 'lpmec_onerun'
summary(object, ...)
object |
An object of class lpmec_onerun. |
... |
Additional arguments (currently unused). |
A data frame containing coefficient estimates and standard errors, returned invisibly. The data frame has rows for OLS, IV, Corrected IV, and Corrected OLS estimates.
lpmec_onerun, print.lpmec_onerun, plot.lpmec_onerun