I’m working on a project to replicate a complex CFA model from a psychology paper. The model has 9 observed variables linked to 3 latent variables. I’ve tried two approaches in lavaan but both have issues.
Here’s a simplified version of what I’m trying to do:
# Load lavaan and create mock data (seed set so results are reproducible)
library(lavaan)
set.seed(123)
data <- data.frame(
  task1 = rnorm(100),
  task2 = rnorm(100),
  task3 = rnorm(100),
  task4 = rnorm(100),
  task5 = rnorm(100)
)
# Attempt 1: Using cfa()
model1 <- 'factor1 =~ task1 + task2 + task3
factor2 =~ task3 + task4 + task5'
result1 <- cfa(model1, data = data)
# Attempt 2: Using sem()
model2 <- '
level: 1
factor1 =~ task1 + task2 + task3
level: 2
factor2 =~ task3 + task4 + task5'
result2 <- sem(model2, data = data)
The first attempt gives errors about standard errors, while the second asks for a ‘cluster’ argument I don’t understand.
Can someone explain:
- Which approach is better for this type of model?
- How to fix the standard error issue in the first attempt?
- What’s the deal with the ‘cluster’ argument in the second attempt?
- Is there a better way to set up this kind of model in lavaan?
Thanks for any help!
Hey Ava, I've used lavaan before, and I think your cfa() approach is the right one here. The standard-error issues are probably a mock-data problem: independent rnorm() columns have no factor structure to estimate, so try real data or more observations. The cluster argument is for multilevel analysis, which you don't need. Stick with cfa() and adjust your specs.
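If you want mock data that actually has a two-factor structure, here's a quick base-R sketch (the loadings and the 0.5 factor correlation are made-up numbers, just for illustration):

```r
set.seed(42)
n <- 300
f1 <- rnorm(n)                                   # latent factor 1 scores
f2 <- 0.5 * f1 + rnorm(n, sd = sqrt(1 - 0.25))   # factor 2, correlated 0.5 with factor 1
data <- data.frame(
  task1 = 0.7 * f1 + rnorm(n, sd = 0.5),
  task2 = 0.7 * f1 + rnorm(n, sd = 0.5),
  task3 = 0.5 * f1 + 0.5 * f2 + rnorm(n, sd = 0.5),  # the cross-loading item
  task4 = 0.7 * f2 + rnorm(n, sd = 0.5),
  task5 = 0.7 * f2 + rnorm(n, sd = 0.5)
)
```

With data like this, cfa() has actual covariances to work with and the standard errors should come out fine.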
I’ve worked extensively with lavaan for CFA models, and I can offer some insights. For your case, the cfa() approach is indeed more suitable. The standard-error issues you’re encountering are almost certainly due to the simulated data: independent rnorm() columns have near-zero covariances, so there is no factor structure to estimate and the model is empirically underidentified. Try real data, or simulate data with the factor structure built in, and aim for at least 200-300 observations.
Regarding the sem() function, it’s more suited for complex structural equation models. The ‘cluster’ argument is used for multilevel or clustered data structures, which doesn’t seem applicable here.
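If you ever do have nested data, the level: syntax expects a grouping column in your data frame, passed by name via cluster. A hedged sketch, assuming a hypothetical school column (the model and variable names are illustrative):

```r
# Hypothetical multilevel CFA: the same indicators measured within clusters
# (e.g. students nested in schools); 'school' is an assumed column name.
ml_model <- '
level: 1
  factor1_w =~ task1 + task2 + task3
level: 2
  factor1_b =~ task1 + task2 + task3
'
# fit <- sem(ml_model, data = data, cluster = "school")
```

Without a clustering variable the level: blocks have nothing to group by, which is exactly why lavaan asks you for a cluster argument.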
To improve your model, consider writing out the factor covariance explicitly so the specification is self-documenting. Also make sure the model is identified: by default cfa() already fixes the first loading of each factor to 1 (auto.fix.first = TRUE), but spelling out the constraint makes your intent clear. Here's a refined approach:
model <- '
  # Factor definitions (first loading fixed to 1 for identification)
  factor1 =~ 1*task1 + task2 + task3
  factor2 =~ 1*task3 + task4 + task5
  # Factor covariance
  factor1 ~~ factor2
'
result <- cfa(model, data = data)
This should provide a more stable estimation. Remember to check model fit indices and modification indices for potential improvements.
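As a concrete version of that fit check, here's a sketch using lavaan's bundled HolzingerSwineford1939 dataset (a stand-in, since random mock data won't fit cleanly):

```r
library(lavaan)

# Classic three-factor CFA on lavaan's built-in dataset
hs_model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'
fit <- cfa(hs_model, data = HolzingerSwineford1939)

# Global fit indices
fitMeasures(fit, c("cfi", "tli", "rmsea", "srmr"))

# Largest modification indices point to where the model misfits
head(modindices(fit, sort. = TRUE), 5)
```

The same two calls, fitMeasures() and modindices(), work on your own fitted object once it converges.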
Hey there Ava_Books! 
I’ve dabbled with lavaan a bit, and your question got me really curious. Have you considered the alternative identification approach where you free all the loadings and fix the factor variances to 1 instead (this is what std.lv = TRUE does for you automatically)? It might be a good fit for what you’re trying to do!
Something like this could work:
model <- '
  # Define factors (all loadings freely estimated)
  factor1 =~ NA*task1 + task2 + task3
  factor2 =~ NA*task3 + task4 + task5
  # Set factor variances to 1 for identification
  factor1 ~~ 1*factor1
  factor2 ~~ 1*factor2
  # Allow factors to correlate
  factor1 ~~ factor2
'
result <- cfa(model, data = data)
What do you think? Fixing the factor variances to 1 identifies the model without singling out a marker indicator, so every loading gets estimated freely, including the task3 cross-loading on both factors.
Have you tried playing around with the sample size? Sometimes that can help with those pesky standard error issues.
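If you want to see the sample-size effect for yourself, lavaan's simulateData() can generate data from a population model with known loadings (the values below are made up for illustration):

```r
library(lavaan)
set.seed(1)

# Population model with a real factor structure (loadings are illustrative)
pop_model <- '
  factor1 =~ 0.7*task1 + 0.7*task2 + 0.6*task3
  factor2 =~ 0.6*task3 + 0.7*task4 + 0.7*task5
  factor1 ~~ 0.4*factor2
'
# The analysis model leaves all those parameters free
fit_model <- '
  factor1 =~ task1 + task2 + task3
  factor2 =~ task3 + task4 + task5
'
small <- cfa(fit_model, data = simulateData(pop_model, sample.nobs = 100))
large <- cfa(fit_model, data = simulateData(pop_model, sample.nobs = 1000))

# Standard errors shrink roughly with sqrt(n)
mean(parameterEstimates(small)$se, na.rm = TRUE)
mean(parameterEstimates(large)$se, na.rm = TRUE)
```

Comparing the two averages makes it obvious how much of the "standard error issue" is just n being too small for the model.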
Oh, and I’m super curious - what’s the psychology paper you’re replicating? Sounds fascinating! 