Difference between CFA and SEM functions in R's lavaan package

I’m confused about the cfa() and sem() functions in R’s lavaan package. I checked their help pages and couldn’t find any differences. When I run the same model with both functions, I get identical results. Here’s a quick example:

library(lavaan)

model <- '
  ind60 =~ x1 + x2 + x3
  dem60 =~ y1 + a*y2 + b*y3 + c*y4
  dem65 =~ y5 + a*y6 + b*y7 + c*y8

  dem60 ~ ind60
  dem65 ~ ind60 + dem60

  y1 ~~ y5
  y2 ~~ y4 + y6
  y3 ~~ y7
  y4 ~~ y8
  y6 ~~ y8
'

sem_fit <- sem(model, data = PoliticalDemocracy)
cfa_fit <- cfa(model, data = PoliticalDemocracy)

# Both calls return the same estimates
parameterEstimates(sem_fit)
parameterEstimates(cfa_fit)

Can someone explain why there are two separate functions if they do the same thing? Is there any hidden difference I’m missing?

Hey there, LeoNinja22! :wave:

That’s a great question you’ve got there. I’ve actually been wondering about this myself!

You’re right that cfa() and sem() give identical results here, and as far as I can tell that is the norm rather than a coincidence. The difference lies in their intended use, not in what they compute.

Both are convenience wrappers around the lower-level lavaan() function, and in current lavaan releases they switch on the same set of defaults (for example, fixing the first loading of each factor and adding residual variances automatically). The lavaan tutorial itself describes the two as almost identical for now, while noting that this may change in the future. So cfa() is less a specialized tool than a differently labeled one: the name signals that you are fitting a measurement model, while sem() signals a model with a structural part.

In your example, you’ve got both measurement parts (the factor loadings) and structural parts (the regressions between latent variables). Because the two functions apply the same defaults to the same model syntax, they produce the same estimates. Strictly speaking your model is a full SEM, so sem() is the more descriptive choice.
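One way to look at the behind-the-scenes part is to spell out the defaults by hand with lavaan() and compare. This is only a sketch: cfa_model is a cut-down, measurement-only piece of the model from the question, and the two auto.* arguments are, as far as I know, the only defaults that matter for a model this simple.

```r
library(lavaan)

# Measurement-only model: one factor with three indicators
# (a cut-down piece of the model in the question, used for illustration)
cfa_model <- 'ind60 =~ x1 + x2 + x3'

fit_cfa <- cfa(cfa_model, data = PoliticalDemocracy)
fit_sem <- sem(cfa_model, data = PoliticalDemocracy)

# Bare lavaan() call with the relevant defaults switched on by hand
# (assumption: these two are the ones that affect this particular model)
fit_manual <- lavaan(cfa_model, data = PoliticalDemocracy,
                     auto.fix.first = TRUE,  # fix the first loading to 1
                     auto.var = TRUE)        # add variances automatically

# Compare the free parameter estimates across the three fits
all.equal(coef(fit_cfa), coef(fit_sem))
all.equal(coef(fit_cfa), coef(fit_manual))
```

If both comparisons come back TRUE on your installation, the wrappers are doing nothing beyond setting defaults for this model.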

But here’s where it gets interesting – have you tried using cfa() with a model that has only regressions between observed variables? I’m curious if it would work or throw an error. Maybe that’s where we’d see a difference?
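A quick way to check is sketched below. reg_model is a made-up regression on the same dataset, with observed variables only and no latent factors; the variable choice is arbitrary and just for illustration.

```r
library(lavaan)

# Hypothetical regression-only model: observed variables, no latent factors
reg_model <- 'y1 ~ x1 + x2'

# Fit the same syntax through both wrappers
reg_cfa <- cfa(reg_model, data = PoliticalDemocracy)
reg_sem <- sem(reg_model, data = PoliticalDemocracy)

# Compare the free parameter estimates, then inspect the cfa() fit
all.equal(coef(reg_cfa), coef(reg_sem))
parameterEstimates(reg_cfa)
```

As far as I know, cfa() does not check whether the model actually contains a factor, so I would expect both calls to run and agree.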

Also, I wonder if there are any performance differences between the two when dealing with really large, complex models. Have you noticed anything like that in your work?

What kind of models are you usually working with? I’d love to hear more about your research!

yo, LeoNinja22! i’ve used both funcs and noticed something: they’re basically the same function under the hood. both are thin wrappers around lavaan() with the same defaults, so the same model syntax gives the same fit. even a pure regression model goes through cfa() fine, since the name doesn’t restrict what the model can contain. the names mostly document what kind of model u mean: cfa() for measurement-only models, sem() when there’s a structural part. hope that helps clarify things!

I’ve run into this confusion before in my own analyses. The distinction lies in intended use rather than in capability: both cfa() and sem() are wrappers around lavaan() that apply the same defaults, so neither handles a wider range of models than the other.

In your example, the model includes both measurement components (factor loadings) and structural relationships (regressions between latent variables). Both functions parse the same model syntax, which is why they return the same estimates.

A pure regression model also runs through cfa() without an error, which shows that the name does not restrict what the function will fit.

I would not expect differences in optimization or computational efficiency either, since the same defaults lead to the same estimation settings. It is still good practice to use cfa() for pure measurement models and sem() for models with a structural part: the call then documents your intent, and your code stays correct if the two functions diverge in a future release.
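If you want to see whether the two calls differ in any setting on your installed version, you can compare the options each fit ended up with. This is a rough sketch: small_model is a shortened, hypothetical version of the model in the question. I would expect at most a model-type label to differ, but that is an assumption to verify rather than a documented guarantee.

```r
library(lavaan)

# Shortened version of the model from the question (for illustration only)
small_model <- '
  ind60 =~ x1 + x2 + x3
  dem60 =~ y1 + y2 + y3 + y4
  dem60 ~ ind60
'

fit_sem <- sem(small_model, data = PoliticalDemocracy)
fit_cfa <- cfa(small_model, data = PoliticalDemocracy)

# Extract the full list of options stored in each fitted object
opts_sem <- lavInspect(fit_sem, "options")
opts_cfa <- lavInspect(fit_cfa, "options")

# Flag every option present in both lists whose value is not identical
common  <- intersect(names(opts_sem), names(opts_cfa))
differs <- vapply(common,
                  function(nm) !identical(opts_sem[[nm]], opts_cfa[[nm]]),
                  logical(1))

# Names of the options that differ between the two fits
names(differs)[differs]

# The estimates themselves should agree
all.equal(coef(fit_sem), coef(fit_cfa))
```

Any option that shows up in the output is a place where the two wrappers behave differently in your version.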