I’m trying to perform a multi-group CFA using R’s lavaan package. I have multiple categorical variables, each with 11 categories, which should correspond to 10 thresholds. However, the results are odd because the tenth threshold is lower than the ninth, so they don’t follow an increasing sequence. This issue appears consistently across several variables.
Here’s a modified version of my R code:
model_spec <- '
  # free all loadings (NA*); identify factors by fixing variances to 1
  factor1 =~ NA*varA + varB + varC + varD + varE + varF
  factor2 =~ NA*varG + varH + varI
  factor1 ~~ 1*factor1
  factor2 ~~ 1*factor2
  # residual correlations
  varB ~~ varC
  varE ~~ varF
'
result <- cfa(model_spec, data = my_data, ordered = cat_vars, estimator = "WLSMV")
summary(result, fit.measures = TRUE, standardized = TRUE, modindices = TRUE)
Does anyone know why the thresholds are not in the expected order?
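To show what I mean, here is a small base-R check (the helper and the estimates are made up for illustration) that flags where a vector of one item's threshold estimates stops increasing:

```r
# Hypothetical helper: return positions i where threshold i+1 <= threshold i.
# In a well-behaved solution this should be integer(0) for every item.
check_thresholds <- function(th) which(diff(th) <= 0)

# Made-up estimates for one 11-category item (10 thresholds);
# the tenth dips below the ninth, as in my output.
th_example <- c(-2.1, -1.6, -1.2, -0.8, -0.4, 0.1, 0.6, 1.1, 1.9, 1.7)
check_thresholds(th_example)
```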
I’ve encountered similar threshold issues in categorical CFA. One potential cause is sparseness in the higher categories: with 11 categories, some may have very few observations, which leads to unstable threshold estimates. It might help to collapse categories, or to try a different parameterization (in lavaan, parameterization = "theta" instead of the default "delta") to see if that stabilizes the estimates.
Additionally, examining the polychoric correlation matrix for irregularities (e.g., via lavaan's lavCor() with the ordered argument) might offer further insight. If these approaches don’t resolve the issue, alternatives such as bootstrapping or a Bayesian framework might be necessary.
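For the collapsing idea, a minimal base-R sketch (the items, coding 1..11, and the cap of 9 are all illustrative; adjust to your data):

```r
# Merge sparse top categories: cap responses at 9, so original
# categories 9, 10, and 11 become a single category (9 categories total).
collapse_top <- function(x, cap = 9) pmin(x, cap)

# Toy example for one item; apply via lapply() to your cat_vars columns.
x <- c(1, 5, 9, 10, 11)
collapse_top(x)
```

After collapsing, refit with the recoded columns in ordered =; with 9 categories lavaan estimates 8 thresholds per item.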
Hey Oliver63, interesting problem you’ve got there! 
Have you considered the possibility of local dependencies between your items? Sometimes when items are too closely related, it can mess with the threshold estimates in ways you wouldn’t expect.
What about the sample size? With 11 categories per item, you need a pretty hefty sample to get stable estimates. If your sample’s on the smaller side, that could explain the wonky thresholds.
Have you tried plotting the response distributions for each item? Might give you some visual clues about what’s going on with those higher categories.
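Something like this (base R, with simulated data standing in for one of your items) makes sparse top categories easy to spot:

```r
# Toy 11-category item with deliberately thin top categories
set.seed(42)
item <- sample(1:11, size = 300, replace = TRUE,
               prob = c(rep(0.1, 8), 0.15, 0.04, 0.01))

# Counts per category -- look for near-empty high categories
counts <- table(factor(item, levels = 1:11))
barplot(counts, main = "Response distribution", xlab = "Category")
```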
Oh, and just a thought - have you checked if the model identification is solid? Sometimes tweaking the factor variance constraints or adding more items can help stabilize things.
Keep us posted on what you find out! This kind of puzzle is what makes psychometrics fun (and sometimes frustrating).
Hey Oliver, that’s a tricky one! I’ve seen similar issues with lavaan before. Have you checked whether your data has any weird distributions or outliers? Sometimes that can mess with threshold estimates. It might also be worth trying a different estimator, like "ULSMV", to see if it helps. Happy troubleshooting!