I’m stuck on a multi-group CFA for a psych tool using Likert-scale data. I converted the items to ordered factors and used WLSMV estimation with theta parameterization. Here’s a snippet of my code:
model_psych <- '
mood =~ q1_ord + q2_ord + q3_ord + q4_ord + q5_ord + q6_ord
worry =~ q7_ord + q8_ord + q9_ord + q10_ord + q11_ord + q12_ord
physical =~ q13_ord + q14_ord + q15_ord + q16_ord + q17_ord + q18_ord
mood ~~ worry
worry ~~ physical
mood ~~ physical
'
fit_psych <- cfa(model_psych, data=survey_data, estimator='WLSMV', std.lv=TRUE, parameterization='theta')
The initial CFA works, but I get a warning about the variance-covariance matrix. The real problem starts when I try to do group analysis:
group_cfa <- cfa(model_psych, data=survey_data, estimator='WLSMV', std.lv=TRUE, group='gender', parameterization='theta')
It errors out, saying some categories are empty for a variable in one group. Oddly, a different grouping variable produces the same message, yet the model still runs. Any ideas why this happens or how to fix it? I’m really confused by the inconsistent behavior.
Ordinal data in multi-group CFA often cause trouble when some response categories are sparsely populated. WLSMV estimates a threshold for every category boundary separately within each group, so a category that is entirely empty in one group has no estimable threshold, which is exactly the error you are seeing. (Whether you get a hard error or just a warning can depend on which variable and group are affected, which explains the inconsistent behavior across grouping variables.) Start by examining the item-by-group response distributions, and consider merging infrequent categories where that does not sacrifice much conceptual meaning. Alternatively, you might try robust maximum likelihood (MLR), which treats the items as continuous and sidesteps per-group threshold estimation. Finally, make sure the sample size within each group is large enough to populate all categories reliably.
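If you want to try the MLR route, here is a minimal sketch. It assumes `survey_data`, `model_psych`, and the `_ord` item names from your question, and that the ordered factors can be meaningfully coerced to 1..k numeric scores (defensible with 5+ categories, debatable with fewer):

```r
library(lavaan)

# Assumption: the 18 items are named q1_ord ... q18_ord as in your model
item_names <- paste0("q", 1:18, "_ord")

survey_num <- survey_data
# ordered factor -> underlying integer codes (1..k), treating items as continuous
survey_num[item_names] <- lapply(survey_num[item_names], as.numeric)

fit_mlr <- cfa(model_psych, data = survey_num,
               estimator = "MLR",  # robust ML: no per-group thresholds,
                                   # so empty categories are no longer fatal
               std.lv = TRUE,
               group = "gender")
summary(fit_mlr, fit.measures = TRUE)
```

Note the trade-off: MLR ignores the ordinal measurement level, so it is a pragmatic fallback rather than a replacement for the WLSMV model once the sparse-category problem is resolved.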
hey creativechef15, I've had similar issues w/ Likert data. have u tried checking the response distributions for each group? sometimes collapsing categories can help. also, maybe try a different estimator like MLR? it might handle sparse data better. good luck w/ ur CFA!
Hey CreativeChef15, your CFA situation sounds pretty tricky!
I’ve run into similar headaches with Likert data before. Have you thought about checking the frequency tables for each item, split by gender? Sometimes there are sneaky empty cells hiding in there that throw everything off.
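Something like this will flag those sneaky empty cells for you. It assumes `survey_data` and the `gender` column from your question, and that the items are ordered factors (so `table()` keeps unused levels visible as zeros):

```r
# Item-by-group frequency tables, printing only items with empty cells.
# Assumes items q1_ord ... q18_ord as in your model.
item_names <- paste0("q", 1:18, "_ord")

for (item in item_names) {
  tab <- table(survey_data[[item]], survey_data$gender)
  if (any(tab == 0)) {
    cat("Empty cell(s) in", item, ":\n")
    print(tab)
  }
}
```

Any item printed here is a candidate for the WLSMV error, since a zero cell means that category is unobserved in that group.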
What if you tried collapsing some of the response categories? Like maybe combining the extreme ends if they’re rarely used? It might sacrifice a tiny bit of nuance, but could solve your empty category problem.
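For the collapsing idea, here's one way to merge the extremes of a 5-point item (1 with 2, and 5 with 4), keeping the result an ordered factor so lavaan still treats it as ordinal. `q1_ord` is just an example name from your model; only apply this to items that actually have empty cells, and use the same recoding in every group so the categories stay comparable:

```r
# Collapse sparse extreme categories: 1..5 -> three ordered levels.
collapse_ends <- function(x) {
  # Assumption: levels are coded so the integer codes run 1..5 in order
  y <- as.numeric(x)   # ordered factor -> underlying integer codes
  y[y == 1] <- 2       # merge lowest category into its neighbor
  y[y == 5] <- 4       # merge highest category into its neighbor
  ordered(y)           # back to an ordered factor (3 levels)
}

survey_data$q1_ord <- collapse_ends(survey_data$q1_ord)
table(survey_data$q1_ord, survey_data$gender)  # verify no empty cells remain
```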
Oh, and I’m super curious about that inconsistent behavior you mentioned with different grouping variables. Any chance you could share which other variable you tried? It might give us a clue about what’s going on under the hood.
Hang in there! CFAs can be a real pain, but I bet we can figure this out if we put our heads together. Let me know if you want to bounce around any other ideas!