Handling nominal exogenous variables in lavaan for SEM/CFA analysis

I’m working on a measurement model using lavaan. My model includes latent exogenous variables, a non-latent endogenous variable, and two nominal control variables: gender (GENDER) and family business background (FAM_BIZ).

GENDER has values ‘F’ and ‘M’, while FAM_BIZ has ‘no_biz’ and ‘has_biz’. Both are factors. Here’s my code:

model_spec <- '
  PERC =~ P1 + P2 + P3 + P4
  DRIVE =~ D1 + D2 + D3 + D4 + D5
  EFFORT =~ E1 + E2 + E3
  CORE =~ 1*PERC + 1*DRIVE + 1*EFFORT
  CORE ~~ CORE
  
  ATTG =~ A1 + A3 + A4 + A5
  ATTG ~~ ATTG
  
  SELFG =~ S1 + S2 + S3
  SELFG ~~ SELFG
  
  ATT2 =~ 1*A2
  ATT2 ~~ ATT2
  
  Gen =~ GENDER
  FamB =~ FAM_BIZ
'

fit <- cfa(model_spec, data = mydata)
summary(fit, fit.measures = TRUE, standardized = TRUE)

When I run this, I get warnings about not being able to compute standard errors and the latent variable covariance matrix not being positive definite. The model works fine if I remove GENDER and FAM_BIZ.

What might be causing this issue? Would it help if I converted GENDER and FAM_BIZ to numeric values (using 0/1 and 1/2)? Any suggestions are welcome!

Hey there DancingButterfly! :wave:

Ooh, I love a good SEM puzzle! Your model looks super interesting. Have you considered treating GENDER and FAM_BIZ as covariates instead of latent variables? That might help smooth things out.

Something like this could work:

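model_spec <- '
  # Your existing latent variable definitions,
  # minus the Gen =~ GENDER and FamB =~ FAM_BIZ lines

  # Add these lines (GENDER and FAM_BIZ recoded as numeric 0/1 in mydata)
  CORE ~ GENDER + FAM_BIZ
  ATTG ~ GENDER + FAM_BIZ
  SELFG ~ GENDER + FAM_BIZ
  ATT2 ~ GENDER + FAM_BIZ
'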

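This way, you’re controlling for their effects on your main constructs without trying to model them as latent variables. It’s usually a safer bet with categorical predictors. One thing to watch: the lavaan docs recommend recoding a binary exogenous covariate as a numeric 0/1 dummy before fitting, just like in an ordinary regression, so convert the two factors first rather than passing them in as factors.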
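Here is a minimal sketch of that setup, assuming the two factors are first recoded as numeric 0/1 columns. The names mydata_demo, GENDER_M, FAM_BIZ_HAS, measurement_part, structural_part, and model_cov are made up for illustration, and the toy rows are not real data:

```r
# Toy stand-in for the poster's data frame (hypothetical values).
mydata_demo <- data.frame(
  GENDER  = factor(c("F", "M", "M", "F", "M")),
  FAM_BIZ = factor(c("no_biz", "has_biz", "no_biz", "no_biz", "has_biz"))
)

# Recode each two-level factor as a numeric 0/1 dummy.
mydata_demo$GENDER_M    <- as.numeric(mydata_demo$GENDER == "M")         # 1 = M, 0 = F
mydata_demo$FAM_BIZ_HAS <- as.numeric(mydata_demo$FAM_BIZ == "has_biz")  # 1 = has_biz

# Measurement part from the question, without Gen =~ GENDER and FamB =~ FAM_BIZ.
# Variance lines such as CORE ~~ CORE are left out; sem() adds them by default.
measurement_part <- '
  PERC   =~ P1 + P2 + P3 + P4
  DRIVE  =~ D1 + D2 + D3 + D4 + D5
  EFFORT =~ E1 + E2 + E3
  CORE   =~ 1*PERC + 1*DRIVE + 1*EFFORT
  ATTG   =~ A1 + A3 + A4 + A5
  SELFG  =~ S1 + S2 + S3
  ATT2   =~ 1*A2
'

# The controls enter as observed predictors of the constructs.
structural_part <- '
  CORE  ~ GENDER_M + FAM_BIZ_HAS
  ATTG  ~ GENDER_M + FAM_BIZ_HAS
  SELFG ~ GENDER_M + FAM_BIZ_HAS
  ATT2  ~ GENDER_M + FAM_BIZ_HAS
'

model_cov <- paste(measurement_part, structural_part, sep = "\n")

# With the real data, after adding the same dummy columns to mydata:
# fit <- lavaan::sem(model_cov, data = mydata)
# summary(fit, fit.measures = TRUE, standardized = TRUE)
```

Keeping the measurement and regression parts in separate strings makes it easy to refit with and without the controls and compare. Whether the full model converges on the real data is something only the actual fit can tell.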

Oh, and have you checked for any multicollinearity between your variables? Very high correlations among the latent variables are one common reason for a "not positive definite" warning.
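A rough way to check this, sketched on a made-up correlation matrix (the numbers in lv_cor are hypothetical, as is the 0.85 cutoff). With a fitted model, the matrix would come from lavaan::lavInspect(fit, "cor.lv") instead:

```r
# Hypothetical latent correlation matrix, for illustration only.
lv_names <- c("PERC", "DRIVE", "EFFORT")
lv_cor <- matrix(
  c(1.0, 0.9, 0.9,
    0.9, 1.0, 0.3,
    0.9, 0.3, 1.0),
  nrow = 3, byrow = TRUE,
  dimnames = list(lv_names, lv_names)
)

# A covariance or correlation matrix is positive definite
# only if all of its eigenvalues are strictly positive.
ev <- eigen(lv_cor, symmetric = TRUE, only.values = TRUE)$values
is_pd <- all(ev > 1e-8)

# List the pairs whose absolute correlation exceeds the (arbitrary) cutoff.
high <- which(abs(lv_cor) > 0.85 & upper.tri(lv_cor), arr.ind = TRUE)
high_pairs <- data.frame(
  var1 = lv_names[high[, "row"]],
  var2 = lv_names[high[, "col"]],
  r    = lv_cor[high]
)

print(ev)
print(is_pd)
print(high_pairs)
```

If the smallest eigenvalue is negative or close to zero, the flagged pairs are the first place to look.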

What do you think? I’m super curious to hear how it goes if you try this approach! :blush:

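I’ve faced similar challenges with nominal variables in SEM analyses. A nominal variable is a single observed indicator with no measurement model behind it, so wrapping it in a latent construct adds nothing and can create estimation and interpretation problems. In my experience, it’s more effective to treat them as observed predictors. Converting them to numeric form with -0.5/0.5 coding rather than simple 0/1 dummy coding has been beneficial. This puts zero halfway between the two groups, so intercepts refer to the midpoint of the two group means (which equals the overall mean only when the groups are the same size), and in my experience that often makes estimation more stable. Note that the recoding has to be done in the data frame before fitting; lavaan’s := operator defines new parameters from labeled parameters and cannot transform variables. Here is an example modification for your model:

# Recode in the data frame first, not inside the model string
mydata$GENDER_EC  <- ifelse(mydata$GENDER == "F", -0.5, 0.5)
mydata$FAM_BIZ_EC <- ifelse(mydata$FAM_BIZ == "no_biz", -0.5, 0.5)

model_spec <- '
  # Your latent variable definitions
  # (without the Gen =~ GENDER and FamB =~ FAM_BIZ lines)

  CORE ~ GENDER_EC + FAM_BIZ_EC
'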

This strategy has worked well for me and might resolve the issues you’re encountering.
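To see what the choice of coding actually changes, here is a small sketch with plain lm() on made-up numbers (y, g, and the coding column names are hypothetical). It suggests that 0/1, 1/2, and -0.5/0.5 coding all give the same slope and differ only in what the intercept means:

```r
# Toy data: group F has mean 3, group M has mean 7 (unbalanced on purpose).
y <- c(2, 4, 5, 7, 9)
g <- factor(c("F", "F", "M", "M", "M"))

d01 <- as.numeric(g == "M")  # 0/1 dummy coding
d12 <- d01 + 1               # 1/2 coding
ec  <- d01 - 0.5             # -0.5/0.5 coding

b01 <- unname(coef(lm(y ~ d01)))
b12 <- unname(coef(lm(y ~ d12)))
bec <- unname(coef(lm(y ~ ec)))

# Each row is one coding scheme: intercept, then slope.
print(rbind(dummy_01 = b01, coded_12 = b12, effect_pm05 = bec))

# The slope is the M minus F mean difference under every coding.
# The intercept is the F mean under 0/1 coding, and the midpoint of the two
# group means under -0.5/0.5 coding, which differs from mean(y) here
# because the groups are unequal in size.
print(mean(y))
```

So for a two-level control, the coding choice is mostly about interpretation of the intercept, not about the estimated group difference.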

hey DancingButterfly, nominal variables can be tricky in SEM. i’d suggest treating GENDER and FAM_BIZ as observed variables instead of latent. try this:

CORE ~ GENDER + FAM_BIZ

Also, consider dummy coding them (0/1) for easier interpretation. hope this helps! let me know if you need more info.