I’m working on a measurement model using lavaan. My model includes latent exogenous variables, a non-latent endogenous variable, and two nominal control variables: gender (GENDER) and family business background (FAM_BIZ).
GENDER has values ‘F’ and ‘M’, while FAM_BIZ has ‘no_biz’ and ‘has_biz’. Both are factors. Here’s my code:
model_spec <- '
PERC =~ P1 + P2 + P3 + P4
DRIVE =~ D1 + D2 + D3 + D4 + D5
EFFORT =~ E1 + E2 + E3
CORE =~ 1*PERC + 1*DRIVE + 1*EFFORT
CORE ~~ CORE
ATTG =~ A1 + A3 + A4 + A5
ATTG ~~ ATTG
SELFG =~ S1 + S2 + S3
SELFG ~~ SELFG
ATT2 =~ 1*A2
ATT2 ~~ ATT2
Gen =~ GENDER
FamB =~ FAM_BIZ
'
fit <- cfa(model_spec, data = mydata)
summary(fit, fit.measures = TRUE, standardized = TRUE)
When I run this, I get warnings about not being able to compute standard errors and the latent variable covariance matrix not being positive definite. The model works fine if I remove GENDER and FAM_BIZ.
What might be causing this issue? Would it help if I converted GENDER and FAM_BIZ to numeric values (using 0/1 and 1/2)? Any suggestions are welcome!
Hey there DancingButterfly! 
Ooh, I love a good SEM puzzle! Your model looks super interesting. Have you considered treating GENDER and FAM_BIZ as covariates instead of latent variables? That might help smooth things out.
Something like this could work:
model_spec <- '
# Your existing latent variable definitions
# Add these lines
CORE ~ GENDER + FAM_BIZ
ATTG ~ GENDER + FAM_BIZ
SELFG ~ GENDER + FAM_BIZ
ATT2 ~ GENDER + FAM_BIZ
'
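This way, you’re controlling for their effects on your main constructs without trying to model them as latent variables. It’s usually a safer bet with categorical predictors. One catch: since both are stored as factors, you’ll want to recode them to numeric 0/1 columns first, because lavaan expects numeric (or ordered) variables and complains about unordered factors.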
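As a rough sketch of the covariate approach, the snippet below recodes the two factors to 0/1 and appends the regressions to the measurement model from the question (with the Gen and FamB lines dropped). The toy data frame and the column names GENDER_D and FAM_BIZ_D are assumptions made for illustration; the fit call is left commented out because the toy data has no indicator columns.

```r
# Toy stand-in for the asker's data: only the two control columns are shown.
mydata <- data.frame(
  GENDER  = factor(c("F", "M", "M", "F")),
  FAM_BIZ = factor(c("no_biz", "has_biz", "no_biz", "has_biz"))
)

# Numeric 0/1 dummies (hypothetical column names): 1 = "M", 1 = "has_biz".
mydata$GENDER_D  <- as.numeric(mydata$GENDER == "M")
mydata$FAM_BIZ_D <- as.numeric(mydata$FAM_BIZ == "has_biz")

# Measurement part copied from the question, minus Gen/FamB,
# plus regressions of each construct on the observed dummies.
model_spec <- '
PERC =~ P1 + P2 + P3 + P4
DRIVE =~ D1 + D2 + D3 + D4 + D5
EFFORT =~ E1 + E2 + E3
CORE =~ 1*PERC + 1*DRIVE + 1*EFFORT
CORE ~~ CORE
ATTG =~ A1 + A3 + A4 + A5
ATTG ~~ ATTG
SELFG =~ S1 + S2 + S3
SELFG ~~ SELFG
ATT2 =~ 1*A2
ATT2 ~~ ATT2
CORE ~ GENDER_D + FAM_BIZ_D
ATTG ~ GENDER_D + FAM_BIZ_D
SELFG ~ GENDER_D + FAM_BIZ_D
ATT2 ~ GENDER_D + FAM_BIZ_D
'

# With the real data (all indicator columns present) the fit would be:
# fit <- lavaan::sem(model_spec, data = mydata)
# summary(fit, fit.measures = TRUE, standardized = TRUE)
```

Keeping the dummies as separate columns leaves the original factors untouched, so they stay available for descriptive tables or multi-group checks.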
Oh, and have you checked for any multicollinearity between your variables? Sometimes that can cause funky issues with model estimation.
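One way to look for that kind of overlap is to scan a correlation matrix for very high pairs and check its smallest eigenvalue, since a value at or below zero is what "not positive definite" means. The matrix below is made up for illustration; with a fitted model, the latent covariance matrix could be pulled with lavInspect(fit, "cov.lv") and checked the same way.

```r
# Hypothetical latent correlation matrix with one near-duplicate pair.
lv <- c("PERC", "DRIVE", "EFFORT")
r <- matrix(c(1.00, 0.98, 0.30,
              0.98, 1.00, 0.35,
              0.30, 0.35, 1.00),
            nrow = 3, dimnames = list(lv, lv))

# Pairs above an (arbitrary) 0.9 threshold, upper triangle only.
high <- which(abs(r) > 0.9 & upper.tri(r), arr.ind = TRUE)

# Smallest eigenvalue: <= 0 means the matrix is not positive definite,
# and a value close to 0 signals near-redundant variables.
min_eig <- min(eigen(r, symmetric = TRUE, only.values = TRUE)$values)

print(high)
print(min_eig)
```

The 0.9 cutoff is only a rule of thumb chosen for this sketch, not a lavaan default.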
What do you think? I’m super curious to hear how it goes if you try this approach! 
I’ve faced similar challenges with nominal variables in SEM analyses. The issues often arise because treating categorical variables as latent constructs can lead to problems with estimation and interpretation. In my experience, it’s more effective to treat them as observed predictors. Converting them to numeric form with effect coding rather than simple dummy coding has been beneficial. For instance, assigning values of -0.5 and 0.5 puts zero midway between the two groups (exactly at the mean when the groups are the same size), which often stabilizes estimation. Note that the recoding has to happen in the data frame before fitting; lavaan’s := operator defines new parameters from labeled ones and cannot create variables. Here is an example modification for your model:
# Recode in the data, not inside the model string
mydata$GENDER_EC <- ifelse(mydata$GENDER == "F", -0.5, 0.5)
mydata$FAM_BIZ_EC <- ifelse(mydata$FAM_BIZ == "no_biz", -0.5, 0.5)
model_spec <- '
# Your latent variable definitions
CORE ~ GENDER_EC + FAM_BIZ_EC
'
fit <- sem(model_spec, data = mydata)
This strategy has worked well for me and might resolve the issues you’re encountering.
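A quick check of what the -0.5/0.5 coding does to the mean, using made-up group sizes: the coded variable averages exactly zero only when the two groups are the same size. Subtracting the mean from a 0/1 dummy is shown as an alternative that is centered regardless of balance. The helper name effect_code is hypothetical.

```r
# Hypothetical helper: -0.5 for the chosen level, 0.5 for the other.
effect_code <- function(x, low) ifelse(x == low, -0.5, 0.5)

balanced   <- factor(c("F", "M", "F", "M"))
unbalanced <- factor(c("F", "F", "F", "M"))

mean_balanced   <- mean(effect_code(balanced, "F"))    # equal groups: 0
mean_unbalanced <- mean(effect_code(unbalanced, "F"))  # 3 F vs 1 M: -0.25

# Alternative: mean-center a 0/1 dummy, which is centered for any group sizes.
d <- as.numeric(unbalanced == "M")
d_centered <- d - mean(d)

print(c(mean_balanced, mean_unbalanced, mean(d_centered)))
```

Either coding leaves the slope for the control unchanged relative to 0/1 coding; what moves is the zero point that intercepts are evaluated at.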
hey DancingButterfly, nominal variables can be tricky in SEM. i’d suggest treating GENDER and FAM_BIZ as observed variables instead of latent. try this:
CORE ~ GENDER + FAM_BIZ
Also, consider dummy coding them (0/1) for easier interpretation. hope this helps! let me know if you need more info.
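To see why 0/1 coding is easy to read, here is a small sketch with an ordinary regression standing in for the latent one (the toy values are invented): the intercept is the mean of the group coded 0 and the slope is the difference between the two group means. In the SEM, the coefficient on a dummy in CORE ~ ... would be read the same way, as the expected difference in CORE between groups with the other control held fixed.

```r
# Hypothetical toy data: an observed outcome and a two-level factor.
toy <- data.frame(
  y      = c(2, 4, 6, 8),
  GENDER = factor(c("F", "F", "M", "M"))
)
toy$GENDER_D <- as.numeric(toy$GENDER == "M")  # 0 = F, 1 = M

fit_lm <- lm(y ~ GENDER_D, data = toy)
b <- coef(fit_lm)

# Group means for comparison with the coefficients.
group_means <- tapply(toy$y, toy$GENDER, mean)

print(b)            # intercept = mean of F, slope = mean of M minus mean of F
print(group_means)
```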