Performing Confirmatory Factor Analysis manually using Maximum Likelihood Estimation

I’m trying to do a Confirmatory Factor Analysis (CFA) by hand using Maximum Likelihood Estimation (MLE). The main equation I’m working with is the model-implied covariance matrix:

Sigma = LL^T + E

L is the matrix of factor loadings and E is the diagonal matrix of error variances. I want to use MLE to get estimates of L and E that minimize the discrepancy between the implied and observed covariance matrices.
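In my case it's a one-factor model, so for concreteness, with three indicators the implied matrix is built like this in R (the loading and error values here are made up purely for illustration):

L <- matrix(c(0.7, 0.8, 0.6), ncol = 1)  # 3 x 1 vector of factor loadings
E <- diag(c(0.51, 0.36, 0.64))           # diagonal matrix of error variances
Sigma <- L %*% t(L) + E                  # model-implied covariance matrix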

The function to minimize is:

F = log |Sigma| + tr(S*Sigma^-1) - log|S| - p

Sigma is the implied covariance matrix, S is the observed covariance matrix, and p is the number of indicator items.

I’ve set up my observed covariance matrix and created a discrepancy function in R. Now I’m stuck on how to get the ML estimates for L and E. I’ve tried using the mle function from the stats4 package, but it’s not working.

Here’s my current code for the discrepancy function:

discrepancy <- function(covar, L, E) {
  # model-implied covariance matrix: Sigma = L L^T + E
  sigma <- L %*% t(L) + E
  # ML fit function: F = log|Sigma| + tr(S Sigma^-1) - log|S| - p
  log(det(sigma)) +
    sum(diag(covar %*% solve(sigma))) -
    log(det(covar)) -
    nrow(covar)
}

How can I use this to find the L (column vector) and E (diagonal matrix) estimates that minimize the discrepancy value? I’d like to use my starting values for L and E in the process.

Any help would be appreciated!

Hey there, fellow stats enthusiast! :wave:

Wow, tackling CFA by hand using MLE? That’s some next-level dedication! I’m honestly impressed. I’ve always relied on software for this kind of analysis, but your approach is super interesting.

Have you considered using a numerical optimization routine like Newton-Raphson or gradient descent to find the minimum of your discrepancy function? In R, the optim() function gives you several such methods; BFGS, a quasi-Newton method, usually works well on smooth objectives like this one.

Just thinking out loud here, but what if you tried something like:

optimResult <- optim(par = c(startingL, startingE), 
                     fn = discrepancy, 
                     method = "BFGS", 
                     covar = observedCovarMatrix)

This is just a rough idea, of course. optim() expects the objective to take a single parameter vector as its first argument, so you'd need a small wrapper around your discrepancy function, and probably some constraints too; a minimal sketch of that wrapper follows.
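Something like this, assuming a one-factor model where the first p entries of par are the loadings and the last p are the error variances (discrepancy_vec is just a name I made up, and startingL, startingE, and observedCovarMatrix are placeholders for your own objects):

# wrapper: unpack one parameter vector into L (p x 1) and E (diagonal),
# then call the discrepancy function from the question
discrepancy_vec <- function(par, covar) {
  p <- nrow(covar)
  L <- matrix(par[1:p], ncol = 1)        # first p entries: loadings
  E <- diag(par[(p + 1):(2 * p)])        # last p entries: error variances
  discrepancy(covar, L, E)
}

optimResult <- optim(par = c(startingL, startingE),
                     fn = discrepancy_vec,
                     method = "BFGS",
                     covar = observedCovarMatrix)   # passed through ... to fn
optimResult$par   # estimated loadings first, then error variances

One caveat: BFGS is unconstrained, so nothing stops an error variance from drifting negative; method = "L-BFGS-B" with a lower bound on those entries would guard against that.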

By the way, what made you decide to do this manually instead of using a package like lavaan? I’m genuinely curious about your motivation. Is it for learning purposes or do you have a specific research goal in mind?

Also, have you run into any issues with local minima? I imagine that could be tricky when optimizing this kind of function.

Keep us posted on how it goes! This is a really cool project.


As someone who’s worked extensively with CFA, I commend your effort to perform it manually using MLE. It’s a challenging but rewarding endeavor. For optimizing your discrepancy function, consider the nlminb() function in R: it supports box constraints (lower and upper bounds on each parameter), which matters here because the error variances need to stay positive.

Here’s a potential approach:

nlminb(start = c(initial_L, initial_E),
       objective = discrepancy_vec,   # single-vector wrapper, as in the answer above
       covar = observedCovarMatrix,   # passed through ... to the objective
       lower = c(rep(0, length(initial_L)), rep(0.01, length(initial_E))),
       upper = c(rep(1, length(initial_L)), rep(Inf, length(initial_E))))

This setup lets you put bounds on the parameters, which can be crucial for obtaining meaningful results. The lower bound of 0.01 on the error variances keeps them strictly positive, which guards against Heywood cases (negative error-variance estimates). Note that the bounds of 0 and 1 on the loadings only make sense if you’re analyzing a correlation matrix and expect positive loadings; for a raw covariance matrix, relax them.

Remember that nlminb(), like optim(), expects the objective to take a single parameter vector, hence the wrapper. Also, consider running the optimization from several starting values to reduce the risk of stopping in a local minimum; a rough multi-start loop is sketched below. Good luck with your analysis!
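For example (the ranges for the random starts are arbitrary choices of mine, and discrepancy_vec and observedCovarMatrix are the same placeholder names as in the earlier answer):

set.seed(1)
p <- nrow(observedCovarMatrix)
fits <- lapply(1:10, function(i) {
  # random starting values: p loadings, then p error variances
  start <- c(runif(p, 0.2, 0.9), runif(p, 0.1, 1))
  nlminb(start, objective = discrepancy_vec,
         covar = observedCovarMatrix,
         lower = c(rep(0, p), rep(0.01, p)),   # same bounds as above (assumes a
         upper = c(rep(1, p), rep(Inf, p)))    # standardized/correlation analysis)
})
# keep the run with the smallest discrepancy value
best <- fits[[which.min(sapply(fits, function(f) f$objective))]]
best$par

If several runs land on clearly different parameter values with nearly identical objective values, that can be a sign the model is poorly identified rather than just a local-minimum problem.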