Exploratory Factor Analysis

Exploratory Factor Analysis (EFA) is a statistical method for discovering the unobserved constructs that explain why your measured variables are correlated. In fields from psychology to market research, it is a standard tool for reducing a long list of survey questions to a concise, meaningful set of underlying dimensions, which supports the development of new measurement scales and the refinement of theories.

Understanding Latent Constructs and Correlation

At its heart, EFA is built on a simple but profound idea: observed variables (like your survey items or test questions) are often correlated because they are all influenced by one or more latent variables, or factors. You cannot directly measure a factor like "job satisfaction," "anxiety," or "brand loyalty." Instead, you measure things you can observe—responses to specific statements—that you believe are manifestations of that latent construct. EFA helps you test that belief. The fundamental hypothesis is that the pattern of correlations among your many observed variables can be explained by their shared relationships with a much smaller number of these underlying factors. Your goal is to uncover that simpler structure.
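The idea that a few factors generate the correlations among many items can be made concrete with a small numerical sketch. The loadings below are hypothetical values chosen for illustration (six items, two uncorrelated factors), and `implied_correlation` is an illustrative helper name, not part of any library. Under the common factor model with standardized variables, the correlation between two items equals the sum of the products of their loadings, and each item's unique variance fills the diagonal up to 1.

```python
import numpy as np

# Hypothetical loadings: items 1-3 reflect factor A, items 4-6 reflect factor B.
loadings = np.array([
    [0.8, 0.0],
    [0.7, 0.0],
    [0.6, 0.0],
    [0.0, 0.8],
    [0.0, 0.7],
    [0.0, 0.6],
])

def implied_correlation(loadings):
    """Correlation matrix implied by uncorrelated factors and standardized items."""
    common = loadings @ loadings.T          # variance shared through the factors
    uniqueness = 1.0 - np.diag(common)      # item-specific and error variance
    return common + np.diag(uniqueness)     # diagonal becomes exactly 1

R = implied_correlation(loadings)
print(np.round(R, 2))
```

Items that share a factor correlate (0.8 × 0.7 = 0.56 for the first two items), while items driven by different factors do not. EFA works in the opposite direction: it starts from an observed correlation matrix and tries to recover a loading pattern like this one.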

Factor Extraction: PAF vs. Maximum Likelihood

Once you have a correlation matrix, the next step is factor extraction—the mathematical process of estimating the underlying factors. The two most common extraction methods are principal axis factoring (PAF) and maximum likelihood (ML). PAF focuses solely on the shared variance among variables (the communality), ignoring the unique variance attributed to error or item-specific traits. It is a robust, less assumption-laden method often preferred for initial exploration. Maximum likelihood, in contrast, assumes your data follows a multivariate normal distribution. Its major advantage is that it provides statistical tests (like the chi-square test of model fit) to help evaluate how well your factor model explains the observed correlations. While ML is powerful for hypothesis testing, PAF is frequently the go-to choice for purely exploratory work where you have minimal prior expectations.
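To show what "focusing on shared variance" means in practice, here is a minimal sketch of iterated principal axis factoring in NumPy. It is one common textbook formulation, not the routine of any particular statistics package: the function name, the squared-multiple-correlation starting values, and the stopping rule are assumptions made for this example. The correlation matrix is built from hypothetical one-factor loadings so the result can be checked against known values.

```python
import numpy as np

def principal_axis_factoring(R, n_factors, max_iter=200, tol=1e-6):
    """Iterated PAF sketch: replace the diagonal of R with communality
    estimates, eigendecompose, and repeat until the estimates stabilize."""
    R = np.asarray(R, dtype=float)
    # Starting communalities: squared multiple correlation of each item.
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(max_iter):
        reduced = R.copy()
        np.fill_diagonal(reduced, h2)        # keep only shared variance
        eigvals, eigvecs = np.linalg.eigh(reduced)
        top = np.argsort(eigvals)[::-1][:n_factors]
        vals = np.clip(eigvals[top], 0.0, None)
        loadings = eigvecs[:, top] * np.sqrt(vals)
        new_h2 = np.sum(loadings**2, axis=1)
        converged = np.max(np.abs(new_h2 - h2)) < tol
        h2 = new_h2
        if converged:
            break
    return loadings, h2

# Hypothetical one-factor structure used to build a test correlation matrix.
true_loadings = np.array([0.8, 0.7, 0.6, 0.5])
R = np.outer(true_loadings, true_loadings)
np.fill_diagonal(R, 1.0)

est_loadings, communalities = principal_axis_factoring(R, n_factors=1)
print(np.round(np.abs(est_loadings.ravel()), 3))
print(np.round(communalities, 3))
```

The sign of an extracted factor is arbitrary, which is why the example prints absolute loadings. Maximum likelihood extraction would instead fit the same model by optimizing a likelihood under multivariate normality, which is what makes its fit tests available.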

Determining the Number of Factors: Eigenvalues and Scree Plots

A critical and often subjective decision in EFA is choosing how many factors to retain. Two primary tools guide this choice: eigenvalues and scree plots. An eigenvalue represents the amount of variance across all the observed variables that a given factor accounts for. The classic Kaiser criterion suggests retaining factors with eigenvalues greater than 1: each standardized variable contributes 1 unit of variance, so a retained factor should explain at least as much variance as a single variable. A more visual method is the scree plot, which graphs eigenvalues in descending order. You look for the "elbow" (the point where the curve bends and begins to flatten) and retain the factors that precede the flat portion. In practice, researchers use these rules in tandem, alongside theoretical interpretability, to make a final decision. For example, suppose you extract four factors, three of which have eigenvalues above 1.5, and the scree plot shows a clear elbow after the third point. The evidence is then ambiguous between three and four factors, so you would fit both solutions and compare how interpretable they are.
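The eigenvalue rules can be sketched in a few lines of NumPy. The correlation matrix here is a hypothetical one implied by two uncorrelated factors, and `eigenvalue_summary` is an illustrative name. The "largest drop" line is only a crude numerical stand-in for reading a scree plot by eye, not an established decision rule.

```python
import numpy as np

# Hypothetical 6-item correlation matrix implied by two uncorrelated factors.
loadings = np.array([
    [0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
    [0.0, 0.8], [0.0, 0.7], [0.0, 0.6],
])
R = loadings @ loadings.T
np.fill_diagonal(R, 1.0)

def eigenvalue_summary(R):
    """Return eigenvalues in descending order and the Kaiser count (> 1)."""
    eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]
    n_kaiser = int(np.sum(eigvals > 1.0))
    return eigvals, n_kaiser

eigvals, n_kaiser = eigenvalue_summary(R)

# Text version of a scree plot: eigenvalues in descending order.
for i, ev in enumerate(eigvals, start=1):
    print(f"factor {i}: eigenvalue {ev:.3f}")
print("Kaiser criterion retains:", n_kaiser)

# Crude elbow proxy: the largest drop between successive eigenvalues.
drops = -np.diff(eigvals)
print("Largest drop occurs after factor", int(np.argmax(drops)) + 1)
```

The eigenvalues of a correlation matrix sum to the number of variables, which is why 1 marks the variance of a single standardized item. With real data the drop-off is rarely this clean, so the numerical summary should be read alongside the plot and the theory.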

Factor Rotation for Clear Interpretation

The initial, unrotated factor solution is mathematically valid (rotation does not change how well the model reproduces the correlations) but is often difficult to interpret because many variables load substantially on several factors at once. Rotation is a transformation applied to the factor axes to achieve a simple structure, where each variable loads highly on one factor and has near-zero loadings on the others. There are two main families of rotation. Orthogonal rotation (like Varimax) assumes the factors are uncorrelated. It maximizes the variance of the squared loadings within each factor, pushing high loadings higher and low loadings toward zero, which makes each column of the loading matrix easier to interpret. Oblique rotation (like Promax or Direct Oblimin) allows factors to correlate, which is often a more realistic assumption in social science research. It simplifies the pattern of loadings in the same way while acknowledging that constructs like "depression" and "anxiety" are likely related. Choose orthogonal rotation when theory leads you to expect independent factors and oblique rotation when you expect them to be correlated.
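As an illustration of orthogonal rotation, the following is a minimal sketch of raw varimax (without Kaiser normalization) using the widely used SVD-based iteration. The function name, defaults, and the example loadings are assumptions for this sketch. The example takes a hypothetical simple-structure loading matrix, mixes it with a 30-degree rotation to imitate a hard-to-read unrotated solution, and lets varimax recover the simple structure.

```python
import numpy as np

def varimax(loadings, max_iter=500, tol=1e-10):
    """Raw varimax sketch: find an orthogonal rotation that maximizes the
    variance of squared loadings within each factor (column)."""
    L = np.asarray(loadings, dtype=float)
    n_items, n_factors = L.shape
    rotation = np.eye(n_factors)
    objective = 0.0
    for _ in range(max_iter):
        rotated = L @ rotation
        col_ss = np.sum(rotated**2, axis=0)              # sum of squares per factor
        target = rotated**3 - rotated * (col_ss / n_items)
        u, s, vt = np.linalg.svd(L.T @ target)
        rotation = u @ vt                                # nearest orthogonal matrix
        new_objective = np.sum(s)
        if objective != 0.0 and new_objective < objective * (1.0 + tol):
            break
        objective = new_objective
    return L @ rotation, rotation

# Hypothetical simple structure, deliberately mixed by a 30-degree rotation.
simple = np.array([
    [0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
    [0.0, 0.8], [0.0, 0.7], [0.0, 0.6],
])
theta = np.deg2rad(30.0)
mix = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
unrotated = simple @ mix

rotated, rotation = varimax(unrotated)
print(np.round(unrotated, 2))
print(np.round(rotated, 2))
```

Because the rotation matrix is orthogonal, each item's communality (its row sum of squared loadings) is unchanged; only the way that variance is divided among the factors differs. Oblique methods such as Promax relax the orthogonality constraint and additionally report the correlations among factors.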

Interpreting Factor Loadings and Building Scales

Common Pitfalls

Inadequate Sample Size and Missing Data. EFA requires a substantial sample for stable results. A rough guideline is a minimum of 10 participants per observed variable, with an absolute minimum of 100-200 overall. Running EFA on a small sample or with many missing values can produce unreliable factor structures that won't replicate. Always address missing data appropriately (e.g., through imputation) before analysis and ensure your sample is sufficiently large.

Misapplying Extraction Methods. Using principal components analysis (PCA) when you should use PAF or ML is a common error. PCA is a data reduction technique that does not differentiate between shared and unique variance; it is not a true factor analysis model. If your goal is to identify latent constructs that explain covariation, use PAF or ML, not PCA.

Forcing an Incorrect Number of Factors. Blindly following the "eigenvalue > 1" rule or ignoring the scree plot can lead to over- or under-extraction. Retaining too many factors captures noise, while retaining too few forces conceptually distinct items together. Always cross-validate the statistical rules with theoretical sense and interpretability.

Ignoring Cross-Loadings and Poor Items. Items that load moderately (e.g., around 0.3-0.4) on multiple factors (cross-loadings) or fail to load strongly (e.g., below 0.3) on any factor are problematic. They muddy the clarity of your factors. In scale development, these items are candidates for revision or removal in subsequent analyses to achieve a clean, simple structure.
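A simple screening pass over a rotated loading matrix can flag both kinds of problem item. The sketch below uses the 0.3 cutoff mentioned above as a default; the loading values, item names, and the `flag_items` helper are hypothetical, and the cutoff itself is a convention that should be adjusted to the study.

```python
import numpy as np

def flag_items(loadings, salient=0.3):
    """Flag items with no salient loading (weak) or several (cross-loading)."""
    abs_loadings = np.abs(np.asarray(loadings, dtype=float))
    n_salient = (abs_loadings >= salient).sum(axis=1)
    weak = n_salient == 0       # fails to load on any factor
    cross = n_salient >= 2      # loads on more than one factor
    return weak, cross

# Hypothetical rotated loadings for five items on two factors.
item_names = ["q1", "q2", "q3", "q4", "q5"]
rotated = np.array([
    [0.72, 0.05],
    [0.65, 0.12],
    [0.38, 0.35],   # loads moderately on both factors
    [0.10, 0.70],
    [0.21, 0.18],   # loads on neither factor
])

weak, cross = flag_items(rotated)
for name, is_weak, is_cross in zip(item_names, weak, cross):
    status = "weak" if is_weak else "cross-loading" if is_cross else "ok"
    print(f"{name}: {status}")
```

Flagged items are candidates for review rather than automatic deletion; after revising or removing them, the analysis should be rerun, since the remaining loadings can shift.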

Summary

Exploratory Factor Analysis is a foundational method for identifying the latent constructs that explain the correlations among a set of observed variables, crucial for scale development and theory building.
Researchers determine the number of factors to retain using statistical heuristics like eigenvalues greater than 1 and the visual scree plot, balanced by theoretical interpretability.
Factor extraction via principal axis factoring or maximum likelihood estimates the underlying factor structure, after which rotation (orthogonal or oblique) is applied to achieve a clear, interpretable pattern of factor loadings.
The final loading matrix reveals which items cluster together, allowing you to name the factors and validate your measurement construct, while also highlighting poorly performing items for refinement.
