---
title: "Reliability analysis in tallieR"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Reliability analysis in tallieR}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
set.seed(42)
```

tallieR provides two measures of internal consistency: Cronbach's alpha and McDonald's omega. This vignette explains what each measures, when to prefer one over the other, and how to use both functions.

## Why internal consistency matters

When questionnaire items are summed or averaged into a scale score, internal consistency tells you how well those items measure the same underlying construct. Low consistency suggests the items may not belong together; high consistency is a prerequisite for treating the total score as meaningful.

## Cronbach's alpha

Cronbach's alpha assumes that all items have *equal* factor loadings (tau-equivalence). Under that assumption, alpha is the expected correlation between the current scale and any other scale of the same length drawn from the same item pool.

```{r alpha, eval = FALSE}
library(tallieR)

study <- read_scoreme_dir("exports/")

# All questionnaires in the study
cronbach_alpha(study)

# Specific subset
cronbach_alpha(study, questionnaires = c("ess", "isi", "phq9"))
```

The output is a data frame with one row per questionnaire:

| Column | Description |
|---|---|
| `questionnaire_id` | Questionnaire identifier |
| `alpha` | Cronbach's alpha |
| `ci_lower` / `ci_upper` | Exact 95% CI (Feldt et al., 1987) |
| `n_items` | Number of numeric items used |
| `n_obs` | Number of complete observations |
| `note` | `NA` on success, or reason for failure |

The confidence interval uses the exact F-distribution method of Feldt et al. (1987) rather than a bootstrap approximation. A wider interval reflects fewer participants, not a worse instrument.
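
To make the two quantities above concrete, here is a minimal base-R sketch of alpha and a Feldt-type interval. It is not tallieR's implementation: `alpha_by_hand()` and the simulated `toy` matrix are hypothetical, and the interval assumes the usual form of the Feldt result, in which $(1 - \alpha_{pop}) / (1 - \hat{\alpha})$ is referred to an F distribution with $n - 1$ and $(n - 1)(k - 1)$ degrees of freedom.

```{r alpha-by-hand, eval = FALSE}
# Hypothetical helper, not part of tallieR: alpha and a Feldt-type interval
# from a numeric matrix (rows = respondents, columns = items).
alpha_by_hand <- function(x, conf = 0.95) {
  x <- as.matrix(x)
  x <- x[stats::complete.cases(x), , drop = FALSE]  # complete observations only
  n <- nrow(x)
  k <- ncol(x)

  # alpha = k / (k - 1) * (1 - sum of item variances / variance of row totals)
  item_var  <- apply(x, 2, stats::var)
  total_var <- stats::var(rowSums(x))
  alpha <- (k / (k - 1)) * (1 - sum(item_var) / total_var)

  # Assumed form of the Feldt interval: F quantiles with
  # n - 1 and (n - 1)(k - 1) degrees of freedom
  a   <- 1 - conf
  df1 <- n - 1
  df2 <- (n - 1) * (k - 1)
  lower <- 1 - (1 - alpha) * stats::qf(1 - a / 2, df1, df2)
  upper <- 1 - (1 - alpha) * stats::qf(a / 2, df1, df2)

  c(alpha = alpha, ci_lower = lower, ci_upper = upper, n_items = k, n_obs = n)
}

# Simulated toy data: 4 items driven by one shared factor plus noise
set.seed(1)
f   <- rnorm(200)
toy <- sapply(1:4, function(i) f + rnorm(200))
alpha_by_hand(toy)
```

Rerunning the sketch on a subset such as `toy[1:30, ]` should give a noticeably wider interval, because both degrees of freedom shrink with the number of observations.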
### Interpreting alpha

Conventional thresholds (commonly attributed to Nunnally, 1978):

| Alpha | Interpretation |
|---|---|
| < 0.60 | Poor |
| 0.60 -- 0.70 | Questionable |
| 0.70 -- 0.80 | Acceptable |
| 0.80 -- 0.90 | Good |
| >= 0.90 | Excellent (may indicate item redundancy) |

These are rules of thumb, not hard cutoffs. Context matters: a screener with 3 items and alpha = 0.72 may be perfectly adequate for its purpose.

## McDonald's omega

Omega relaxes the tau-equivalence assumption. It uses the factor loadings from a single-factor EFA to estimate the proportion of scale variance attributable to the common factor:

$$\omega_t = \frac{\left(\sum_i \lambda_i\right)^2}{\left(\sum_i \lambda_i\right)^2 + \sum_i \left(1 - \lambda_i^2\right)}$$

Here $\lambda_i$ is the standardized loading of item $i$, so $1 - \lambda_i^2$ is that item's unique variance.

When items have unequal loadings (which is the norm in psychological questionnaires), omega is a less biased estimate of reliability than alpha. Alpha systematically underestimates reliability for congeneric scales, and can overestimate it when items are highly correlated for reasons unrelated to the construct, such as near-duplicate wording.

```{r omega, eval = FALSE}
omega_reliability(study)

omega_reliability(study, questionnaires = c("ess", "isi"))
```

Output columns: `questionnaire_id`, `omega`, `n_items`, `n_obs`, `note`.
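
The omega formula above can be sketched in a few lines of base R. This is an illustration under assumptions, not tallieR's code: `omega_by_hand()` and the simulated `toy` matrix are hypothetical, and using `stats::factanal()` (maximum-likelihood factor analysis) for the single-factor EFA is a choice made for the example, since the estimator tallieR uses is not stated here.

```{r omega-by-hand, eval = FALSE}
# Hypothetical helper, not part of tallieR: omega from a one-factor model.
# factanal() needs at least 3 items to fit a single factor.
omega_by_hand <- function(x) {
  x <- as.matrix(x)
  x <- x[stats::complete.cases(x), , drop = FALSE]  # complete observations only

  # Single-factor ML factor analysis; loadings are on the standardized scale
  fit    <- stats::factanal(x, factors = 1)
  lambda <- as.numeric(fit$loadings[, 1])

  common <- sum(lambda)^2        # variance due to the common factor
  unique <- sum(1 - lambda^2)    # summed unique variances
  common / (common + unique)
}

# Simulated congeneric items: deliberately unequal loadings on one factor
set.seed(1)
f       <- rnorm(300)
weights <- c(0.9, 0.7, 0.5, 0.3)
toy     <- sapply(weights, function(w) w * f + rnorm(300, sd = sqrt(1 - w^2)))
omega_by_hand(toy)
```

Because the loadings enter only through $\left(\sum_i \lambda_i\right)^2$ and $\lambda_i^2$, the result does not change if the factor solution comes back with all signs flipped.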
## When to use which

| Situation | Recommendation |
|---|---|
| Tau-equivalent items (equal loadings assumed) | Either; alpha is conventional |
| Congeneric items (unequal loadings, typical) | Prefer omega |
| Comparing against published norms that report alpha | Report both; flag the difference |
| Small sample (< 30) | Alpha with exact CI; omega may not converge |
| Reporting for publication | Report both, with sample size and number of items |

## Comparing alpha and omega side by side

```{r compare, eval = FALSE}
alpha_res <- cronbach_alpha(study, questionnaires = c("ess", "isi", "phq9"))
omega_res <- omega_reliability(study, questionnaires = c("ess", "isi", "phq9"))

merge(
  alpha_res[, c("questionnaire_id", "alpha", "ci_lower", "ci_upper", "n_obs")],
  omega_res[, c("questionnaire_id", "omega")],
  by = "questionnaire_id"
)
```

## Using an `items_long()` data frame directly

Both functions accept either a study object or a data frame produced by `items_long()`. This is useful when you want to filter to a specific group or time point before computing reliability:

```{r items-input, eval = FALSE}
items <- items_long(study)

# Only control group
control_items <- items[items$group == "control", ]
cronbach_alpha(control_items)

# Only baseline session
baseline_items <- items[items$session == "baseline", ]
omega_reliability(baseline_items)
```

## Handling non-numeric items

Some instruments include items that cannot be coerced to numeric, such as MCTQ clock times or STOP-BANG yes/no responses. These are silently dropped before estimation. The `n_items` column in the output tells you how many numeric items were actually used, so you can detect whether unexpected items were dropped.

## Failure modes

Questionnaires that cannot be estimated return `NA` with an explanatory note:

| Situation | Note |
|---|---|
| Fewer than 2 numeric items | "Need at least 2 numeric items." |
| Fewer than 2 complete observations | "Need at least 2 complete observations." |
| Zero variance in row totals (alpha) | "Zero variance in row totals." |
| More items than observations (omega) | "More items than observations; covariance matrix is singular." |
| EFA non-convergence (omega) | "Factor analysis did not converge." |

## References

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. *Psychometrika*, 16(3), 297--334.

Feldt, L. S., Woodruff, D. J., & Salih, F. A. (1987). Statistical inference for coefficient alpha. *Applied Psychological Measurement*, 11(1), 93--103.

McDonald, R. P. (1999). *Test theory: A unified treatment*. Lawrence Erlbaum Associates.

Nunnally, J. C. (1978). *Psychometric theory* (2nd ed.). McGraw-Hill.

Revelle, W., & Zinbarg, R. E. (2009). Coefficients alpha, beta, omega, and the glb: Comments on Sijtsma. *Psychometrika*, 74(1), 145--154.