---
title: "Getting started with R-itable"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting started with R-itable}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment  = "#>"
)
```

```{r setup}
library(Ritable)
```

## Overview

**R-itable** estimates narrow-sense heritability (h²) for quantitative traits in
family cohort studies using a profile-likelihood variance-components approach,
without requiring SOLAR Eclipse or any proprietary software.

The three main steps are:

1. Build the additive genetic relationship matrix (GRM) from a pedigree.
2. Estimate heritability for one trait with `herit_vc()`.
3. Scale to many traits with `herit_batch()`, then visualise with `plot_forest()`.

---

## 1. Build the GRM

`build_grm()` takes a pedigree data frame and returns an **additive genetic
relationship matrix** (GRM, also written **A**). The GRM encodes how much
genetic material any two individuals share due to common ancestry, and is the
cornerstone of the variance-components model.

The pedigree needs four columns:

| Column | Content |
|--------|---------|
| `id` | Individual ID |
| `pat` | Father's ID (`0` or `NA` for founders) |
| `mom` | Mother's ID (`0` or `NA` for founders) |
| `sex` | `1` = male, `2` = female |

Importantly, the pedigree should contain **everyone** — both the study subjects
you want to analyse *and* their parents/grandparents (founders), even if the
founders have no phenotype data. Including founders lets `kinship2` trace
shared ancestry correctly.

In the example below, IDs 1–4 are two unrelated founder couples. IDs 5–6 are
offspring of couple 1×2, and IDs 7–8 are offspring of couple 3×4. The study
subjects are IDs 5–8.

```{r build-grm}
ped <- data.frame(
  id  = 1:8,
  pat = c(0, 0, 0, 0, 1, 1, 3, 3),  # founders have pat = 0
  mom = c(0, 0, 0, 0, 2, 2, 4, 4),  # founders have mom = 0
  sex = c(1, 2, 1, 2, 1, 2, 1, 2)
)

A <- build_grm(ped, study_ids = 5:8)
round(A, 3)
```

Reading the matrix:

- **Diagonal = 1** for non-inbred individuals (each person shares 100% of
  their genome with themselves).
- **5 & 6 share 0.5** — they are full siblings (same parents 1×2), so they
  share on average half their segregating alleles.
- **7 & 8 share 0.5** — full siblings of couple 3×4, same logic.
- **5 & 7 share 0** — their parents come from entirely different founders,
  so they are genetically unrelated in this pedigree.

This block-diagonal pattern is typical of a study with multiple independent
families. In a real cohort the pedigree is often more complex (multi-generation,
distant relatives), but the interpretation is the same: larger off-diagonal
values mean more shared ancestry.

---

## 2. Estimate heritability for a single trait

```{r single-trait}
set.seed(42)

# Simulate some phenotype data for the study subjects
dat <- data.frame(
  IID     = 5:8,
  age     = c(35, 38, 40, 44),
  sex_num = c(1, 2, 1, 2),
  bmi     = c(24.1, 27.3, 22.8, 29.5)
)

# Unadjusted model
res_unadj <- herit_vc("bmi", grm = A, data = dat, min_n = 3)

# Adjusted model (age + sex)
res_adj <- herit_vc("bmi", grm = A, data = dat,
                    covs  = c("age", "sex_num"),
                    label = "bmi_adj",
                    min_n = 3)

str(res_unadj)
```

The returned list contains:

| Field | Description |
|-------|-------------|
| `h2` | MLE narrow-sense heritability |
| `se` | SE from profile-LL curvature |
| `ci_lo`, `ci_hi` | 95% profile-likelihood CI |
| `pval` | One-sided LRT p-value (boundary-corrected) |
| `var_covariates` | Variance explained by fixed-effect covariates (R²); `NA` for unadjusted models |
| `sigma2_a` | Additive genetic variance (σ²_g) |
| `sigma2_e` | Residual environmental variance (σ²_e) |

---

## 3. Batch estimation

For real analyses you will typically run many traits across several covariate
models. `herit_batch()` handles this and returns a tidy data frame.

```{r batch}
# Add a second trait
dat$hdl <- c(55.0, 60.2, 48.3, 72.1)

covs_list <- list(
  unadj = NULL,
  cov1  = c("age", "sex_num")
)

res <- herit_batch(
  traits    = c("bmi", "hdl"),
  grm       = A,
  data      = dat,
  covs_list = covs_list,
  min_n     = 3,
  .progress = FALSE
)

res
```

---

## 4. Forest plot

```{r forest, fig.width=7, fig.height=3}
if (requireNamespace("ggplot2", quietly = TRUE)) {
  plot_forest(res,
              model_filter  = "cov1",
              colour_by     = "trait",
              title         = "Adjusted heritability (age + sex)")
}
```

---

## 5. Colour palette

`Ritable_colours` is exported for consistent figure styling across your
analysis. It contains the four brand colours used in `plot_forest()` by default:

```{r colours}
Ritable_colours
```

You can use them directly in ggplot2 calls:

```{r colours-example, eval=FALSE}
ggplot2::scale_colour_manual(values = Ritable_colours[c("blue", "pink")])
```

---

## 6. Realistic workflow

Here we simulate a small multi-family cohort — 80 nuclear families, 2 parents
+ 3 offspring each — with three phenotypes that have different true h² values.
This mirrors what you would do with real pedigree and phenotype CSV files.

```{r real-workflow}
library(Ritable)

# ---- 1. Simulate pedigree ---------------------------------------------------
set.seed(2026)
n_fam <- 80

ped <- do.call(rbind, lapply(seq_len(n_fam), function(f) {
  base <- (f - 1L) * 5L
  data.frame(
    id  = base + 1:5,
    pat = c(0L, 0L, base + 1L, base + 1L, base + 1L),
    mom = c(0L, 0L, base + 2L, base + 2L, base + 2L),
    sex = c(1L, 2L, sample(1:2, 3, replace = TRUE))
  )
}))

# ---- 2. Simulate phenotypes for offspring -----------------------------------
# Three traits with approximate h2: bmi ~0.6, sbp ~0.35, hdl ~0.5
simulate_trait <- function(ped, n_fam, h2, seed) {
  set.seed(seed)
  genetic  <- rep(rnorm(n_fam, 0, sqrt(h2)),       each = 5)
  residual <- rnorm(nrow(ped),          0, sqrt(1 - h2))
  genetic + residual
}

pheno_all <- data.frame(
  bmi = simulate_trait(ped, n_fam, h2 = 0.60, seed = 1),
  sbp = simulate_trait(ped, n_fam, h2 = 0.35, seed = 2),
  hdl = simulate_trait(ped, n_fam, h2 = 0.50, seed = 3)
)

off_rows <- ped$pat != 0L
dat <- data.frame(
  IID     = ped$id[off_rows],
  age     = round(runif(sum(off_rows), 20, 75)),
  sex_num = ped$sex[off_rows],
  bmi     = pheno_all$bmi[off_rows],
  sbp     = pheno_all$sbp[off_rows],
  hdl     = pheno_all$hdl[off_rows]
)
dat$age2 <- dat$age^2

# ---- 3. Build GRM once — reuse for all models -------------------------------
A <- build_grm(ped, study_ids = dat$IID)

# ---- 4. Define covariate models ---------------------------------------------
covs_list <- list(
  unadj = NULL,
  cov1  = c("age", "sex_num"),
  cov2  = c("age", "sex_num", "age2")
)

# ---- 5. Batch estimation ----------------------------------------------------
res <- herit_batch(
  traits    = c("bmi", "sbp", "hdl"),
  grm       = A,
  data      = dat,
  covs_list = covs_list
)

res
```

```{r real-forest, fig.width=7, fig.height=4}
if (requireNamespace("ggplot2", quietly = TRUE)) {
  plot_forest(res, model_filter = "cov2",
              title = "Heritability estimates (age + sex + age\u00b2)")
}
```

---

## 7. Citation

If you use **R-itable** in a publication, please cite:

```
Franca, L. & Leocadio-Miguel, M. (2026). R-itable: Pedigree-Based Heritability
Estimation for Family Cohort Studies. R package version 0.1.0.
https://github.com/circadia-bio/R-itable
```