POLMM (Proportional Odds Logistic Mixed Model) implements association tests for ordinal categorical phenotypes while accounting for sample relatedness.
Key Features:
- Handles unbalanced phenotypic distributions
- More powerful than treating ordinal traits as binary or quantitative
- Support both dense GRM and sparse GRM to adjust for sample relatedness
- Accurate p-values for low-frequency and rare variants
- Scales to biobank-size datasets
- Support both single-variant analysis and set-based analysis
Citation:
Bi et al. (2021). Efficient mixed model approach for large-scale genome-wide association studies of ordinal categorical phenotypes. American Journal of Human Genetics. doi:10.1016/j.ajhg.2021.03.019
POLMM-GENE extends POLMM to perform set-based association tests for rare variants in genomic regions (e.g., genes). It is particularly powerful for exome sequencing data.
Key Features:
- SKAT, SKAT-O, and Burden tests
- Annotation-based variant grouping
- Cauchy combination for multiple tests
- Handles ultra-rare variants (MAC < threshold)
Citation:
Bi et al. (2023). Scalable mixed model methods for set-based association studies on large-scale categorical data analysis and its application to exome-sequencing data in UK Biobank. American Journal of Human Genetics. doi:10.1016/j.ajhg.2023.03.010
Step 1: Model Fitting and Preprocessing
This step is shared by both marker- and region-level analyses and returns the obj.POLMM object required for step two. See ?GRAB.NullModel and ?GRAB.POLMM for detailed parameter instructions. A quick example is provided below.
# Load data
GenoFile <- system.file("extdata", "simuPLINK.bed", package = "GRAB")
SparseGRMFile <- system.file("extdata", "SparseGRM.txt", package = "GRAB")
OutputFile <- file.path(tempdir(), "resultPOLMMmarker.txt")
PhenoFile <- system.file("extdata", "simuPHENO.txt", package = "GRAB")
PhenoData <- data.table::fread(PhenoFile, header = TRUE)
PhenoData$OrdinalPheno <- factor(PhenoData$OrdinalPheno, levels = c(0, 1, 2))
# Step 1
obj.POLMM <- GRAB.NullModel(
OrdinalPheno ~ AGE + GENDER,
data = PhenoData,
subjIDcol = "IID",
method = "POLMM",
traitType = "ordinal",
GenoFile = GenoFile,
SparseGRMFile = SparseGRMFile
)
Notes:
GenoFileis always required for POLMM to estimate the variance ratio parameter- If
SparseGRMFileis not provided, a dense GRM is calculated fromGenoFile; if given, the sparse GRM is used for model fitting - Phenotype must be coded as an ordered factor
- Missing phenotypes must be coded as
NA obj.POLMMcontains the data structure for step 2
obj.POLMM Components
The POLMM null model object contains:
N: Sample sizesubjData: Subject IDsM: Number of ordinal categoriestau: Variance component estimatebeta: Covariate effect estimateseps: Threshold parametersbVec: Random effect estimateseta: Linear predictoryVec: Phenotype matrix (1 col)Cova: Design matrixmuMat: Fitted probabilitiesYMat: Category indicator matrix
Step 2 of Marker-Level Analysis
Refer to ?GRAB.Marker and ?GRAB.POLMM for detailed parameter instructions. A quick example is provided below.
# Step 2
GRAB.Marker(obj.POLMM, GenoFile, OutputFile,
control = list(ifOutGroup = TRUE))
# View results
head(data.table::fread(OutputFile))
Output Columns:
Standard columns:
Marker: Variant identifier (rsID or CHR:POS:REF:ALT)Info: Variant information (CHR:POS:REF:ALT format)AltFreq: Alternative allele frequencyAltCounts: Alternative allele countMissingRate: Proportion of missing genotypesPvalue: Association p-valuebeta: Effect size estimate (log-odds scale)seBeta: Standard error of betazScore: Z-score from score test
Additional columns (ifOutGroup = TRUE):
AltFreqInGroup.1,AltFreqInGroup.2, …: Allele frequency in each ordinal categoryAltCountsInGroup.1,AltCountsInGroup.2, …: Allele counts in each ordinal categorynSamplesInGroup.1,nSamplesInGroup.2, …: Sample size in each ordinal category
Step 2 of Region-Level Analysis
Refer to ?GRAB.Region and ?GRAB.POLMM.Region for detailed parameter instructions. A quick example is provided below. The obj.POLMM object generated in Step 1: Model Fitting and Preprocessing is needed.
# Load data
GenoFileRV <- system.file("extdata", "simuPLINK_RV.bed", package = "GRAB")
SparseGRMFile <- system.file("extdata", "SparseGRM.txt", package = "GRAB")
GroupFile <- system.file("extdata", "simuPLINK_RV.group", package = "GRAB")
OutputFile <- file.path(tempdir(), "resultPOLMMregion.txt")
# Step 2
GRAB.Region(obj.POLMM, GenoFileRV, OutputFile,
GroupFile = GroupFile,
SparseGRMFile = SparseGRMFile,
MaxMAFVec = "0.01,0.005"
)
# View results
head(data.table::fread(OutputFile))
head(data.table::fread(paste0(OutputFile, ".markerInfo")))
Output Files
Region-level analysis generates four output files:
1. Main results (OutputFile):
Region: Gene/region identifiernMarkers: Number of rare variants included (MAC ≥ threshold)nMarkersURV: Number of ultra-rare variants (MAC < threshold)Anno.Type: Annotation categoryMaxMAF.Cutoff: MAF cutoff usedpval.SKATO: SKAT-O p-valuepval.SKAT: SKAT p-valuepval.Burden: Burden test p-value
2. Marker info (.markerInfo):
Detailed statistics for rare variants included in region tests:
Region,ID,Info,AnnoAltFreq,MAC,MAF,MissingRateStatVec: Score statisticaltBetaVec: Effect size estimateseBetaVec: Standard errorpval0Vec: Unadjusted p-valuepval1Vec: SPA-adjusted p-value
3. Other marker info (.otherMarkerInfo):
Information for excluded markers (ultra-rare or failed QC)
4. Burden summaries (.infoBurdenNoWeight):
Summary statistics for burden tests without variant weights