POLMM Approaches
POLMM and POLMM-GENE are accurate and efficient approaches for associating ordinal categorical traits with single variants and variant sets (e.g., genes), respectively.
Main Features
POLMM and POLMM-GENE are:
- designed for ordinal categorical trait analysis
- accurate for unbalanced phenotypic distributions (e.g., sample size proportions across three levels of 30:1:1)
- scalable for large-scale biobank data analysis (e.g., UK Biobank)
- able to use either a dense GRM or a sparse GRM (recommended) to adjust for family relatedness
- able to run both single-variant analysis and set-based analysis (Burden tests, SKAT, and SKAT-O)
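Before running an analysis, it can help to check how unbalanced the ordinal phenotype actually is. The following is a minimal base-R sketch; the vector pheno is made up for illustration (it is not part of GRAB or its example data) and mimics the 30:1:1 split mentioned above.

```r
# Hypothetical ordinal phenotype with three levels in a 30:1:1 ratio
pheno <- factor(rep(c(0, 1, 2), times = c(3000, 100, 100)), levels = c(0, 1, 2))

# Count and proportion of samples in each category
counts <- table(pheno)
props <- prop.table(counts)

print(counts)
print(round(props, 3))
```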
Important Notes Prior to Analysis
- For the function GRAB.NullModel, the left side of the formula argument should be a factor when fitting a null model in step 1. If the factor function is used to convert phenotype to a factor, we highly recommend specifying the levels argument explicitly.
- We recommend using sparse GRM to adjust for family relatedness due to its high computational efficiency. Generally, we did not observe power loss compared to using dense GRM.
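To illustrate why explicit levels matter: by default, factor sorts the unique values, so text labels end up in alphabetical order rather than the intended ordinal order. The sketch below uses hypothetical severity labels (not taken from the GRAB example data) and only base R.

```r
# Hypothetical severity labels; the intended order is low < medium < high
severity <- c("low", "high", "medium", "low", "high")

# Without levels, factor() sorts the labels alphabetically
implicit <- factor(severity)
levels(implicit)   # "high" "low" "medium"

# With levels, the category order is stated explicitly
explicit <- factor(severity, levels = c("low", "medium", "high"))
levels(explicit)   # "low" "medium" "high"

# The integer codes follow the level order, so the two encodings differ
as.integer(implicit)  # 2 1 3 2 1
as.integer(explicit)  # 1 3 2 1 3
```

Because an ordinal model depends on the order of the categories, a silently reordered factor would change what is being tested.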
Quick Start-up Guide
The following example demonstrates the usage of POLMM approaches.
First, Read Data and Convert Phenotype to a Factor
library(GRAB)
PhenoFile = system.file("extdata", "simuPHENO.txt", package = "GRAB")
PhenoData = data.table::fread(PhenoFile, header = TRUE)
PhenoData$OrdinalPheno <- factor(PhenoData$OrdinalPheno, levels = c(0, 1, 2))
Step 1(a): If dense GRM is used in model fitting, GenoFile is required
GenoFile = system.file("extdata", "simuPLINK.bed", package = "GRAB")
obj.POLMM = GRAB.NullModel(
factor(OrdinalPheno) ~ AGE + GENDER,
data = PhenoData,
subjData = PhenoData$IID,
method = "POLMM",
traitType = "ordinal",
GenoFile = GenoFile,
control = list(showInfo = FALSE, LOCO = FALSE, tolTau = 0.2, tolBeta = 0.1)
)
Step 1(b): If sparse GRM is used in model fitting, SparseGRMFile is required
SparseGRMFile = system.file("extdata", "SparseGRM.txt", package = "GRAB")
GenoFile = system.file("extdata", "simuPLINK.bed", package = "GRAB")
obj.POLMM = GRAB.NullModel(
formula = OrdinalPheno ~ AGE + GENDER,
data = PhenoData,
subjData = PhenoData$IID,
method = "POLMM",
traitType = "ordinal",
GenoFile = GenoFile,
SparseGRMFile = SparseGRMFile,
control = list(showInfo = FALSE, LOCO = FALSE, tolTau = 0.2, tolBeta = 0.1)
)
OutputDir = tempdir()
objPOLMMFile = file.path(OutputDir, "objPOLMMFile.RData")
save(obj.POLMM, file = objPOLMMFile)
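Note that load restores an object under the name it had when it was saved, which is why later code can refer to the object by its original name right after calling load. A minimal base-R sketch of this round trip follows; obj.demo is a placeholder list used only for illustration and does not reflect the real structure of a fitted POLMM object.

```r
# Placeholder standing in for a fitted null model (not a real fit)
obj.demo <- list(method = "POLMM", note = "placeholder")

# Save to a temporary file, then remove the object from the workspace
tmpFile <- tempfile(fileext = ".RData")
save(obj.demo, file = tmpFile)
rm(obj.demo)

# load() invisibly returns the names of the objects it restored
restored <- load(tmpFile)
print(restored)         # "obj.demo"
print(obj.demo$method)  # "POLMM"
```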
Step 2(a): Single-variant tests using POLMM
# Load a precomputed example object to perform step 2 without repeating step 1
objPOLMMFile = system.file("extdata", "objPOLMMnull.RData", package = "GRAB")
load(objPOLMMFile)
GenoFile = system.file("extdata", "simuPLINK.bed", package = "GRAB")
OutputDir = tempdir()
OutputFile = file.path(OutputDir, "simuMarkerOutput.txt")
GRAB.Marker(obj.POLMM, GenoFile = GenoFile, OutputFile = OutputFile)
data.table::fread(OutputFile)
Step 2(b): Set-based tests using POLMM-GENE
objPOLMMFile = system.file("extdata", "objPOLMMnull.RData", package = "GRAB")
load(objPOLMMFile) # restores an R object named "obj.POLMM"
GenoFile = system.file("extdata", "simuPLINK_RV.bed", package = "GRAB")
OutputDir = tempdir()
OutputFile = file.path(OutputDir, "simuRegionOutput.txt")
GroupFile = system.file("extdata", "simuPLINK_RV.group", package = "GRAB")
SparseGRMFile = system.file("extdata", "SparseGRM.txt", package = "GRAB")
GRAB.Region(
objNull = obj.POLMM,
GenoFile = GenoFile,
GenoFileIndex = NULL,
OutputFile = OutputFile,
OutputFileIndex = NULL,
GroupFile = GroupFile,
SparseGRMFile = SparseGRMFile,
MaxMAFVec = "0.01,0.005"
)
data.table::fread(OutputFile)
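In the call above, MaxMAFVec is passed as a single comma-separated string rather than a numeric vector. If the cutoffs are held in a numeric vector, one way to build such a string in base R is sketched below; mafCutoffs is a hypothetical variable name introduced only for this example.

```r
# Hypothetical numeric vector of maximum-MAF cutoffs
mafCutoffs <- c(0.01, 0.005)

# Collapse into one comma-separated string
MaxMAFVec <- paste(mafCutoffs, collapse = ",")
print(MaxMAFVec)  # "0.01,0.005"

# Splitting the string recovers the numeric cutoffs
parsed <- as.numeric(strsplit(MaxMAFVec, ",")[[1]])
print(parsed)
```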
Citation
- POLMM: Bi, Wenjian, Wei Zhou, Rounak Dey, Bhramar Mukherjee, Joshua N. Sampson, and Seunggeun Lee. Efficient mixed model approach for large-scale genome-wide association studies of ordinal categorical phenotypes. The American Journal of Human Genetics 108, no. 5 (2021): 825-839.
- POLMM-GENE: Bi, Wenjian, Wei Zhou, Peipei Zhang, Yaoyao Sun, Weihua Yue, and Seunggeun Lee. Scalable mixed model approaches for set-based association studies on large-scale categorical data analysis and its application to 450k exome sequencing data in UK Biobank. To be submitted.