SPACox / SPAmix / SPAGRM approaches
Main features
SPACox
, SPAmix
, and SPAGRM
are accurate and efficient approaches to associating complex traits (including but not limited to time-to-event traits) to single-variant.
The three methods use empirical SPA approaches and share the below features.
- Model residuals (whose sum is zero) are needed as input. Users can select appropriate statistical models depending on the type of traits in analysis.
- High computational efficiency to analyze large-scale biobank data with millions of individuals
- High accuracy to analyze common, low-frequency, and rare variants, even if the phenotypic distribution (or residual distribution) is highly unbalanced.
The three methods are different in terms of
-
SPACox
is the basic function to analyze unrelated subjects in a homogeneous population. -
SPAmix
extendsSPACox
to analyze an admixture population or multiple populations. The method is still only valid to analyze unrelated subjects. -
SPAGRM
extendsSPACox
to analyze a study cohort in which subjects can be genetically related to each other. The method is still only valid to analyze a homogeneous population.
Important notes about function GRAB.NullModel
-
If the left side of argument
formula
is model residual, please specify the argumenttraitType = "Residuals"
. Otherwise, the argumenttraitType
can be specified based on the type of trait in analysis. -
For method
SPAmix
, the top SNP-derived PC and related information is required in the argumentsformula
andcontrol
. -
For method
SPAGRM
, argumentsGenoFile
,GenoFileIndex
, andSparseGRMFile
are required to characterize the family relatedness.
Quick Start-up Guide
The below gives examples to demonstrate the usage of SPACox
, SPAmix
, and SPAGRM
approaches.
Step 1. Read in data and fit a null model
library(GRAB)
library(survival)
PhenoFile = system.file("extdata", "simuPHENO.txt", package = "GRAB")
PhenoData = data.table::fread(PhenoFile, header = T)
obj.SPACox = GRAB.NullModel(
Surv(SurvTime, SurvEvent) ~ AGE + GENDER,
data = PhenoData,
subjData = IID,
method = "SPACox",
traitType = "time-to-event"
)
SPACox
method can also support model residuals as input. The above codes are the same as below.
obj.coxph = coxph(
Surv(SurvTime, SurvEvent) ~ AGE + GENDER,
data = PhenoData
)
obj.SPACox = GRAB.NullModel(
obj.coxph$residuals ~ AGE + GENDER,
data = PhenoData,
subjData = IID,
method = "SPACox",
traitType = "Residual"
)
SPAmix
method also support both original trait or model residuals as input. For SPAmix
, the confounding factors of SNP-derived PCs are required and should be specified in control
.
PhenoFile = system.file("extdata", "simuPHENO.txt", package = "GRAB")
PhenoData = data.table::fread(PhenoFile, header = T)
obj.SPAmix = GRAB.NullModel(
Surv(SurvTime, SurvEvent) ~ AGE + GENDER + PC1 + PC2,
data = PhenoData,
subjData = IID,
method = "SPAmix",
traitType = "time-to-event",
control = list(PC_columns = "PC1,PC2")
)
The same results can be obtained via using model residuals
obj.coxph = coxph(
Surv(SurvTime, SurvEvent ) ~ AGE + GENDER + PC1 + PC2,
data = PhenoData
)
obj.SPACox = GRAB.NullModel(
obj.coxph$residuals ~ AGE + GENDER + PC1 + PC2,
data = PhenoData,
subjData = IID,
method = "SPAmix",
traitType = "Residual",
control = list(PC_columns = "PC1,PC2")
)
Step 2. Conduct genome-wide association studies
For different types of traits and methods, the step 2 is the same as below.
GenoFile = system.file("extdata", "simuPLINK.bed", package = "GRAB")
OutputDir = tempdir()
# step 2 for SPACox method
OutputFile = file.path(OutputDir, "Results_SPACox.txt")
GRAB.Marker(obj.SPACox, GenoFile = GenoFile, OutputFile = OutputFile)
# step 2 for SPAmix method
OutputFile = file.path(OutputDir, "Results_SPAmix.txt")
GRAB.Marker(obj.SPAmix, GenoFile = GenoFile, OutputFile = OutputFile)
Detailed documentation about how to use SPAGRMĀ is available atĀ SPAGRM online tutorial.
Citation
-
SPACox: Wenjian Bi, Lars G. Fritsche, Bhramar Mukherjee, Sehee Kim, and Seunggeun Lee. A fast and accurate method for genome-wide time-to-event data analysis and its application to UK Biobank. The American Journal of Human Genetics 107, no. 2 (2020): 222-233.
-
SPAmix: Yuzhuo Ma, He Xu, Ying Li, Hyesung Kim, Lin-lin Xu, Lin Miao, Peng Xu, Fengbiao Mao, Xu-jie Zhou, Wei Zhou, Seunggeun Lee, Ji-Feng Zhang, Peipei Zhang, Wenjian Bi (2025). A scalable, accurate, and universal analysis framework using individual-specific allele frequency for large-scale genetic association studies in an admixed population. Genome Biology in press
-
SPAGRM: He Xu, Yuzhuo Ma, Lin-lin Xu, Yin Li, Yufei Liu, Ying Li, Xu-jie Zhou, Wei Zhou, Seunggeun Lee, Peipei Zhang, Weihua Yue and Wenjian Bi (2025). SPA(GRM): effectively controlling for sample relatedness in large-scale genome-wide association studies of longitudinal traits. Nature Communications 16(1): 1413.