Overview
The GRAB package is primarily designed to conduct genome-wide association studies (GWAS) for both single-variant and set-based analyses. Additionally, the package can be used:
- to simulate genotype/phenotype data
- to calculate sparse GRM
- to read genotype data from PLINK/BGEN files
Genome-wide Association Studies
All approaches share the same analysis framework, which includes the following two steps:
- Step 1: Fit a null model using traits, covariates, and a GRM (if applicable).
- Step 2: Conduct single-variant or set-based tests to identify markers or marker-sets (e.g., genes) significantly associated with the trait of interest.
The GRAB package supports multiple trait types including:
- Ordinal categorical traits (POLMM / POLMM-GENE)
- Time-to-event traits (SPACox)
- Model residuals after fitting a null model
The GRAB package includes SPACox, SPAmix, and SPAGRM methods that support model residuals as input. For any type of trait, users can select an appropriate model to fit the trait against confounding factors and then calculate model residuals. If the sum of the residuals is zero, they can serve as input for a GWAS.
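The residual-based workflow described above can be sketched in base R. This is a minimal illustration, assuming a quantitative trait and an ordinary linear model with an intercept; the simulated variables (`age`, `sex`, `trait`) are hypothetical, and the sketch does not show how the residuals are then passed to SPACox, SPAmix, or SPAGRM.

```r
# Simulate a hypothetical trait with two confounders (illustrative data only).
set.seed(1)
n <- 200
age <- rnorm(n, mean = 50, sd = 10)
sex <- rbinom(n, size = 1, prob = 0.5)
trait <- 0.03 * age + 0.5 * sex + rnorm(n)

# Fit the trait against the confounders. Any model suited to the trait type
# could be used here; a linear model is only an example.
fit <- lm(trait ~ age + sex)
traitResid <- residuals(fit)

# Because the model includes an intercept, least-squares residuals sum to zero
# up to floating-point error, which is the condition stated above.
residSum <- sum(traitResid)
stopifnot(abs(residSum) < 1e-8)
```

For models whose residuals do not automatically sum to zero, the same check on `sum(traitResid)` indicates whether the condition holds before the residuals are used.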
The GRAB package supports the following genotype file formats for association studies:
- PLINK binary files (.bed, .bim, .fam)
- BGEN (.bgen, .bgi, .sample)
Preparation Before Using the GRAB Package
The GRAB package supports using genotype data to adjust for sample relatedness via a genetic relationship matrix (GRM). PLINK binary files with high-quality genotyped variants are required for this purpose. In the UK Biobank real data analysis, we used the following cutoffs in PLINK to select ~340K SNPs for White British subjects:
--maf 0.05
--indep-pairwise 500 50 0.2
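As a rough sketch of how these two cutoffs might be applied, the R snippet below assembles PLINK command lines without running them. The file prefixes (`ukb_genotyped`, `grm_prune`, `grm_snps`) are hypothetical, and combining both cutoffs in one pruning call followed by an `--extract` step is an assumption about the workflow, not something stated above.

```r
# Hypothetical file prefixes; replace them with real paths.
inPrefix  <- "ukb_genotyped"  # input PLINK binary fileset (.bed/.bim/.fam)
pruneOut  <- "grm_prune"      # prefix for the pruning output lists
grmPrefix <- "grm_snps"       # prefix for the final fileset used for the GRM

# Pass 1: apply the MAF filter and LD pruning; PLINK writes <pruneOut>.prune.in.
pruneArgs <- c("--bfile", inPrefix,
               "--maf", "0.05",
               "--indep-pairwise", "500", "50", "0.2",
               "--out", pruneOut)

# Pass 2: keep only the retained variants and write a new binary fileset.
extractArgs <- c("--bfile", inPrefix,
                 "--extract", paste0(pruneOut, ".prune.in"),
                 "--make-bed",
                 "--out", grmPrefix)

pruneCmd   <- paste("plink", paste(pruneArgs, collapse = " "))
extractCmd <- paste("plink", paste(extractArgs, collapse = " "))
cat(pruneCmd, extractCmd, sep = "\n")

# To run the commands (requires PLINK on the PATH):
# system2("plink", pruneArgs)
# system2("plink", extractArgs)
```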
If the sample size in the analysis is greater than 100,000, we recommend using sparse GRM (instead of dense GRM) to adjust for sample relatedness. The function getSparseGRM() internally uses GCTA software (we tested v1.93) to create a SparseGRMFile that will be passed to functions as needed. As required by the GCTA software, the function only supports Linux and PLINK files. Two steps are required:
- Step 1: Run getTempFilesFullGRM() to save temporary files to tempDir.
- Step 2: Run getSparseGRM() to combine the temporary files to create a SparseGRMFile.
Users can customize parameters including minMafGRM, maxMissingGRM, and nPartsGRM, but the same values must be passed to both getTempFilesFullGRM() and getSparseGRM(). Otherwise, the temporary files cannot be accurately located.
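One way to keep the two steps consistent is to define the shared parameters once and reuse them in both calls. The sketch below is illustrative only. The wrapper `buildSparseGRM()`, the `grmSettings` list, and the numeric values are hypothetical. The argument order and the `partParallel` and `gcta64File` arguments are assumptions about the function signatures and should be checked against `?getTempFilesFullGRM` and `?getSparseGRM`.

```r
# Shared settings, defined once so that both steps receive identical values.
# The numbers are illustrative, not recommendations.
grmSettings <- list(nPartsGRM = 2, minMafGRM = 0.01, maxMissingGRM = 0.1)

# Hypothetical wrapper: defining it does not require GRAB or GCTA, but calling
# it does (Linux only, per the requirements described above).
buildSparseGRM <- function(plinkPrefix, gcta64File, tempDir, SparseGRMFile,
                           settings = grmSettings) {
  # Step 1: write temporary files for each part of the full GRM.
  # ASSUMPTION: positional order (PLINK prefix, nPartsGRM, part index, GCTA path).
  for (part in seq_len(settings$nPartsGRM)) {
    GRAB::getTempFilesFullGRM(plinkPrefix, settings$nPartsGRM, part, gcta64File,
                              tempDir = tempDir,
                              minMafGRM = settings$minMafGRM,
                              maxMissingGRM = settings$maxMissingGRM)
  }
  # Step 2: combine the temporary files into the sparse GRM file, reusing the
  # same settings so the temporary files can be found.
  GRAB::getSparseGRM(plinkPrefix, settings$nPartsGRM, SparseGRMFile,
                     tempDir = tempDir,
                     minMafGRM = settings$minMafGRM,
                     maxMissingGRM = settings$maxMissingGRM)
}
```

Reading every value from a single list means a change to, say, `nPartsGRM` cannot reach one step without also reaching the other.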
Other Functions in the GRAB Package
The GRAB package provides additional functions to facilitate user workflows. More details can be found in the corresponding sections.
Data Simulation
The GRAB package can be used to simulate genotype/phenotype data.
Genetic Relationship Matrix (GRM)
The GRAB package supports sparse GRM to adjust for family relatedness. Functions getTempFilesFullGRM() and getSparseGRM() can be used to generate a sparse GRM using PLINK files.
Reading Genotype Data
The GRAB package can also be used to read genotype data into R from PLINK and BGEN files.