Overview of GRAB package
The GRAB package is mainly designed to conduct genome-wide association studies (GWAS), covering both single-variant and set-based analyses. In addition, the package can be used
- to simulate genotype/phenotype data,
- to calculate sparse GRM, and
- to read in genotype data from PLINK/BGEN files.
Genome-wide association studies
All approaches in the GRAB package share the same two-step analysis framework:
- Step 1: Fit a null model using the trait, covariates, and GRM (if applicable).
- Step 2: Conduct single-variant or set-based tests to identify markers or marker sets (e.g. genes) significantly associated with the trait of interest.
The GRAB package supports multiple trait types, including
- Ordinal categorical trait (POLMM / POLMM-GENE),
- Time-to-event trait (SPACox), and
- Model residuals after fitting a null model.
For binary/quantitative trait analysis, we plan to incorporate SAIGE/SAIGE-GENE into the GRAB package in the future. Currently, the SAIGE package is still being actively updated.
The GRAB package includes the SPACox, SPAmix, and SPAGRM methods, which accept model residuals as input. For any type of trait, users can select an appropriate model to regress the trait on confounding factors and then calculate model residuals. If the residuals sum to zero, they can serve as input for a GWAS.
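As a minimal sketch of the residual-based workflow described above, the following base R code simulates a quantitative trait, regresses it on two confounders, and checks that the residuals sum to zero. The data, the variable names (age, sex, y), and the choice of a linear model are assumptions made for illustration only; a different model may suit other trait types.

```r
# Simulated data; all variable names and effect sizes are illustrative.
set.seed(1)
n <- 500
dat <- data.frame(
  id  = paste0("S", seq_len(n)),
  age = rnorm(n, mean = 50, sd = 10),
  sex = rbinom(n, size = 1, prob = 0.5)
)
dat$y <- 0.02 * dat$age + 0.3 * dat$sex + rnorm(n)

# Fit the trait to the confounders. A linear model is used here because
# the simulated trait is quantitative.
fit <- lm(y ~ age + sex, data = dat)
resid_y <- residuals(fit)

# With an intercept in the model, least-squares residuals sum to zero
# up to floating-point error.
resid_sum <- sum(resid_y)
sum_is_zero <- abs(resid_sum) < 1e-8
```

The check on resid_sum mirrors the condition stated above. If a model is fitted without an intercept, the residuals generally do not sum to zero and the check would fail.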
The GRAB package supports the following genotype file formats for association studies:
- PLINK binary files (.bed, .bim, .fam)
- BGEN (.bgen, .bgi, .sample)
Preparation before using GRAB package
The GRAB package can use genotype data to adjust for sample relatedness via a genetic relationship matrix (GRM). PLINK binary files with high-quality genotyped variants are required for that purpose. In our UK Biobank real data analysis, we used the following cutoffs in PLINK to select ~340K SNPs for White British subjects:
--maf 0.05
--indep-pairwise 500 50 0.2
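One possible way to pass these cutoffs to PLINK from within R is sketched below. The file prefixes (ukb_genotyped, grm_snps) and the helper build_plink_args() are hypothetical, and the sketch assumes a plink executable is on the PATH. Whether the original analysis applied both cutoffs in a single command, or used additional filters, is not stated here.

```r
# Build the PLINK argument vector; the two prefixes are hypothetical.
build_plink_args <- function(bfile_prefix, out_prefix) {
  c("--bfile", bfile_prefix,
    "--maf", "0.05",
    "--indep-pairwise", "500", "50", "0.2",
    "--out", out_prefix)
}

plink_args <- build_plink_args("ukb_genotyped", "grm_snps")

# Print the equivalent shell command for inspection.
cat("plink", plink_args, "\n")

# Run only if a plink executable and the input files are actually present.
if (nzchar(Sys.which("plink")) && file.exists("ukb_genotyped.bed")) {
  system2("plink", args = plink_args)
}
```

With --indep-pairwise, PLINK writes the IDs of the retained variants to a .prune.in file under the output prefix. A second PLINK run with --extract and --make-bed can then produce the pruned binary fileset.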
If the sample size in the analysis is greater than 100,000, we recommend using a sparse GRM (instead of a dense GRM) to adjust for sample relatedness. The function getSparseGRM() internally uses the GCTA software (gcta_1.93.1beta) to make a SparseGRMFile, which is passed to downstream functions as needed. Because of this dependence on GCTA, the function supports only Linux and PLINK files. Two steps are needed:
- Step 1: Run getTempFilesFullGRM() to save temporary files to tempDir.
- Step 2: Run getSparseGRM() to combine the temporary files to make a SparseGRMFile.
Users can customize parameters including minMafGRM, maxMissingGRM, and nPartsGRM, but these parameters must be consistent between getTempFilesFullGRM() and getSparseGRM(). Otherwise, the temporary files cannot be located correctly.
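One way to keep the shared parameters consistent is to define them once and pass the same list to both functions, as in the hedged sketch below. The wrapper make_sparse_grm() is hypothetical, and so are the numeric values. The argument names PlinkFile and partParallel are assumptions that are not confirmed by the text above. Your installed version may also require further arguments, so check ?getTempFilesFullGRM and ?getSparseGRM before use.

```r
# Shared settings, defined once so both steps receive identical values.
# The numeric values are illustrative, not recommendations.
grm_settings <- list(
  nPartsGRM     = 2,
  minMafGRM     = 0.01,
  maxMissingGRM = 0.1
)

# Hypothetical wrapper; PlinkFile and partParallel are assumed argument names.
make_sparse_grm <- function(plink_prefix, sparse_grm_file, temp_dir,
                            settings = grm_settings) {
  # Step 1: write temporary files for each part of the full GRM.
  for (part in seq_len(settings$nPartsGRM)) {
    do.call(
      GRAB::getTempFilesFullGRM,
      c(list(PlinkFile = plink_prefix, partParallel = part,
             tempDir = temp_dir),
        settings)
    )
  }
  # Step 2: combine the temporary files into the sparse GRM file.
  do.call(
    GRAB::getSparseGRM,
    c(list(PlinkFile = plink_prefix, SparseGRMFile = sparse_grm_file,
           tempDir = temp_dir),
      settings)
  )
}
```

Defining the wrapper does not call GRAB or GCTA. It would be invoked on a Linux machine with, for example, make_sparse_grm("path/to/plink_prefix", "SparseGRM.txt", "temp_grm"), where all three paths are placeholders.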
Other functions in GRAB package
The GRAB package provides some additional functions for users' convenience. More details can be found in the corresponding sections.
Data Simulation
The GRAB package can be used to simulate genotype/phenotype data.
Genetic Relationship Matrix (GRM)
The GRAB package supports a sparse GRM to adjust for family relatedness. The functions getTempFilesFullGRM() and getSparseGRM() can be used to generate a sparse GRM from PLINK files.
Read in Genotype Data
The GRAB package can also be used to read genotype data into R from PLINK and BGEN files.