# Phenotype simulation
The below demonstrates the simulation of
- Binary trait
- Quantitative trait
- Ordinal categorical trait
- Time-to-event trait (to be updated)
The phenotype data system.file("extdata", "simuPHENO.txt", package = "GRAB")
is simulated as below line by line.
First, we simulate covariates data frame in R
library(GRAB)
set.seed(678910)
FamFile = system.file("extdata", "simuPLINK.fam", package = "GRAB")
FamData = data.table::fread(FamFile)
IID = FamData$V2 # Individual ID
n = length(IID) # sample size
Covar = data.table::data.table(IID = IID,
AGE = rnorm(n, 60),
GENDER = rbinom(n, 1, 0.5))
Then, we simulate linear predictors eta
beta.AGE = 0.5
beta.GENDER = 0.5
Covar = Covar %>% mutate(eta = beta.AGE * AGE + beta.GENDER * GENDER)
To simulate phenotypes for subjects with a given family structure, please add a random effect to the linear predictors eta
as below.
bVec = GRAB.SimubVec(500, 50, "10-members", tau = 1)
Covar = merge(Covar, bVec)
PhenoData = Covar %>% mutate(eta = eta + bVec)
Next, we simulate phenotypes using eta
A. binary trait
PhenoData = PhenoData %>% mutate(BinaryPheno = GRAB.SimuPheno(eta, traitType = "binary",
control = list(pCase=0.1)))
PhenoData %>% select(BinaryPheno) %>% table()
# Random number seed: 493536667
# .
# 0 1
# 900 100
B. quantitative trait
PhenoData = PhenoData %>% mutate(QuantPheno = GRAB.SimuPheno(eta, traitType = "quantitative",
control = list(sdError=1)))
# Random number seed: 98062725
C. ordinal categorical trait
PhenoData = PhenoData %>% mutate(OrdinalPheno = GRAB.SimuPheno(eta, traitType = "ordinal",
control = list(pEachGroup = c(8,1,1))))
PhenoData %>% select(OrdinalPheno) %>% table()
# Random number seed: 418335750
# .
# 0 1 2
# 800 100 100
D. time-to-event trait
TimeToEventPheno = GRAB.SimuPheno(PhenoData$eta, traitType = "time-to-event", control = list(eventRate = 0.1))
# Random number seed: 881619480
PhenoData = cbind(PhenoData, TimeToEventPheno)
The simulated phenotype data is as below
head(PhenoData)
# IID AGE GENDER eta bVec BinaryPheno QuantPheno OrdinalPheno SurvTime SurvEvent
# 1: Subj-1 59.61118 0 29.59961 -0.2059781 0 0.2786663 0 0.03245097 0
# 2: Subj-10 61.31636 0 29.32101 -1.3371678 0 -2.2973289 0 0.19733412 0
# 3: Subj-100 59.16625 0 29.90944 0.3263173 0 -2.4870252 1 0.09451178 0
# 4: Subj-101 60.18490 1 28.99876 -1.5936861 0 -3.4063276 0 0.10946053 0
# 5: Subj-102 58.76700 1 27.90020 -1.9832990 0 -3.4421444 0 0.04882298 0
# 6: Subj-103 59.98608 1 29.11236 -1.3806732 0 -1.3891887 0 0.30540795 0
The phenotype data is stored in the below file system.file("extdata", "simuPHENO.txt", package = "GRAB")
PhenoFile = system.file("extdata", "simuPHENO.txt", package = "GRAB")
data.table::fwrite(PhenoData, PhenoFile,
row.names = F, quote = F, col.names = T, sep = "\t")
The distribution of simulated phenotypes
The distribution of simulated phenotypes compared to linear predicators eta
is as below.
- quantitative phenotype: positive correction between
eta
and phenotypes - binary phenotype: higher
eta
, higher possibility of being cases - ordinal categorical phenotype: higher
eta
, higher possibility of being groups with larger number