Running Haplin analysis

Running the analysis

By default, Haplin estimates the relative risk (RR) of a phenotype associated with a haplotype, based on triad or dyad genotype data. The output of the genDataPreprocess function (or genDataLoad) is used to run the analysis. Haplin analysis is run by the single command:

haplin( my.prepared.gen.data )

CAUTION: this command will try to provide estimates based on all the markers in the data object! Therefore, if you have a large dataset, such as from GWAS analysis, first try running a scan over the markers with a small window size, to determine where to focus your subsequent more detailed analysis:

haplinSlide( my.prepared.gen.data, use.missing = TRUE, winlength = 3 )

This performs haplin analysis on the marker window of length given by the winlegth argument above in order to search for the most significant regions in the dataset.

For more options and examples of how to run Haplin, see below or the haplin help file, obtained by writing in R:

?haplin

and

?haplinSlide

Exemplary Haplin runs

To test that Haplin runs properly, you can use the exemplary data provided with Haplin.

Trial run no.1

Here, the data includes only genotypes and the analysis is performed on all markers:

dir.exmpl <- system.file( "extdata", package = "Haplin" )
exemplary.file1 <- paste0( dir.exmpl, "/HAPLIN.trialdata.txt" )

trial.data1 <- genDataRead( file.in = exemplary.file1, file.out = "trial_data1",
    dir.out = ".", format = "haplin", n.vars = 0 )
## Reading the data in chunks...
##  -- chunk 1 --
##  -- chunk 2 -- 
## ... done reading.
## Reading the marker names...
## Warning: No map file given, map file empty or the number of map file rows not
## equal to the number of markers in data; will generate dummy marker names.
## ...done
## Preparing data...
## ... done preparing
## Saving data...
## ... saved to files: ./trial_data1_gen.ffData, ./trial_data1_gen.RData
trial.data1.prep <- genDataPreprocess( data.in = trial.data1, design = "triad",
  file.out = "trial_data1_prep", dir.out = "." )
## Reading the marker names...
## Warning: No map file given, map file empty or the number of map file rows not
## equal to the number of markers in data; will generate dummy marker names.
## ...done
## Recoding genetic data (no. of loci: 2)...
##    ...running on only one CPU core! This may take some time...
##    ...checking alleles per SNP...
## opening ff /tmp/Rtmpsqe1qO/ff/ff10e4377c111c.ff
##    ...done, all alleles: 1 2 3 4 C T 
##    ...recoding SNPs...
##    ...done
## Saving data... 
## ... saved to files: ./trial_data1_prep_gen.ffData ,  ./trial_data1_prep_gen.RData
haplin( trial.data1.prep, use.missing = TRUE, maternal = TRUE )
## 
## ## HAPLIN, VERSION 7.3.2 ##
## opening ff /tmp/Rtmpsqe1qO/ff/ff10e4357edc43.ff
## There were 5 rows with missing data
## All rows retained in analysis
## No lines contained Mendelian inconsistencies
## 
## Running EM for preliminary estimates of haplotype frequencies...  Done
## Removing unused haplotypes...  Done
## 
## Using EM to estimate model with no effect:
## EM iter: 1    |GLM deviance: 0            |Coefficients: 2.89196e-18  1.55109e-18  3.73751e-19  1.55355e-18  9.90351e-19  
## EM iter: 2    |GLM deviance: 345.258      |Coefficients: -0.899013    0.0911016    -1.40653     0.684214     -1.50184     
## EM iter: 3    |GLM deviance: 344.375      |Coefficients: -0.90182     0.0914246    -1.41264     0.685363     -1.50184     
## EM iter: 4    |GLM deviance: 344.373      |Coefficients: -0.901826    0.0914253    -1.41265     0.685366     -1.50184     
## EM iter: 5    |GLM deviance: 344.373      |Coefficients: -0.901826    0.0914253    -1.41265     0.685366     -1.50184     
## EM iter: 6    |GLM deviance: 344.373      |Coefficients: -0.901826    0.0914253    -1.41265     0.685366     -1.50184     
## 
## Done
## 
## Using EM to estimate full model:
## EM iter: 1    |GLM deviance: 0            |Coefficients: 2.88242e-18  1.54772e-18  3.29104e-19  1.55e-18     9.58497e-19  8.99662e-21  1.80698e-23  6.48172e-20  1.19262e-20  -3.14362e-19 4.07141e-20  -5.25161e-21 4.06689e-20  1.70197e-20  8.07211e-20  -4.69819e-22 -9.56356e-21 2.30845e-21  1.36843e-18  3.81334e-20  8.52962e-20  3.74784e-20  2.76302e-20  
## EM iter: 2    |GLM deviance: 324.308      |Coefficients: -0.786562    0.234966     -1.23777     0.627763     -1.46355     -0.603638    -0.37483     -0.782319    -0.137819    0.942851     -0.262305    0.700751     -0.274264    0.972414     0.30563      0.245949     0.28865      0.00556478   -0.468796    -0.353194    0.717524     0.521617     -14.3119     
## EM iter: 3    |GLM deviance: 324.026      |Coefficients: -0.785755    0.238644     -1.24484     0.632153     -1.46081     -0.609407    -0.3795      -0.77606     -0.14077     0.953784     -0.258198    0.703041     -0.281416    0.974712     0.301628     0.239555     0.284818     0.00131739   -0.462487    -0.346801    0.728309     0.512107     -14.3091     
## EM iter: 4    |GLM deviance: 324.025      |Coefficients: -0.785716    0.238678     -1.24486     0.632184     -1.46078     -0.609489    -0.379552    -0.776042    -0.140807    0.953882     -0.258161    0.70306      -0.28148     0.974733     0.301582     0.239504     0.284766     0.00128555   -0.46244     -0.346727    0.728394     0.512041     -14.3091     
## EM iter: 5    |GLM deviance: 324.025      |Coefficients: -0.785716    0.238678     -1.24486     0.632185     -1.46078     -0.60949     -0.379553    -0.776042    -0.140807    0.953884     -0.258161    0.70306      -0.281481    0.974733     0.301581     0.239504     0.284766     0.0012854    -0.46244     -0.346726    0.728394     0.51204      -14.3091     
## EM iter: 6    |GLM deviance: 324.025      |Coefficients: -0.785716    0.238678     -1.24486     0.632185     -1.46078     -0.60949     -0.379553    -0.776042    -0.140807    0.953884     -0.258161    0.70306      -0.281481    0.974733     0.301581     0.239504     0.284766     0.0012854    -0.46244     -0.346726    0.728394     0.51204      -14.3091     
## 
## Done
## 
## Estimation finished, preparing output... Done
## 
## Note: Some relative risk estimates fall outside the default plotting range.
## Consider replotting, with argument "ylim" set wider
## 
## #################################
## ----Arguments supplied to haplin in this run:----
## 
## filespecs: markers = 1:2
## model: design = "triad", use.missing = TRUE, xchrom = FALSE, comb.sex = "double", maternal = TRUE, poo = FALSE, test.maternal = FALSE, scoretest = "no"
## variables: ccvar = NULL, strata = NULL, sex = NULL
## haplos: reference = "reciprocal", response = "free", threshold = 0.01, max.haplos = NULL, haplo.file = NULL
## control: resampling = "no", max.EM.iter = 50, data.out = "no", verbose = TRUE, printout = TRUE
## 
## ----Data summary:----
## 
## Number of triads in original file: 249 
## 
## Accounting for possible loss of triads:
##                                                   
##  Cause of loss     Triads removed Triads remaining
##  Missing data                   0              249
##  Mendelian incons.              0              249
##  Unused haplotypes              5              244
## 
## Triads remaining for analysis: 244 
## 
## NOTE: In the following, the most frequent allele
##  is printed as upper-case, all others are lower-case
## 
## Marker m1 (raw counts):
## Missing alleles:  6 
##  Allele Frequency Percent
##  1       149       10.0  
##  2       412       27.7  
##  3        87        5.8  
##  4       840       56.5  
##  total  1488      100.0  
## Chi-squared test for HWE, p-value:  0.6585 
## 
## Marker m2 (raw counts):
## Missing alleles:  4 
##  Allele Frequency Percent
##  C      1397       93.8  
##  t        93        6.2  
##  total  1490      100.0  
## Chi-squared test for HWE, p-value:  0.5721 
## 
## --------
## Haplotypes removed because of low frequencies:
## 1-t 2-t 3-t 
## 
## Haplotypes used in the analysis, with coding:
## 1-C 2-C 3-C 4-C 4-t 
##   1   2   3   4   5 
## 
## ----Estimation results:----
## 
## Date of call:
## Thu Oct 31 21:08:12 2024 
## 
## Number of triads: 244
## 
## Number of haplotypes: 5
## 
## Haplotype frequencies with 95% confidence intervals:
##  Haplotype Frequency(%) lower upper
##  1-C       10.95         7.94 14.94
##  2-C       30.62        25.78 35.82
##  3-C        6.94         4.58 10.40
##  4-C       45.39        39.95 50.80
##  4-t        5.60         3.55  8.81
## 
## Single- and double dose effects (Relative Risk) with 95% confidence intervals:
## Reference method: reciprocal
## Response model: free
## 
## ----Child haplotypes----
##  Haplotype Dose     Relative Risk Lower CI   Upper CI   P-value   
##  1-C       Single   0.729         0.454      1.17       0.186     
##  1-C       Double   1.22          0.396      3.75       0.731     
##                                                                   
##  2-C       Single   1.04          0.703      1.55       0.826     
##  2-C       Double   0.642         0.315      1.34       0.23      
##                                                                   
##  3-C       Single   0.611         0.342      1.1        0.103     
##  3-C       Double   0.67          0.0811     5.34       0.705     
##                                                                   
##  4-C       Single   1.62          1.02       2.55       0.0411    
##  4-C       Double   1.89          1.03       3.35       0.0353    
##                                                                   
##  4-t       Single   1.25          0.702      2.23       0.446     
##  4-t       Double   3.51          0.709      17.7       0.125     
##                                                                   
## ----Maternal haplotypes----
##  Haplotype Dose     Relative Risk Lower CI   Upper CI   P-value   
##  1-C       Single   1.18          0.746      1.87       0.487     
##  1-C       Double   0.912         0.192      4.18       0.906     
##                                                                   
##  2-C       Single   1.07          0.716      1.59       0.754     
##  2-C       Double   0.874         0.437      1.75       0.7       
##                                                                   
##  3-C       Single   1.13          0.652      2.03       0.653     
##  3-C       Double   2.78          0.567      14.2       0.212     
##                                                                   
##  4-C       Single   0.797         0.529      1.19       0.27      
##  4-C       Double   1.05          0.619      1.76       0.868     
##                                                                   
##  4-t       Single   0.828         0.461      1.51       0.536     
##  4-t       Double   2.98e-05      0          Inf        1         
##                                                                   
## 
## Overall test for difference between null model (no effects) and full model:
## ------------
## LIKELIHOOD RATIO TEST:                                     
##  Loglike null model:       -1228.0728
##  Loglike full model:       -1217.7436
##  df:                          18.0000
##  Likelihood ratio p-value:     0.2970
## 
## (NOTE: The test may be sensitive to rare haplotypes)
Results of trial run no.1

Results of trial run no.1

First, the information about the calculation process is printed:

  • how many rows contained missing data and how many were dropped (if use.missing = F);
  • information about any other reasons to drop data, e.g., Mandelian inconsistencies;
  • process of EM estimation - for each iteration, the deviance and values of the coefficients of the model; NB: there should not be many iterations, which would indicate some problems with the chosen model or with the input data

Next, the default is to print the summary of the results:

  • the full information about the input data and arguments;
  • summary of all the data parts dropped due to some inconsistencies;
  • the frequency of alleles in the markers and the results of testing for Hardy-Weinberg equilibrium (HWE);
  • summary of the haplotypes found;
  • and finally, the model estimations with relative risks for each of the haplotypes.

As you can see, a lot of information is printed out by default. One can change it by setting options verbose and/or printout to FALSE. Moreover, it’s usually quite useful to save the results into an object:

my.results <- haplin( trial.data1.prep, use.missing = TRUE, maternal = TRUE,
  verbose = FALSE, printout = FALSE )
my.results
## This is the result of a haplin run.
## Number of data lines used: 244 | Number of haplotypes used: 5
## Please use the "summary", "plot", "haptable" or "output" functions to obtain
##  more details.

To access the details of the results, we can use the following functions with the saved object my.results as the argument:

  • summary, which outputs all the results in a nicely formatted text (as outputted above) or,
  • haptable, which gives the same data only in a matrix format,
  • plot, which replots the figure with the results,
  • output, which saves the results to text and JPG files.

Trial run no.2

This example shows analysis of data where apart from genotypes, there are two covariate columns (n.vars = 2), one coding for the case-control status (ccvar = 2). In the analysis, we take into account all the available data, imputing the parts that are missing (use.missing = TRUE). The estimation will be based on the dose-response model (response = "mult").

exemplary.file2 <- paste0( dir.exmpl, "/HAPLIN.trialdata2.txt" )

trial.data2 <- genDataRead( file.in = exemplary.file2, file.out = "trial_data2",
    dir.out = ".", format = "haplin", allele.sep = "", n.vars = 2 )
## The format of the file is 'haplin' with covariate data but no names of the covariate data is given. Will generate dummy names.
## Reading the data in chunks...
##  -- chunk 1 --
##  -- chunk 2 -- 
## ... done reading.
## Reading the marker names...
## Warning: No map file given, map file empty or the number of map file rows not
## equal to the number of markers in data; will generate dummy marker names.
## ...done
## Preparing data...
## ... done preparing
## Saving data...
## ... saved to files: ./trial_data2_gen.ffData, ./trial_data2_gen.RData
trial.data2.prep <- genDataPreprocess( data.in = trial.data2, design = "triad",
  file.out = "trial_data2_prep", dir.out = "." )
## Reading the marker names...
## Warning: No map file given, map file empty or the number of map file rows not
## equal to the number of markers in data; will generate dummy marker names.
## ...done
## Recoding covariate data...
## ...done
## Recoding genetic data (no. of loci: 2)...
##    ...running on only one CPU core! This may take some time...
##    ...checking alleles per SNP...
## opening ff /tmp/Rtmpsqe1qO/ff/ff10e48e29aac.ff
##    ...done, all alleles: A G B 
##    ...recoding SNPs...
##    ...done
## Saving data... 
## ... saved to files: ./trial_data2_prep_gen.ffData ,  ./trial_data2_prep_gen.RData
haplin( trial.data2.prep, use.missing = TRUE, ccvar = 2, design =
  "cc.triad", reference = "ref.cat", response = "mult" )
## 
## ## HAPLIN, VERSION 7.3.2 ##
## opening ff /tmp/Rtmpsqe1qO/ff/ff10e42180da12.ff
## There were 508 rows with missing data
## All rows retained in analysis
## The following 5 data lines were dropped due to Mendelian inconsistencies:
##  240 257 302 320 780 
## 
## Running EM for preliminary estimates of haplotype frequencies...  Done
## Removing unused haplotypes...  Done
## 
## Note:
## The selected case/control variable is column 2: 'cov.2.c'
## The following case/control coding and frequencies have been found:
## controls(0): 762, cases(1): 377
## 
## Using EM to estimate model with no effect:
## EM iter: 1    |GLM deviance: 0            |Coefficients: -1.14634e-18 1.27688e-18  9.86974e-19  1.77371e-18  
## EM iter: 2    |GLM deviance: 359.799      |Coefficients: -0.713055    -1.18646     0.320706     1.27225      
## EM iter: 3    |GLM deviance: 104.965      |Coefficients: -0.713055    -1.90555     0.224806     1.34845      
## EM iter: 4    |GLM deviance: 101.348      |Coefficients: -0.713055    -2.02624     0.2036       1.3596       
## EM iter: 5    |GLM deviance: 101.879      |Coefficients: -0.713055    -2.04019     0.19944      1.36137      
## EM iter: 6    |GLM deviance: 101.978      |Coefficients: -0.713055    -2.04171     0.198655     1.36167      
## EM iter: 7    |GLM deviance: 101.994      |Coefficients: -0.713055    -2.04188     0.198509     1.36172      
## EM iter: 8    |GLM deviance: 101.997      |Coefficients: -0.713055    -2.04189     0.198483     1.36173      
## EM iter: 9    |GLM deviance: 101.998      |Coefficients: -0.713055    -2.0419      0.198478     1.36173      
## EM iter: 10   |GLM deviance: 101.998      |Coefficients: -0.713055    -2.0419      0.198477     1.36173      
## EM iter: 11   |GLM deviance: 101.998      |Coefficients: -0.713055    -2.0419      0.198477     1.36173      
## EM iter: 12   |GLM deviance: 101.998      |Coefficients: -0.713055    -2.0419      0.198477     1.36173      
## EM iter: 13   |GLM deviance: 101.998      |Coefficients: -0.713055    -2.0419      0.198477     1.36173      
## 
## Done
## 
## Using EM to estimate full model:
## EM iter: 1    |GLM deviance: 0            |Coefficients: -1.04221e-18 1.27807e-18  1.01006e-18  1.74902e-18  1.82394e-19  -1.92831e-19 
## EM iter: 2    |GLM deviance: 310.934      |Coefficients: -0.808547    -1.05734     0.26884      1.27994      -1.28603     0.328171     
## EM iter: 3    |GLM deviance: 89.7213      |Coefficients: -0.893282    -1.79929     0.167796     1.36204      -0.501711    0.387945     
## EM iter: 4    |GLM deviance: 86.3708      |Coefficients: -0.905895    -1.9526      0.145439     1.37468      -0.342099    0.402278     
## EM iter: 5    |GLM deviance: 87.1614      |Coefficients: -0.908033    -1.97423     0.140962     1.37675      -0.319404    0.405527     
## EM iter: 6    |GLM deviance: 87.3163      |Coefficients: -0.908428    -1.97705     0.140093     1.3771       -0.316378    0.40624      
## EM iter: 7    |GLM deviance: 87.3418      |Coefficients: -0.908505    -1.97742     0.139926     1.37716      -0.315974    0.406393     
## EM iter: 8    |GLM deviance: 87.346       |Coefficients: -0.90852     -1.97747     0.139895     1.37717      -0.315919    0.406426     
## EM iter: 9    |GLM deviance: 87.3468      |Coefficients: -0.908524    -1.97747     0.139889     1.37718      -0.315911    0.406432     
## EM iter: 10   |GLM deviance: 87.3469      |Coefficients: -0.908524    -1.97747     0.139888     1.37718      -0.31591     0.406434     
## EM iter: 11   |GLM deviance: 87.3469      |Coefficients: -0.908524    -1.97747     0.139888     1.37718      -0.315909    0.406434     
## EM iter: 12   |GLM deviance: 87.3469      |Coefficients: -0.908524    -1.97747     0.139887     1.37718      -0.315909    0.406434     
## EM iter: 13   |GLM deviance: 87.3469      |Coefficients: -0.908524    -1.97747     0.139887     1.37718      -0.315909    0.406434     
## 
## Done
## 
## Estimation finished, preparing output... Done
## 
## #################################
## ----Arguments supplied to haplin in this run:----
## 
## filespecs: markers = 1:2
## model: design = "cc.triad", use.missing = TRUE, xchrom = FALSE, comb.sex = "double", maternal = FALSE, poo = FALSE, test.maternal = FALSE, scoretest = "no"
## variables: ccvar = 2, strata = NULL, sex = NULL
## haplos: reference = "ref.cat", response = "mult", threshold = 0.01, max.haplos = NULL, haplo.file = NULL
## control: resampling = "no", max.EM.iter = 50, data.out = "no", verbose = TRUE, printout = TRUE
## 
## ----Data summary:----
## 
## Number of triads in original file: 1139 
## 
## Accounting for possible loss of triads:
##                                                   
##  Cause of loss     Triads removed Triads remaining
##  Missing data                   0             1139
##  Mendelian incons.              5             1134
##  Unused haplotypes              0             1134
## 
## Triads remaining for analysis: 1134 
## 
## NOTE: In the following, the most frequent allele
##  is printed as upper-case, all others are lower-case
## 
## Marker m1 (raw counts):
## Missing alleles:  1424 
##  Allele Frequency Percent
##  a      1290       23.8  
##  G      4120       76.2  
##  total  5410      100.0  
## Chi-squared test for HWE, p-value:  0.6875 
## 
## Marker m2 (raw counts):
## Missing alleles:  954 
##  Allele Frequency Percent
##  a       142        2.4  
##  B      5738       97.6  
##  total  5880      100.0  
## Chi-squared test for HWE, p-value:  0.01014 
## 
## --------
## Haplotypes removed because of low frequencies:
## a-a 
## 
## Haplotypes used in the analysis, with coding:
## G-a a-B G-B 
##   1   2   3 
## 
## ----Estimation results:----
## 
## Date of call:
## Thu Oct 31 21:08:13 2024 
## 
## Number of triads: 1134
## 
## Number of haplotypes: 3
## 
## Haplotype frequencies with 95% confidence intervals:
##  Haplotype Frequency(%) lower upper
##  G-a        2.64         2.15  3.23
##  a-B       21.88        20.44 23.38
##  G-B       75.47        73.92 76.95
## 
## Single- and double dose effects (Relative Risk) with 95% confidence intervals:
## Reference method: ref.cat
## Reference category: 3 (Haplotype G-B)
## Response model: mult
## 
## ----Child haplotypes----
##  Haplotype Dose     Relative Risk Lower CI   Upper CI   P-value   
##  G-a       Single   0.729         0.407      1.33       0.294     
##  G-a       Double   0.531         0.166      1.76       0.294     
##                                                                   
##  a-B       Single   1.5           1.23       1.83       4.43e-05  
##  a-B       Double   2.26          1.52       3.34       4.43e-05  
##                                                                   
##  G-B       Single   REF                                           
##  G-B       Double   1             1          1          NA        
##                                                                   
## 
## Overall test for difference between null model (no effects) and full model:
## ------------
## LIKELIHOOD RATIO TEST:                                        
##  Loglike null model:       -3117.5699752
##  Loglike full model:       -3108.4934459
##  df:                           2.0000000
##  Likelihood ratio p-value:     0.0001143
## 
## (NOTE: The test may be sensitive to rare haplotypes)
Results of trial run no.2

Results of trial run no.2


Running analysis on GWAS data: HaplinSlide

This is very similar to running haplin, but the results are a list of haptable. This type of analysis is usually performed when we have much bigger data. Let’s read a proper .ped file:

exemplary.file3 <- paste0( dir.exmpl, "/exmpl_data.ped" )

hapSlide.data <- genDataRead( exemplary.file3, file.out = "hapSlide_data",
    format = "ped" )
## 'n.vars' was not given explicitly and will be set to 6 based on the format given.
## Reading the data in chunks...
##  -- chunk 1 --
##  -- chunk 2 -- 
## ... done reading.
## Reading the marker names...
## Warning: No map file given, map file empty or the number of map file rows not
## equal to the number of markers in data; will generate dummy marker names.
## ...done
## Preparing data...
## ... done preparing
## Saving data...
## ... saved to files: ./hapSlide_data_gen.ffData, ./hapSlide_data_gen.RData
hapSlide.data
## This is raw genetic data read in through genDataRead.
## 
## It contains the following parts:
##    cov.data, gen.data, aux 
## 
## with following dimensions:
##   - covariate variables =  id.fam, id.c, id.f, id.m, sex, cc 
##       (total  6  covariate variables),
##   - number of markers =  429 ,
##   - number of data lines =  1659

The preprocessing of this file would take a while. Let’s focus then on first 100 markers:

hapSlide.subset.data <- genDataGetPart( hapSlide.data, design = "cc",
    markers = 1:100, file.out = "hapSlide_subset_data" )
## Provided arguments:
##  --- chosen markers: 1, 2, 3, 4, 5, 6 ... 95, 96, 97, 98, 99, 100 
## INFO: Will select 1659 rows and 200 columns.
## opening ff /tmp/Rtmpsqe1qO/ff/ff10e41c69314d.ff
## Saving data... 
## ... saved to files: ./hapSlide_subset_data_gen.ffData, ./hapSlide_subset_data_gen.RData
hapSlide.subset.data.prep <- genDataPreprocess( hapSlide.subset.data,
    design = "cc.triad", file.out = "hapSlide_subset_prep" )
## Reading the marker names...
## Warning: No map file given, map file empty or the number of map file rows not
## equal to the number of markers in data; will generate dummy marker names.
## ...done
## Converting PED format to internal haplin...
##    Creating unique IDs for individuals...
##    ...done.
## 
## Checking family and id variables...
## Warning: Found family size larger than 3!  Will extract trios from general
## pedigree.
## Warning: 1100 mother code(s) and 1100 father code(s) refer to non-existing
## individuals and have been set to missing
##    Sorting and re-coding families...
## opening ff /tmp/Rtmpsqe1qO/ff/ff10e45af851e.ff
## Recoding covariate data...
## ...done
## Recoding genetic data (no. of loci: 100)...
##    ...running on only one CPU core! This may take some time...
##    ...checking alleles per SNP...
##    ...done, all alleles: C G A T 
##    ...recoding SNPs...
##    ...done
## Saving data... 
## ... saved to files: ./hapSlide_subset_prep_gen.ffData ,  ./hapSlide_subset_prep_gen.RData

We run the haplinSlide analysis:

hapSlide.res1 <- haplinSlide( hapSlide.subset.data.prep, use.missing = TRUE,
    ccvar = 10, design = "cc.triad", reference = "ref.cat", response = "mult" )
## INFO: No parallel environment selected. Will run in sequential mode.
## Running Haplin on Window 'm1' (1/100)...:
## opening ff /tmp/Rtmpsqe1qO/ff/ff10e45a7ed1ce.ff
## done
## Running Haplin on Window 'm2' (2/100)...: done
## Running Haplin on Window 'm3' (3/100)...: done
## Running Haplin on Window 'm4' (4/100)...: done
## Running Haplin on Window 'm5' (5/100)...: done
## Running Haplin on Window 'm6' (6/100)...: done
## Running Haplin on Window 'm7' (7/100)...: done
## Running Haplin on Window 'm8' (8/100)...: done
## Running Haplin on Window 'm9' (9/100)...: done
## Running Haplin on Window 'm10' (10/100)...: done
## Running Haplin on Window 'm11' (11/100)...: done
## Running Haplin on Window 'm12' (12/100)...: done
## Running Haplin on Window 'm13' (13/100)...: done
## Running Haplin on Window 'm14' (14/100)...: done
## Running Haplin on Window 'm15' (15/100)...: done
## Running Haplin on Window 'm16' (16/100)...: done
## Running Haplin on Window 'm17' (17/100)...: done
## Running Haplin on Window 'm18' (18/100)...: done
## Running Haplin on Window 'm19' (19/100)...: done
## Running Haplin on Window 'm20' (20/100)...: done
## Running Haplin on Window 'm21' (21/100)...: done
## Running Haplin on Window 'm22' (22/100)...: done
## Running Haplin on Window 'm23' (23/100)...: done
## Running Haplin on Window 'm24' (24/100)...: done
## Running Haplin on Window 'm25' (25/100)...: done
## Running Haplin on Window 'm26' (26/100)...: done
## Running Haplin on Window 'm27' (27/100)...: done
## Running Haplin on Window 'm28' (28/100)...: done
## Running Haplin on Window 'm29' (29/100)...: done
## Running Haplin on Window 'm30' (30/100)...: done
## Running Haplin on Window 'm31' (31/100)...: done
## Running Haplin on Window 'm32' (32/100)...: done
## Running Haplin on Window 'm33' (33/100)...: done
## Running Haplin on Window 'm34' (34/100)...: done
## Running Haplin on Window 'm35' (35/100)...: done
## Running Haplin on Window 'm36' (36/100)...: done
## Running Haplin on Window 'm37' (37/100)...: done
## Running Haplin on Window 'm38' (38/100)...: done
## Running Haplin on Window 'm39' (39/100)...: done
## Running Haplin on Window 'm40' (40/100)...: done
## Running Haplin on Window 'm41' (41/100)...: done
## Running Haplin on Window 'm42' (42/100)...: done
## Running Haplin on Window 'm43' (43/100)...: done
## Running Haplin on Window 'm44' (44/100)...: done
## Running Haplin on Window 'm45' (45/100)...: done
## Running Haplin on Window 'm46' (46/100)...: done
## Running Haplin on Window 'm47' (47/100)...: done
## Running Haplin on Window 'm48' (48/100)...: done
## Running Haplin on Window 'm49' (49/100)...: done
## Running Haplin on Window 'm50' (50/100)...: done
## Running Haplin on Window 'm51' (51/100)...: done
## Running Haplin on Window 'm52' (52/100)...: done
## Running Haplin on Window 'm53' (53/100)...: done
## Running Haplin on Window 'm54' (54/100)...: done
## Running Haplin on Window 'm55' (55/100)...: done
## Running Haplin on Window 'm56' (56/100)...: done
## Running Haplin on Window 'm57' (57/100)...: done
## Running Haplin on Window 'm58' (58/100)...: done
## Running Haplin on Window 'm59' (59/100)...: done
## Running Haplin on Window 'm60' (60/100)...: done
## Running Haplin on Window 'm61' (61/100)...: done
## Running Haplin on Window 'm62' (62/100)...: done
## Running Haplin on Window 'm63' (63/100)...: done
## Running Haplin on Window 'm64' (64/100)...: done
## Running Haplin on Window 'm65' (65/100)...: done
## Running Haplin on Window 'm66' (66/100)...: done
## Running Haplin on Window 'm67' (67/100)...: done
## Running Haplin on Window 'm68' (68/100)...: done
## Running Haplin on Window 'm69' (69/100)...: done
## Running Haplin on Window 'm70' (70/100)...: done
## Running Haplin on Window 'm71' (71/100)...: done
## Running Haplin on Window 'm72' (72/100)...: done
## Running Haplin on Window 'm73' (73/100)...: done
## Running Haplin on Window 'm74' (74/100)...: done
## Running Haplin on Window 'm75' (75/100)...: done
## Running Haplin on Window 'm76' (76/100)...: done
## Running Haplin on Window 'm77' (77/100)...: done
## Running Haplin on Window 'm78' (78/100)...: done
## Running Haplin on Window 'm79' (79/100)...: done
## Running Haplin on Window 'm80' (80/100)...: done
## Running Haplin on Window 'm81' (81/100)...: done
## Running Haplin on Window 'm82' (82/100)...: done
## Running Haplin on Window 'm83' (83/100)...: done
## Running Haplin on Window 'm84' (84/100)...: done
## Running Haplin on Window 'm85' (85/100)...: done
## Running Haplin on Window 'm86' (86/100)...: done
## Running Haplin on Window 'm87' (87/100)...: done
## Running Haplin on Window 'm88' (88/100)...: done
## Running Haplin on Window 'm89' (89/100)...: done
## Running Haplin on Window 'm90' (90/100)...: done
## Running Haplin on Window 'm91' (91/100)...: done
## Running Haplin on Window 'm92' (92/100)...: done
## Running Haplin on Window 'm93' (93/100)...: done
## Running Haplin on Window 'm94' (94/100)...: done
## Running Haplin on Window 'm95' (95/100)...: done
## Running Haplin on Window 'm96' (96/100)...: done
## Running Haplin on Window 'm97' (97/100)...: done
## Running Haplin on Window 'm98' (98/100)...: done
## Running Haplin on Window 'm99' (99/100)...: done
## Running Haplin on Window 'm100' (100/100)...: done
## 
##  --- haplinSlide has completed ---

Here, the output is stored as a list of tables, where each table contains results of the haplin run on a certain marker or a set of markers, called a window. The length of the window can be controlled by the winlength argument (see the haplinSlide help for details). Here, the default window length was used, 1 marker.

Investigating and plotting results

Due to the size of haplinSlide results, it is best to narrow down the visualization of the results. If you have installed the ggplot2 package, you can use the functions plotPValues and plot.haplinSlide. First, we advise to plot only the p-values of the estimation we’re interested in, using the plotPValues function:

all.p.values <- plotPValues( hapSlide.res1 )
Results of haplinSlide analysis in a plot of the overall p-values

Results of haplinSlide analysis in a plot of the overall p-values

By default, all the windows are plotted, but we can choose to plot results for the windows only with p-values below a certain threshold:

all.p.values <- plotPValues( hapSlide.res1, plot.signif.only = TRUE,
    signif.thresh = 0.25 )
Results of haplinSlide analysis in a plot of the overall p-values, for windows with p-values below 0.25

Results of haplinSlide analysis in a plot of the overall p-values, for windows with p-values below 0.25

The object all.p.values holds the table with p-values for all the windows plotted.

head( all.p.values )
##    marker haplos   est.name       est. lower upper    p.value star
## m1     m1     m1 pv.overall 0.15404214    NA    NA 0.15404214     
## m2     m2     m2 pv.overall 0.05300921    NA    NA 0.05300921     
## m4     m4     m4 pv.overall 0.07280665    NA    NA 0.07280665     
## m5     m5     m5 pv.overall 0.03280638    NA    NA 0.03280638    *
## m6     m6     m6 pv.overall 0.24185705    NA    NA 0.24185705     
## m7     m7     m7 pv.overall 0.24607721    NA    NA 0.24607721

Special effects: maternal or parent-of-origin

If we ran analysis with estimation of the maternal or parent-of-origin (PoO) effects, then we can choose to plot these p-values:

hapSlide.res2 <- haplinSlide( hapSlide.subset.data.prep, use.missing = TRUE,
  ccvar = 10, design = "cc.triad", poo = TRUE, reference = "ref.cat",
  response = "mult" )
## INFO: No parallel environment selected. Will run in sequential mode.
## Running Haplin on Window 'm1' (1/100)...: done
## Running Haplin on Window 'm2' (2/100)...: done
## Running Haplin on Window 'm3' (3/100)...: done
## Running Haplin on Window 'm4' (4/100)...: done
## Running Haplin on Window 'm5' (5/100)...: done
## Running Haplin on Window 'm6' (6/100)...: done
## Running Haplin on Window 'm7' (7/100)...: done
## Running Haplin on Window 'm8' (8/100)...: done
## Running Haplin on Window 'm9' (9/100)...: done
## Running Haplin on Window 'm10' (10/100)...: done
## Running Haplin on Window 'm11' (11/100)...: done
## Running Haplin on Window 'm12' (12/100)...: done
## Running Haplin on Window 'm13' (13/100)...: done
## Running Haplin on Window 'm14' (14/100)...: done
## Running Haplin on Window 'm15' (15/100)...: done
## Running Haplin on Window 'm16' (16/100)...: done
## Running Haplin on Window 'm17' (17/100)...: done
## Running Haplin on Window 'm18' (18/100)...: done
## Running Haplin on Window 'm19' (19/100)...: done
## Running Haplin on Window 'm20' (20/100)...: done
## Running Haplin on Window 'm21' (21/100)...: done
## Running Haplin on Window 'm22' (22/100)...: done
## Running Haplin on Window 'm23' (23/100)...: done
## Running Haplin on Window 'm24' (24/100)...: done
## Running Haplin on Window 'm25' (25/100)...: done
## Running Haplin on Window 'm26' (26/100)...: done
## Running Haplin on Window 'm27' (27/100)...: done
## Running Haplin on Window 'm28' (28/100)...: done
## Running Haplin on Window 'm29' (29/100)...: done
## Running Haplin on Window 'm30' (30/100)...: done
## Running Haplin on Window 'm31' (31/100)...: done
## Running Haplin on Window 'm32' (32/100)...: done
## Running Haplin on Window 'm33' (33/100)...: done
## Running Haplin on Window 'm34' (34/100)...: done
## Running Haplin on Window 'm35' (35/100)...: done
## Running Haplin on Window 'm36' (36/100)...: done
## Running Haplin on Window 'm37' (37/100)...: done
## Running Haplin on Window 'm38' (38/100)...: done
## Running Haplin on Window 'm39' (39/100)...: done
## Running Haplin on Window 'm40' (40/100)...: done
## Running Haplin on Window 'm41' (41/100)...: done
## Running Haplin on Window 'm42' (42/100)...: done
## Running Haplin on Window 'm43' (43/100)...: done
## Running Haplin on Window 'm44' (44/100)...: done
## Running Haplin on Window 'm45' (45/100)...: done
## Running Haplin on Window 'm46' (46/100)...: done
## Running Haplin on Window 'm47' (47/100)...: done
## Running Haplin on Window 'm48' (48/100)...: done
## Running Haplin on Window 'm49' (49/100)...: done
## Running Haplin on Window 'm50' (50/100)...: done
## Running Haplin on Window 'm51' (51/100)...: done
## Running Haplin on Window 'm52' (52/100)...: done
## Running Haplin on Window 'm53' (53/100)...: done
## Running Haplin on Window 'm54' (54/100)...: done
## Running Haplin on Window 'm55' (55/100)...: done
## Running Haplin on Window 'm56' (56/100)...: done
## Running Haplin on Window 'm57' (57/100)...: done
## Running Haplin on Window 'm58' (58/100)...: done
## Running Haplin on Window 'm59' (59/100)...: done
## Running Haplin on Window 'm60' (60/100)...: done
## Running Haplin on Window 'm61' (61/100)...: done
## Running Haplin on Window 'm62' (62/100)...: done
## Running Haplin on Window 'm63' (63/100)...: done
## Running Haplin on Window 'm64' (64/100)...: done
## Running Haplin on Window 'm65' (65/100)...: done
## Running Haplin on Window 'm66' (66/100)...: done
## Running Haplin on Window 'm67' (67/100)...: done
## Running Haplin on Window 'm68' (68/100)...: done
## Running Haplin on Window 'm69' (69/100)...: done
## Running Haplin on Window 'm70' (70/100)...: done
## Running Haplin on Window 'm71' (71/100)...: done
## Running Haplin on Window 'm72' (72/100)...: done
## Running Haplin on Window 'm73' (73/100)...: done
## Running Haplin on Window 'm74' (74/100)...: done
## Running Haplin on Window 'm75' (75/100)...: done
## Running Haplin on Window 'm76' (76/100)...: done
## Running Haplin on Window 'm77' (77/100)...: done
## Running Haplin on Window 'm78' (78/100)...: done
## Running Haplin on Window 'm79' (79/100)...: done
## Running Haplin on Window 'm80' (80/100)...: done
## Running Haplin on Window 'm81' (81/100)...: done
## Running Haplin on Window 'm82' (82/100)...: done
## Running Haplin on Window 'm83' (83/100)...: done
## Running Haplin on Window 'm84' (84/100)...: done
## Running Haplin on Window 'm85' (85/100)...: done
## Running Haplin on Window 'm86' (86/100)...: done
## Running Haplin on Window 'm87' (87/100)...: done
## Running Haplin on Window 'm88' (88/100)...: done
## Running Haplin on Window 'm89' (89/100)...: done
## Running Haplin on Window 'm90' (90/100)...: done
## Running Haplin on Window 'm91' (91/100)...: done
## Running Haplin on Window 'm92' (92/100)...: done
## Running Haplin on Window 'm93' (93/100)...: done
## Running Haplin on Window 'm94' (94/100)...: done
## Running Haplin on Window 'm95' (95/100)...: done
## Running Haplin on Window 'm96' (96/100)...: done
## Running Haplin on Window 'm97' (97/100)...: done
## Running Haplin on Window 'm98' (98/100)...: done
## Running Haplin on Window 'm99' (99/100)...: done
## Running Haplin on Window 'm100' (100/100)...: done
## 
##  --- haplinSlide has completed ---
all.p.values2 <- plotPValues( hapSlide.res2, which.p.val = "poo",
    plot.signif.only = TRUE, signif.thresh = 0.2 )
Results of haplinSlide analysis in a plot of the 'maternal' p-values, for windows with p-values below 0.2

Results of haplinSlide analysis in a plot of the ‘maternal’ p-values, for windows with p-values below 0.2

head( all.p.values2 )
##     marker                            haplos   est.name      est.     lower
## 724    m31 m31:\nc (40.5%),\n ref: G (59.5%) RRcm_RRcf1 1.4903807 0.9919165
## 725    m32 m32:\na (35.9%),\n ref: T (64.1%) RRcm_RRcf1 1.5457860 1.0327751
## 729    m36 m36:\na (45.3%),\n ref: T (54.7%) RRcm_RRcf1 0.7436971 0.4825578
## 736    m46 m46:\nc (37.8%),\n ref: T (62.2%) RRcm_RRcf1 0.6638964 0.4280048
## 740    m50     m50:\na (36%),\n ref: T (64%) RRcm_RRcf1 0.6640618 0.4342559
## 745    m55 m55:\nc (44.2%),\n ref: G (55.8%) RRcm_RRcf1 1.4191366 0.9415219
##        upper    p.value star
## 724 2.217478 0.05305328     
## 725 2.286814 0.03184453    *
## 729 1.128567 0.16509548     
## 736 1.017188 0.06214700     
## 740 1.001985 0.05340188     
## 745 2.111067 0.09117895

NOTE: here, the star column marks the significant p-values:

  • * marks p-value  < 0.05,
  • ** marks p-value  < 0.01,
  • *** marks p-value  < 0.001.

And then, we can plot the relative risks:

plot( hapSlide.res2, plot.signif.only = TRUE )
Results of haplinSlide analysis in a plot of the relative risks, for windows with p-values below 0.05. Coding of symbols: 'cdd' means a double allele dose (both parents gave the same allele), while 'cm_cf' is the ratio of the risks calculated for when the minor allele came from the mother to the risk when the minor allele came from the father (RRR = RR_cm / RR_cf).

Results of haplinSlide analysis in a plot of the relative risks, for windows with p-values below 0.05. Coding of symbols: ‘cdd’ means a double allele dose (both parents gave the same allele), while ‘cm_cf’ is the ratio of the risks calculated for when the minor allele came from the mother to the risk when the minor allele came from the father (RRR = RR_cm / RR_cf).

Here’s also how the results look like when we estimate the maternal effects, with the window length set to two:

hapSlide.res3 <- haplinSlide( hapSlide.subset.data.prep, use.missing = TRUE,
  ccvar = 10, design = "cc.triad", maternal = TRUE, reference = "ref.cat",
  response = "mult", winlength = 2 )
## 
## Important: Remember that SNPs must be in correct physical ordering
## for a sliding window approach to be meaningful!
## INFO: No parallel environment selected. Will run in sequential mode.
## Running Haplin on Window 'm1-m2' (1/99)...: done
## Running Haplin on Window 'm2-m3' (2/99)...: done
## Running Haplin on Window 'm3-m4' (3/99)...: done
## Running Haplin on Window 'm4-m5' (4/99)...: done
## Running Haplin on Window 'm5-m6' (5/99)...: done
## Running Haplin on Window 'm6-m7' (6/99)...: done
## Running Haplin on Window 'm7-m8' (7/99)...: done
## Running Haplin on Window 'm8-m9' (8/99)...: done
## Running Haplin on Window 'm9-m10' (9/99)...: done
## Running Haplin on Window 'm10-m11' (10/99)...: done
## Running Haplin on Window 'm11-m12' (11/99)...: done
## Running Haplin on Window 'm12-m13' (12/99)...: done
## Running Haplin on Window 'm13-m14' (13/99)...: done
## Running Haplin on Window 'm14-m15' (14/99)...: done
## Running Haplin on Window 'm15-m16' (15/99)...: done
## Running Haplin on Window 'm16-m17' (16/99)...: done
## Running Haplin on Window 'm17-m18' (17/99)...: done
## Running Haplin on Window 'm18-m19' (18/99)...: done
## Running Haplin on Window 'm19-m20' (19/99)...: done
## Running Haplin on Window 'm20-m21' (20/99)...: done
## Running Haplin on Window 'm21-m22' (21/99)...: done
## Running Haplin on Window 'm22-m23' (22/99)...: done
## Running Haplin on Window 'm23-m24' (23/99)...: done
## Running Haplin on Window 'm24-m25' (24/99)...: done
## Running Haplin on Window 'm25-m26' (25/99)...: done
## Running Haplin on Window 'm26-m27' (26/99)...: done
## Running Haplin on Window 'm27-m28' (27/99)...: done
## Running Haplin on Window 'm28-m29' (28/99)...: done
## Running Haplin on Window 'm29-m30' (29/99)...: done
## Running Haplin on Window 'm30-m31' (30/99)...: done
## Running Haplin on Window 'm31-m32' (31/99)...: done
## Running Haplin on Window 'm32-m33' (32/99)...: done
## Running Haplin on Window 'm33-m34' (33/99)...: done
## Running Haplin on Window 'm34-m35' (34/99)...: done
## Running Haplin on Window 'm35-m36' (35/99)...: done
## Running Haplin on Window 'm36-m37' (36/99)...: done
## Running Haplin on Window 'm37-m38' (37/99)...: done
## Running Haplin on Window 'm38-m39' (38/99)...: done
## Running Haplin on Window 'm39-m40' (39/99)...: done
## Running Haplin on Window 'm40-m41' (40/99)...: done
## Running Haplin on Window 'm41-m42' (41/99)...: done
## Running Haplin on Window 'm42-m43' (42/99)...: done
## Running Haplin on Window 'm43-m44' (43/99)...: done
## Running Haplin on Window 'm44-m45' (44/99)...: done
## Running Haplin on Window 'm45-m46' (45/99)...: done
## Running Haplin on Window 'm46-m47' (46/99)...: done
## Running Haplin on Window 'm47-m48' (47/99)...: done
## Running Haplin on Window 'm48-m49' (48/99)...: done
## Running Haplin on Window 'm49-m50' (49/99)...: done
## Running Haplin on Window 'm50-m51' (50/99)...: done
## Running Haplin on Window 'm51-m52' (51/99)...: done
## Running Haplin on Window 'm52-m53' (52/99)...: done
## Running Haplin on Window 'm53-m54' (53/99)...: done
## Running Haplin on Window 'm54-m55' (54/99)...: done
## Running Haplin on Window 'm55-m56' (55/99)...: done
## Running Haplin on Window 'm56-m57' (56/99)...: done
## Running Haplin on Window 'm57-m58' (57/99)...: done
## Running Haplin on Window 'm58-m59' (58/99)...: done
## Running Haplin on Window 'm59-m60' (59/99)...: done
## Running Haplin on Window 'm60-m61' (60/99)...: done
## Running Haplin on Window 'm61-m62' (61/99)...: done
## Running Haplin on Window 'm62-m63' (62/99)...: done
## Running Haplin on Window 'm63-m64' (63/99)...: done
## Running Haplin on Window 'm64-m65' (64/99)...: done
## Running Haplin on Window 'm65-m66' (65/99)...: done
## Running Haplin on Window 'm66-m67' (66/99)...: done
## Running Haplin on Window 'm67-m68' (67/99)...: done
## Running Haplin on Window 'm68-m69' (68/99)...: done
## Running Haplin on Window 'm69-m70' (69/99)...: done
## Running Haplin on Window 'm70-m71' (70/99)...: done
## Running Haplin on Window 'm71-m72' (71/99)...: done
## Running Haplin on Window 'm72-m73' (72/99)...: done
## Running Haplin on Window 'm73-m74' (73/99)...: done
## Running Haplin on Window 'm74-m75' (74/99)...: done
## Running Haplin on Window 'm75-m76' (75/99)...: done
## Running Haplin on Window 'm76-m77' (76/99)...: done
## Running Haplin on Window 'm77-m78' (77/99)...: done
## Running Haplin on Window 'm78-m79' (78/99)...: done
## Running Haplin on Window 'm79-m80' (79/99)...: done
## Running Haplin on Window 'm80-m81' (80/99)...: done
## Running Haplin on Window 'm81-m82' (81/99)...: done
## Running Haplin on Window 'm82-m83' (82/99)...: done
## Running Haplin on Window 'm83-m84' (83/99)...: done
## Running Haplin on Window 'm84-m85' (84/99)...: done
## Running Haplin on Window 'm85-m86' (85/99)...: done
## Running Haplin on Window 'm86-m87' (86/99)...: done
## Running Haplin on Window 'm87-m88' (87/99)...: done
## Running Haplin on Window 'm88-m89' (88/99)...: done
## Running Haplin on Window 'm89-m90' (89/99)...: done
## Running Haplin on Window 'm90-m91' (90/99)...: done
## Running Haplin on Window 'm91-m92' (91/99)...: done
## Running Haplin on Window 'm92-m93' (92/99)...: done
## Running Haplin on Window 'm93-m94' (93/99)...: done
## Running Haplin on Window 'm94-m95' (94/99)...: done
## Running Haplin on Window 'm95-m96' (95/99)...: done
## Running Haplin on Window 'm96-m97' (96/99)...: done
## Running Haplin on Window 'm97-m98' (97/99)...: done
## Running Haplin on Window 'm98-m99' (98/99)...: done
## Running Haplin on Window 'm99-m100' (99/99)...: done
## 
##  --- haplinSlide has completed ---
plot( hapSlide.res3, plot.signif.only = TRUE, signif.thresh = 0.01 )
Results of haplinSlide analysis in a plot of the relative risks, for windows with p-values below 0.05. The top panels show the relative risks (RR) when a given haplotype was found in the child, while the bottom panels show RR of the disease in the child, when a given haplotype occured in the mother. Coding of symbols: 'c' and 'cdd' means a single and double haplotype dose in the child, respectively; while 'm' and 'mdd' is a single and double haplotype dose in the mother.

Results of haplinSlide analysis in a plot of the relative risks, for windows with p-values below 0.05. The top panels show the relative risks (RR) when a given haplotype was found in the child, while the bottom panels show RR of the disease in the child, when a given haplotype occured in the mother. Coding of symbols: ‘c’ and ‘cdd’ means a single and double haplotype dose in the child, respectively; while ‘m’ and ‘mdd’ is a single and double haplotype dose in the mother.


Analysis of gene-environment interactions (GxE) with haplinStrat

If our dataset contains a column with environmental exposure of (usually) the mother, we can analyse it with haplinStrat, that calculates relative risks for strata defined by the categories of the environmental exposure variable. To do that, we use the strata argument which points to the column in the dataset that contains the environmental exposure categories.

hapStrat.res <- haplinStrat( data = trial.data2.prep, strata = 1, use.missing = TRUE,
  ccvar = 2, design = "cc.triad", reference = "ref.cat", response = "mult" )
## 
## ## Running haplinStrat ##
## 
## Selected stratification variable: cov.1.c
## Frequency distribution of stratification variable:
##   0   1   2   3   4 
## 565 164 235 122  53 
## 
## Running Haplin on full data file...Done
## 
## Running Haplin on stratum "0"...Done
## 
## Running Haplin on stratum "1"...Done
## 
## Running Haplin on stratum "2"...Done
## 
## Running Haplin on stratum "3"...Done
## 
## Running Haplin on stratum "4"...Done

The result of haplinStrat run is a list of haplin objects: one for the haplin run on the entire data, and then one for every stratum.

lapply( hapStrat.res, haptable )
## $all
##   marker alleles    counts     HWE.pv Original After.rem.NA After.rem.Mend.inc.
## 1     m1     a/G 1290/4120 0.68753925     1139         1139                1134
## 2     m2     a/B  142/5738 0.01013796     1139         1139                1134
## 3   <NA>    <NA>      <NA>         NA     1139         1139                1134
##   After.rem.unused.haplos   pv.overall haplos  haplofreq haplofreq.lower
## 1                    1134 0.0001143177    G-a 0.02638491      0.02147764
## 2                    1134 0.0001143177    a-B 0.21875556      0.20444082
## 3                    1134 0.0001143177    G-B 0.75465330      0.73924179
##   haplofreq.upper reference   RR.est. RR.lower RR.upper   RR.p.value RRdd.est.
## 1      0.03229311        -  0.7288033 0.406933 1.326609 2.941405e-01 0.5311542
## 2      0.23375110        -  1.5049152 1.233929 1.827839 4.427926e-05 2.2647697
## 3      0.76952889       ref 1.0000000 1.000000 1.000000           NA 1.0000000
##   RRdd.lower RRdd.upper RRdd.p.value
## 1  0.1655944   1.759891 2.941405e-01
## 2  1.5225812   3.340997 4.427926e-05
## 3  1.0000000   1.000000           NA
## 
## $`0`
##   marker alleles   counts     HWE.pv Original After.rem.NA After.rem.Mend.inc.
## 1     m1     a/G 611/2051 0.54762530      565          565                 563
## 2     m2     a/B  83/2825 0.08595161      565          565                 563
## 3   <NA>    <NA>     <NA>         NA      565          565                 563
##   After.rem.unused.haplos pv.overall haplos  haplofreq haplofreq.lower
## 1                     563 0.05946876    G-a 0.02998195      0.02305902
## 2                     563 0.05946876    a-B 0.21877540      0.19855388
## 3                     563 0.05946876    G-B 0.75095143      0.72875106
##   haplofreq.upper reference   RR.est.  RR.lower RR.upper RR.p.value RRdd.est.
## 1      0.03959983        -  0.5905714 0.2321331 1.480704  0.2596641 0.3487746
## 2      0.23995536        -  1.3839777 1.0002959 1.888952  0.0480650 1.9153942
## 3      0.77169777       ref 1.0000000 1.0000000 1.000000         NA 1.0000000
##   RRdd.lower RRdd.upper RRdd.p.value
## 1 0.05388576   2.192486    0.2596641
## 2 1.00059196   3.568141    0.0480650
## 3 1.00000000   1.000000           NA
## 
## $`1`
##   marker alleles  counts      HWE.pv Original After.rem.NA After.rem.Mend.inc.
## 1     m1     a/G 196/574 0.428881321      164          164                 163
## 2     m2     a/B  23/813 0.002084963      164          164                 163
## 3   <NA>    <NA>    <NA>          NA      164          164                 163
##   After.rem.unused.haplos pv.overall haplos  haplofreq haplofreq.lower
## 1                     163 0.04349637    G-a 0.02583657      0.01501303
## 2                     163 0.04349637    a-B 0.21901402      0.18258633
## 3                     163 0.04349637    G-B 0.75393845      0.71107490
##   haplofreq.upper reference  RR.est.  RR.lower RR.upper RR.p.value RRdd.est.
## 1      0.04576707        -  1.637815 0.5101295 5.121898 0.40752061  2.682440
## 2      0.26062166        -  1.875963 1.1400504 3.024137 0.01205527  3.519237
## 3      0.79186342       ref 1.000000 1.0000000 1.000000         NA  1.000000
##   RRdd.lower RRdd.upper RRdd.p.value
## 1  0.2602321  26.233838   0.40752061
## 2  1.2997148   9.145403   0.01205527
## 3  1.0000000   1.000000           NA
## 
## $`2`
##   marker alleles  counts    HWE.pv Original After.rem.NA After.rem.Mend.inc.
## 1     m1     a/G 281/873 0.3411622      235          235                 233
## 2     m2     a/B 20/1218 0.6828818      235          235                 233
## 3   <NA>    <NA>    <NA>        NA      235          235                 233
##   After.rem.unused.haplos pv.overall haplos  haplofreq haplofreq.lower
## 1                     233  0.1878774    G-a 0.01970935      0.01136969
## 2                     233  0.1878774    a-B 0.22348409      0.19239438
## 3                     233  0.1878774    G-B 0.75600156      0.72105225
##   haplofreq.upper reference   RR.est.  RR.lower RR.upper RR.p.value RRdd.est.
## 1      0.03349155        -  0.6061775 0.1373868 2.766825 0.51474061 0.3674512
## 2      0.25762541        -  1.3973086 0.9365884 2.057394 0.09994355 1.9524714
## 3      0.78851132       ref 1.0000000 1.0000000 1.000000         NA 1.0000000
##   RRdd.lower RRdd.upper RRdd.p.value
## 1 0.01887512   7.655323   0.51474061
## 2 0.87719783   4.232869   0.09994355
## 3 1.00000000   1.000000           NA
## 
## $`3`
##   marker alleles  counts    HWE.pv Original After.rem.NA After.rem.Mend.inc.
## 1     m1     a/G 137/447 0.5291981      122          122                 122
## 2     m2     a/B  12/610 0.7286501      122          122                 122
## 3   <NA>    <NA>    <NA>        NA      122          122                 122
##   After.rem.unused.haplos pv.overall haplos  haplofreq haplofreq.lower
## 1                     122  0.3866152    G-a 0.02094722      0.01001402
## 2                     122  0.3866152    a-B 0.20803203      0.16646432
## 3                     122  0.3866152    G-B 0.76940329      0.71883493
##   haplofreq.upper reference  RR.est.  RR.lower RR.upper RR.p.value RRdd.est.
## 1      0.04326867        -  1.064175 0.2211826 5.439074  0.9296104  1.132469
## 2      0.25734762        -  1.460580 0.8528910 2.525766  0.1690615  2.133295
## 3      0.81367005       ref 1.000000 1.0000000 1.000000         NA  1.000000
##   RRdd.lower RRdd.upper RRdd.p.value
## 1 0.04892176  29.583522    0.9296104
## 2 0.72742311   6.379496    0.1690615
## 3 1.00000000   1.000000           NA
## 
## $`4`
##   marker alleles counts    HWE.pv Original After.rem.NA After.rem.Mend.inc.
## 1     m1     a/G 65/175 0.1952259       53           53                  53
## 2     m2     a/B  4/272 0.8628440       53           53                  53
## 3   <NA>    <NA>   <NA>        NA       53           53                  53
##   After.rem.unused.haplos pv.overall haplos  haplofreq haplofreq.lower
## 1                      53 0.06436654    G-a 0.02841723      0.01079299
## 2                      53 0.06436654    a-B 0.22293073      0.15784872
## 3                      53 0.06436654    G-B 0.74541225      0.66168754
##   haplofreq.upper reference     RR.est.  RR.lower RR.upper RR.p.value
## 1      0.07385296        -  0.000746149 0.0000000      Inf 1.00000000
## 2      0.30402789        -  2.009912139 0.9253955 4.346706 0.07591341
## 3      0.81351133       ref 1.000000000 1.0000000 1.000000         NA
##      RRdd.est. RRdd.lower RRdd.upper RRdd.p.value
## 1 6.419717e-07   0.000000        Inf   1.00000000
## 2 4.039747e+00   0.856357   18.89385   0.07591341
## 3 1.000000e+00   1.000000    1.00000           NA

Plotting the results

We can plot the results easily with the plot function (provided that the ggplot2 package is installed on your system!):

plot( hapStrat.res )
Results of haplinStrat run.

Results of haplinStrat run.

Checking for significance of interactions

To check whether there is any significant interaction between the environmental exposure and genotypes, we use the gxe function:

gxe( hapStrat.res )
##           gxe.test      chisq df      pval
## 1       haplo.freq 2.74586596  8 0.9492800
## 2            child 3.45632944  8 0.9025524
## 3 haplo.freq.trend 0.08145869  2 0.9600889
## 4      child.trend 0.29423427  2 0.8631929