Title: | Analyzing Case-Parent Triad and/or Case-Control Data with SNP Haplotypes |
---|---|
Description: | Performs genetic association analyses of case-parent triad (trio) data with multiple markers. It can also incorporate complete or incomplete control triads, for instance independent control children. Estimation is based on haplotypes, for instance SNP haplotypes, even though phase is not known from the genetic data. 'Haplin' estimates relative risk (RR + conf.int.) and p-value associated with each haplotype. It uses maximum likelihood estimation to make optimal use of data from triads with missing genotypic data, for instance if some SNPs has not been typed for some individuals. 'Haplin' also allows estimation of effects of maternal haplotypes and parent-of-origin effects, particularly appropriate in perinatal epidemiology. 'Haplin' allows special models, like X-inactivation, to be fitted on the X-chromosome. A GxE analysis allows testing interactions between environment and all estimated genetic effects. The models were originally described in "Gjessing HK and Lie RT. Case-parent triads: Estimating single- and double-dose effects of fetal and maternal disease gene haplotypes. Annals of Human Genetics (2006) 70, pp. 382-396". |
Authors: | Hakon K. Gjessing [aut, cre], Miriam Gjerdevik [ctb] (functions 'lineByLine', 'cbindFiles', 'rbindFiles', 'snpPower', 'snpSampleSize', 'hapSim', 'hapRun', 'hapPower', 'hapPowerAsymp', and 'hapRelEff'), Julia Romanowska [ctb] (<https://orcid.org/0000-0001-6733-1953>, new data format, parallelisation, new documentation), Oivind Skare [ctb] (TDT tests) |
Maintainer: | Hakon K. Gjessing <[email protected]> |
License: | GPL (>= 2) |
Version: | 7.3.2 |
Built: | 2025-01-29 05:57:04 UTC |
Source: | https://github.com/cran/Haplin |
Takes a sequence of files and combines them column-wise (side-by-side), i.e. reads each file line by line, pastes corresponding lines, then writes to outfile. Combining files line by line is especially useful when working with large datasets, where the reading of entire files may be time consuming and require a large amount of memory.
cbindFiles(infiles, outfile, col.sep, ask = TRUE, verbose = TRUE)
cbindFiles(infiles, outfile, col.sep, ask = TRUE, verbose = TRUE)
infiles |
A character vector of names (and paths) of the files to combine. |
outfile |
A character string giving the name of the resulting file. The name of the file is relative to the current working directory, unless the file name contains a definite path. |
col.sep |
Specifies the column separator which will be inserted between files. |
ask |
Logical. Default is "TRUE". If set to "FALSE", an already existing outfile will be overwritten without asking. |
verbose |
Logical. Default is "TRUE", which means that the line number is displayed for each iteration, i.e. each combined line. |
The function cbind
combines R objects by columns. However, reading large datafiles may require a large amount of memory and be extremely time consuming.
cbindFiles
combines the files column-wise, one line at a time, and then writes each line to outfile
. This avoids reading the full files into memory.
If infiles
contains only one file, this file will be copied directly to outfile
, i.e. without any line-by-line modifications.
There is no useful output; the objective of cbindFiles
is to produce outfile
.
Miriam Gjerdevik,
with Hakon K. Gjessing
Professor of Biostatistics
Division of Epidemiology
Norwegian Institute of Public Health
[email protected]
Web Site: https://haplin.bitbucket.io
## Not run: # Combines the three infiles side-by-side cbindFiles(infiles = c("myfile1.txt", "myfile2.txt", "myfile3.txt"), outfile = "myfile_combined_by_columns.txt", col.sep = " ", ask = TRUE, verbose = TRUE) ## End(Not run)
## Not run: # Combines the three infiles side-by-side cbindFiles(infiles = c("myfile1.txt", "myfile2.txt", "myfile3.txt"), outfile = "myfile_combined_by_columns.txt", col.sep = " ", ask = TRUE, verbose = TRUE) ## End(Not run)
This function closes all the slaves spawned in initParallelRun and finishes the mpi routines. This function MUST BE called after all the haplinSlide calls and right before exiting the script/R session!
finishParallelRun()
finishParallelRun()
This function enables to extract (and save for later use) part of genetic data read in with genDataRead.
genDataGetPart( data.in = stop("No data given!", call. = FALSE), design = stop("Design type must be given!"), markers, indiv.ids, rows, cc, sex, file.out = "my_data_part", dir.out = ".", overwrite = NULL, ... )
genDataGetPart( data.in = stop("No data given!", call. = FALSE), design = stop("Design type must be given!"), markers, indiv.ids, rows, cc, sex, file.out = "my_data_part", dir.out = ".", overwrite = NULL, ... )
data.in |
The data object (in format as the output of genDataRead). |
design |
The design used in the study - choose from:
. Any of the following can be given to narrow down the dataset: |
markers |
Vector with numbers or names indicating which markers to choose. |
indiv.ids |
Character vector giving IDs of individuals. CAUTION: in a standard PED file, individual IDs are not unique, so this will select all individuals with given IDs. |
rows |
Numeric vector giving the positions - this will select only these rows. |
cc |
One or more values to choose based on case-control status ('cc' column). |
sex |
One or more values to choose based on the 'sex' column. |
file.out |
The base for the output filename (default: "my_data_part"). |
dir.out |
The path to the directory where the output files will be saved. |
overwrite |
Whether to overwrite the output files: if NULL (default), will prompt the user to give answer; set to TRUE, will automatically overwrite any existing files; and set to FALSE, will stop if the output files exist. |
... |
If any additional covariate data are available in |
The genetic data from GWAS studies can be quite large, and thus the analysis is time-consuming. If a user knows where they want to focus the analysis, they can use this function to extract part of the entire dataset and use only this part in subsequent Haplin analysis.
A list object with three elements:
cov.data - a data.frame
with covariate data (if available in
the input file)
gen.data - a list with chunks of the genetic data; the data is divided column-wise, using 10,000 columns per chunk; each element of this list is a ff matrix
aux - a list with meta-data and important parameters.
This now contains only the selected subset of data.
No checks are performed when choosing a subset of the data - it is the user's
obligation to check whether the data subset contains correct number of individuals
(especially important when using the triad
design study) and/or markers!
# The argument 'overwrite' is set to TRUE! # Read the data: examples.dir <- system.file( "extdata", package = "Haplin" ) example.file <- file.path( examples.dir, "HAPLIN.trialdata2.txt" ) my.gen.data.read <- genDataRead( file.in = example.file, file.out = "trial_data", dir.out = tempdir( check = TRUE ), format = "haplin", allele.sep = "", n.vars = 2, cov.header = c( "smoking", "sex" ), overwrite = TRUE ) my.gen.data.read # Extract part with only men: men.subset <- genDataGetPart( my.gen.data.read, design = "triad", sex = 1, dir.out = tempdir( check = TRUE ), file.out = "gen_data_men_only", overwrite = TRUE ) men.subset # Extract the part with only smoking women: women.smoke.subset <- genDataGetPart( my.gen.data.read, design = "triad", dir.out = tempdir( check = TRUE ), sex = 0, smoking = c( 1,2 ), overwrite = TRUE ) women.smoke.subset
# The argument 'overwrite' is set to TRUE! # Read the data: examples.dir <- system.file( "extdata", package = "Haplin" ) example.file <- file.path( examples.dir, "HAPLIN.trialdata2.txt" ) my.gen.data.read <- genDataRead( file.in = example.file, file.out = "trial_data", dir.out = tempdir( check = TRUE ), format = "haplin", allele.sep = "", n.vars = 2, cov.header = c( "smoking", "sex" ), overwrite = TRUE ) my.gen.data.read # Extract part with only men: men.subset <- genDataGetPart( my.gen.data.read, design = "triad", sex = 1, dir.out = tempdir( check = TRUE ), file.out = "gen_data_men_only", overwrite = TRUE ) men.subset # Extract the part with only smoking women: women.smoke.subset <- genDataGetPart( my.gen.data.read, design = "triad", dir.out = tempdir( check = TRUE ), sex = 0, smoking = c( 1,2 ), overwrite = TRUE ) women.smoke.subset
This function loads the data from the saved .ffData and .RData files, and prepares the data to subsequent analysis.
genDataLoad(filename = stop("'filename' must be given!"), dir.in = ".")
genDataLoad(filename = stop("'filename' must be given!"), dir.in = ".")
filename |
The base of the filenames; i.e. if the data is saved in "my_data_gen.ffData", "my_data_gen.RData" and "my_data_cov.RData", then the 'filename' should be "my_data". |
dir.in |
The path to the directory where files were saved (defaults to the current directory). |
A list object with three elements:
cov.data - a data.frame
with covariate data (if available in
the input file)
gen.data - a list with chunks of the genetic data; the data is divided column-wise, using 10,000 columns per chunk; each element of this list is a ff matrix
aux - a list with meta-data and important parameters.
This function prepares the data to be used in Haplin analysis
genDataPreprocess( data.in = stop("You have to give the object to preprocess!"), map.file, map.header = FALSE, design = "triad", file.out = "data_preprocessed", dir.out = ".", ncpu = 1, overwrite = NULL )
genDataPreprocess( data.in = stop("You have to give the object to preprocess!"), map.file, map.header = FALSE, design = "triad", file.out = "data_preprocessed", dir.out = ".", ncpu = 1, overwrite = NULL )
data.in |
Input data, as loaded by genDataRead or genDataLoad. |
map.file |
Filename (with path if the file is not in current directory) of the .map file holding the SNP names, if available. |
map.header |
Logical: does the map.file contain a header in the first row? Default: FALSE. |
design |
The design used in the study - choose from:
. |
file.out |
The core name of the files that will contain the preprocessed data (character string); ready to load next time with genDataLoad function; default: "data_preprocessed". |
dir.out |
The directory that will contain the saved data; defaults to current working directory. |
ncpu |
The number of CPU cores to use - this speeds up the process for large datasets significantly. Default is 1 core, maximum is 1 less than the total number of cores available on a current machine (even if the number given by the user is more than that). |
overwrite |
Whether to overwrite the output files: if NULL (default), will prompt the user to give answer; set to TRUE, will automatically overwrite any existing files; and set to FALSE, will stop if the output files exist. |
A list object with three elements:
cov.data - a data.frame
with covariate data (if available in
the input file)
gen.data - a list with chunks of the genetic data; the data is divided column-wise, using 10,000 columns per chunk; each element of this list is a ff matrix
aux - a list with meta-data and important parameters:
variables - tabulated information of the covariate data;
variables.nas - how many NA values per each column of covariate data;
alleles - all the possible alleles in each marker;
alleles.nas - how many NA values in each marker;
nrows.with.missing - how many rows contain any missing allele information;
which.rows.with.missing - vector of indices of rows with missing data (if any)
.
The .map file should contain at least two columns, where the second one contains SNP names. Any additional columns should be separated by a whitespace character, but will be ignored. The file should contain a header.
# The argument 'overwrite' is set to TRUE! # First, read the data: examples.dir <- system.file( "extdata", package = "Haplin" ) example.file <- file.path( examples.dir, "exmpl_data.ped" ) ped.data.read <- genDataRead( example.file, file.out = "exmpl_ped_data", dir.out = tempdir( check = TRUE ), format = "ped", overwrite = TRUE ) ped.data.read # Take only part of the data (if needed) ped.data.part <- genDataGetPart( ped.data.read, design = "triad", markers = 10:12, dir.out = tempdir( check = TRUE ), file.out = "exmpl_ped_data_part", overwrite = TRUE ) # Preprocess as "triad" data: ped.data.preproc <- genDataPreprocess( ped.data.part, design = "triad", dir.out = tempdir( check = TRUE ), file.out = "exmpl_data_preproc", overwrite = TRUE ) ped.data.preproc
# The argument 'overwrite' is set to TRUE! # First, read the data: examples.dir <- system.file( "extdata", package = "Haplin" ) example.file <- file.path( examples.dir, "exmpl_data.ped" ) ped.data.read <- genDataRead( example.file, file.out = "exmpl_ped_data", dir.out = tempdir( check = TRUE ), format = "ped", overwrite = TRUE ) ped.data.read # Take only part of the data (if needed) ped.data.part <- genDataGetPart( ped.data.read, design = "triad", markers = 10:12, dir.out = tempdir( check = TRUE ), file.out = "exmpl_ped_data_part", overwrite = TRUE ) # Preprocess as "triad" data: ped.data.preproc <- genDataPreprocess( ped.data.part, design = "triad", dir.out = tempdir( check = TRUE ), file.out = "exmpl_data_preproc", overwrite = TRUE ) ped.data.preproc
This function will read in data from PED or haplin formatted file.
genDataRead( file.in = stop("Filename must be given!", call. = FALSE), file.out = NULL, dir.out = ".", format = stop("Format parameter is required!"), header = FALSE, n.vars, cov.file.in, cov.header, map.file, map.header = FALSE, allele.sep = ";", na.strings = "NA", col.sep = "", overwrite = NULL )
genDataRead( file.in = stop("Filename must be given!", call. = FALSE), file.out = NULL, dir.out = ".", format = stop("Format parameter is required!"), header = FALSE, n.vars, cov.file.in, cov.header, map.file, map.header = FALSE, allele.sep = ";", na.strings = "NA", col.sep = "", overwrite = NULL )
file.in |
The name of the main input file with genotype information. |
file.out |
The base for the output filename (by default, constructed from the input file name). |
dir.out |
The path to the directory where the output files will be saved. |
format |
Format of data (will influence how data is processed) - choose from:
. |
header |
Whether the first line of the main input file contains column names; default: FALSE; NB: this is useful only for 'haplin'-formatted files! |
n.vars |
The number of columns with covariate data (if any) in the main file; NB: if the main file is in PED format, it is assumed that the first 6 columns contain the standard PED-covariates (i.e., family ID, ID of the child, father and mother, sex and case-control status), so in this case setting 'n.vars' is useful only if the PED file contains more than 6 covariate columns. |
cov.file.in |
Name of the file containing additional covariate data, if any. Caution: unless the 'cov.header' argument is used, it is assumed that the first line of this file contains the header (i.e., the column names of the additional data). |
cov.header |
The character vector containing the names of covariate columns (in the file with additional covariate data if given by the 'cov.file.in' argument; or in the main file, if it's a "haplin"-formatted file). |
map.file |
Filename (with path if the file is not in current directory) of the .map file holding the SNP names, if available (see Details). |
map.header |
Logical: does the map.file contain a header in the first row? Default: FALSE. |
allele.sep |
Character: separator between two alleles (default: ";"). |
na.strings |
Character or NA: how the missing data is coded (default: "NA"). |
col.sep |
Character: separator between the columns (i.e., markers; default: any whitespace character). |
overwrite |
Whether to overwrite the output files: if NULL (default), will prompt the user to give answer; set to TRUE, will automatically overwrite any existing files; and set to FALSE, will stop if the output files exist. |
The function reads in all the data in the file, creates ff objects to store
the genetic information and data.frame to store covariate data (if any). These
objects are saved in .RData
and .ffData
files, which can be later on
easily uploaded to R (with genDataLoad) and re-used.
A list object with three elements:
cov.data - a data.frame
with covariate data (if available in
the input file)
gen.data - a list with chunks of the genetic data; the data is divided column-wise, using 10,000 columns per chunk; each element of this list is a ff matrix
aux - a list with meta-data and important parameters.
The .map file should contain at least two columns, where the second one contains SNP names. Any additional columns should be separated by a whitespace character, but will be ignored. The file should contain a header.
When reading in a covariate file together with the genotype information, it is advised to include the header in the file, so that there is no doubt to the naming of the data columns.
# The argument 'overwrite' is set to TRUE! examples.dir <- system.file( "extdata", package = "Haplin" ) # ped format: example.file2 <- file.path( examples.dir, "exmpl_data.ped" ) ped.data.read <- genDataRead( example.file2, file.out = "exmpl_ped_data", dir.out = tempdir( check = TRUE ), format = "ped", overwrite = TRUE ) ped.data.read # haplin format: example.file1 <- file.path( examples.dir, "HAPLIN.trialdata2.txt" ) haplin.data.read <- genDataRead( file.in = example.file1, file.out = "exmpl_haplin_data", format = "haplin", allele.sep = "", n.vars = 2, cov.header = c( "smoking", "sex" ), overwrite = TRUE, dir.out = tempdir( check = TRUE ) ) haplin.data.read
# The argument 'overwrite' is set to TRUE! examples.dir <- system.file( "extdata", package = "Haplin" ) # ped format: example.file2 <- file.path( examples.dir, "exmpl_data.ped" ) ped.data.read <- genDataRead( example.file2, file.out = "exmpl_ped_data", dir.out = tempdir( check = TRUE ), format = "ped", overwrite = TRUE ) ped.data.read # haplin format: example.file1 <- file.path( examples.dir, "HAPLIN.trialdata2.txt" ) haplin.data.read <- genDataRead( file.in = example.file1, file.out = "exmpl_haplin_data", format = "haplin", allele.sep = "", n.vars = 2, cov.header = c( "smoking", "sex" ), overwrite = TRUE, dir.out = tempdir( check = TRUE ) ) haplin.data.read
Wrapper function for genDataGetPart that returns a subset of the data containing only children.
getChildren( data.in = stop("No data given!", call. = FALSE), file.out = "my_data_onlyChildren", dir.out = ".", overwrite = NULL )
getChildren( data.in = stop("No data given!", call. = FALSE), file.out = "my_data_onlyChildren", dir.out = ".", overwrite = NULL )
data.in |
The data object (in format as the output of genDataRead); note that the design of the data is assumed to be triad. |
file.out |
The base for the output filename (default: "my_data_onlyChildren"). |
dir.out |
The path to the directory where the output files will be saved. |
overwrite |
Whether to overwrite the output files: if NULL (default), will prompt the user to give answer; set to TRUE, will automatically overwrite any existing files; and set to FALSE, will stop if the output files exist. |
A list object with three elements:
cov.data - a data.frame
with covariate data (if available in
the input file)
gen.data - a list with chunks of the genetic data; the data is divided column-wise, using 10,000 columns per chunk; each element of this list is a ff matrix
aux - a list with meta-data and important parameters.
This now contains only the selected subset of data.
Wrapper function for genDataGetPart that returns a subset of the data containing only dyads (where the child and only one parent have genetic data), i.e., not triads.
getDyads( data.in = stop("No data given!", call. = FALSE), file.out = "my_data_onlyDyads", dir.out = ".", overwrite = NULL )
getDyads( data.in = stop("No data given!", call. = FALSE), file.out = "my_data_onlyDyads", dir.out = ".", overwrite = NULL )
data.in |
The data object (in format as the output of genDataRead); note that the design of the data is assumed to be "triad". |
file.out |
The base for the output filename (default: "my_data_onlyDyads"). |
dir.out |
The path to the directory where the output files will be saved (default: ".", the current directory). |
overwrite |
Whether to overwrite the output files: if NULL (default), will prompt the user to give answer; set to TRUE, will automatically overwrite any existing files; and set to FALSE, will stop if the output files exist. |
A list object with three elements:
cov.data - a data.frame
with covariate data (if available in
the input file)
gen.data - a list with chunks of the genetic data; the data is divided column-wise, using 10,000 columns per chunk; each element of this list is a ff matrix
aux - a list with meta-data and important parameters.
This now contains only the selected subset of data.
Wrapper function for genDataGetPart that returns a subset of the data containing only fathers.
getFathers( data.in = stop("No data given!", call. = FALSE), file.out = "my_data_onlyFathers", dir.out = ".", overwrite = NULL )
getFathers( data.in = stop("No data given!", call. = FALSE), file.out = "my_data_onlyFathers", dir.out = ".", overwrite = NULL )
data.in |
The data object (in format as the output of genDataRead); note that the design of the data is assumed to be triad. |
file.out |
The base for the output filename (default: "my_data_onlyFathers"). |
dir.out |
The path to the directory where the output files will be saved. |
overwrite |
Whether to overwrite the output files: if NULL (default), will prompt the user to give answer; set to TRUE, will automatically overwrite any existing files; and set to FALSE, will stop if the output files exist. |
A list object with three elements:
cov.data - a data.frame
with covariate data (if available in
the input file)
gen.data - a list with chunks of the genetic data; the data is divided column-wise, using 10,000 columns per chunk; each element of this list is a ff matrix
aux - a list with meta-data and important parameters.
This now contains only the selected subset of data.
Wrapper function for genDataGetPart that returns a subset of the data containing only full triads (where all, the child, the mother and the father have genetic data).
getFullTriads( data.in = stop("No data given!", call. = FALSE), file.out = "my_data_onlyTriads", dir.out = ".", overwrite = NULL )
getFullTriads( data.in = stop("No data given!", call. = FALSE), file.out = "my_data_onlyTriads", dir.out = ".", overwrite = NULL )
data.in |
The data object (in format as the output of genDataRead); note that the design of the data is assumed to be triad. |
file.out |
The base for the output filename (default: "my_data_onlyTriads"). |
dir.out |
The path to the directory where the output files will be saved. |
overwrite |
Whether to overwrite the output files: if NULL (default), will prompt the user to give answer; set to TRUE, will automatically overwrite any existing files; and set to FALSE, will stop if the output files exist. |
A list object with three elements:
cov.data - a data.frame
with covariate data (if available in
the input file)
gen.data - a list with chunks of the genetic data; the data is divided column-wise, using 10,000 columns per chunk; each element of this list is a ff matrix
aux - a list with meta-data and important parameters.
This now contains only the selected subset of data.
Wrapper function for genDataGetPart that returns a subset of the data containing only mothers.
getMothers( data.in = stop("No data given!", call. = FALSE), file.out = "my_data_onlyMothers", dir.out = ".", overwrite = NULL )
getMothers( data.in = stop("No data given!", call. = FALSE), file.out = "my_data_onlyMothers", dir.out = ".", overwrite = NULL )
data.in |
The data object (in format as the output of genDataRead); note that the design of the data is assumed to be triad. |
file.out |
The base for the output filename (default: "my_data_onlyMothers"). |
dir.out |
The path to the directory where the output files will be saved. |
overwrite |
Whether to overwrite the output files: if NULL (default), will prompt the user to give answer; set to TRUE, will automatically overwrite any existing files; and set to FALSE, will stop if the output files exist. |
A list object with three elements:
cov.data - a data.frame
with covariate data (if available in
the input file)
gen.data - a list with chunks of the genetic data; the data is divided column-wise, using 10,000 columns per chunk; each element of this list is a ff matrix
aux - a list with meta-data and important parameters.
This now contains only the selected subset of data.
Performes a gene-environment test to check if haplin
estimates of relative risks change over strata of environment. It is typically applied to the output from haplinStrat
gxe(object.list)
gxe(object.list)
object.list |
A list of |
haplinStrat
runs haplin
first on the entire input data file, then on each stratum separately. The results from haplinStrat
are similar to just manually splitting the file into strata and running haplin
on each, with one important difference, however: Since some strata may be small etc., haplin might conceivably choose different haplotypes in different strata, and also choose different reference haplotypes. When first running haplin on the entire file, haplinStrat
saves the selected haplotypes and chosen reference category. Then, in the strata-specific runs haplinStrat forces haplin to choose the same haplotypes/reference category in all runs, so that results from different strata are comparable.
When applying gxe
to the output from haplinStrat, it will test whether there is a statistically significant change in parameter estimates from stratum to stratum, i.e. a gene-environment interaction since strata usually are defined by an environmental exposure.
gxe
uses Wald tests to test for interactions. It always tests whether there is change in haplotype frequencies from stratum to stratum. More importantly, it separately tests whether any genetic effects, such as fetal genetic effects, maternal effects, or parent-of-origin effects, change significantly over strata.
gxe
can also be run from within haplinSlide
by using the strata
argument in haplinSlide
.
A dataframe with one row for each test that is performed (haplo.freq is the first, the remaining depend on the model that has been estimated). The Wald chi-squared test value, degrees-of-freedom, and resulting p-value are reported.
In the future, the structure of the output from gxe will change. In particular, measures of ratios of relative risks will be reported in addition to the p-values
Further information is found on the web page.
Hakon K. Gjessing
Professor of Biostatistics
Division of Epidemiology
Norwegian Institute of Public Health
[email protected]
Gjessing HK and Lie RT. Case-parent triads: Estimating single- and double-dose effects of fetal and maternal disease gene haplotypes. Annals of Human Genetics (2006) 70, pp. 382-396.
Web Site: https://haplin.bitbucket.io
haplin
, haplinStrat
, haplinSlide
## Not run: # All standard haplin runs can be done with haplinStrat. # Below is an illustration. See the haplin help page for more # examples. # # Analyzing the effect of fetal genes, including triads with missing data, # using a multiplicative response model. The first column of the data file # in this example contains the stratification variable. result <- haplinStrat("C:/work/data.dat", strata = 1, use.missing = T, response = "mult", reference = "ref.cat", winlength = 1) # Provide summary of separate results: lapply(result, summary) # Plot results separately: par(ask = T) lapply(result, plot) # # Convert results to table format and stack them over strata: haptable(result) # Test for interaction between haplotype risk estimates and the strata variable: gxe(result) ## End(Not run)
## Not run: # All standard haplin runs can be done with haplinStrat. # Below is an illustration. See the haplin help page for more # examples. # # Analyzing the effect of fetal genes, including triads with missing data, # using a multiplicative response model. The first column of the data file # in this example contains the stratification variable. result <- haplinStrat("C:/work/data.dat", strata = 1, use.missing = T, response = "mult", reference = "ref.cat", winlength = 1) # Provide summary of separate results: lapply(result, summary) # Plot results separately: par(ask = T) lapply(result, plot) # # Convert results to table format and stack them over strata: haptable(result) # Test for interaction between haplotype risk estimates and the strata variable: gxe(result) ## End(Not run)
haplin
fits a log-linear model to case-parent triads, case-control data, or combined (hybrid) case-parent control-parent triads or dyads. It estimates marker or haplotype frequencies, and uses the EM algorithm to reconstruct haplotypes and, if requested, impute missing genotypes. haplin
prints and plots estimates of relative risks associated with fetal and maternal haploypes, and in addition allows splitting fetal haplotype effects into maternally and paternally inherited effects. It allows special models, like x-inactivation, to be fitted on the X-chromosome. The result is an object of class haplin
, which can be explored with summary
, plot
, and haptable
.
haplin( data, markers = "ALL", design = "triad", use.missing = FALSE, xchrom = FALSE, maternal = FALSE, test.maternal = FALSE, poo = FALSE, scoretest = "no", ccvar = NULL, strata = NULL, sex = NULL, comb.sex = "double", reference = "reciprocal", response = "free", threshold = 0.01, max.haplos = NULL, haplo.file = NULL, resampling = "no", max.EM.iter = 50, data.out = "no", verbose = TRUE, printout = TRUE )
haplin( data, markers = "ALL", design = "triad", use.missing = FALSE, xchrom = FALSE, maternal = FALSE, test.maternal = FALSE, poo = FALSE, scoretest = "no", ccvar = NULL, strata = NULL, sex = NULL, comb.sex = "double", reference = "reciprocal", response = "free", threshold = 0.01, max.haplos = NULL, haplo.file = NULL, resampling = "no", max.EM.iter = 50, data.out = "no", verbose = TRUE, printout = TRUE )
data |
An R-object which is the result of using genDataPreprocess. See the web page for a detailed description of how to use this function. |
markers |
Default is "ALL", which means |
design |
The value "triad" is used for the standard case triad design, without independent controls. The value "cc.triad" means a combination of case triads and control triads. This requires the argument |
use.missing |
A logical value used to determine whether triads with missing data should be included in the analysis. When set to TRUE, |
xchrom |
Logical, defaults to "FALSE". If set to "TRUE", |
maternal |
If TRUE, maternal effects are estimated as well as the standard fetal effects. |
test.maternal |
Not yet implemented. |
poo |
Parent-Of-Origin effects. If TRUE, |
scoretest |
Special interest only. If "no", no score test is computed. If "yes", an overall score p-value is included in the output, and the individual score values are returned in the |
ccvar |
Numeric. Should give the column number for the column containing the case-control indicator in the data file. Needed for the "cc" and "cc.triad" designs. The column should contain two numeric values, of which the largest one is always used to denote cases. |
strata |
Not yet implemented. |
sex |
To be used with |
comb.sex |
To be used with |
reference |
Decides how |
response |
The default value "free" means that both single- and double dose effects are estimated. Choosing "mult" instead specifies a multiplicative dose-response model. |
threshold |
Sets the (approximate) lower limit for the haplotype frequencies of those haplotypes that should be retained in the analysis. Hapotypes that are less frequent are removed, and information about this is given in the output. Default is 0.01. |
max.haplos |
Not yet implemented. |
haplo.file |
Not yet implemented. |
resampling |
Mostly for testing. Default is "no". When "no", the individual haplotypes reconstructed by the EM algorithm as assumed known when computing CIs and p-values. If set to "jackknife" a jackknife-based resampling procedure is used when computing confidence intervals and p-values for effect estimates. This takes more time, but corrects the CIs and p-values for the uncertainty contained in unphased data. Note: in all recent versions of |
max.EM.iter |
The maximum number of iterations used by the EM algorithm. This value can be increased if necessary, which sometimes is the case with e.g. case-control data which a substantial amount of missing. However, for triad data with little missing information there is usually no need for many iterations. |
data.out |
Character. Accepts values "no", "prelim", "null" or "full", with "no" as default. For values other than default, |
verbose |
Default is T (=TRUE). During the EM algorithm, |
printout |
Logical. If TRUE (default), |
Input data can be either a haplin
format data file, or a PED data. These have to be loaded into R first, using genDataRead or genDataLoad functions, and then pre-processed with the genDataPreprocess function. If the PED data file is used, the arguments filename
, n.vars
, sep
, allele.sep
, na.strings
, ccvar
, and sex
need not be specified.
The output can be examined by print
, summary
, plot
and haptable
.
An object of class haplin
is returned.
(The only exception is when data.out
is set different from "no", where haplin
will produce a data file with haplotypes identified.)
Typically, some of the included haplotypes will be relatively rare, such as a frequency of 1% - 5%. For those haplotypes there may be too little data to estimate the double doses properly, so the estimates may be unreliable. This is seen from the extremely wide confidence intervals. The rare double dose estimates should be disregarded, but the remaining single and double dose estimates are valid. To avoid the problem one can also reduce the model to a purely multiplicative model by setting response = "mult"
combined with reference = "ref.cat"
.
Further information is found on the web page.
Hakon K. Gjessing
Professor of Biostatistics
Division of Epidemiology
Norwegian Institute of Public Health
[email protected]
Gjessing HK and Lie RT. Case-parent triads: Estimating single- and double-dose effects of fetal and maternal disease gene haplotypes. Annals of Human Genetics (2006) 70, pp. 382-396.
Web Site: https://haplin.bitbucket.io
summary.haplin
, plot.haplin
, pedToHaplin
, haptable
, haplinSlide
, genDataLoad
, genDataRead
, genDataPreprocess
# setting up the directory with exemplary data dir.in <- system.file( "extdata", package = "Haplin" ) file.in <- file.path( dir.in, "data.dat" ) # reading data in data.in <- genDataRead( file.in, file.out = "poo_exmpl_data_read", format = "haplin", dir.out = tempdir( check = TRUE ), n.vars = 1, allele.sep = " ", col.sep = " ", overwrite = TRUE ) # preprocessing the data data.preproc <- genDataPreprocess( data.in, design = "triad", file.out = "poo_exmpl_data_preproc", dir.out = tempdir( check = TRUE ), overwrite = TRUE ) # running haplin, calculating POO res.POO <- haplin( data.preproc, markers = 2, poo = TRUE, response = "mult", reference = 2, use.missing = TRUE ) res.POO ## Not run: # 1. Read the data: my.haplin.data <- genDataRead( file.in = "HAPLIN.trialdata.txt", file.out = "trial_data1", dir.out = ".", format = "haplin", n.vars = 0 ) # 2. Run pre-processing: haplin.data.prep <- genDataPreprocess( data.in = my.haplin.data, format = "haplin", design = "triad", file.out = "trial_data1_prep", dir.out = "." ) # 3. Analyze: # Standard run: haplin( haplin.data.prep ) # Specify path, estimate maternal effects: haplin( haplin.data.prep, maternal = T ) # Specify path, use haplotype no. 2 as reference: haplin( haplin.data.prep, reference = 2 ) # Remove more haplotypes from estimation by increasing the threshold # to 5%: haplin( haplin.data.prep, threshold = 0.05 ) # Estimate maternal effects, using the most frequent haplotype as reference. # Use all data, including triads with missing data. Select # markers 3, 4 and 8 from the supplied data. haplin( haplin.data.prep, use.missing = T, maternal = T, reference = "ref.cat", markers = c(3,4,8) ) # Note: in this version of haplin, the jackknife is # no longer necessary since the standard errors are already corrected. # Some examples showing how to save the haplin result and later # recall plot and summary results: # Same analysis as above, saving the result in the object "result.1": result.1 <- haplin( haplin.data.prep, use.missing = T, maternal = T, reference = "ref.cat", markers = c(3,4,8) ) # Replot the saved result (fetal effects): plot( result.1 ) # Replot the saved result (maternal effects): plot( result.1, plot.maternal = T ) # Print a very short summary of saved result: result.1 # A full summary of saved result, with confidence intervals and # p-values (the same as haplin prints when running): summary( result.1 ) # Some examples when the data file contains two covariates, # the second is the case-control variable: # The following standard triad run is INCORRECT since it disregards # case status: haplin("data.dat", use.missing = T, n.vars = 2, design = "triad") # Combined run on "hybrid" design, correctly using both case-parent # triads and control-parent triads: haplin( my.haplin.data, use.missing = T, n.vars = 2, ccvar = 2, design = "cc.triad" ) # If parent columns are not in the file, a plain case-control # run can be used: haplin( my.haplin.data, use.missing = T, n.vars = 2, ccvar = 2, design = "cc", response = "mult", reference = "ref.cat" ) # An example of how to produce a data file with all possible haplotypes # identified for each triad, together with their probaility weights: result.data <- haplin( my.haplin.data, use.missing = T, markers = c(3,4,8), data.out = "prelim" ) # result.data will then contain the data file, with a vector of # probabilities (freq) computed from the preliminary haplotype # frequencies. ## End(Not run)
# setting up the directory with exemplary data dir.in <- system.file( "extdata", package = "Haplin" ) file.in <- file.path( dir.in, "data.dat" ) # reading data in data.in <- genDataRead( file.in, file.out = "poo_exmpl_data_read", format = "haplin", dir.out = tempdir( check = TRUE ), n.vars = 1, allele.sep = " ", col.sep = " ", overwrite = TRUE ) # preprocessing the data data.preproc <- genDataPreprocess( data.in, design = "triad", file.out = "poo_exmpl_data_preproc", dir.out = tempdir( check = TRUE ), overwrite = TRUE ) # running haplin, calculating POO res.POO <- haplin( data.preproc, markers = 2, poo = TRUE, response = "mult", reference = 2, use.missing = TRUE ) res.POO ## Not run: # 1. Read the data: my.haplin.data <- genDataRead( file.in = "HAPLIN.trialdata.txt", file.out = "trial_data1", dir.out = ".", format = "haplin", n.vars = 0 ) # 2. Run pre-processing: haplin.data.prep <- genDataPreprocess( data.in = my.haplin.data, format = "haplin", design = "triad", file.out = "trial_data1_prep", dir.out = "." ) # 3. Analyze: # Standard run: haplin( haplin.data.prep ) # Specify path, estimate maternal effects: haplin( haplin.data.prep, maternal = T ) # Specify path, use haplotype no. 2 as reference: haplin( haplin.data.prep, reference = 2 ) # Remove more haplotypes from estimation by increasing the threshold # to 5%: haplin( haplin.data.prep, threshold = 0.05 ) # Estimate maternal effects, using the most frequent haplotype as reference. # Use all data, including triads with missing data. Select # markers 3, 4 and 8 from the supplied data. haplin( haplin.data.prep, use.missing = T, maternal = T, reference = "ref.cat", markers = c(3,4,8) ) # Note: in this version of haplin, the jackknife is # no longer necessary since the standard errors are already corrected. # Some examples showing how to save the haplin result and later # recall plot and summary results: # Same analysis as above, saving the result in the object "result.1": result.1 <- haplin( haplin.data.prep, use.missing = T, maternal = T, reference = "ref.cat", markers = c(3,4,8) ) # Replot the saved result (fetal effects): plot( result.1 ) # Replot the saved result (maternal effects): plot( result.1, plot.maternal = T ) # Print a very short summary of saved result: result.1 # A full summary of saved result, with confidence intervals and # p-values (the same as haplin prints when running): summary( result.1 ) # Some examples when the data file contains two covariates, # the second is the case-control variable: # The following standard triad run is INCORRECT since it disregards # case status: haplin("data.dat", use.missing = T, n.vars = 2, design = "triad") # Combined run on "hybrid" design, correctly using both case-parent # triads and control-parent triads: haplin( my.haplin.data, use.missing = T, n.vars = 2, ccvar = 2, design = "cc.triad" ) # If parent columns are not in the file, a plain case-control # run can be used: haplin( my.haplin.data, use.missing = T, n.vars = 2, ccvar = 2, design = "cc", response = "mult", reference = "ref.cat" ) # An example of how to produce a data file with all possible haplotypes # identified for each triad, together with their probaility weights: result.data <- haplin( my.haplin.data, use.missing = T, markers = c(3,4,8), data.out = "prelim" ) # result.data will then contain the data file, with a vector of # probabilities (freq) computed from the preliminary haplotype # frequencies. ## End(Not run)
Produces a list, each element of which is an object of class haplin
, which is the result of fitting the log-linear haplin
models to the data one "window" at a time.
haplinSlide( data, markers = "ALL", winlength = 1, strata = NULL, table.output = TRUE, cpus = 1, para.env = NULL, slaveOutfile = "", printout = FALSE, verbose = FALSE, ...)
haplinSlide( data, markers = "ALL", winlength = 1, strata = NULL, table.output = TRUE, cpus = 1, para.env = NULL, slaveOutfile = "", printout = FALSE, verbose = FALSE, ...)
data |
R-object of class "haplin.ready", which is e.g., output from |
markers |
Default is "ALL", which means haplinSlide uses all available markers in the data set in the analysis. Alternatively, the relevant markers can be specified by giving a vector or numbers (e.g., |
winlength |
Length of the sliding, overlapping windows to be run along the markers. See details. |
strata |
A single numeric value specifying which data column contains the stratification variable. |
table.output |
If |
cpus |
|
para.env |
The user can choose parallel environment to use — "parallel" (default) or "Rmpi" (for use on clusters); this option is used only when |
slaveOutfile |
Character. To be used when |
printout |
Default is FALSE. If TRUE, provides a full summary of each |
verbose |
Same as for |
... |
Remaining arguments to be used by |
haplinSlide
runs haplin
on a series of overlapping windows of the chosen markers. Except for the markers
and winlength
arguments, all arguments are used exactly as in haplin
itself. For instance, if markers = c(1, 3, 4, 5, 7, 8)
and winlength = 4
, haplinSlide
will run haplin
on first the markers c(1, 3, 4, 5)
, then on c(3, 4, 5, 7)
, and finally on c(4, 5, 7, 8)
. The results are returned in a list. The elements are named "1-3-4-5" etc., and can be extracted with, say, summary(res[["1-3-4-5"]])
etc., where res
is the saved result. Or the output can be examined by, for instance, using lapply(res, summary)
and lapply(res, plot)
.
When running haplinSlide
on a large number of markers, the output can become prohibitively large. In that case table.output
should be set to TRUE
, and haplinSlide
will return a list of summary "haptables". This list can then be stacked into a single dataframe using toDataFrame
. To avoid exessive memory use, the default is table.output = TRUE
.
When multiple cores are available, set the cpus
to the number of cores that should be used. This will run haplinSlide
in parallel on the chosen number of cores. Note that feedback is provided by each of the cores separately, and some cores may start working on markers far out in the sequence.
A list of objects of class haplin is returned.
Further information is found on the web page.
Hakon K. Gjessing
Professor of Biostatistics
Division of Epidemiology
Norwegian Institute of Public Health
[email protected]
Gjessing HK and Lie RT. Case-parent triads: Estimating single- and double-dose effects of fetal and maternal disease gene haplotypes. Annals of Human Genetics (2006) 70, pp. 382-396.
Web Site: https://haplin.bitbucket.io
haplin
, summary.haplin
, plot.haplin
, haptable
, toDataFrame
## Not run: # (Almost) all standard haplin runs can be done with haplinSlide. # Below is an illustration. See the haplin help page for more # examples. # # 1. Read the data: my.haplin.data <- genDataRead( file.in = "HAPLIN.trialdata.txt", file.out = "trial_data1", dir.out = tempdir( check = TRUE ), format = "haplin", n.vars = 0 ) # 2. Run pre-processing: haplin.data.prep <- genDataPreprocess( data.in = my.haplin.data, format = "haplin", design = "triad", file.out = "trial_data1_prep", dir.out = tempdir( check = TRUE ) ) # 3. Analyze: # Analyzing the effect of fetal genes, including triads with missing data, # using a multiplicative response model. When winlength = 1, separate # markers are used. To make longer windows, winlength can be increased # correspondingly: result.1 <- haplinSlide( haplin.data.prep, use.missing = T, response = "mult", reference = "ref.cat", winlength = 1, table.output = F) # Provide summary of separate results: lapply(result.1, summary) # Plot results: par(ask = T) lapply(result.1, plot) ## End(Not run)
## Not run: # (Almost) all standard haplin runs can be done with haplinSlide. # Below is an illustration. See the haplin help page for more # examples. # # 1. Read the data: my.haplin.data <- genDataRead( file.in = "HAPLIN.trialdata.txt", file.out = "trial_data1", dir.out = tempdir( check = TRUE ), format = "haplin", n.vars = 0 ) # 2. Run pre-processing: haplin.data.prep <- genDataPreprocess( data.in = my.haplin.data, format = "haplin", design = "triad", file.out = "trial_data1_prep", dir.out = tempdir( check = TRUE ) ) # 3. Analyze: # Analyzing the effect of fetal genes, including triads with missing data, # using a multiplicative response model. When winlength = 1, separate # markers are used. To make longer windows, winlength can be increased # correspondingly: result.1 <- haplinSlide( haplin.data.prep, use.missing = T, response = "mult", reference = "ref.cat", winlength = 1, table.output = F) # Provide summary of separate results: lapply(result.1, summary) # Plot results: par(ask = T) lapply(result.1, plot) ## End(Not run)
haplin
to each subset/stratum of data, determined by the argument strata
Produces a list, each element of which is an object of class haplin
, which is the result of fitting the log-linear haplin
models to each strata
stratum independently.
haplinStrat( data, strata = NULL, ...)
haplinStrat( data, strata = NULL, ...)
data |
R-object of class "haplin.ready", which is e.g., output from |
strata |
A single integer specifying the number of the column in the covariate data that contains the stratification variable |
... |
Remaining arguments to be used by |
haplinStrat
runs haplin
first on the entire input data file, then on each stratum separately. Strata are defined by the strata
variable, which can be coded as numerical or character. However, one should use only a moderate number of levels/strata, since haplin
will be run independetly on each, and some strata may otherwise have an insufficient amount of data.
Running haplinStrat
is thus just a simplification of manually splitting the file into strata and running haplin
on each; the end result would be the same.
The main reason for running haplinStrat is to test for gene-environment interactions. This is achieved by running postTest
on the result from haplinStrat
.
haplinStrat
can also be run from within haplinSlide
by using the strata
argument in haplinSlide
.
A list of objects of class haplin is returned. The first element contains the result of running haplin
on the entire data file; the remaining elements are the results from each of the strata. The names of the list correspond to the values of the strata
variable.
Further information is found on the web page.
Hakon K. Gjessing
Professor of Biostatistics
Division of Epidemiology
Norwegian Institute of Public Health
[email protected]
Gjessing HK and Lie RT. Case-parent triads: Estimating single- and double-dose effects of fetal and maternal disease gene haplotypes. Annals of Human Genetics (2006) 70, pp. 382-396.
Web Site: https://haplin.bitbucket.io
haplin
, summary.haplin
, plot.haplin
, haptable
, toDataFrame
, haplinSlide
, gxe
# setting up the directory with exemplary data dir.in <- system.file( "extdata", package = "Haplin" ) file.in <- paste0( dir.in, "/data.dat" ) # reading data in data.in <- genDataRead( file.in, file.out = "poo_exmpl_data_read", format = "haplin", dir.out = tempdir( check = TRUE ), n.vars = 1, allele.sep = " ", col.sep = " ", overwrite = TRUE ) # preprocessing the data data.preproc <- genDataPreprocess( data.in, design = "triad", file.out = "poo_exmpl_data_preproc", dir.out = tempdir( check = TRUE ), overwrite = TRUE ) # running haplinStrat, checking for gene-environment interactions res.GxE <- haplinStrat( data.preproc, markers = 2, strata = 1, poo = FALSE, response = "mult", reference = 2, use.missing = TRUE ) res.GxE # running haplinStrat, checking for POO-environment interactions res.POOxE <- haplinStrat( data.preproc, markers = c(1,2,3), strata = 1, poo = TRUE, response = "mult", reference = "ref.cat", use.missing = TRUE ) res.POOxE ## Not run: # All standard haplin runs can be done with haplinStrat. # Below is an illustration. See the haplin help page for more # examples. # # Analyzing the effect of fetal genes, including triads with missing data, # using a multiplicative response model. The first column of the data file # in this example contains the stratification variable. result <- haplinStrat("C:/work/data.dat", strata = 1, use.missing = T, response = "mult", reference = "ref.cat", winlength = 1) # Provide summary of separate results: lapply(result, summary) # Plot results separately: par(ask = T) lapply(result, plot) # # Convert results to table format and stack them over strata: haptable(result) # Test for interaction between haplotype risk estimates and the strata variable: postTest(result) ## End(Not run)
# setting up the directory with exemplary data dir.in <- system.file( "extdata", package = "Haplin" ) file.in <- paste0( dir.in, "/data.dat" ) # reading data in data.in <- genDataRead( file.in, file.out = "poo_exmpl_data_read", format = "haplin", dir.out = tempdir( check = TRUE ), n.vars = 1, allele.sep = " ", col.sep = " ", overwrite = TRUE ) # preprocessing the data data.preproc <- genDataPreprocess( data.in, design = "triad", file.out = "poo_exmpl_data_preproc", dir.out = tempdir( check = TRUE ), overwrite = TRUE ) # running haplinStrat, checking for gene-environment interactions res.GxE <- haplinStrat( data.preproc, markers = 2, strata = 1, poo = FALSE, response = "mult", reference = 2, use.missing = TRUE ) res.GxE # running haplinStrat, checking for POO-environment interactions res.POOxE <- haplinStrat( data.preproc, markers = c(1,2,3), strata = 1, poo = TRUE, response = "mult", reference = "ref.cat", use.missing = TRUE ) res.POOxE ## Not run: # All standard haplin runs can be done with haplinStrat. # Below is an illustration. See the haplin help page for more # examples. # # Analyzing the effect of fetal genes, including triads with missing data, # using a multiplicative response model. The first column of the data file # in this example contains the stratification variable. result <- haplinStrat("C:/work/data.dat", strata = 1, use.missing = T, response = "mult", reference = "ref.cat", winlength = 1) # Provide summary of separate results: lapply(result, summary) # Plot results separately: par(ask = T) lapply(result, plot) # # Convert results to table format and stack them over strata: haptable(result) # Test for interaction between haplotype risk estimates and the strata variable: postTest(result) ## End(Not run)
Simulates the statistical power of genetic analyses assessing fetal effects, maternal effects and/or parent-of-origin effects. Effects of X-chromosome genes and gene-environment interaction effects are also allowed.
hapPower(hapRun.result, alpha = 0.05)
hapPower(hapRun.result, alpha = 0.05)
hapRun.result |
The result of running |
alpha |
|
The Haplin framework includes different modules for assessing genetic effects: haplin
,
haplinStrat
and haplinSlide
. hapPower
simulates the power of these analyses, which enables power calculations of fetal effects, maternal effects and/or parent-of-origin effects. Various family designs, i.e., triads, case-control, the hybrid design, and all intermediate designs, are possible.
It also allows power calculation of gene-environment interaction effects and effects on X-chromosome markers.
hapPower
calculates statistical power using the result of hapRun
, and the target effects must be specified in this function, see Examples below, and details in https://haplin.bitbucket.io/docu/Haplin_power.pdf.
hapPower
returns the simulated power.
Miriam Gjerdevik,
with Hakon K. Gjessing
Professor of Biostatistics
Division of Epidemiology
Norwegian Institute of Public Health
Web Site: https://haplin.bitbucket.io
haplin
, haplinSlide
, hapSim
, hapRun
, snpPower
, snpSampleSize
, hapPowerAsymp
## Not run: ## Simulate power from 100 files using haplin. ## The files consist of fetal effects at two diallelic markers, ## corresponding to haplo.freq = rep(0.25, 4), RR = c(2,1,1,1) and RRstar = c(1,1,1,1). ## The power is simulated for the combination of 100 case triads ## and 100 control triads with no missing data at a 0.05 significance level, ## applying a multiplicative model. hapRun.res <- hapRun(nall = c(2,2), n.strata = 1, cases = c(mfc=100), controls = c(mfc=100), haplo.freq = rep(0.25,4), RR = c(2,1,1,1), RRstar = c(1,1,1,1), hapfunc = "haplin", response = "mult", n.sim = 100, dire = "simfiles", ask = FALSE) hapPower(hapRun.res) ## Simulate power from 100 files applying haplinStrat. ## The files consist of fetal and maternal effects at two diallelic markers. ## The data is simulated for 500 case triads and 200 control families in the first stratum, ## and 500 case triads and 500 control trids in the second. ## The fetal effects vary across strata, ## whereas the maternal effects are the same. ## One percent of the case triads are missing at random in the second stratum. hapRun.res <- hapRun(nall = c(2,2), n.strata = 2, cases = c(mfc=500), controls = list(c(mfc=200),c(mfc=500)), haplo.freq = rep(0.25,4), maternal = TRUE, RR = list(c(1.5,1,1,1),c(1,1,1,1)), RRstar = c(1,1,1,1), RR.mat = c(1.5,1,1,1), RRstar.mat = c(1,1,1,1), gen.missing.cases = list(NULL,0.01), use.missing = TRUE, hapfunc = "haplinStrat", n.sim = 100, ask = FALSE) hapPower(hapRun.res) ## Simulate power at the 0.1 significance level from 1000 files using haplin. ## The files consist of fetal effects at one diallelic locus, ## corresponding to haplo.freq = c(0.1,0.9), RR = c(2,1) and RRstar = c(1,1). ## The data consists of a combination of 100 case triads and 100 control triads. hapRun.res <- hapRun(nall = c(2), cases = c(mfc=100), controls = c(mfc=100), haplo.freq = c(0.1,0.9), RR = c(2,1), RRstar = c(1,1), hapfunc = "haplin", response = "mult", n.sim = 1000, ask = FALSE) hapPower(hapRun.res, alpha= 0.10) ## The latter example, applying response = "mult", should be comparable to ## the theoretic calculations of snpPower. snpPower(cases = list(mfc=100), controls = list(mfc=100), RR = 2, MAF = 0.1, alpha = 0.10) ## End(Not run)
## Not run: ## Simulate power from 100 files using haplin. ## The files consist of fetal effects at two diallelic markers, ## corresponding to haplo.freq = rep(0.25, 4), RR = c(2,1,1,1) and RRstar = c(1,1,1,1). ## The power is simulated for the combination of 100 case triads ## and 100 control triads with no missing data at a 0.05 significance level, ## applying a multiplicative model. hapRun.res <- hapRun(nall = c(2,2), n.strata = 1, cases = c(mfc=100), controls = c(mfc=100), haplo.freq = rep(0.25,4), RR = c(2,1,1,1), RRstar = c(1,1,1,1), hapfunc = "haplin", response = "mult", n.sim = 100, dire = "simfiles", ask = FALSE) hapPower(hapRun.res) ## Simulate power from 100 files applying haplinStrat. ## The files consist of fetal and maternal effects at two diallelic markers. ## The data is simulated for 500 case triads and 200 control families in the first stratum, ## and 500 case triads and 500 control trids in the second. ## The fetal effects vary across strata, ## whereas the maternal effects are the same. ## One percent of the case triads are missing at random in the second stratum. hapRun.res <- hapRun(nall = c(2,2), n.strata = 2, cases = c(mfc=500), controls = list(c(mfc=200),c(mfc=500)), haplo.freq = rep(0.25,4), maternal = TRUE, RR = list(c(1.5,1,1,1),c(1,1,1,1)), RRstar = c(1,1,1,1), RR.mat = c(1.5,1,1,1), RRstar.mat = c(1,1,1,1), gen.missing.cases = list(NULL,0.01), use.missing = TRUE, hapfunc = "haplinStrat", n.sim = 100, ask = FALSE) hapPower(hapRun.res) ## Simulate power at the 0.1 significance level from 1000 files using haplin. ## The files consist of fetal effects at one diallelic locus, ## corresponding to haplo.freq = c(0.1,0.9), RR = c(2,1) and RRstar = c(1,1). ## The data consists of a combination of 100 case triads and 100 control triads. hapRun.res <- hapRun(nall = c(2), cases = c(mfc=100), controls = c(mfc=100), haplo.freq = c(0.1,0.9), RR = c(2,1), RRstar = c(1,1), hapfunc = "haplin", response = "mult", n.sim = 1000, ask = FALSE) hapPower(hapRun.res, alpha= 0.10) ## The latter example, applying response = "mult", should be comparable to ## the theoretic calculations of snpPower. snpPower(cases = list(mfc=100), controls = list(mfc=100), RR = 2, MAF = 0.1, alpha = 0.10) ## End(Not run)
Computes the asymptotic power for genetic analyses assessing fetal effects, maternal effects and/or parent-of-origin effects. Effects of X-chromosome genes and gene-environment interaction effects are also allowed.
hapPowerAsymp(nall = 2, n.strata = 1, cases, controls, haplo.freq, RR, RRcm, RRcf, RRstar, RR.mat, RRstar.mat, xchrom = F, sim.comb.sex = "double", BR.girls, response = "mult", alpha = 0.05, ...)
hapPowerAsymp(nall = 2, n.strata = 1, cases, controls, haplo.freq, RR, RRcm, RRcf, RRstar, RR.mat, RRstar.mat, xchrom = F, sim.comb.sex = "double", BR.girls, response = "mult", alpha = 0.05, ...)
nall |
A vector of the number of alleles at each locus. By default a diallellic SNP. |
n.strata |
The number of strata. |
cases |
A list of the number of case families. Each element is a vector of the number of families of the specified family design in the corresponding stratum. The possible family designs, i.e., the possible names of the elements, are |
controls |
A list of the number of control families. Each element is a vector of the number of families of the specified family design in the corresponding stratum. The possible family designs are |
haplo.freq |
A list of which each element is a numeric vector of the haplotype frequencies in each stratum. The frequencies are normalized and sum to one. The Details section shows how to implement this argument in agreement with the possible haplotypes. |
RR |
A list of which each element is a numeric vector of the relative risks in each stratum. The Details section shows how to implement this argument in agreement with the possible haplotypes. |
RRcm |
A list of numeric vectors. Each vector contains the relative risks associated with the haplotypes transmitted from the mother for this stratum. See Details for description of how to implement this argument in agreement with the possible haplotypes. |
RRcf |
A list of numeric vectors. Each vector contains the relative risks associated with the haplotypes transmitted from the father for this stratum. See Details for description of how to implement this argument in agreement with the possible haplotypes. |
RRstar |
A list of numeric vectors. Estimates how much double-dose children would deviate from the risk expected in a multiplicative dose-response relationship. |
RR.mat |
The interpretation is similar to |
RRstar.mat |
The interpretation is similar to |
xchrom |
Logical. Equals |
sim.comb.sex |
To be used with |
BR.girls |
To be used with |
response |
The default value "mult" specifies a multiplicative dose-response model. |
alpha |
|
... |
Could include argument |
The Haplin framework includes different modules for assessing genetic effects: haplin
,
haplinStrat
and haplinSlide
. hapPowerAsymp
computes the asymptotic power for these analyses, which enables power calculations of fetal effects, maternal effects and/or parent-of-origin effects. Various family designs, i.e., triads, case-control, the hybrid design, and all intermediate designs, are possible.
It also allows power calculation of gene-environment interaction effects and effects on X-chromosome markers.
hapPower
computes power through "brute force" simulations using hapRun
. This is a robust way of checking software implementations, asymptotic approximations
and attained significance level. However, both power and the corresponding sample size calculations can be performed much more efficiently using asymptotic approximations.
The asymptotic power is calculated applying the non-centrality parameter of the Wald tests, which use the asymptotic normal distribution of the log-scale parameter. The function hapCovar
(used by hapPowerAsymp
) computes the variance-covariance matrix
by applying the log-linear model combined with the EM algorithm.
Specifying haplotype risks:
The number of haplotypes used in the simulations is determined by the nall
argument, since prod(nall)
different haplotypes can be made from the specified number of markers, length(nall)
. The arguments haplo.freq
, RR
, RRcm
, RRcf
, RRstar
, RR.mat
, and RRstar.mat
are all lists where each element represents a stratum. Within each stratum, the arguments are vectors of length equal to the number of haplotypes, specifying the relative risk etc. associated with each haplotype.
The stratum specific arguments may be simplified if the number of strata is one, or if the arguments are equal across all strata.
The haplotypes are determined by creating all possible haplotypes from the given markers, in a sequence where the first marker varies mostly quickly. For instance, if nall = c(3,2)
, the first marker has 3 alleles, the second has 2, and 6 haplotypes are possible. Taken in order, the haplotypes are 1-1, 2-1, 3-1, 1-2, 2-2, and 3-2. When specifying, say, RR = c(1,2,1,1,1,1)
the haplotype 2-1 has a double risk compared to the rest. With, for instance, two strata, the specification RR = list(c(1,2,1,1,1,1), c(1,1,1,1,1,1))
would mean that the risk associated with 2-1 is elevated only in the first stratum, not the second.
The simplest example would be with nall = c(2)
and RR = c(1,2)
, which would simulate a single SNP where the second allele has a double risk.
Specifying genetic effects:
Standard fetal effects are specified by the arguments RR
and RRstar
, whereas parent-of-origin effects are addressed by the arguments RRcm
, RRcf
and RRstar
. Maternal effects are included by the additional arguments RRmat
and RRstar.mat
.
hapPowerAsymp
returns the asymptotic power for the relevant genetic effects. The first element of the list depicts the power for each haplotype analyzed separately.
If there are more than two possible haplotypes, the second element displays the overall power for all haplotypes combined.
Miriam Gjerdevik,
with Hakon K. Gjessing
Professor of Biostatistics
Division of Epidemiology
Norwegian Institute of Public Health
Web Site: https://haplin.bitbucket.io
haplin
, haplinSlide
, haplinStrat
, hapSim
, hapRun
, snpPower
, hapPower
## Calculate the asymptotic power for a triad design ## when the minor allele increases the fetal risk by twofold. ## Assumes a multiplicative dose-response relationship. hapPowerAsymp(nall = c(2), n.strata = 1, cases = list(c(mfc=120)), haplo.freq = c(0.1,0.9), RR = c(2,1), RRstar = c(1,1)) ## Calculate the asymptotic power for the hybrid design when ## the minor allele increases the fetal risk by twofold ## in the first stratum and no effect is seen in the second ## i.e., gene-environment interaction (GxE) effects. hapPowerAsymp(nall = c(2), n.strata = 2, cases = list(c(mfc=100)), controls = list(c(mfc=100)), haplo.freq = c(0.1,0.9), RR = list(c(2,1), c(1,1)), RRstar = c(1,1)) ## Calculate the asymptotic GxE power assessing maternal- and ## parent-of-origin effects at two diallelic loci. hapPowerAsymp(nall = c(2,2), n.strata = 2, cases = c(mfc=500), haplo.freq = c(0.1,0.2,0.3,0.4), RRcm = list(c(3,1,1,1), c(1,1,1,1)), RRcf = c(1,1,1,1), RRstar = c(1,1,1,1), RR.mat = list(c(1.5,1,1,1),c(1,1,1,1)), RRstar.mat = c(1,1,1,1))
## Calculate the asymptotic power for a triad design ## when the minor allele increases the fetal risk by twofold. ## Assumes a multiplicative dose-response relationship. hapPowerAsymp(nall = c(2), n.strata = 1, cases = list(c(mfc=120)), haplo.freq = c(0.1,0.9), RR = c(2,1), RRstar = c(1,1)) ## Calculate the asymptotic power for the hybrid design when ## the minor allele increases the fetal risk by twofold ## in the first stratum and no effect is seen in the second ## i.e., gene-environment interaction (GxE) effects. hapPowerAsymp(nall = c(2), n.strata = 2, cases = list(c(mfc=100)), controls = list(c(mfc=100)), haplo.freq = c(0.1,0.9), RR = list(c(2,1), c(1,1)), RRstar = c(1,1)) ## Calculate the asymptotic GxE power assessing maternal- and ## parent-of-origin effects at two diallelic loci. hapPowerAsymp(nall = c(2,2), n.strata = 2, cases = c(mfc=500), haplo.freq = c(0.1,0.2,0.3,0.4), RRcm = list(c(3,1,1,1), c(1,1,1,1)), RRcf = c(1,1,1,1), RRstar = c(1,1,1,1), RR.mat = list(c(1.5,1,1,1),c(1,1,1,1)), RRstar.mat = c(1,1,1,1))
Computes the relative efficiency for different study designs in genetic association analysis.
hapRelEff(nall = 2, cases.comp, controls.comp, cases.ref, controls.ref, haplo.freq, RR, RRcm, RRcf, RRstar, RR.mat, RRstar.mat, xchrom = F, sim.comb.sex = "double", BR.girls, response = "mult", ...)
hapRelEff(nall = 2, cases.comp, controls.comp, cases.ref, controls.ref, haplo.freq, RR, RRcm, RRcf, RRstar, RR.mat, RRstar.mat, xchrom = F, sim.comb.sex = "double", BR.girls, response = "mult", ...)
nall |
A vector of the number of alleles at each locus. By default a diallellic SNP. |
cases.comp |
A list of the number of case families in the comparison design. Its element is a vector of the number of families of the specified family design. The possible family designs, i.e., the possible names of the elements, are |
controls.comp |
A list of the number of control families in the comparison design. Its element is a vector of the number of families of the specified family design. The possible family designs are |
cases.ref |
A list of the number of case families in the reference design. The options are the same as for |
controls.ref |
A list of the number of control families in the reference design. The options are the same as for |
haplo.freq |
A list of which its element is a numeric vector of the haplotype frequencies. The frequencies are normalized and sum to one. The Details section shows how to implement this argument in agreement with the possible haplotypes. |
RR |
A list of which its element is a numeric vector of the relative risks. The Details section shows how to implement this argument in agreement with the possible haplotypes. |
RRcm |
A numeric vector in list format containing the relative risks associated with the haplotypes transmitted from the mother. See Details for description of how to implement this argument in agreement with the possible haplotypes. |
RRcf |
A numeric vector in list format containing the relative risks associated with the haplotypes transmitted from the father. See Details for description of how to implement this argument in agreement with the possible haplotypes. |
RRstar |
A numeric vector in list format. Estimates how much double-dose children would deviate from the risk expected in a multiplicative dose-response relationship. |
RR.mat |
The interpretation is similar to |
RRstar.mat |
The interpretation is similar to |
xchrom |
Logical. Equals |
sim.comb.sex |
To be used with |
BR.girls |
To be used with |
response |
The default value "mult" specifies a multiplicative dose-response model. |
... |
Could include argument |
hapRelEff
compares two study designs for genetic association analysis, using a term called relative efficiency.
The relative efficiency is defined as the ratio of variances of estimators for the same parameter, computed from two
different designs, or equivalently, the ratio of the sample sizes needed for each of the two designs to achieve the same level and power. The number of genotyped individuals within each design is accounted for.
The relative efficiency estimated under the null hypothesis, i.e., when all relative risks are equal to one, is known as the Pitman efficiency.
To compute the variance-covariance estimates for each design, hapRelEff
calls the function hapCovar
, which calculates the asymptotic variance-covariance matrix
by applying the log-linear model combined with the EM algorithm.
The relative efficiency can be computed for fetal effects, maternal effects and parent-of-origin effect, as well as effects on X-chromosome markers. Various study designs, i.e., case-parent triads, the standard case-control design, the hybrid design, and all intermediate designs, can be compared.
Note that the exact number of case families and control families in the reference or comparison design is irrelevant, as this will be accounted for in the relative efficiency estimate. However, the ratio of control families to case families within the reference or comparison design must be specified correctly. See the Examples section.
Specifying haplotype risks:
The number of haplotypes used in the simulations is determined by the nall
argument, since prod(nall)
different haplotypes can be made from the specified number of markers, length(nall)
.
The arguments haplo.freq
, RR
, RRcm
, RRcf
, RRstar
, RR.mat
, and RRstar.mat
are all lists, containing vectors of length equal to the number of haplotypes, specifying the relative risk etc. associated with each haplotype.
However, the function will work without using the list format.
The haplotypes are determined by creating all possible haplotypes from the given markers, in a sequence where the first marker varies mostly quickly.
For instance, if nall = c(3,2)
, the first marker has 3 alleles, the second has 2, and 6 haplotypes are possible. Taken in order, the haplotypes are 1-1, 2-1, 3-1, 1-2, 2-2, and 3-2.
When specifying, say, RR = c(1,2,1,1,1,1)
the haplotype 2-1 has a double risk compared to the rest.
The simplest example would be with nall = c(2)
and RR = c(1,2)
, which would simulate a single SNP where the second allele has a double risk.
Specifying genetic effects:
Standard fetal effects are specified by the arguments RR
and RRstar
,
whereas parent-of-origin effects are addressed by the arguments RRcm
, RRcf
and RRstar
.
Maternal effects are included by the additional arguments RRmat
and RRstar.mat
.
hapRelEff
returns the relative efficiency estimate, comparing two study designs. The first element of the list depicts the relative efficiency for each haplotype analyzed separately.
If there are more than two possible haplotypes, the second element displays the overall relative efficiency for all haplotypes combined.
Miriam Gjerdevik,
with Hakon K. Gjessing
Professor of Biostatistics
Division of Epidemiology
Norwegian Institute of Public Health
Web Site: https://haplin.bitbucket.io
haplin
, hapPowerAsymp
, hapRun
, snpPower
, hapPower
## Child effects: Calculate the efficiency of the standard case-control design ## (with an equal number of case and control children) ## relative to the case-parent triad design ## under the null hypothesis when the minor allele frequency is 0.1. hapRelEff(nall = c(2), cases.comp = c(c=1), controls.comp = c(c=1), cases.ref = c(mfc=1), haplo.freq = c(0.1,0.9), RR = c(1,1)) ## Child effects: Calculate the efficiency of the standard case-control design, ## with twice as many cases as controls, relative to the case-parent triad design ## under the null hypothesis when the minor allele frequency is 0.2. hapRelEff(nall = c(2), cases.comp = c(c=2), controls.comp = c(c=1), cases.ref = c(mfc=1), haplo.freq = c(0.2,0.8), RR = c(1,1)) ## Child and maternal effects: Calculate the efficiency of the case-mother dyad design ## relative to the case-parent triad design ## under the null hypothesis when the minor allele frequency is 0.1. hapRelEff(nall = c(2), cases.comp = c(mc=1), cases.ref = c(mfc=1), haplo.freq = c(0.1,0.9), RR = c(1,1), RR.mat=c(1,1)) ## PoO effects: Calculate the efficiency of the full hybrid design, ## with twice as many control families as case families, ## relative to the case-parent triad design ## under the null hypothesis when the minor allele frequency is 0.1. hapRelEff(nall = c(2), cases.comp = c(mfc=1), controls.comp = c(mfc=2), cases.ref = c(mfc=1), haplo.freq = c(0.1,0.9), RRcm = c(1,1), RRcf = c(1,1))
## Child effects: Calculate the efficiency of the standard case-control design ## (with an equal number of case and control children) ## relative to the case-parent triad design ## under the null hypothesis when the minor allele frequency is 0.1. hapRelEff(nall = c(2), cases.comp = c(c=1), controls.comp = c(c=1), cases.ref = c(mfc=1), haplo.freq = c(0.1,0.9), RR = c(1,1)) ## Child effects: Calculate the efficiency of the standard case-control design, ## with twice as many cases as controls, relative to the case-parent triad design ## under the null hypothesis when the minor allele frequency is 0.2. hapRelEff(nall = c(2), cases.comp = c(c=2), controls.comp = c(c=1), cases.ref = c(mfc=1), haplo.freq = c(0.2,0.8), RR = c(1,1)) ## Child and maternal effects: Calculate the efficiency of the case-mother dyad design ## relative to the case-parent triad design ## under the null hypothesis when the minor allele frequency is 0.1. hapRelEff(nall = c(2), cases.comp = c(mc=1), cases.ref = c(mfc=1), haplo.freq = c(0.1,0.9), RR = c(1,1), RR.mat=c(1,1)) ## PoO effects: Calculate the efficiency of the full hybrid design, ## with twice as many control families as case families, ## relative to the case-parent triad design ## under the null hypothesis when the minor allele frequency is 0.1. hapRelEff(nall = c(2), cases.comp = c(mfc=1), controls.comp = c(mfc=2), cases.ref = c(mfc=1), haplo.freq = c(0.1,0.9), RRcm = c(1,1), RRcf = c(1,1))
Calculates Haplin results by first simulating genetic data, allowing a various number of family designs, and then running Haplin on the simulations.
The simulated data may contain of fetal effects, maternal effects and/or parent-of-origin effects.
The function allows for simulations and calculations on both autosomal and X-chromosome markers,
assuming Hardy-Weinberg equilibrium.
It enables simulation and calculation of gene-environment interaction effects, i.e, the input (relative risks, number of cases etc.) may vary across strata.
hapRun
calls haplin
, haplinStrat
or haplinSlide
to run on the simulated data files.
hapRun(nall, n.strata= 1, cases, controls, haplo.freq, RR, RRcm, RRcf, RRstar, RR.mat, RRstar.mat, hapfunc = "haplin", gen.missing.cases = NULL, gen.missing.controls = NULL, n.sim = 1000, xchrom = FALSE, sim.comb.sex = "double", BR.girls, dire, ask = TRUE, cpus = 1, slaveOutfile = "", ...)
hapRun(nall, n.strata= 1, cases, controls, haplo.freq, RR, RRcm, RRcf, RRstar, RR.mat, RRstar.mat, hapfunc = "haplin", gen.missing.cases = NULL, gen.missing.controls = NULL, n.sim = 1000, xchrom = FALSE, sim.comb.sex = "double", BR.girls, dire, ask = TRUE, cpus = 1, slaveOutfile = "", ...)
nall |
A vector of the number of alleles at each locus. |
n.strata |
The number of strata. |
cases |
A list of the number of case families. Each element is a vector of the number of families of the specified family design(s) in the corresponding stratum. The possible family designs, i.e., the possible names of the elements, are |
controls |
A list of the number of control families. Each element is a vector of the number of families of the specified family design(s) in the corresponding stratum. The possible family designs are |
haplo.freq |
A list of which each element is a numeric vector of the haplotype frequencies in each stratum. The frequencies are normalized and sum to one. The Details section shows how to implement this argument in agreement with the possible haplotypes. |
RR |
A list of which each element is a numeric vector of the relative risks in each stratum. The Details section shows how to implement this argument in agreement with the possible haplotypes. |
RRcm |
A list of numeric vectors. Each vector contains the relative risks associated with the haplotypes transmitted from the mother for this stratum. See Details for description of how to implement this argument in agreement with the possible haplotypes. |
RRcf |
A list of numeric vectors. Each vector contains the relative risks associated with the haplotypes transmitted from the father for this stratum. See Details for description of how to implement this argument in agreement with the possible haplotypes. |
RRstar |
A list of numeric vectors. Estimates how much double-dose children would deviate from the risk expected in a multiplicative dose-response relationship. |
RR.mat |
The interpretation is similar to |
RRstar.mat |
The interpretation is similar to |
hapfunc |
Defines which haplin function to run, the options being |
gen.missing.cases |
Generates missing values at random for the case families. Set to |
gen.missing.controls |
Generates missing values at random for the control families. Set to |
n.sim |
The number of simulations, i.e., the number of simulated data files. |
xchrom |
Logical. Equals |
sim.comb.sex |
To be used with |
BR.girls |
To be used with |
dire |
Gives the directory of the simulated data files. Missing by default, which means that none of the files are saved to files. |
ask |
Logical. If |
cpus |
Allows parallel processing of its analyses. The |
slaveOutfile |
Character. If |
... |
Arguments to be used by |
hapRun
applies haplin
, haplinSlide
or haplinStrat
on each data file simulated by hapSim
.
It provides simulations on various family designs, i.e., triads, case-control, the hybrid design, and all intermediate designs.
The simulated files may accomodate fetal effects, maternal effects and/or parent-of-origin effects.
hapRun
allows simulation of both autosomal and X-chromosome markers, assuming Hardy-Weinberg equilibrium.
It also enables simulation and calculation of gene-environment interaction effects.
Details on how to implement the arguments listed above are provided by hapSim
and the Examples section below.
The stratum specific arguments may be simplified if the number of strata is one, or if the arguments are equal across all strata.
haplin
, haplinStrat
and haplinSlide
will run with default values unless otherwise specified by hapRun
.
For example, if hapfunc = "haplin"
, haplin
will use response = "free"
unless response = "mult"
is explicitly given as an argument in hapRun
.
Moreover, triads with missing data are only included in the haplin analysis if the argument use.missing
equals TRUE
(default in hapRun
). Please confer https://haplin.bitbucket.io/docu/Haplin_power.pdf for further details and examples.
For information on the arguments to be passed on to haplin
, haplinStrat
and haplinSlide
, please consult their help pages.
Note that RR.mat
and RRstar.mat
and RRcm
and RRcf
are required for hapSim
to simulate maternal and parent-of-origin effects, respectively.
To calculate these effects, however, arguments maternal = TRUE
and/or poo = TRUE
must be specified.
gen.missing.cases
and gen.missing.controls
are flexible arguments. By default, both equal NULL, which means that no missing data are generated at random.
If the arguments are single numbers, missing data are generated at random with this proportion for all cases and/or controls.
If the arguments are vectors of length equal to the number of loci, missing data are generated with the corresponding proportion for each locus.
The arguments can also be matrices with the number of rows equal to the number of loci and three columns.
Each row corresponds to a locus, and the columns correspond to mothers, fathers and children, respectively.
If hapfunc = "haplin"
, hapRun
returns a dataframe consisting of results from running haplin
on each simulated file.
The first two columns are:
sim.no |
The name of the directory from which the results are calculated, i.e., the simulation number |
row.no |
The row number within each simulation |
haptable
gives detailed information of the full dataframe.
If hapfunc = "haplinSlide"
, hapRun
returns a list of which each element contains the results from a single run of haplinSlide
.
Consult suest
for a thorough description of the output. Note, however, that hapfunc = "haplinSlide"
is currently only implemented for diallelic markers, and the reference category is always chosen to be the first haplotype (see hapSim
for a description of the haplotype grid).
If hapfunc = "haplinStrat"
, haplinStrat
is used to estimate gene-effects in each stratum of the exposure covariate, and the results from all strata are compared using gxe
. hapRun
returns a list, where each element is the result of a single run of
gxe
.
Additionaly, if dire
is not missing by default, the simulated files from which the Haplin results are calculated, are stored in the given directory.
Miriam Gjerdevik,
with Hakon K. Gjessing
Professor of Biostatistics
Division of Epidemiology
Norwegian Institute of Public Health
Web Site: https://haplin.bitbucket.io
haplin
, haplinSlide
, hapSim
, haptable
, suest
, hapPower
, hapPowerAsymp
## Not run: ## Simulate Haplin results from 100 files using the multiplicative model in haplin. ## The files consist of fetal effects at two diallelic markers, ## corresponding to haplo.freq = rep(0.25, 4), RR = c(2,1,1,1) ## and RRstar = c(1,1,1,1). That is, the first allele has a doubled risk ## relative to the rest. The data consists of a combination of ## 100 case triads and 100 control triads with no missing data. ## No environmental factors are considered, i.e. the number of strata is one. hapRun(nall = c(2,2), n.strata = 1, cases = c(mfc=100), controls = c(mfc=100), haplo.freq = rep(0.25,4), RR = c(2,1,1,1), RRstar = c(1,1,1,1), hapfunc = "haplin", response = "mult", n.sim = 100, dire = "simfiles", ask = FALSE) ## Simulate power from 100 files applying haplinStrat. ## The files consist of fetal and maternal effects at two diallelic markers. ## The data is simulated for 500 case triads and 200 control families in the first stratum, ## and 500 case triads and 500 control trids in the second. ## The fetal effects vary across strata, ## whereas the maternal effects are the same. ## One percent of the case triads are missing at random in the second stratum. hapRun(nall = c(2,2), n.strata = 2, cases = c(mfc=500), controls = list(c(mfc=200),c(mfc=500)), haplo.freq = rep(0.25,4), maternal = TRUE, RR = list(c(1.5,1,1,1),c(1,1,1,1)), RRstar = c(1,1,1,1), RR.mat = c(1.5,1,1,1), RRstar.mat = c(1,1,1,1), gen.missing.cases = list(NULL,0.01), use.missing = TRUE, hapfunc = "haplinStrat", n.sim = 100, ask = FALSE) ## Simulate Haplin results from 100 files using haplin. ## The files consist of fetal effects at one diallelic locus, ## corresponding to haplo.freq = rep(0.5,2), RR = c(1.5,1) and RRstar = c(1,1). ## We have a combination of 100 case triads and ## 100 control triads with no missing data. ## No environmental effects are considered. hapRun(nall = c(2), n.strata = 1, cases = c(mfc=100), controls = c(mfc=100), haplo.freq = rep(0.5,2), RR = c(1.5,1), RRstar = c(1,1), hapfunc = "haplin", n.sim = 100, dire = "simfiles", ask = FALSE) ## End(Not run)
## Not run: ## Simulate Haplin results from 100 files using the multiplicative model in haplin. ## The files consist of fetal effects at two diallelic markers, ## corresponding to haplo.freq = rep(0.25, 4), RR = c(2,1,1,1) ## and RRstar = c(1,1,1,1). That is, the first allele has a doubled risk ## relative to the rest. The data consists of a combination of ## 100 case triads and 100 control triads with no missing data. ## No environmental factors are considered, i.e. the number of strata is one. hapRun(nall = c(2,2), n.strata = 1, cases = c(mfc=100), controls = c(mfc=100), haplo.freq = rep(0.25,4), RR = c(2,1,1,1), RRstar = c(1,1,1,1), hapfunc = "haplin", response = "mult", n.sim = 100, dire = "simfiles", ask = FALSE) ## Simulate power from 100 files applying haplinStrat. ## The files consist of fetal and maternal effects at two diallelic markers. ## The data is simulated for 500 case triads and 200 control families in the first stratum, ## and 500 case triads and 500 control trids in the second. ## The fetal effects vary across strata, ## whereas the maternal effects are the same. ## One percent of the case triads are missing at random in the second stratum. hapRun(nall = c(2,2), n.strata = 2, cases = c(mfc=500), controls = list(c(mfc=200),c(mfc=500)), haplo.freq = rep(0.25,4), maternal = TRUE, RR = list(c(1.5,1,1,1),c(1,1,1,1)), RRstar = c(1,1,1,1), RR.mat = c(1.5,1,1,1), RRstar.mat = c(1,1,1,1), gen.missing.cases = list(NULL,0.01), use.missing = TRUE, hapfunc = "haplinStrat", n.sim = 100, ask = FALSE) ## Simulate Haplin results from 100 files using haplin. ## The files consist of fetal effects at one diallelic locus, ## corresponding to haplo.freq = rep(0.5,2), RR = c(1.5,1) and RRstar = c(1,1). ## We have a combination of 100 case triads and ## 100 control triads with no missing data. ## No environmental effects are considered. hapRun(nall = c(2), n.strata = 1, cases = c(mfc=100), controls = c(mfc=100), haplo.freq = rep(0.5,2), RR = c(1.5,1), RRstar = c(1,1), hapfunc = "haplin", n.sim = 100, dire = "simfiles", ask = FALSE) ## End(Not run)
Simulates genetic data in Haplin format, consisting of fetal effects, maternal effects and/or parent-of-origin effects. Allows for simulation of both autosomal and X-linked markers, assuming Hardy-Weinberg equilibrium. Enables stratified simulations for gene-environment interaction analyses, i.e the input (relative risks, number of cases etc) may vary across different strata.
hapSim(nall, n.strata = 1, cases, controls, haplo.freq, RR, RRcm, RRcf, RRstar, RR.mat, RRstar.mat, gen.missing.cases = NULL, gen.missing.controls = NULL, n.sim = 1000, xchrom = F, sim.comb.sex = "double", BR.girls, dire = "simfiles", ask = TRUE, verbose = TRUE, cpus = 1)
hapSim(nall, n.strata = 1, cases, controls, haplo.freq, RR, RRcm, RRcf, RRstar, RR.mat, RRstar.mat, gen.missing.cases = NULL, gen.missing.controls = NULL, n.sim = 1000, xchrom = F, sim.comb.sex = "double", BR.girls, dire = "simfiles", ask = TRUE, verbose = TRUE, cpus = 1)
nall |
A vector of the number of alleles at each locus. |
n.strata |
The number of strata. |
cases |
A list of the number of case families. Each element is a vector of the number of families of the specified family design(s) in the corresponding stratum. The possible family designs, i.e., the possible names of the elements, are |
controls |
A list of the number of control families. Each element is a vector of the number of families of the specified family design(s) in the corresponding stratum. The possible family designs are |
haplo.freq |
A list of which each element is a numeric vector of the haplotype frequencies in each stratum. The frequencies will be normalized so that they sum to one. The Details section shows how to implement this argument in agreement with the possible haplotypes. |
RR |
A list of which each element is a numeric vector of the relative risks in each stratum. The Details section shows how to implement this argument in agreement with the possible haplotypes. |
RRcm |
A list of numeric vectors. Each vector contains the relative risks associated with the haplotypes transmitted from the mother for this stratum. See Details for description of how to implement this argument in agreement with the possible haplotypes. |
RRcf |
A list of numeric vectors. Each vector contains the relative risks associated with the haplotypes transmitted from the father for this stratum. See Details for description of how to implement this argument in agreement with the possible haplotypes. |
RRstar |
A list of numeric vectors. Estimates how much double-dose children would deviate from the risk expected in a multiplicative dose-response relationship. |
RR.mat |
The interpretation is similar to |
RRstar.mat |
The interpretation is parallel to |
gen.missing.cases |
Generates missing values at random for the case families. Set to |
gen.missing.controls |
Generates missing values at random for the control families. Set to |
n.sim |
The number of simulations, i.e., the number of simulated data files. |
xchrom |
Logical. Equals |
sim.comb.sex |
To be used with |
BR.girls |
To be used with |
dire |
Gives the directory of the simulated data files. |
ask |
Logical. If |
verbose |
Logical. Default is |
cpus |
Allows simulations to be performed in parallel. The |
hapSim
simulates allele values for case and control families at multiple markers (typically in LD) simultaneously. The number of markers/SNPs involved will typically be in the range 1 to 6. Data are simulated to produce relative risks of disease as specified by the user input. Simulations can be performed separately over a number of strata so as to simulate gene-environment interactions.
Specifying haplotype risks:
The number of haplotypes used in the simulations is determined by the nall
argument, since prod(nall)
different haplotypes can be made from the specified number of markers, length(nall)
. The arguments haplo.freq
, RR
, RRcm
, RRcf
, RRstar
, RR.mat
, and RRstar.mat
are all lists where each element represents a stratum. Within each stratum, the arguments are vectors of length equal to the number of haplotypes, specifying the relative risk etc. associated with each haplotype.
The stratum specific arguments may be simplified if the number of strata is one, or if the arguments are equal across all strata.
The haplotypes are determined by creating all possible haplotypes from the given markers, in a sequence where the first marker varies mostly quickly. For instance, if nall = c(3,2)
, the first marker has 3 alleles, the second has 2, and 6 haplotypes are possible. Taken in order, the haplotypes are 1-1, 2-1, 3-1, 1-2, 2-2, and 3-2. When specifying, say, RR = c(1,2,1,1,1,1)
the haplotype 2-1 has a double risk compared to the rest. With, for instance, two strata, the specification RR = list(c(1,2,1,1,1,1), c(1,1,1,1,1,1))
would mean that the risk associated with 2-1 is elevated only in the first stratum, not the second.
The simplest example would be with nall = c(2)
and RR = c(1,2)
, which would simulate a single SNP where the second allele has a double risk.
Output file format:
The format of the simulated files is relatively flexible and allows multi-allelic markers and various designs.
If both case and control families are present, the simulated files contain a leading column of the case/control status (1/0).
If xchrom=TRUE
, the neighboring column to the left of the genetic data contains the sex information (1 = male, 2 = female).
Each line represents genotypes for a case or control triad.
There are six columns for each locus, two for the mother (M), two for the father (F) and two for the child (C). The columns are placed in the following sequence: M11 M12 F11 F12 C11 C12 M21 M22 F21 F22 C21 C22... etc, where the first number indicates marker, and the second number indicates the first or second allele at this locus. Columns are separated by a single white space, and missing data are coded as NA.
Intermediate designs, for instance mother-child dyads, are represented as full triads with columns of absent family members set to missing. In the case of a pure case-control design, however, each line represents a single individual, and there are no columns representing mothers and fathers.
There are no row or column names in the files.
Some examples are given below. See https://haplin.bitbucket.io/docu/haplin_data_format.html for a thorough description of the Haplin format. Note that this description separates the two alleles for an individual within a locus by a semi-colon, such as 1;2. This is, however, not necessary.
Confer the document https://haplin.bitbucket.io/docu/Haplin_power.pdf for details and examples on how to perform the simulations.
gen.missing.cases
and gen.missing.controls
are flexible arguments. By default, both equal NULL, which means that no missing data are generated at random.
If the arguments are single numbers, missing data are generated at random with this proportion for all cases and/or controls.
If the arguments are vectors of length equal to the number of loci, missing data are generated with the corresponding proportion for each locus.
The arguments can also be matrices with the number of rows equal to the number of loci and three columns.
Each row corresponds to a locus, and the columns correspond to mothers, fathers and children, respectively.
Miriam Gjerdevik,
with Hakon K. Gjessing
Professor of Biostatistics
Division of Epidemiology
Norwegian Institute of Public Health
Web Site: https://haplin.bitbucket.io
## Not run: ## Simulate genetic data (100 files) at two diallelic markers, consisting of fetal effects ## corresponding to haplo.freq = rep(0.25, 4), RR = c(2,1,1,1) and RRstar = c(1,1,1,1), ## for the combination of 1000 case triads and 1000 control triads with no missing data. ## Only one stratum. hapSim(nall = c(2,2), n.strata = 1, cases = c(mfc=1000), controls = c(mfc=1000), haplo.freq = rep(0.25,4), RR = c(2,1,1,1), RRstar = c(1,1,1,1), n.sim = 100, dire = "simfiles") ## Simulate genetic data (100 files) at two diallelic markers, ## consisting of fetal and maternal effects corresponding to ## haplo.freq = rep(0.25, 4), RR = c(2,1,1,1), RRstar = c(1,1,1,1), ## RR.mat = c(2,1,1,1) and RRstar.mat = c(1,1,1,1), ## for 1000 case triads and zero control families. ## One percent of the case triads are missing at random. One stratum only. hapSim(nall = c(2,2), n.strata=1, cases = c(mfc=1000), controls = c(mfc=0), haplo.freq = rep(0.25,4), RR = c(2,1,1,1), RRstar = c(1,1,1,1), RR.mat = c(2,1,1,1), RRstar.mat = c(1,1,1,1), gen.missing.cases = 0.01, n.sim = 100, dire = "simfiles") ## Simulate genetic data (100 files) at two diallelic markers. In the first stratum, ## we have a combination of 500 case triads and 500 control triads with ## haplo.freq = rep(0.25, 4), RR = c(2,1,1,1) and RRstar = c(1,1,1,1). ## In the second stratum, we have 300 case triads and 500 control triads with ## haplo.freq = rep(0.25, 4), RR = c(1,1,1,1) and RRstar = c(1,1,1,1). ## One percent of the control triads are missing at random in the first stratum. hapSim(nall = c(2,2), n.strata= 2, cases = list(c(mfc=500),c(mfc=300)), controls = c(mfc=500),haplo.freq = rep(0.25,4), RR = list(c(2,1,1,1),c(1,1,1,1)), RRstar = c(1,1,1,1), gen.missing.controls = list(0.01,NULL), n.sim = 100, dire = "simfiles") ## End(Not run)
## Not run: ## Simulate genetic data (100 files) at two diallelic markers, consisting of fetal effects ## corresponding to haplo.freq = rep(0.25, 4), RR = c(2,1,1,1) and RRstar = c(1,1,1,1), ## for the combination of 1000 case triads and 1000 control triads with no missing data. ## Only one stratum. hapSim(nall = c(2,2), n.strata = 1, cases = c(mfc=1000), controls = c(mfc=1000), haplo.freq = rep(0.25,4), RR = c(2,1,1,1), RRstar = c(1,1,1,1), n.sim = 100, dire = "simfiles") ## Simulate genetic data (100 files) at two diallelic markers, ## consisting of fetal and maternal effects corresponding to ## haplo.freq = rep(0.25, 4), RR = c(2,1,1,1), RRstar = c(1,1,1,1), ## RR.mat = c(2,1,1,1) and RRstar.mat = c(1,1,1,1), ## for 1000 case triads and zero control families. ## One percent of the case triads are missing at random. One stratum only. hapSim(nall = c(2,2), n.strata=1, cases = c(mfc=1000), controls = c(mfc=0), haplo.freq = rep(0.25,4), RR = c(2,1,1,1), RRstar = c(1,1,1,1), RR.mat = c(2,1,1,1), RRstar.mat = c(1,1,1,1), gen.missing.cases = 0.01, n.sim = 100, dire = "simfiles") ## Simulate genetic data (100 files) at two diallelic markers. In the first stratum, ## we have a combination of 500 case triads and 500 control triads with ## haplo.freq = rep(0.25, 4), RR = c(2,1,1,1) and RRstar = c(1,1,1,1). ## In the second stratum, we have 300 case triads and 500 control triads with ## haplo.freq = rep(0.25, 4), RR = c(1,1,1,1) and RRstar = c(1,1,1,1). ## One percent of the control triads are missing at random in the first stratum. hapSim(nall = c(2,2), n.strata= 2, cases = list(c(mfc=500),c(mfc=300)), controls = c(mfc=500),haplo.freq = rep(0.25,4), RR = list(c(2,1,1,1),c(1,1,1,1)), RRstar = c(1,1,1,1), gen.missing.controls = list(0.01,NULL), n.sim = 100, dire = "simfiles") ## End(Not run)
Create a comprehensive table of haplin output
haptable(object)
haptable(object)
object |
A |
haptable
extracts the most important information from a haplin
object to produce a summary table. The table can then be saved with, for instance, write.table
, making the results easily accessible to other applications. You can also use output
to produce and save the same results.
—- A dataframe is returned, with the following columns: —-
marker |
Name(s) of marker(s) investigated |
alleles |
A listing of the alleles found at each marker |
counts |
Frequency counts of alleles at each marker |
HWE.pv |
P-value from Hardy-Weinberg equilibrium test at each marker |
Original |
Number of triads before removal |
After.rem.NA |
Number of triads after removal of missing |
After.rem.Mend.inc. |
Number of triads after removal of Mendelian inconsistencies |
After.rem.unused.haplos |
Number of triads after removal of unused (rare) haplotypes |
pv.overall |
Overall likelihood ratio p-value (test of all genetic effects combined) |
haplos |
Haplotypes (or single-marker alleles) found during estimation |
haplofreq |
Estimated haplotype frequencies |
haplofreq.lower |
Lower 95% CI for estimated haplotype frequencies |
haplofreq.upper |
Upper 95% CI for estimated haplotype frequencies |
reference |
Reference method. If ref.cat is used, the reference category is labeled "ref" |
RR.est. |
Estimated single dose relative risk |
RR.lower |
Lower 95% CI for single dose relative risk |
RR.upper |
Upper 95% CI for single dose relative risk |
RR.p.value |
P-values for individual single dose effect |
RRdd.est. |
Estimated double dose relative risk |
RRdd.lower |
Lower 95% CI for double dose relative risk |
RRdd.upper |
Upper 95% CI for double dose relative risk |
RRdd.p.value |
P-values for individual double dose effect |
NOTE1 |
When maternal = TRUE, there will be additional columns: |
RRm.est. |
Estimated single dose relative risk for maternal haplotype |
RRm.lower |
Lower 95% CI for single dose relative risk for maternal haplotype |
RRm.upper |
Upper 95% CI for single dose relative risk for maternal haplotype |
RRm.p.value |
P-values for individual single dose effect of maternal haplotype |
RRmdd.est. |
Estimated double dose relative risk for maternal haplotype |
RRmdd.lower |
Lower 95% CI for double dose relative risk for maternal haplotype |
RRmdd.upper |
Upper 95% CI for double dose relative risk for maternal haplotype |
RRmdd.p.value |
P-values for individual double dose effect of maternal haplotype |
NOTE2 |
When poo = TRUE, the RR.est., RR.lower, and RR.upper columns will be replaced by the following columns: |
RRcm.est. |
Estimated single dose relative risk, when inherited from the mother |
RRcm.lower |
Lower 95% CI for single dose relative risk, when inherited from the mother |
RRcm.upper |
Upper 95% CI for single dose relative risk, when inherited from the mother |
RRcm.p.value |
P-values for individual single dose effects, when inherited from the mother |
RRcf.est. |
Estimated single dose relative risk, when inherited from the father |
RRcf.lower |
Lower 95% CI for single dose relative risk, when inherited from the father |
RRcf.upper |
Upper 95% CI for single dose relative risk, when inherited from the father |
RRcf.p.value |
P-values for individual single dose effects, when inherited from the father |
RRcm_RRcf.est. |
An estimate of parent-of-origin effect, i.e. the ratio RRcm/RRcf |
RRcm_RRcf.lower |
Lower 95% CI for ratio RRcm/RRcf |
RRcm_RRcf.upper |
Upper 95% CI for ratio RRcm/RRcf |
RRcm_RRcf.p.value |
P-value for parent-of-origin effect RRcm/RRcf at that marker |
Further information is found on the web page
Hakon K. Gjessing
Professor of Biostatistics
Division of Epidemiology
Norwegian Institute of Public Health
[email protected]
Web Site: https://haplin.bitbucket.io
## Not run: # Produce a table containing the most important output from haplin: res <- haplin("data.dat", use.missing = T, maternal = T) haptable(res) ## End(Not run)
## Not run: # Produce a table containing the most important output from haplin: res <- haplin("data.dat", use.missing = T, maternal = T) haptable(res) ## End(Not run)
This function prepares a cluster using Rmpi
package. The initialization
is paired with closing of the cluster using the finishParallelRun
function.
initParallelRun(cpus)
initParallelRun(cpus)
cpus |
Number of cores to use (optional). If given, only that number of CPUs will be used. By default (if not set), the Rmpi will check how many CPUs are available in the system and take the maximum number. |
Modifies a data file line by line, i.e. reads a file line by line, converts each line, then writes to the modified file. This method is especially useful when modifying large datasets, where the reading of entire files may be time consuming and require a large amount of memory.
lineByLine(infile, outfile, linefunc = identity, choose.lines = NULL, choose.columns = NULL, col.sep = " ", ask = TRUE, blank.lines.skip = TRUE, verbose = TRUE, ...)
lineByLine(infile, outfile, linefunc = identity, choose.lines = NULL, choose.columns = NULL, col.sep = " ", ask = TRUE, blank.lines.skip = TRUE, verbose = TRUE, ...)
infile |
A character string giving the name and path of the file to be modified. |
outfile |
A character string giving the name of the modified file. The name of the file is relative to the current working directory, unless the file name contains a definite path. |
linefunc |
|
choose.lines |
A numeric vector of lines to be selected or dropped from |
choose.columns |
A numeric vector of columns to be selected (positive values) or skipped (negative values) from |
col.sep |
Specifies the separator that splits the columns in |
ask |
Logical. Default is "TRUE". If set to "FALSE", an already existing outfile will be overwritten without asking. |
blank.lines.skip |
Logical. If "TRUE" (default), |
verbose |
Logical. Default is "TRUE", which means that the line number is displayed for each iteration, in addition to output from |
... |
Further arguments to be passed to |
When reading large datafiles, functions such as read.table
can use a large amount of memory and be extremely time consuming.
Instead of reading the entire file at once, lineByLine
reads one line at a time, modifies the line using linefunc
, and then writes the line to outfile
.
The user may specify his or her own line-converting function. This function must take the argument x
, a character vector representing a single line of the file, split at spaces. However, additional arguments may be included.
If verbose
equals "TRUE", output should be displayed.
The modified vector is returned.
The framework of the line-modifying function may look something like this:
lineModify <- function(x){ .xnew <- x ## Define any modifications, for instance recoding missing values in a dataset from NA to 0: .xnew[is.na(.xnew)] <- 0 ## Just to monitor progress, display, for instance, 10 first elements, without newline: cat(paste(.xnew[1:min(10, length(.xnew))], collapse = " ")) ## Return converted vector return(.xnew) }
See Haplin:::lineConvert
for an additional example of a line-modifying function.
lineByLine
returns the number of lines read, although invisible. The main objective is the modified file.
Miriam Gjerdevik,
with Hakon K. Gjessing
Professor of Biostatistics
Division of Epidemiology
Norwegian Institute of Public Health
Web Site: https://haplin.bitbucket.io
## Not run: ## Extract the first ten columns from "myfile.txt", ## without reordering lineByLine(infile = "myfile.txt", outfile = "myfile_modified.txt", choose.columns = c(1:10)) ## End(Not run)
## Not run: ## Extract the first ten columns from "myfile.txt", ## without reordering lineByLine(infile = "myfile.txt", outfile = "myfile_modified.txt", choose.columns = c(1:10)) ## End(Not run)
This is a help function to count the number of families in an object read in with genDataRead (or loaded with genDataLoad). Note: it is assumed that the study design is either 'triad' or 'cc.triad'.
nfam(data.in)
nfam(data.in)
data.in |
The data read in by genDataRead. |
How many families (integer).
This is a help function to count the number of individuals in an object read in with genDataRead (or loaded with genDataLoad).
nindiv(data.in, design = "triad")
nindiv(data.in, design = "triad")
data.in |
The data read in by genDataRead. |
design |
The design used in the study - choose from:
|
How many individuals (integer).
This is a help function to count the number of markers in an object read in with genDataRead (or loaded with genDataLoad).
nsnps(data.in, design = "triad")
nsnps(data.in, design = "triad")
data.in |
The data read in by genDataRead. |
design |
The design used in the study - choose from:
|
How many markers (integer).
Create summary tables and figure from a haplin
object. Save results as separate files in a specified directory.
output(object, dirname, ask = T)
output(object, dirname, ask = T)
object |
A |
dirname |
Text string, for instance "c:/work/haplinresults". Name of directory where results should be saved. Default is to save results in the current working directory. |
ask |
Logical. If TRUE (default), you will be asked before overwriting any files with the same name. If FALSE, |
After having run haplin
and saved the result (in the R workspace), the output
function will extract summary results, a summary table, and a plot of the results and save them to the specified directory. The filenames will be haplin_summary.txt, haplin_table.txt and haplin_plot.jpg, respectively. output
simply uses the available functions summary
, haptable
, and plot
to produce the files, but is a quick way of saving all the relevant results.
Further information is found on the web page.
Hakon K. Gjessing
Professor of Biostatistics
Division of Epidemiology
Norwegian Institute of Public Health
[email protected]
Gjessing HK and Lie RT. Case-parent triads: Estimating single- and double-dose effects of fetal and maternal disease gene haplotypes. Annals of Human Genetics (2006) 70, pp. 382-396.
Web Site: https://haplin.bitbucket.io
## Not run: # Run haplin and save results in separate files # in the c:\work\haplinresults directory: res <- haplin("data.dat", use.missing = T, maternal = T) output(res, dirname = "c:/work/haplinresults") ## End(Not run)
## Not run: # Run haplin and save results in separate files # in the c:\work\haplinresults directory: res <- haplin("data.dat", use.missing = T, maternal = T) output(res, dirname = "c:/work/haplinresults") ## End(Not run)
Plot a haplin object and (optionally) produce picture files
## S3 method for class 'haplin' plot(x, reference, separate.plots = F, filename, filetype = "png", use.dd, ...)
## S3 method for class 'haplin' plot(x, reference, separate.plots = F, filename, filetype = "png", use.dd, ...)
x |
A |
reference |
Same as |
separate.plots |
Logical. If you estimate effects of both fetal and maternal genes you can decide whether or not to plot them in the same plot. The default is the same plot (TRUE), the alternative (FALSE) means in separate plots. If you choose separate plots you may have to set the graphics window to "recording" to make sure you can scroll back to the first plot. |
filename |
If you want a file containing the plot to be produced, give a character string for the filename. |
filetype |
The default filetype is |
use.dd |
Numeric vector indicating which double dose estimates should be plotted. For instance, if set to c(1,3) only the first and third haplotypes will be drawn with double dose estimates. This is useful if some haplotypes are rare and you want to exclude the uncertain estimates from the plot. |
... |
Further arguments to be passed on to the plot function |
Further information is found on the web page.
Hakon K. Gjessing
Professor of Biostatistics
Division of Epidemiology
Norwegian Institute of Public Health
[email protected]
Gjessing HK and Lie RT. Case-parent triads: Estimating single- and double-dose effects of fetal and maternal disease gene haplotypes. Annals of Human Genetics (2006) 70, pp. 382-396.
Web Site: https://haplin.bitbucket.io
## Not run: # Produce separate plots for child and mother, dump plots to files: res <- haplin("data.dat", use.missing = T, maternal = T) plot(res, separate.plots = T, filename = "Haplinres.png") ## End(Not run)
## Not run: # Produce separate plots for child and mother, dump plots to files: res <- haplin("data.dat", use.missing = T, maternal = T) plot(res, separate.plots = T, filename = "Haplinres.png") ## End(Not run)
This will plot any haplinSlide object in one figure.
## S3 method for class 'haplinSlide' plot( x, filename, title, windows, plot.signif.only = FALSE, signif.thresh = 0.05, y.lim, x.title, y.title, file.width, file.height, ... )
## S3 method for class 'haplinSlide' plot( x, filename, title, windows, plot.signif.only = FALSE, signif.thresh = 0.05, y.lim, x.title, y.title, file.width, file.height, ... )
x |
The |
filename |
If the plot should be saved to the disk, give the name of the output file including the file extension. |
title |
If the user wishes to override the default title of the plot, give it here. |
windows |
Numerical vector. If given, this will only plot the chosen windows. |
plot.signif.only |
Logical: whether to filter out the "non-significant" markers from the plot. Default: FALSE, i.e., plot everything. |
signif.thresh |
The threshold defining the significant p-values: if
|
y.lim |
Vector with two numbers setting the Y limits of the plotted graph. |
x.title |
Title for the X axis (default: "marker"). |
y.title |
Title for the Y axis (default: "RR (log scale)"). |
file.width |
Width (in inches) for the output plot, if a filename was given. |
file.height |
Height (in inches) for the output plot, if a filename was given. |
... |
other arguments (ignored). |
The haplinSlide
object is a list of haplin
results - by
default in haptable
form. This is used to plot the relative risk
estimates for all the markers in one plot, to enable easy comparison. NB: those
estimates that have infinite confidence interval will not be plotted.
ggplot
object.
This will automatically plot all haplinStrat results on one figure.
## S3 method for class 'haplinStrat' plot( x, filename, title, windows, plot.signif.only = FALSE, signif.thresh = 0.05, y.lim, x.title, y.title, file.width, file.height, ... )
## S3 method for class 'haplinStrat' plot( x, filename, title, windows, plot.signif.only = FALSE, signif.thresh = 0.05, y.lim, x.title, y.title, file.width, file.height, ... )
x |
The |
filename |
If the plot should be saved to the disk, give the name of the output file including the file extension. |
title |
If the user wishes to override the default title of the plot, give it here. |
windows |
Numerical vector. If given, this will only plot the chosen windows. |
plot.signif.only |
Logical: whether to filter out the "non-significant" markers from the plot. Default: FALSE, i.e., plot everything. |
signif.thresh |
The threshold defining the significant p-values: if
|
y.lim |
Vector with two numbers setting the Y limits of the plotted graph. |
x.title |
Title for the X axis (default: "marker"). |
y.title |
Title for the Y axis (default: "RR (log scale)"). |
file.width |
Width (in inches) for the output plot, if a filename was given. |
file.height |
Height (in inches) for the output plot, if a filename was given. |
... |
other arguments (ignored). |
This function uses the same style as plot.haplinSlide
and plots all of
the haplinStrat
results on one figure, for easy comparison. NB: those
estimates that have infinite confidence interval will not be plotted.
ggplot
object.
Plot a haptable object, which is the result of running haptable
on a haplin
result, and (optionally) produce picture files.
## S3 method for class 'haptable' plot(x, separate.plots = F, filename, filetype = "png", use.dd, verbose = T, ...)
## S3 method for class 'haptable' plot(x, separate.plots = F, filename, filetype = "png", use.dd, verbose = T, ...)
x |
A |
separate.plots |
Logical. If you estimate effects of both fetal and maternal genes you can decide whether or not to plot them in the same plot. The default is the same plot (TRUE), the alternative (FALSE) means in separate plots. If you choose separate plots you may have to set the graphics window to "recording" to make sure you can scroll back to the first plot. |
filename |
If you want a file containing the plot to be produced, give a character string for the filename. |
filetype |
The default filetype is |
use.dd |
Numeric vector indicating which double dose estimates should be plotted. For instance, if set to c(1,3) only the first and third haplotypes will be drawn with double dose estimates. This is useful if some haplotypes are rare and you want to exclude the uncertain estimates from the plot. |
verbose |
Turns on or off some minor comments when plotting |
... |
Further arguments to be passed on to the plot function |
Further information is found on the web page.
Hakon K. Gjessing
Professor of Biostatistics
Division of Epidemiology
Norwegian Institute of Public Health
[email protected]
Gjessing HK and Lie RT. Case-parent triads: Estimating single- and double-dose effects of fetal and maternal disease gene haplotypes. Annals of Human Genetics (2006) 70, pp. 382-396.
Web Site: https://haplin.bitbucket.io
## Not run: # Directly plotting the haplin result. Produce separate plots for child and mother, # dump plots to files: res <- haplin("data.dat", use.missing = T, maternal = T) plot(res, separate.plots = T, filename = "Haplinres.png") # Create haptable from the haplin result and plot the results in the table: res <- haplin("data.dat", use.missing = T, maternal = T) tab <- haptable(res) plot(tab) # Create haptables for the 10 first markers of a data file using haplinSlide. # Create plots for each result: res <- haplinSlide("data.dat", markers = 1:10, use.missing = T, maternal = T, table.output = T) lapply(res, plot) ## End(Not run)
## Not run: # Directly plotting the haplin result. Produce separate plots for child and mother, # dump plots to files: res <- haplin("data.dat", use.missing = T, maternal = T) plot(res, separate.plots = T, filename = "Haplinres.png") # Create haptable from the haplin result and plot the results in the table: res <- haplin("data.dat", use.missing = T, maternal = T) tab <- haptable(res) plot(tab) # Create haptables for the 10 first markers of a data file using haplinSlide. # Create plots for each result: res <- haplinSlide("data.dat", markers = 1:10, use.missing = T, maternal = T, table.output = T) lapply(res, plot) ## End(Not run)
This function plots p-values for the relative risks calculated by haplinSlide
.
plotPValues( object, windows, which.p.val = "overall", plot.signif.only = FALSE, signif.thresh = 0.05, title, filename )
plotPValues( object, windows, which.p.val = "overall", plot.signif.only = FALSE, signif.thresh = 0.05, title, filename )
object |
The |
windows |
Numerical vector; if given, the plot will be restricted to only those. |
which.p.val |
Character string specifying which p-values to choose for plotting:
"overall" (default), "child", "child.double", "maternal", "maternal.double",
"paternal". The last three options can be chosen only if |
plot.signif.only |
Logical: whether to filter out the "non-significant" markers from the plot. Default: FALSE, i.e., plot everything. |
signif.thresh |
The threshold defining the significant p-values: if
|
title |
Optional character string for the title of the figure. |
filename |
If the plot should be saved to the disk, give the name of the output file including the file extension. |
The output of haplinSlide
can be very lengthy and not suitable for
an overall plot of all the relative risks (RR) on one figure. Therefore, it's advised
to first plot only the p-values for each window (user can choose which p-values to
plot - see parameter which.p.val
), and only then plot the RRs for specific
windows, for which the p-values are significant.
Invisibly returns the table with only the plotted p-values.
Produces a QQ-plot of p-values. The x-axis is -log10 of the expected p-values (under a null hypothesis of no effects), the y-axis is -log10 values of the actual p-values. A (pointwise) confidence interval can be drawn, and names of snps/genes corresponding to the most significant ones can be added.
pQQ(pvals, nlabs = 6, conf = 0.95, lim, mark = 0.05, ...)
pQQ(pvals, nlabs = 6, conf = 0.95, lim, mark = 0.05, ...)
pvals |
A vector of p-values. |
nlabs |
The number of (most significant) p-values to be labeled using |
conf |
The confidence level of the pointwise confidence band. The default is 0.95. The confidence intervals are computed under the assumption of the p-values being drawn independently from a uniform [0,1] distribution. To leave out the confidence interval, set this to FALSE. |
lim |
A vector of length 2 giving the plot limits (on a log10 scal, for instance c(0,4)). Plot limits are computed automatically. However, if other plot limits are desireable, they can be set using this argument. |
mark |
By default, the 0.05 significance level is marked by lines. Can be changed to a different value, or set to FALSE. |
... |
Other arguments passed on to the plotting function. |
The pvals
argument should be a vector of p-values to be plotted. If the vector has names corresponding to marker (snp) names, the plot will label some of the most significant points with the marker names.
No value is returned.
Hakon K. Gjessing
Professor of Biostatistics
Division of Epidemiology
Norwegian Institute of Public Health
[email protected]
Gjessing HK and Lie RT. Case-parent triads: Estimating single- and double-dose effects of fetal and maternal disease gene haplotypes. Annals of Human Genetics (2006) 70, pp. 382-396.
Web Site: https://haplin.bitbucket.io
Print basic information about a haplin object
## S3 method for class 'haplin' print(x, ...)
## S3 method for class 'haplin' print(x, ...)
x |
A |
... |
Other arguments, passed on to |
Further information is found on the web page
Hakon K. Gjessing
Professor of Biostatistics
Division of Epidemiology
Norwegian Institute of Public Health
[email protected]
Gjessing HK and Lie RT. Case-parent triads: Estimating single- and double-dose effects of fetal and maternal disease gene haplotypes. Annals of Human Genetics (2006) 70, pp. 382-396.
Web Site: https://haplin.bitbucket.io
Print the result of applying summary
to a haplin object
## S3 method for class 'summary.haplin' print(x, digits, ...)
## S3 method for class 'summary.haplin' print(x, digits, ...)
x |
A |
digits |
The number of digits to be used in the printout. Defaults to 3. |
... |
Other arguments (ignored). |
Further information is found on the web page
Hakon K. Gjessing
Professor of Biostatistics
Division of Epidemiology
Norwegian Institute of Public Health
[email protected]
Gjessing HK and Lie RT. Case-parent triads: Estimating single- and double-dose effects of fetal and maternal disease gene haplotypes. Annals of Human Genetics (2006) 70, pp. 382-396.
Web Site: https://haplin.bitbucket.io
## Not run: # Standard summary: res <- haplin("data.dat", use.missing = T, maternal = T) summary(res) # Increase number of digits in printout print(summary(res), digits = 8) ## End(Not run)
## Not run: # Standard summary: res <- haplin("data.dat", use.missing = T, maternal = T) summary(res) # Increase number of digits in printout print(summary(res), digits = 8) ## End(Not run)
Takes a sequence of files and combines them by rows, without reading the full files into memory. This is especially useful when dealing with large datasets, where the reading of entire files may be time consuming and require a large amount of memory.
rbindFiles(infiles, outfile, col.sep, header = FALSE, ask = TRUE, verbose = FALSE, add.file.number = FALSE, blank.lines.skip = FALSE)
rbindFiles(infiles, outfile, col.sep, header = FALSE, ask = TRUE, verbose = FALSE, add.file.number = FALSE, blank.lines.skip = FALSE)
infiles |
A character vector of names (and paths) of the files to combine. |
outfile |
A character string giving the name of the modified file. The name of the file is relative to the current working directory, unless the file name contains a definite path. |
col.sep |
Specifies the separator used to split the columns in the files. To split at all types of spaces or blank characters, set |
header |
A logical variable which indicates if the first line in each file contains the names of the variables. If "TRUE", |
ask |
Logical. Default is "TRUE". If set to "FALSE", an already existing outfile will be overwritten without asking. |
verbose |
Logical. Default is "TRUE", which means that the line number is displayed for each iteration, i.e. each combined line. |
add.file.number |
A logical variable which equals "FALSE" by deafult. If "TRUE", an extra first column will be added to the outfile, consisting of the file numbers for each line. |
blank.lines.skip |
Logical. If "TRUE" (default), |
The function rbind
combines R objects by rows. However, reading large data files may require a large amount of memory and be extremely time consuming.
rbindFiles
avoids reading the full files into memory. It reads the files line by line, possibly modifies each line, then writes to outfile.
If however, header
, verbose
, add.file.number
and blank.lines.skip
are all set to "FALSE"
(their default values), the files are appended directly, thus evading line-by-line modifications.
In the case where infiles
contains only one file and no output or modifications are requested
(verbose
, add.file.number
and blank.lines.skip
equal "FALSE"), an identical copy of this file is made.
There is no useful output; the objective of rbindFiles
is to produce outfile
.
Combining the files by reading each file line by line is less time efficient than appending the files directly. For this reason, if header = FALSE
, changing the values of the logical variables verbose
, add.file.number
and blank.lines.skip
from "FALSE" to "TRUE" should not be done unless absolutely necessary.
Miriam Gjerdevik,
with Hakon K. Gjessing
Professor of Biostatistics
Division of Epidemiology
Norwegian Institute of Public Health
[email protected]
Web Site: https://haplin.bitbucket.io
## Not run: # Combines the three infiles, by rows rbindFiles(file.names = c("myfile1.txt", "myfile2.txt", "myfile3.txt"), outfile = "myfile_combined_by_rows.txt", col.sep = " ", header = TRUE, verbose = TRUE) ## End(Not run)
## Not run: # Combines the three infiles, by rows rbindFiles(file.names = c("myfile1.txt", "myfile2.txt", "myfile3.txt"), outfile = "myfile_combined_by_rows.txt", col.sep = " ", header = TRUE, verbose = TRUE) ## End(Not run)
This is a help function to extract genotypes from an object read in with genDataRead (or loaded with genDataLoad).
showGen(data.in, design = "triad", n = 5, from, to, sex, markers = 1:5)
showGen(data.in, design = "triad", n = 5, from, to, sex, markers = 1:5)
data.in |
The data read in by genDataRead. |
design |
The design used in the study - choose from:
. |
n |
Number of rows to display or "all" (default: 5). |
from |
From which row to display (optional, default: from the first). |
to |
To which row to display (optional). |
sex |
If the sex column is part of the phenotypic information, the user can choose based on one of the categories used in this column (optional); NB: this does not combine with the 'to' and 'from' arguments. |
markers |
A vector specifying which markers to display or "all" (default: first 5); NB: the user can specify the markers by numbers or by their names. |
A table with genotypes extracted from 'data.in'.
This is a help function to extract phenotypic (and covariate) data from an object read in with genDataRead (or loaded with genDataLoad).
showPheno(data.in, n = 5, from, to, sex)
showPheno(data.in, n = 5, from, to, sex)
data.in |
The data read in by genDataRead). |
n |
Number of rows to display or "all" (default: 5). |
from |
From which row to display (optional, default: from the first). |
to |
To which row to display (optional). |
sex |
If the sex column is part of the phenotypic information, the user can choose based on one of the categories used in this column (optional); NB: this does not combine with the 'to' and 'from' arguments. |
A table with phenotypic and covariate data (if any) extracted from 'data.in'.
This is a help function to extract marker names from an object generated by genDataRead, genDataLoad, genDataGetPart or genDataPreprocess.
showSNPnames(data.in, n = 5, from, to)
showSNPnames(data.in, n = 5, from, to)
data.in |
The data as outputted by genDataRead, genDataLoad, genDataGetPart or genDataPreprocess. |
n |
Number of names to display or "all" (default: 5). |
from |
From which marker to display (optional, default: from the first). |
to |
To which marker to display (optional). |
A vector with marker names, as read in from map.file or generated dummy names.
Gives the column numbers of SNP identifiers or SNP numbers in a standard ped file, calculated from the SNP's positions in the corresponding map file. The column numbers are sorted in the same order as snp.select
.
These positions may be useful when extracting a selection of SNPs from a ped file.
snpPos(snp.select, map.file, blank.lines.skip = TRUE)
snpPos(snp.select, map.file, blank.lines.skip = TRUE)
snp.select |
A character vector of the SNP identifiers (RS codes) or a numeric vector of the SNP numbers. |
map.file |
A character string giving the name and path of the standard map file to be used. See Details for a description of the standard map format. |
blank.lines.skip |
Logical. If "TRUE" (default), |
To extract certain SNPs from a standard ped file, one has to know their positions in the ped file.
This can be obtained from the corresponding map file.
The map file should look something like this:
Chromosome SNP-identifier Base-pair-position 1 RS9629043 554636 1 RS12565286 711153 1 RS12138618 740098
Alternatively, the map file could contain four columns. The column values should then be:
Chromosome, SNP-identifier, Genetic-distance, Base-pair-position.
A header must be added to the map file if this does not already exist.
The format of the corresponding ped file should be something like this:
1104 1104-1 1104-2 1104-3 1 2 4 1 3 2 1104 1104-2 0 0 1 1 4 1 2 2 1104 1104-3 0 0 2 1 0 0 0 0 1105 1105-1 1105-2 1105-3 2 2 1 1 2 2 1105 1105-2 0 0 1 1 1 1 2 2 1105 1105-3 0 0 2 1 1 1 3 2
The column values are: Family id, Individual id, Father's id, Mother's id, Sex (1 = male, 2 = female, alternatively: 1 = male, 0 = female), and Case-control status (1 = controls, 2 = cases, alternatively: 0 = controls, 1 = cases).
Column 7 and onwards contain the genotype data, with alleles in separate columns. A “0” is used to denote missing data.
A vector of the column numbers of the SNP identifiers/SNP numbers in the ped file, sorted in the same order as given in snp.select
.
The function does not check if the map file is formatted correctly or if the map and ped file have the same number of SNPs. The corresponding positions of the SNPs in the ped file may not be correct if the ped file has a different format from the given example.
Miriam Gjerdevik,
with Hakon K. Gjessing
Professor of Biostatistics
Division of Epidemiology
Norwegian Institute of Public Health
[email protected]
Web Site: https://haplin.bitbucket.io
## Not run: # Find the column numbers of the SNP identifiers "RS9629043" and "RS12565286" in # a standard ped file snpPos(snp.select = c("RS9629043", "RS12565286"), map.file = "mygwas.map") ## End(Not run)
## Not run: # Find the column numbers of the SNP identifiers "RS9629043" and "RS12565286" in # a standard ped file snpPos(snp.select = c("RS9629043", "RS12565286"), map.file = "mygwas.map") ## End(Not run)
Calculates power for a single SNP. Allows for power computations of several combinations simultaneously.
snpPower(cases, controls, RR, MAF, alpha = 0.05)
snpPower(cases, controls, RR, MAF, alpha = 0.05)
cases |
A list of the number of case families. Each element contains the number of families of a specified family design. The possible family designs, i.e. the possible names of the elements, are |
controls |
A list of the number of control families. Each element contains the number of families of a specified family design. The possible family designs are |
RR |
A numeric vector of the relative risks (the effect sizes of interest). |
MAF |
A numeric vector of the minor allele frequencies. |
alpha |
A numeric vector of the Type I Errors. Equals 0.05 by default. |
snpPower
computes power for a single SNP by counting the number of "real" case alleles, "real" control alleles and pseudo-control alleles.
The pseudo-control alleles are the non-transmitted alleles, possibly from both case families and control families. It assumes a multiplicative dose-response model.
snpPower
uses the asymptotic normal approximation for the natural logarithm of the odds ratio for calculating power
(the relative risks and odds ratios are used interchangeably due to the "rare disease assumption").
snpPower
allows for power calculations for mixtures of the possible case family designs and control family designs.
The argument cases
could, for example, consist of a combination of 1000 full case triads (family design "mfc"
) and 500 single case children (family design "c"
).
It is also feasible to compute power for several combinations of the input variables simultaneously. See Examples for further details.
snpPower
returns a data frame containing the combinations of input variables and the corresponding power calculations.
Miriam Gjerdevik,
with Hakon K. Gjessing
Professor of Biostatistics
Division of Epidemiology
Norwegian Institute of Public Health
Skare O, Jugessur A, Lie RT, Wilcox AJ, Murray JC, Lunde A, Nguyen TT, Gjessing HK. Application of a novel hybrid study design to explore gene-environment interactions in orofacial clefts. Annals of Human Genetics (2012) 76, pp. 221-236.
Web Site: https://haplin.bitbucket.io
haplin
, snpSampleSize
, hapRun
, hapPower
, hapPowerAsymp
## Compute power for a single SNP, ## for the combination of 1000 case triads, RR = 1.2, MAF = 0.1 and alpha = 0.05 snpPower(cases = list(mfc = 1000), controls = list(mfc = 0), RR = 1.2, MAF = 0.1) ## Compute power for a single SNP, ## for the combination of 1000 case triads and 500 single case children (altogether), ## 5000 control triads, RR = 1.1, MAF = 0.1 and alpha = 0.05 snpPower(cases = list(mfc = 1000, c = 500), controls = list(mfc=5000), RR = 1.1, MAF = 0.1, alpha = 0.05) ## Compute power for a single SNP, ## for the combination of 500 case triads, 10000 control triads, ## relative risk of 1.2 and minor allele frequency of 0.1, ## and also for the combination of 1000 case triads, 10000 control triads, ## relative risk of 1.1 and minor allele frequency of 0.1 snpPower(cases = list(mfc = c(500, 1000)), controls = list(mfc = 10000), RR = c(1.2, 1.1), MAF = 0.1)
## Compute power for a single SNP, ## for the combination of 1000 case triads, RR = 1.2, MAF = 0.1 and alpha = 0.05 snpPower(cases = list(mfc = 1000), controls = list(mfc = 0), RR = 1.2, MAF = 0.1) ## Compute power for a single SNP, ## for the combination of 1000 case triads and 500 single case children (altogether), ## 5000 control triads, RR = 1.1, MAF = 0.1 and alpha = 0.05 snpPower(cases = list(mfc = 1000, c = 500), controls = list(mfc=5000), RR = 1.1, MAF = 0.1, alpha = 0.05) ## Compute power for a single SNP, ## for the combination of 500 case triads, 10000 control triads, ## relative risk of 1.2 and minor allele frequency of 0.1, ## and also for the combination of 1000 case triads, 10000 control triads, ## relative risk of 1.1 and minor allele frequency of 0.1 snpPower(cases = list(mfc = c(500, 1000)), controls = list(mfc = 10000), RR = c(1.2, 1.1), MAF = 0.1)
Sample size calculations for a single SNP. Allows for sample size calculations of several combinations simultaneously.
snpSampleSize(fam.cases, fam.controls, fraction = 0.5, RR, MAF, alpha = 0.05, power = 0.80)
snpSampleSize(fam.cases, fam.controls, fraction = 0.5, RR, MAF, alpha = 0.05, power = 0.80)
fam.cases |
A character vector of the case family design. The possible family designs are |
fam.controls |
A character vector of the control family design. The possible family designs are |
fraction |
A numeric vector of the proportion of case families. Equals 0.5 by default, i.e. there are as many case families as control families. If |
RR |
A numeric vector of the relative risks (the effect sizes of interest). |
MAF |
A numeric vector of the minor allele frequencies. |
alpha |
A numeric vector of the Type I Errors. Equals 0.05 by default. |
power |
A numeric vector of the desired probability of identifying a difference in the relative risks. Default is 0.80. |
snpSampleSize
computes the number of case and control families required for a single SNP to attain the desired power.
It assumes a multiplicative dose-response model.
snpSampleSize
calculates the fraction of case alleles corresponding to the given family designs
and then uses the asymptotic normal approximation for the natural logarithm of the odds ratio for calculating the sample sizes
(the relative risks and odds ratios are used interchangeably due to the "rare disease assumption").
snpSampleSize
allows for sample size calculations of several combinations of the input variables at once. The Examples section provides further details.
snpSampleSize
returns a data frame containing the combinations of input variables and the corresponding sample size calculations.
Miriam Gjerdevik,
with Hakon K. Gjessing
Professor of Biostatistics
Division of Epidemiology
Norwegian Institute of Public Health
Skare O, Jugessur A, Lie RT, Wilcox AJ, Murray JC, Lunde A, Nguyen TT, Gjessing HK. Application of a novel hybrid study design to explore gene-environment interactions in orofacial clefts. Annals of Human Genetics (2012) 76, pp. 221-236.
Web Site: https://haplin.bitbucket.io
## Compute sample sizes for a single SNP, ## when the specified case family design is a full triad, there are no control families, ## RR = 1.1, MAF = 0.1, alpha = 0.05 and power = 0.9 snpSampleSize(fam.cases = "mfc", fam.controls = "no_controls", RR = 1.1, MAF = 0.1, alpha = 0.05, power = 0.9) ## Compute sample sizes for a single SNP, ## for the combination of case triads, control triads, fraction = 0.5, RR = 1.2, ## MAF = 0.1, alpha = 0.05 and power = 0.8, and also for the combination of case triads, ## control children, fraction = 0.5, RR = 1.2, MAF = 0.1, alpha = 0.05 and power = 0.9 snpSampleSize(fam.cases = "mfc", fam.controls = c("mfc", "c"), RR = 1.2, MAF = 0.1, alpha = 0.05, power = c(0.8, 0.9))
## Compute sample sizes for a single SNP, ## when the specified case family design is a full triad, there are no control families, ## RR = 1.1, MAF = 0.1, alpha = 0.05 and power = 0.9 snpSampleSize(fam.cases = "mfc", fam.controls = "no_controls", RR = 1.1, MAF = 0.1, alpha = 0.05, power = 0.9) ## Compute sample sizes for a single SNP, ## for the combination of case triads, control triads, fraction = 0.5, RR = 1.2, ## MAF = 0.1, alpha = 0.05 and power = 0.8, and also for the combination of case triads, ## control children, fraction = 0.5, RR = 1.2, MAF = 0.1, alpha = 0.05 and power = 0.9 snpSampleSize(fam.cases = "mfc", fam.controls = c("mfc", "c"), RR = 1.2, MAF = 0.1, alpha = 0.05, power = c(0.8, 0.9))
The first argument to suest
should be a list of haplin estimation results (from the same data file), usually the output from haplinSlide
. suest
produces as a result a joint overall p-value based on aggregating the individual p-values and then correcting for multiple testing. The correction is achieved by using the principle of "seemingly unrelated" estimation, taking into account the correlation between the individual estimation results.
suest(reslist)
suest(reslist)
reslist |
A list whose elements are different haplin runs on the same data file, typically the output of |
haplinSlide
runs haplin
on a series of overlapping
windows of markers from the same data file, typically within the same gene.
Since each run produces a separate overall p-value, suest
computes a
joint overall p-value for the gene (or region) that has been scanned. It
corrects the overall p-value for multiple testing, also taking into account
the fact that the sequence of estimates produced by haplinSlide
will
be dependent, both because they are computed on the same data set and also
since the windows are overlapping (if the window length is larger than 1).
If the suest
estimation fails (which doesn't happen very often), a
standard Bonferroni correction is used instead. Important:
haplinSlide
must be run with the option table.output = FALSE
to provide suest
with enough information.
A list is returned, the most important elements of which are:
pval.obs |
The overall score p-values from each haplin run |
pval.obs.corr |
The joint p-value, corrected for multiple testing |
bonferroni |
A logical, usually FALSE, which means the |
Further information is found on the web page.
Hakon K. Gjessing
Professor of Biostatistics
Division of Epidemiology
Norwegian Institute of Public Health
[email protected]
Gjessing HK and Lie RT. Case-parent triads: Estimating single- and double-dose effects of fetal and maternal disease gene haplotypes. Annals of Human Genetics (2006) 70, pp. 382-396.
Web Site: https://haplin.bitbucket.io
## Not run: # (Almost) all standard haplin runs can be done with haplinSlide. # Below is an illustration. See the haplin help page for more # examples. # # Analyzing the effect of fetal genes, including triads with missing data, # using a multiplicative response model. When winlength = 1, separate # markers are used. To make longer windows, winlength can be increased # correspondingly: result.1 <- haplinSlide("C:/work/data.dat", use.missing = T, response = "mult", reference = "ref.cat", winlength = 1, table.output = F) # Provide summary of separate results: lapply(result.1, summary) # Plot results: par(ask = T) lapply(result.1, plot) # Compute an overall p-value for the scan, corrected for multiple testing # and dependencies between windows: suest(result.1) ## End(Not run)
## Not run: # (Almost) all standard haplin runs can be done with haplinSlide. # Below is an illustration. See the haplin help page for more # examples. # # Analyzing the effect of fetal genes, including triads with missing data, # using a multiplicative response model. When winlength = 1, separate # markers are used. To make longer windows, winlength can be increased # correspondingly: result.1 <- haplinSlide("C:/work/data.dat", use.missing = T, response = "mult", reference = "ref.cat", winlength = 1, table.output = F) # Provide summary of separate results: lapply(result.1, summary) # Plot results: par(ask = T) lapply(result.1, plot) # Compute an overall p-value for the scan, corrected for multiple testing # and dependencies between windows: suest(result.1) ## End(Not run)
Provides detailed information about estimation results from a haplin object.
## S3 method for class 'haplin' summary(object, reference, ...)
## S3 method for class 'haplin' summary(object, reference, ...)
object |
A |
reference |
Same as |
... |
Other arguments (ignored). |
Further information is found on the web page
Hakon K. Gjessing
Professor of Biostatistics
Division of Epidemiology
Norwegian Institute of Public Health
[email protected]
Gjessing HK and Lie RT. Case-parent triads: Estimating single- and double-dose effects of fetal and maternal disease gene haplotypes. Annals of Human Genetics (2006) 70, pp. 382-396.
Web Site: https://haplin.bitbucket.io
## Not run: # Produce separate plots for child and mother, dump plots to files: res <- haplin("data.dat", use.missing = T, maternal = T) summary(res) ## End(Not run)
## Not run: # Produce separate plots for child and mother, dump plots to files: res <- haplin("data.dat", use.missing = T, maternal = T) summary(res) ## End(Not run)
When haplinSlide is run with the option table.output = T
, the result is a list of haptable
s, i.e. tables with summary haplin results for each window haplinSlide
is run on. toDataFrame
stacks the separate dataframes into one large dataframe containing all results.
toDataFrame(x, reduce = F)
toDataFrame(x, reduce = F)
x |
The output from |
reduce |
Reduce output to one line per marker |
When haplinSlide
is run with winlength = 1
on SNP markers, each table in the output has only two rows, and can be condensed to a single row. By setting the argument reduce
to TRUE, toDataFrame
reduces each table to one line and returns a dataframe with one line for each SNP. In more general situations, with multi-allelic markers or, more commonly, winlength
set to 2 or more, each output table will typically have more than two rows and cannot be reduced, so reduce
should be set to FALSE.
The output is a dataframe. First column contains the marker names. Second column are row numbers, counted within each output table. The remaining columns are identical to the individual output columns, which are described in more detail in the help file for haptable
.
Hakon K. Gjessing
Professor of Biostatistics
Division of Epidemiology
Norwegian Institute of Public Health
[email protected]
Gjessing HK and Lie RT. Case-parent triads: Estimating single- and double-dose effects of fetal and maternal disease gene haplotypes. Annals of Human Genetics (2006) 70, pp. 382-396.
Web Site: https://haplin.bitbucket.io