Input Files

This page describes all the input files that can be used with DISSECT.

Genotype files

Genotype files use the PLINK binary PED format (bed). When a genotype file must be specified, it can be given either as the name of a single file or as a file containing a list of bed files. This allows the genotypes to be split across several smaller files (e.g. one file per chromosome), which can make them easier to manage when the number of individuals is high. In addition, for some analyses (e.g. GRM computation), this can also reduce memory consumption.

Genotype files can be split by subsets of individuals or by subsets of SNPs. DISSECT can handle both situations; however, we strongly recommend splitting only by SNP subsets, since splitting by individuals has not been properly tested. To load all the files, a file listing all the genotype files must be passed as an argument to DISSECT. This file must contain, for each genotype file, its full path and filename without any extension (i.e. without .bed/.bim/.fam).
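As an illustration, a list file of this kind could be generated with a short script. The sketch below is not part of DISSECT: the function name write_bed_list is hypothetical, and it assumes that the list holds one extension-less path per line, which the description above does not state explicitly.

```python
from pathlib import Path


def write_bed_list(genotype_dir, list_path):
    """Write one extension-less absolute path per complete .bed/.bim/.fam trio.

    Assumption: one path per line; check this against your DISSECT version.
    """
    prefixes = []
    for bed in sorted(Path(genotype_dir).glob("*.bed")):
        prefix = str(bed.resolve())[: -len(".bed")]  # drop only the .bed extension
        # Keep only prefixes whose .bim and .fam companions also exist.
        if all(Path(prefix + ext).exists() for ext in (".bim", ".fam")):
            prefixes.append(prefix)
    Path(list_path).write_text("".join(p + "\n" for p in prefixes))
    return prefixes
```

For example, write_bed_list("genotypes", "bed_list.txt") would list every complete bed fileset found in a (hypothetical) genotypes directory and skip any .bed file whose .bim or .fam file is missing.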

GRM files

There are two types of stored GRMs: diagonalized GRMs and normal GRMs.

Normal GRMs are stored in three files with extensions:

.grm.snps Stores the list of SNPs used for computing the GRM.
.grm.ids Stores the family ID and individual ID of each individual in the GRM.
.grm.dat Stores the GRM matrix and the normalization matrix.

Diagonalized GRMs are stored in four files with extensions:

.grm.snps Stores the list of SNPs used for computing the GRM.
.grm.ids Stores the family ID and individual ID of each individual in the GRM.
.grm.dat Stores the GRM matrix eigenvectors.
.grm.diag Stores the GRM eigenvalues.
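
Given the two file sets above, a quick check of which kind of GRM is stored on disk could look like the following sketch. It is an illustration only: the function name grm_kind is hypothetical, and it assumes that all files of a GRM share a common prefix and that the presence of the .grm.diag file is what distinguishes a diagonalized GRM.

```python
from pathlib import Path

NORMAL_EXTS = (".grm.snps", ".grm.ids", ".grm.dat")
DIAG_EXTS = NORMAL_EXTS + (".grm.diag",)


def grm_kind(prefix):
    """Return 'diagonalized', 'normal', or None if required files are missing."""
    def present(exts):
        return all(Path(str(prefix) + ext).exists() for ext in exts)

    # Assumption: a .grm.diag file next to the other three marks a diagonalized GRM.
    if present(DIAG_EXTS):
        return "diagonalized"
    if present(NORMAL_EXTS):
        return "normal"
    return None
```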

 

Phenotype files

Phenotype files have no header and contain three or more columns, laid out as follows:

Column 1 Family ID
Column 2 Individual ID
Columns 3 and above Each column can contain a different phenotype for each individual. Valid values are any number and “NA”; the latter indicates missing data. By default, the first phenotype column (i.e. column 3) is used (the second phenotype column for the second trait in a bivariate analysis), although this behaviour can be changed with the corresponding option.
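
The layout above can be read with a few lines of code. The following is a minimal sketch, not DISSECT's own parser: the function name read_phenotypes and the pheno_col parameter are hypothetical, and it assumes that columns are separated by whitespace.

```python
import math


def read_phenotypes(path, pheno_col=1):
    """Map (family_id, individual_id) to a phenotype value, NaN for 'NA'.

    pheno_col counts phenotype columns from 1, so pheno_col=1 is file column 3.
    Assumption: columns are whitespace-separated.
    """
    phenotypes = {}
    needed = 2 + pheno_col
    with open(path) as handle:
        for line_no, line in enumerate(handle, start=1):
            fields = line.split()
            if not fields:
                continue  # ignore blank lines
            if len(fields) < needed:
                raise ValueError(f"line {line_no}: expected at least {needed} columns")
            raw = fields[needed - 1]
            phenotypes[(fields[0], fields[1])] = math.nan if raw == "NA" else float(raw)
    return phenotypes
```

Calling read_phenotypes(path, pheno_col=2) would select the second phenotype column, as would be needed for the second trait in a bivariate analysis.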

Covariate files

Covariate files have no header. A covariate file holds either discrete or quantitative covariates. Their columns are laid out as follows:

Column 1 Family ID
Column 2 Individual ID
Columns 3 and above Each column can contain a different covariate for each individual. In quantitative covariate files, valid values are any number and “NA” (which is replaced by the column mean). In discrete covariate files, values can be anything: each column is treated as a categorical factor with several levels, and “NA” is also interpreted as a category.
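
The different treatment of “NA” in the two kinds of covariate file can be illustrated with a small sketch. The two helper functions below are hypothetical and only mirror the behaviour described above for a single column; they are not DISSECT's implementation.

```python
def impute_column_mean(values):
    """Replace 'NA' entries of one quantitative covariate column by the column mean."""
    observed = [float(v) for v in values if v != "NA"]
    if not observed:
        raise ValueError("column has no observed values")
    mean = sum(observed) / len(observed)
    return [mean if v == "NA" else float(v) for v in values]


def discrete_levels(values):
    """Return the sorted distinct levels of one discrete covariate column.

    'NA' is kept as a level of its own, as described for discrete files.
    """
    return sorted(set(values))
```

With these helpers, a quantitative column holding 1, NA and 3 becomes 1, 2 and 3, while a discrete column holding M, F, NA and M has the three levels F, M and NA.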

Effects files

Effects files can be used for prediction analyses. These files can follow the format of the result files from MLM or GWAS analyses.

SNP effect size files

File that can be used to specify SNP effect sizes for phenotype simulation. It has no header and must have two columns:

Column 1 SNP name
Column 2 Effect size. This column can contain a number or be empty, for one or all SNPs. The effects of SNPs with an empty value are drawn at random from a normal distribution.
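
The following sketch shows one way such a file could be interpreted. It is an illustration under stated assumptions, not DISSECT's behaviour: the function name read_effect_sizes is hypothetical, columns are assumed to be whitespace-separated, and missing effects are drawn from a standard normal distribution (mean 0, standard deviation 1) because the parameters of the distribution are not specified here.

```python
import random


def read_effect_sizes(path, seed=None):
    """Map SNP name to effect size, drawing a random effect where none is given.

    Assumption: missing effects follow N(0, 1); the real parameters may differ.
    """
    rng = random.Random(seed)  # seeded generator for reproducible draws
    effects = {}
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if not fields:
                continue  # ignore blank lines
            snp = fields[0]
            if len(fields) > 1:
                effects[snp] = float(fields[1])
            else:
                effects[snp] = rng.gauss(0.0, 1.0)
    return effects
```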

 

Regional SNP group files

File that can be used to assign SNPs to different groups.

Column 1 SNP name.
Column 2 One or more groups separated by spaces to which the SNP belongs.
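
As an illustration of this layout, the sketch below collects the SNPs that belong to each group. The function name read_snp_groups is hypothetical, and the code assumes that the SNP name and the group names are all separated by whitespace.

```python
from collections import defaultdict


def read_snp_groups(path):
    """Map each group name to the list of SNPs assigned to it, in file order."""
    groups = defaultdict(list)
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) < 2:
                continue  # skip blank lines and SNPs with no group
            snp, snp_groups = fields[0], fields[1:]
            for group in snp_groups:
                groups[group].append(snp)
    return dict(groups)
```

A line such as "rs1 g1 g2" therefore adds rs1 to both g1 and g2.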