imputation

`imputation`

This command is highly experimental and works only when exactly one RH data set has been loaded. The error model depends strongly on the marker order, so a reliable order for the markers must be established first. The command saves a corrected version of the current RH data set with errors, based on the last evaluated map (using sem). A .log file listing the a posteriori errors is also created.

Synopsis:

The imputation command is invoked as:

imputation ConversionCutoff CorrectionCutoff UnknownCorrectionCutoff Filename

Description:

The imputation command uses the a posteriori probability of each "genotype" (absent or present), computed during the last mapping command, to create a new data set. Known genotypes are either kept, replaced by the corrected (more likely) genotype, or set to unknown if they are undecided. Unknowns may be replaced by a genotype if one is sufficiently likely.

Arguments:

ConversionCutoff : A known data point with an a posteriori probability of "error" below this threshold is kept unchanged (conserved).
CorrectionCutoff : A known data point with an a posteriori probability of "error" above this threshold is converted to the other genotype (corrected). Should be close to, and larger than, the previous threshold.
All known genotypes with an a posteriori probability of "error" between these two thresholds are set to "unknown" (converted).
UnknownCorrectionCutoff : For unknown data points only. If the a posteriori probability of one of the "genotypes" is above this threshold, the data point is set to that genotype (corrected). Otherwise it is kept as unknown (conserved). Should be close to 1.
Filename : The path of the file where the imputed data will be saved.
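
The threshold rules described by the arguments above can be illustrated with a small Python sketch. This is not CarthaGene's implementation: the function names, the 0/1 encoding of absent/present, the use of None for unknown, and the handling of probabilities exactly equal to a cutoff are all assumptions made for illustration. The cutoff values 0.1, 0.5 and 0.9 are likewise only example values.

```python
from collections import Counter

UNKNOWN = None  # assumed encoding of an unknown data point


def impute_known(genotype, p_error, conversion_cutoff, correction_cutoff):
    """Two-threshold rule for a known genotype (assumed: 0 = absent, 1 = present)."""
    if p_error < conversion_cutoff:
        return genotype, "conserved"      # error unlikely: keep as is
    if p_error > correction_cutoff:
        return 1 - genotype, "corrected"  # error likely: flip the genotype
    # Between the two cutoffs (boundaries included here, by assumption).
    return UNKNOWN, "converted"


def impute_unknown(p_present, unknown_correction_cutoff):
    """Rule for an unknown data point, given the posterior probability of 'present'."""
    best = 1 if p_present >= 0.5 else 0
    p_best = max(p_present, 1.0 - p_present)
    if p_best > unknown_correction_cutoff:
        return best, "corrected"          # one genotype is sufficiently likely
    return UNKNOWN, "conserved"           # stays unknown


# Hypothetical data: (known genotype, a posteriori probability of error).
known_points = [(1, 0.02), (0, 0.30), (1, 0.80)]
counts = Counter()
for genotype, p_error in known_points:
    _, action = impute_known(genotype, p_error, 0.1, 0.5)
    counts[action] += 1

print(dict(counts))
```

With these hypothetical inputs, each of the three known data points falls into a different category, which mirrors the three counts the command reports.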

Returns:

The number of data points that have been conserved as is, corrected, and converted.

Example:

   CarthaGene version 1.2-LKH, Copyright (c) 1997-2010 (INRA).

   CarthaGene comes with ABSOLUTELY NO WARRANTY.
   CarthaGene is free software. You are welcome to redistribute it,
   under certain conditions. See the License file for information.

Type 'help' for help.

# we load an RH dataset with the error header set
CG> dsload Data/rh1-error.cg
{1 haploid RH with Errors 13 118 /home/tschiex/Dev/carthagene/doc/user/exem...
# we assume we know the markers order. We compute a map
CG> sem

Map -1 : log10-likelihood =  -303.45
-------:
 Set : Marker List ...
Loglikelihood = -3.034519e+02, retention = 0.29 Error pi = 0.0203, nu = 0.4651

# we then ask for a corrected version. This clones the dataset, then impute...
CG> imputation 1 0.1 0.5 0.9
{2 corrected imputed genotypes from 1, 13 markers}
# finally we can save this newly created dataset
# (the given 'imputation.cg' will be appended to the filename as an extensi...
CG> dsave 2 imputation.cg
/home/tschiex/Dev/carthagene/doc/user/exemple/Data/rh1-error.cg.imputation.cg
CG>