CarthaGène FAQ

FAQ

How can I produce a map image to include in my publication(s)?
I would like to know if there is a way to load a marker order in CarthaGene and get that order treated as an initial map?
When the parents are diploid outbreds whose genotypes are AB and AC, how do you encode the offspring with codominant markers?
In a 1:1 segregation, I have dominant markers with genotypes AB x BB (A = presence, B = absence). How shall I encode this situation?
Can CarthaGène infer the phases automatically?
Is it possible to execute commands from a file in order to automatically execute a series of commands?
I want to work on a chromosome-by-chromosome basis. How can I do that?
How does CarthaGène handle missing data, especially when data sets are merged?
To avoid repeated long startup times on big data sets (60k markers and more), I would like to use CarthaGène as a "mapping server" that can be launched once and controlled from R. Is it feasible?
Could you please let me know which header and encoding I should use for Doubled Haploid (DH) data?

How can I produce a map image to include in my paper?
First, build your map in CarthaGène. Then use the graphical interface of CarthaGène (CGW.tcl): in the "Maps" menu, select the "Graphical" item. You can configure how the map(s) are rendered using the contextual menu (right mouse button). Finally, print the map to a PostScript file.

This file can either be included directly in your publication or edited. Under Windows, this can be done with a dedicated editor such as Adobe Illustrator. Under Unix, you can translate the PostScript into an editable file format using the "pstoedit" utility. Assuming that the PostScript file is "map.ps", the translation to SVG (Scalable Vector Graphics) in "map.svg" can be achieved as follows:
```
> pstoedit -f plot-svg map.ps map.svg
```
This file can then be edited with a free SVG editor such as "Inkscape" and exported to formats that most word processors can include. Both "pstoedit" and "inkscape" are available as packages in most Linux distributions. Ask your system administrator.
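If you export several maps, the conversion can be scripted. Below is a minimal Python sketch, assuming that pstoedit is on your PATH and that your PostScript files end in ".ps" (both are assumptions of this example; the helper names are ours). It runs exactly the pstoedit command shown above on every such file in a directory.

```python
import subprocess
import sys
from pathlib import Path

def pstoedit_command(ps_file):
    """The pstoedit call converting NAME.ps into NAME.svg with the plot-svg driver."""
    ps = Path(ps_file)
    return ["pstoedit", "-f", "plot-svg", str(ps), str(ps.with_suffix(".svg"))]

def convert_directory(directory):
    """Convert every *.ps file in `directory`; stops at the first pstoedit failure."""
    for ps in sorted(Path(directory).glob("*.ps")):
        subprocess.run(pstoedit_command(ps), check=True)

if __name__ == "__main__":
    # Usage: python ps2svg.py DIRECTORY   (the script name is only an example)
    if len(sys.argv) > 1:
        convert_directory(sys.argv[1])
```

Using check=True makes a failed conversion raise an exception instead of silently leaving a map unconverted.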
I would like to know if there is a way to load a marker order in CarthaGene and get that order treated as an initial map?
Yes. We'll use the dataset rh1.cg provided in the distribution in the "data" directory to give you an example:
1. I load the data set
```
CG> dsload Data/rh1.cg
{1 haploid RH 13 118 /homes/thomas/CartaGene/dev/test/Data/rh1.cg}
```
2. I set the current marker selection to the order of markers I'm interested in:
```
CG> mrkselset [mrkids {MS5 MS1 MS15 MS4 G37}]
```
  (mrkids converts the marker names into the numerical ids used by mrkselset, which simply sets the current marker selection.)
3. I estimate the distances and put the map in the "heap":
```
CG> sem

Map -1 : log10-likelihood =  -121.24
-------:
 Set : Marker List ...
   1 : MS5 MS1 MS15 MS4 G37
```
That's it. Now you can use flip, polish or greedy to check the validity of the map. Have a look at cgrobustness in the documentation too. Note that you can also give an initial order to buildfw for building framework maps (as a forced starting order).
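If you evaluate many candidate orders, the three commands of the session above can be generated rather than typed. The Python sketch below (initial_order_script is a hypothetical helper of ours, not part of CarthaGène) only assembles text, using dsload, mrkids, mrkselset and sem exactly as in that session. Marker names containing whitespace or braces are rejected because they would break the plain Tcl list passed to mrkids.

```python
def initial_order_script(dataset_path, order):
    """Return CarthaGène commands that load a data set and estimate one marker order.

    Hypothetical helper: it only builds text, mirroring the session shown above.
    """
    for name in order:
        # A plain Tcl list {a b c} cannot hold names with whitespace or braces.
        if not name or any(ch.isspace() or ch in "{}" for ch in name):
            raise ValueError(f"unsupported marker name: {name!r}")
    return [
        f"dsload {dataset_path}",
        f"mrkselset [mrkids {{{' '.join(order)}}}]",
        "sem",
    ]

if __name__ == "__main__":
    for line in initial_order_script("Data/rh1.cg", ["MS5", "MS1", "MS15", "MS4", "G37"]):
        print(line)
```

Saved to a text file, these lines can be replayed at the CG> prompt with the Tcl source command.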

When the parents are diploid outbreds whose genotypes are AB and AC, how do you encode the offspring with codominant markers?
This is all explained in the manual (outbred section). Remember that the phase MUST be known. Imagine that the parents (F and M) are F0|F1 x M0|M1 = A|B x A|C. An offspring can be:
```
A | A = F0|M0 = code 1 
A | C = F0|M1 = code 2
B | A = F1|M0 = code 4
B | C = F1|M1 = code 8
```
Warning: this is phase dependent. If the phase of the father's (F) known genotype is reversed, we get F0|F1 x M0|M1 = B|A x A|C and the codes become:
```
A | A = F1|M0 = code 4
A | C = F1|M1 = code 8
B | A = F0|M0 = code 1
B | C = F0|M1 = code 2
```
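Each code in the tables above is a power of two (F0|M0 = 1, F0|M1 = 2, F1|M0 = 4, F1|M1 = 8), so the codes can be computed rather than looked up. The Python sketch below is our own illustration, not a CarthaGène function: given the phased parental genotypes, it ORs together the bits of every parental allele combination that yields each unordered offspring genotype and writes the result as a hexadecimal digit. With A|B x A|C it reproduces the first table, and with B|A x A|C the second.

```python
from itertools import product

def outbred_codes(father, mother):
    """Outbred code of each offspring genotype for one codominant marker.

    father, mother: phased genotypes (allele0, allele1), i.e. F0|F1 and M0|M1.
    Bit values follow the tables above: F0|M0 = 1, F0|M1 = 2, F1|M0 = 4, F1|M1 = 8.
    """
    codes = {}
    for f, m in product((0, 1), repeat=2):
        child = "".join(sorted((father[f], mother[m])))  # unordered genotype, e.g. "AC"
        codes[child] = codes.get(child, 0) | (1 << (2 * f + m))
    return {child: format(code, "x") for child, code in codes.items()}

if __name__ == "__main__":
    print(outbred_codes(("A", "B"), ("A", "C")))  # first table
    print(outbred_codes(("B", "A"), ("A", "C")))  # father's phase reversed
```

When two combinations give the same offspring genotype, their bits are combined: for A|B x A|B the sketch returns 6 (2 + 4) for an AB offspring.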

In a 1:1 segregation, I have dominant markers with genotypes AB x BB (A = presence, B = absence). How shall I encode this situation?
If all the markers follow this scheme, you are facing a classical backcross situation: use the "backcross" data type. If this occurs within an outbred situation (phases known!) and if I assume the phases F0|F1 x M0|M1 = A|B x B|B, then:
```
B|B: (F1|M0 or F1|M1) which means code c 
A|B: (F0|M0 or F0|M1) which means code 3
```
The two codes are inverted if the heterozygous parent phase changes.
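The same bit arithmetic explains these two codes: absence (B|B) collects F1|M0 and F1|M1, that is 4 + 8 = 12, written c in hexadecimal, while presence collects F0|M0 and F0|M1, that is 1 + 2 = 3. A small Python sketch of our own (not a CarthaGène function), assuming the phased parental genotypes and the scored dominant allele are known:

```python
def dominant_codes(father, mother, dominant):
    """Codes for the two phenotypes of a dominant marker in an outbred cross.

    father, mother: phased genotypes (allele0, allele1); dominant: the allele whose
    presence is scored. Bit values: F0|M0 = 1, F0|M1 = 2, F1|M0 = 4, F1|M1 = 8.
    """
    present = absent = 0
    for f in (0, 1):
        for m in (0, 1):
            bit = 1 << (2 * f + m)
            if dominant in (father[f], mother[m]):
                present |= bit  # offspring carries at least one dominant allele
            else:
                absent |= bit
    return {"present": format(present, "x"), "absent": format(absent, "x")}

if __name__ == "__main__":
    print(dominant_codes(("A", "B"), ("B", "B"), "A"))  # phases as above
    print(dominant_codes(("B", "A"), ("B", "B"), "A"))  # heterozygous parent's phase reversed
```

Reversing the phase of the heterozygous parent swaps the two codes, as stated above.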

Can CarthaGène infer the phases automatically?
No. For now, you have to infer the phases yourself. Software such as CRIMAP, with its chrompic command, can help. More recently, Dustin Cartwright has written a phasing program whose output is compatible with CarthaGène and which can be used when many offspring are available.

Is it possible to execute commands from a file in order to automatically execute a series of commands?
Yes. CarthaGène embeds a complete programming language, Tcl, and you can use all of its facilities (see the Tcl site for exhaustive documentation). To execute the commands stored in a file, for example one called comfile, simply type the following command in the CarthaGène window:
```
source comfile
```
I want to work on a chromosome-by-chromosome basis. How can I do that?
CarthaGène on its own has no "chromosome-oriented" commands (except for linkage group identification). Here again, you can rely on the programming language that hosts CarthaGène (see the Tcl site for exhaustive documentation): use variables to store the list of markers in each linkage group, then use "mrkselset" to switch from one linkage group (chromosome) to the next.
```
CG> dsload mouse.raw
{1 intercross 308 46 /home/tschiex/Dev/carthagene/distrib/data/mouse.raw}
CG> group 0.2 3
...
CG> set chrom1 [groupget 23]
5 12 218 270 293 300 274 232 171...
CG> mrkselset $chrom1
CG> ....
```
If you don't want to redo this again and again, store the corresponding commands in a text file; you will then be able to re-execute them easily with source (see the previous question).
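As a sketch of that idea, the Python function below (hypothetical, not shipped with CarthaGène) writes such a command file. It loads the data set, runs group with the two thresholds used in the session above (0.2 and 3), then uses a Tcl foreach loop over the linkage groups you select, making each one the current marker selection with groupget and mrkselset. The per-group commands are yours to choose: sem appears in the example only because it takes no arguments, and on its own it merely evaluates the markers in the order groupget returns them. Group numbers depend on the data and thresholds, so check them against the output of group before reusing the file.

```python
def per_group_script(dataset_path, thresholds, group_ids, per_group_cmds):
    """Build a Tcl command file for CarthaGène that processes linkage groups one by one.

    thresholds: the two arguments given to `group` (0.2 and 3 in the session above).
    per_group_cmds: commands run while a group is the current marker selection.
    """
    lines = [
        f"dsload {dataset_path}",
        "group {} {}".format(*thresholds),
        "foreach g {%s} {" % " ".join(str(g) for g in group_ids),
        "    mrkselset [groupget $g]",
    ]
    lines += [f"    {cmd}" for cmd in per_group_cmds]
    lines.append("}")
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    # Save this text as comfile, then type `source comfile` at the CG> prompt.
    print(per_group_script("mouse.raw", (0.2, 3), [23], ["sem"]), end="")
```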
How does CarthaGène handle missing data, especially when data sets are merged?
CarthaGène implements several specialized versions of the EM algorithm for handling missing data. For intercross, outbred and diploid radiation hybrid data, this is essentially a classical EM implementation for hidden Markov models (HMMs). For backcross (and related pedigrees such as RILs) and for haploid radiation hybrids, a specific EM algorithm can run 1 or 2 orders of magnitude faster than traditional EM implementations, without loss of precision. See this paper.
When two data sets are merged with dsmergen, all the markers typed in one data set but not in the other are treated as missing data in the data set where they were not typed. The E (expectation) step of EM is performed in each data set (according to its type) and the summed expectations are used to estimate the parameters in the M (maximization) step.
When two data sets are merged with dsmergor, EM is run independently on each data set and the log-likelihoods are added together. For a given marker order, this yields an overall log-likelihood for the merged data set and separately estimated parameters for each data set.
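To make the difference between the two merging modes concrete, here is a toy Python sketch; it is not CarthaGène's implementation. It assumes backcross-type data only (genotypes coded 0/1, None for missing), markers already in map order, and an E step that enumerates every completion of the missing genotypes, which is exact but exponential in the number of missing values (CarthaGène's own EM does not work this way). em_shared sums the expected recombination counts of all data sets before the M step, as described for dsmergen; em_separate runs EM on each data set alone and adds the log10-likelihoods, as described for dsmergor. A marker not typed in a data set simply appears as None for every individual of that data set.

```python
import math
from itertools import product

def completions(ind):
    """Every way to fill the missing (None) genotypes of one individual with 0 or 1."""
    return product(*[(g,) if g is not None else (0, 1) for g in ind])

def joint_prob(geno, thetas):
    """Probability of a complete backcross genotype vector, markers in map order."""
    p = 0.5  # 1:1 segregation at the first marker
    for i, r in enumerate(thetas):
        p *= r if geno[i] != geno[i + 1] else 1.0 - r
    return p

def e_step(data, thetas):
    """Expected recombinations per interval and log10-likelihood of the observed data."""
    counts = [0.0] * len(thetas)
    loglik = 0.0
    for ind in data:
        fills = list(completions(ind))
        weights = [joint_prob(g, thetas) for g in fills]
        total = sum(weights)
        loglik += math.log10(total)
        for g, w in zip(fills, weights):
            for i in range(len(thetas)):
                if g[i] != g[i + 1]:
                    counts[i] += w / total
    return counts, loglik

def em_shared(datasets, n_intervals, iters=100):
    """dsmergen-like: expectations of all data sets are summed before each M step."""
    thetas = [0.25] * n_intervals
    n = sum(len(d) for d in datasets)
    for _ in range(iters):
        counts = [0.0] * n_intervals
        for d in datasets:
            c, _ = e_step(d, thetas)
            counts = [a + b for a, b in zip(counts, c)]
        thetas = [c / n for c in counts]  # M step: expected recombinants / individuals
    return thetas, sum(e_step(d, thetas)[1] for d in datasets)

def em_separate(datasets, n_intervals, iters=100):
    """dsmergor-like: EM on each data set alone, log10-likelihoods added."""
    fits = [em_shared([d], n_intervals, iters) for d in datasets]
    return [t for t, _ in fits], sum(ll for _, ll in fits)

if __name__ == "__main__":
    d1 = [(0, 0, 0), (0, 1, 1), (1, 1, 1), (1, 1, 0)]
    d2 = [(0, None, 0), (1, None, 1), (0, None, 1)]  # middle marker not typed in d2
    print(em_shared([d1, d2], 2))
    print(em_separate([d1, d2], 2))
```

Because em_separate fits its own recombination fractions to each data set, its total log-likelihood should be at least that of the shared fit once both have converged to their maxima.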
To avoid repeated long startup times on big data sets (60k markers and more), I would like to use CarthaGène as a "mapping server" that can be launched once and controlled from R. Is it feasible?
Yes. This is feasible and easy to achieve under Unix/Linux: launch CarthaGène with its standard input and standard output connected to two named FIFOs. An R script "skeleton" doing this is available under the "Utilities" link in the left menu (thanks to Matthieu Falque, from INRA Moulon, for making it available to the community).
Note that version 1.3 of CarthaGène starts much faster: it no longer recomputes two-point measures (LOD, distances) at startup but uses a cache file, and it uses multi-threading for two-point computations and grouping.
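The R skeleton mentioned above is the reference solution. For readers who prefer Python, the following sketch shows the same two-FIFO mechanism under stated assumptions: the server program reads commands on its standard input, and it can be made to print a known sentinel line after each answer so the client knows where a reply ends. For a Tcl-based shell, a command such as `puts __DONE__; flush stdout` is a plausible sentinel, but we have not checked it against CarthaGène, and the executable name in the final comment is also an assumption. The class FifoServer is hypothetical; the demonstration uses cat, which simply echoes each request.

```python
import os
import shlex
import subprocess
import tempfile

class FifoServer:
    """Keep one interactive program running and talk to it through two named FIFOs.

    Hypothetical helper (Unix only). `command` is a shell command line; after each
    request we send `sentinel_cmd`, which must make the program print `sentinel`
    on a line of its own, so we know where the answer ends.
    """

    def __init__(self, command, sentinel_cmd, sentinel):
        self.sentinel_cmd, self.sentinel = sentinel_cmd, sentinel
        self.dir = tempfile.mkdtemp()
        self.in_fifo = os.path.join(self.dir, "requests")
        self.out_fifo = os.path.join(self.dir, "answers")
        os.mkfifo(self.in_fifo)
        os.mkfifo(self.out_fifo)
        # The shell opens the program's stdin FIFO first, then its stdout FIFO.
        self.proc = subprocess.Popen(
            f"exec {command} < {shlex.quote(self.in_fifo)} > {shlex.quote(self.out_fifo)}",
            shell=True,
        )
        # Open the FIFOs in the same order, otherwise both sides block forever.
        self.to_server = open(self.in_fifo, "w")
        self.from_server = open(self.out_fifo, "r")
        self.banner = self._ask("")  # anything printed at startup

    def _ask(self, text):
        self.to_server.write(text + self.sentinel_cmd + "\n")
        self.to_server.flush()
        answer = []
        while True:
            line = self.from_server.readline()
            if line == "":
                raise EOFError("the server closed its output")
            line = line.rstrip("\n")
            if line == self.sentinel:
                return answer
            answer.append(line)

    def send(self, command):
        """Send one command line and return the output lines printed in reply."""
        return self._ask(command + "\n")

    def close(self):
        self.to_server.close()  # end of file on the program's stdin
        self.proc.wait()
        self.from_server.close()
        os.remove(self.in_fifo)
        os.remove(self.out_fifo)
        os.rmdir(self.dir)

if __name__ == "__main__":
    # Demonstration with `cat`, which echoes every request line back.
    server = FifoServer("cat", "__DONE__", "__DONE__")
    print(server.send("dsload mouse.raw"))
    server.close()
    # For CarthaGène one might try (executable name and sentinel are assumptions):
    # FifoServer("carthagene", "puts __DONE__; flush stdout", "__DONE__")
```

The server process stays alive between calls, so an expensive data load is paid only once per session.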
Could you please let me know which header and encoding I should use for Doubled Haploid (DH) data?
Doubled haploids (DH) are equivalent to backcross data. They should be handled exactly as such, using the "data type f2 backcross" header and "A" and "H" to encode the two possible genotypes ("-" being used for unknowns).
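As an illustration, this Python sketch (a hypothetical helper, not a CarthaGène tool) turns DH calls into such a file. Two details are assumptions of the example and should be checked against the manual and the sample files in the distribution's data directory: the count line after the header (number of individuals, number of markers, then 0 for no quantitative traits, following the MAPMAKER raw layout as we understand it) and the "*name genotypes" marker lines. Whichever parent you map to "A", keep the same choice for every marker; otherwise linked markers will appear to recombine.

```python
def dh_to_backcross(genotypes, parent_a, parent_h):
    """Return the text of a backcross-type CarthaGène file for doubled haploid calls.

    genotypes: {marker_name: [call per DH line]}, same line order for every marker.
    Calls equal to parent_a are written "A", calls equal to parent_h "H", others "-".
    ASSUMPTION: the count line ("<individuals> <markers> 0") and the "*name" marker
    lines follow the MAPMAKER raw layout; check them against the CarthaGène manual.
    """
    names = list(genotypes)
    n_ind = len(genotypes[names[0]])
    lines = ["data type f2 backcross", f"{n_ind} {len(names)} 0"]
    for name in names:
        calls = genotypes[name]
        if len(calls) != n_ind:
            raise ValueError(f"{name}: expected {n_ind} calls, got {len(calls)}")
        code = "".join(
            "A" if c == parent_a else "H" if c == parent_h else "-" for c in calls
        )
        lines.append(f"*{name} {code}")
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    calls = {"M1": ["P1", "P2", "?"], "M2": ["P1", "P1", "P2"]}
    print(dh_to_backcross(calls, parent_a="P1", parent_h="P2"), end="")
```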

Last modified: June 20 02:20 CET 2012