Recode

Program to recode raw data into numbered alleles.

Version: 1.00

Last Update: Dec 6, 2001

Author: Dan Weeks

Copyright: (c) Dan Weeks, 2001-2002, University of Pittsburgh

Language: C

Platforms:

Currently RECODE has been successfully compiled and tested on Unix platforms.

Distribution:
Currently, we are distributing the source as a tar file, recode.tar.z, which contains:
recode.c
arrays.c
defs.h
main.h
pro.h

Compilation:
To compile, issue the command

  > cc -o recode recode.c arrays.c -lm 
or use the Makefile by typing at the prompt
  > make 

Running Recode:
To run RECODE, type at the prompt

  > recode  


Contact info: Use the feedback page for comments, questions and bug reports

Detailed documentation:
This program was written to recode alphabetically coded marker systems into numbered alleles. It will also recode numerically coded systems into numbered alleles, which is useful when the alleles represent base-pair counts.

Input:

1) A LINKAGE-format file (up to the proband field). Untyped people must have a '0 0' or '-1 -1' genotype. Alphabetic systems must be coded without a space between the alleles. Homozygotes can have just one allele ('H') or two alleles ('HH'). Numeric systems must have at least one space between alleles.

2) names.dat containing the names of the markers, one per line. This file is optional, and is used automatically if it resides in the current directory.



Output:
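1) A recoded pedigree file, in which the alleles encountered for each marker are numbered consecutively from 1 in sorted order (alphabetical for letter codes, ascending for numeric codes), as the example below shows.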
2) key.dat: A key file indicating the recoding scheme used.
3) recode.dat: The portion of the 'datain.dat' file containing the information pertaining to the recoded markers. The allele frequencies here are not valid population estimates: each one is simply the proportion of all allele copies observed in the pedigree file that carry that allele, without regard to how many times the allele entered the pedigree. However, they should provide a reasonable starting point for estimating the allele frequencies.
4) recode.gs: The portion of the GAS-format locus file containing the information pertaining to the markers. GAS stands for the "Genetic Analysis System" software package written by Alan Young at the University of Oxford; it is available by anonymous ftp from 'ftp.well.ox.ac.uk' in the pub/genetics/gas directory.
5) ped.gs: A GAS-format pedigree file (containing the original alleles).

Example:

Input:

1) pedfile.dat:


 701  1  0  0  3  0  0 1 1    0 1    0 0      0 0     0 0     0 0     0 0                     0 0  
 701  2  0  0 11  0  0 1 0    1 1      JN      GP      DD      3 4      HH                      JL             
 701  3  1  4 17  5  5 1 0    1 1      GL      GP      DF      4 4      GH                      CK             
 701  4  0  0  3  0  0 2 0    0 2    0 0     0 0    0  0    0  0     0 0                       0 0            
 701  5  1  4  9  6  6 1 0    1 1      LL      CG      DF      3 4      HH                      CJ             
 701  6  1  4  0  8  8 1 0    1 1      LL      CG      DF      3 4      HH                      CJ             
 701  7  0  0 13  0  0 1 0    1 1    0 0     0 0      0 0    0 0      0 0                      0 0            
 701  8  1  4  0 12 12 1 0    1 1      LM      CG      DF      3 4      HH                      FJ             
 701  9  5 10  0 20 20 1 0    1 1      GL      CP      DD      3 4      HH                      EJ             
 701 10  0  0  9  0  0 2 0    1 2      GL      CP      DF      3 4      HH                      EJ             
 701 11  2 12  0 16 16 1 0    1 1      LN      GG      DF      4 4      HH                      CJ             
 701 12  1  4 11 14 14 2 0    1 2      LL      CG      DF      3 4      HH                      CJ             
 701 13  7 14  0 21 21 1 0    1 1      GL      CC      DF      3 4      HH                      CH             
 701 14  1  4 13  0  0 2 0    1 2      LL      CG      DF      3 4      HH                      CJ             
 701 15  0  0 17  0  0 2 0    1 2      IL      GG      FF      3 5      BG                      GL             
 701 16  2 12  0  0  0 2 0    2 2      JL      GP      DF      3 4      HH                      CL             
 701 17  3 15  0 18 18 2 0    1 2      GI      GP      DF      4 5      BG                      KL             
 701 18  3 15  0 19 19 2 0    1 2      GL      GP      DF      3 4      GG                      GK             
 701 19  3 15  0  0  0 2 0    2 2      IL      GG      FF      4 5      BH                      CL             
 701 20  5 10  0  0  0 2 0    2 2      LL      CG      DF      3 4      HH                      CJ             
 701 21  7 14  0  0  0 2 0    2 2      IL      GG      DF      3 4      HH                      CI             

2) names.dat


M1
M2
M3
M4
M5
M6

Output:

1) Recoded pedigree file


 701   1   0   0   3   0   0 1 1 0 1   0  0   0  0   0  0   0  0   0  0   0  0
 701   2   0   0  11   0   0 1 0 1 1   3  6   2  3   1  1   1  2   3  3   7  9
 701   3   1   4  17   5   5 1 0 1 1   1  4   2  3   1  2   2  2   2  3   1  8
 701   4   0   0   3   0   0 2 0 0 2   0  0   0  0   0  0   0  0   0  0   0  0
 701   5   1   4   9   6   6 1 0 1 1   4  4   1  2   1  2   1  2   3  3   1  7
 701   6   1   4   0   8   8 1 0 1 1   4  4   1  2   1  2   1  2   3  3   1  7
 701   7   0   0  13   0   0 1 0 1 1   0  0   0  0   0  0   0  0   0  0   0  0
 701   8   1   4   0  12  12 1 0 1 1   4  5   1  2   1  2   1  2   3  3   3  7
 701   9   5  10   0  20  20 1 0 1 1   1  4   1  3   1  1   1  2   3  3   2  7
 701  10   0   0   9   0   0 2 0 1 2   1  4   1  3   1  2   1  2   3  3   2  7
 701  11   2  12   0  16  16 1 0 1 1   4  6   2  2   1  2   2  2   3  3   1  7
 701  12   1   4  11  14  14 2 0 1 2   4  4   1  2   1  2   1  2   3  3   1  7
 701  13   7  14   0  21  21 1 0 1 1   1  4   1  1   1  2   1  2   3  3   1  5
 701  14   1   4  13   0   0 2 0 1 2   4  4   1  2   1  2   1  2   3  3   1  7
 701  15   0   0  17   0   0 2 0 1 2   2  4   2  2   2  2   1  3   1  2   4  9
 701  16   2  12   0   0   0 2 0 2 2   3  4   2  3   1  2   1  2   3  3   1  9
 701  17   3  15   0  18  18 2 0 1 2   1  2   2  3   1  2   2  3   1  2   8  9
 701  18   3  15   0  19  19 2 0 1 2   1  4   2  3   1  2   1  2   2  2   4  8
 701  19   3  15   0   0   0 2 0 2 2   2  4   2  2   2  2   2  3   1  3   1  9
 701  20   5  10   0   0   0 2 0 2 2   4  4   1  2   1  2   1  2   3  3   1  7
 701  21   7  14   0   0   0 2 0 2 2   2  4   2  2   1  2   1  2   3  3   1  6

2) key.dat


 Marker 1: M1 has 6 alleles
  Allele Code Frequency Count
  1	 G  0.16667	 6
  2	 I  0.11111	 4
  3	 J  0.05556	 2
  4	 L  0.58333	 21
  5	 M  0.02778	 1
  6	 N  0.05556	 2
 Marker 2: M2 has 3 alleles
  Allele Code Frequency Count
  1	 C  0.27778	 10
  2	 G  0.52778	 19
  3	 P  0.19444	 7
 Marker 3: M3 has 2 alleles
  Allele Code Frequency Count
  1	 D  0.50000	 18
  2	 F  0.50000	 18
 Marker 4: M4 has 3 alleles
  Allele Code Frequency Count
  1	 3  0.38889	 14
  2	 4  0.52778	 19
  3	 5  0.08333	 3
 Marker 5: M5 has 3 alleles
  Allele Code Frequency Count
  1	 B  0.08333	 3
  2	 G  0.13889	 5
  3	 H  0.77778	 28
 Marker 6: M6 has 9 alleles
  Allele Code Frequency Count
  1	 C  0.30556	 11
  2	 E  0.05556	 2
  3	 F  0.02778	 1
  4	 G  0.05556	 2
  5	 H  0.02778	 1
  6	 I  0.02778	 1
  7	 J  0.27778	 10
  8	 K  0.08333	 3
  9	 L  0.13889	 5

3) recode.dat


3 6 #M1
  0.16667  0.11111  0.05556  0.58333  0.02778  0.05556
3 3 #M2
  0.27778  0.52778  0.19444
3 2 #M3
  0.50000  0.50000
3 3 #M4
  0.38889  0.52778  0.08333
3 3 #M5
  0.08333  0.13889  0.77778
3 9 #M6
  0.30556  0.05556  0.02778  0.05556  0.02778  0.02778  0.27778  0.08333  0.13889