Program for detecting marker typing incompatibilities in pedigree data.
Version: 1.00
Last Update: November 24, 1998
Author: Jeff O'Connell
Copyright: (c) Jeff O'Connell 1997, University of Pittsburgh
Citation reference:
If you use PedCheck in any published work, please cite:
"PedCheck: A program for identifying genotype incompatibilities in linkage analysis," O'Connell JR, Weeks DE, Am J Hum Genet 63:259-266
Language: C, C++
Documentation: See the file WhatsNew for additional documentation
Executables
1. Unix executables for Sun, DEC, and SGI are available at present. pedcheck_sol is for Solaris 2.5 and pedcheck_sunOS is for SunOS 4.1.
2. Windows console application pedcheck_win.exe was compiled using the Microsoft C compiler.
3. Mac PPC console application pedcheck_MacPPC.bin was compiled using CodeWarrior. The Mac version is compiled to use 1 MB of memory, so for large problems you may have to increase this allocation using the Get Info box in the Finder.
If you have a different platform or the executables won't run on your operating system, contact the author. Remember to ftp the executables using binary mode.
Error detection algorithms
Level 1:
Uses each individual's genotypes as given in the pedigree file to check for inconsistencies between parents and offspring. Level 1 therefore checks at the nuclear-family level. It detects the following errors:
a.) A child and parent's alleles are incompatible.
b.) A person is half-typed. This is checked because current programs cannot handle this situation.
c.) More than 4 alleles in a sibship.
d.) More than 3 alleles in a sibship when there is a homozygous child.
e.) An allele is out of bounds, if bounds are specified.
f.) In X-linked pedigrees, a male is scored as heterozygous (males must be scored as homozygous).
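Checks a.) through d.) above can be sketched in C as follows. This is only an illustration under stated assumptions, not PedCheck's actual source: the Genotype struct and all function names are hypothetical, allele 0 is taken to mean "untyped" (the LINKAGE convention), and the locus is assumed to be autosomal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical genotype record: two allele numbers, 0 meaning "untyped". */
typedef struct {
    int a1;
    int a2;
} Genotype;

/* Both alleles known. */
static bool is_typed(Genotype g)
{
    return g.a1 != 0 && g.a2 != 0;
}

/* Error b: exactly one of the two alleles is known. */
static bool is_half_typed(Genotype g)
{
    return (g.a1 == 0) != (g.a2 == 0);
}

/* Error a when false: a typed child must share an allele with a typed
   parent. If either person is untyped, nothing can be concluded. */
static bool shares_allele(Genotype child, Genotype parent)
{
    if (!is_typed(child) || !is_typed(parent))
        return true;
    return child.a1 == parent.a1 || child.a1 == parent.a2 ||
           child.a2 == parent.a1 || child.a2 == parent.a2;
}

/* Errors c and d: more than 4 distinct alleles in a sibship, or more
   than 3 when at least one typed sib is homozygous. */
static bool sibship_has_too_many_alleles(const Genotype *sibs, size_t n)
{
    int seen[4];                 /* at most 4 distinct alleles are legal */
    size_t n_seen = 0;
    bool any_homozygous = false;

    assert(sibs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!is_typed(sibs[i]))
            continue;
        if (sibs[i].a1 == sibs[i].a2)
            any_homozygous = true;
        int alleles[2] = { sibs[i].a1, sibs[i].a2 };
        for (int k = 0; k < 2; k++) {
            bool found = false;
            for (size_t j = 0; j < n_seen; j++)
                if (seen[j] == alleles[k])
                    found = true;
            if (!found) {
                if (n_seen == 4)
                    return true; /* a fifth distinct allele */
                seen[n_seen++] = alleles[k];
            }
        }
    }
    return any_homozygous && n_seen > 3;
}
```

For example, sibs typed 1/2, 3/4 and 1/1 carry four distinct alleles, which is only possible when the parents are a/b and c/d, so the homozygous sib makes the sibship inconsistent.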
Level 2:
Uses the Lange-Goradia algorithm to do genotype elimination. It reports errors for a nuclear family only if there was no Level 1 error for that family.
Level 2 is guaranteed to detect an inconsistency if one exists. Thus, if NO Level 2 errors are detected, the pedigree is Mendelian consistent.
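The core step of genotype elimination, applied to a single nuclear family, can be sketched in C as below. This is an illustrative reading of the approach, not PedCheck's code: the GenoSet bitmask encoding, the limits of 7 alleles and 16 children, and every function name are assumptions made for the example. A parental genotype pair is kept only if every child has at least one listed genotype the pair could produce; genotypes not supported by any kept pair are eliminated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ALLELE 7   /* alleles are numbered 1..7 in this sketch */
#define MAX_KIDS   16

/* Bit (i*8 + j), with i <= j, marks unordered genotype i/j as possible. */
typedef uint64_t GenoSet;

static GenoSet geno_bit(int i, int j)
{
    assert(i >= 1 && i <= MAX_ALLELE && j >= 1 && j <= MAX_ALLELE);
    if (i > j) { int t = i; i = j; j = t; }
    return (GenoSet)1 << (i * (MAX_ALLELE + 1) + j);
}

/* Every genotype over alleles 1..n: the starting set for an untyped person. */
static GenoSet all_genotypes(int n)
{
    GenoSet s = 0;
    for (int i = 1; i <= n; i++)
        for (int j = i; j <= n; j++)
            s |= geno_bit(i, j);
    return s;
}

/* One elimination pass over a nuclear family. Returns false if some
   member is left with no possible genotype (an inconsistency). */
static bool eliminate_family(GenoSet *father, GenoSet *mother,
                             GenoSet *kids, size_t n_kids)
{
    GenoSet keep_f = 0, keep_m = 0;
    GenoSet keep_k[MAX_KIDS] = {0};

    assert(n_kids <= MAX_KIDS);
    for (int f1 = 1; f1 <= MAX_ALLELE; f1++)
    for (int f2 = f1; f2 <= MAX_ALLELE; f2++) {
        if (!(*father & geno_bit(f1, f2)))
            continue;
        for (int m1 = 1; m1 <= MAX_ALLELE; m1++)
        for (int m2 = m1; m2 <= MAX_ALLELE; m2++) {
            if (!(*mother & geno_bit(m1, m2)))
                continue;
            /* Genotypes this parental pair can transmit. */
            GenoSet off = geno_bit(f1, m1) | geno_bit(f1, m2) |
                          geno_bit(f2, m1) | geno_bit(f2, m2);
            bool ok = true;
            for (size_t k = 0; k < n_kids; k++)
                if (!(kids[k] & off)) { ok = false; break; }
            if (!ok)
                continue;
            keep_f |= geno_bit(f1, f2);
            keep_m |= geno_bit(m1, m2);
            for (size_t k = 0; k < n_kids; k++)
                keep_k[k] |= kids[k] & off;
        }
    }
    *father = keep_f;
    *mother = keep_m;
    for (size_t k = 0; k < n_kids; k++)
        kids[k] = keep_k[k];
    if (keep_f == 0 || keep_m == 0)
        return false;
    for (size_t k = 0; k < n_kids; k++)
        if (kids[k] == 0)
            return false;
    return true;
}
```

In a whole pedigree, a pass like this would be repeated over every nuclear family until no further genotypes are removed. With an untyped father, a 3/4 mother and a 1/3 child, a single pass reduces the father's list to the genotypes carrying allele 1.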
Level 3:
Determines the "critical genotypes". These are the typed individuals whose genotypes, when set to "unknown", remove the inconsistency in the pedigree.
Level 4:
Determines the alternative typings that a critical genotype can have, and then computes an odds ratio statistic to assist you in determining the most likely person to be in error.
NOTE:
Levels 3 and 4 will not run if there are Level 1 errors, since they were designed to help with "harder" errors. Level 1 errors therefore need to be corrected first.
Examples:
- pedcheck (* will run Level 1 *)
- pedcheck -2 (* will run Levels 1 and 2 *)
- pedcheck -3 (* will run Levels 2 and 3 if no Level 1 errors *)
Input Files
PedCheck looks for 'datafile.dat' and 'pedfile.dat' as default inputs. These are LINKAGE format files (which means that makeped has been run on the pedigree file). If either file is not found, PedCheck prompts you for a file name.
Command line options are also available for specifying these files. You may use either option, both, or neither, in any order.
- -p <pedigree file>
- -d <locus file>
Examples:
- pedcheck -p myfile.ped -d myfile.dat
- pedcheck -p myfile.ped (will look for "datafile.dat")
- pedcheck -d myfile.dat -p myfile.ped
PRE-LINKAGE FORMAT
PedCheck can handle pre-makeped files, so the pedigree and individual names can be any string, not just numeric.
(a) WITH DATAFILE:
If you have a pre-makeped file with a datafile, then it is ASSUMED that the marker alleles are NUMBERED ALLELES, i.e., integers ranging from 0 to n, where n is the number of alleles indicated in the datafile.
To use pre-makeped files, include command line option '-m'
Example:
- pedcheck -m -p my_pre_file.ped
Running PedCheck without a datafile
PedCheck can also run pedigrees (both pre-makeped and post-makeped) without a datafile. This is useful for checking raw data before having to specify allele frequencies. It also allows the use of base-pair sizes for the allele names, so you can check the data before recoding. This option also lets you check any subset of your markers.
The datafile is replaced by a 'names' file, which tells PedCheck which markers to check and which to skip. Give a name for each marker column you want to process and an X or x for each column you want PedCheck to ignore.
NOTE: If you use this option, you must skip the trait locus.
Examples:
(1) Your pedigree has 10 markers and you want to check markers 2 and 4 only.
<start file >
x CANDIDATE_GENE X marker4
<end file >
(2) Trait locus with 1 liability class, 2 markers.
<start file >
x D12 CANDIDATE_GENE
<end file >
(3) Trait locus with 3 liability classes, 2 markers.
<start file >
X x D12 WONDER_DRUG
<end file >
To use a names file, include the command line option ' -n <file> ', where <file> is your names file.
Example:
- pedcheck -n markers
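How a names file maps onto columns can be sketched in C as follows. This is a hypothetical reader, not PedCheck's own parser: the NamesFile struct, the size limits, and the assumption that entries are separated by whitespace are all choices made for the example. Only the rule that "x" or "X" means "skip this column" comes from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_COLS 64
#define MAX_NAME 32

/* Hypothetical in-memory form of a names file. */
typedef struct {
    int  n_cols;                   /* columns listed in the names file */
    bool check[MAX_COLS];          /* true if the column is to be checked */
    char name[MAX_COLS][MAX_NAME]; /* marker name; empty for skipped columns */
} NamesFile;

/* A lone x or X marks a column that should be ignored. */
static bool is_skip_token(const char *tok)
{
    return strcmp(tok, "x") == 0 || strcmp(tok, "X") == 0;
}

/* Parse whitespace-separated entries from text. Returns the number of
   columns to check, or -1 if the input exceeds the sketch's limits. */
static int parse_names(const char *text, NamesFile *out)
{
    char buf[1024];
    int n_checked = 0;

    assert(text != NULL && out != NULL);
    if (strlen(text) >= sizeof buf)
        return -1;
    strcpy(buf, text);             /* strtok modifies its input */
    memset(out, 0, sizeof *out);

    for (char *tok = strtok(buf, " \t\r\n"); tok != NULL;
         tok = strtok(NULL, " \t\r\n")) {
        if (out->n_cols == MAX_COLS || strlen(tok) >= MAX_NAME)
            return -1;
        if (!is_skip_token(tok)) {
            out->check[out->n_cols] = true;
            strcpy(out->name[out->n_cols], tok);
            n_checked++;
        }
        out->n_cols++;
    }
    return n_checked;
}
```

Applied to the contents of example (3), this reader reports four columns, of which the last two (D12 and WONDER_DRUG) are checked.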
Running PedCheck with X-linked data
If you are using a LINKAGE datafile, the file contains a flag that tells PedCheck the data are X-linked. If you are using a names file, you must include the command line option '-x'.
Example:
- pedcheck -n markers -x
Allele Frequencies in PedCheck
Levels 1, 2, 3 do not use allele frequencies. However, since Level 4 computes the likelihood of the pedigree, allele frequencies are required.
There are 3 options:
- (1) use user-specified allele frequencies in a LINKAGE datafile (default option when datafile is used)
- (2) use equifrequent allele frequencies (default option with the names file)
- (3) use estimated allele frequencies (requires flag '-e').
NOTE: Both options 2 and 3 are available even if you are using a datafile; the values in the datafile will be ignored.
Allele frequencies are estimated by dividing the number of occurrences of each allele in the data by the total number of alleles in the data. The estimate is therefore the frequency of the allele in the data. No attempt is made to count only independent occurrences of the allele in a pedigree.
NOTE: A file with the allele estimates can be obtained by using the '-a' option together with the '-e' option at ANY level of checking. The output file name is 'allelefile'.
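The counting estimate can be sketched in C as below. The function name and array layout are hypothetical, and the sketch assumes that untyped alleles (coded 0) and out-of-range values are left out of both the counts and the total; the text above does not say how PedCheck treats them.

```c
#include <assert.h>
#include <stddef.h>

/* Estimate the frequencies of alleles 1..n_alleles by simple counting.
   alleles:  flat array of observed allele numbers from all individuals
   freq:     output array of size n_alleles + 1; freq[0] is unused
   Returns the number of typed alleles that were counted. */
static size_t estimate_allele_freqs(const int *alleles, size_t n_obs,
                                    int n_alleles, double *freq)
{
    size_t total = 0;

    assert(n_alleles > 0 && freq != NULL);
    assert(alleles != NULL || n_obs == 0);

    for (int a = 0; a <= n_alleles; a++)
        freq[a] = 0.0;

    for (size_t i = 0; i < n_obs; i++) {
        int a = alleles[i];
        if (a < 1 || a > n_alleles)
            continue;              /* assumption: skip untyped or invalid */
        freq[a] += 1.0;
        total++;
    }

    /* Convert counts to proportions; leave zeros if nothing was typed. */
    if (total > 0)
        for (int a = 1; a <= n_alleles; a++)
            freq[a] /= (double)total;

    return total;
}
```

Because every typed allele is counted, related individuals contribute repeated copies of the same founder alleles, which is the lack of independence the paragraph above warns about.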
Examples:
- pedcheck -p myfile.ped -d myfile.dat -4 (* will use the allele frequencies in myfile.dat *)
- pedcheck -p myfile.ped -d myfile.dat -4 -e (* will use estimated allele frequencies *)
- pedcheck -p myfile.ped -n names.dat (* will use equal allele frequencies *)
- pedcheck -p myfile.ped -d myfile.dat -e -a (* will print the estimated allele frequencies in "allelefile" *)
Output Options
All debugging information is printed to the screen and to a file named 'pedcheck.err'. This file is overwritten each time PedCheck is run.
The output file name can be changed with the '-o' option.
Examples:
- pedcheck (* output file will be pedcheck.err *)
- pedcheck -o marker.err (* output file will be marker.err *)
Help
Type 'pedcheck -h' to get a listing of all the options available.
Author's note:
This program is likely to change over the next few months. Future developments include an option for checking the trait marker (Level 2 only), doing higher-order critical genotypes (explained in the paper), error analysis for Mendelian consistent pedigrees and statistics on errors by markers and also pedigrees (suggested by Meg Gelder-Ehm). These statistics could be helpful in identifying non-paternity errors. Also more robust checking of input formats could be added.
If you have any suggestions on how the program could be improved or on additional features, the author would welcome your input. If you want to be on a mailing list to be informed of future releases, send Jeff an e-mail.