Multipoint analyses using Markov chain Monte Carlo and simulated annealing

Version: 2.60

Last Updated: December 1999

Author: Eric Sobel

Copyright (c) 1995 - 1999 Eric Sobel

Collaborators: Kenneth Lange, Jeffrey R. O'Connell, and Daniel E. Weeks

Download from:

Citation reference:

Sobel E and Lange K (1996) "Descent graphs in pedigree analysis: applications to haplotyping, location scores, and marker sharing statistics." Am J Hum Genet 58:1323-1337.

If you publish results generated by SimWalk2, please cite this reference.


ANSI standard Fortran 77


SimWalk2 is a computer application for haplotype, location score, identity-by-descent, and non-parametric statistical analyses on pedigrees of any size. It performs these analyses using Markov chain Monte Carlo and simulated annealing algorithms.


Extended pedigrees have enormous power for detecting linkage to disease loci; however, such large pedigrees are also very difficult to analyze. The problem is the astronomical number of underlying configurations that must be considered to obtain exact results. SimWalk2 can analyze large pedigrees because it samples the underlying configurations in proportion to their likelihood. Thus a configuration that is theoretically possible but highly unlikely (typically because it would require many recombinations) will often not be visited by SimWalk2. Hence the results obtained from SimWalk2 are estimates, not exact values. However, for pedigrees small enough that exact results could be obtained, SimWalk2's estimates were found to be in excellent agreement with the exact results. Of course, if another program can compute exact results for the same analysis in a reasonable length of time, those exact results are preferable.


SimWalk2 takes as input the pedigree and locus data, a marker map, and, optionally, reduced penetrance values for a trait locus and other run-specifying parameters. SimWalk2 has been used successfully on data with over 1000 individuals per pedigree and over 30 markers per chromosome. The run time of SimWalk2 is linear in the number of individuals and in the number of markers.


SimWalk2 uses a Markov chain Monte Carlo (MCMC) algorithm to traverse the space of legal genetic descent graphs, or inheritance vectors, for each pedigree. The initial legal genetic descent state is found using an iterative genotype elimination technique and then converted to a descent graph. Simulated annealing is then performed to search for the single most likely descent graph. The MCMC random walk proceeds from there to sample the possible underlying configurations in proportion to their likelihood. A sample average is then used to give estimated results for the original pedigree. For a detailed description of the methods used, please see the first paper in the REFERENCES section below.
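The random walk and sample average described above can be illustrated with a toy model. The sketch below is not SimWalk2's code and rests on stated assumptions: the pedigree is reduced to a grid of inheritance bits v[locus][meiosis], theta[l] is a hypothetical recombination fraction between adjacent loci, and a hypothetical evidence table stands in for real marker genotypes. Proposals are single-bit flips accepted by the Metropolis criterion (a simpler move set than SimWalk2's own), and the chain starts from an all-zero vector rather than from an annealed descent graph. The sample average of each bit estimates its posterior probability, which on a grid this small can be checked against exhaustive enumeration.

```python
import math
import random
from itertools import product


def log_likelihood(v, theta, evidence):
    """Toy log-likelihood of an inheritance grid v[locus][meiosis].

    theta[l]: assumed recombination fraction between loci l and l + 1.
    evidence[l][j] = (P(data | bit 0), P(data | bit 1)): hypothetical data term.
    """
    n_loci, n_meioses = len(v), len(v[0])
    ll = 0.0
    for l in range(n_loci):
        for j in range(n_meioses):
            ll += math.log(evidence[l][j][v[l][j]])
            if l + 1 < n_loci:  # recombination between adjacent loci
                switched = v[l][j] != v[l + 1][j]
                ll += math.log(theta[l] if switched else 1.0 - theta[l])
    return ll


def metropolis(theta, evidence, n_steps, seed=0):
    """Metropolis walk with single-bit flips; returns sample-average bit values."""
    rng = random.Random(seed)
    n_loci, n_meioses = len(evidence), len(evidence[0])
    v = [[0] * n_meioses for _ in range(n_loci)]
    ll = log_likelihood(v, theta, evidence)
    counts = [[0] * n_meioses for _ in range(n_loci)]
    for _ in range(n_steps):
        l, j = rng.randrange(n_loci), rng.randrange(n_meioses)
        v[l][j] ^= 1  # propose flipping one inheritance bit
        new_ll = log_likelihood(v, theta, evidence)
        if new_ll >= ll or rng.random() < math.exp(new_ll - ll):
            ll = new_ll  # accept
        else:
            v[l][j] ^= 1  # reject: undo the flip
        for a in range(n_loci):
            for b in range(n_meioses):
                counts[a][b] += v[a][b]
    return [[c / n_steps for c in row] for row in counts]


def exact_marginals(theta, evidence):
    """Exact posterior P(bit = 1) by enumerating every grid (tiny cases only)."""
    n_loci, n_meioses = len(evidence), len(evidence[0])
    total = 0.0
    sums = [[0.0] * n_meioses for _ in range(n_loci)]
    for bits in product((0, 1), repeat=n_loci * n_meioses):
        v = [list(bits[l * n_meioses:(l + 1) * n_meioses]) for l in range(n_loci)]
        w = math.exp(log_likelihood(v, theta, evidence))
        total += w
        for l in range(n_loci):
            for j in range(n_meioses):
                sums[l][j] += w * v[l][j]
    return [[s / total for s in row] for row in sums]
```

For example, with three loci, two meioses, and theta = [0.3, 0.3], the output of metropolis(theta, evidence, 100000) should agree with exact_marginals(theta, evidence) to within Monte Carlo error.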

Using this MCMC 'engine', SimWalk2 can perform many types of analyses. The following is a brief overview of the analysis options. Within each SimWalk2 run, the type of analysis performed is set in batch item #1 (see the INPUT FILES section below).


Haplotype analysis estimates the most likely set of fully typed maternal and paternal marker haplotypes for each individual in the pedigree. This analysis uses simulated annealing to search the space of legal genetic descent graphs for the highest energy, where the energy of a descent graph is set equal to the likelihood of the most likely genetic descent state consistent with that graph. The result is an estimate of the genetic descent state with the largest likelihood, i.e., the best haplotype vector for the pedigree, which is the output. The conserved haplotype region shared by unrelated affected individuals, bounded by ancient recombination events, can localize the trait to a smaller interval than standard linkage analysis can.
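The annealing search can be sketched generically. This is an illustration under assumptions, not SimWalk2's implementation: the state is a flat bit vector, moves are single-bit flips, the temperature falls geometrically from t_start to t_end, and the best state seen is returned. The function toy_score is a hypothetical stand-in for the descent-graph energy: the log-likelihood of one meiosis' inheritance bits across eight loci, with an assumed recombination fraction and an invented evidence table.

```python
import math
import random


def anneal(score, n_bits, n_steps=20000, t_start=2.0, t_end=0.01, seed=0):
    """Simulated annealing over bit vectors, maximising score(bits).

    Uses single-bit-flip moves and a geometric cooling schedule;
    returns (best_bits, best_score) over all states visited.
    """
    rng = random.Random(seed)
    bits = [0] * n_bits
    cur = score(bits)
    best, best_score = bits[:], cur
    for step in range(n_steps):
        t = t_start * (t_end / t_start) ** (step / (n_steps - 1))
        i = rng.randrange(n_bits)
        bits[i] ^= 1  # propose a flip
        new = score(bits)
        if new >= cur or rng.random() < math.exp((new - cur) / t):
            cur = new  # accept (always uphill, sometimes downhill)
            if cur > best_score:
                best, best_score = bits[:], cur
        else:
            bits[i] ^= 1  # reject: undo the flip
    return best, best_score


def toy_score(bits, theta=0.2,
              evidence=((0.9, 0.1), (0.8, 0.2), (0.4, 0.6), (0.2, 0.8),
                        (0.3, 0.7), (0.6, 0.4), (0.9, 0.1), (0.1, 0.9))):
    """Hypothetical log-likelihood of one meiosis' inheritance bits across loci."""
    ll = sum(math.log(evidence[l][b]) for l, b in enumerate(bits))
    ll += sum(math.log(theta if a != b else 1.0 - theta)
              for a, b in zip(bits, bits[1:]))
    return ll
```

With only eight bits the annealed answer can be compared with a brute-force maximum over all 256 vectors; on real pedigrees no such check is possible, which is why the result is described above as an estimate.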


Location scores indicate the relative likelihood of several positions for the trait locus among the marker loci, given the pedigree data and the marker map. Reduced penetrance values may be specified for various liability classes. The location scores are directly comparable to multipoint LOD scores and are presented in log10 units. In summary, during location score analysis a random walk is performed on the space of descent graphs using the Metropolis acceptance criterion, starting from the estimate of the most likely descent graph. Sampling from this random walk yields a set of completely typed representative pedigrees, drawn in proportion to their likelihood. These pedigrees are then used to estimate the location score curve for the original pedigree. [If one is compiling SimWalk2 from the included source code files, then enabling the location score option requires the general pedigree analysis package MENDEL version 3.35 or later. For instructions on obtaining the latest version of the MENDEL package, please see the ADDITIONAL RESOURCES section below.]
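One way to see why sampled pedigrees suffice: the trait likelihood given the marker data at trait position x is P(T | M, x) = sum over descent graphs G of P(T | G, x) P(G | M), so averaging P(T | G, x) over graphs sampled from P(G | M) estimates it. The sketch below turns such per-sample trait likelihoods into a log10 score relative to an unlinked trait. It assumes the per-sample likelihoods are already available in natural-log form; how SimWalk2 (with MENDEL) actually computes and combines them may differ.

```python
import math


def location_score(log_trait_liks, log_unlinked_lik):
    """Estimate a location score (log10 units) at one trait position.

    log_trait_liks: natural-log trait likelihoods P(T | sampled graph, x),
        one per MCMC sample (assumed precomputed).
    log_unlinked_lik: natural-log trait likelihood with the trait unlinked.
    The likelihoods are averaged with the log-sum-exp trick to avoid underflow.
    """
    m = max(log_trait_liks)
    log_mean = (m + math.log(sum(math.exp(x - m) for x in log_trait_liks))
                - math.log(len(log_trait_liks)))
    return (log_mean - log_unlinked_lik) / math.log(10)
```

Working on the log scale matters in practice, because trait likelihoods for large pedigrees are far too small to average directly as floating-point numbers.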


Non-parametric analysis, also known as marker-allele sharing statistics or cluster analysis, is model-free and based on identity-by-descent (IBD), and thus is quite robust. If a marker is linked to a disease locus, one expects to see a clustering among affecteds of a few marker genes descended from the pedigree founders. SimWalk2 reports four statistics and their empirical p-values which measure the degree of clustering and its significance. For a detailed description of these four statistics, please see the first paper in the REFERENCES section below. Statistic (A) is apt to be the most powerful for a recessive trait; statistic (B) for a dominant trait. Statistics (C) and (D) are more generic, simply indicating the degree of clustering. SimWalk2 increases the power of such clustering statistics by using the information in the unaffecteds as well as the affecteds to sample all the IBD configurations proportional to their likelihood.
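The general shape of a clustering statistic with an empirical p-value can be sketched as follows. The measure here, the number of distinct founder-allele labels carried by the affecteds, is a hypothetical example chosen for illustration and is not claimed to be any of SimWalk2's statistics (A) through (D), which are defined in the paper cited above. The p-value function assumes that small values indicate clustering and that the null values come from some separately generated reference distribution; it applies the usual +1 correction so the estimate is never exactly zero.

```python
def distinct_founder_alleles(affected_labels):
    """Hypothetical clustering measure.

    affected_labels: list of (maternal_label, paternal_label) founder-gene
    labels, one pair per affected individual. Fewer distinct labels means
    the affecteds share more genes from a few founders.
    """
    return len({label for pair in affected_labels for label in pair})


def empirical_p_value(observed, null_values):
    """One-sided Monte Carlo p-value where small values indicate clustering."""
    hits = sum(1 for x in null_values if x <= observed)
    return (hits + 1) / (len(null_values) + 1)
```

For three affecteds labelled (1, 2), (1, 3), and (1, 2), the measure is 3; against null values [3, 4, 5, 6] the p-value is (1 + 1) / (4 + 1) = 0.4.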


IBD analysis estimates the probabilities that pairs of individuals share marker alleles identical by descent, i.e., inherited from a common ancestor within the pedigree. SimWalk2 can report either the standard 0, 1, and 2 allele sharing probabilities or the more specific condensed identity state probabilities, which are useful for consanguineous pedigrees. This multipoint analysis is reported for all pairs within a user-specified subset of the individuals.
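The 0, 1, and 2 sharing probabilities can be estimated from sampled descent graphs by labelling every founder gene and counting shared labels. This sketch assumes each sample is given as a mapping from individual to its (maternal, paternal) founder-gene labels at one locus, and that the two individuals are not inbred; inbred pairs need the condensed identity states mentioned above. The data layout is hypothetical, not SimWalk2's internal representation.

```python
from collections import Counter


def ibd_count(genes_i, genes_j):
    """Number of alleles (0, 1, or 2) shared IBD by two non-inbred individuals,
    each given as a (maternal, paternal) pair of founder-gene labels."""
    return sum((Counter(genes_i) & Counter(genes_j)).values())


def ibd_probabilities(samples, i, j):
    """Estimate [P(share 0), P(share 1), P(share 2)] for individuals i and j.

    samples: list of dicts mapping individual -> (maternal, paternal) labels,
    one dict per sampled descent graph.
    """
    counts = [0, 0, 0]
    for labels in samples:
        counts[ibd_count(labels[i], labels[j])] += 1
    return [c / len(samples) for c in counts]
```

For instance, four samples in which two siblings share 2, 1, 0, and 1 founder genes give the estimate [0.25, 0.5, 0.25].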


For each input pedigree, the sampling option produces a user-specified number of simulated pedigrees, each fully typed at the marker loci. These simulated pedigrees are sampled in proportion to their likelihood conditional on all input marker data, and can be written out in either MENDEL or (pre-makeped) LINKAGE pedigree format.
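To show what a fully typed record looks like, the sketch below formats one individual as a whitespace-separated pre-makeped LINKAGE line. It assumes the common column order (pedigree, individual, father, mother, sex coded 1 = male and 2 = female, affection status coded 0 = unknown, 1 = unaffected, 2 = affected, then one allele pair per marker) with founders given parent IDs of 0. The function is hypothetical; the exact layout SimWalk2 writes should be checked against its own output files.

```python
def premakeped_line(ped, ind, father, mother, sex, affection, genotypes):
    """Format one pre-makeped LINKAGE record (assumed column order).

    genotypes: list of (allele1, allele2) pairs, one per marker locus,
    as in a simulated pedigree that is fully typed at every marker.
    """
    fields = [ped, ind, father, mother, sex, affection]
    for allele1, allele2 in genotypes:
        fields += [allele1, allele2]
    return " ".join(str(f) for f in fields)
```

A founder would be written with 0 in both parent columns, e.g. premakeped_line(1, 1, 0, 0, 1, 1, [(1, 2)]) gives "1 1 0 0 1 1 1 2".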


The setup option performs no likelihood-based analysis on the data. This option merely checks that the data files are consistent and that the pedigrees have no incompatibilities. This option also reports the minimum data constraints that are required for the data (see the COMPILING INSTRUCTIONS section below).
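The simplest kind of incompatibility such a check looks for is a parent-offspring trio whose genotypes at one marker violate Mendelian inheritance. The sketch below tests a single trio with unordered genotypes; it is only an illustration, since SimWalk2 itself uses iterative genotype elimination over the whole pedigree, which can also expose inconsistencies that no single trio reveals.

```python
def trio_consistent(child, father, mother):
    """True if the unordered child genotype (a, b) can be formed by taking
    one allele from the father's genotype and one from the mother's."""
    a, b = child
    return (a in father and b in mother) or (b in father and a in mother)
```

For example, a child typed 1/1 cannot descend from parents typed 2/3 and 1/4, because neither allele 1 can come from the father.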

Executable versions of this program are available for common platforms at the distribution site. The executables posted there are capable of all the analysis options. The ANSI standard Fortran 77 source code is also available there. When compiling the SimWalk2 source files, enabling the location score option requires the general pedigree analysis package MENDEL version 3.35 or later. To obtain the MENDEL package, please see the ADDITIONAL RESOURCES section below. If one does NOT have the MENDEL package, then to create the SimWalk2 executable simply compile the two files simwalk2.f and nomendel.f together. If one does have the MENDEL package, then compile the two files simwalk2.f and mendel.f together instead. At least the standard optimization level, usually invoked with the flag -O, is highly recommended. (As far as we know, the additional optimization invoked by the common flag -fast has not caused any problems.)

For Unix, a simple shell script called make.simwalk2 is provided to compile SimWalk2 automatically, using mendel.f if it is available and nomendel.f otherwise. Invoke this script in the directory containing the source code files simply by typing its filename.

When creating a Mac OS executable, one may, for ease of use, locate the string 'MAC!' in the file simwalk2.f and uncomment the indicated lines.

Detailed documentation