The Mega2R R package: tools for accessing and processing common genetic data formats in R

Mega2 has been enhanced to use a SQLite database as an intermediate data representation. Additionally, Mega2 now stores biallelic genotype data in a highly compressed form, much like that of the GenABEL R facility and the PLINK binary format. Concurrently, the R and Bioconductor communities have developed a variety of genetic analysis programs complementary to the programs available through Mega2. We have now made it easy to load SQLite3 Mega2 databases directly into R as data frames, so that these R facilities can be used. In addition, we have developed C++ functions for R that decompress needed subsets of the genotype data, on the fly, in a memory-efficient manner. We have also created several more R functions that both illustrate how to use the data frames and perform useful tasks: they permit one to run the 'pedgene' R package to carry out gene-based association tests on family data using selected marker subsets, to output the 'mega2r' data as a VCF file and related files (for phenotype and family data), and to convert the data frames into 'GenABEL' R objects. The Mega2R package enhances GenABEL because it supports additional input data formats (such as PLINK, VCF and IMPUTE2) not currently supported by GenABEL.
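To make the idea of compressed biallelic storage concrete, here is a minimal sketch of 2-bit genotype packing with on-demand decoding of a marker subset. It is written in Python for brevity and is not Mega2's or Mega2R's actual code or storage layout. The function names (pack_genotypes, unpack_subset) and the genotype codes are assumptions chosen for illustration only.

```python
# Illustrative 2-bit genotype packing; NOT Mega2's actual storage layout.
# Assumed codes (hypothetical): 0 = homozygous for allele 1, 1 = heterozygous,
# 2 = homozygous for allele 2, 3 = missing.


def pack_genotypes(codes):
    """Pack a sequence of 2-bit codes (0..3) into bytes, four codes per byte."""
    packed = bytearray((len(codes) + 3) // 4)
    for i, code in enumerate(codes):
        if not 0 <= code <= 3:
            raise ValueError(f"genotype code out of range: {code}")
        # Position i lives in byte i // 4, at bit offset 2 * (i % 4).
        packed[i // 4] |= code << (2 * (i % 4))
    return bytes(packed)


def unpack_subset(packed, indices):
    """Decode only the requested positions, leaving the rest compressed."""
    return [(packed[i // 4] >> (2 * (i % 4))) & 0b11 for i in indices]


codes = [0, 1, 2, 3, 1, 1, 0, 2, 2]
packed = pack_genotypes(codes)
print(len(codes), "genotypes stored in", len(packed), "bytes")
print(unpack_subset(packed, [1, 3, 8]))
```

Because each biallelic genotype has only four possible states under this coding, four genotypes fit in one byte, and any subset of markers can be decoded by index without expanding the whole array.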

The R package

The Mega2R package is available from The Comprehensive R Archive Network (CRAN):

Mega2R on CRAN.

To install the package from within R, issue the R command:

    install.packages("Mega2R")

The Mega2R Tutorial

To learn to use the Mega2R package, please use this tutorial:

Mega2R Tutorial text (html): mega2rtutorial.html

The Mega2R poster

This gives an overview of Mega2R:

The Mega2R poster (PDF), which was presented at the 2017 American Society of Human Genetics meeting.


Mega2R uses SQLite databases produced by Mega2; documentation for Mega2 can be found here.