This folder (XtraSNPlocs.Hsapiens.dbSNP144.GRCh38/inst/tools) contains the
tools used for making the .rda files contained in this package from the
dbSNP dump files.

dbSNP Home Page:

  http://www.ncbi.nlm.nih.gov/snp/

Here is how these .rda files were made:

  1. Download the ds_flat_ch*.flat.gz files for chromosomes 1-22, X, Y,
     and MT from:

       ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/ASN1_flat

     The download_ds_flat.sh script located in this folder can be used
     to do this.

  2. Uncompress the downloaded files.
     These uncompressed files are the "source files".
     NB: The ASN.1 flatfile format (and many other formats used on
     the snp section of the FTP site) is described here:

       ftp://ftp.ncbi.nih.gov/snp/00readme.txt

  3. Check the source files with, for example,

       ./prechecking.sh path/to/ds_flat_ch16.flat

     and pay attention to the output.
     Note that the final number of "extra" SNPs per chromosome will be lower
     than the number of records NOT tagged with "snp" because of additional
     filtering during step 5.

  4. Compile filter2_ds_flat.c with:

       gcc -Wall filter2_ds_flat.c -o filter2_ds_flat

  5. Adjust settings in make_rdas.sh and run it. This script will extract and
     curate the "extra" SNPs from the flat files (see man/package.Rd for how
     the SNPs are filtered), and dump them into .rda files (those files will
     be created in the current folder).
     This step took about 40 minutes on rhino04 (64-bit Ubuntu 14.04 with 16
     cores and 384GB of RAM) and resulted in the extraction of 12,298,599
     "extra" SNPs in total.

  6. Make sure the .rda files generated in step 5 are in the inst/extdata/
     folder of the XtraSNPlocs.Hsapiens.dbSNP144.GRCh38 package (move them
     there if necessary). Install the package (this will install the .rda
     files) and test it.
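
As a rough illustration of steps 1 and 2 (a sketch only, not the contents of
download_ds_flat.sh), the following shell snippet lists the URLs of the files
to download. It assumes the files follow the ds_flat_ch<chr>.flat.gz naming
seen above. The function names chromosomes and ds_flat_urls are made up for
this example, and the wget/gunzip commands in the trailing comment are only
one possible way to fetch and uncompress the files.

```shell
#!/bin/sh
# Sketch only: prints the URLs for step 1. This is NOT the packaged
# download_ds_flat.sh script; function names here are hypothetical.
BASE_URL="ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/ASN1_flat"

# Print one chromosome name per line: 1-22, X, Y, MT.
chromosomes() {
    i=1
    while [ "$i" -le 22 ]; do
        echo "$i"
        i=$((i + 1))
    done
    echo X
    echo Y
    echo MT
}

# Print the URL of each ds_flat_ch*.flat.gz file (assumed naming scheme).
ds_flat_urls() {
    chromosomes | while read -r chr; do
        echo "$BASE_URL/ds_flat_ch$chr.flat.gz"
    done
}

ds_flat_urls

# To actually fetch and uncompress the files (steps 1 and 2), something
# like the following could be used:
#   ds_flat_urls | while read -r url; do wget "$url"; done
#   gunzip ds_flat_ch*.flat.gz
```

Printing the URLs first makes it easy to check the list before starting a
large download.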

