Downloads the subset of the 1000 Genomes Project (1KG) VCF that matches your locus coordinates, then uses ld to calculate LD on the fly.

LD_1KG(
  locus_dir,
  dat,
  LD_reference = "1KGphase1",
  superpopulation = NULL,
  samples = NULL,
  local_storage = NULL,
  leadSNP_LD_block = FALSE,
  force_new_vcf = FALSE,
  force_new_MAF = FALSE,
  fillNA = 0,
  stats = "R",
  verbose = TRUE
)

Arguments

locus_dir

Storage directory to use.

dat

GWAS summary statistics subset to query the LD panel with.

LD_reference

LD reference to use:

  • "1KGphase1" : 1000 Genomes Project Phase 1

  • "1KGphase3" : 1000 Genomes Project Phase 3

  • "UKB" : Pre-computed LD from a British European-descent subset of UK Biobank.

superpopulation

Superpopulation to subset the LD panel by (used only if LD_reference is "1KGphase1" or "1KGphase3").

samples

Sample names to subset the VCF by before computing LD.

local_storage

Storage folder for previously downloaded LD files. If LD_reference is "1KGphase1" or "1KGphase3", local_storage is where VCF files are stored. If LD_reference is "UKB", local_storage is where compressed NumPy array (npz) LD files are stored. Set to NULL to download the VCF or LD npz files from the remote storage system.

leadSNP_LD_block

Only return SNPs within the same LD block as the lead SNP (the SNP with the smallest p-value).
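
As a minimal illustration of how the lead SNP is defined, the base R sketch below picks the row with the smallest p-value. The column names SNP and P are assumptions made for this example; the columns that LD_1KG actually expects in dat are not stated on this page.

```r
# Toy summary statistics; the column names SNP and P are assumed for illustration.
dat <- data.frame(
  SNP = c("rs1", "rs2", "rs3"),
  P   = c(1e-4, 5e-9, 0.03)
)

# The lead SNP is the one with the smallest p-value.
lead_snp <- dat$SNP[which.min(dat$P)]
```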

fillNA

Value used to replace the pairwise LD (r) between two SNPs when it is NA (default: 0).
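
A minimal sketch of the replacement this argument describes, written in base R on a toy matrix. The matrix values and SNP names are made up, and this is not necessarily how LD_1KG performs the step internally.

```r
# Toy pairwise r matrix with one undefined SNP pair (rs1, rs3).
snps <- c("rs1", "rs2", "rs3")
ld_mat <- matrix(
  c(1.0, 0.8,  NA,
    0.8, 1.0, 0.2,
     NA, 0.2, 1.0),
  nrow = 3, byrow = TRUE,
  dimnames = list(snps, snps)
)

# Replace every NA entry with the fill value (0 by default).
fillNA <- 0
ld_filled <- ld_mat
ld_filled[is.na(ld_filled)] <- fillNA
```

Entries that were already defined are left unchanged; only the NA cells take the fill value.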

verbose

Print messages.

Details

This approach is taken because other API query tools limit the size of the window that can be queried. It has no such limitation, allowing you to fine-map loci more completely.
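
For intuition about what computing LD locally involves, here is a rough base R sketch that derives pairwise r from a small genotype dosage matrix. It uses stats::cor as a stand-in and is not the routine LD_1KG calls; the geno matrix, sample names, and SNP names are hypothetical.

```r
# Hypothetical genotype dosages (0, 1 or 2 copies of the alternate allele).
# Rows are samples, columns are SNPs inside the locus window.
geno <- matrix(
  c(0, 0, 2,
    1, 1, 1,
    2, 2, 0,
    1, 2, 1,
    0, 0, 2,
    2, 1, 0),
  ncol = 3, byrow = TRUE,
  dimnames = list(paste0("sample", 1:6), c("rs1", "rs2", "rs3"))
)

# Pairwise Pearson correlation of dosages, used here as an approximation of r.
# The window is limited only by the genotypes held locally.
ld_r <- cor(geno, use = "pairwise.complete.obs")
```

Because the genotypes are held locally, the window can be widened simply by including more SNP columns, which is the advantage described above.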

See also