Uses coordinates from stored summary statistics files (e.g. GWAS) to determine which regions to query from the eQTL Catalogue.

eQTL_Catalogue.iterate_fetch(
  sumstats_paths,
  output_dir = "./catalogueR_queries",
  qtl_id,
  quant_method = "ge",
  infer_region = TRUE,
  use_tabix = TRUE,
  multithread_loci = TRUE,
  multithread_tabix = FALSE,
  nThread = 4,
  split_files = TRUE,
  merge_with_gwas = FALSE,
  force_new_subset = FALSE,
  progress_bar = FALSE,
  genome_build = "hg19",
  verbose = TRUE
)

Arguments

sumstats_paths

A list of paths to any number of summary stats files whose coordinates you want to use to query eQTL Catalogue. If you wish to give the loci custom names, simply supply these as the names of the path list (e.g. c(BST1="<path>/<to>/<BST1_file>", LRRK2="<path>/<to>/<LRRK2_file>")). Otherwise, loci will automatically be named based on their min/max genomic coordinates.

At minimum, these files must contain the following columns to make queries:

SNP

RSID of each SNP.

CHR

Chromosome (can be in "chr12" or "12" format).

POS

Genomic position of each SNP.

...

Optional extra columns.
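A minimal input file with the required columns can be sketched as follows (the file name, RSIDs, and coordinates are purely illustrative, not taken from any real dataset):

```r
library(data.table)

# Illustrative GWAS summary stats with the minimum required columns.
sumstats <- data.table(
  SNP = c("rs0000001", "rs0000002"),  # RSID of each SNP
  CHR = c("4", "4"),                  # "chr4" format is also accepted
  POS = c(15000000, 15000500)         # genomic position of each SNP
)
fwrite(sumstats, "BST1_sumstats.tsv.gz", sep = "\t")

# Naming the path gives the locus a custom name (here: BST1).
sumstats_paths <- c(BST1 = "BST1_sumstats.tsv.gz")
```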

output_dir

The folder you want the merged GWAS/QTL results saved to (set output_dir=FALSE if you don't want to save the results). If split_files=FALSE, all query results will be merged into one file and saved as <output_dir>/eQTL_Catalogue.tsv.gz. If split_files=TRUE, query results will instead be split into smaller files and stored in <output_dir>/.

quant_method

eQTL Catalogue contains more than just gene expression QTL data. For each dataset, the following kinds of QTLs can be queried:

gene expression QTL

quant_method="ge" (default) or quant_method="microarray", depending on the dataset. catalogueR will automatically select whichever option is available.

exon expression QTL

*under construction* quant_method="ex"

transcript usage QTL

*under construction* quant_method="tx"

promoter, splice junction and 3' end usage QTL

*under construction* quant_method="txrev"

use_tabix

Whether to query via tabix, which is ~17x faster (use_tabix=TRUE, the default) than the REST API (use_tabix=FALSE).

nThread

The number of CPU cores you want to use to speed up your queries through parallelization.

split_files

Save the results as one file per QTL dataset (with all loci within each file). If split_files=TRUE, this function returns the list of paths where these files were saved. A helper function is provided to import and merge them back together in R. If split_files=FALSE, this function instead returns one large merged data.table containing results from all QTL datasets and all loci. FALSE is not recommended when you have many large loci and/or many QTL datasets, as you can only fit so much data into memory.
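The merge helper is not named on this page, but the split files can also be re-imported manually with standard data.table tooling. A sketch, assuming the default output_dir and that the per-dataset files are saved as .tsv.gz:

```r
library(data.table)

# Re-import the per-dataset query files saved by split_files=TRUE
# and stack them into one data.table.
query_files <- list.files("./catalogueR_queries",
                          pattern = "\\.tsv\\.gz$",
                          full.names = TRUE)
GWAS.QTL <- rbindlist(lapply(query_files, fread),
                      use.names = TRUE, fill = TRUE)
```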

merge_with_gwas

Whether you want to merge your QTL query results with your GWAS data (convenient, but takes up more storage).

force_new_subset

By default, catalogueR reuses any pre-existing files that match your query. Set force_new_subset=TRUE to override this and force a new query.

progress_bar

Show a progress bar during parallelization across loci. WARNING: the progress bar (via pbmclapply) only works on Linux/Unix systems (e.g. macOS) and NOT on Windows.

genome_build

The genome build of your query coordinates (e.g. gwas_data). If your coordinates are in hg19, catalogueR will automatically lift them over to hg38 (as this is the build that eQTL Catalogue uses).

verbose

Show more (verbose=TRUE) or fewer (verbose=FALSE) messages.

Examples

sumstats_paths <- example_sumstats_paths()
qtl_id <- eQTL_Catalogue.list_datasets()$unique_id[1]
GWAS.QTL <- eQTL_Catalogue.iterate_fetch(sumstats_paths = sumstats_paths,
                                         qtl_id = qtl_id,
                                         force_new_subset = TRUE,
                                         multithread_loci = TRUE,
                                         nThread = 1,
                                         split_files = FALSE,
                                         progress_bar = FALSE)
#> [1] "++ Extracting locus name from `sumstats_paths` names."
#> _+_+_+_+_+_+_+_+_--- Locus: BST1 ---_+_+_+_+_+_+_+_+_
#> Start at 2021-02-20 00:18:25
#>
#> End at 2021-02-20 00:18:29
#> Runtime in total is: 4 secs
#> [1] "++ Inferring coordinates from gwas_data"
#> Warning: error in running command
#> Warning: File '/var/folders/zq/h7mtybc533b1qzkys_ttgpth0000gn/T//RtmpIjYyRa/file1802243838924' has size 0. Returning a NULL data.table.
#> [1] "Data dimensions: 0 x 0"
#> [1] "Data dimensions: 0 x 0"
#> [1] "++ Adding `Locus.GWAS` column."
#> [1] "Data dimensions: 1 x 1"
#> [1] "++ Extracting locus name from `sumstats_paths` names."
#> _+_+_+_+_+_+_+_+_--- Locus: LRRK2 ---_+_+_+_+_+_+_+_+_
#> Start at 2021-02-20 00:18:30
#>
#> End at 2021-02-20 00:18:30
#> Runtime in total is: 0 secs
#> [1] "++ Inferring coordinates from gwas_data"
#> Warning: error in running command
#> Warning: File '/var/folders/zq/h7mtybc533b1qzkys_ttgpth0000gn/T//RtmpIjYyRa/file18022772c2faf' has size 0. Returning a NULL data.table.
#> [1] "Data dimensions: 0 x 0"
#> [1] "Data dimensions: 0 x 0"
#> [1] "++ Adding `Locus.GWAS` column."
#> [1] "Data dimensions: 1 x 1"
#> [1] "++ Extracting locus name from `sumstats_paths` names."
#> _+_+_+_+_+_+_+_+_--- Locus: MEX3C ---_+_+_+_+_+_+_+_+_
#> Start at 2021-02-20 00:18:31
#>
#> End at 2021-02-20 00:18:32
#> Runtime in total is: 1 secs
#> [1] "++ Inferring coordinates from gwas_data"
#> Warning: error in running command
#> Warning: File '/var/folders/zq/h7mtybc533b1qzkys_ttgpth0000gn/T//RtmpIjYyRa/file18022175252ea' has size 0. Returning a NULL data.table.
#> [1] "Data dimensions: 0 x 0"
#> [1] "Data dimensions: 0 x 0"
#> [1] "++ Adding `Locus.GWAS` column."
#> [1] "Data dimensions: 1 x 1"
#> [1] "Data dimensions:  x "
#> [1] "+ Returning merged data.table of query results."
#> [1] " x "