vignettes/intro.Rmd
```r
library(catalogueR)
```
eQTL Catalogue includes a large number of standardized QTL datasets (110 datasets from 20 studies as of 7/4/2020). Despite its name, it contains more than just eQTL data: for each dataset, the following kinds of QTLs can be queried:

- Gene expression QTLs: `quant_method="ge"` (default) or `quant_method="microarray"`, depending on the dataset. catalogueR automatically selects whichever option is available.
- Transcript usage QTLs: `quant_method="tx"`
- txrevise event QTLs (promoter, splice junction and 3' end usage): `quant_method="txrev"`
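The automatic choice between `ge` and `microarray` can be pictured as a simple preference order. The sketch below assumes the quantification methods available for a dataset are already known as a character vector; `pick_gene_quant_method_sketch()` is a hypothetical helper for illustration, not a catalogueR function.

```r
# Hypothetical helper, NOT part of catalogueR: prefer "ge" and fall back to
# "microarray" for gene-level QTLs, mirroring the behaviour described above.
pick_gene_quant_method_sketch <- function(available) {
  for (method in c("ge", "microarray")) {
    if (method %in% available) return(method)
  }
  NA_character_  # neither gene-level quantification is available
}

pick_gene_quant_method_sketch(c("microarray"))          # "microarray"
pick_gene_quant_method_sketch(c("ge", "tx", "txrev"))   # "ge"
```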
You can view a metadata table of all current datasets:
```r
data("meta")
createDT(meta)
```
You can search the metadata for datasets matching certain keywords. Matching is case-insensitive and looks for substrings across multiple metadata columns.
```r
qtl_datasets <- eQTL_Catalogue.search_metadata(qtl_search = c("Alasoo_2018", "monocyte"))
print(qtl_datasets)
#> [1] "Alasoo_2018.macrophage_naive"
#> [2] "Alasoo_2018.macrophage_IFNg"
#> [3] "Alasoo_2018.macrophage_Salmonella"
#> [4] "Alasoo_2018.macrophage_IFNg+Salmonella"
#> [5] "BLUEPRINT.monocyte"
#> [6] "CEDAR.monocyte_CD14"
#> [7] "Fairfax_2014.monocyte_naive"
#> [8] "Fairfax_2014.monocyte_IFN24"
#> [9] "Fairfax_2014.monocyte_LPS2"
#> [10] "Fairfax_2014.monocyte_LPS24"
#> [11] "Quach_2016.monocyte_naive"
#> [12] "Quach_2016.monocyte_LPS"
#> [13] "Quach_2016.monocyte_Pam3CSK4"
#> [14] "Quach_2016.monocyte_R848"
#> [15] "Quach_2016.monocyte_IAV"
#> [16] "Schmiedel_2018.monocyte_CD16_naive"
#> [17] "Schmiedel_2018.monocyte_naive"
```
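The matching rule described above (case-insensitive substrings, several columns) can be illustrated in base R. This is a minimal sketch, not catalogueR's implementation: `search_meta_sketch()`, the `tissue_label` column and the `Toy_2020` rows are hypothetical. A dataset is kept if any search term matches any searched column, which is consistent with the output above returning both the Alasoo_2018 and the monocyte datasets.

```r
# Hypothetical sketch, NOT catalogueR's own search function.
search_meta_sketch <- function(meta, terms, cols) {
  # Lower-case every searched column once so matching ignores case.
  col_text <- lapply(meta[cols], function(col) tolower(as.character(col)))
  hit <- rep(FALSE, nrow(meta))
  for (term in tolower(terms)) {
    for (txt in col_text) {
      # fixed = TRUE treats the term as a literal substring, so characters
      # such as "+" in "IFNg+Salmonella" are not interpreted as regex.
      hit <- hit | grepl(term, txt, fixed = TRUE)
    }
  }
  meta$unique_id[hit]
}

# Toy metadata table; the Toy_2020 rows are made up for illustration.
toy_meta <- data.frame(
  unique_id    = c("Alasoo_2018.macrophage_naive", "BLUEPRINT.monocyte",
                   "Toy_2020.neutrophil", "Toy_2020.liver"),
  tissue_label = c("macrophage", "monocyte", "neutrophil", "liver"),
  stringsAsFactors = FALSE
)

# Returns "Alasoo_2018.macrophage_naive" and "BLUEPRINT.monocyte"
search_meta_sketch(toy_meta, c("alasoo_2018", "Monocyte"),
                   cols = c("unique_id", "tissue_label"))
```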
Supply one or more paths to GWAS summary statistics files (one file per locus), and catalogueR will automatically download any eQTL Catalogue data within the genomic range each file covers. The files can be in any of these formats, either gzip-compressed (`.gz`) or uncompressed:

- `.csv`
- `.tsv`
- space-separated
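catalogueR reads these files itself; the base-R sketch below only illustrates one way the listed formats could be told apart. The function `read_sumstats_sketch()` is hypothetical, and choosing the separator from the file extension is an assumption, not necessarily what catalogueR does. The SNP row written in the usage example is made up.

```r
# Hypothetical reader, NOT catalogueR's own: picks the field separator from the
# file extension and lets base R handle gzip decompression.
read_sumstats_sketch <- function(path) {
  inner <- sub("\\.gz$", "", path)                  # "locus.tsv.gz" -> "locus.tsv"
  sep <- if (grepl("\\.csv$", inner)) "," else ""   # "" = any run of tabs/spaces
  # read.table() opens the path with file(), which transparently
  # decompresses gzip input when reading.
  read.table(path, header = TRUE, sep = sep, stringsAsFactors = FALSE)
}

# Usage: write a tiny gzip-compressed CSV (toy values) and read it back.
tmp <- tempfile(fileext = ".csv.gz")
con <- gzfile(tmp, "wt")
writeLines(c("SNP,CHR,POS", "rs1,chr8,21527069"), con)
close(con)
read_sumstats_sketch(tmp)
```

Using `sep = ""` for everything except `.csv` lets the same call handle both tab- and space-separated files.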
The summary stats files must have the following column names (order doesn't matter):

- `SNP` (rsID of each SNP)
- `CHR` (chromosome; with or without the “chr” prefix)
- `POS` (base-pair position)
- … (optional extra columns)
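A quick pre-flight check of these requirements can save a failed query. The sketch below is hypothetical (`check_sumstats_sketch()` is not a catalogueR function): it verifies that `SNP`, `CHR` and `POS` are present in any order, ignores extra columns, and normalizes `CHR` by dropping an optional "chr" prefix. The toy rows are made up.

```r
# Hypothetical validator, NOT part of catalogueR.
check_sumstats_sketch <- function(df) {
  required <- c("SNP", "CHR", "POS")
  missing_cols <- setdiff(required, colnames(df))
  if (length(missing_cols) > 0) {
    stop("Missing required column(s): ", paste(missing_cols, collapse = ", "))
  }
  # Accept "8", "chr8" or "CHR8" alike by dropping an optional "chr" prefix.
  df$CHR <- sub("^chr", "", as.character(df$CHR), ignore.case = TRUE)
  df
}

# Columns deliberately out of order, plus an optional extra column.
toy <- data.frame(POS = c(100L, 250L), SNP = c("rs1", "rs2"),
                  CHR = c("chr8", "8"), extra = c(0.1, 0.2))
check_sumstats_sketch(toy)$CHR  # "8" "8"
```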
```r
sumstats_paths <- example_sumstats_paths()
gwas.qtl_paths <- eQTL_Catalogue.query(sumstats_paths = sumstats_paths,
                                       qtl_search = c("myeloid", "Alasoo_2018"),
                                       output_dir = "./catalogueR_queries",
                                       split_files = TRUE,
                                       merge_with_gwas = TRUE,
                                       force_new_subset = TRUE,
                                       nThread = 4)
#> [1] "+ Optimizing multi-threading..."
#> [1] "++ Multi-threading across QTL datasets."
#> [1] "eQTL_Catalogue:: Querying 4 QTL datasets x 3 GWAS loci (12 total)"
#> [1] "++ Returning list of split files paths."
#> Time difference of 1.1 mins
GWAS.QTL <- gather_files(file_paths = gwas.qtl_paths)
#> [1] "+ Merging 3 files."
#> [1] "+ Using 4 cores."
#> [1] "+ Merged data.table: 68334 rows x 51 columns."
# Interactive datatable of results
## WARNING: Don't use this function on large datatables; it might cause freezing.
createDT(head(GWAS.QTL))
```
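The log line `Querying 4 QTL datasets x 3 GWAS loci (12 total)` reflects that each QTL dataset is queried once per GWAS locus. The base-R sketch below reproduces that arithmetic with `expand.grid()`; the dataset IDs and locus file names are placeholders, not the ones used in the run above.

```r
# Placeholder IDs and file names, used only to show the combinatorics.
qtl_ids     <- c("dataset_A", "dataset_B", "dataset_C", "dataset_D")
locus_files <- c("locus1.tsv.gz", "locus2.tsv.gz", "locus3.tsv.gz")

# One row per (QTL dataset, GWAS locus) pair, i.e. one query each.
query_plan <- expand.grid(unique_id = qtl_ids, locus_file = locus_files,
                          stringsAsFactors = FALSE)
nrow(query_plan)  # 12, matching "4 QTL datasets x 3 GWAS loci (12 total)"
```

In the run above, catalogueR reported multi-threading across QTL datasets, i.e. parallelising over one dimension of this grid.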
You can also query eQTL Catalogue directly by specifying the coordinates of the region you want to extract, along with the `unique_id` of the QTL dataset (see `data("meta")` for available IDs).
```r
GWAS.QTL_manual <- eQTL_Catalogue.fetch(unique_id = "Alasoo_2018.macrophage_IFNg",
                                        nThread = 4,
                                        chrom = 8,
                                        bp_lower = 21527069,
                                        bp_upper = 23525543)
#> [1] "eQTL_Catalogue:: 177356 SNPs returned in 18.6 seconds."
#> [1] "++ Converting: Ensembl IDs ==> HGNC gene symbols"
#> Warning: Unable to map 1 of 37 requested IDs.
createDT(head(GWAS.QTL_manual))
```
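The `chrom`, `bp_lower` and `bp_upper` arguments can be derived from a locus's own summary statistics. The sketch below assumes the range is simply the minimum and maximum `POS` in the file, optionally widened by a flanking window; `locus_range_sketch()` and its `flank` argument are hypothetical, and this is not necessarily how `eQTL_Catalogue.query()` defines a locus range. In the toy data only the two endpoint positions come from the manual example above; the middle row is made up.

```r
# Hypothetical helper, NOT part of catalogueR: derive fetch coordinates
# from one locus's summary statistics.
locus_range_sketch <- function(df, flank = 0) {
  chrom <- unique(sub("^chr", "", as.character(df$CHR), ignore.case = TRUE))
  if (length(chrom) != 1) stop("Expected exactly one chromosome per locus file.")
  list(chrom    = chrom,
       bp_lower = max(1, min(df$POS) - flank),  # clamp at the chromosome start
       bp_upper = max(df$POS) + flank)
}

toy_locus <- data.frame(SNP = c("rs1", "rs2", "rs3"), CHR = "chr8",
                        POS = c(21527069, 22400000, 23525543))
r <- locus_range_sketch(toy_locus)
# r$chrom is "8", r$bp_lower is 21527069, r$bp_upper is 23525543.
# These could then be passed on (the example above gives chrom as a number,
# hence as.numeric(), which only works for autosomes):
# eQTL_Catalogue.fetch(unique_id = "Alasoo_2018.macrophage_IFNg", nThread = 4,
#                      chrom = as.numeric(r$chrom),
#                      bp_lower = r$bp_lower, bp_upper = r$bp_upper)
```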