library(catalogueR)

Tutorial

eQTL Catalogue includes a large number of standardized QTL datasets (110 datasets from 20 studies as of 7/4/2020). Despite its name, it contains more than just eQTL data. For each dataset, the following kinds of QTLs can be queried:

  • gene expression QTL: quant_method="ge" (default) or quant_method="microarray", depending on the dataset. catalogueR will automatically select whichever option is available.
  • exon expression QTL (under construction): quant_method="ge"
  • transcript usage QTL (under construction): quant_method="tx"
  • promoter, splice junction and 3ʹ end usage QTL (under construction): quant_method="txrev"
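The automatic choice between "ge" and "microarray" can be pictured with a small base R sketch. This is not catalogueR's internal code: the helper pick_quant_method() and the toy table (including its column names) are assumptions made for illustration only.

```r
# Toy stand-in for a metadata table; column names and values are assumptions.
toy_meta <- data.frame(
  unique_id    = c("StudyA.monocyte", "StudyA.monocyte", "StudyB.monocyte"),
  quant_method = c("ge", "tx", "microarray"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the first preferred method that the dataset offers,
# or NA if none of the preferred methods is available.
pick_quant_method <- function(meta, id, preferred = c("ge", "microarray")) {
  available <- meta$quant_method[meta$unique_id == id]
  hit <- preferred[preferred %in% available]
  if (length(hit) == 0) {
    return(NA_character_)
  }
  hit[1]
}

method_a <- pick_quant_method(toy_meta, "StudyA.monocyte")  # dataset with RNA-seq data
method_b <- pick_quant_method(toy_meta, "StudyB.monocyte")  # dataset with microarray data only
print(c(method_a, method_b))
```

Ordering the preferred vector is what encodes the "default" here: "ge" wins whenever both methods exist for a dataset.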

You can view a metadata table for all current datasets:

data("meta")
createDT(meta)

You can search the metadata for datasets matching certain keywords (case-insensitive substrings, matched across multiple columns).

qtl_datasets <- eQTL_Catalogue.search_metadata(qtl_search=c("Alasoo_2018","monocyte"))
print(qtl_datasets)
#>  [1] "Alasoo_2018.macrophage_naive"          
#>  [2] "Alasoo_2018.macrophage_IFNg"           
#>  [3] "Alasoo_2018.macrophage_Salmonella"     
#>  [4] "Alasoo_2018.macrophage_IFNg+Salmonella"
#>  [5] "BLUEPRINT.monocyte"                    
#>  [6] "CEDAR.monocyte_CD14"                   
#>  [7] "Fairfax_2014.monocyte_naive"           
#>  [8] "Fairfax_2014.monocyte_IFN24"           
#>  [9] "Fairfax_2014.monocyte_LPS2"            
#> [10] "Fairfax_2014.monocyte_LPS24"           
#> [11] "Quach_2016.monocyte_naive"             
#> [12] "Quach_2016.monocyte_LPS"               
#> [13] "Quach_2016.monocyte_Pam3CSK4"          
#> [14] "Quach_2016.monocyte_R848"              
#> [15] "Quach_2016.monocyte_IAV"               
#> [16] "Schmiedel_2018.monocyte_CD16_naive"    
#> [17] "Schmiedel_2018.monocyte_naive"
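To make "case-insensitive substrings across multiple columns" concrete, here is a minimal base R sketch of that kind of search. It is an illustration under assumptions, not the package's implementation: search_meta() is a hypothetical helper, and the toy table and its column names are invented for the example.

```r
# Toy metadata; the rows and column names are illustrative assumptions.
toy_meta <- data.frame(
  unique_id    = c("Alasoo_2018.macrophage_naive", "BLUEPRINT.monocyte", "ToyStudy.liver"),
  study        = c("Alasoo_2018", "BLUEPRINT", "ToyStudy"),
  tissue_label = c("macrophage", "monocyte", "liver"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep a row if ANY keyword occurs in ANY searched column.
search_meta <- function(meta, keywords, cols = names(meta)) {
  keep <- rep(FALSE, nrow(meta))
  for (kw in tolower(keywords)) {
    for (col in cols) {
      # Lower-case both sides and match literally (fixed = TRUE), so keywords
      # are treated as plain substrings rather than regular expressions.
      keep <- keep | grepl(kw, tolower(meta[[col]]), fixed = TRUE)
    }
  }
  unique(meta$unique_id[keep])
}

hits <- search_meta(toy_meta, c("alasoo_2018", "MONOCYTE"))
print(hits)
```

Keywords are combined with OR in this sketch, which matches the result above, where the query returns both the Alasoo_2018 datasets and every monocyte dataset.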

[Approach 1] Query with summary stats

Supply one or more paths to [GWAS] summary stats files (one per locus), and catalogueR will automatically download any eQTL data within the range of each locus. The files can be comma-separated (.csv), tab-separated (.tsv), or space-separated, and either gzip-compressed (.gz) or uncompressed.

The summary stats files must have the following column names (order doesn’t matter):

  • SNP (rsid for each SNP)
  • CHR (chromosome; with or without the “chr” prefix is fine)
  • POS (basepair position)
  • … (optional extra columns)
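Before running a query, it can help to confirm that a file meets these requirements. The base R sketch below is one way to do that; read_sumstats() is a hypothetical helper rather than a catalogueR function, and both the delimiter guess from the file extension and the removal of the "chr" prefix are assumptions about how you might normalize your own files.

```r
# Hypothetical helper: read a summary stats file and check the required columns.
read_sumstats <- function(path, required = c("SNP", "CHR", "POS")) {
  # Guess the delimiter from the file name (ignoring a trailing .gz);
  # anything that is not .csv or .tsv is treated as whitespace-separated.
  base <- sub("\\.gz$", "", path)
  sep <- if (grepl("\\.csv$", base)) "," else if (grepl("\\.tsv$", base)) "\t" else ""
  # gzfile() reads gzip-compressed files and plain uncompressed files alike.
  dat <- read.table(gzfile(path), header = TRUE, sep = sep, stringsAsFactors = FALSE)
  missing_cols <- setdiff(required, names(dat))
  if (length(missing_cols) > 0) {
    stop("Missing required column(s): ", paste(missing_cols, collapse = ", "))
  }
  # Normalize chromosome labels so that "chr8" and "8" compare equal.
  dat$CHR <- sub("^chr", "", as.character(dat$CHR), ignore.case = TRUE)
  dat
}

# Write a tiny made-up locus file and read it back.
tmp <- tempfile(fileext = ".tsv")
toy <- data.frame(SNP = c("rs1", "rs2"), CHR = c("chr8", "chr8"),
                  POS = c(100L, 250L), P = c(0.01, 0.5))
write.table(toy, tmp, sep = "\t", quote = FALSE, row.names = FALSE)
sumstats <- read_sumstats(tmp)
print(sumstats)
```

Failing early with a clear message is cheaper than discovering a misnamed column after a multi-dataset query has already started.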

sumstats_paths <- example_sumstats_paths()

gwas.qtl_paths <- eQTL_Catalogue.query(sumstats_paths = sumstats_paths,
                                       qtl_search = c("myeloid", "Alasoo_2018"),
                                       output_dir = "./catalogueR_queries",
                                       split_files = TRUE,
                                       merge_with_gwas = TRUE,
                                       force_new_subset = TRUE,
                                       nThread = 4)
#> [1] "+ Optimizing multi-threading..."
#> [1] "++ Multi-threading across QTL datasets."
#> [1] "eQTL_Catalogue:: Querying 4 QTL datasets x 3 GWAS loci (12 total)"
#> [1] "++ Returning list of split files paths."
#> Time difference of 1.1 mins
GWAS.QTL <- gather_files(file_paths = gwas.qtl_paths)
#> [1] "+ Merging 3 files."
#> [1] "+ Using 4 cores."
#> [1] "+ Merged data.table: 68334 rows x 51 columns."
# Interactive datatable of results 
## WARNING: Don't use this function on large datatables, might cause freezing.
createDT(head(GWAS.QTL))

[Approach 2] Query with coordinates

You can also query eQTL Catalogue by manually specifying the coordinates of the region you want to extract, along with the unique_id of the QTL dataset (see data("meta") for IDs).

GWAS.QTL_manual <- eQTL_Catalogue.fetch(unique_id = "Alasoo_2018.macrophage_IFNg",
                                        nThread = 4,
                                        chrom = 8,
                                        bp_lower = 21527069,
                                        bp_upper = 23525543)
#> [1] "eQTL_Catalogue:: 177356 SNPs returned in 18.6 seconds."
#> [1] "++ Converting: Ensembl IDs ==> HGNC gene symbols"
#> Warning: Unable to map 1 of 37 requested IDs.
createDT(head(GWAS.QTL_manual))
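If you already have summary stats for a locus, the coordinates for a manual query can be derived from them rather than typed by hand. The sketch below uses only base R; locus_range() is a hypothetical helper, the toy SNP rows are made up, and taking the minimum and maximum POS as the window is an assumption about what range you want rather than a description of catalogueR's behavior.

```r
# Toy locus; SNP IDs and positions are made up for illustration.
locus <- data.frame(
  SNP = c("rs1", "rs2", "rs3"),
  CHR = c("chr8", "chr8", "chr8"),
  POS = c(21600000, 21527069, 23525543),
  stringsAsFactors = FALSE
)

# Hypothetical helper: summarise a single-chromosome locus as a query window.
locus_range <- function(dat) {
  chrom <- unique(sub("^chr", "", as.character(dat$CHR), ignore.case = TRUE))
  # A locus spanning several chromosomes has no single window, so stop early.
  stopifnot(length(chrom) == 1)
  list(chrom = chrom, bp_lower = min(dat$POS), bp_upper = max(dat$POS))
}

rng <- locus_range(locus)
str(rng)
```

The resulting values correspond to the chrom, bp_lower and bp_upper arguments used above. Note that chrom is returned as a character string here, so convert it first if a numeric value is expected.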