echolocatoR: Automated statistical and functional fine-mapping with extensive access to genome-wide datasets.
echolocatoR is part of the echoverse, a suite of R packages designed to facilitate different steps in genetic fine-mapping. echolocatoR calls each of these other packages (i.e. “modules”) internally to create a unified pipeline. However, you can also use each module independently to create your own custom workflows.
Made with echodeps, yet another echoverse module. See here for the interactive version with package descriptions and links to each GitHub repo.
If you use echolocatoR, or any of the echoverse modules, please cite:
Brian M Schilder, Jack Humphrey, Towfique Raj (2021) echolocatoR: an automated end-to-end statistical and functional genomic fine-mapping pipeline, Bioinformatics; btab658, https://doi.org/10.1093/bioinformatics/btab658
if(!require("remotes")) install.packages("remotes")
remotes::install_github("RajLabMSSM/echolocatoR")
library(echolocatoR)
- echolocatoR now relies on many subpackages that depend on one another, so errors can sometimes occur when R tries to update one R package before updating its echoverse dependencies (and thus is unable to find new functions). As echoverse stabilizes over time, this should happen less frequently. In the meantime, the solution is to simply rerun remotes::install_github("RajLabMSSM/echolocatoR") until all subpackages are fully updated.
- susieR: Sometimes an older version of susieR is installed from CRAN (e.g. 0.11.92), but echofinemap requires version >= 0.12.0. To get around this, you can install susieR directly from GitHub: devtools::install_github("stephenslab/susieR")
- XML (which some echoverse subpackages depend on) has some additional system dependencies that must be installed beforehand. If XML does not install automatically, try installing libxml2 on your system using brew install libxml2 (MacOS), sudo apt-get install libxml2 (Linux) or conda install r-xml if you are running echolocatoR from within a conda environment.
echolocatoR now has its own dedicated Docker/Singularity container! This greatly reduces issues related to system dependency conflicts and provides a containerized interface for RStudio through your web browser. See here for installation instructions.
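When installing outside the container, the susieR version requirement noted above can be checked before running the pipeline. The following is a minimal base R sketch; the helper name needs_update is hypothetical and not part of any echoverse package.

```r
# Return TRUE when `pkg` is not installed or is older than `min_version`.
# `needs_update` is a hypothetical helper written for this illustration.
needs_update <- function(pkg, min_version) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(TRUE)
  }
  utils::packageVersion(pkg) < package_version(min_version)
}

if (needs_update("susieR", "0.12.0")) {
  message("susieR is missing or older than 0.12.0; ",
          "consider installing it from GitHub as described above.")
}
```

The same helper can be pointed at any other dependency that has a minimum version requirement.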
echoverse <- c('echolocatoR','echodata','echotabix',
'echoannot','echoconda','echoLD',
'echoplot','catalogueR','downloadR',
'echofinemap','echodeps', # under construction
'echogithub')
toc <- echogithub::github_pages_vignettes(owner = "RajLabMSSM",
repo = echoverse,
as_toc = TRUE,
verbose = FALSE)
Fine-mapping methods are a powerful means of identifying causal variants underlying a given phenotype, but are underutilized due to the technical challenges of implementation. echolocatoR is an R package that automates end-to-end genomics fine-mapping, annotation, and plotting in order to identify the most probable causal variants associated with a given phenotype.
It requires minimal input from users (a GWAS or QTL summary statistics file), and includes a suite of statistical and functional fine-mapping tools. It also includes extensive access to datasets (linkage disequilibrium panels, epigenomic and genome-wide annotations, QTL).
The elimination of data gathering and preprocessing steps enables rapid fine-mapping of many loci in any phenotype, complete with locus-specific publication-ready figure generation. All results are merged into a single per-SNP summary file for additional downstream analysis and results sharing. Therefore echolocatoR drastically reduces the barriers to identifying causal variants by making the entire fine-mapping pipeline rapid, robust and scalable.
For applications of echolocatoR in the literature, please see:
- E Navarro, E Udine, K de Paiva Lopes, M Parks, G Riboldi, BM Schilder…T Raj (2020) Dysregulation of mitochondrial and proteo-lysosomal genes in Parkinson’s disease myeloid cells. bioRxiv. https://doi.org/10.1101/2020.07.20.212407
- BM Schilder, T Raj (2021) Fine-Mapping of Parkinson’s Disease Susceptibility Loci Identifies Putative Causal Variants. Human Molecular Genetics, ddab294, https://doi.org/10.1093/hmg/ddab294
- K de Paiva Lopes, G JL Snijders, J Humphrey, A Allan, M Sneeboer, E Navarro, BM Schilder…T Raj (2022) Genetic analysis of the human microglial transcriptome across brain regions, aging and disease pathologies. Nature Genetics, https://doi.org/10.1038/s41588-021-00976-y
echolocatoR v1.0 vs. v2.0
There have been a series of major updates between echolocatoR v1.0 and v2.0. Here are some of the most notable ones (see Details):
- echolocatoR has been broken into separate subpackages, making it much easier to edit/debug each step of the full finemap_loci pipeline, and improving robustness throughout. It also provides greater flexibility for users to construct their own custom pipelines from these modules.
- GITHUB_TOKEN: GitHub now requires users to create Personal Access Tokens (PATs) to avoid download limits. This is essential for installing echolocatoR as many resources from GitHub need to be downloaded. See here for further instructions.
- colmap = echodata::construct_colmap(): Previously, users were required to input key column name mappings as separate arguments to echolocatoR::finemap_loci. This functionality has been deprecated and replaced with a single argument, colmap=. This allows users to save the construct_colmap() output as a single variable and reuse it later without having to write out each mapping argument again (and helps reduce an already crowded list of arguments).
- MungeSumstats: finemap_loci now accepts the output of MungeSumstats::format_sumstats / import_sumstats as-is (without requiring colmap=, so long as munged=TRUE). Standardizing your GWAS/QTL summary stats this way greatly reduces (or eliminates) the time taken to do manual formatting.
- echolocatoR::finemap_loci arguments: Several arguments have been deprecated or renamed so that all the subpackages use a unified naming convention. See ?echolocatoR::finemap_loci for details.
- echoconda: The echoverse subpackage echoconda now handles all conda environment creation/use internally and automatically, without the need for users to create the conda environment themselves as a separate step. Also, the default conda env echoR has been replaced by echoR_mini, which reduces the number of dependencies to just the bare minimum (thus greatly speeding up build time and reducing potential version conflicts).
- FINEMAP: More outputs from the tool FINEMAP are now recorded in the echolocatoR results (see ?echofinemap::FINEMAP or this Issue for details). Also, a common dependency conflict between FINEMAP >=1.4 and MacOS has been resolved (see this Issue for details).
- echodata: All example data and data transformation functions have been moved to the echoverse subpackage echodata.
- LD_reference=: In addition to the UKB, 1KGphase1/3 LD reference panels, finemap_loci() can now take custom LD panels by supplying finemap_loci(LD_reference=) with a list of paths to VCF files (.vcf / .vcf.gz / .vcf.bgz) or pre-computed LD matrices with RSIDs as the row/col names (.rda / .rds / .csv / .tsv / .txt / .csv.gz / .tsv.gz / .txt.gz).
- FINEMAP fixed: There were a number of issues with FINEMAP due to differing output formats across different versions, system dependency conflicts, and the fact that it can produce multiple Credible Sets. All of these have been fixed and the latest version of FINEMAP can be run on all OS platforms.
- finemap_loci(): I use a tryCatch() when iterating across loci so that if one locus fails, the rest can continue. However, this prevents use of the traceback feature in R, making debugging hard. Thus, I have now enabled a debugging mode via a new argument: use_tryCatch=FALSE.
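The trade-off behind use_tryCatch can be illustrated with a small base R sketch. This is not the package's internal code: process_locus and run_loci are hypothetical stand-ins, and "BAD_LOCUS" is a made-up locus that fails on purpose.

```r
# Hypothetical stand-in for the per-locus work; fails for one locus on purpose.
process_locus <- function(locus) {
  if (locus == "BAD_LOCUS") stop("simulated failure in ", locus)
  paste("finished", locus)
}

# Iterate across loci, optionally shielding each iteration with tryCatch().
run_loci <- function(loci, use_tryCatch = TRUE) {
  results <- lapply(loci, function(locus) {
    if (use_tryCatch) {
      # A failing locus is reported and skipped; the remaining loci still run.
      tryCatch(process_locus(locus),
               error = function(e) {
                 message("Locus ", locus, " failed: ", conditionMessage(e))
                 NULL
               })
    } else {
      # Errors propagate, so traceback() shows the real call stack.
      process_locus(locus)
    }
  })
  stats::setNames(results, loci)
}

res <- run_loci(c("BST1", "BAD_LOCUS", "MEX3C"))
succeeded <- names(Filter(Negate(is.null), res))
```

With the shield on, a long run survives a single bad locus; with it off, the first error stops the run immediately, which is usually what you want while debugging.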
By default, echolocatoR::finemap_loci() returns a nested list containing results grouped by locus names (e.g. $BST1, $MEX3C). The results of each locus contain the following elements:
- finemap_dat: Fine-mapping results from all selected methods merged with the original summary statistics (i.e. Multi-finemap results).
- locus_plot: A nested list containing one or more zoomed views of locus plots.
- LD_matrix: The post-processed LD matrix used for fine-mapping.
- LD_plot: An LD plot (if used).
- locus_dir: The locus directory that results are saved in.
- arguments: A record of the arguments supplied to finemap_loci.
In addition, the following object summarizes the results from the locus-specific elements:
- merged_dat: A merged data.table with all fine-mapping results from all loci.
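To make the shape of this output concrete, here is a base R sketch that builds a toy nested list with the same layout (locus name, then finemap_dat) and stacks the per-locus tables into one merged table. All values are made up, the Locus column name is an assumption, and base data.frame is used in place of data.table to keep the example dependency-free.

```r
# Toy stand-in for the nested, per-locus results list (values are made up).
results <- list(
  BST1  = list(finemap_dat = data.frame(SNP = c("rs1", "rs2"),
                                        mean.PP = c(0.9, 0.1))),
  MEX3C = list(finemap_dat = data.frame(SNP = "rs3",
                                        mean.PP = 0.5))
)

# Access the fine-mapping table for a single locus.
bst1 <- results$BST1$finemap_dat

# Stack all loci into one table, labelling each row with its locus name.
# The column name "Locus" is an assumption made for this illustration.
merged <- do.call(rbind, lapply(names(results), function(locus) {
  cbind(Locus = locus, results[[locus]]$finemap_dat)
}))
```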
The main outputs of echolocatoR are the multi-finemap files (for example, echodata::BST1). They are stored in the locus-specific Multi-finemap subfolders.
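The summary columns in these files (Support, mean.PP, mean.CS) combine results across fine-mapping tools. The sketch below shows one way such columns could be derived from per-tool results in base R. It is an illustration under assumptions, not the package's implementation: the per-tool column names (SUSIE.CS, FINEMAP.PP, and so on), the Consensus_SNP name, and all values are hypothetical.

```r
# Toy per-SNP table for two fine-mapping tools (all values are made up).
dat <- data.frame(
  SNP        = c("rs1", "rs2", "rs3", "rs4"),
  SUSIE.CS   = c(1, 1, NA, NA),
  FINEMAP.CS = c(1, NA, NA, 1),
  SUSIE.PP   = c(0.98, 0.60, 0.01, 0.05),
  FINEMAP.PP = c(0.97, 0.10, 0.02, 0.70)
)

cs_cols <- grep("\\.CS$", colnames(dat), value = TRUE)
pp_cols <- grep("\\.PP$", colnames(dat), value = TRUE)

# Support: number of tools whose Credible Set contains the SNP
# (NA or 0 means the SNP is not in a Credible Set for that tool).
in_cs <- !is.na(dat[, cs_cols]) & dat[, cs_cols] > 0
dat$Support <- rowSums(in_cs)

# Consensus SNP: in the Credible Set of more than N tools (N = 1 here).
N <- 1
dat$Consensus_SNP <- dat$Support > N

# mean.PP across tools, and the stringent mean.CS flag (mean.PP > 0.95).
dat$mean.PP <- rowMeans(dat[, pp_cols], na.rm = TRUE)
dat$mean.CS <- as.integer(dat$mean.PP > 0.95)
```

In this toy table only rs1 is supported by both tools, and it is also the only SNP whose mean posterior probability clears the 0.95 threshold.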
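- SNP, CHR, POS, Effect, StdErr: Standardized GWAS/QTL summary statistics columns. See ?finemap_loci() for descriptions of each.
- <tool>.CS: The Credible Set (CS) that a SNP belongs to in a given tool's results. A SNP that is not in any CS is assigned NA (or 0 for the purposes of plotting).
- Consensus_SNP: A SNP that is included in the CS of more than N fine-mapping tool(s), i.e. Support>1 (default: N=1), where Support is the number of tools whose CS includes the SNP.
- mean.CS: If the mean posterior probability across fine-mapping tools exceeds 0.95 (mean.PP>0.95) then mean.CS is 1, else 0. This tends to be a very stringent threshold as it requires a high degree of agreement between fine-mapping tools.
Fine-mapping functions are now implemented via echofinemap: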
- echolocatoR will automatically check whether you have the necessary columns to run each tool you selected in echolocatoR::finemap_loci(finemap_methods=...). It will remove any tools for which necessary columns are missing, and produce a message letting you know which columns are missing.
- Note that some columns (e.g. MAF, N, t-stat) will be automatically inferred if missing.
- See ?echodata::construct_colmap() for descriptions of these columns. All methods require the columns: SNP, CHR, POS, Effect, StdErr
fm_methods <- echofinemap::required_cols(add_versions = FALSE,
embed_links = TRUE,
verbose = FALSE)
## Registered S3 method overwritten by 'GGally':
## method from
## +.gg ggplot2
knitr::kable(x = fm_methods)
method | required | suggested | source | citation |
---|---|---|---|---|
ABF | SNP, CHR…. | | source | cite |
COJO_conditional | SNP, CHR…. | Freq, P, N | source | cite |
COJO_joint | SNP, CHR…. | Freq, P, N | source | cite |
COJO_stepwise | SNP, CHR…. | Freq, P, N | source | cite |
FINEMAP | SNP, CHR…. | A1, A2, …. | source | cite |
PAINTOR | SNP, CHR…. | MAF | source | cite |
POLYFUN_FINEMAP | SNP, CHR…. | MAF, N | source | cite |
POLYFUN_SUSIE | SNP, CHR…. | MAF, N | source | cite |
SUSIE | SNP, CHR…. | N | source | cite |
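The column check described above can be sketched in base R as follows. This is only an illustration of the idea: check_methods is a hypothetical helper, TOOL_X is a made-up tool, and the requirement lists are placeholders. The authoritative requirements come from echofinemap::required_cols().

```r
# Placeholder requirements for illustration only (TOOL_X is a made-up tool).
required <- list(
  ABF    = c("SNP", "CHR", "POS", "Effect", "StdErr"),
  SUSIE  = c("SNP", "CHR", "POS", "Effect", "StdErr"),
  TOOL_X = c("SNP", "CHR", "POS", "Effect", "StdErr", "A1", "A2")
)

# Keep only the methods whose required columns are all present in `dat`,
# and report which columns are missing for each dropped method.
check_methods <- function(dat, finemap_methods, required) {
  keep <- character(0)
  for (m in finemap_methods) {
    missing_cols <- setdiff(required[[m]], colnames(dat))
    if (length(missing_cols) > 0) {
      message("Dropping ", m, ": missing column(s) ",
              paste(missing_cols, collapse = ", "))
    } else {
      keep <- c(keep, m)
    }
  }
  keep
}

# Toy summary statistics with only the five standard columns.
dat <- data.frame(SNP = "rs1", CHR = 4, POS = 12345,
                  Effect = 0.1, StdErr = 0.02)
usable <- check_methods(dat, c("ABF", "SUSIE", "TOOL_X"), required)
```

Here TOOL_X is dropped with a message naming A1 and A2, while the other two methods remain selected.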
Datasets are now stored/retrieved via the following echoverse subpackages:
- echodata: Pre-computed fine-mapping results. Also handles the semi-automated standardization of summary statistics.
- echoannot: Annotates GWAS/QTL summary statistics using epigenomics, pre-compiled annotation matrices, and machine learning model predictions of variant-specific functional impacts.
- catalogueR: Large compendium of fully standardized e/s/t-QTL summary statistics.
For more detailed information about each dataset, use ?:
### Examples ###
library(echoannot)
?NOTT_2019.interactome # epigenomic annotations
library(echodata)
?BST1 # fine-mapping results
MungeSumstats:
- GWAS summary statistics can be searched and imported via MungeSumstats, specifically the functions find_sumstats and import_sumstats.
catalogueR: QTLs
- catalogueR::eQTL_Catalogue.query(): provided by the catalogueR R package.
echodata: fine-mapping results
- echodata::portal_query()
echoannot: Epigenomic & genome-wide annotations
- echoannot::NOTT2019_*()
- echoannot::CORCES2020_*(): includes FitHiChIP data from postmortem adult human brain tissue.
- echoannot::XGR_download_and_standardize()
- echoannot::ROADMAP_query()
- echoannot::annotate_snps()
Annotation enrichment functions are now implemented via echoannot:
- echoannot::XGR_enrichment()
- echoannot::MOTIFBREAKR()
- echoannot::test_enrichment(): operates on GRangesList objects.
LD reference panels are now queried/processed by echoLD, specifically the function get_LD().
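Besides the built-in panels, finemap_loci(LD_reference=) can take pre-computed LD matrices with RSIDs as the row/col names. The base R sketch below prepares such a file in .rds form. The SNP IDs, LD values, and file name are all made up for illustration.

```r
# Toy LD matrix for three hypothetical SNPs (values are made up).
rsids <- c("rs1", "rs2", "rs3")
ld <- matrix(c(1.0, 0.8, 0.1,
               0.8, 1.0, 0.3,
               0.1, 0.3, 1.0),
             nrow = 3, byrow = TRUE,
             dimnames = list(rsids, rsids))

# Basic sanity checks before handing the file to a pipeline:
# symmetric, identically named rows/columns, and a unit diagonal.
stopifnot(isSymmetric(ld),
          identical(rownames(ld), colnames(ld)),
          all(diag(ld) == 1))

# Save to a temporary .rds file (hypothetical file name).
ld_path <- file.path(tempdir(), "custom_LD.rds")
saveRDS(ld, ld_path)

# Round-trip to confirm the RSID names survive saving and loading.
ld_back <- readRDS(ld_path)
```

The resulting path is what would be supplied in the list passed to LD_reference=.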
Plotting functions are now implemented via:
- echoplot: Multi-track locus plots with GWAS, fine-mapping results, and functional annotations (plot_locus()). Can also plot multi-GWAS/QTL and multi-ancestry results (plot_locus_multi()).
- echoannot: Study-level summary plots showing aggregated info across many loci at once (super_summary_plot()).
- echoLD: Plot an LD matrix using one of several different plotting methods (plot_LD()).
All queries of tabix-indexed files (for rapid data subset extraction) are implemented via echotabix.
- echotabix::convert_and_query() detects whether the GWAS summary statistics file you provided is already tabix-indexed, and if not, automatically performs all steps necessary to convert it (sorting, bgzip-compression, indexing) across a wide variety of scenarios.
- echotabix::query() contains many different methods for making tabix queries (e.g. Rtracklayer, echoconda, VariantAnnotation, seqminer), each of which fails in certain circumstances. To avoid this, query() automatically selects the method that will work for the particular file being queried and your machine’s particular versions of R/Bioconductor/OS, taking the guesswork and troubleshooting out of tabix queries.
Single- and multi-threaded downloads are now implemented via downloadR.
- Multi-threaded downloading uses axel, and is particularly useful for speeding up downloads of large files.
- axel is installed via the official echoverse conda environment: “echoR_mini”. This environment is automatically created by the function echoconda::yaml_to_env() when needed.
Brian M. Schilder, Bioinformatician II
Raj Lab
Department of Neuroscience, Icahn School of Medicine at Mount Sinai
utils::sessionInfo()
## R Under development (unstable) (2023-01-11 r83598)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] rmarkdown_2.20
##
## loaded via a namespace (and not attached):
## [1] ProtGenerics_1.31.0 fs_1.5.2
## [3] matrixStats_0.63.0 bitops_1.0-7
## [5] httr_1.4.4 RColorBrewer_1.1-3
## [7] gh_1.3.1 Rgraphviz_2.43.0
## [9] tools_4.3.0 backports_1.4.1
## [11] utf8_1.2.2 R6_2.5.1
## [13] DT_0.27 lazyeval_0.2.2
## [15] prettyunits_1.1.1 GGally_2.1.2
## [17] gridExtra_2.3 cli_3.6.0
## [19] Biobase_2.59.0 ggbio_1.47.0
## [21] mvtnorm_1.1-3 readr_2.1.3
## [23] proxy_0.4-27 mixsqp_0.3-48
## [25] Rsamtools_2.15.1 yulab.utils_0.0.6
## [27] foreign_0.8-84 R.utils_2.12.2
## [29] echogithub_0.99.1 dichromat_2.0-0.1
## [31] BSgenome_1.67.3 readxl_1.4.1
## [33] susieR_0.12.27 rstudioapi_0.14
## [35] RSQLite_2.2.20 httpcode_0.3.0
## [37] badger_0.2.2 generics_0.1.3
## [39] BiocIO_1.9.1 echoconda_0.99.9
## [41] dplyr_1.0.10 zip_2.2.2
## [43] Matrix_1.5-3 interp_1.1-3
## [45] fansi_1.0.3 DescTools_0.99.47
## [47] S4Vectors_0.37.3 R.methodsS3_1.8.2
## [49] lifecycle_1.0.3 yaml_2.3.6
## [51] SummarizedExperiment_1.29.1 BiocFileCache_2.7.1
## [53] grid_4.3.0 blob_1.2.3
## [55] crayon_1.5.2 dir.expiry_1.7.0
## [57] lattice_0.20-45 GenomicFeatures_1.51.2
## [59] KEGGREST_1.39.0 pillar_1.8.1
## [61] knitr_1.41 GenomicRanges_1.51.4
## [63] rjson_0.2.21 osfr_0.2.9
## [65] boot_1.3-28.1 gld_2.6.6
## [67] codetools_0.2-18 glue_1.6.2
## [69] data.table_1.14.6 coloc_5.1.0.1
## [71] vctrs_0.5.1 png_0.1-8
## [73] XGR_1.1.8 testthat_3.1.6
## [75] cellranger_1.1.0 gtable_0.3.1
## [77] assertthat_0.2.1 cachem_1.0.6
## [79] dnet_1.1.7 xfun_0.36
## [81] openxlsx_4.2.5.1 survival_3.5-0
## [83] dlstats_0.1.6 rvcheck_0.2.1
## [85] ellipsis_0.3.2 nlme_3.1-161
## [87] bit64_4.0.5 progress_1.2.2
## [89] filelock_1.0.2 GenomeInfoDb_1.35.12
## [91] rprojroot_2.0.3 irlba_2.3.5.1
## [93] rpart_4.1.19 colorspace_2.0-3
## [95] BiocGenerics_0.45.0 DBI_1.1.3
## [97] Hmisc_4.7-2 nnet_7.3-18
## [99] Exact_3.2 tidyselect_1.2.0
## [101] bit_4.0.5 compiler_4.3.0
## [103] curl_5.0.0 graph_1.77.1
## [105] htmlTable_2.4.1 expm_0.999-7
## [107] basilisk.utils_1.11.1 xml2_1.3.3
## [109] desc_1.4.2 DelayedArray_0.25.0
## [111] rtracklayer_1.59.1 checkmate_2.1.0
## [113] scales_1.2.1 hexbin_1.28.2
## [115] RBGL_1.75.0 echoLD_0.99.9
## [117] RCircos_1.2.2 rappdirs_0.3.3
## [119] stringr_1.5.0 supraHex_1.37.0
## [121] digest_0.6.31 piggyback_0.1.4
## [123] basilisk_1.11.2 XVector_0.39.0
## [125] htmltools_0.5.4 pkgconfig_2.0.3
## [127] jpeg_0.1-10 base64enc_0.1-3
## [129] MatrixGenerics_1.11.0 echodata_0.99.16
## [131] highr_0.10 ensembldb_2.23.1
## [133] dbplyr_2.3.0 fastmap_1.1.0
## [135] rlang_1.0.6 htmlwidgets_1.6.1
## [137] echofinemap_0.99.5 jsonlite_1.8.4
## [139] BiocParallel_1.33.9 R.oo_1.25.0
## [141] VariantAnnotation_1.45.0 RCurl_1.98-1.9
## [143] magrittr_2.0.3 Formula_1.2-4
## [145] GenomeInfoDbData_1.2.9 ggnetwork_0.5.10
## [147] patchwork_1.1.2 munsell_0.5.0
## [149] Rcpp_1.0.9 ape_5.6-2
## [151] viridis_0.6.2 reticulate_1.27
## [153] stringi_1.7.12 rootSolve_1.8.2.3
## [155] brio_1.1.3 zlibbioc_1.45.0
## [157] MASS_7.3-58.1 plyr_1.8.8
## [159] parallel_4.3.0 ggrepel_0.9.2
## [161] snpStats_1.49.0 lmom_2.9
## [163] deldir_1.0-6 echoannot_0.99.10
## [165] Biostrings_2.67.0 splines_4.3.0
## [167] hms_1.1.2 igraph_1.3.5
## [169] rworkflows_0.99.5 reshape2_1.4.4
## [171] biomaRt_2.55.0 stats4_4.3.0
## [173] crul_1.3 XML_3.99-0.13
## [175] evaluate_0.20 biovizBase_1.47.0
## [177] latticeExtra_0.6-30 BiocManager_1.30.19
## [179] tzdb_0.3.0 tidyr_1.2.1
## [181] purrr_1.0.1 reshape_0.8.9
## [183] ggplot2_3.4.0 echotabix_0.99.9
## [185] AnnotationFilter_1.23.0 restfulr_0.0.15
## [187] e1071_1.7-12 gitcreds_0.1.2
## [189] downloadR_0.99.6 viridisLite_0.4.1
## [191] class_7.3-20.1 OrganismDbi_1.41.0
## [193] tibble_3.1.8 memoise_2.0.1
## [195] AnnotationDbi_1.61.0 GenomicAlignments_1.35.0
## [197] IRanges_2.33.0 cluster_2.1.4
## [199] here_1.0.1