Get started
Author: Brian M. Schilder
Author: Brian M. Schilder
Updated: Mar-16-2026
Source: Updated: Mar-16-2026
vignettes/echolocatoR.Rmd
echolocatoR.Rmd##
## ββ π¦ π¦ π¦ e c h o l o c a t o R π¦ π¦ π¦ βββββββββββββββββββββββββββββββββ
##
## ββ v2.0.5 ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
##
## ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
## β β β ‘β£β£β β β ‘β£β£β β β ’β£β‘ β β β ’β£β‘ β β β ’β£β‘ β β β ’β£β‘ β β β ’β£β‘ β β β ’β£β‘
## β β’β‘β β£β β’β‘β β£β β’β‘β β£β β’β‘β β£β β’β‘β β£β β’β‘β β£β β’β‘β β£β β’β‘β ‘β£
## β β‘β‘β’β’β β‘β‘β’β’β β‘β‘β’β’β β‘β‘β’β’β ‘β β‘β €β’β ‘β β‘β €β’β ‘β β‘β‘ β’β’β β‘β‘β’
## β β‘β‘β’β’β β‘β‘β’β’β β‘β‘β’β’β β‘β‘β’β’β ‘β β‘β €β’β ‘β β‘β €β’β ‘β β‘β‘ β’β’β β‘β‘β’
## β β’β‘β β£β β’β‘β β£β β’β‘β β£β β’β‘β β£β β’β‘β β£β β’β‘β β£β β’β‘β β£β β’β‘β ‘β£
## β β β ‘β£β£β β β ‘β£β£β β β ’β£β‘ β β β ’β£β‘ β β β ’β£β‘ β β β ’β£β‘ β β β ’β£β‘ β β β ’β£β‘
## β If you use echolocatoR or any of the echoverse subpackages, please cite:
## βΆ Brian M Schilder, Jack Humphrey, Towfique
## Raj (2021) echolocatoR: an automated
## end-to-end statistical and functional
## genomic fine-mapping pipeline,
## Bioinformatics; btab658,
## https://doi.org/10.1093/bioinformatics/btab658
## β Please report any bugs/feature requests on GitHub:
## βΆ
## https://github.com/RajLabMSSM/echolocatoR/issues
## β Contributions are welcome!:
## βΆ
## https://github.com/RajLabMSSM/echolocatoR/pulls
has_internet <- tryCatch(
!is.null(curl::nslookup("github.com", error = FALSE)),
error = function(e) FALSE
)Full pipeline
All examples below use data from the Parkinsonβs disease GWAS by Nalls et al.Β (2019).
Prepare top_SNPs data.frame
- To enable rapid fine-mapping of many loci, you can create a
top_SNPsdata.frame which contains the position of the lead/index SNP within each locus. -
finemap_loci()(see next step) will then use this info to extract subsets of the full GWAS/QTL summary statistics using windows centered on each lead/index SNP. - The
topSSargument can either be a data.frame, or a path to a topSS file saved somewhere. Most common tabular data formats (e.g.Β .tsv, .csv, .xlsx) are accepted.
#### Load example top SNPs (pre-formatted) ####
topSS <- echodata::topSNPs_Nalls2019_raw
#### construct a column mapping object ####
colmap <- echodata::construct_colmap(P = "P, all studies",
Effect = "Beta, all studies",
Locus = "Nearest Gene",
Gene = "QTL Nominated Gene (nearest QTL)")
#### Import top SNPs ####
topSNPs <- echodata::import_topSNPs(
topSS = echodata::topSNPs_Nalls2019_raw,
colmap = colmap,
grouping_vars = "Locus Number")## Loading required namespace: MungeSumstats
## Renaming column: P, all studies ==> P
## Renaming column: Beta, all studies ==> Effect
## Renaming column: Nearest Gene ==> Locus
## Renaming column: QTL Nominated Gene (nearest QTL) ==> Gene
## [1] "+ Assigning Gene and Locus independently."
## Standardising column headers.
## First line of summary statistics file:
## SNP CHR BP Locus Gene Effect allele Other allele Effect allele frequency Effect SE, all studies P P, COJO, all studies P, random effects, all studies P, Conditional 23AndMe only P, 23AndMe only I2, all studies Freq1, previous studies Beta, previous studies StdErr, previous studies P, previous studies I2, previous studies Freq1, new studies Beta, new studies StdErr, new studies P, new studies I2, new studies Passes pooled 23andMe QC Known GWAS locus within 1MB Failed final filtering and QC Locus within 250KB Locus Number
## Returning unmapped column names without making them uppercase.
## + Mapping colnames from MungeSumstats ==> echolocatoR
head(topSNPs)## Key: <Locus>
## SNP CHR POS Locus Gene A2 A1 Freq Effect
## <char> <num> <num> <char> <char> <char> <char> <num> <num>
## 1: rs1941685 18 31304318 ASXL3 <NA> t g 0.4983 0.0531
## 2: rs2280104 8 22525980 BIN3 BIN3 t c 0.3604 0.0556
## 3: rs61169879 17 59917366 BRIP1 MED13 t c 0.1641 0.0820
## 4: rs4698412 4 15737348 BST1 BST1 a g 0.5529 0.1035
## 5: rs11950533 5 134199105 C5orf24 TXNDC15 a c 0.1020 -0.0916
## 6: rs9568188 13 49927732 CAB39L CAB39L t c 0.7397 0.0617
## SE, all studies P P, COJO, all studies P, random effects, all studies
## <num> <num> <num> <num>
## 1: 0.0094 1.69e-08 1.61e-08 1.690e-08
## 2: 0.0098 1.16e-08 1.40e-08 1.860e-06
## 3: 0.0134 9.28e-10 9.40e-10 6.210e-06
## 4: 0.0094 2.06e-28 9.73e-36 1.680e-19
## 5: 0.0158 7.16e-09 6.73e-09 2.680e-08
## 6: 0.0108 1.15e-08 1.11e-08 2.458e-04
## P, Conditional 23AndMe only P, 23AndMe only I2, all studies
## <num> <num> <num>
## 1: 1.64e-08 1.60e-08 0.0
## 2: 4.35e-02 4.94e-02 8.9
## 3: 9.07e-07 2.05e-06 16.4
## 4: 1.05e-07 1.15e-07 13.9
## 5: 5.08e-04 4.22e-04 1.9
## 6: 4.29e-06 4.41e-06 21.4
## Freq1, previous studies Beta, previous studies StdErr, previous studies
## <num> <num> <num>
## 1: 0.4978 0.0507 0.0113
## 2: 0.3595 0.0647 0.0117
## 3: 0.1641 0.0849 0.0163
## 4: 0.5547 0.1023 0.0112
## 5: 0.1008 -0.0986 0.0190
## 6: 0.7405 0.0657 0.0130
## P, previous studies I2, previous studies Freq1, new studies
## <num> <num> <num>
## 1: 6.82e-06 0.0 0.4995
## 2: 3.23e-08 15.0 0.3626
## 3: 1.90e-07 0.0 0.1641
## 4: 7.32e-20 49.3 0.5485
## 5: 2.05e-07 43.8 0.1048
## 6: 4.00e-07 39.8 0.7380
## Beta, new studies StdErr, new studies P, new studies I2, new studies
## <num> <num> <num> <num>
## 1: 0.0586 0.0171 6.147e-04 14.3
## 2: 0.0350 0.0177 4.724e-02 1.9
## 3: 0.0761 0.0236 1.242e-03 23.7
## 4: 0.1062 0.0170 4.160e-10 10.9
## 5: -0.0756 0.0287 8.388e-03 0.0
## 6: 0.0526 0.0196 7.353e-03 22.3
## Passes pooled 23andMe QC Known GWAS locus within 1MB
## <char> <num>
## 1: T 0
## 2: T 1
## 3: T 0
## 4: T 1
## 5: T 0
## 6: T 0
## Failed final filtering and QC Locus within 250KB Locus Number
## <num> <char> <char>
## 1: 0 0 73
## 2: 0 0 39
## 3: 0 0 71
## 4: 0 0 20
## 5: 0 0 28
## 6: 0 0 53
Path to full summary stats file
Since a full GWAS summary stats file would be too large to include within echolocatoR, we instead provide an example subset of the full summary stats.
To simulate how youβd actually use your own full summary stats file, we will save our example dataset to your computer (you can change the path to wherever you like).
We highly recommend munging your full summary stats using the Bioconductor package
MungeSumstatsfirst. Itβs easy to use and very robust. It also means you donβt have to provide most column mapping arguments infinemap_lociwhenmunged=TRUE.
Hereβs an example of how to munge your full summary stats file:
fullSS_path <- echodata::example_fullSS(munged = FALSE)
fullSS_path <- MungeSumstats::format_sumstats(path = fullSS_path, ref_genome = "GRCH37")
We have already munged the following example summary stats for you.
fullSS_path <- echodata::example_fullSS(dataset = "Nalls2019")## Writing file to ==> /var/folders/x7/97p7bsjj3tg3y7vvq_mwjggc0000gn/T//RtmpQYSy0i/nalls2019.fullSS_subset.tsv
Run fine-mapping pipeline
For a full description of all arguments, see
?finemap_loci.
Here are some key arguments:
- results_dir: Where you want to store all of your results.
-
finemap_methods: Which fine-mapping methods you want to
run. For a full list of currently supported methods, run the function
echofinemap::lfm(). -
bp_distance: Controls window size. Specifically,
bp_distanceis the number of basepairs upstream/downstream you want to extract for each locus. For example, if you want a 2Mb window (+/- 1Mb from the lead/index SNP intop_SNPs), setbp_distance=1e+06. -
plot_zoom: Zoom in/out from the center of each locus when
producing the multiview plot. You can adjust this separately from
bp_distanceso that you donβt have rerun the whole pipeline each time (locus subsets, LD matrices, and fine-mapping results are all automatically saved in locus-specific folders).
Note: Please use the full absolute paths (instead of
relative paths) wherever possible (e.g.Β results_dir). This
is especially important for the tool FINEMAP.
The following call fine-maps two loci (BST1 and MEX3C) using three statistical fine-mapping methods. Depending on your machine and internet speed, this may take 10-30 minutes per locus.
results <- echolocatoR::finemap_loci(
fullSS_path = fullSS_path,
topSNPs = topSNPs,
loci = c("BST1","MEX3C"),
LD_reference = "1KGphase3",
dataset_name = "Nalls23andMe_2019",
fullSS_genome_build = "hg19",
bp_distance = 1000,
finemap_methods = c("ABF","SUSIE","FINEMAP"),
munged = TRUE)Inspect results
The output of finemap_loci() is a named list of
data.tables, one per locus. Each data.table contains the original GWAS
summary statistics augmented with fine-mapping posterior probabilities
and credible set assignments.
Here we use the bundled BST1 example results to show what the output looks like:
dat <- echodata::BST1
## Top fine-mapped SNPs (consensus across methods)
consensus <- subset(dat, Consensus_SNP == TRUE)
knitr::kable(
consensus[, c("SNP","CHR","POS","P","mean.PP","Support","Consensus_SNP")],
caption = "Consensus fine-mapped SNPs for the BST1 locus",
row.names = FALSE
)| SNP | CHR | POS | P | mean.PP | Support | Consensus_SNP |
|---|---|---|---|---|---|---|
| rs4541502 | 4 | 15712787 | 0 | 0.7500001 | 3 | TRUE |
| rs34559912 | 4 | 15730146 | 0 | 0.5155283 | 2 | TRUE |
| rs4389574 | 4 | 15730398 | 0 | 0.5000000 | 2 | TRUE |
Next steps
- Explore and interpret results:
vignette("explore_results") - Visualize loci:
vignette("plotting") - Summarise across loci:
vignette("summarise") - Fine-map QTL data:
vignette("QTLs") - Learn about sub-packages:
vignette("echoverse_modules")
Session info
utils::sessionInfo()## R version 4.5.1 (2025-06-13)
## Platform: aarch64-apple-darwin20
## Running under: macOS Tahoe 26.3.1
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.1
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: America/New_York
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] echolocatoR_2.0.5 BiocStyle_2.38.0
##
## loaded via a namespace (and not attached):
## [1] splines_4.5.1 aws.s3_0.3.22
## [3] BiocIO_1.20.0 bitops_1.0-9
## [5] filelock_1.0.3 tibble_3.3.1
## [7] R.oo_1.27.1 cellranger_1.1.0
## [9] basilisk.utils_1.22.0 graph_1.88.1
## [11] rpart_4.1.24 XML_3.99-0.22
## [13] lifecycle_1.0.5 mixsqp_0.3-54
## [15] pals_1.10 OrganismDbi_1.52.0
## [17] ensembldb_2.34.0 lattice_0.22-9
## [19] MASS_7.3-65 backports_1.5.0
## [21] magrittr_2.0.4 Hmisc_5.2-5
## [23] openxlsx_4.2.8.1 sass_0.4.10
## [25] rmarkdown_2.30 jquerylib_0.1.4
## [27] yaml_2.3.12 otel_0.2.0
## [29] zip_2.3.3 reticulate_1.45.0
## [31] ggbio_1.58.0 gld_2.6.8
## [33] mapproj_1.2.12 DBI_1.3.0
## [35] RColorBrewer_1.1-3 maps_3.4.3
## [37] abind_1.4-8 expm_1.0-0
## [39] GenomicRanges_1.62.1 purrr_1.2.1
## [41] R.utils_2.13.0 AnnotationFilter_1.34.0
## [43] biovizBase_1.58.0 BiocGenerics_0.56.0
## [45] RCurl_1.98-1.17 nnet_7.3-20
## [47] VariantAnnotation_1.56.0 IRanges_2.44.0
## [49] S4Vectors_0.48.0 echofinemap_1.0.0
## [51] echoLD_0.99.12 catalogueR_2.0.1
## [53] irlba_2.3.7 pkgdown_2.2.0
## [55] echodata_1.0.0 piggyback_0.1.5
## [57] codetools_0.2-20 DelayedArray_0.36.0
## [59] DT_0.34.0 xml2_1.5.2
## [61] tidyselect_1.2.1 UCSC.utils_1.6.1
## [63] farver_2.1.2 viridis_0.6.5
## [65] matrixStats_1.5.0 stats4_4.5.1
## [67] base64enc_0.1-6 Seqinfo_1.0.0
## [69] echotabix_1.0.0 GenomicAlignments_1.46.0
## [71] jsonlite_2.0.0 e1071_1.7-17
## [73] Formula_1.2-5 survival_3.8-6
## [75] systemfonts_1.3.2 ggnewscale_0.5.2
## [77] tools_4.5.1 ragg_1.5.1
## [79] DescTools_0.99.60 Rcpp_1.1.1
## [81] glue_1.8.0 gridExtra_2.3
## [83] SparseArray_1.10.9 xfun_0.56
## [85] MatrixGenerics_1.22.0 GenomeInfoDb_1.46.2
## [87] dplyr_1.2.0 withr_3.0.2
## [89] BiocManager_1.30.27 fastmap_1.2.0
## [91] basilisk_1.22.0 boot_1.3-32
## [93] digest_0.6.39 R6_2.6.1
## [95] colorspace_2.1-2 textshaping_1.0.5
## [97] dichromat_2.0-0.1 RSQLite_2.4.6
## [99] cigarillo_1.0.0 R.methodsS3_1.8.2
## [101] utf8_1.2.6 tidyr_1.3.2
## [103] generics_0.1.4 data.table_1.18.2.1
## [105] rtracklayer_1.70.1 class_7.3-23
## [107] httr_1.4.8 htmlwidgets_1.6.4
## [109] S4Arrays_1.10.1 pkgconfig_2.0.3
## [111] gtable_0.3.6 Exact_3.3
## [113] blob_1.3.0 S7_0.2.1
## [115] XVector_0.50.0 echoconda_1.0.0
## [117] htmltools_0.5.9 susieR_0.14.2
## [119] bookdown_0.46 RBGL_1.86.0
## [121] ProtGenerics_1.42.0 scales_1.4.0
## [123] Biobase_2.70.0 lmom_3.2
## [125] png_0.1-8 knitr_1.51
## [127] rstudioapi_0.18.0 reshape2_1.4.5
## [129] tzdb_0.5.0 rjson_0.2.23
## [131] checkmate_2.3.4 curl_7.0.0
## [133] proxy_0.4-29 cachem_1.1.0
## [135] stringr_1.6.0 rootSolve_1.8.2.4
## [137] parallel_4.5.1 foreign_0.8-91
## [139] AnnotationDbi_1.72.0 restfulr_0.0.16
## [141] desc_1.4.3 pillar_1.11.1
## [143] grid_4.5.1 reshape_0.8.10
## [145] vctrs_0.7.1 cluster_2.1.8.2
## [147] htmlTable_2.4.3 evaluate_1.0.5
## [149] readr_2.2.0 GenomicFeatures_1.62.0
## [151] mvtnorm_1.3-3 cli_3.6.5
## [153] compiler_4.5.1 Rsamtools_2.26.0
## [155] rlang_1.1.7 crayon_1.5.3
## [157] aws.signature_0.6.0 ieugwasr_1.1.0
## [159] plyr_1.8.9 forcats_1.0.1
## [161] fs_1.6.7 stringi_1.8.7
## [163] coloc_5.2.3 echoannot_1.0.1
## [165] viridisLite_0.4.3 BiocParallel_1.44.0
## [167] Biostrings_2.78.0 lazyeval_0.2.2
## [169] Matrix_1.7-4 downloadR_1.0.0
## [171] echoplot_0.99.9 dir.expiry_1.18.0
## [173] MungeSumstats_1.18.1 BSgenome_1.78.0
## [175] hms_1.1.4 patchwork_1.3.2
## [177] bit64_4.6.0-1 ggplot2_4.0.2
## [179] KEGGREST_1.50.0 SummarizedExperiment_1.40.0
## [181] haven_2.5.5 memoise_2.0.1
## [183] snpStats_1.60.0 bslib_0.10.0
## [185] bit_4.6.0 readxl_1.4.5