echolocatoR will automatically fine-map each locus. Uses the topSNPs data.frame to define locus coordinates.

finemap_loci(
  loci = NULL,
  fullSS_path,
  fullSS_genome_build = NULL,
  results_dir = file.path(tempdir(), "results"),
  dataset_name = "dataset_name",
  dataset_type = "GWAS",
  topSNPs = "auto",
  force_new_subset = FALSE,
  force_new_LD = FALSE,
  force_new_finemap = FALSE,
  finemap_methods = c("ABF", "FINEMAP", "SUSIE"),
  finemap_args = NULL,
  n_causal = 5,
  credset_thresh = 0.95,
  consensus_thresh = 2,
  fillNA = 0,
  conditioned_snps = "auto",
  priors_col = NULL,
  munged = FALSE,
  colmap = echodata::construct_colmap(munged = munged),
  compute_n = "ldsc",
  LD_reference = "1KGphase3",
  LD_genome_build = "hg19",
  leadSNP_LD_block = FALSE,
  superpopulation = "EUR",
  download_method = "axel",
  bp_distance = 5e+05,
  min_POS = NA,
  max_POS = NA,
  min_MAF = NA,
  trim_gene_limits = FALSE,
  max_snps = NULL,
  min_r2 = 0,
  remove_variants = FALSE,
  remove_correlates = FALSE,
  query_by = "tabix",
  case_control = TRUE,
  qtl_suffixes = NULL,
  plot_types = c("simple"),
  show_plot = TRUE,
  zoom = "1x",
  tx_biotypes = NULL,
  nott_epigenome = FALSE,
  nott_show_placseq = FALSE,
  nott_binwidth = 200,
  nott_bigwig_dir = NULL,
  xgr_libnames = NULL,
  roadmap = FALSE,
  roadmap_query = NULL,
  remove_tmps = TRUE,
  conda_env = "echoR_mini",
  return_all = TRUE,
  use_tryCatch = TRUE,
  seed = 2022,
  nThread = 1,
  verbose = TRUE,
  top_SNPs = deprecated(),
  PP_threshold = deprecated(),
  consensus_threshold = deprecated(),
  plot.Nott_epigenome = deprecated(),
  plot.Nott_show_placseq = deprecated(),
  plot.Nott_binwidth = deprecated(),
  plot.Nott_bigwig_dir = deprecated(),
  plot.Roadmap = deprecated(),
  plot.Roadmap_query = deprecated(),
  plot.XGR_libnames = deprecated(),
  server = deprecated(),
  plot.types = deprecated(),
  plot.zoom = deprecated(),
  QTL_prefixes = deprecated(),
  vcf_folder = deprecated(),
  probe_path = deprecated(),
  file_sep = deprecated(),
  chrom_col = deprecated(),
  chrom_type = deprecated(),
  position_col = deprecated(),
  snp_col = deprecated(),
  pval_col = deprecated(),
  effect_col = deprecated(),
  stderr_col = deprecated(),
  tstat_col = deprecated(),
  locus_col = deprecated(),
  freq_col = deprecated(),
  MAF_col = deprecated(),
  A1_col = deprecated(),
  A2_col = deprecated(),
  gene_col = deprecated(),
  N_cases_col = deprecated(),
  N_controls_col = deprecated(),
  N_cases = deprecated(),
  N_controls = deprecated(),
  proportion_cases = deprecated(),
  sample_size = deprecated(),
  PAINTOR_QTL_datasets = deprecated()
)

Arguments

loci

Character list of loci in Locus col of topSNPs.

fullSS_path

Path to the full summary statistics file (GWAS or QTL) that you want to fine-map. It is usually best to provide the absolute path rather than the relative path.

fullSS_genome_build

Genome build of the full summary statistics (fullSS_path). Can be "GRCH37" or "GRCH38" or one of their synonyms.. If fullSS_genome_build==NULL and munged=TRUE, infers genome build (hg19 vs. hg38) from summary statistics using get_genome_builds.

results_dir

Where to store all results. IMPORTANT!: It is usually best to provide the absolute path rather than the relative path. This is especially important for FINEMAP.

dataset_name

The name you want to assign to the dataset being fine-mapped, This will be used to name the subdirectory where your results will be stored (e.g. Data/GWAS/<dataset_name>). Don't use special characters (e.g.".", "/").

dataset_type

The kind dataset you're fine-mapping (e.g. GWAS, eQTL, tQTL). This will also be used when creating the subdirectory where your results will be stored (e.g. Data/<dataset_type>/Kunkle_2019).

topSNPs

A data.frame with the genomic coordinates of the lead SNP for each locus. The lead SNP will be used as the center of the window when extracting subset from the full GWAS/QTL summary statistics file. Only one SNP per Locus should be included. At minimum, topSNPs should include the following columns:

Locus

A unique name for each locus. Often, loci are named after a relevant gene (e.g. LRRK2) or based on the name/coordinates of the lead SNP (e.g. locus_chr12_40734202)

CHR

The chromosome that the SNP is on. Can be "chr12" or "12" format.

POS

The genomic position of the SNP (in basepairs)

force_new_subset

By default, if a subset of the full summary stats file for a given locus is already present, then echolocatoR will just use the pre-existing file. Set force_new_subset=T to override this and extract a new subset. Subsets are saved in the following path structure: Data/\<dataset_type\>/\<dataset_name\>/\<locus\>/Multi-finemap/ \<locus\>_\<dataset_name\>_Multi-finemap.tsv.gz

force_new_LD

Force new LD subset.

force_new_finemap

By default, if an fine-mapping results file for a given locus is already present, then echolocatoR will just use the preexisting file. Set force_new_finemap=T to override this and re-run fine-mapping.

finemap_methods

Which fine-mapping methods you want to use.

finemap_args

A named nested list containing additional arguments for each fine-mapping method. e.g. finemap_args = list(FINEMAP=list(), PAINTOR=list(method=""))

n_causal

The maximum number of potential causal SNPs per locus. This parameter is used somewhat differently by different fine-mapping tools. See tool-specific functions for details.

credset_thresh

The minimum fine-mapped posterior probability for a SNP to be considered part of a Credible Set. For example, credset_thresh=.95 means that all Credible Set SNPs will be 95% Credible Set SNPs.

consensus_thresh

The minimum number of fine-mapping tools in which a SNP is in the Credible Set in order to be included in the "Consensus_SNP" column.

fillNA

Value to fill LD matrix NAs with.

conditioned_snps

Which SNPs to conditions on when fine-mapping with (e.g. COJO).

priors_col

[Optional] Name of the a column in dat to extract SNP-wise prior probabilities from.

munged

Whether fullSS_path have already been standardised/filtered full summary stats with format_sumstats. If munged=FALSE you'll need to provide the necessary column names to the colmap argument.

colmap

Column name mappings in in fullSS_path. Must be a named list. Can use construct_colmap to assist with this. This function can be used in two different ways:

  • munged=FALSE : When munged=FALSE, you will need to provide the necessary column names to the colmap argument (default).

  • munged=TRUE : Alternatively, instead of filling out each argument in construct_colmap, you can simply set munged=TRUE if fullSS_path has already been munged with format_sumstats.

compute_n

How to compute per-SNP sample size (new column "N").
If the column "N" is already present in dat, this column will be used to extract per-SNP sample sizes and the argument compute_n will be ignored.
If the column "N" is not present in dat, one of the following options can be supplied to compute_n:

  • 0: N will not be computed.

  • >0: If any number >0 is provided, that value will be set as N for every row. **Note**: Computing N this way is incorrect and should be avoided if at all possible.

  • "sum": N will be computed as: cases (N_CAS) + controls (N_CON), so long as both columns are present.

  • "ldsc": N will be computed as effective sample size: Neff =(N_CAS+N_CON)*(N_CAS/(N_CAS+N_CON)) / mean((N_CAS/(N_CAS+N_CON))(N_CAS+N_CON)==max(N_CAS+N_CON)).

  • "giant": N will be computed as effective sample size: Neff = 2 / (1/N_CAS + 1/N_CON).

  • "metal": N will be computed as effective sample size: Neff = 4 / (1/N_CAS + 1/N_CON).

LD_reference

LD reference to use:

  • "1KGphase1" : 1000 Genomes Project Phase 1 (genome build: hg19).

  • "1KGphase3" : 1000 Genomes Project Phase 3 (genome build: hg19).

  • "UKB" : Pre-computed LD from a British European-decent subset of UK Biobank. Genome build : hg19

  • "<vcf_path>" : User-supplied path to a custom VCF file to compute LD matrix from.
    Accepted formats: .vcf / .vcf.gz / .vcf.bgz
    Genome build : defined by user with target_genome.

  • "<matrix_path>" : User-supplied path to a pre-computed LD matrix Accepted formats: .rds / .rda / .csv / .tsv / .txt
    Genome build : defined by user with target_genome.

LD_genome_build

Genome build of the LD panel. This is automatically assigned to the correct genome build for each LD panel except when the user supplies custom vcf/LD files.

leadSNP_LD_block

Only return SNPs within the same LD block as the lead SNP (the SNP with the smallest p-value).

superpopulation

Superpopulation to subset LD panel by (used only if LD_reference is "1KGphase1" or "1KGphase3"). See popDat_1KGphase1 and popDat_1KGphase3 for full tables of their respective samples.

download_method
  • "axel" : Multi-threaded

  • "wget" : Single-threaded

  • "download.file" : Single-threaded

  • "internal" : Single-threaded (passed to download.file)

  • "wininet" : Single-threaded (passed to download.file)

  • "libcurl" : Single-threaded (passed to download.file)

  • "curl" : Single-threaded (passed to download.file)

bp_distance

Distance around the lead SNP to include.

min_POS

Minimum genomic position to include.

max_POS

Maximum genomic position to include.

min_MAF

Minimum Minor Allele Frequency (MAF) of SNPs to include.

trim_gene_limits

If a gene name is supplied to this argument (e.g. trim_gene_limits="BST"), only SNPs within the gene body will be included.

max_snps

Maximum number of SNPs to include.

min_r2

Correlation threshold for remove_correlates.

remove_variants

A list of SNP RSIDs to remove.

remove_correlates

A list of SNPs. If provided, all SNPs that correlates with these SNPs (at r2>=min_r2) will be removed from both dat and LD list items..

query_by

Choose which method you want to use to extract locus subsets from the full summary stats file. Methods include:

"tabix"

Convert the full summary stats file in an indexed tabix file. Makes querying lightning fast after the initial conversion is done. (default)

"coordinates"

Extract locus subsets using min/max genomic coordinates with awk.

case_control

Whether the summary statistics come from a case-control study (e.g. a GWAS of having Alzheimer's Disease or not) (TRUE) or a quantitative study (e.g. a GWAS of height, or an eQTL) (FALSE).

qtl_suffixes

If columns with QTL data is included in dat, you can indicate which columns those are with one or more string suffixes (e.g. qtl_suffixes=c(".eQTL1",".eQTL2") to use the columns "P.QTL1", "Effect.QTL1", "P.QTL2", "Effect.QTL2").

plot_types

Which kinds of plots to include. Options:

  • "simple"Just plot the following tracks: GWAS, fine-mapping, gene models

  • "fancy"Additionally plot XGR annotation tracks (XGR, Roadmap, Nott2019). '

  • "LD"LD heatmap showing the 10 SNPs surrounding the lead SNP.

show_plot

Print plot to screen.

zoom

Zoom into the center of the locus when plotting (without editing the fine-mapping results file). You can provide either:

  • The size of your plot window in terms of basepairs (e.g. zoom=50000 for a 50kb window).

  • How much you want to zoom in (e.g. zoom="1x" for the full locus, zoom="2x" for 2x zoom into the center of the locus, etc.).

You can pass a list of window sizes (e.g. c(50000,100000,500000)) to automatically generate multiple views of each locus. This can even be a mix of different style inputs: e.g. c("1x","4.5x",25000).

tx_biotypes

Transcript biotypes to include in the gene model track. By default (NULL), all transcript biotypes will be included. See get_tx_biotypes for a full list of all available biotypes

nott_epigenome

Include tracks showing brain cell-type-specific epigenomic data from Nott et al. (2019).

nott_show_placseq

Include track generated by NOTT2019_plac_seq_plot.

nott_binwidth

When including Nott et al. (2019) epigenomic data in the track plots, adjust the bin width of the histograms.

nott_bigwig_dir

Instead of pulling Nott et al. (2019) epigenomic data from the UCSC Genome Browser, use a set of local bigwig files.

xgr_libnames

Passed to XGR_plot. Which XGR annotations to check overlap with. For full list of libraries see here. Passed to the RData.customised argument in xRDataLoader. Examples:

  • "ENCODE_TFBS_ClusteredV3_CellTypes"

  • "ENCODE_DNaseI_ClusteredV3_CellTypes"

  • "Broad_Histone"

roadmap

Find and plot annotations from Roadmap.

roadmap_query

Only plot annotations from Roadmap whose metadata contains a string or any items from a list of strings (e.g. "brain" or c("brain","liver","monocytes")).

remove_tmps

Whether to remove any temporary files (e.g. FINEMAP output files) after the pipeline is done running.

conda_env

Conda environment to use.

return_all

Return a nested list of various the pipeline's outputs including plots, tables, and file paths (default: TRUE). If FALSE, instead only returns a single merged data.table containing the results from all loci.

use_tryCatch

If an error is encountered in one locus, the pipeline will continue to try running the rest of the loci (default: use_tryCatch=TRUE). This avoid stopping all analyses due to errors that only affect some loci, but currently prevents debugging via traceback.

seed

Set the seed for all functions where this is possible.

nThread

Number of threads to parallelise saving across.

verbose

Print messages.

top_SNPs

[deprecated]

PP_threshold

[deprecated]

consensus_threshold

[deprecated]

plot.Nott_epigenome

[deprecated]

plot.Nott_show_placseq

[deprecated]

plot.Nott_binwidth

[deprecated]

plot.Nott_bigwig_dir

[deprecated]

plot.Roadmap

[deprecated]

plot.Roadmap_query

[deprecated]

plot.XGR_libnames

[deprecated]

server

[deprecated]

plot.types

[deprecated]

plot.zoom

[deprecated]

QTL_prefixes

[deprecated]

vcf_folder

[deprecated]

probe_path

[deprecated]

file_sep

[deprecated]

chrom_col

[deprecated]

chrom_type

[deprecated]

position_col

[deprecated]

snp_col

[deprecated]

pval_col

[deprecated]

effect_col

[deprecated]

stderr_col

[deprecated]

tstat_col

[deprecated]

locus_col

[deprecated]

freq_col

[deprecated]

MAF_col

[deprecated]

A1_col

[deprecated]

A2_col

[deprecated]

gene_col

[deprecated]

N_cases_col

[deprecated]

N_controls_col

[deprecated]

N_cases

[deprecated]

N_controls

[deprecated]

proportion_cases

[deprecated]

sample_size

[deprecated]

PAINTOR_QTL_datasets

[deprecated]

Value

By default, returns a nested list containing grouped by locus names (e.g. BST1, MEX3C). The results of each locus contain the following elements:

  • finemap_dat : Fine-mapping results from all selected methods merged with the original summary statistics (i.e. Multi-finemap results).

  • locus_plot : A nested list containing one or more zoomed views of locus plots.

  • LD_matrix : The post-processed LD matrix used for fine-mapping.

  • LD_plot : An LD plot (if used).

  • locus_dir : Locus directory results are saved in.

  • arguments : A record of the arguments supplied to finemap_loci.

In addition, the following object summarizes the results from all the locus-specific results:

  • merged_dat : A merged data.table with all fine-mapping results from all loci.

See also

Other MAIN: finemap_locus()

Examples

topSNPs <- echodata::topSNPs_Nalls2019
fullSS_path <- echodata::example_fullSS(dataset = "Nalls2019")
#> Writing file to ==> /tmp/Rtmp0dbVNr/nalls2019.fullSS_subset.tsv

res <- echolocatoR::finemap_loci(
  fullSS_path = fullSS_path,
  topSNPs = topSNPs,
  loci = c("BST1","MEX3C"),
  finemap_methods = c("ABF","FINEMAP","SUSIE"),
  dataset_name = "Nalls23andMe_2019",
  fullSS_genome_build = "hg19",
  bp_distance = 1000,
  munged = TRUE)
#> [1] "+ Assigning Gene and Locus independently."
#> Standardising column headers.
#> First line of summary statistics file: 
#> Locus	Gene	CHR	POS	SNP	P	Effect	
#> Returning unmapped column names without making them uppercase.
#> + Mapping colnames from MungeSumstats ==> echolocatoR
#> ┌────────────────────────────────────────┐
#> │                                        │
#>)))> 🦇 BST1 [locus 1 / 2] 🦇 <(((
#> │                                        │
#> └────────────────────────────────────────┘
#> 
#> ────────────────────────────────────────────────────────────────────────────────
#> 
#> ── Step 1 ▶▶▶ Query 🔎 ─────────────────────────────────────────────────────────
#> 
#> ────────────────────────────────────────────────────────────────────────────────
#> + Query Method: tabix
#> Constructing GRanges query using min/max ranges within a single chromosome.
#> query_dat is already a GRanges object. Returning directly.
#> ========= echotabix::convert =========
#> Converting full summary stats file to tabix format for fast querying.
#> Inferred format: 'table'
#> Explicit format: 'table'
#> Inferring comment_char from tabular header: 'SNP'
#> Determining chrom type from file header.
#> Assuming fullSS_path summary stats have already been processed with MungeSumstats.
#> Chromosome format: 1
#> Detecting column delimiter.
#> Identified column separator: \t
#> Sorting rows by coordinates via bash.
#> Searching for header row with grep.
#> ( grep ^'SNP' .../nalls2019.fullSS_subset.tsv; grep
#>     -v ^'SNP' .../nalls2019.fullSS_subset.tsv | sort
#>     -k2,2n
#>     -k3,3n ) > .../file132597ef28cad_sorted.tsv
#> Constructing outputs
#> bgzipping file with Rsamtools.
#> Reading bgzipped file using: data.table
#> Header preview:
#>           SNP CHR       BP A1 A2   FREQ    BETA     SE      P N_CAS   N_CON
#> 1:   rs741214   4 13737637  C  A 0.2341  0.0091 0.0119 0.4460 49053 1411006
#> 2: rs79385021   4 13737659  T  C 0.9109  0.0009 0.0198 0.9642 30435  974587
#> 3:   rs741215   4 13737767  C  T 0.0733  0.0190 0.0194 0.3284 49053 1411006
#> 4:   rs759261   4 13737915  G  A 0.4511  0.0075 0.0100 0.4543 49053 1411006
#> 5: rs35006360   4 13738014  G  A 0.1148 -0.0195 0.0160 0.2208 49053 1411006
#> Tabix-indexing file using: Rsamtools
#> Data successfully converted to bgzip-compressed, tabix-indexed format.
#> ========= echotabix::query =========
#> query_dat is already a GRanges object. Returning directly.
#> Inferred format: 'table'
#> Querying tabular tabix file using: Rsamtools.
#> Checking query chromosome style is correct.
#> Chromosome format: 1
#> Retrieving data.
#> Converting query results to data.table.
#> Processing query: 4:15736348-15738348
#> Adding 'query' column to results.
#> Retrieved data with 17 rows
#> Saving query ==> /tmp/Rtmp0dbVNr/results/GWAS/Nalls23andMe_2019/BST1/BST1_Nalls23andMe_2019_subset.tsv.gz
#> + Query: 17 SNPs x 11 columns.
#> Standardizing summary statistics subset.
#> Standardizing main column names.
#> ++ Preparing A1,A1 cols
#> ++ Preparing MAF,Freq cols.
#> ++ Could not infer MAF from Freq.
#> ++ Preparing N_cases,N_controls cols.
#> ++ Preparing proportion_cases col.
#> ++ Calculating proportion_cases from N_cases and N_controls.
#> Preparing sample size column (N).
#> Computing effective sample size using the LDSC method:
#>  Neff = (N_CAS+N_CON) * (N_CAS/(N_CAS+N_CON)) / mean((N_CAS/(N_CAS+N_CON))[(N_CAS+N_CON)==max(N_CAS+N_CON)]))
#> + Mapping colnames from MungeSumstats ==> echolocatoR
#> + Imputing t-statistic from Effect and StdErr.
#> + leadSNP missing. Assigning new one by min p-value.
#> ++ Ensuring Effect,StdErr,P are numeric.
#> ++ Ensuring 1 SNP per row and per genomic coordinate.
#> ++ Removing extra whitespace
#> + Standardized query: 17 SNPs x 15 columns.
#> ++ Saving standardized query ==> /tmp/Rtmp0dbVNr/results/GWAS/Nalls23andMe_2019/BST1/BST1_Nalls23andMe_2019_subset.tsv.gz
#> 
#> ────────────────────────────────────────────────────────────────────────────────
#> 
#> ── Step 2 ▶▶▶ Extract Linkage Disequilibrium 🔗 ────────────────────────────────
#> 
#> ────────────────────────────────────────────────────────────────────────────────
#> LD_reference identified as: 1kg.
#> Using 1000Genomes as LD reference panel.
#> Constructing GRanges query using min/max ranges across one or more chromosomes.
#> + as_blocks=TRUE: Will query a single range per chromosome that covers all regions requested (plus anything in between).
#> LD Reference Panel = 1KGphase3
#> Querying 1KG remote server.
#> Selecting 503 EUR individuals from 1kgphase3.
#> ========= echotabix::query =========
#> query_dat is already a GRanges object. Returning directly.
#> Explicit format: 'vcf'
#> Querying VCF tabix file.
#> Querying VCF file using: VariantAnnotation
#> Checking query chromosome style is correct.
#> Chromosome format: 1
#> Filtering query to 503 samples and returning ScanVcfParam object.
#> Retrieving data.
#> Time difference of 3.7 secs
#> Removing 17 / 66 non-overlapping SNPs.
#> Saving VCF subset ==> /tmp/Rtmp0dbVNr/VCF/Rtmp0dbVNr.chr4-15736493-15738253.ALL.chr4.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.bgz
#> Warning: converting NULL pointer to R NULL
#> Time difference of 0.4 secs
#> Retrieved data with 17 rows across 503 samples.
#> echoLD::snpStats:: Filling `MAF` column with MAF from LD panel.
#> echoLD:snpStats:: Computing pairwise LD between 17 SNPs across 503 individuals (stats = R).
#> Time difference of 0 secs
#> 17 x 17 LD_matrix (sparse)
#> Converting obj to sparseMatrix.
#> Saving sparse LD matrix ==> /tmp/Rtmp0dbVNr/results/GWAS/Nalls23andMe_2019/BST1/LD/BST1.1KGphase3_LD.RDS
#> Removing 1 temp files.
#> + FILTER:: Filtering by LD features.
#> 
#> ────────────────────────────────────────────────────────────────────────────────
#> 
#> ── Step 3 ▶▶▶ Filter SNPs 🚰 ───────────────────────────────────────────────────
#> 
#> ────────────────────────────────────────────────────────────────────────────────
#> FILTER:: Filtering by SNP features.
#> + FILTER:: Post-filtered data: 17 x 16
#> + Subsetting LD matrix and dat to common SNPs...
#> Removing unnamed rows/cols
#> Replacing NAs with 0
#> + LD_matrix = 17 SNPs.
#> + dat = 17 SNPs.
#> + 17 SNPs in common.
#> Converting obj to sparseMatrix.
#> 
#> ────────────────────────────────────────────────────────────────────────────────
#> 
#> ── Step 4 ▶▶▶ Fine-map 🔊 ──────────────────────────────────────────────────────
#> 
#> ────────────────────────────────────────────────────────────────────────────────
#> Gathering method sources.
#> Gathering method citations.
#> Preparing sample size column (N).
#> Using existing 'N' column.
#> Gathering method sources.
#> Gathering method citations.
#> ABF
#> ✅ All required columns present.
#> FINEMAP
#> ✅ All required columns present.
#> ✅ All optional columns present.
#> SUSIE
#> ✅ All required columns present.
#> ✅ All optional columns present.
#> ++ Fine-mapping using 3 tool(s): ABF, FINEMAP, SUSIE
#> 
#> +++ Multi-finemap:: ABF +++
#> Using all default values for finemap_args$ABF
#> Preparing sample size column (N).
#> Using existing 'N' column.
#> Running ABF.
#> ++ Credible Set SNPs identified = 0
#> ++ Merging ABF results with multi-finemap data.
#> 
#> +++ Multi-finemap:: FINEMAP +++
#> Using all default values for finemap_args$FINEMAP
#> Preparing sample size column (N).
#> Using existing 'N' column.
#> + Subsetting LD matrix and dat to common SNPs...
#> Removing unnamed rows/cols
#> Replacing NAs with 0
#> + LD_matrix = 17 SNPs.
#> + dat = 17 SNPs.
#> + 17 SNPs in common.
#> Converting obj to sparseMatrix.
#> Constructing master file.
#> Constructing data.z file.
#> Constructing data.ld file.
#> FINEMAP path: /github/home/.cache/R/echofinemap/FINEMAP/finemap_v1.4.1_x86_64/finemap_v1.4.1_x86_64
#> Inferred FINEMAP version: 1.4.1
#> Running FINEMAP.
#> cd .../BST1 &&
#>     .../finemap_v1.4.1_x86_64
#>    
#>     --sss
#>    
#>     --in-files .../master
#>    
#>     --log
#>    
#>     --n-causal-snps 5
#> 
#> |--------------------------------------|
#> | Welcome to FINEMAP v1.4.1            |
#> |                                      |
#> | (c) 2015-2022 University of Helsinki |
#> |                                      |
#> | Help :                               |
#> | - ./finemap --help                   |
#> | - www.finemap.me                     |
#> | - www.christianbenner.com            |
#> |                                      |
#> | Contact :                            |
#> | - finemap@christianbenner.com        |
#> | - matti.pirinen@helsinki.fi          |
#> |--------------------------------------|
#> 
#> --------
#> SETTINGS
#> --------
#> - dataset            : all
#> - corr-config        : 0.95
#> - n-causal-snps      : 5
#> - n-configs-top      : 50000
#> - n-conv-sss         : 100
#> - n-iter             : 100000
#> - n-threads          : 1
#> - prior-k0           : 0
#> - prior-std          : 0.05 
#> - prob-conv-sss-tol  : 0.001
#> - prob-cred-set      : 0.95
#> 
#> ------------
#> FINE-MAPPING (1/1)
#> ------------
#> - GWAS summary stats               : FINEMAP/data.z
#> - SNP correlations                 : FINEMAP/data.ld
#> - Causal SNP stats                 : FINEMAP/data.snp
#> - Causal configurations            : FINEMAP/data.config
#> - Credible sets                    : FINEMAP/data.cred
#> - Log file                         : FINEMAP/data.log_sss
#> 
- Reading summary statistics       : +
- Reading summary statistics       : -
- Reading summary statistics       : +
- Reading summary stats            : done!   
- Reading SNP correlations         : 0.000%
- Reading SNP correlations         : done!   
- Reading input                    : done!   
#> 
#> - Number of GWAS samples           : 1474097
#> - Number of SNPs                   : 17
#> - Prior-Pr(# of causal SNPs is k)  : 
#>   (0 -> 0)
#>    1 -> 0.59
#>    2 -> 0.295
#>    3 -> 0.0921
#>    4 -> 0.0202
#>    5 -> 0.00327
#> 0 configurations evaluated (0.000/100%)
- 17 configurations evaluated (0.000/100%)
- 31 configurations evaluated (0.000/100%)
- 260 configurations evaluated (0.050/100%)
- 370 configurations evaluated (0.100/100%)
- 670 configurations evaluated (0.150/100%)
- 686 configurations evaluated (0.200/100%)
- 701 configurations evaluated (0.225/100%) : converged after 225 iterations
#> 
- Computing causal SNP statistics  : 0.143%
- Computing causal SNP statistics  : 7.275%
- Computing causal SNP statistics  : 14.408%
- Computing causal SNP statistics  : 21.541%
- Computing causal SNP statistics  : 28.673%
- Computing causal SNP statistics  : 35.806%
- Computing causal SNP statistics  : 42.939%
- Computing causal SNP statistics  : 50.071%
- Computing causal SNP statistics  : 57.204%
- Computing causal SNP statistics  : 64.337%
- Computing causal SNP statistics  : 71.469%
- Computing causal SNP statistics  : 78.602%
- Computing causal SNP statistics  : 85.735%
- Computing causal SNP statistics  : 92.867%
- Computing causal SNP statistics  : 100.000%
- Computing causal SNP statistics  : done!   
#> - Regional SNP heritability        : 9.66e-05 (SD: 1.54e-05 ; 95% CI: [6.86e-05,0.000129])
#> - Log10-BF of >= one causal SNP    : 187
#> - Post-expected # of causal SNPs   : 5
#> - Post-Pr(# of causal SNPs is k)   : 
#>   (0 -> 0)
#>    1 -> 1.38e-164
#>    2 -> 4.49e-163
#>    3 -> 2.41e-163
#>    4 -> 3.98e-163
#>    5 -> 1
#> - Computing credible sets          : 
- Writing causal configurations    : 0.143%
- Writing causal configurations    : 7.275%
- Writing causal configurations    : 14.408%
- Writing causal configurations    : 21.541%
- Writing causal configurations    : 28.673%
- Writing causal configurations    : 35.806%
- Writing causal configurations    : 42.939%
- Writing causal configurations    : 50.071%
- Writing causal configurations    : 57.204%
- Writing causal configurations    : 64.337%
- Writing causal configurations    : 71.469%
- Writing causal configurations    : 78.602%
- Writing causal configurations    : 85.735%
- Writing causal configurations    : 92.867%
- Writing causal configurations    : 100.000%
- Writing causal SNP statistics    : 5.882%
- Writing causal SNP statistics    : 64.706%
- Writing output                   : done!   
#> - Run time                         : 0 hours, 0 minutes, 0 seconds
#> 1 data.cred* file(s) found in the same subfolder.
#> Selected file based on postPr_k: data.cred5
#> Importing conditional probabilities (.cred file).
#> Importing marginal probabilities (.snp file).
#> Importing configuration probabilities (.config file).
#> ++ Credible Set SNPs identified = 12
#> ++ Merging FINEMAP results with multi-finemap data.
#> 
#> +++ Multi-finemap:: SUSIE +++
#> Using all default values for finemap_args$SUSIE
#> Loading required namespace: Rfast
#> Preparing sample size column (N).
#> Using existing 'N' column.
#> + SUSIE:: sample_size=1,474,097
#> + Subsetting LD matrix and dat to common SNPs...
#> Removing unnamed rows/cols
#> Replacing NAs with 0
#> + LD_matrix = 17 SNPs.
#> + dat = 17 SNPs.
#> + 17 SNPs in common.
#> Converting obj to sparseMatrix.
#> + SUSIE:: Using `susie_rss()` from susieR v0.12.27
#> + SUSIE:: Extracting Credible Sets.
#> ++ Credible Set SNPs identified = 0
#> ++ Merging SUSIE results with multi-finemap data.
#> Identifying Consensus SNPs...
#> + support_thresh = 2
#> + Calculating mean Posterior Probability (mean.PP)...
#> + 3 fine-mapping methods used.
#> + 12 Credible Set SNPs identified.
#> + 0 Consensus SNPs identified.
#> Saving merged finemapping results ==> /tmp/Rtmp0dbVNr/results/GWAS/Nalls23andMe_2019/BST1/Multi-finemap/1KGphase3_LD.Multi-finemap.tsv.gz
#> + Fine-mapping with 'ABF, FINEMAP, SUSIE' completed:
#> 
#> ────────────────────────────────────────────────────────────────────────────────
#> 
#> ── Step 5 ▶▶▶ Plot 📈 ──────────────────────────────────────────────────────────
#> 
#> ────────────────────────────────────────────────────────────────────────────────
#> +-------- Locus Plot:  BST1 --------+
#> + support_thresh = 2
#> + Calculating mean Posterior Probability (mean.PP)...
#> + 3 fine-mapping methods used.
#> + 12 Credible Set SNPs identified.
#> + 0 Consensus SNPs identified.
#> + Filling NAs in CS cols with 0.
#> + Filling NAs in PP cols with 0.
#> LD_matrix detected. Coloring SNPs by LD with lead SNP.
#> ++ echoplot:: GWAS full window track
#> ++ echoplot:: GWAS track
#> ++ echoplot:: Merged fine-mapping track
#> Melting PP and CS from 4 fine-mapping methods.
#> + echoplot:: Constructing SNP labels.
#> Adding SNP group labels to locus plot.
#> ++ echoplot:: Adding Gene model track.
#> Converting dat to GRanges object.
#> Loading required namespace: EnsDb.Hsapiens.v75
#> max_transcripts= 1 . 
#> 1  transcripts from  1  genes returned.
#> Fetching data...
#> OK
#> Parsing exons...
#> OK
#> Defining introns...
#> OK
#> Defining UTRs...
#> OK
#> Defining CDS...
#> OK
#> aggregating...
#> Done
#> Constructing graphics...
#> + Adding vertical lines to highlight SNP groups.
#> +>+>+>+>+ zoom =  1x  +<+<+<+<+
#> + echoplot:: Get window suffix...
#> + echoplot:: Removing GWAS full window track @ zoom=1x
#> + Removing subplot margins...
#> + Reordering tracks...
#> + Ensuring last track shows genomic units.
#> + Aligning xlimits for each subplot...
#> + Checking track heights...
#> + echoplot:: Saving plot ==> /tmp/Rtmp0dbVNr/results/GWAS/Nalls23andMe_2019/BST1/multiview.BST1.1KGphase3.1x.png
#> Found more than one class "simpleUnit" in cache; using the first, from namespace 'hexbin'
#> Also defined by 'ggbio'
#> Recording all `finemap_locus` arguments.
#> Formatting locus results.
#> ┌─────────────────────────────────────────┐
#> │                                         │
#>)))> 🦇 MEX3C [locus 2 / 2] 🦇 <(((
#> │                                         │
#> └─────────────────────────────────────────┘
#> 
#> ────────────────────────────────────────────────────────────────────────────────
#> 
#> ── Step 1 ▶▶▶ Query 🔎 ─────────────────────────────────────────────────────────
#> 
#> ────────────────────────────────────────────────────────────────────────────────
#> + Query Method: tabix
#> Constructing GRanges query using min/max ranges within a single chromosome.
#> query_dat is already a GRanges object. Returning directly.
#> ========= echotabix::convert =========
#> Converting full summary stats file to tabix format for fast querying.
#> Inferred format: 'table'
#> Explicit format: 'table'
#> Inferring comment_char from tabular header: 'SNP'
#> Determining chrom type from file header.
#> Chromosome format: 1
#> Detecting column delimiter.
#> Identified column separator: \t
#> Sorting rows by coordinates via bash.
#> Searching for header row with grep.
#> ( grep ^'SNP' .../nalls2019.fullSS_subset.tsv; grep
#>     -v ^'SNP' .../nalls2019.fullSS_subset.tsv | sort
#>     -k2,2n
#>     -k3,3n ) > .../file13259ad17dc8_sorted.tsv
#> Constructing outputs
#> Using existing bgzipped file: /tmp/Rtmp0dbVNr/nalls2019.fullSS_subset.tsv.bgz 
#> Set force_new=TRUE to override this.
#> Tabix-indexing file using: Rsamtools
#> Data successfully converted to bgzip-compressed, tabix-indexed format.
#> ========= echotabix::query =========
#> query_dat is already a GRanges object. Returning directly.
#> Inferred format: 'table'
#> Querying tabular tabix file using: Rsamtools.
#> Checking query chromosome style is correct.
#> Chromosome format: 1
#> Retrieving data.
#> Converting query results to data.table.
#> Processing query: 18:48682589-48684589
#> Adding 'query' column to results.
#> Retrieved data with 7 rows
#> Saving query ==> /tmp/Rtmp0dbVNr/results/GWAS/Nalls23andMe_2019/MEX3C/MEX3C_Nalls23andMe_2019_subset.tsv.gz
#> + Query: 7 SNPs x 11 columns.
#> Standardizing summary statistics subset.
#> Standardizing main column names.
#> ++ Preparing A1,A1 cols
#> ++ Preparing MAF,Freq cols.
#> ++ Could not infer MAF from Freq.
#> ++ Preparing N_cases,N_controls cols.
#> ++ Preparing proportion_cases col.
#> ++ Calculating proportion_cases from N_cases and N_controls.
#> Preparing sample size column (N).
#> Computing effective sample size using the LDSC method:
#>  Neff = (N_CAS+N_CON) * (N_CAS/(N_CAS+N_CON)) / mean((N_CAS/(N_CAS+N_CON))[(N_CAS+N_CON)==max(N_CAS+N_CON)]))
#> + Mapping colnames from MungeSumstats ==> echolocatoR
#> + Imputing t-statistic from Effect and StdErr.
#> + leadSNP missing. Assigning new one by min p-value.
#> ++ Ensuring Effect,StdErr,P are numeric.
#> ++ Ensuring 1 SNP per row and per genomic coordinate.
#> ++ Removing extra whitespace
#> + Standardized query: 7 SNPs x 15 columns.
#> ++ Saving standardized query ==> /tmp/Rtmp0dbVNr/results/GWAS/Nalls23andMe_2019/MEX3C/MEX3C_Nalls23andMe_2019_subset.tsv.gz
#> 
#> ────────────────────────────────────────────────────────────────────────────────
#> 
#> ── Step 2 ▶▶▶ Extract Linkage Disequilibrium 🔗 ────────────────────────────────
#> 
#> ────────────────────────────────────────────────────────────────────────────────
#> LD_reference identified as: 1kg.
#> Using 1000Genomes as LD reference panel.
#> Constructing GRanges query using min/max ranges across one or more chromosomes.
#> + as_blocks=TRUE: Will query a single range per chromosome that covers all regions requested (plus anything in between).
#> LD Reference Panel = 1KGphase3
#> Querying 1KG remote server.
#> Selecting 503 EUR individuals from 1kgphase3.
#> ========= echotabix::query =========
#> query_dat is already a GRanges object. Returning directly.
#> Explicit format: 'vcf'
#> Querying VCF tabix file.
#> Querying VCF file using: VariantAnnotation
#> Checking query chromosome style is correct.
#> Chromosome format: 1
#> Filtering query to 503 samples and returning ScanVcfParam object.
#> Retrieving data.
#> Time difference of 3.5 secs
#> Removing 7 / 38 non-overlapping SNPs.
#> Saving VCF subset ==> /tmp/Rtmp0dbVNr/VCF/Rtmp0dbVNr.chr18-48682662-48684103.ALL.chr18.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.bgz
#> Warning: converting NULL pointer to R NULL
#> Time difference of 0.3 secs
#> Retrieved data with 7 rows across 503 samples.
#> echoLD::snpStats:: Filling `MAF` column with MAF from LD panel.
#> echoLD:snpStats:: Computing pairwise LD between 7 SNPs across 503 individuals (stats = R).
#> Time difference of 0 secs
#> 7 x 7 LD_matrix (sparse)
#> Converting obj to sparseMatrix.
#> Saving sparse LD matrix ==> /tmp/Rtmp0dbVNr/results/GWAS/Nalls23andMe_2019/MEX3C/LD/MEX3C.1KGphase3_LD.RDS
#> Removing 1 temp files.
#> + FILTER:: Filtering by LD features.
#> 
#> ────────────────────────────────────────────────────────────────────────────────
#> 
#> ── Step 3 ▶▶▶ Filter SNPs 🚰 ───────────────────────────────────────────────────
#> 
#> ────────────────────────────────────────────────────────────────────────────────
#> FILTER:: Filtering by SNP features.
#> + FILTER:: Post-filtered data: 5 x 16
#> + Subsetting LD matrix and dat to common SNPs...
#> Removing unnamed rows/cols
#> Replacing NAs with 0
#> + LD_matrix = 7 SNPs.
#> + dat = 5 SNPs.
#> + 5 SNPs in common.
#> Converting obj to sparseMatrix.
#> 
#> ────────────────────────────────────────────────────────────────────────────────
#> 
#> ── Step 4 ▶▶▶ Fine-map 🔊 ──────────────────────────────────────────────────────
#> 
#> ────────────────────────────────────────────────────────────────────────────────
#> Gathering method sources.
#> Gathering method citations.
#> Preparing sample size column (N).
#> Using existing 'N' column.
#> Gathering method sources.
#> Gathering method citations.
#> ABF
#> ✅ All required columns present.
#> FINEMAP
#> ✅ All required columns present.
#> ✅ All optional columns present.
#> SUSIE
#> ✅ All required columns present.
#> ✅ All optional columns present.
#> ++ Fine-mapping using 3 tool(s): ABF, FINEMAP, SUSIE
#> 
#> +++ Multi-finemap:: ABF +++
#> Using all default values for finemap_args$ABF
#> Preparing sample size column (N).
#> Using existing 'N' column.
#> Running ABF.
#> ++ Credible Set SNPs identified = 0
#> ++ Merging ABF results with multi-finemap data.
#> 
#> +++ Multi-finemap:: FINEMAP +++
#> Using all default values for finemap_args$FINEMAP
#> Preparing sample size column (N).
#> Using existing 'N' column.
#> + Subsetting LD matrix and dat to common SNPs...
#> Removing unnamed rows/cols
#> Replacing NAs with 0
#> + LD_matrix = 5 SNPs.
#> + dat = 5 SNPs.
#> + 5 SNPs in common.
#> Converting obj to sparseMatrix.
#> Constructing master file.
#> Constructing data.z file.
#> Constructing data.ld file.
#> FINEMAP path: /github/home/.cache/R/echofinemap/FINEMAP/finemap_v1.4.1_x86_64/finemap_v1.4.1_x86_64
#> Inferred FINEMAP version: 1.4.1
#> Running FINEMAP.
#> cd .../MEX3C &&
#>     .../finemap_v1.4.1_x86_64
#>    
#>     --sss
#>    
#>     --in-files .../master
#>    
#>     --log
#>    
#>     --n-causal-snps 5
#> 
#> |--------------------------------------|
#> | Welcome to FINEMAP v1.4.1            |
#> |                                      |
#> | (c) 2015-2022 University of Helsinki |
#> |                                      |
#> | Help :                               |
#> | - ./finemap --help                   |
#> | - www.finemap.me                     |
#> | - www.christianbenner.com            |
#> |                                      |
#> | Contact :                            |
#> | - finemap@christianbenner.com        |
#> | - matti.pirinen@helsinki.fi          |
#> |--------------------------------------|
#> 
#> --------
#> SETTINGS
#> --------
#> - dataset            : all
#> - corr-config        : 0.95
#> - n-causal-snps      : 5
#> - n-configs-top      : 50000
#> - n-conv-sss         : 100
#> - n-iter             : 100000
#> - n-threads          : 1
#> - prior-k0           : 0
#> - prior-std          : 0.05 
#> - prob-conv-sss-tol  : 0.001
#> - prob-cred-set      : 0.95
#> 
#> ------------
#> FINE-MAPPING (1/1)
#> ------------
#> - GWAS summary stats               : FINEMAP/data.z
#> - SNP correlations                 : FINEMAP/data.ld
#> - Causal SNP stats                 : FINEMAP/data.snp
#> - Causal configurations            : FINEMAP/data.config
#> - Credible sets                    : FINEMAP/data.cred
#> - Log file                         : FINEMAP/data.log_sss
#> 
- Reading summary statistics       : +
- Reading summary stats            : done!   
- Reading SNP correlations         : 0.000%
- Reading SNP correlations         : done!   
- Reading input                    : done!   
#> 
#> - Number of GWAS samples           : 1460059
#> - Number of SNPs                   : 5
#> - Prior-Pr(# of causal SNPs is k)  : 
#>   (0 -> 0)
#>    1 -> 0.609
#>    2 -> 0.305
#>    3 -> 0.0762
#>    4 -> 0.00952
#>    5 -> 0.000476
#> 0 configurations evaluated (0.000/100%)
- 5 configurations evaluated (0.000/100%)
- 7 configurations evaluated (0.000/100%)
- 11 configurations evaluated (0.050/100%)
- 13 configurations evaluated (0.100/100%)
- 14 configurations evaluated (0.106/100%) : converged after 106 iterations
#> 
- Computing causal SNP statistics  : 7.143%
- Computing causal SNP statistics  : done!   
#> - Regional SNP heritability        : 1.93e-05 (SD: 7.15e-06 ; 95% CI: [7.39e-06,3.54e-05])
#> - Log10-BF of >= one causal SNP    : 3.56
#> - Post-expected # of causal SNPs   : 1.09
#> - Post-Pr(# of causal SNPs is k)   : 
#>   (0 -> 0)
#>    1 -> 0.914
#>    2 -> 0.0853
#>    3 -> 0.000407
#>    4 -> 0
#>    5 -> 0
#> - Computing credible sets          : 
- Writing causal configurations    : 7.143%
- Writing causal SNP statistics    : 20.000%
- Writing output                   : done!   
#> - Run time                         : 0 hours, 0 minutes, 0 seconds
#> 2 data.cred* file(s) found in the same subfolder.
#> Selected file based on postPr_k: data.cred1
#> Importing conditional probabilities (.cred file).
#> No configurations were causal at PP>=0.95.
#> Importing marginal probabilities (.snp file).
#> Importing configuration probabilities (.config file).
#> FINEMAP was unable to identify any credible sets at PP>=0.95.
#> ++ Credible Set SNPs identified = 0
#> ++ Merging FINEMAP results with multi-finemap data.
#> 
#> +++ Multi-finemap:: SUSIE +++
#> Using all default values for finemap_args$SUSIE
#> Preparing sample size column (N).
#> Using existing 'N' column.
#> + SUSIE:: sample_size=1,460,059
#> + Subsetting LD matrix and dat to common SNPs...
#> Removing unnamed rows/cols
#> Replacing NAs with 0
#> + LD_matrix = 5 SNPs.
#> + dat = 5 SNPs.
#> + 5 SNPs in common.
#> Converting obj to sparseMatrix.
#> + SUSIE:: Using `susie_rss()` from susieR v0.12.27
#> + SUSIE:: Extracting Credible Sets.
#> ++ Credible Set SNPs identified = 0
#> ++ Merging SUSIE results with multi-finemap data.
#> Identifying Consensus SNPs...
#> + support_thresh = 2
#> + Calculating mean Posterior Probability (mean.PP)...
#> + 3 fine-mapping methods used.
#> + 0 Credible Set SNPs identified.
#> + 0 Consensus SNPs identified.
#> Saving merged finemapping results ==> /tmp/Rtmp0dbVNr/results/GWAS/Nalls23andMe_2019/MEX3C/Multi-finemap/1KGphase3_LD.Multi-finemap.tsv.gz
#> + Fine-mapping with 'ABF, FINEMAP, SUSIE' completed:
#> 
#> ────────────────────────────────────────────────────────────────────────────────
#> 
#> ── Step 5 ▶▶▶ Plot 📈 ──────────────────────────────────────────────────────────
#> 
#> ────────────────────────────────────────────────────────────────────────────────
#> +-------- Locus Plot:  MEX3C --------+
#> + support_thresh = 2
#> + Calculating mean Posterior Probability (mean.PP)...
#> + 3 fine-mapping methods used.
#> + 0 Credible Set SNPs identified.
#> + 0 Consensus SNPs identified.
#> + Filling NAs in CS cols with 0.
#> + Filling NAs in PP cols with 0.
#> LD_matrix detected. Coloring SNPs by LD with lead SNP.
#> ++ echoplot:: GWAS full window track
#> ++ echoplot:: GWAS track
#> ++ echoplot:: Merged fine-mapping track
#> Melting PP and CS from 4 fine-mapping methods.
#> + echoplot:: Constructing SNP labels.
#> Adding SNP group labels to locus plot.
#> ++ echoplot:: Adding Gene model track.
#> Converting dat to GRanges object.
#> max_transcripts= 1 . 
#> 0  transcripts from  0  genes returned.
#> Returning NULL for gene track.
#> + Adding vertical lines to highlight SNP groups.
#> +>+>+>+>+ zoom =  1x  +<+<+<+<+
#> + echoplot:: Get window suffix...
#> + echoplot:: Removing GWAS full window track @ zoom=1x
#> + Removing subplot margins...
#> + Reordering tracks...
#> + Ensuring last track shows genomic units.
#> + Aligning xlimits for each subplot...
#> + Checking track heights...
#> + echoplot:: Saving plot ==> /tmp/Rtmp0dbVNr/results/GWAS/Nalls23andMe_2019/MEX3C/multiview.MEX3C.1KGphase3.1x.png
#> Warning: Removed 1 rows containing missing values (`geom_text()`).

#> Warning: Removed 1 rows containing missing values (`geom_text()`).
#> Recording all `finemap_locus` arguments.
#> Formatting locus results.
#> 
#> ────────────────────────────────────────────────────────────────────────────────
#> 
#> ── Step 6 ▶▶▶ Postprocess data 🎁 ──────────────────────────────────────────────
#> 
#> ────────────────────────────────────────────────────────────────────────────────
#> Returning results as nested list.

#> All loci done in: 0.44 min
#>