Generate a locus-specific plot with multiple selectable tracks. Users can also generate multiple zoomed-in views of the plot at different resolutions.
plot_locus(
dat,
locus_dir,
LD_matrix = NULL,
LD_reference = NULL,
facet_formula = "Method~.",
dataset_type = "GWAS",
color_r2 = TRUE,
finemap_methods = c("ABF", "FINEMAP", "SUSIE", "POLYFUN_SUSIE"),
track_order = NULL,
track_heights = NULL,
plot_full_window = TRUE,
dot_summary = FALSE,
qtl_suffixes = NULL,
mean.PP = TRUE,
credset_thresh = 0.95,
consensus_thresh = 2,
sig_cutoff = 5e-08,
gene_track = TRUE,
tx_biotypes = NULL,
point_size = 1,
point_alpha = 0.6,
density_adjust = 0.2,
snp_group_lines = c("Lead", "UCS", "Consensus"),
labels_subset = c("Lead", "CS", "Consensus"),
xtext = FALSE,
show_legend_genes = TRUE,
xgr_libnames = NULL,
xgr_n_top = 5,
roadmap = FALSE,
roadmap_query = NULL,
roadmap_n_top = 7,
zoom_exceptions_str = "*full window$|zoom_polygon",
nott_epigenome = FALSE,
nott_regulatory_rects = TRUE,
nott_show_placseq = TRUE,
nott_binwidth = 200,
nott_bigwig_dir = NULL,
save_plot = FALSE,
show_plot = TRUE,
genomic_units = "Mb",
strip.text.y.angle = 0,
max_transcripts = 1,
zoom = c("1x"),
dpi = 300,
height = 12,
width = 10,
plot_format = "png",
save_RDS = FALSE,
return_list = FALSE,
conda_env = "echoR_mini",
nThread = 1,
verbose = TRUE
)
Data to query transcripts with.
Storage directory to use.
LD matrix.
LD reference to use:
"1KGphase1" : 1000 Genomes Project Phase 1 (genome build: hg19).
"1KGphase3" : 1000 Genomes Project Phase 3 (genome build: hg19).
"UKB" : Pre-computed LD from a British European-descent subset of UK Biobank (genome build: hg19).
"<vcf_path>" : User-supplied path to a custom VCF file to compute the LD matrix from. Accepted formats: .vcf / .vcf.gz / .vcf.bgz. Genome build: defined by the user with target_genome.
"<matrix_path>" : User-supplied path to a pre-computed LD matrix. Accepted formats: .rds / .rda / .csv / .tsv / .txt. Genome build: defined by the user with target_genome.
Formula to facet plots by. See facet_grid for details.
Dataset type (e.g. "GWAS" or "eQTL").
Whether to color data points (SNPs) by how strongly they correlate with the lead SNP (i.e. LD measured in terms of r2).
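As a rough illustration of what coloring by r2 involves, the sketch below takes the lead SNP's row from a toy LD matrix of correlations (r), squares it, and sets missing values to 0 (the example log further down reports "Filling r/r2 NAs with 0"). The matrix values, SNP names, and variable names are invented for illustration; this is not the package's own code.

```r
# Toy LD matrix of pairwise correlations (r); all values are made up.
snps <- c("rs1", "rs2", "rs3")
ld_r <- matrix(c(1.0,  0.8,   NA,
                 0.8,  1.0, -0.5,
                  NA, -0.5,  1.0),
               nrow = 3, dimnames = list(snps, snps))
lead_snp <- "rs1"  # hypothetical lead SNP

# r2 of every SNP with the lead SNP; missing values are set to 0.
r2_with_lead <- ld_r[lead_snp, ]^2
r2_with_lead[is.na(r2_with_lead)] <- 0
print(r2_with_lead)
```

Squaring removes the sign of the correlation, so a SNP with r = -0.5 and one with r = 0.5 would receive the same color.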
Fine-mapping methods to plot tracks for, where the y-axis shows the Posterior Probabilities (PP) of each SNP being causal.
The order in which tracks should appear (from top to bottom).
The height of each track (from top to bottom).
Include a track with a Manhattan plot of the full GWAS/eQTL locus (not just the zoomed-in portion).
Include a dot-summary plot that highlights the Lead, Credible Set, and Consensus SNPs.
If columns with QTL data are included in dat, you can indicate which columns those are with one or more string suffixes (e.g. qtl_suffixes=c(".QTL1",".QTL2") to use the columns "P.QTL1", "Effect.QTL1", "P.QTL2", "Effect.QTL2").
Include a track showing mean Posterior Probabilities (PP) averaged across all fine-mapping methods.
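A minimal sketch of the averaging behind the mean PP track, assuming one PP column per fine-mapping method. The column names and values are hypothetical, and missing PPs are filled with 0 before averaging, as the example log further down suggests ("Filling NAs in PP cols with 0").

```r
# Toy per-method posterior probabilities; column names are hypothetical.
pp <- data.frame(
  ABF.PP     = c(0.90, 0.10, NA),
  FINEMAP.PP = c(0.80, 0.30, 0.20),
  SUSIE.PP   = c(1.00, 0.20, 0.10)
)

# Treat a missing PP as 0, then average across methods for each SNP.
pp[is.na(pp)] <- 0
mean_pp <- rowMeans(pp)
print(mean_pp)
```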
The minimum fine-mapped posterior probability for a SNP to be considered part of a Credible Set. For example, credset_thresh=.95 means that all Credible Set SNPs will be 95% Credible Set SNPs.
The minimum number of fine-mapping tools in which a SNP is in the Credible Set in order to be included in the "Consensus_SNP" column.
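The two thresholds above can be sketched on toy data as follows. This follows the wording of the descriptions literally (a per-SNP PP cutoff, then a count of supporting methods); the column names, the `Support` label, and the values are assumptions for illustration, and the fine-mapping tools themselves may define their credible sets differently.

```r
# Toy data: one PP column per fine-mapping method (hypothetical names/values).
dat <- data.frame(
  SNP        = c("rs1", "rs2", "rs3", "rs4"),
  ABF.PP     = c(0.97, 0.10, 0.96, 0.01),
  FINEMAP.PP = c(0.99, 0.96, 0.20, 0.02),
  SUSIE.PP   = c(0.98, 0.97, 0.30, NA)
)
credset_thresh   <- 0.95
consensus_thresh <- 2

pp_cols <- grep("\\.PP$", colnames(dat), value = TRUE)
pp <- as.matrix(dat[, pp_cols])
pp[is.na(pp)] <- 0

# A SNP counts toward a method's Credible Set when its PP meets the threshold.
in_cs <- pp >= credset_thresh

# Support = number of methods placing the SNP in their Credible Set.
dat$Support <- rowSums(in_cs)
dat$Consensus_SNP <- dat$Support >= consensus_thresh
print(dat[, c("SNP", "Support", "Consensus_SNP")])
```

With these toy values, rs1 and rs2 are supported by at least two methods and are flagged as consensus SNPs, while rs3 (one method) and rs4 (none) are not.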
Filters out SNPs to plot based on an (uncorrected) p-value significance cutoff.
Include a track showing gene bodies.
Transcript biotypes to include in the gene model track.
By default (NULL), all transcript biotypes will be included. See get_tx_biotypes for a full list of all available biotypes.
Size of each data point.
Opacity of each data point.
Passed to the adjust argument in geom_density.
Include vertical lines to help highlight SNPs belonging to one or more of the following groups: Lead, Credible Set, Consensus.
Include colored shapes and RSID labels to help highlight SNPs belonging to one or more of the following groups: Lead, Credible Set, Consensus.
Include x-axis title and text for each track (not just the lower-most one).
Show the legend for the gene_track.
Passed to XGR_plot.
Which XGR annotations to check overlap with.
For a full list of libraries, see the XGR documentation.
Passed to the RData.customised argument in xRDataLoader.
Examples:
"ENCODE_TFBS_ClusteredV3_CellTypes"
"ENCODE_DNaseI_ClusteredV3_CellTypes"
"Broad_Histone"
Passed to XGR_plot. Number of top annotations to be plotted (passed to XGR_filter_sources and then XGR_filter_assays).
Find and plot annotations from Roadmap.
Only plot annotations from Roadmap whose metadata contains a string or any items from a list of strings (e.g. "brain" or c("brain","liver","monocytes")).
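One plausible way to picture this kind of metadata query is a case-insensitive match of any query term against a description column. The sketch below uses an invented metadata table (the `id` and `description` columns and their contents are hypothetical) and base R only; the package's actual matching logic may differ.

```r
# Invented metadata table; column names and rows are for illustration only.
meta <- data.frame(
  id          = c("s1", "s2", "s3", "s4"),
  description = c("Brain Hippocampus Middle",
                  "Liver",
                  "Primary monocytes from peripheral blood",
                  "Lung"),
  stringsAsFactors = FALSE
)
roadmap_query <- c("brain", "liver", "monocytes")

# Combine the query terms into one alternation pattern and match any of them.
pattern <- paste(roadmap_query, collapse = "|")
hits <- meta[grepl(pattern, meta$description, ignore.case = TRUE), ]
print(hits$id)
```

A single string such as "brain" goes through the same code path, since `paste()` with one element returns it unchanged.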
Passed to ROADMAP_plot. Number of top annotations to be plotted (passed to ROADMAP_query).
Names of tracks to exclude when zooming.
Include tracks showing brain cell-type-specific epigenomic data from Nott et al. (2019).
Include track generated by NOTT2019_epigenomic_histograms.
Include track generated by NOTT2019_plac_seq_plot.
When including Nott et al. (2019) epigenomic data in the track plots, adjust the bin width of the histograms.
Instead of pulling Nott et al. (2019) epigenomic data from the UCSC Genome Browser, use a set of local bigwig files.
Save plot as RDS file.
Print plot to screen.
Which genomic units to return window limits in.
Angle of the y-axis facet labels.
Maximum number of transcripts per gene.
Zoom into the center of the locus when plotting (without editing the fine-mapping results file). You can provide either:
The size of your plot window in terms of basepairs (e.g. zoom=50000 for a 50kb window).
How much you want to zoom in (e.g. zoom="1x" for the full locus, zoom="2x" for 2x zoom into the center of the locus, etc.).
You can pass a list of window sizes (e.g. c(50000,100000,500000)) to automatically generate multiple views of each locus. This can even be a mix of different style inputs: e.g. c("1x","4.5x",25000).
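To make the two input styles concrete, here is a sketch of how mixed zoom values could be turned into window sizes and plot limits. The helper `zoom_to_bp`, the interpretation of "Nx" as the full locus span divided by N, and the use of the locus midpoint as the center are all assumptions for illustration, not the package's implementation.

```r
# Hypothetical helper: convert one zoom value into a window size in base pairs.
# "Nx" is interpreted as (full locus span) / N; anything else as a size in bp.
zoom_to_bp <- function(z, locus_span_bp) {
  z <- as.character(z)
  if (grepl("x$", z)) {
    locus_span_bp / as.numeric(sub("x$", "", z))
  } else {
    as.numeric(z)
  }
}

zoom <- c("1x", "4.5x", 25000)  # c() coerces the mixed input to character
locus_span_bp <- 900000         # toy locus span
center <- 15600000              # toy locus midpoint

windows <- vapply(zoom, zoom_to_bp, numeric(1), locus_span_bp = locus_span_bp)

# Window limits centered on the locus midpoint, one row per requested view.
limits <- cbind(start = center - windows / 2, end = center + windows / 2)
print(limits)
```

Each row of `limits` corresponds to one requested view, which is why a vector of zoom values yields several plots of the same locus.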
dpi to use for raster graphics
height (defaults to the height of current plotting window)
width (defaults to the width of current plotting window)
Format to save plot as when saving with ggsave.
Save the tracks as an RDS file (Warning: These plots take up a lot of disk space).
Return a named list with each track as a separate plot (default: FALSE). If FALSE, will return a merged plot using wrap_plots.
Conda environments to search in. If NULL, will search all conda environments.
Number of threads to parallelize over.
Print messages.
dat1 <- echodata::BST1
LD_matrix <- echodata::BST1_LD_matrix
locus_dir <- file.path(tempdir(), echodata::locus_dir)
plt <- echoplot::plot_locus(dat = dat1,
                            locus_dir = locus_dir,
                            LD_matrix = LD_matrix,
                            show_plot = TRUE)
#> +-------- Locus Plot: BST1 --------+
#> + support_thresh = 2
#> + Calculating mean Posterior Probability (mean.PP)...
#> + 4 fine-mapping methods used.
#> + 7 Credible Set SNPs identified.
#> + 3 Consensus SNPs identified.
#> + Filling NAs in CS cols with 0.
#> + Filling NAs in PP cols with 0.
#> LD_matrix detected. Coloring SNPs by LD with lead SNP.
#> Filling r/r2 NAs with 0
#> ++ echoplot:: GWAS full window track
#> ++ echoplot:: GWAS track
#> ++ echoplot:: Merged fine-mapping track
#> Melting PP and CS from 5 fine-mapping methods.
#> + echoplot:: Constructing SNP labels.
#> Adding SNP group labels to locus plot.
#> ++ echoplot:: Adding Gene model track.
#> Converting dat to GRanges object.
#> max_transcripts= 1 .
#> 16 transcripts from 16 genes returned.
#> Fetching data...
#> OK
#> Parsing exons...
#> OK
#> Defining introns...
#> OK
#> Defining UTRs...
#> OK
#> Defining CDS...
#> OK
#> aggregating...
#> Done
#> Constructing graphics...
#> + Adding vertical lines to highlight SNP groups.
#> +>+>+>+>+ zoom = 1x +<+<+<+<+
#> + echoplot:: Get window suffix...
#> + echoplot:: Removing GWAS full window track @ zoom=1x
#> + Removing subplot margins...
#> + Reordering tracks...
#> + Ensuring last track shows genomic units.
#> + Aligning xlimits for each subplot...
#> + Checking track heights...