motifbreakR is a package to predict how much a SNP will disrupt a transcription factor binding motif (if it falls within one). Notes:

  • BSgenomeUsers must manually run library(BSgenome) before running any motifbreakR functions to successfully use this tool.

  • threshold= If filterp=TRUE, this argument indicates the p-value threshold. If filterp=FALSE, this argument instead indicates the pct threshold.

MOTIFBREAKR(
  rsid_list,
  results_dir = file.path(tempdir(), "results"),
  pwmList = NULL,
  pwmList_max = NULL,
  genome_build = NULL,
  organism = "Hsapiens",
  threshold = 0.85,
  show.neutral = FALSE,
  method = "default",
  calculate_pvals = TRUE,
  force_new = FALSE,
  background = c(A = 0.25, C = 0.25, G = 0.25, T = 0.25),
  granularity = NULL,
  nThread = 1,
  verbose = TRUE
)

Arguments

rsid_list

RSIDs of SNPs to test for motif disruption between the reference and alternative alleles..

results_dir

Directory where results should be saved as a file named: <results_dir>/_genome_wide/motifbreakR/motifbreakR_results.rds. If NULL, results will not be saved to disk.

pwmList

An object of class MotifList containing the motifs that you wish to interrogate

pwmList_max

Limit the maximum number of PWM datasets tested (e.g. 10). If NULL, no limit it set.

genome_build

Genome build to use.

organism

Only include datasets in the pwmList performed in a particular organism.

threshold

Numeric; the maximum p-value for a match to be called or a minimum score threshold

show.neutral

Logical; include neutral changes in the output

method

Character; one of default, log, ic, or notrans; see details.

calculate_pvals

Calculate p-values for all SNPs tested. WARNING: May take a long time if many SNPs and/or PWM are selected.

force_new

If results of the same name already exist, overwrite them with new analyses (TRUE). Otherwise, import the existing results and skip the analyses (default: FALSE).

background

Numeric Vector; the background probabilities of the nucleotides

granularity

Numeric Vector; the granularity to which to round the PWM, larger values compromise full accuracy for speed of calculation. A value of NULL does no rounding.

nThread

Number of threads to parallelize analyses across.

verbose

Print messages.

Value

Motif disruption predictions in

GRanges format.

See also

Examples

library(BSgenome) ## <-- IMPORTANT!
#> Loading required package: BiocGenerics
#> 
#> Attaching package: 'BiocGenerics'
#> The following objects are masked from 'package:stats':
#> 
#>     IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#> 
#>     Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append,
#>     as.data.frame, basename, cbind, colnames, dirname, do.call,
#>     duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted,
#>     lapply, mapply, match, mget, order, paste, pmax, pmax.int, pmin,
#>     pmin.int, rank, rbind, rownames, sapply, setdiff, sort, table,
#>     tapply, union, unique, unsplit, which.max, which.min
#> Loading required package: S4Vectors
#> Loading required package: stats4
#> 
#> Attaching package: 'S4Vectors'
#> The following objects are masked from 'package:base':
#> 
#>     I, expand.grid, unname
#> Loading required package: IRanges
#> Loading required package: GenomeInfoDb
#> Loading required package: GenomicRanges
#> Loading required package: Biostrings
#> Loading required package: XVector
#> 
#> Attaching package: 'Biostrings'
#> The following object is masked from 'package:base':
#> 
#>     strsplit
#> Loading required package: rtracklayer
#### Example fine-mapping results ####
merged_DT <- echodata::get_Nalls2019_merged()
#### Run motif analyses ####
mb_res <- MOTIFBREAKR(rsid_list = c("rs11175620"),
                      # limit the number of datasets tested 
                      # for demonstration purposes only
                      pwmList_max = 4,
                      calculate_pvals = FALSE)
#> Loading required namespace: motifbreakR
#> 
#> genome_build set to hg19 by default.
#> Loading required namespace: SNPlocs.Hsapiens.dbSNP144.GRCh37
#> Loading required namespace: BSgenome.Hsapiens.UCSC.hg19
#> Using genome_build hg19
#> + MOTIFBREAKR:: Converting SNP list into motifbreakR input format.
#> See system.file("LICENSE", package="MotifDb") for use restrictions.
#> Filtering pwmList to only include organism: Hsapiens
#> Only testing the first 4 datasets in pwmList.
#> + MOTIFBREAKR:: Identifying motifs and predicting disruptions.
#> reached end of SNPs list length = 1 with 1 potentially disruptive matches to 1 of 4 motifs.
#> + MOTIFBREAKR:: Saving results ==> /tmp/Rtmp3k34Cf/results/_genome_wide/motifbreakR/motifbreakR_results.rds