knitr::opts_chunk$set(warning = FALSE, echo = FALSE, message = FALSE)
library(tidyverse)
library(ggrepel)
library(patchwork)
Read in COLOC eQTL results
Create separate disease plots for the Daner and Stahl GWAS results.
##
## Daner_2020 IMSGC_2019 Jansen_2018 Kunkle_2019 Lambert_2013 Marioni_2018 Nalls23andMe_2019 Nicolas_2018 Ripke_2014 Stahl_2019 Wray_2018
## 64907 153063 34012 20819 19554 26030 69916 11430 84954 31251 19994
##
## DLPFC_ROSMAP Microglia_all_regions Microglia_all_regions_sQTL Microglia_all_regions_Young Microglia_MFG Microglia_sQTL_MFG Microglia_sQTL_STG Microglia_sQTL_SVZ Microglia_sQTL_THA Microglia_STG
## 13262 20400 42362 15422 20400 53148 52378 57880 53588 20400
## Microglia_SVZ Microglia_THA Microglia_Young Monocytes_Fairfax Monocytes_MyND Monocytes_sQTL_MyND
## 20400 20400 27692 19327 21113 77758
## # A tibble: 11 x 2
## GWAS n
## <chr> <int>
## 1 Daner_2020 64
## 2 IMSGC_2019 137
## 3 Jansen_2018 29
## 4 Kunkle_2019 22
## 5 Lambert_2013 19
## 6 Marioni_2018 21
## 7 Nalls23andMe_2019 71
## 8 Nicolas_2018 10
## 9 Ripke_2014 104
## 10 Stahl_2019 29
## 11 Wray_2018 41
## # A tibble: 7 x 2
## disease n
## <chr> <int>
## 1 AD 37
## 2 ALS 10
## 3 BPD 93
## 4 MDD 41
## 5 MS 137
## 6 PD 71
## 7 SCZ 104
Some potentially spurious COLOCs arise when the lead GWAS SNP is too far from the lead QTL SNP for a single shared causal variant to be plausible.
Use Ensembl BioMart to get the positions of all GWAS and QTL lead SNPs that colocalise at 0.5 or greater.
In addition, I have pairwise 1000 Genomes LD between the lead GWAS SNP and the lead QTL SNP. How often is the LD predictive of the distance, and vice versa?
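One way to look at the LD versus distance question is a simple cross-tabulation of "close" against "in LD". The sketch below uses invented toy data; the column names (`gwas_pos`, `qtl_pos`, `ld_r2`) and both cut-offs are assumptions for illustration, not the values used in this analysis.

```r
# Toy data: one row per colocalised locus-gene pair (all values invented).
coloc <- data.frame(
  gwas_pos = c(1000000, 5000000, 2000000, 7500000),
  qtl_pos  = c(1020000, 5900000, 2000500, 7480000),
  ld_r2    = c(0.85, 0.02, 0.99, 0.05)
)

# Hypothetical cut-offs, chosen only for illustration.
max_dist <- 100000  # bp between the two lead SNPs
min_r2   <- 0.1     # minimum pairwise LD

coloc$distance <- abs(coloc$gwas_pos - coloc$qtl_pos)
coloc$close    <- coloc$distance <= max_dist
coloc$in_ld    <- coloc$ld_r2 >= min_r2

# Cross-tabulate the two criteria to see where they disagree.
agreement <- table(close = coloc$close, in_ld = coloc$in_ld)
print(agreement)

# Proportion of pairs where distance and LD give the same answer.
prop_agree <- mean(coloc$close == coloc$in_ld)
print(prop_agree)
```

Off-diagonal cells are the interesting ones: pairs that are close but not in LD, or in LD despite being far apart.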
eQTLs - test all SNPs within 1 Mb either side of the TSS for association with gene expression.
sQTLs - test all SNPs within 100 kb either side of the centre of the intron cluster for association with intron usage.
Is it therefore fair to set separate SNP distance thresholds for eQTLs and sQTLs?
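If separate thresholds were used, one option is a named lookup vector keyed on QTL type. This is a sketch only: the limits below simply reuse the two test windows as hypothetical thresholds, and the `type` and `distance` columns are assumed names.

```r
# Hypothetical per-type distance limits (bp). These reuse the test
# windows as an assumption; they are not thresholds chosen in this analysis.
max_dist <- c(eQTL = 1e6, sQTL = 1e5)

# Toy data (invented values).
coloc <- data.frame(
  type     = c("eQTL", "eQTL", "sQTL", "sQTL"),
  distance = c(250000, 1500000, 250000, 40000),
  stringsAsFactors = FALSE
)

# Look up the limit for each row by its QTL type, then compare.
coloc$limit <- unname(max_dist[coloc$type])
coloc$keep  <- coloc$distance <= coloc$limit

# Same layout as the FALSE/TRUE tables in this notebook.
print(table(coloc$type, coloc$keep))
```

With these toy values the same 250 kb distance passes for an eQTL but fails for an sQTL, which is the behaviour the question is asking about.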
##
## FALSE TRUE
## eQTL 706 1302
## sQTL 1631 3135
##
## FALSE TRUE
## eQTL 15547 183269
## sQTL 44269 292845
Filtering on SNP-SNP distance or LD retains almost all AD and PD colocalisations but removes up to 90% of MS, BPD and SCZ colocalisations.
This holds for both eQTLs and sQTLs, despite the larger number of sQTLs colocalising in the non-neurodegenerative diseases.
I now filter colocalisations so that:
I was previously using the SNP-level COLOC P4 to assess whether colocalisation has occurred.
Instead, I should be using the summary-level COLOC H4. Once I have filtered on that, I should apply the SNP-distance and LD filters using the top QTL and GWAS SNPs for each locus-gene.
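The order of operations described here could look roughly like the following. It is a sketch on invented data: `PP_H4`, `distance` and `ld_r2` are hypothetical column names, the cut-offs are placeholders, and combining distance and LD with OR is an assumption about how the two filters interact.

```r
# Toy summary-level results, one row per locus-gene (invented values).
# distance and ld_r2 refer to the top GWAS SNP vs the top QTL SNP.
coloc <- data.frame(
  locus    = c("L1", "L1", "L2", "L3"),
  gene     = c("G1", "G2", "G3", "G4"),
  PP_H4    = c(0.92, 0.30, 0.75, 0.88),
  distance = c(12000, 5000, 800000, 900000),
  ld_r2    = c(0.90, 0.95, 0.60, 0.01),
  stringsAsFactors = FALSE
)

# Placeholder cut-offs (assumptions).
min_h4   <- 0.5
max_dist <- 5e5
min_r2   <- 0.1

# Step 1: filter on the summary-level H4.
h4_pass <- coloc[coloc$PP_H4 >= min_h4, ]

# Step 2: keep pairs whose lead SNPs are close OR in LD (assumed rule).
kept <- h4_pass[h4_pass$distance <= max_dist | h4_pass$ld_r2 >= min_r2, ]
print(kept)
```

Here G2 is dropped at step 1 despite having close, well-linked lead SNPs, and G4 is dropped at step 2 despite a high H4.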
MASHR meta-analysis compared to METASOFT
For each disease GWAS (or union of GWAS), how many of the total loci have at least one COLOC with H4 > 0.5, H4 > 0.8 and H4 > 0.9?
disease | QTL | n_loci | h4_0.5 | h4_0.8 | h4_0.9 |
---|---|---|---|---|---|
AD | DLPFC_ROSMAP | 37 | 5 | 1 | 5 |
AD | Monocytes_Fairfax | 37 | 6 | 1 | 4 |
AD | Monocytes_sQTL_MyND | 37 | 7 | 1 | 6 |
AD | Monocytes_MyND | 37 | 4 | 3 | 3 |
AD | Microglia_Young | 37 | 2 | 2 | 3 |
AD | Microglia_all_regions_sQTL | 37 | 2 | 0 | 3 |
AD | Microglia_all_regions | 37 | 10 | 0 | 4 |
PD | DLPFC_ROSMAP | 71 | 8 | 5 | 9 |
PD | Monocytes_Fairfax | 71 | 9 | 2 | 8 |
PD | Monocytes_sQTL_MyND | 71 | 9 | 1 | 8 |
PD | Monocytes_MyND | 71 | 7 | 2 | 8 |
PD | Microglia_Young | 71 | 11 | 2 | 0 |
PD | Microglia_all_regions_sQTL | 71 | 6 | 0 | 5 |
PD | Microglia_all_regions | 71 | 6 | 3 | 4 |
BPD | DLPFC_ROSMAP | 29 | 0 | 1 | 4 |
BPD | Monocytes_Fairfax | 29 | 3 | 1 | 1 |
BPD | Monocytes_sQTL_MyND | 29 | 0 | 2 | 3 |
BPD | Monocytes_MyND | 29 | 1 | 0 | 2 |
BPD | Microglia_Young | 29 | 2 | 0 | 0 |
BPD | Microglia_all_regions_sQTL | 29 | 2 | 1 | 3 |
BPD | Microglia_all_regions | 29 | 0 | 0 | 0 |
SCZ | DLPFC_ROSMAP | 104 | 10 | 2 | 8 |
SCZ | Monocytes_Fairfax | 104 | 9 | 1 | 4 |
SCZ | Monocytes_sQTL_MyND | 104 | 10 | 1 | 11 |
SCZ | Monocytes_MyND | 104 | 8 | 1 | 2 |
SCZ | Microglia_Young | 104 | 6 | 0 | 1 |
SCZ | Microglia_all_regions_sQTL | 104 | 5 | 1 | 4 |
SCZ | Microglia_all_regions | 104 | 5 | 0 | 3 |
MS | DLPFC_ROSMAP | 137 | 6 | 3 | 1 |
MS | Monocytes_Fairfax | 137 | 16 | 0 | 3 |
MS | Monocytes_sQTL_MyND | 137 | 16 | 5 | 8 |
MS | Monocytes_MyND | 137 | 6 | 1 | 0 |
MS | Microglia_Young | 137 | 6 | 3 | 1 |
MS | Microglia_all_regions_sQTL | 137 | 13 | 1 | 0 |
MS | Microglia_all_regions | 137 | 9 | 2 | 1 |
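A per-disease, per-QTL count of loci with at least one COLOC above each cut-off could be computed along these lines. This is a base R sketch on invented data with assumed column names; it counts cumulatively (a locus with H4 > 0.9 also counts towards H4 > 0.5), which is an assumption and may not match how the columns in the table above were defined. `n_loci`, the total number of GWAS loci, is not derived here.

```r
# Toy COLOC results: several genes may be tested per locus (invented values).
coloc <- data.frame(
  disease = c("AD", "AD", "AD", "AD", "PD", "PD"),
  QTL     = c(rep("Microglia_all_regions", 4), rep("Monocytes_MyND", 2)),
  locus   = c("A", "A", "B", "C", "D", "E"),
  PP_H4   = c(0.95, 0.40, 0.60, 0.20, 0.85, 0.55),
  stringsAsFactors = FALSE
)

# Best H4 per disease / QTL / locus, so each locus is counted once.
best <- aggregate(PP_H4 ~ disease + QTL + locus, data = coloc, FUN = max)

cutoffs <- c(h4_0.5 = 0.5, h4_0.8 = 0.8, h4_0.9 = 0.9)

# One output row per disease / QTL combination.
counts <- unique(best[, c("disease", "QTL")])
for (nm in names(cutoffs)) {
  counts[[nm]] <- mapply(
    function(d, q) {
      sum(best$PP_H4[best$disease == d & best$QTL == q] > cutoffs[[nm]])
    },
    counts$disease, counts$QTL,
    USE.NAMES = FALSE
  )
}
print(counts)
```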
between Young and our microglia
between MyND monocytes and our microglia
etc…
Plot each locus-gene COLOC H4
Plot some well-known AD loci shared between GWAS - do the same colocalisations occur?
Microglia regions