knitr::opts_chunk$set(warning = FALSE, echo = FALSE, message = FALSE)
library(tidyverse)
library(ggrepel)
library(patchwork)

Read in COLOC eQTL and sQTL results

Create separate disease plots for the Daner and Stahl GWAS results.

## 
##        Daner_2020        IMSGC_2019       Jansen_2018       Kunkle_2019      Lambert_2013      Marioni_2018 Nalls23andMe_2019      Nicolas_2018        Ripke_2014        Stahl_2019         Wray_2018 
##             64907            153063             34012             20819             19554             26030             69916             11430             84954             31251             19994
## 
##                DLPFC_ROSMAP       Microglia_all_regions  Microglia_all_regions_sQTL Microglia_all_regions_Young               Microglia_MFG          Microglia_sQTL_MFG          Microglia_sQTL_STG          Microglia_sQTL_SVZ          Microglia_sQTL_THA               Microglia_STG 
##                       13262                       20400                       42362                       15422                       20400                       53148                       52378                       57880                       53588                       20400 
##               Microglia_SVZ               Microglia_THA             Microglia_Young           Monocytes_Fairfax              Monocytes_MyND         Monocytes_sQTL_MyND 
##                       20400                       20400                       27692                       19327                       21113                       77758
## # A tibble: 11 x 2
##    GWAS                  n
##    <chr>             <int>
##  1 Daner_2020           64
##  2 IMSGC_2019          137
##  3 Jansen_2018          29
##  4 Kunkle_2019          22
##  5 Lambert_2013         19
##  6 Marioni_2018         21
##  7 Nalls23andMe_2019    71
##  8 Nicolas_2018         10
##  9 Ripke_2014          104
## 10 Stahl_2019           29
## 11 Wray_2018            41
## # A tibble: 7 x 2
##   disease     n
##   <chr>   <int>
## 1 AD         37
## 2 ALS        10
## 3 BPD        93
## 4 MDD        41
## 5 MS        137
## 6 PD         71
## 7 SCZ       104
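The per-disease counts above pool GWAS by disease (the BPD total of 93 is Daner_2020 plus Stahl_2019). Below is a minimal sketch of such a lookup, with a switch that keeps the two bipolar GWAS apart for separate plots. The GWAS-to-disease assignments are inferred from the per-GWAS and per-disease counts above, and `assign_disease()` is a hypothetical helper, not part of the original analysis.

```r
# GWAS -> disease lookup (assumed: inferred from the per-GWAS and
# per-disease counts printed above)
gwas_disease <- c(
  Jansen_2018 = "AD", Kunkle_2019 = "AD", Lambert_2013 = "AD",
  Marioni_2018 = "AD", Nicolas_2018 = "ALS", Daner_2020 = "BPD",
  Stahl_2019 = "BPD", Wray_2018 = "MDD", IMSGC_2019 = "MS",
  Nalls23andMe_2019 = "PD", Ripke_2014 = "SCZ"
)

# Hypothetical helper: label each GWAS with its disease and, if requested,
# split BPD into one label per GWAS (e.g. "BPD_Daner", "BPD_Stahl")
assign_disease <- function(gwas, split_bipolar = TRUE) {
  disease <- unname(gwas_disease[gwas])  # unknown GWAS names give NA
  if (split_bipolar) {
    is_bpd <- disease %in% "BPD"
    # Keep the first-author part of the GWAS name, e.g. "Stahl_2019" -> "Stahl"
    disease[is_bpd] <- paste0("BPD_", sub("_.*$", "", gwas[is_bpd]))
  }
  disease
}

assign_disease(c("Stahl_2019", "Daner_2020", "Ripke_2014"))
# "BPD_Stahl" "BPD_Daner" "SCZ"
```

Setting `split_bipolar = FALSE` gives the pooled labels used in the per-disease counts.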

Assess Lead SNP distance and LD

Some colocalisations are potentially spurious: the lead GWAS SNP is too far from the lead QTL SNP for a single shared causal variant to be plausible.

Use Ensembl BioMart to get the positions of all GWAS and QTL lead SNPs that colocalise with a COLOC posterior of 0.5 or greater.
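Here is a minimal sketch of the distance step. It assumes the BioMart query has already been run and returned one row per rsID with chromosome and position. The column names `refsnp_id`, `chr_name` and `chrom_start` follow Ensembl's SNP mart attributes; the SNPs, positions and the `lead_snp_distance()` helper are made up for illustration. BioMart can return more than one row per rsID (for example patch or haplotype placements), and `match()` simply takes the first, so in practice those rows would need filtering to the primary chromosomes first.

```r
# Hypothetical SNP positions, shaped like a BioMart SNP query result
snp_pos <- data.frame(
  refsnp_id   = c("rs1", "rs2", "rs3", "rs4"),
  chr_name    = c("1", "1", "2", "3"),
  chrom_start = c(1000000, 1250000, 500000, 700000),
  stringsAsFactors = FALSE
)

# Hypothetical colocalisations: one lead GWAS SNP / lead QTL SNP pair per row
coloc <- data.frame(
  GWAS_SNP = c("rs1", "rs3"),
  QTL_SNP  = c("rs2", "rs4"),
  stringsAsFactors = FALSE
)

# Look up both SNPs and return the absolute base-pair distance; pairs on
# different chromosomes, or with a SNP missing from snp_pos, get NA
lead_snp_distance <- function(coloc, snp_pos) {
  idx_g <- match(coloc$GWAS_SNP, snp_pos$refsnp_id)
  idx_q <- match(coloc$QTL_SNP, snp_pos$refsnp_id)
  same_chr <- snp_pos$chr_name[idx_g] == snp_pos$chr_name[idx_q]
  dist <- abs(snp_pos$chrom_start[idx_g] - snp_pos$chrom_start[idx_q])
  ifelse(same_chr, dist, NA_real_)
}

coloc$distance <- lead_snp_distance(coloc, snp_pos)
coloc$distance
# 250000 NA
```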

In addition, I have pairwise 1000 Genomes LD between the lead GWAS SNP and the lead QTL SNP. How often does LD predict distance, and vice versa?
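One way to frame this question is as a 2 × 2 table of the two filters, plus the conditional agreement rates. The sketch below uses made-up pairs. The 100 kb and r² ≥ 0.1 cut-offs are illustrative assumptions, not values from this analysis.

```r
# Hypothetical lead-SNP pairs: distance in bp and 1000 Genomes r^2
pairs <- data.frame(
  distance = c(5e3, 2e4, 3e5, 8e5, 1e4, 6e5),
  r2       = c(0.95, 0.60, 0.01, 0.00, 0.05, 0.40)
)

# Illustrative cut-offs only (assumptions)
max_dist <- 1e5  # "close" if within 100 kb
min_r2   <- 0.1  # "linked" if r^2 >= 0.1

is_close  <- pairs$distance <= max_dist
is_linked <- pairs$r2 >= min_r2

# Rows = distance filter, columns = LD filter; the off-diagonal cells
# are pairs where the two criteria disagree
agreement <- table(is_close = is_close, is_linked = is_linked)
agreement

# How often each criterion predicts the other
p_linked_given_close <- mean(is_linked[is_close])
p_close_given_linked <- mean(is_close[is_linked])
```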

eQTL mapping tests all SNPs within 1 Mb either side of the TSS for association with gene expression. sQTL mapping tests all SNPs within 100 kb either side of the centre of the intron cluster for association with intron usage.

Is it therefore fair to apply separate SNP-distance thresholds to eQTLs and sQTLs?
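The following is a minimal sketch of type-specific distance thresholds. Each threshold is set to its cis window (1 Mb for eQTLs, 100 kb for sQTLs). That choice is an assumption made for illustration. The rows are hypothetical.

```r
# Assumed per-type thresholds (bp), here simply the cis window sizes
max_dist_by_type <- c(eQTL = 1e6, sQTL = 1e5)

# Hypothetical colocalisations with lead-SNP distances (bp)
coloc <- data.frame(
  QTL_type = c("eQTL", "eQTL", "sQTL", "sQTL"),
  distance = c(4e5, 1.5e6, 8e4, 3e5),
  stringsAsFactors = FALSE
)

# Each row is compared against the threshold for its own QTL type
coloc$pass_distance <- coloc$distance <= unname(max_dist_by_type[coloc$QTL_type])
coloc$pass_distance
# TRUE FALSE TRUE FALSE
```

Keeping the thresholds in a named vector makes it easy to rerun the filter with different values per QTL type.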

##       
##        FALSE TRUE
##   eQTL   706 1302
##   sQTL  1631 3135
##       
##         FALSE   TRUE
##   eQTL  15547 183269
##   sQTL  44269 292845

Filtering on SNP-SNP distance or LD retains almost all AD and PD colocalisations but removes up to 90% of MS, BPD and SCZ colocalisations.

This holds for both eQTLs and sQTLs, despite the larger number of sQTLs colocalising in the non-neurodegenerative diseases.
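The retention comparison can be summarised as the fraction of colocalisations passing the filter, per disease and QTL type. Here is a sketch on a hypothetical table. The `pass` column is assumed to already hold the combined distance/LD outcome.

```r
# Hypothetical colocalisations with the outcome of the distance/LD filter
coloc <- data.frame(
  disease  = c("AD", "AD", "PD", "MS", "MS", "MS", "SCZ", "SCZ"),
  QTL_type = c("eQTL", "sQTL", "eQTL", "eQTL", "sQTL", "sQTL", "eQTL", "sQTL"),
  pass     = c(TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE)
)

# Fraction retained, disease x QTL type; NA where a disease has no
# colocalisations of that type
retained <- tapply(coloc$pass, list(coloc$disease, coloc$QTL_type), mean)
round(retained, 2)
```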

I now filter colocalisations so that:

Summarising COLOC results

I was previously using the SNP-level COLOC P4 to assess whether colocalisation had occurred.

Instead, I should use the summary-level COLOC H4. After filtering on H4, I then apply the SNP-distance and LD filters to the top GWAS SNP and top QTL SNP for each locus-gene pair.
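The sketch below shows the revised order of operations on hypothetical data. First, filter locus-genes on summary-level H4. Then take the top GWAS SNP and the top QTL SNP (smallest p-value) for each locus-gene, and apply the distance filter to that pair. `PP.H4.abf` mirrors the column name in the summary returned by `coloc::coloc.abf()`. The other column names, the 500 kb cut-off and all values are assumptions. The LD filter would be applied to the same lead-SNP pair in the same way.

```r
# Hypothetical summary-level COLOC results, one row per locus-gene
summary_res <- data.frame(
  locus_gene = c("L1_GENEA", "L2_GENEB", "L3_GENEC"),
  PP.H4.abf  = c(0.92, 0.30, 0.75),
  stringsAsFactors = FALSE
)

# Hypothetical per-SNP positions and association p-values
snp_res <- data.frame(
  locus_gene = c("L1_GENEA", "L1_GENEA", "L2_GENEB", "L3_GENEC", "L3_GENEC"),
  pos        = c(100000, 120000, 500000, 900000, 1700000),
  gwas_p     = c(1e-10, 1e-4, 1e-9, 1e-12, 0.2),
  qtl_p      = c(1e-3, 1e-8, 1e-6, 0.5, 1e-15),
  stringsAsFactors = FALSE
)

h4_min   <- 0.5  # summary-level H4 threshold
max_dist <- 5e5  # illustrative distance cut-off (assumption)

# Step 1: keep locus-genes whose summary H4 passes the threshold
kept <- summary_res$locus_gene[summary_res$PP.H4.abf > h4_min]

# Step 2: for each kept locus-gene, distance between the top GWAS SNP
# and the top QTL SNP
lead_dist <- sapply(kept, function(lg) {
  s <- snp_res[snp_res$locus_gene == lg, ]
  abs(s$pos[which.min(s$gwas_p)] - s$pos[which.min(s$qtl_p)])
})

# Step 3: apply the distance filter to the lead-SNP pairs
passing <- names(lead_dist)[lead_dist <= max_dist]
passing
# "L1_GENEA"
```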

MASHR meta-analysis compared to METASOFT

Summary counts

For each disease GWAS (or union of GWAS), how many of the total loci have at least one COLOC with H4 > 0.5, H4 > 0.8 and H4 > 0.9?

disease QTL n_loci h4_0.5 h4_0.8 h4_0.9
AD DLPFC_ROSMAP 37 5 1 5
AD Monocytes_Fairfax 37 6 1 4
AD Monocytes_sQTL_MyND 37 7 1 6
AD Monocytes_MyND 37 4 3 3
AD Microglia_Young 37 2 2 3
AD Microglia_all_regions_sQTL 37 2 0 3
AD Microglia_all_regions 37 10 0 4
PD DLPFC_ROSMAP 71 8 5 9
PD Monocytes_Fairfax 71 9 2 8
PD Monocytes_sQTL_MyND 71 9 1 8
PD Monocytes_MyND 71 7 2 8
PD Microglia_Young 71 11 2 0
PD Microglia_all_regions_sQTL 71 6 0 5
PD Microglia_all_regions 71 6 3 4
BPD DLPFC_ROSMAP 29 0 1 4
BPD Monocytes_Fairfax 29 3 1 1
BPD Monocytes_sQTL_MyND 29 0 2 3
BPD Monocytes_MyND 29 1 0 2
BPD Microglia_Young 29 2 0 0
BPD Microglia_all_regions_sQTL 29 2 1 3
BPD Microglia_all_regions 29 0 0 0
SCZ DLPFC_ROSMAP 104 10 2 8
SCZ Monocytes_Fairfax 104 9 1 4
SCZ Monocytes_sQTL_MyND 104 10 1 11
SCZ Monocytes_MyND 104 8 1 2
SCZ Microglia_Young 104 6 0 1
SCZ Microglia_all_regions_sQTL 104 5 1 4
SCZ Microglia_all_regions 104 5 0 3
MS DLPFC_ROSMAP 137 6 3 1
MS Monocytes_Fairfax 137 16 0 3
MS Monocytes_sQTL_MyND 137 16 5 8
MS Monocytes_MyND 137 6 1 0
MS Microglia_Young 137 6 3 1
MS Microglia_all_regions_sQTL 137 13 1 0
MS Microglia_all_regions 137 9 2 1
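Here is a sketch of how counts like these can be computed from a long results table. Take the best H4 per locus within each disease-QTL pair, then count the loci whose best H4 exceeds each threshold. With counts defined this way, each column can be no larger than the one to its left. The locus totals are copied from the table; the column names and all other values are hypothetical.

```r
# Hypothetical long results: one row per locus-gene-QTL COLOC
res <- data.frame(
  disease = c("AD", "AD", "AD", "AD", "PD", "PD"),
  QTL     = c("DLPFC_ROSMAP", "DLPFC_ROSMAP", "DLPFC_ROSMAP",
              "Monocytes_MyND", "DLPFC_ROSMAP", "DLPFC_ROSMAP"),
  locus   = c("L1", "L1", "L2", "L1", "P1", "P2"),
  H4      = c(0.55, 0.95, 0.85, 0.40, 0.60, 0.20),
  stringsAsFactors = FALSE
)
n_loci <- c(AD = 37, PD = 71)  # total loci per disease

# One group per observed disease-QTL pair
groups <- split(res, list(res$disease, res$QTL), drop = TRUE)

summary_tab <- do.call(rbind, lapply(groups, function(g) {
  best <- tapply(g$H4, g$locus, max)  # best H4 per locus
  d <- g$disease[1]
  data.frame(disease = d, QTL = g$QTL[1], n_loci = n_loci[[d]],
             h4_0.5 = sum(best > 0.5), h4_0.8 = sum(best > 0.8),
             h4_0.9 = sum(best > 0.9), stringsAsFactors = FALSE)
}))
rownames(summary_tab) <- NULL
summary_tab
```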

Sharing of colocalised genes

Plot the genes that are shared between pairs of QTL datasets across all 5 diseases.

UpSet plots of colocalised QTLs (PP4 > 0.5) shared between monocytes, microglia and DLPFC
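Both the pairwise sharing plot and the UpSet plots start from a binary gene-by-dataset membership table. The sketch below builds one from hypothetical gene sets. `UpSetR::fromList()` builds the same kind of table from a named list. Taking `crossprod()` of the membership matrix gives the pairwise shared-gene counts.

```r
# Hypothetical colocalised genes (PP4 > 0.5) per QTL dataset
coloc_genes <- list(
  Monocytes = c("GENE1", "GENE2", "GENE3"),
  Microglia = c("GENE2", "GENE3", "GENE4"),
  DLPFC     = c("GENE3", "GENE5")
)

# Binary gene x dataset membership matrix (1 = colocalised in that dataset)
all_genes  <- sort(unique(unlist(coloc_genes)))
membership <- sapply(coloc_genes, function(g) as.integer(all_genes %in% g))
rownames(membership) <- all_genes

# Entry [i, j] = number of genes colocalised in both dataset i and
# dataset j; the diagonal holds each dataset's total
shared <- crossprod(membership)
shared
```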

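Jaccard distances between each pair of QTL sets: one minus the Jaccard index, where the index is the number of loci colocalising in both sets at a given threshold divided by the number colocalising in either set.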
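Here is a minimal sketch of the pairwise Jaccard index and distance on hypothetical locus sets. `jaccard_index()` is a hypothetical helper. Returning `NA` when both sets are empty is a choice for this sketch, not a convention.

```r
# Jaccard index |A ∩ B| / |A ∪ B|; NA if both sets are empty
jaccard_index <- function(a, b) {
  u <- union(a, b)
  if (length(u) == 0) return(NA_real_)
  length(intersect(a, b)) / length(u)
}

# Hypothetical colocalised loci per QTL set at one H4 threshold
coloc_loci <- list(
  Monocytes = c("L1", "L2", "L3"),
  Microglia = c("L2", "L3", "L4"),
  DLPFC     = c("L3", "L5")
)

# Symmetric matrix of pairwise indices, then distances
sets <- names(coloc_loci)
jac  <- sapply(sets, function(a)
  sapply(sets, function(b) jaccard_index(coloc_loci[[a]], coloc_loci[[b]])))
jac_dist <- 1 - jac
round(jac, 2)
```

`jac_dist` can be passed to `as.dist()` if the sets are to be clustered.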

Compare eQTL effect sizes

Between Young microglia and our microglia, between MyND monocytes and our microglia, and so on.
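The sketch below shows one simple comparison on hypothetical summary statistics. It joins the two datasets on SNP-gene pair, then reports the correlation of effect sizes and the fraction of pairs with concordant sign. It assumes both datasets report `beta` relative to the same effect allele. In real data, the alleles would need harmonising (flipping the sign where the effect allele differs) before this comparison is meaningful.

```r
# Hypothetical eQTL statistics: SNP-gene pair plus effect size
ours <- data.frame(gene = c("G1", "G2", "G3", "G4"),
                   snp  = c("rs1", "rs2", "rs3", "rs4"),
                   beta = c(0.50, -0.30, 0.20, 0.80))
other <- data.frame(gene = c("G1", "G2", "G3", "G5"),
                    snp  = c("rs1", "rs2", "rs3", "rs9"),
                    beta = c(0.40, -0.25, -0.10, 0.60))

# Keep only SNP-gene pairs present in both datasets
both <- merge(ours, other, by = c("gene", "snp"),
              suffixes = c("_ours", "_other"))

# Correlation of effect sizes and fraction with the same direction
r_beta    <- cor(both$beta_ours, both$beta_other)
sign_conc <- mean(sign(both$beta_ours) == sign(both$beta_other))
```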

Disease plots

Plot each locus-gene COLOC H4
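For the per-disease plots, the long COLOC table can be cast to a locus-gene by QTL grid of H4, which is what a heatmap shows. With ggplot2, `geom_tile()` on the long table gives the same picture. Here is a sketch using hypothetical locus-gene labels and values.

```r
# Hypothetical long-format COLOC results for one disease
res <- data.frame(
  locus_gene = c("LOCUS1_GENEA", "LOCUS1_GENEA", "LOCUS2_GENEB", "LOCUS3_GENEC"),
  QTL        = c("Microglia_all_regions", "Monocytes_MyND",
                 "Microglia_all_regions", "Monocytes_MyND"),
  H4         = c(0.91, 0.62, 0.55, 0.88)
)

# Locus-gene x QTL matrix of H4 (max if a pair appears twice);
# NA marks pairs that were not tested
h4_mat <- tapply(res$H4, list(res$locus_gene, res$QTL), max)
round(h4_mat, 2)
```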

Alzheimer’s Disease

AD - focus on microglia

AD - Compare loci between the 4 GWAS

Plot some well-known AD loci shared between GWAS: do the same colocalisations occur?
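To check whether the same colocalisations recur across GWAS, one option is to count, for each gene-QTL pair, how many GWAS reach H4 above a threshold. Here is a sketch on hypothetical AD results. The 0.5 threshold matches the one used elsewhere in this section; everything else is made up.

```r
# Hypothetical AD colocalisations from several GWAS
res <- data.frame(
  GWAS = c("Jansen_2018", "Kunkle_2019", "Lambert_2013", "Jansen_2018", "Marioni_2018"),
  gene = c("GENEA", "GENEA", "GENEA", "GENEB", "GENEC"),
  QTL  = c("Microglia_all_regions", "Microglia_all_regions", "Microglia_all_regions",
           "Monocytes_MyND", "Microglia_all_regions"),
  H4   = c(0.90, 0.75, 0.40, 0.85, 0.60)
)
h4_min <- 0.5

# Number of distinct GWAS colocalising (H4 > h4_min) per gene-QTL pair
hits   <- res[res$H4 > h4_min, ]
key    <- paste(hits$gene, hits$QTL, sep = ":")
n_gwas <- sapply(split(hits$GWAS, key), function(g) length(unique(g)))

# Gene-QTL pairs that colocalise in more than one GWAS
replicated <- names(n_gwas)[n_gwas > 1]
replicated
# "GENEA:Microglia_all_regions"
```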

Microglia regions

Parkinson’s

Microglia, Monocytes and DLPFC

PD - just H4 > 0.9

Microglia regions

Schizophrenia

Microglia, Monocytes and DLPFC

Microglia-focused

Microglia regions

Microglia only, H4 > 0.7

CMC TWAS genes from Gusev et al.

Bipolar Disorder 1 - Stahl 2019

Microglia, Monocytes and DLPFC

Microglia regions

Bipolar 2 - Daner 2020

Microglia regions

Bipolar - microglia-focussed

Multiple Sclerosis

ALS

Microglia regions