Age-related analysis

Dream | coef = age.

expression_dir = "~/ad-omics_hydra/microglia_omics/expression_tables/added_pilot_314s/expr_4brain_regions/"
work_plots = "~/pd-omics/katia/scripts/GitHub_scripts/glia_omics/3rd_pass_mic_255s/age_related/age_Dream_3rd/"
work_plots = "~/ad-omics/ricardo/tmp/dream/"
metadata_path = "~/pd-omics/katia/Microglia/mic_255s/metadata_files/"

load(paste0(expression_dir, "Expression_filt_255s.Rdata")) 
# dim(genes_counts_exp_3rd) # 19376   255

load(paste0(metadata_path, "metadata_255filt_eng_29jun2020.Rdata"))
# str(metadata3rd_pass)

Age distribution

255 samples

metadata <- metadata3rd_pass
ageByDonor = unique(metadata[,c("donor_id", "age", "sex")])

ggplot(ageByDonor, aes(x=age, fill=age)) +
  geom_histogram(bins = 25, colour='black', position = "stack", fill="#5c88da99") +
  labs(x="Age", y="Donors") +
  scale_y_continuous(breaks = (1:20)) +
  scale_x_continuous(breaks=seq(20,120,10)) + 
  theme_classic()

Dream analysis

To use cause_of_death in the model, I got the NAs and change it for “Other” category.

To use C1-C4 I calculated the median for the missing samples.

params = BiocParallel::MulticoreParam(workers=10, progressbar=T)
register(params)
registerDoParallel(10)

metadata$cause_of_death_categories[metadata$cause_of_death_categories %in% NA] <- "Other"
#table(metadata$cause_of_death_categories)

metadata$C1 = metadata$C1 %>% replace_na(median(metadata$C1, na.rm = T))
metadata$C2 = metadata$C2 %>% replace_na(median(metadata$C2, na.rm = T))
metadata$C3 = metadata$C3 %>% replace_na(median(metadata$C3, na.rm = T))
metadata$C4 = metadata$C4 %>% replace_na(median(metadata$C4, na.rm = T))

# Matching the names from the DE analysis to use the same code 
genes_counts4deg = genes_counts_exp_3rd
metadata4deg = metadata
# all(colnames(genes_counts_exp_3rd) == metadata$donor_tissue) # Check the order of columns - TRUE 

# The dream model operates directly on the results of voom. 
# The only change compared to the standard limma workflow is to replace lmFit with dream. 

# Check variance partition version 
# packageVersion("variancePartition")  # Must be 1.17.7


# The variable to be tested should be a fixed effect
form <- ~ sex + (1|donor_id) + age + tissue + (1|cause_of_death_categories) + C1 + C2 + C3 + C4 + picard_pct_mrna_bases + picard_summed_median + picard_pct_ribosomal_bases

# estimate weights using linear mixed model of dream
vobjDream = suppressWarnings( voomWithDreamWeights( genes_counts4deg, form, metadata4deg ) ) # supressing messages because of Biocparallel was generating a lot of messages  
 
# Fit the dream model on each gene
# By default, uses the Satterthwaite approximation for the hypothesis test
fitmm = suppressWarnings (dream( vobjDream, form, metadata4deg )) 

# Examine design matrix
# createDT(fitmm$design, 3)

res_age <- data.frame(topTable(fitmm, coef='age', 
                                 number=nrow(genes_counts4deg), sort.by = "p"), check.names = F)

Genes related with age

adj.P.Val<0.001

length(which(res_age$adj.P.Val<0.001))

[1] 181

adj.P.Val<0.01

length(which(res_age$adj.P.Val<0.01))

[1] 536

adj.P.Val<0.05

length(which(res_age$adj.P.Val<0.05))

[1] 1693

res = res_age
## Get conversion table for Gencode 30
gencode_30 = read.table("~/pd-omics/katia/ens.geneid.gencode.v30")
colnames(gencode_30) = c("ensembl","symbol")

res$ensembl = rownames(res)
res_name = merge(res, gencode_30, by="ensembl")
rownames(res_name) = res_name$ensembl
res_name = res_name[order(res_name$adj.P.Val), ]
res_name = res_name[, c("symbol", "logFC", "AveExpr", "t", "P.Value", "adj.P.Val")]
   
#deg_lists = res_name[which(res_name$adj.P.Val<0.001),]

write.table(res_name, file = paste0(work_plots, "List_age_dream_255s.txt"), quote = F, sep = "\t")
# write.table(deg_lists, file = paste0(work_plots, "List_age_0.001_dream.txt"), quote = F, sep = "\t")

createDT(res_name)

PCA voom

The voom here comes from the Dream object.

genes_voom = as.data.frame(vobjDream$E) 
res.pca = prcomp(t(genes_voom)) 

autoplot(res.pca, data = metadata4deg, colour = 'age') +
  scale_colour_viridis_c() +
  theme_classic()

I’m applying the removeBatchEffect function to get the residuals.

allResiduals <- removeBatchEffect(x = genes_voom, 
                                  batch = metadata4deg$tissue, 
                                  batch2 = paste(metadata4deg$sex, metadata4deg$cause_of_death_categories),
                                 # design = model.matrix(~ age, data = metadata4deg), #force to not regress age. It didn't change the PCA. 
                                  covariates = as.matrix(metadata4deg[, c("picard_pct_mrna_bases", "picard_summed_median", "picard_pct_ribosomal_bases", "C1","C2","C3","C4" )]))

PCA after removeBatchEffect

res.pca = prcomp(t(allResiduals)) 

autoplot(res.pca, data = metadata4deg, colour = 'age') +
  scale_colour_viridis_c() +
  theme_classic()

All samples (n = 255)

Ordered by age!

I’ve applied the function scale to transform the data into z-score.

age_genes = res_name[res$adj.P.Val<0.001, ]
age_genes = age_genes[order(age_genes$adj.P.Val) ,]
top_age_genes = age_genes[1:100 ,]

residuals_matrix = as.data.frame(allResiduals)
residuals_matrix$ensembl = rownames(residuals_matrix)
residuals_matrix = merge(residuals_matrix, gencode_30, by="ensembl")

#Remove duplicates 
residuals_matrix <- residuals_matrix[! duplicated(residuals_matrix$symbol),]
rownames(residuals_matrix) = residuals_matrix$symbol
residuals_matrix$ensembl = NULL
residuals_matrix$symbol = NULL
# dim(residuals_matrix) #19342   255

ourexpr_age_res = residuals_matrix[rownames(residuals_matrix) %in% top_age_genes$symbol , ] # RESIDUALS TABLE 
col_age = as.data.frame(metadata4deg[c("age")] ,)
rownames(col_age) = metadata4deg$donor_tissue
summary(col_age)

age

Min. : 21.00
1st Qu.: 63.50
Median : 77.00
Mean : 73.65
3rd Qu.: 88.00
Max. :103.00

ourexpr_age_zscore = t(scale(t(ourexpr_age_res))) # The scale is by column. I want the scale by row so I need to transpose the matrix. Then, I transposed again for the plot! 
ourexpr_age_zscore = as.data.frame(ourexpr_age_zscore)

all(colnames(ourexpr_age_zscore) == metadata4deg$Donor_tissue) # Check the order of columns - TRUE

[1] TRUE

my.breaks <- seq(-3, 3, length.out = 100)

ourexpr_age_zscore_sorted = ourexpr_age_zscore[, order(col_age)]

# Function to cluster the columns beautifully! 
# callback = function(hc, mat){
#     sv = svd(t(mat))$v[,1]
#     dend = reorder(as.dendrogram(hc), wts = sv)
#     as.hclust(dend)
# }

pheatmap::pheatmap(ourexpr_age_zscore_sorted, 
                   annotation_col = col_age, 
                   show_colnames = F,
                   breaks = my.breaks,
                   cluster_cols = F,
                   #clustering_callback = callback,
                   clustering_method = "ward.D2")

sessionInfo()

R version 3.6.2 (2019-12-12) Platform: x86_64-pc-linux-gnu (64-bit) Running under: CentOS Linux 7 (Core)

Matrix products: default BLAS/LAPACK: /hpc/packages/minerva-centos7/intel/parallel_studio_xe_2019/compilers_and_libraries_2019.0.117/linux/mkl/lib/intel64_lin/libmkl_gf_lp64.so

locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages: [1] parallel grid stats graphics grDevices utils datasets [8] methods base

other attached packages: [1] tidyr_1.1.0 dplyr_0.8.5 doParallel_1.0.15
[4] iterators_1.0.12 ggfortify_0.4.9 gridExtra_2.3
[7] RColorBrewer_1.1-2 kableExtra_1.1.0 reshape2_1.4.3
[10] pheatmap_1.0.12 readxl_1.3.1 ggpubr_0.2.5
[13] magrittr_1.5 gplots_3.0.3 statmod_1.4.34
[16] edgeR_3.28.1 BiocParallel_1.20.1 variancePartition_1.17.7 [19] Biobase_2.46.0 BiocGenerics_0.32.0 scales_1.1.0
[22] foreach_1.4.8 limma_3.42.2 factoextra_1.0.6
[25] ggplot2_3.3.0 ggsci_2.9 ggeasy_0.1.1

loaded via a namespace (and not attached): [1] nlme_3.1-145 bitops_1.0-6 pbkrtest_0.4-8.6
[4] webshot_0.5.2 progress_1.2.2 httr_1.4.1
[7] numDeriv_2016.8-1.1 tools_3.6.2 DT_0.12
[10] R6_2.4.1 KernSmooth_2.23-16 colorspace_1.4-1
[13] withr_2.1.2 tidyselect_1.1.0 prettyunits_1.1.1
[16] compiler_3.6.2 rvest_0.3.5 xml2_1.2.5
[19] labeling_0.3 caTools_1.18.0 readr_1.3.1
[22] stringr_1.4.0 digest_0.6.25 minqa_1.2.4
[25] rmarkdown_2.1 colorRamps_2.3 pkgconfig_2.0.3
[28] htmltools_0.4.0 lme4_1.1-21 fastmap_1.0.1
[31] htmlwidgets_1.5.1 rlang_0.4.6 rstudioapi_0.11
[34] shiny_1.4.0 farver_2.0.3 jsonlite_1.6.1
[37] crosstalk_1.0.0 gtools_3.8.1 Matrix_1.2-18
[40] Rcpp_1.0.3 munsell_0.5.0 lifecycle_0.2.0
[43] stringi_1.4.6 yaml_2.2.1 MASS_7.3-51.5
[46] plyr_1.8.6 promises_1.1.0 gdata_2.18.0
[49] ggrepel_0.8.2 crayon_1.3.4 lattice_0.20-40
[52] splines_3.6.2 hms_0.5.3 locfit_1.5-9.1
[55] knitr_1.28 pillar_1.4.3 boot_1.3-24
[58] ggsignif_0.6.0 codetools_0.2-16 glue_1.3.1
[61] evaluate_0.14 BiocManager_1.30.10 httpuv_1.5.2
[64] vctrs_0.3.1 nloptr_1.2.2.1 cellranger_1.1.0
[67] gtable_0.3.0 purrr_0.3.3 assertthat_0.2.1
[70] xfun_0.12 mime_0.9 xtable_1.8-4
[73] later_1.0.0 viridisLite_0.3.0 tibble_3.0.1
[76] lmerTest_3.1-1 ellipsis_0.3.0

Age-related analysis

Using Dream

Katia de Paiva Lopes
Raj Lab
Department of Neuroscience
Icahn School of Medicine at Mount Sinai
NYC, New York

2020-06-30

Age distribution

255 samples

Dream analysis

PCA voom

PCA after removeBatchEffect

All samples (n = 255)

Age-related analysis

Using Dream

Katia de Paiva Lopes Raj Lab Department of Neuroscience Icahn School of Medicine at Mount Sinai NYC, New York

2020-06-30

Age distribution

255 samples

Dream analysis

Genes related with age

adj.P.Val<0.001

adj.P.Val<0.01

adj.P.Val<0.05

PCA voom

PCA after removeBatchEffect

All samples (n = 255)

Katia de Paiva Lopes
Raj Lab
Department of Neuroscience
Icahn School of Medicine at Mount Sinai
NYC, New York