255 samples (from 314) | Different brain regions from the same donor | Multi disease cohort.
Previous R script: 01_expression_metadata_4reg.R by Katia Lopes.
work_dir = "~/ad-omics/ricardo/tmp/3rd_pass_mic_258s/"
expression_dir = "~/ad-omics_hydra/microglia_omics/expression_tables/added_pilot_314s/expr_4brain_regions/"
metadata_path = "~/pd-omics/katia/Microglia/mic_255s/metadata_files/"
This metadata contains biological and technical covariates.
“Evaluating the correlation between variables in a important part in interpreting variancePartition results. When comparing two continuous variables, Pearson correlation is widely used. But variancePartition includes categorical variables in the model as well. In order to accommodate the correlation between a continuous and a categorical variable, or two categorical variables we used canonical correlation analysis. Canonical Correlation Analysis (CCA) is similar to correlation between two vectors, except that CCA can accommodate matricies as well.” Gabriel Hoffman.
# These variables have zero variance so cannot be analyzed:
# infection_2weeks, glucocorticosteroids, batch
metadata_alt1 = metadata %>% dplyr::select(-glucocorticosteroids) # These variables have zero variance so cannot be analyzed: glucocorticosteroids
metadata_alt2 = metadata_alt1 %>% dplyr::select(-infection_2weeks) # These variables have zero variance so cannot be analyzed: infection_2weeks
metadata_alt = metadata_alt2 %>% dplyr::select(-batch) # These variables have zero variance so cannot be analyzed: batch
# names(metadata_alt) = tolower(names(metadata_alt))
# Transforms characters columns in factors columns
indx <- sapply(metadata_alt, is.character)
metadata_alt[indx] <- lapply(metadata_alt[indx], function(x) as.factor(x))
metadata_alt$ph = as.numeric(as.character(metadata_alt$ph))
#str(metadata_alt) #shows class for all columns
form <- as.formula(paste("", paste(colnames(metadata_alt), collapse=" + "), sep=" ~ ")) #form contains the design
C = canCorPairs(form, metadata_alt)
# pdf(paste0(work_dir, "figures_pdf/canonical_hm.pdf"), width = 10, height = 8)
plotCorrMatrix(C, margins = c(12,18))
# dev.off()
### Number of tissues by donor
tissuesBydonor = rowSums(table(metadata_alt[,c("donor_id","tissue")]))
createDT(as.data.frame(tissuesBydonor))
[1] 26
[1] “13-080” “14-005” “14-015” “14-029” “14-069” “14-075” “15-018” “15-024” [9] “15-055” “15-075” “15-087” “15-089” “15-093” “15-107” “16-003” “16-024” [17] “16-027” “16-028” “16-033” “16-046” “16-049” “16-056” “16-062” “16-065” [25] “16-067” “16-078” “16-080” “16-110” “16-111” “16-112” “16-116” “16-117” [33] “16-118” “16-137” “17-003” “17-005” “17-009” “17-012” “17-013” “17-015” [41] “17-016” “17-017” “17-029” “17-032” “17-043” “17-074” “17-078” “17-092” [49] “17-094” “17-097” “17-099” “17-102” “17-121” “17-124” “17-128” “17-136” [57] “17-148” “18-010” “18-012” “18-018” “18-021” “18-023” “18-039” “18-063” [65] “18-064” “18-074” “18-079” “18-105” “18-112” “MG-06” “MG-08” “MG-10” [73] “MG-11” “MG-13” ## Input expression
The covariates for the formula was chosen based on canonical correlation. We need to be careful to interpret this plot because here, we are using all samples together: same individual with different brain regions.
If the covariate have NAs in the column, we can’t fit a model for Limma or Dream. The following covariates have NAs: lane, benzodiapezines, opiates, autoimmune_diseases, smoking, infection_2weeks, alcohol_dependence_daily_use, ph, C1-C4 from genotyping, suicide_attempts and cause_of_death_categories.
library(doParallel)
cl <- makeCluster(10)
registerDoParallel(cl)
# For categorical covariates you need to include the number 1. Example: (1|Status) + (1|Plate)
metadata_alt$fastqc_total_sequences = scale(metadata_alt$fastqc_total_sequences)
metadata_alt$star_uniquely_mapped = scale(metadata_alt$star_uniquely_mapped)
metadata_alt$featurecounts_assigned = scale(metadata_alt$featurecounts_assigned)
# form <- ~ (1|donor_id) + age + (1|tissue) + (1|lane) + (1|main_diagnosis) + (1|sex) + pmd_minutes + picard_pct_mrna_bases + picard_summed_mean + picard_pct_pf_reads_aligned + ph + picard_pct_ribosomal_bases + (1|cause_of_death_categories) + C1 + C2 + C3 + C4 + (1|suicide_attempts) + fastqc_total_sequences + (1|opiates) + (1|autoimmune_diseases) + star_uniquely_mapped + fastqc_percent_fails + featurecounts_assigned
form <- ~ (1|donor_id) + age + (1|tissue) + (1|lane) + (1|main_diagnosis) + (1|sex) + pmd_minutes + picard_pct_mrna_bases + picard_summed_mean + picard_pct_pf_reads_aligned + ph + picard_pct_ribosomal_bases + (1|cause_of_death_categories) + C1 + C2 + C3 + C4
varPart_tx <- suppressWarnings( fitExtractVarPartModel( genes_counts_voom_3rd, form, metadata_alt) )
vp <- sortCols( varPart_tx )
# plotPercentBars( vp[1:10,] ) Not applicable for dataset with different tissues from the same individual
save(varPart_tx, file = paste0(work_dir, "/vp_files/varPart_tx_255s.RData"))
plotVarPart(vp) +
ggeasy::easy_rotate_x_labels(angle = 70, side = c("right"))
We have tried to run including +0 in the formula (Vignette, page 19) but it didn’t works because we have some donors with only one tissue. So, we can’t fit a model for that. However, we can try to run separately by tissue.
metadata_GFM = metadata_alt[metadata_alt$tissue=="MFG",]
voom_GFM = genes_counts_voom_3rd[ , as.character(metadata_GFM$donor_tissue)]
metadata_GTS = metadata_alt[metadata_alt$tissue=="STG",]
voom_GTS = genes_counts_voom_3rd[ , as.character(metadata_GTS$donor_tissue)]
metadata_SVZ = metadata_alt[metadata_alt$tissue=="SVZ",]
voom_SVZ = genes_counts_voom_3rd[ , as.character(metadata_SVZ$donor_tissue)]
metadata_THA = metadata_alt[metadata_alt$tissue=="THA",]
voom_THA = genes_counts_voom_3rd[ , as.character(metadata_THA$donor_tissue)]
save(metadata_alt, metadata_GFM, metadata_GTS, metadata_SVZ, metadata_THA,
voom_GFM, voom_GTS, voom_SVZ, voom_THA, file = paste0(work_dir, "metadata_exprBytissue_255s.Rdata"))
varPart_tx <- suppressWarnings( fitExtractVarPartModel( voom_GFM, form, metadata_GFM) )
vp <- sortCols( varPart_tx )
save(varPart_tx, file = paste0(work_dir, "vp_files/varPart_tx_MFG.RData"))
plotVarPart(vp)+
ggeasy::easy_rotate_x_labels(angle = 70, side = c("right"))
varPart_tx <- suppressWarnings( fitExtractVarPartModel( voom_GTS, form, metadata_GTS) )
vp <- sortCols( varPart_tx )
save(varPart_tx, file = paste0(work_dir, "vp_files/varPart_tx_STG.RData"))
plotVarPart(vp)+
ggeasy::easy_rotate_x_labels(angle = 70, side = c("right"))
eur_samples = read.table("~/pd-omics/katia/Microglia/mic_255s/samples_eqtl_90d.txt", header = F, stringsAsFactors = T)
colnames(eur_samples) = c("eur_samples")
metadata_alt_eur = metadata_alt[metadata_alt$donor_id %in% eur_samples$eur_samples ,]
genes_counts_voom_3rd_eur = genes_counts_voom_3rd[, colnames(genes_counts_voom_3rd) %in% metadata_alt_eur$donor_tissue]
form <- ~ (1|donor_id) + age + (1|tissue) + (1|lane) + (1|main_diagnosis) + (1|sex) + pmd_minutes + picard_pct_mrna_bases + picard_summed_mean + picard_pct_pf_reads_aligned + ph + picard_pct_ribosomal_bases + (1|cause_of_death_categories) + C1 + C2 + C3 + C4
varPart_tx <- suppressWarnings( fitExtractVarPartModel( genes_counts_voom_3rd_eur, form, metadata_alt_eur) )
vp <- sortCols( varPart_tx )
# plotPercentBars( vp[1:10,] ) Not applicable for dataset with different tissues from the same individual
save(varPart_tx, file = paste0(work_dir, "/vp_files/varPart_tx_eur_216s.RData"))
plotVarPart(vp) +
ggeasy::easy_rotate_x_labels(angle = 70, side = c("right"))
R version 3.6.2 (2019-12-12) Platform: x86_64-pc-linux-gnu (64-bit) Running under: CentOS Linux 7 (Core)
Matrix products: default BLAS/LAPACK: /hpc/packages/minerva-centos7/intel/parallel_studio_xe_2019/compilers_and_libraries_2019.0.117/linux/mkl/lib/intel64_lin/libmkl_gf_lp64.so
locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages: [1] stats4 parallel stats graphics grDevices utils datasets [8] methods base
other attached packages: [1] doParallel_1.0.15 iterators_1.0.12
[3] forcats_0.5.0 stringr_1.4.0
[5] dplyr_0.8.5 purrr_0.3.3
[7] readr_1.3.1 tidyr_1.1.0
[9] tibble_3.0.1 tidyverse_1.3.0
[11] broom_0.5.5 edgeR_3.28.1
[13] DESeq2_1.26.0 SummarizedExperiment_1.16.1 [15] DelayedArray_0.12.2 BiocParallel_1.20.1
[17] matrixStats_0.55.0 GenomicRanges_1.38.0
[19] GenomeInfoDb_1.22.0 IRanges_2.20.2
[21] S4Vectors_0.24.3 pamr_1.56.1
[23] survival_3.1-11 cluster_2.1.0
[25] factoextra_1.0.6 gplots_3.0.3
[27] ggfortify_0.4.9 variancePartition_1.17.7
[29] Biobase_2.46.0 BiocGenerics_0.32.0
[31] scales_1.1.0 foreach_1.4.8
[33] limma_3.42.2 ggplot2_3.3.0
loaded via a namespace (and not attached): [1] readxl_1.3.1 backports_1.1.5 Hmisc_4.3-1
[4] plyr_1.8.6 splines_3.6.2 digest_0.6.25
[7] htmltools_0.4.0 gdata_2.18.0 fansi_0.4.1
[10] magrittr_1.5 checkmate_2.0.0 memoise_1.1.0
[13] annotate_1.64.0 modelr_0.1.6 prettyunits_1.1.1
[16] jpeg_0.1-8.1 colorspace_1.4-1 blob_1.2.1
[19] rvest_0.3.5 ggrepel_0.8.2 haven_2.2.0
[22] xfun_0.12 crayon_1.3.4 RCurl_1.98-1.1
[25] jsonlite_1.6.1 genefilter_1.68.0 lme4_1.1-21
[28] glue_1.3.1 gtable_0.3.0 zlibbioc_1.32.0
[31] XVector_0.26.0 DBI_1.1.0 Rcpp_1.0.3
[34] xtable_1.8-4 progress_1.2.2 htmlTable_1.13.3
[37] foreign_0.8-76 bit_1.1-15.2 Formula_1.2-3
[40] htmlwidgets_1.5.1 httr_1.4.1 RColorBrewer_1.1-2
[43] acepack_1.4.1 ellipsis_0.3.0 farver_2.0.3
[46] pkgconfig_2.0.3 XML_3.99-0.3 nnet_7.3-13
[49] dbplyr_1.4.2 locfit_1.5-9.1 labeling_0.3
[52] tidyselect_1.1.0 rlang_0.4.6 reshape2_1.4.3
[55] AnnotationDbi_1.48.0 munsell_0.5.0 cellranger_1.1.0
[58] tools_3.6.2 cli_2.0.2 generics_0.0.2
[61] RSQLite_2.2.0 evaluate_0.14 yaml_2.2.1
[64] knitr_1.28 bit64_0.9-7 fs_1.3.2
[67] caTools_1.18.0 nlme_3.1-145 xml2_1.2.5
[70] compiler_3.6.2 pbkrtest_0.4-8.6 rstudioapi_0.11
[73] png_0.1-7 reprex_0.3.0 geneplotter_1.64.0
[76] stringi_1.4.6 lattice_0.20-40 Matrix_1.2-18
[79] nloptr_1.2.2.1 vctrs_0.3.1 pillar_1.4.3
[82] lifecycle_0.2.0 data.table_1.12.8 bitops_1.0-6
[85] colorRamps_2.3 R6_2.4.1 latticeExtra_0.6-29
[88] KernSmooth_2.23-16 gridExtra_2.3 codetools_0.2-16
[91] boot_1.3-24 MASS_7.3-51.5 gtools_3.8.1
[94] assertthat_0.2.1 withr_2.1.2 GenomeInfoDbData_1.2.2 [97] hms_0.5.3 grid_3.6.2 rpart_4.1-15
[100] minqa_1.2.4 rmarkdown_2.1 lubridate_1.7.9
[103] base64enc_0.1-3 ggeasy_0.1.1