The resulting topSNPs data.frame can be used to guide finemap_loci when querying and fine-mapping loci.
import_topSNPs(
topSS,
sheet = 1,
startRow = 1,
cols = NULL,
munge = TRUE,
colmap = construct_colmap(),
min_POS = NULL,
max_POS = NULL,
grouping_vars = c("Locus"),
remove_variants = NULL,
show_table = FALSE,
verbose = TRUE
)
Either a data.frame containing the top summary statistics per locus, or a path to a file in which they are stored. The file can be in any tabular format (e.g. Excel, .tsv, .csv). It should contain one lead GWAS/QTL hit per locus. If there is more than one SNP per locus, the one with the smallest p-value (then the largest effect size) is selected as the lead SNP. The lead SNP is used as the center of the locus when constructing the locus subset files.
If the topSS file is an Excel file, you can specify which tab to use. Provide either a number to identify the tab by order, or a string to identify the tab by name.
First row to begin looking for data. Empty rows at the top of a file are always skipped, regardless of the value of startRow.
A numeric vector specifying which columns in the Excel file to read. If NULL, all columns are read.
Whether to standardise column names.
Column mappings object. Uses construct_colmap by default.
Column containing minimum genomic position (used instead of an arbitrary window size).
Column containing maximum genomic position (used instead of an arbitrary window size).
The variables that you want to group by, such that each grouping_vars combination has its own index SNP. For example, if you want one index SNP per QTL eGene - GWAS locus pair, you could supply: grouping_vars=c("Locus","Gene").
SNPs to remove from topSS.
Create an interactive data table.
Print messages.
Column containing unique locus name.
Munged topSNPs table.
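The lead-SNP rule described for topSS (smallest p-value, then largest effect size, within each grouping_vars combination) can be illustrated with a minimal base R sketch. This is not the package's implementation: the helper select_lead_snps and the toy table are hypothetical, and ranking effect sizes by absolute value is an assumption made here for illustration.

```r
# Hypothetical toy table using standardised column names; all values are made up.
top <- data.frame(
  Locus  = c("A", "A", "A", "B", "B"),
  Gene   = c("g1", "g1", "g2", "g3", "g3"),
  SNP    = c("rs1", "rs2", "rs3", "rs4", "rs5"),
  P      = c(1e-8, 1e-10, 1e-10, 5e-9, 5e-9),
  Effect = c(0.2, -0.1, 0.3, 0.05, -0.4),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of echodata): keep one lead SNP per group.
select_lead_snps <- function(dat, grouping_vars = "Locus") {
  # Sort by ascending p-value, breaking ties by descending absolute effect size
  # (using the absolute value is an assumption of this sketch).
  dat <- dat[order(dat$P, -abs(dat$Effect)), , drop = FALSE]
  # After sorting, the first row of each group is that group's lead SNP.
  keep <- !duplicated(dat[, grouping_vars, drop = FALSE])
  out <- dat[keep, , drop = FALSE]
  # Return the rows ordered by the grouping variables for readability.
  group_cols <- unname(as.list(out[, grouping_vars, drop = FALSE]))
  out <- out[do.call(order, group_cols), , drop = FALSE]
  rownames(out) <- NULL
  out
}

# One lead SNP per locus.
by_locus <- select_lead_snps(top, grouping_vars = "Locus")

# One lead SNP per Locus-Gene pair.
by_pair <- select_lead_snps(top, grouping_vars = c("Locus", "Gene"))
```

Grouping by c("Locus", "Gene") yields more rows than grouping by "Locus" alone, because each distinct pair keeps its own index SNP.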
topSNPs <- echodata::import_topSNPs(
    topSS = echodata::topSNPs_Nalls2019_raw,
    # Map the raw column names onto the standardised names
    colmap = echodata::construct_colmap(
        P = "P, all studies",
        Effect = "Beta, all studies",
        Locus = "Nearest Gene",
        Gene = "QTL Nominated Gene (nearest QTL)"
    ),
    # Select one index SNP per locus number
    grouping_vars = "Locus Number")
#> Renaming column: P, all studies ==> P
#> Renaming column: Beta, all studies ==> Effect
#> Renaming column: Nearest Gene ==> Locus
#> Renaming column: QTL Nominated Gene (nearest QTL) ==> Gene
#> [1] "+ Assigning Gene and Locus independently."
#> Standardising column headers.
#> First line of summary statistics file:
#> SNP CHR BP Locus Gene Effect allele Other allele Effect allele frequency Effect SE, all studies P P, COJO, all studies P, random effects, all studies P, Conditional 23AndMe only P, 23AndMe only I2, all studies Freq1, previous studies Beta, previous studies StdErr, previous studies P, previous studies I2, previous studies Freq1, new studies Beta, new studies StdErr, new studies P, new studies I2, new studies Passes pooled 23andMe QC Known GWAS locus within 1MB Failed final filtering and QC Locus within 250KB Locus Number
#> Returning unmapped column names without making them uppercase.
#> + Mapping colnames from MungeSumstats ==> echolocatoR