The resulting topSNPs data.frame can be used to guide the finemap_loci in querying and fine-mapping loci.

import_topSNPs(
  topSS,
  sheet = 1,
  startRow = 1,
  cols = NULL,
  munge = TRUE,
  colmap = construct_colmap(),
  min_POS = NULL,
  max_POS = NULL,
  grouping_vars = c("Locus"),
  remove_variants = NULL,
  show_table = FALSE,
  verbose = TRUE
)

Arguments

topSS

Can be a data.frame with the top summary stats per locus. Alternatively, you can provide a path to the stored top summary stats file. Can be in any tabular format (e.g. excel, .tsv, .csv, etc.). This file should have one lead GWAS/QTL hits per locus. If there is more than one SNP per locus, the one with the smallest p-value (then the largest effect size) is selected as the lead SNP. The lead SNP will be used as the center of the locus when constructing the locus subset files.

sheet

If the topSS file is an excel sheet, you can specify which tab to use. You can provide either a number to identify the tab by order, or a string to identify the tab by name.

startRow

first row to begin looking for data. Empty rows at the top of a file are always skipped, regardless of the value of startRow.

cols

A numeric vector specifying which columns in the Excel file to read. If NULL, all columns are read.

munge

Standardise column names.

colmap

Column mappings object. Uses construct_colmap by default.

min_POS

Column containing minimum genomic position (used instead of an arbitrary window size).

max_POS

Column containing maximum genomic position (used instead of an arbitrary window size).

grouping_vars

The variables that you want to group by such that each grouping_var combination has its own index SNP. For example, if you want one index SNP per QTL eGene - GWAS locus pair, you could supply: grouping_vars=c("Locus","Gene").

remove_variants

SNPs to remove from topSS,

show_table

Create an interative data table.

verbose

Print messages.

Locus

Column containing unique locus name.

Value

Munged topSNPs table.

Examples

topSNPs <- echodata::import_topSNPs(
    topSS = echodata::topSNPs_Nalls2019_raw,
    colmap = construct_colmap(P = "P, all studies",
                              Effect = "Beta, all studies",
                              Locus = "Nearest Gene",
                              Gene = "QTL Nominated Gene (nearest QTL)"
                              ),
    grouping_vars = "Locus Number")
#> Renaming column: P, all studies ==> P
#> Renaming column: Beta, all studies ==> Effect
#> Renaming column: Nearest Gene ==> Locus
#> Renaming column: QTL Nominated Gene (nearest QTL) ==> Gene
#> [1] "+ Assigning Gene and Locus independently."
#> Standardising column headers.
#> First line of summary statistics file: 
#> SNP	CHR	BP	Locus	Gene	Effect allele	Other allele	Effect allele frequency	Effect	SE, all studies	P	P, COJO, all studies	P, random effects, all studies	P, Conditional 23AndMe only	P, 23AndMe only	I2, all studies	Freq1, previous studies	Beta, previous studies	StdErr, previous studies	P, previous studies	I2, previous studies	Freq1, new studies	Beta, new studies	StdErr, new studies	P, new studies	I2, new studies	Passes pooled 23andMe QC	Known GWAS locus within 1MB	Failed final filtering and QC	Locus within 250KB	Locus Number	
#> Returning unmapped column names without making them uppercase.
#> + Mapping colnames from MungeSumstats ==> echolocatoR