Main function to run either the conditional stepwise procedure (genome-wide) or the conditional analysis (locus-specific) from GCTA-COJO.
Usage
COJO(
  dat,
  locus_dir,
  bfile = file.path(locus_dir, "LD/plink"),
  fullSS_path = NULL,
  conditioned_snps = NULL,
  exclude = NULL,
  prefix = "cojo",
  run_stepwise = TRUE,
  run_conditional = FALSE,
  run_joint = FALSE,
  credset_thresh = 0.95,
  freq_cutoff = 0.1,
  compute_n = "ldsc",
  colmap = echodata::construct_colmap(),
  full_genome = FALSE,
  gcta_path = echoconda::find_executables_remote(tool = "gcta")[[1]],
  verbose = TRUE,
  ...
)
Source
COJO documentation
Publication 1 (doi:10.1016/j.ajhg.2010.11.011)
Publication 2 (doi:10.1038/ng.2213)
Arguments
- dat
Fine-mapping results data.
- locus_dir
Locus-specific directory to store results in.
- bfile
Input PLINK binary PED files, e.g. test.fam, test.bim and test.bed (see PLINK user manual for details).
- fullSS_path
Path to the full summary statistics file (GWAS or QTL) that you want to fine-map. It is usually best to provide the absolute path rather than the relative path.
- conditioned_snps
Which SNPs to condition on when fine-mapping (e.g. with COJO).
- exclude
Specify a list of SNPs to be excluded from the analysis.
- prefix
Prefix to use for file names.
- run_stepwise
--cojo-slct: Perform a stepwise model selection procedure to select independently associated SNPs. Results will be saved in a *.jma file with additional file *.jma.ldr showing the LD correlations between the SNPs.
- run_conditional
--cojo-cond: Perform association analysis of the included SNPs conditional on the given list of SNPs. Results will be saved in a *.cma file. The conditional SNP effects (i.e. bC) will be labelled as "NA" if the multivariate correlation between the SNP in question and all the covariate SNPs is > 0.9.
- run_joint
--cojo-joint: Fit all the included SNPs to estimate their joint effects without model selection. Results will be saved in a *.jma file with additional file *.jma.ldr.
- credset_thresh
The minimum mean Posterior Probability (across all fine-mapping methods used) of SNPs to be included in the "mean.CS" column.
- freq_cutoff
Minimum variant frequency cutoff.
- compute_n
How to compute per-SNP sample size (new column "N").
If the column "N" is already present in dat, this column will be used to extract per-SNP sample sizes and the argument compute_n will be ignored.
If the column "N" is not present in dat, one of the following options can be supplied to compute_n:
0: N will not be computed.
>0: If any number >0 is provided, that value will be set as N for every row. Note: Computing N this way is incorrect and should be avoided if at all possible.
"sum": N will be computed as: cases (N_CAS) + controls (N_CON), so long as both columns are present.
"ldsc": N will be computed as effective sample size: Neff = (N_CAS+N_CON)*(N_CAS/(N_CAS+N_CON)) / mean((N_CAS/(N_CAS+N_CON))[(N_CAS+N_CON)==max(N_CAS+N_CON)]).
"giant": N will be computed as effective sample size: Neff = 2 / (1/N_CAS + 1/N_CON).
"metal": N will be computed as effective sample size: Neff = 4 / (1/N_CAS + 1/N_CON).
- colmap
Column mappings object. Uses construct_colmap by default.
- full_genome
Whether to run GCTA-COJO genome-wide (TRUE) or within a specific locus (default: FALSE).
- gcta_path
Path to the GCTA-COJO executable.
- verbose
Print messages.
- ...
Arguments passed on to COJO_args:
cojo_file: Input the summary-level statistics from a meta-analysis GWAS (or a single GWAS).
out: Specify output root filename.
cojo_slct: Perform a stepwise model selection procedure to select independently associated SNPs. Results will be saved in a *.jma file with additional file *.jma.ldr showing the LD correlations between the SNPs.
cojo_cond: Perform association analysis of the included SNPs conditional on the given list of SNPs. Results will be saved in a *.cma file. The conditional SNP effects (i.e. bC) will be labelled as "NA" if the multivariate correlation between the SNP in question and all the covariate SNPs is > 0.9.
cojo_joint: Fit all the included SNPs to estimate their joint effects without model selection. Results will be saved in a *.jma file with additional file *.jma.ldr showing the LD correlations between the SNPs.
maf: Exclude SNPs with minor allele frequency (MAF) less than a specified value, e.g. 0.01.
max_maf: Include SNPs with MAF less than a specified value, e.g. 0.1.
cojo_top_SNPs: Perform a stepwise model selection procedure to select a fixed number of independently associated SNPs without a p-value threshold. The output format is the same as that from --cojo-slct.
cojo_p: Threshold p-value to declare a genome-wide significant hit. The default value is 5e-8 if not specified. This option is only valid in conjunction with the option --cojo-slct. Note: it will be extremely time-consuming if you set a very low significance level, e.g. 5e-3.
cojo_wind: Specify a distance d (in Kb unit). It is assumed that SNPs more than d Kb away from each other are in complete linkage equilibrium. The default value is 10000 Kb (i.e. 10 Mb) if not specified.
cojo_collinear: During the model selection procedure, the program will check the collinearity between the SNPs that have already been selected and a SNP to be tested. The testing SNP will not be selected if its multiple regression R2 on the selected SNPs is greater than the cutoff value. By default, the cutoff value is 0.9 if not specified.
diff_freq: To check the difference in allele frequency of each SNP between the GWAS summary datasets and the LD reference sample. SNPs with allele frequency differences greater than the specified threshold value will be excluded from the analysis. The default value is 0.2.
cojo_gc: If this option is specified, p-values will be adjusted by the genomic control method. By default, the genomic inflation factor will be calculated from the summary-level statistics of all the SNPs unless you specify a value, e.g. --cojo-gc 1.05.
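The compute_n options described above can be sketched in a few lines of R. This is only an illustration of the listed formulas, not the package's own implementation: the helper name compute_n_sketch is hypothetical, the case and control counts are toy values, and the "ldsc" branch assumes the formula takes the mean case proportion over the SNPs that have the largest total sample size.

```r
# Hypothetical helper (not part of the package): per-SNP sample size
# from case (n_cas) and control (n_con) counts.
compute_n_sketch <- function(n_cas, n_con,
                             method = c("ldsc", "giant", "metal", "sum")) {
  method <- match.arg(method)
  n_tot <- n_cas + n_con
  switch(method,
    sum   = n_tot,
    giant = 2 / (1 / n_cas + 1 / n_con),
    metal = 4 / (1 / n_cas + 1 / n_con),
    ldsc  = {
      prop_cas <- n_cas / n_tot
      # Assumption: mean case proportion among SNPs with the largest total N
      ref_prop <- mean(prop_cas[n_tot == max(n_tot)])
      n_tot * prop_cas / ref_prop
    }
  )
}

# Toy counts for two SNPs (made up for illustration)
n_cas <- c(1000, 500)
n_con <- c(1000, 1500)

n_sum   <- compute_n_sketch(n_cas, n_con, "sum")
n_giant <- compute_n_sketch(n_cas, n_con, "giant")
n_metal <- compute_n_sketch(n_cas, n_con, "metal")
n_ldsc  <- compute_n_sketch(n_cas, n_con, "ldsc")
```

With equal case and control counts, "giant" returns half of the total N and "metal" returns the total N, which is a quick way to tell the two conventions apart.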
Details
Documentation
Columns of the input summary statistics file are SNP, the effect allele,
the other allele, frequency of the effect allele,
effect size, standard error, p-value and sample size.
The headers are not keywords and will be ignored by the program.
Important: "A1" needs to be the effect allele
with "A2" being the other allele and "freq"
should be the frequency of "A1".
Note: 1) For a case-control study, the effect size should be log(odds ratio)
with its corresponding standard error.
2) Please always input the summary statistics of all SNPs even
if your analysis only focuses on a subset of SNPs
because the program needs the summary data of all SNPs to calculate the
phenotypic variance.
You can use one of the --extract options (Data management) to limit
the COJO analysis to a certain genomic region.
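As a minimal sketch of the input layout described above, the following R code builds a toy summary statistics table in the expected column order and writes it to a tab-delimited file. All values are invented for illustration, the "OR" column and the file name toy.ma are assumptions, and the odds ratio is converted to log(odds ratio) as the note on case-control studies requires.

```r
# Toy summary statistics (made-up values, not real results)
sumstats <- data.frame(
  SNP  = c("rs1001", "rs1002", "rs1003"),
  A1   = c("A", "G", "T"),       # effect allele
  A2   = c("G", "A", "C"),       # other allele
  freq = c(0.12, 0.45, 0.30),    # frequency of A1
  OR   = c(1.10, 0.95, 1.25),    # odds ratio (hypothetical column)
  se   = c(0.020, 0.015, 0.030), # standard error of log(OR)
  p    = c(1e-6, 0.02, 3e-9),
  N    = c(12000, 12000, 11800)
)

# Case-control study: the effect size must be log(odds ratio)
sumstats$b <- log(sumstats$OR)

# Column order: SNP, effect allele, other allele, frequency of the
# effect allele, effect size, standard error, p-value, sample size
cojo_input <- sumstats[, c("SNP", "A1", "A2", "freq", "b", "se", "p", "N")]

ma_path <- file.path(tempdir(), "toy.ma")
write.table(cojo_input, ma_path, sep = "\t",
            quote = FALSE, row.names = FALSE)
```

In a real analysis the table would contain all SNPs from the full summary statistics rather than a subset, for the reason given in note 2.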
General results columns:
- Chr
Chromosome.
- SNP
SNP RSID.
- bp
Physical position.
- refA
Effect allele.
- freq
Frequency of the effect allele in the original data.
- b
Effect size.
- se
Standard error.
- p
p-value from the original GWAS or meta-analysis.
- n
Estimated effective sample size.
- freq_geno
Frequency of the effect allele in the reference sample.
Stepwise analysis results columns:
- bJ
Effect size from the joint analysis of all the selected SNPs.
- bJ_se
Standard error from the joint analysis of all the selected SNPs.
- pJ
p-value from the joint analysis of all the selected SNPs.
- LD_r
LD correlation between the SNP i and SNP i + 1 for the SNPs on the list.
- LD_r2
LD_r squared.
- CS
Whether the SNP is in the Credible Set, defined as any SNP where
pJ < (1 - credset_thresh).
Conditional analysis results columns:
- bC
Effect size from the conditional analysis.
- bC_se
Standard error from the conditional analysis.
- pC
p-value from the conditional analysis.
- CS
Whether the SNP is in the Credible Set, defined as any SNP where
pC < (1 - credset_thresh).
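The CS rule stated in the column descriptions above amounts to a single comparison. The sketch below applies it to a toy results table; the SNP IDs and p-values are invented, and the same comparison would apply to pC in the conditional analysis.

```r
credset_thresh <- 0.95

# Toy stepwise results (made-up p-values)
res <- data.frame(
  SNP = c("rs1001", "rs1002", "rs1003"),
  pJ  = c(1e-9, 0.04, 0.20)
)

# A SNP is flagged as in the Credible Set when pJ < (1 - credset_thresh)
res$CS <- res$pJ < (1 - credset_thresh)
```

With the default credset_thresh of 0.95, the cutoff is 0.05, so the first two toy SNPs are flagged and the third is not.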
Examples
if (FALSE) { # \dontrun{
  vcf <- system.file("extdata", "BST1.1KGphase3.vcf.bgz",
                     package = "echodata")
  dat <- echodata::BST1
  locus_dir <- file.path(tempdir(), echodata::locus_dir)
  fullSS_path <- echodata::example_fullSS()
  bfile <- echoLD::vcf_to_plink(vcf = vcf)$prefix
  cojo_DT <- COJO(dat = dat,
                  locus_dir = locus_dir,
                  fullSS_path = fullSS_path,
                  bfile = bfile)
} # }
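For orientation, the run_* switches and pass-through options correspond to GCTA command-line flags such as --cojo-slct, --cojo-cond and --cojo-joint. The sketch below assembles such an argument vector without running anything. The helper build_cojo_args and all file paths are hypothetical, and this is not how the package itself builds the call; it only illustrates how the documented options fit together.

```r
# Hypothetical helper: assemble GCTA-COJO command-line arguments.
build_cojo_args <- function(bfile, cojo_file, out,
                            mode = c("slct", "joint", "cond"),
                            cond_file = NULL,
                            cojo_p = 5e-8,
                            cojo_wind = 10000,
                            cojo_collinear = 0.9,
                            diff_freq = 0.2) {
  mode <- match.arg(mode)
  args <- c("--bfile", bfile, "--cojo-file", cojo_file)
  if (mode == "cond") {
    # --cojo-cond takes a file listing the SNPs to condition on
    stopifnot(!is.null(cond_file))
    args <- c(args, "--cojo-cond", cond_file)
  } else {
    args <- c(args, paste0("--cojo-", mode))
  }
  # --cojo-p is only valid in conjunction with --cojo-slct
  if (mode == "slct") args <- c(args, "--cojo-p", format(cojo_p))
  c(args,
    "--cojo-wind", format(cojo_wind),
    "--cojo-collinear", format(cojo_collinear),
    "--diff-freq", format(diff_freq),
    "--out", out)
}

# Placeholder paths, for illustration only
slct_args <- build_cojo_args(bfile = "LD/plink",
                             cojo_file = "sumstats.ma",
                             out = "cojo")
cond_args <- build_cojo_args(bfile = "LD/plink",
                             cojo_file = "sumstats.ma",
                             out = "cojo",
                             mode = "cond",
                             cond_file = "cond_snps.txt")
```

The resulting vector could be passed to system2() together with the path held in gcta_path, though in practice COJO() handles this step.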