No. coloc.abf() and coloc.signals() assume they are given a dense map of all SNPs in a region that could be causal, so you need to supply every SNP in the region. You can think of them as asking whether the patterns of association “match” across this region of SNPs, and a single variant does not represent a pattern.
Coloc is designed to address whether two traits share causal variant(s) in a genomic region. It leaves the definition of “region” up to the user. You need to break the genome into smaller regions, within which it is reasonable to assume there is at most one (coloc.abf()) or a small number (coloc.signals()) of causal variants per trait. One way to do this is to use the boundaries defined by recombination hotspots, proxied by the map created by lddetect.
How big should a region be? Big enough that all variants in LD with a lead SNP are included; small enough that only one or a small number of causal variants might exist in it. I have found, using densely genotyped studies and the lddetect boundaries above, that regions typically contain 1,000-10,000 SNPs.
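As a rough illustration of the region-splitting step, here is a minimal Python sketch that assigns SNPs to blocks defined by a sorted list of boundary positions. It does not use coloc or lddetect: the `BLOCK_STARTS` values, the SNP IDs, and the function names `assign_region` and `split_by_region` are all hypothetical, and in practice the boundaries would be read from whatever boundary file you choose. It also assumes the last block on each chromosome is open-ended.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical block start positions (bp) per chromosome, sorted ascending.
# Real boundaries would come from a file such as an LD-block map.
BLOCK_STARTS = {
    "chr1": [10_000, 1_900_000, 3_600_000],
}

def assign_region(chrom, pos, block_starts=BLOCK_STARTS):
    """Return (chrom, block_index) for a SNP, or None if it lies outside any block."""
    starts = block_starts.get(chrom)
    if not starts:
        return None
    # Index of the last block start that is <= pos.
    idx = bisect_right(starts, pos) - 1
    return (chrom, idx) if idx >= 0 else None

def split_by_region(snps, block_starts=BLOCK_STARTS):
    """Group (snp_id, chrom, pos) tuples by block, keeping every SNP in each block."""
    regions = defaultdict(list)
    for snp_id, chrom, pos in snps:
        key = assign_region(chrom, pos, block_starts)
        if key is not None:
            regions[key].append(snp_id)
    return dict(regions)

snps = [
    ("rs_a", "chr1", 15_000),
    ("rs_b", "chr1", 1_899_999),
    ("rs_c", "chr1", 1_900_000),
    ("rs_d", "chr1", 5_000),  # before the first block start, so not assigned
]
regions = split_by_region(snps)
```

Each resulting group would then be analysed as one region, with all of its SNPs passed to the colocalisation step rather than only the lead variant.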
That’s really how coloc works: by exploiting a dense SNP map. Please see the original paper.
This is described in detail in the latest paper.
The summary printed on the screen by coloc.abf() and coloc.signals() shows the posterior probability that a shared causal variant exists in the region (PP4). A high PP4 does not mean that all variants are causal and shared. To check which variants are most likely to be causal, look at the SNP.PP column in the detailed results data.frame that is returned.
a high overall PP4 (>95%) means there is a high probability of colocalisation
a low PP4 for every single SNP (<50%) means we cannot confidently identify which individual SNP is jointly causal
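The two rules of thumb above can be sketched as a small check. This is a Python illustration only, not part of coloc: `interpret_coloc` and its argument names are hypothetical, the thresholds are the 95% and 50% values quoted above, and the per-SNP probabilities are made-up numbers standing in for the SNP.PP column.

```python
def interpret_coloc(pp4, snp_pp, region_thr=0.95, snp_thr=0.50):
    """Apply the region-level and per-SNP rules of thumb to one region.

    pp4    : overall posterior probability of a shared causal variant
    snp_pp : dict mapping SNP id -> per-SNP posterior probability
    """
    # Rank SNPs from most to least likely to be the shared causal variant.
    ranked = sorted(snp_pp.items(), key=lambda kv: kv[1], reverse=True)
    top_snp, top_pp = ranked[0]
    colocalised = pp4 > region_thr
    return {
        "colocalised": colocalised,
        "top_snp": top_snp,
        "top_snp_pp": top_pp,
        # A single SNP is only singled out when the region colocalises
        # and that SNP carries at least snp_thr of the posterior.
        "snp_resolved": colocalised and top_pp >= snp_thr,
    }

# Made-up example: strong evidence of colocalisation, but no dominant SNP.
result = interpret_coloc(
    pp4=0.97,
    snp_pp={"rs1": 0.30, "rs2": 0.28, "rs3": 0.22, "rs4": 0.20},
)
```

In this example the region would be called colocalised, but `snp_resolved` is false because the best SNP holds only 30% of the posterior, which is the situation the second rule describes.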