Group SNPs by r-squared and their influence on the posterior

snp.picker(d, data, start.thr = 0.01, nochange.thr = 0.001,
  nochange.run = 3, r2.gap = 0.1, shared.models = 0.03,
  skip.shared.models = FALSE)

Arguments

d

object of class snpmod containing SNPs to be grouped

data

genotype data as a SnpMatrix object which will be used to determine r-squared

start.thr

MPI threshold used to identify new groups. As long as at least one ungrouped SNP remains with MPI > start.thr, the algorithm will attempt to form new groups.

nochange.thr

the threshold below which MPI will be regarded as unchanged

nochange.run

the number of SNPs with MPI < nochange.thr needed for the algorithm to decide that a group is complete

r2.gap

if MPI falls below nochange.thr and the MPI of the next SNP then rises above start.thr, the group is terminated when that SNP is more than r2.gap away in r-squared

shared.models

the maximum proportion of models containing the group's index SNP in which another SNP from the same group may also appear

skip.shared.models

if TRUE, ignore the limit set by shared.models
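The interplay of start.thr, nochange.thr, nochange.run and r2.gap is easiest to see on a toy run. The R function below is a hypothetical sketch of one reading of the stopping rule, not the code inside snp.picker: it assumes the candidate SNPs are already ordered by decreasing r-squared with the group's index SNP, and that the r2.gap test compares neighbouring SNPs in that ordering. The name group_end is illustrative only.

```r
# Hypothetical sketch (not GUESSFM's implementation) of when a group stops growing.
# mpi: MPIs of candidate SNPs; r2: their r-squared with the index SNP.
# Both are assumed to be sorted by decreasing r2.
# Returns how many leading candidates are kept in the group.
group_end <- function(mpi, r2, start.thr = 0.01, nochange.thr = 0.001,
                      nochange.run = 3, r2.gap = 0.1) {
  run <- 0   # length of the current stretch of SNPs with MPI < nochange.thr
  last <- 0  # position of the last SNP with MPI >= nochange.thr
  for (i in seq_along(mpi)) {
    if (mpi[i] < nochange.thr) {
      run <- run + 1
      if (run >= nochange.run) break  # enough unchanged SNPs: group complete
    } else {
      # MPI rises above start.thr straight after a low stretch, but r2 has
      # dropped by more than r2.gap: treat this SNP as a separate signal
      if (run > 0 && mpi[i] > start.thr && r2[i - 1] - r2[i] > r2.gap) break
      run <- 0
      last <- i
    }
  }
  last
}

# Three unchanged SNPs in a row end the group after the third candidate
group_end(c(0.3, 0.2, 0.05, 5e-4, 2e-4, 1e-4, 0.1),
          c(1, 0.95, 0.9, 0.85, 0.8, 0.75, 0.3))  # 3
# A jump in MPI after a large drop in r2 ends the group early
group_end(c(0.3, 5e-4, 0.2), c(1, 0.9, 0.5))      # 1
```

Note that `run > 0` can only hold from the second candidate onwards, so `r2[i - 1]` is always defined when it is evaluated.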

Value

object of class snppicker

Details

In the presence of LD, the posterior is diluted over correlated SNPs. We deal with this by picking groups of SNPs according to r-squared and their influence on the posterior, as measured by the marginal posterior probability of inclusion (MPI). The aim is to identify groups of SNPs, at most one of which should be included in any model.
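To illustrate the dilution described above, the R sketch below computes MPIs from a hypothetical toy posterior over four models. The models, their probabilities and the helper mpi are made up for illustration and are not part of the package. The posterior mass for one signal is split between rs1 and rs2 (standing in for two SNPs in strong LD); because they never appear in the same model, the sum of their MPIs is the probability that a model contains one of them.

```r
# Hypothetical toy posterior: each model is a set of SNP names, with its
# posterior probability. rs1 and rs2 stand for two SNPs in strong LD.
models <- list("rs1", "rs2", c("rs1", "rs3"), character(0))
post <- c(0.4, 0.3, 0.2, 0.1)

# MPI of a SNP: total posterior probability of the models that include it
mpi <- function(models, post) {
  snps <- sort(unique(unlist(models)))
  vapply(snps, function(s) {
    sum(post[vapply(models, function(m) s %in% m, logical(1))])
  }, numeric(1))
}

m <- mpi(models, post)
m                         # rs1 0.6, rs2 0.3, rs3 0.2
sum(m[c("rs1", "rs2")])   # 0.9
```

Neither rs1 nor rs2 looks decisive on its own, but as a group they carry most of the posterior mass.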

The algorithm is rather simplistic and works according to a series of tuning parameters, described under Arguments above. It appears to work for the datasets I have tried, but I would be very interested to hear about its performance on other datasets, whether positive or negative! It may be possible to set sensible defaults for specific parameters according to the type of dataset.

By default, the method checks explicitly that SNPs in the same group do not occur together in models. This requires creating a model matrix, which is a time-consuming step. It can be avoided by setting skip.shared.models=TRUE, but this is not advised: SNP groups are meant to be sets of SNPs of which at most one is needed in any model.
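The shared.models check can be pictured with a small model matrix. The R sketch below is an illustration under stated assumptions, not the package's code: the matrix X, the helper shared_prop, and the choice to count models without weighting them by posterior probability are all hypothetical.

```r
# Hypothetical toy model matrix: one row per model, one column per SNP,
# 1 if the SNP is in the model. Not the object snp.picker builds internally.
X <- rbind(c(1, 0, 0, 0),
           c(0, 1, 0, 0),
           c(1, 0, 1, 0),
           c(1, 1, 0, 0),
           c(0, 0, 0, 1))
colnames(X) <- c("rs1", "rs2", "rs3", "rs4")

# Proportion of models containing `index` that also contain `other`
# (assumed here to be an unweighted count of models)
shared_prop <- function(X, index, other) {
  with_index <- X[, index] == 1
  mean(X[with_index, other] == 1)
}

shared.models <- 0.03
shared_prop(X, "rs1", "rs2") <= shared.models  # FALSE: rs2 is in 1 of rs1's 3 models
shared_prop(X, "rs1", "rs4") <= shared.models  # TRUE: rs4 never appears with rs1
```

For a real snpmod object such a matrix would need a row for every model considered, which helps explain why this check is the slow step.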