Why can we impute some rare sequence variants and not others? (#29)
We investigated how well rare variants can be imputed, using 1000 Bull Genomes sequence data (1147 sequences) as a reference and a target set of dairy cattle with 630K SNP genotypes that had also been directly genotyped for four rare recessive defects (BLAD, CVM, HH1 and JH1). The proportion of carriers correctly imputed ranged from 1 for JH1 to 0.04 for CVM. In general, the proportion of carriers correctly imputed increased as the frequency of the rare allele increased. CVM did not follow this trend: the frequency of the rare allele at this locus was 10 times higher than for BLAD, yet the proportion of carriers correctly imputed was much lower than for BLAD. On closer inspection, the core haplotype of sequence variants common to all CVM carriers was also found in many non-carriers, and even in breeds other than Holstein (the disease has only been reported in Holstein). In contrast, the core haplotype shared by JH1 carriers was unique to carriers and was not found in other breeds. These results shed light on why some rare sequence variants can be imputed well, while others are very difficult to impute.
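The two quantities the abstract relies on can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the 0/1/2 rare-allele coding, the definition of a carrier as a rare-allele count of at least 1, and the toy data are all assumptions made for illustration only.

```python
from typing import Sequence, Tuple


def proportion_carriers_imputed(true_gt: Sequence[int],
                                imputed_gt: Sequence[int]) -> float:
    """Fraction of true carriers that are also imputed as carriers.

    Genotypes are assumed to be coded as the rare-allele count (0, 1 or 2);
    an animal counts as a carrier when that count is >= 1 (an assumption).
    """
    imputed_for_carriers = [i for t, i in zip(true_gt, imputed_gt) if t >= 1]
    if not imputed_for_carriers:
        raise ValueError("no true carriers in the target set")
    hits = sum(1 for i in imputed_for_carriers if i >= 1)
    return hits / len(imputed_for_carriers)


def core_haplotype_specificity(haplotype_pairs: Sequence[Tuple[str, str]],
                               is_carrier: Sequence[bool],
                               core: str) -> float:
    """Fraction of animals holding the core haplotype that are true carriers.

    A value near 1 means the core haplotype is essentially unique to carriers
    (the situation described for JH1); a low value means it is also common in
    non-carriers (the situation described for CVM).
    """
    holders = [c for pair, c in zip(haplotype_pairs, is_carrier) if core in pair]
    if not holders:
        raise ValueError("core haplotype not observed")
    return sum(holders) / len(holders)


# Hypothetical toy data: five animals, three of them true carriers.
true_gt = [1, 1, 0, 0, 1]
imputed_gt = [1, 0, 0, 0, 1]
recall = proportion_carriers_imputed(true_gt, imputed_gt)

# Hypothetical phased haplotype labels for four animals.
pairs = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "A")]
carrier_flags = [True, False, False, True]
specificity = core_haplotype_specificity(pairs, carrier_flags, core="A")
```

In this toy example two of the three true carriers are imputed as carriers, and two of the three animals holding haplotype "A" are true carriers. A core haplotype that is also present in many non-carriers gives the imputation method little information to separate carriers from non-carriers, which matches the explanation offered for CVM.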