Using Random Forests to identify SNP associated with leg defects in broiler chicken: impact of correcting for population structures (#19)
The machine learning method Random Forests (RF) has been shown to be effective in genome-wide association studies (GWAS). However, population structure (PS) that is left uncorrected may produce spurious results in an RF analysis. In this study, we examined the impact of correcting for PS on the RF analysis of leg defect data from a commercial poultry population of 826 chickens genotyped for 44,129 SNP markers. The results show that correcting for PS led to: 1) a significant improvement in the estimates of SNP variable importance values; 2) a significant reduction in the false positives identified in the uncorrected data; and 3) stronger evidence for a set of SNPs associated with the leg defect phenotype.