Wednesday, September 07, 2011

Is the genome-wide association study meeting its end?

A paper was published in Nature last year. The authors ran a genome-wide association study (GWAS) of human height on 183,727 individuals, using 2,834,208 (imputed) SNP loci from SNP chips.



180 loci were found to affect human height. Altogether, they explain 10% of the phenotypic variance of height, or ONLY 12.5% of its total heritable variance (10% / 0.8, taking h2_height = 0.8).
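The 12.5% figure is just the explained share of phenotypic variance divided by the heritability. A minimal sketch of that arithmetic, using only the two numbers quoted above (the variable names are illustrative, not from the paper):

```python
# Values quoted in the post; variable names are illustrative assumptions.
explained_phenotypic = 0.10  # share of phenotypic variance explained by the 180 loci
h2_height = 0.8              # heritability of height assumed in the post

# Heritable variance is h2 times phenotypic variance, so the share of
# heritable variance explained is the phenotypic share divided by h2.
explained_heritable = explained_phenotypic / h2_height
missing_heritable = 1.0 - explained_heritable

print(f"explained share of heritable variance: {explained_heritable:.1%}")
print(f"heritable variance still unexplained: {missing_heritable:.1%}")
```

Under these numbers, the large majority of the heritable variance is left unaccounted for by the detected loci.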

From a statistical point of view, the results are hard to believe. Determining the number (and effects) of QTLs along a chromosome is usually treated as a model selection problem. If you look at Goffinet & Mangin's paper from many years ago, even the case of 3 linked QTLs on one chromosome is very hard to resolve.

A typical GWAS does not have the funding for an experiment as large as the one in this Nature paper. That study is actually a meta-analysis of results from many labs around the world that had conducted other GWAS and happened to have recorded the patients' heights as well.

In a small experiment, one typically sees several large peaks and many small ones. But when you zoom in on the peaks, the very significant QTLs usually disappear, as they may just be the aggregate effect of many small QTLs, which may even be in different association/linkage phases.
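A toy sketch of why such a peak can vanish on closer inspection. It assumes a marker in complete linkage with a block of small additive QTLs, so the effect seen at the marker is simply the signed sum of the small effects; the effect sizes, the phases, and the function name are all hypothetical.

```python
def apparent_peak_effect(effects, phases):
    """Net effect seen at a marker that tags a block of tightly linked QTLs.

    effects: additive effect of each small QTL (arbitrary units).
    phases:  +1 if the increasing allele sits on the tagged haplotype,
             -1 if the decreasing allele does.
    Assumes complete linkage and purely additive effects.
    """
    return sum(e * p for e, p in zip(effects, phases))


small_effects = [1, 1, 1, 1, 1, 1]  # six hypothetical small QTLs

# All increasing alleles on the same haplotype: they add up to one big "peak".
coupling = apparent_peak_effect(small_effects, [+1] * 6)

# Alternating phases: the same six QTLs cancel out at the marker.
mixed = apparent_peak_effect(small_effects, [+1, -1, +1, -1, +1, -1])

print("coupling phase:", coupling)
print("mixed phases:", mixed)
```

The same set of small QTLs can look like one large QTL or like nothing at all, depending only on phase, which is part of what makes the model selection so hard.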

Many people believe that 2000+ genes affect human height. Others have estimated the number of genes that might affect common farm animal traits. All of these estimates suggest that the lower bound on the number of genes affecting a trait is in the hundreds, or at least tens per chromosome on average. A case like DGAT1, a single gene with a large effect, is like winning a lottery.

We can also put it this way. Linkage analysis has been practiced for more than 30 years. It is said to have saved quantitative genetics, which is true, and statistics and computer science play an even more important role today. The agricultural industries, however, rarely responded to linkage analysis.

Only 3-4 years ago, very few people knew about genomic selection. Now almost everyone in animal and plant sciences is talking about it. Industries for all kinds of farm animals and plants are investing in or adopting this new technology. Some countries, like France, have completely stopped progeny testing of dairy cattle.

The world has changed.

2 comments:

xijiang said...

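Many people regard association studies and linkage analysis as different methods, but both can in fact be seen as linkage analysis. The difference is that the latter also requires pedigree relationships.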

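This height result was analyzed in passing. Many people are now searching for the causal genes of diseases such as diabetes, cardiovascular disease, cancer, senile dementia, and manic-depressive disorder, and they recorded height and weight along the way while collecting data. In 2008 a Nature Genetics article discussed the so-called missing heritability in the analysis of height. The genes detected that time could explain only 5% of the phenotypic variance, with about 60,000 individuals. The amount of data has now grown 3-fold, yet it is still very far from explaining all of the genetic variance.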

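Moreover, even the genes that were detected are quite possibly the result of increasing-effect and decreasing-effect genes linked together. This is why a detected QTL can no longer be found once you get down to the actual sequence fragment. This kind of model selection is statistically very difficult, even impossible.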

GWA is in fact also a statistical method.

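Progeny testing is specific to dairy cattle/animal breeding: a bull does not produce milk, so it often takes the production records of many daughters to verify whether he is suitable as a sire.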

xijiang said...

GWA is simply what is usually called GWAS.