Genome-Wide Association Studies

JAMA Guide to Statistics and Methods

Xiuqing Guo, PhD; Jerome I. Rotter, MD

Clinical Review & Education

Related article page 1682

Each individual’s genetic makeup influences the presence of, manifestation of, and susceptibility to disease. The specific genetic regions that influence disease for mendelian genetic conditions, such as cystic fibrosis and Huntington disease, have often been identified through familial linkage studies, which combine familial patterns of disease with a limited set of genomic markers. In contrast, many diseases have very complex underlying mechanisms with many genes and the environment influencing risk. Understanding the influence of genetics on risk for these diseases requires approaches beyond familial linkage studies.

A candidate gene association study begins by identifying the candidate genes, either 1 gene or multiple genes thought to belong to a common pathway. The association between genetic variations in the candidate genes and the presence of disease is then investigated. The success of this strategy is highly dependent on the correct choice of genes to study, and the overall experience with candidate gene association studies has been disappointing.1

In contrast to a candidate gene association study, a genome-wide association study (GWAS) is based on a hypothesis-free strategy with no need to specify target genes in advance, and can be used to survey the entire genome to elucidate susceptibility to common heritable human diseases. A GWAS quantifies the association between the presence of disease and genetic variations at known positions in the genome, referred to as single-nucleotide polymorphisms (SNPs; see the Table for related terminology), to pinpoint relatively smaller areas of the genome that may contribute to the risk of disease.

In this issue of JAMA, Hauser et al2 report a GWAS that evaluated genetic disposition for primary open-angle glaucoma (POAG) in individuals with African ancestry. SNP rs59892895*C in the amyloid β A4 precursor protein-binding family B member 2 (APBB2) gene was found to be significantly associated with POAG in this population, while no association between this gene and POAG was found in European or Asian populations. The authors conclude that there are differences in genetic mechanisms underlying glaucoma in African ancestry populations compared with European and Asian ancestry populations.

Use of the Method

Why Is the Method Used?

GWASs take advantage of variation in the millions of known SNPs, occurring in known locations across the entire genome, to determine whether one genetic variant (ie, allele) at the location of each SNP occurs more often than expected in individuals with a particular disease than in those without the disease. The associated SNPs are then considered to mark a region of the human genome that influences the risk of disease. The approach allows identification of small genetic regions that contain potential effector genes (ie, genes that may affect the likelihood of disease).

Description of the Method

The most common approach used in a GWAS is to compare allele frequency among affected individuals with that of a healthy control group. The history of humans and the human genome has resulted in some SNP variants being coinherited more frequently with some variants of disease-related effector genes, with the likelihood of coinheritance decreasing the farther the SNP is from the effector gene because of random recombination. This phenomenon is known as linkage disequilibrium. If the allele frequency at a particular SNP is significantly different between affected individuals and control individuals, the allele variant is said to be associated with the disease. Because each SNP is analyzed for association with disease independently and hundreds of thousands to millions of SNPs are analyzed in a single GWAS, very strict criteria for statistical significance must be applied to avoid false-positive results.3 A P value threshold of 5 × 10⁻⁸ is typically used to define statistical significance (ie, a Bonferroni correction, with the threshold determined by dividing .05 by 10⁶, to reflect the number of tests).

The odds ratios (ORs) obtained in GWASs are the odds of disease among individuals who have a specific allele vs the odds of disease among individuals who do not have that same allele, reflecting both the degree of coinheritance or linkage disequilibrium of the SNP allele and the disease-causing allele in the effector gene and the magnitude of the effector gene’s effect on disease risk. Most disease-associated SNPs have very small effect sizes (OR <1.5), and large numbers of individuals are needed to identify predisposing (or protective) SNPs.4

Table. Genomic Terms and Definitions

| Terminology | Definition |
|---|---|
| Allele | One of 2 or more DNA sequences occurring at a particular gene locus (eg, blood groups A and B) |
| Effector gene | The gene (whether protein coding or RNA coding) that underlies an SNP association with a trait |
| Gene | The basic unit of heredity that occupies a specific location on a chromosome; each gene consists of nucleotides arranged in a linear manner; although many genes code for a specific protein or segments of protein leading to a particular characteristic or function, other genes just code for RNA |
| Genome-wide association study | A way for scientists to identify inherited genetic variants associated with risk of disease or a particular trait; this method surveys the entire genome for genetic polymorphisms, typically SNPs, that occur more frequently in individuals with the disease or trait being assessed (cases) than in individuals without the disease or trait (controls) |
| Imputation | The statistical inference of unobserved genotypes; it is achieved by using a known genotype in a population (eg, from the HapMap or the 1000 Genomes Project) |
| Linkage analysis | A gene-hunting technique that traces patterns of disease or traits in families and attempts to locate a trait-causing gene by identifying genetic markers of known chromosomal location that are coinherited with the trait |
| Linkage disequilibrium | Where alleles of different SNPs occur together more often than can be accounted for by chance (ie, beyond the association due to their physical proximity on a chromosome) |
| Locus | The physical site or location of a specific gene on a chromosome |
| Meta-analysis | The statistical procedure for combining data from multiple studies, extensively used in genome-wide association studies |
| Single-nucleotide polymorphism (SNP) | DNA sequence variations that occur when a single nucleotide (adenine, thymine, cytosine, or guanine) in the genome sequence is altered; usually present in at least 1% of the population |
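The single-SNP comparison described under Description of the Method can be illustrated with a minimal Python sketch. It compares allele counts between cases and controls for one SNP, computes the allelic odds ratio, and checks the P value against the threshold obtained by dividing .05 by 10⁶. The function name, the allele counts, and the use of an uncorrected 1-df Pearson χ² test are assumptions chosen for illustration; published GWASs typically use regression models adjusted for covariates such as ancestry.

```python
import math

# Bonferroni-style genome-wide threshold: .05 divided by 10^6 tests.
GENOME_WIDE_ALPHA = 0.05 / 1e6


def allelic_association(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Test one SNP by comparing allele counts in cases vs controls.

    Hypothetical helper: returns (odds_ratio, chi_square, p_value) from a
    2 x 2 table of allele counts (each individual contributes 2 alleles).
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    row_totals = [case_alt + case_ref, ctrl_alt + ctrl_ref]
    col_totals = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    n = sum(row_totals)

    # Pearson chi-square statistic without continuity correction.
    chi_square = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi_square += (table[i][j] - expected) ** 2 / expected

    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2))

    # Odds of being a case with the allele vs without it.
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    return odds_ratio, chi_square, p_value


# Made-up counts: 2000 cases and 2000 controls (4000 alleles per group).
odds_ratio, chi_square, p_value = allelic_association(1200, 2800, 1000, 3000)
print(f"OR = {odds_ratio:.2f}, chi-square = {chi_square:.2f}, P = {p_value:.2e}")
print("Genome-wide significant:", p_value < GENOME_WIDE_ALPHA)
```

With these made-up counts the OR is about 1.29, and the association does not reach the genome-wide threshold even with 2000 individuals per group, which is consistent with the need for large samples when effect sizes are small.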


Volume 322, Number 17 1705


Such large sample sizes can be achieved by prospectively gathering samples from large populations (eg, the deCode project, which gathered genotypic and medical data from more than 160 000 volunteer participants, comprising well over half of the adult population in Iceland)5 or, more often, by combining samples from different study cohorts using meta-analytic methods.
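As a rough illustration of how results from separate cohorts might be combined, the Python sketch below pools per-cohort log odds ratios for one SNP using inverse-variance weighting. The fixed-effect model, the function name, and the example estimates are assumptions made for illustration; the source does not specify which meta-analytic method any given consortium uses.

```python
import math


def fixed_effect_meta(log_odds_ratios, standard_errors):
    """Pool per-cohort log ORs for one SNP by inverse-variance weighting.

    Hypothetical helper: returns (pooled_log_or, pooled_se, p_value).
    """
    if len(log_odds_ratios) != len(standard_errors):
        raise ValueError("one standard error is required per estimate")

    # Each cohort is weighted by the inverse of its sampling variance.
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)

    pooled_log_or = sum(w * b for w, b in zip(weights, log_odds_ratios)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)

    # Two-sided P value from the normal approximation.
    z = pooled_log_or / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return pooled_log_or, pooled_se, p_value


# Made-up estimates from 3 cohorts (ORs of 1.20, 1.30, and 1.25).
cohort_log_ors = [math.log(1.20), math.log(1.30), math.log(1.25)]
cohort_ses = [0.05, 0.08, 0.04]
pooled_log_or, pooled_se, p_value = fixed_effect_meta(cohort_log_ors, cohort_ses)
print(f"Pooled OR = {math.exp(pooled_log_or):.3f}, SE = {pooled_se:.4f}, P = {p_value:.2e}")
```

The pooled standard error is smaller than that of any single cohort, which is why combining cohorts can reveal associations that no individual study has the power to detect.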

Genotyping arrays used for GWASs do not directly genotype all known SNP variations in the genome. Information for SNPs not directly measured can be imputed using reference panels that cover a greater number of SNPs, spanning both SNPs directly measured in a particular GWAS and those that were not included, along with information on the known SNP locations.

What Are the Limitations of the Method?

GWASs have several limitations. Early in the evolution of GWASs, despite stringent significance thresholds, false-positive associations were common and many findings failed to replicate in subsequent studies. Issues including differences in phenotyping of patients, the genotyping methods used, and a failure to account for artifacts introduced by subpopulations of patients in cohorts of differing ancestral backgrounds (population stratification) all likely contributed to challenges in replicating findings. Over time, methods for accounting for these and other technical issues have been improved, increasing the reliability of GWAS findings.

Genotyping arrays designed for GWASs rely on coinheritance or linkage disequilibrium between SNPs to provide coverage of the entire genome, even though only a subset of SNPs are characterized in any particular GWAS. Thus, one identified SNP usually represents many others, and the identified associated SNP variants are unlikely to be within the potentially causal effector genes. The most common associated SNP variants identified in GWASs are noncoding, complicating attempts to establish the molecular effects of these GWAS loci.
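To make the idea of one SNP representing others more concrete, the Python sketch below computes 2 common summaries of linkage disequilibrium between a pair of SNPs, the coefficient D and the squared correlation r², from haplotype counts. The function name and the counts are hypothetical, and the sketch assumes phased haplotypes are available; in practice, linkage disequilibrium is usually estimated from genotype data with dedicated software.

```python
def linkage_disequilibrium(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, r_squared) for 2 biallelic SNPs from haplotype counts.

    Hypothetical helper. n_AB counts haplotypes carrying allele A at the
    first SNP and allele B at the second SNP, and so on.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("at least one haplotype is required")

    p_AB = n_AB / total            # frequency of the A-B haplotype
    p_A = (n_AB + n_Ab) / total    # frequency of allele A at SNP 1
    p_B = (n_AB + n_aB) / total    # frequency of allele B at SNP 2

    # D is the excess of the A-B haplotype over what independence predicts.
    d = p_AB - p_A * p_B

    denominator = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denominator == 0:
        raise ValueError("both SNPs must be polymorphic in the sample")
    return d, d * d / denominator


# Made-up counts: alleles A and B usually travel together.
d, r_squared = linkage_disequilibrium(40, 10, 10, 40)
print(f"D = {d:.3f}, r^2 = {r_squared:.2f}")
```

An r² near 1 means that genotyping one SNP captures nearly all the information about the other, whereas an r² near 0 means the 2 SNPs are inherited essentially independently.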

From a clinical perspective, it is tempting to use GWAS findings to predict disease risk. However, the predictive ability of SNP markers with very low ORs (including the SNP identified in the study by Hauser et al2) is quite poor. Approaches for aggregating multiple SNPs with low ORs into a risk model, known as a polygenic risk score, are evolving for a number of common conditions.6 To date, GWASs have been conducted in mostly European populations, so findings may not be generalizable to other populations. The study by Hauser et al2 is important in that it conducts a GWAS including individuals of diverse African ancestries, which comparatively few successful GWASs have done.
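One simple way to aggregate several low-OR SNPs is sketched below in Python: each risk-allele count (0, 1, or 2) is multiplied by the natural logarithm of that SNP's OR, and the products are summed. The additive weighting, the function name, and the example values are assumptions for illustration only; polygenic risk scores used in practice involve additional steps such as SNP selection and adjustment for linkage disequilibrium.

```python
import math


def polygenic_risk_score(risk_allele_counts, odds_ratios):
    """Sum risk-allele counts weighted by log(OR) for one individual.

    Hypothetical helper: risk_allele_counts holds 0, 1, or 2 copies of the
    risk allele per SNP; odds_ratios holds the per-allele OR for each SNP.
    """
    if len(risk_allele_counts) != len(odds_ratios):
        raise ValueError("one odds ratio is required per SNP")
    return sum(
        count * math.log(odds_ratio)
        for count, odds_ratio in zip(risk_allele_counts, odds_ratios)
    )


# Made-up example: 3 SNPs, each with a small effect size.
example_counts = [0, 1, 2]
example_ors = [1.10, 1.20, 1.30]
score = polygenic_risk_score(example_counts, example_ors)
print(f"Polygenic risk score = {score:.3f}")
```

The score has no meaning in isolation; it is usually interpreted relative to the distribution of scores in a reference population.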

A key scientific step in modern GWASs is to move from finding associated SNPs to identifying the actual effector transcript that codes for protein or for RNA and is responsible for the underlying disease pathophysiology. One way to do that is to narrow the focus to the smallest possible number of SNPs within a given region or locus, eventually even to a single SNP. The SNP itself may play a role in protein function or processing or in regulation of transcription, or it may simply be in linkage disequilibrium with other rare variants that are responsible for disease risk.7

How Was the Method Used?

Hauser et al2 performed a discovery GWAS of 2320 patients with POAG and 2121 control individuals without POAG, all of African ancestry. SNPs associated at the genome-wide significance level (P < 5 × 10⁻⁸) were then tested for replication in 5401 individuals with POAG and 13 015 control individuals of African ancestry. A significant association was mapped to the APBB2 rs59892895T>C locus, at which the minor allele C was associated with increased risk of POAG. A second de novo replication that included 1536 individuals with POAG and 1902 control individuals further confirmed the association. Functional studies of a very small number of donor eyes suggested that individuals with African ancestry who carried the risk allele had higher levels of APBB2 in the retina. This increased expression level was accompanied by increased levels of cytotoxic β-amyloid, which colocalizes with retinal ganglion cells; the death of these cells defines POAG. Of interest, the mechanism hypothesized by the investigators, increased β-amyloid in the retina, potentially links glaucoma to Alzheimer disease.2

How Should the Results Be Interpreted?

In their GWAS, Hauser et al2 also found that 26 SNPs from 15 loci that were previously identified to be associated with POAG in individuals with European and Asian ancestry had significantly lower effect sizes in individuals with African ancestry. This finding, coupled with the finding of a risk-associated SNP that appears to be unique to individuals of African ancestry, suggests that genetic influences on POAG in African populations could be different from those in European and Asian populations. The identification of associated SNPs in a GWAS, such as rs59892895 for POAG, may help to identify and delineate the actions of effector transcripts, in this case the APBB2 gene, thus yielding a possible explanatory mechanism for the association and potential targets for new therapies.

ARTICLE INFORMATION

Author Affiliations: The Institute for Translational Genomics and Population Sciences, Los Angeles Biomedical Research Institute, Department of Pediatrics, Harbor-UCLA Medical Center, Torrance, California.

Corresponding Author: Jerome I. Rotter, MD, The Institute for Translational Genomics and Population Sciences, Los Angeles Biomedical Research Institute, Department of Pediatrics, Harbor-UCLA Medical Center, 1124 W Carson St, E-5, Torrance, CA 90502 (jrotter@labiomed.org).

Section Editors: Roger J. Lewis, MD, PhD, Department of Emergency Medicine, Harbor-UCLA Medical Center and David Geffen School of Medicine at UCLA; and Edward H. Livingston, MD, Deputy Editor, JAMA.

Conflict of Interest Disclosures: Drs Rotter and Guo reported receiving grants from the National Institutes of Health.

REFERENCES

1. Hirschhorn JN, Lohmueller K, Byrne E, Hirschhorn K. A comprehensive review of genetic association studies. Genet Med. 2002;4(2):45-61.

2. Hauser MA, Allingham RR, Aung T, et al; The Genetics of Glaucoma in People of African Ancestry (GGLAD) Consortium. Association of genetic variants with primary open-angle glaucoma among individuals with African Ancestry [published November 5, 2019]. JAMA. doi:10.1001/jama.2019.16161

3. Cao J, Zhang S. Multiple comparison procedures. JAMA. 2014;312(5):543-544.

4. Visscher PM, Brown MA, McCarthy MI, Yang J. Five years of GWAS discovery. Am J Hum Genet. 2012;90(1):7-24. doi:10.1016/j.ajhg.2011.11.029

5. Greely HT. The uneasy ethical and legal underpinnings of large-scale genomic biobanks. Annu Rev Genomics Hum Genet. 2007;8:343-364. doi:10.1146/annurev.genom.7.080505.115721

6. Sugrue LP, Desikan RS. What are polygenic scores and why are they important? JAMA. 2019;321(18):1820-1821. doi:10.1001/jama.2019.3893

7. Mahajan A, Taliun D, Thurner M, et al. Fine-mapping type 2 diabetes loci to single-variant resolution using high-density imputation and islet-specific epigenome maps. Nat Genet. 2018;50(11):1505-1513. doi:10.1038/s41588-018-0241-6
