What is mutation? A chapter in the series: How microbes “jeopardize” the modern synthesis

Affiliations Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America, Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas, United States of America, Department of Molecular Virology and Microbiology, Baylor College of Medicine, Houston, Texas, United States of America, The Dan L Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, Texas, United States of America

* E-mail: [email protected]

  • Devon M. Fitzgerald, 
  • Susan M. Rosenberg

PLOS

Published: April 1, 2019

  • https://doi.org/10.1371/journal.pgen.1007995

Mutations drive evolution and were assumed to occur by chance: constantly, gradually, roughly uniformly in genomes, and without regard to environmental inputs, but this view is being revised by discoveries of molecular mechanisms of mutation in bacteria, now translated across the tree of life. These mechanisms reveal a picture of highly regulated mutagenesis, up-regulated temporally by stress responses and activated when cells/organisms are maladapted to their environments—when stressed—potentially accelerating adaptation. Mutation is also nonrandom in genomic space, with multiple simultaneous mutations falling in local clusters, which may allow concerted evolution—the multiple changes needed to adapt protein functions and protein machines encoded by linked genes. Molecular mechanisms of stress-inducible mutation change ideas about evolution and suggest different ways to model and address cancer development, infectious disease, and evolution generally.

Citation: Fitzgerald DM, Rosenberg SM (2019) What is mutation? A chapter in the series: How microbes “jeopardize” the modern synthesis. PLoS Genet 15(4): e1007995. https://doi.org/10.1371/journal.pgen.1007995

Editor: W. Ford Doolittle, Dalhousie University, CANADA

Copyright: © 2019 Fitzgerald, Rosenberg. This is an open access article distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: This work was supported by the American Cancer Society Postdoctoral Fellowship 132206-PF-18-035-01-DMC (DMF) and NIH grant R35-GM122598. The funders had no role in the preparation of the article.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Mutation is any change in the sequence of an organism’s genome or the process by which the changes occur. Mutations range from single-basepair alterations to megabasepair deletions, insertions, duplications, and inversions. Though seemingly simple, ideas about mutation became entangled with the initially simplifying assumptions of both Darwin himself and the “Modern Synthesis”—the geneticists who embraced Darwin in the pre-DNA early 20th century, beginning evolutionary biology. The assumptions of purely “chance” mutations that occur constantly, gradually, and uniformly in genomes have underpinned biology for almost a century but began as a “wait-and-see”–based acknowledgment by early evolutionary biologists that they did not know the chemical nature of genes or how mutations in genes might occur.

Darwin considered generation of variation by chance to be a simplifying assumption, given that the origins of variation (and genes!) were unknown in his time, but he appears to have thought chance variation to be unlikely: “I have hitherto sometimes spoken as if the variations—so common and multiform in organic beings under domestication, and in a lesser degree in those in a state of nature—had been due to chance. This, of course, is a wholly incorrect expression, but it serves to acknowledge plainly our ignorance of the cause of particular variation [Chapter 5, 1].”

He also described multiple instances in which the degree and types of observable variation change in response to environmental exposures, thus seeming open to the possibility that the generation of variation might be environmentally responsive [ 1 ]. However, even once mutations were described on a molecular level, many continued to treat spontaneous mutations as necessarily chance occurrences—typically as mistakes occurring during DNA replication or repair. Darwinian evolution, however, requires only two things: heritable variation (usually genetic changes) and selection imposed by the environment. Any of many possible modes of mutation—purely “chance” or highly biased, regulated mechanisms—are compatible with evolution by variation and selection.

Here, we review some of the wealth of evidence, much of which originated in microbes, that reframes mutagenesis as dynamic and highly regulated processes. Mutation is regulated temporally by stress responses, occurring when organisms are poorly adapted to their environments, and occurs nonrandomly in genomes. Both biases may accelerate adaptation.

Bacteria teach biologists about evolution

Microbes were initially held as proof of the independence of mutational processes and selective environments. The Luria–Delbrück experiment (1943) demonstrated that bacterial mutations to phage resistance can occur prior to phage exposure [ 2 ], and the Lederbergs showed similar results for resistance to many antibiotics [ 3 ]. However, discovery of the SOS DNA-damage response and its accompanying mutagenesis [ 4 – 7 ] in the post-DNA world of molecular genetics began to erode the random-mutation zeitgeist. Harrison Echols thought that the SOS response conferred “inducible evolution” [ 8 ], echoing Barbara McClintock’s similar SOS-inspired suggestion of adaptation by regulated bursts of genome instability [ 9 ]. But many argued that SOS mutagenesis might be an unavoidable byproduct of DNA repair and that high-fidelity repair might be difficult to evolve. John Cairns’ later proposal of “directed” or “adaptive” mutagenesis in starvation-stressed Escherichia coli [ 10 , 11 ] reframed the supposed randomness of mutation as an exciting problem not yet solved. The mutagenesis that Cairns and colleagues studied under the nonlethal environment of starvation is now known to reflect stress-induced mutagenesis: mutation up-regulated by stress responses. Its molecular mechanism(s), reviewed here, demonstrate regulation of mutagenesis. Similar mechanisms are now described from bacteria to humans, suggesting that regulated mutagenesis may be the rule, not the exception (discussed here and reviewed more extensively, [ 12 ]).

Stress-induced mutagenic DNA break repair in E . coli

DNA double-strand breaks (DSBs) occur spontaneously in approximately 1% of proliferating E . coli [ 13 , 14 ]. In unstressed E . coli , DSB repair by homologous recombination (HR) is relatively high fidelity. However, activation of the general stress response, for example, by starvation, flips a switch, causing DSB repair to become mutagenic [ 15 , 16 ]. This process of mutagenic break repair (MBR) causes mutations preferentially when cells are poorly adapted to their environment—when stressed—and, as modeling indicates [ 17 – 20 ], may accelerate adaptation.

At least three stress responses cooperate to increase mutagenesis in starving E . coli . The membrane stress response contributes to DSB formation at some loci [ 21 ]; the SOS response up-regulates error-prone DNA polymerases used in one of two MBR mechanisms [ 22 – 24 ]; and the general stress response licenses the use of, or persistence of errors made by, those DNA polymerases in DSB repair [ 15 , 16 ]. The requirement for multiple stress responses indicates that cells check a few environmental conditions before flipping the switch to mutation [ 25 ]. E . coli MBR is a model of general principles in mutation from bacteria to human: the regulation of mutation in time, by stress responses, and its restriction in genomic space, limited to small genomic regions, in the case of MBR, near DNA breaks. We look at MBR, then other mutation mechanisms in microbes and multicellular organisms, which share these common features.

MBR mechanisms

Two distinct but related MBR mechanisms occur in starving E . coli , and both require activation of the general/starvation response. Moreover, both occur without the starvation stress if the general stress response is artificially up-regulated [ 15 , 16 ], indicating that the stress response itself without actual stress is sufficient. Homologous-recombinational (homology-directed) MBR (HR-MBR) generates base substitutions and small indels via DNA-polymerase errors during DSB-repair synthesis ( Fig 1A–1F ). Microhomologous MBR causes amplifications and other gross chromosomal rearrangements (GCRs) [ 26 – 28 ], most probably by microhomology-mediated break-induced replication (MMBIR) [ 28 , 29 ] ( Fig 1A–1C , 1G and 1H ). Both MBR pathways challenge traditional assumptions about the "chance" nature of mutations.

(a–c) RecBCD nuclease loads RecA HR protein onto ssDNA, similarly to human BRCA2 loading RAD51; basepairing with a strand of identical duplex DNA (gray, e.g., a sister chromosome). Parallel lines, basepaired DNA strands. Repair synthesis (dashed lines) is switched to a mutagenic mode by the general stress response (sigma S). DNA polymerase errors (d, purple X) generate indels (e, purple XX) and base substitutions (f, purple XX). Microhomologous MBR requires DNA Pol I for template switching to regions containing microhomology (g), of as little as a few basepairs, and initiates replication, creating genome rearrangements; (h) a duplicated chromosome segment (blue arrows) is shown here. Circled numbers and shading indicate the three main events in HR-MBR: ① a DSB and its repair by HR, ② the SOS response (pink), and ③ the general stress response (blue). Note that HR-MBR (d–f, purple) requires both the SOS response (②, pink, which up-regulates error-prone DNA Pol IV, necessary for HR-MBR) and general stress response (③, blue), but microhomologous MBR (g–h, blue) requires the general stress response but not SOS (③, blue). Figure modified from [ 12 ]. HR, homologous recombination; MBR, mutagenic break repair; ssDNA, single-stranded DNA.

https://doi.org/10.1371/journal.pgen.1007995.g001

Both MBR mechanisms are initiated by a DSB and require HR DSB-repair proteins ( Fig 1 , ①) [ 15 , 28 , 30 – 33 ]. The first steps mirror standard HR DSB repair: RecBCD nuclease processes DSB ends and loads RecA HR protein ( Fig 1A and 1B ). Next, the RecA–DNA nucleoprotein filament can activate the SOS response ( Fig 1 , ② pink), which is required for HR-MBR but not microhomologous MBR. RecA also facilitates strand invasion, the initial contact between the broken DNA molecule and an identical sister chromosome from which repair is templated ( Fig 1C ). In unstressed cells, this intermediate leads to high-fidelity HR repair; however, if the general stress response is activated, repair proceeds via one of two mutagenic pathways ( Fig 1D–1H , ③). In HR-MBR, errors generated by error-prone SOS-up-regulated DNA polymerases IV (DinB), V (UmuDC), and II (PolB) accumulate in the tracts of repair synthesis during HR repair ( Fig 1D ) [ 22 , 23 , 34 ]. Activation of the general stress response licenses the use of these polymerases and/or prevents the removal of errors they generate: base substitutions and small indels ( Fig 1E, 1F ) [ 35 , 36 ] that are located mostly in clusters/hotspots of about 100 kb around the original DSB location [ 30 ]. Microhomologous MBR requires DNA Pol I, which is proposed to promote microhomology-dependent template switching during repair synthesis to generate GCRs ( Fig 1G and 1H ) [ 28 ]. Similar MMBIR mechanisms are proposed to underlie many DSB-driven GCRs in human genetic diseases and cancers [ 28 , 29 , 37 ].
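The pathway requirements described above can be summarized as simple Boolean logic. The following is a minimal illustrative sketch, not a quantitative model: the function name `possible_repair_outcomes` and its string labels are hypothetical, and it assumes that high-fidelity HR remains available whenever a DSB is present, which is a simplification not stated in the text.

```python
def possible_repair_outcomes(dsb: bool, sos_on: bool, general_stress_on: bool) -> set:
    """Return the set of DSB-repair outcomes permitted by the given inputs.

    Encodes only the qualitative requirements from the text:
      - both MBR mechanisms need a DSB and the general stress response;
      - HR-MBR additionally needs the SOS response;
      - microhomologous MBR does not need SOS.
    Assumption (not from the text): high-fidelity HR is always possible
    when a DSB is present.
    """
    if not dsb:
        return set()  # nothing to repair, so no MBR can occur
    outcomes = {"high-fidelity HR"}
    if general_stress_on:
        outcomes.add("microhomologous MBR")
        if sos_on:
            outcomes.add("HR-MBR")
    return outcomes


if __name__ == "__main__":
    for sos in (False, True):
        for stress in (False, True):
            result = sorted(possible_repair_outcomes(True, sos, stress))
            print(f"SOS={sos}, general stress={stress}: {result}")
```

Note that SOS alone, without the general stress response, leaves only high-fidelity repair in this sketch, which mirrors the "switch" role the text assigns to the general stress response.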

Stress response regulation of E . coli MBR

Environmentally responsive and temporally regulated MBR mechanisms challenge long-held assumptions about the constant, gradual nature of mutagenesis and its blindness to an organism’s environmental suitability, or the lack of it, showing that mutagenesis is regulated tightly via environmental inputs. The general stress response controls the switch between high-fidelity and mutagenic DSB repair [ 15 , 16 ]. This stress response, controlled by the alternative sigma factor σ S , is activated by starvation, cold, acid, antibiotic, oxidative, and osmotic stresses, among others. During a general stress response, the σ S transcriptional activator increases the transcription of hundreds of genes (approximately 10% of all E . coli genes) that provide a range of protective functions (reviewed, [ 38 ]). We do not know exactly how the general stress response promotes mutagenesis. Two possibilities are as follows. First, the general stress response modestly up-regulates error-prone Pol IV above SOS-induced levels [ 39 ]. This might be the rate-limiting step. Second, the general stress response down-regulates mismatch repair (MMR) enzymes MutS and MutH [ 40 , 41 ]. The HR-MBR mutation spectrum is similar to that of unstressed MMR-deficient strains [ 35 , 36 , 42 ], suggesting that MMR becomes limiting transiently during HR-MBR [ 36 , 43 , 44 ]. Other σ S targets are also plausible, including down-regulation of the high-fidelity replicative DNA Pol III. Together, these observations suggest a model in which the general stress response enables error-prone polymerases to participate in DSB repair and/or allows the errors introduced by these polymerases to escape mismatch repair.

At least two other stress responses also contribute to one or both MBR mechanisms. The SOS DNA-damage response is required for HR-MBR [ 45 ] but not microhomologous MBR [ 22 ]. The SOS response is detected in about 25% of cells with a reparable DSB [ 13 ] and so comes automatically with the DSB that initiates MBR. (The 75% without SOS may repair fast enough to avoid SOS [ 13 ].) The SOS response halts cell division and activates DNA-damage tolerance and repair pathways. The primary role of the SOS response in HR-MBR is the upregulation of the error-prone DNA polymerases IV and V and possibly II. In some assays, production of Pol IV completely restores mutagenesis in SOS-defective cells [ 23 ]. In others, Pols II and V also contribute to mutagenesis [ 16 , 34 , 46 ]. Finally, the membrane stress response, regulated by σ E , promotes MBR at some loci by playing a role in spontaneous DSB formation through an unknown mechanism (see “Localization of MBR-dependent mutations”) [ 21 ]. The membrane stress response is triggered by an accumulation of unfolded envelope proteins caused by heat and other stressors [ 47 ] and therefore appears to couple these stressors to mutagenesis.

A genome-wide screen revealed a network of 93 genes required for starvation stress–induced MBR [ 25 ]. Strikingly, over half participate in sensing or signaling various types of stress and act upstream of activation of the key stress response regulators, which are hubs in the MBR network. During starvation stress, at least 31 genes function upstream of (in activation of) the general stress response. Most encode proteins used in electron transfer and other metabolic pathways, suggesting that these may be the primary sensors of starvation stress. Additionally, at least six genes are required for activation of the SOS response during MBR, and at least 33 MBR-network genes are required for activation of the membrane stress response. The 93 MBR genes form a highly connected network based on protein–protein interactions with the three stress response regulators (σ S , RecA/LexA, and σ E ) as nonredundant network hubs [ 25 ]. The MBR network highlights the importance of tight, combinatorial stress response regulation of mutagenesis in response to multiple inputs.

Generality of general stress response–promoted mutation

In E . coli , σ S -dependent mutagenesis has a mutational signature that is distinct from that seen in low-stress mutation accumulation (MA) studies and generation-dependent mutagenesis [ 34 , 35 , 42 , 48 ]. Importantly, the nucleotide diversity in genomes of extant E . coli and other bacteria is described better by the σ S -dependent signature than the signature seen in MA studies [ 48 ]. Specifically, both σ S -dependent mutations and those seen in extant species have much higher ratios of transitions to transversions than is seen in MA experiments or expected by chance. This suggests that a significant portion of adaptive mutations in bacteria arise from σ S -dependent stress-induced mutation mechanisms such as MBR [ 48 ]. Furthermore, mathematical modeling suggests that stress response–regulated mutagenesis, such as MBR, promotes adaptation in changing environments [ 17 – 20 ]. Organisms that encode regulated mutagenesis mechanisms may have an increased ability to evolve, which would promote the evolution and maintenance of such mechanisms by second-order selection [ 17 , 19 , 20 ].
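To make the transition-to-transversion comparison concrete, the sketch below computes the ratio from a list of base substitutions. The function names and the input format (pairs of reference and mutant bases) are assumptions chosen for illustration. If every possible substitution were equally likely, the expected ratio would be 0.5, because each base has one possible transition and two possible transversions.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref: str, alt: str) -> bool:
    """True for purine<->purine or pyrimidine<->pyrimidine substitutions."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        return False  # identical bases are not a substitution
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def ts_tv_ratio(substitutions) -> float:
    """Ratio of transitions to transversions in (ref, alt) base pairs.

    Assumes every base is one of A, C, G, T; identical pairs are skipped.
    """
    transitions = transversions = 0
    for ref, alt in substitutions:
        if ref.upper() == alt.upper():
            continue
        if is_transition(ref, alt):
            transitions += 1
        else:
            transversions += 1
    if transversions == 0:
        raise ValueError("no transversions observed; ratio is undefined")
    return transitions / transversions


if __name__ == "__main__":
    # All 12 possible substitutions, each counted once (the "chance" baseline).
    bases = "ACGT"
    uniform = [(r, a) for r in bases for a in bases if r != a]
    print(ts_tv_ratio(uniform))
```

A spectrum with a ratio well above this baseline, as the text describes for σ S -dependent mutations and extant genomes, is enriched for transitions relative to a uniform model.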

Localization of MBR-dependent mutations

MBR generates mutations in hotspots close to the site of the instigating DSB, not at random locations in the genome [ 30 , 49 ]. Hotspotting near DSBs is best described for HR-MBR initiated by engineered DSBs at various sites in the bacterial chromosome [ 30 ]. Mutations are most frequent within the first kilobase pair (kb) on either side of the DSB and then fall off to near background levels approximately 60 kb from the break, with a weak long-distance hot zone around 1 Mb from the DSB site. This pattern of mutations supports the model that most MBR-dependent mutations arise from DNA polymerase errors during HR repair synthesis, and the remainder arise during more processive error-prone break-induced replication. The observation that mutations occur near DSBs does not, in itself, suggest that mutations are more likely to occur in certain genomic regions or in locations related to an organism’s adaptive “need.” However, it does suggest that the distribution of mutations is likely to mirror the distribution of DSBs, and the following lines of evidence suggest that DSB distributions may be nonrandom and reflect potential utility of genes in particular environments.
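A distance profile of this kind can be tabulated by binning mutation coordinates by their distance from the break. The sketch below is one hypothetical way to do so; the function names, the coordinates, and the genome length of 4,600,000 bp (an approximate E. coli chromosome size) are illustrative assumptions, not data from the cited studies. Distance is measured around a circular chromosome.

```python
from collections import Counter


def circular_distance(pos: int, dsb_pos: int, genome_length: int) -> int:
    """Shortest distance in bp between two coordinates on a circular chromosome."""
    d = abs(pos - dsb_pos) % genome_length
    return min(d, genome_length - d)


def bin_mutations_by_distance(positions, dsb_pos: int, genome_length: int,
                              bin_size: int) -> dict:
    """Count mutations per distance bin; key k covers [k*bin_size, (k+1)*bin_size)."""
    counts = Counter()
    for pos in positions:
        counts[circular_distance(pos, dsb_pos, genome_length) // bin_size] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    GENOME_LENGTH = 4_600_000   # approximate, for illustration only
    DSB_POS = 1_000_000         # hypothetical engineered break site
    mutations = [1_000_200, 999_500, 1_030_000, 2_000_000]  # made-up coordinates
    print(bin_mutations_by_distance(mutations, DSB_POS, GENOME_LENGTH, 10_000))
```

Dividing each bin count by the number of sites assayed in that bin would convert the counts into a per-site frequency comparable across distances.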

The sources and distributions of spontaneous DSBs are poorly understood in all organisms (reviewed, [ 14 ]), but we have some clues about the origins of DSBs that lead to MBR. First, transcriptional RNA–DNA hybrids (R-loops) are one source of MBR-promoting DSBs [ 50 ]. R-loops have been implicated in DSB formation in many experimental systems, although the exact mechanism(s) of DNA breakage is unresolved (reviewed, [ 51 ]). Though the distribution of R-loops has not been thoroughly assessed in starving E . coli , R-loops tend to be biased toward highly transcribed genes, promoters, and noncoding-RNA genes [ 52 – 54 ] and might, therefore, target DSBs and mutations to those sites. Also, activation of the σ E membrane stress response is required for DSB formation in some assays and might target DSBs in genomic space [ 21 ]. The mechanism by which the σ E stress response causes DSBs is unknown, but one possibility is that σ E -activated transcription causes DSBs directly (rather than via gene products’ up- or down-regulation), via an R-loop–dependent or other transcription-dependent mechanism. R-loops and the σ E stress response might direct DSBs, and thus mutations, to regions of the genome with more adaptive potential for a given environment: transcribed genes and regulatory elements (promoters and regulatory small RNAs).

Additionally, MBR-dependent mutations can occur in clusters [ 55 ]. When an MBR-induced mutation occurs, the probability of finding another mutation at neighboring sites 10 kb away is approximately 10³ times higher than if the first mutation did not occur [ 55 ], and this is not true for a distant unlinked site in the genome [ 43 ], indicating that nearby mutations are not independent events. That is, linked mutations appear to occur simultaneously, in single MBR events. Such clusters are predicted to promote concerted evolution by simultaneously introducing changes to multiple domains of a protein or subunits of a complex protein machine [ 15 , 20 , 55 ]. Because multiple mutations are often needed for new functions to emerge, and the intermediate mutated states are often less fit and counter selected, how complex protein machines evolve has been a long-standing problem [ 56 ]. Similar clusters have been identified in many organisms [ 57 ] and in cancer genomes, in which mutation clusters are called kataegis , Greek for (mutation) storms [ 58 – 60 ]. The mechanisms of mutation localization and co-occurrence revealed by MBR in E . coli have guided more mechanistic understanding of how mutation clusters occur across the tree of life.
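The non-independence argument rests on comparing two conditional frequencies: how often a second, nearby mutation is found among clones that carry the first mutation versus among clones that do not. A minimal sketch of that calculation follows; the function name, argument names, and the counts in the example are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
def fold_enrichment(n_first: int, n_both: int,
                    n_no_first: int, n_second_only: int) -> float:
    """Ratio P(second | first) / P(second | no first) from clone counts.

    n_first:        clones carrying the first mutation
    n_both:         of those, clones that also carry the second mutation
    n_no_first:     clones lacking the first mutation
    n_second_only:  of those, clones that carry the second mutation
    """
    if n_first <= 0 or n_no_first <= 0:
        raise ValueError("both groups must contain at least one clone")
    if n_second_only <= 0:
        raise ValueError("background frequency is zero; ratio is undefined")
    # (n_both / n_first) / (n_second_only / n_no_first), rearranged to a
    # single division to limit floating-point rounding.
    return (n_both * n_no_first) / (n_first * n_second_only)


if __name__ == "__main__":
    # Made-up counts: 10 of 100 mutant clones carry a second linked mutation,
    # versus 1 of 10,000 clones without the first mutation.
    print(fold_enrichment(n_first=100, n_both=10,
                          n_no_first=10_000, n_second_only=1))
```

A ratio near 1 would indicate independent events; a ratio of the order of a thousand, as reported for linked sites, indicates that the two mutations tend to arise together.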

Analyses of E . coli mutation accumulation lines and natural isolates indicate that local mutation rates vary by about one order of magnitude on the scale of approximately 10–100 kb [ 61 , 62 ]. It is possible, even likely, that the DSB-dependent mutation localization and co-occurring mutation clusters characteristic of MBR are important contributors to this nonuniformity in mutation rate. Similar degrees of variation in local mutation rates have been reported for other bacteria [ 63 ], yeast [ 64 ], and mammals (mouse, human, and other primates [ 65 , 66 ]) and could also result from MBR-like mutation mechanisms. Further analysis of natural isolates, with a specific focus on identifying clusters of cosegregating single-nucleotide variants, could indicate how frequent MBR-dependent mutation clusters are and how they shape genomes.

The molecular mechanisms of MBR reveal many ways by which mutations do not occur uniformly or independently from one another in genomic space. More work is needed to assess fully whether the MBR mechanism or genomes themselves have evolved to bias mutations to locations where they are most likely to be beneficial, such as genes actively transcribed in response to the experienced stressor.

Other regulated mutagenesis mechanisms in microbes

In addition to starvation-induced MBR in E . coli , diverse bacteria and single-celled eukaryotes display examples of stress response–up-regulated mutagenesis. Some of these mutation mechanisms provide additional insight into how mutation rates vary across genomes in ways that may accelerate adaptive evolution. Many share characteristics with E . coli MBR but differ enough to suggest that regulated mutagenesis has evolved independently multiple times, thus highlighting the importance of regulated mutagenesis to evolution-driven problems, such as combatting infectious disease and antimicrobial resistance. Potential strategies to counteract pathogen evolution require understanding of how genetic variation is generated in these organisms. Continued study of regulated-mutagenesis mechanisms may reveal potential new drug targets to block mutagenesis and thus evolution [ 12 , 25 , 67 ].

Other mechanisms of starvation stress–induced mutagenesis in bacteria

Diverse wild E . coli isolates show increased mutation rates during extended incubation on solid medium compared with vegetative growth, known as mutagenesis in aging colonies (MAC) [ 68 ]. In the one isolate tested for genetic requirements, MAC required σ S , decreased MMR capacity and error-prone Pol II but not DSB-repair proteins or SOS activation [ 68 ]—like, but not identical to, MBR in E . coli . Bacillus subtilis undergoes starvation-induced mutagenesis that is up-regulated by the ComK starvation-stress response and requires the SOS-induced Pol IV homolog YqjH but does not require DSB repair [ 69 , 70 ]. In B . subtilis , starvation-induced mutation of reporter genes increases with increased levels of transcription of those genes, dependently on the transcription-coupled repair factor Mfd [ 71 ], similarly to E . coli MBR [ 50 ]. This suggests that transcription directs starvation-induced mutations to transcribed regions of the B . subtilis genome, where they are more likely to be adaptive. This is similar to the hypothesized targeting of E . coli MBR but occurs through a DSB-independent mechanism.

Antibiotic-induced mutagenesis in bacteria

Many antibiotics, especially at subinhibitory concentrations, increase mutation rate and generate de novo resistance and cross-resistance in a variety of bacteria, including important pathogens. The β-lactam antibiotic ampicillin induces mutagenesis in E . coli , Pseudomonas aeruginosa , and Vibrio cholerae via a mechanism requiring σ S , Pol IV, and limiting mismatch repair [ 41 ]. Whether DSBs are involved remains untested. The topoisomerase-inhibiting antibiotic ciprofloxacin (cipro) induces cipro resistance rapidly in E . coli , requiring HR proteins, SOS induction, and error-prone Pols II, IV, and V [ 72 ]. A requirement for σ S has only very recently been demonstrated, along with the demonstration that cipro-induced mutagenesis is σ S -dependent MBR, similar to that induced by starvation [ 73 ]. In fact, diverse antibiotics both create DSBs [ 74 ] and activate the general stress response in E . coli [ 41 ], suggesting that these antibiotics may increase mutagenesis both by increasing DNA damage and by triggering a switch to low-fidelity repair of that damage.

Stress response regulation of mobile DNA elements in bacteria

Environmental stress up-regulates the activity of mobile DNA elements in many organisms, and this inducible genome instability is likely to be an important driver of evolution (reviewed, [ 75 ]). Although the mechanisms of regulation are poorly understood, stress response regulators have been implicated in a few cases. The general stress response promotes excision of an E . coli transposable prophage [ 76 ] and a Pseudomonas transposon [ 77 ]. Starvation increases the retromobility of the Lactococcus lactis LtrB group II intron through signaling by the small-molecule regulators guanosine tetraphosphate (ppGpp) and cyclic adenosine monophosphate (cAMP) [ 78 ]. Mobility of an E . coli transposon is increased by metabolic disruptions and negatively regulated by the σ E membrane stress response [ 79 ]. Also, stress can directly regulate mobile element activity without an intervening stress response: movement of the T4 td intron becomes promiscuous during oxidative stress through ROS-induced oxidation of an amino acid in the intron-encoded homing endonuclease, which makes it a transposase [ 80 ].

Regulated mutagenesis in eukaryotic microbes

Many examples of stress-associated mutagenesis and MBR have been reported in yeast, but stress response regulation has been demonstrated in only two cases. First, in the budding yeast Saccharomyces cerevisiae , the proteotoxic drug canavanine induces mutagenesis dependently on the MSN environmental stress response [ 81 ]. MSN-dependent mutagenesis requires the nonhomologous end-joining (NHEJ) protein Ku and two error-prone polymerases, Rev1 and Pol zeta (ζ) [ 81 ]. NHEJ is a relatively genome-destabilizing DSB-repair pathway, so MSN-dependent mutagenesis represents a stress-induced switch to MBR. NHEJ proteins are required for starvation-induced mutations in yeast as well [ 82 ]. Others have reported yeast MBR dependent on the error-prone DNA polymerase Rev3 [ 83 ] and spontaneous mutations dependent on error-prone polymerases Rev1 and Pol ζ [ 84 ]. Yeast also form mutation clusters by MBR [ 85 ] and undergo MMBIR similar to E . coli microhomologous MBR [ 86 ]. It is unknown whether these observations represent one or more mechanisms of mutation and whether MSN or other stress responses regulate mutagenesis in these cases. In all cases of yeast MBR, mutations are likely to occur near DSBs and, therefore, may be localized within genomes, as discussed for E . coli MBR.

Second, a heat shock response, activated by heat shock or protein denaturation, induces aneuploidy in S . cerevisiae by titration of the chaperone heat shock protein 90 (HSP90) [ 87 ]. Inhibitors of HSP90, such as radicicol, also induce aneuploidy. HSP90 is required for proper folding of kinetochore proteins in unstressed cells, so HSP90 titration or inhibition probably triggers aneuploidy through the disruption of kinetochore assembly [ 87 ]. The resulting yeast cell populations show high karyotypic and phenotypic variation and harbor cells resistant to radicicol and other drugs [ 87 ]. Aneuploidy in the form of extra chromosome copies may also facilitate adaptive evolution by providing a larger mutational target. Extra chromosomes may also buffer otherwise deleterious mutations through the sharing of gene products. Similar heat- and other stress-induced aneuploidy has been reported in Candida albicans and other yeast species, and can cause resistance to a variety of compounds, including clinically relevant antifungal drugs (reviewed, [ 88 ]). Some of these examples are likely to result from HSP90 titration, but other stress responses may be involved also.

Regulated mutagenesis in multicellular organisms

Although microbes led the way in revealing mechanisms of stress response–up-regulated mutagenesis, many microbial mutation mechanisms are mirrored throughout the tree of life, including in multicellular organisms. Stress response–up-regulated mutation mechanisms have been discovered in plants, flies, and human cells (reviewed, [ 12 ]). The potential adaptive roles of these mutation mechanisms are less clear in multicellular organisms than in microbes. Do these mechanisms contribute to germline variation (and thus organismal evolution), mosaicism and somatic cell evolution, or both? Or are they simply byproducts of other required cellular functions or stress-induced dysfunctions?

In the Drosophila germline, the HSP90 heat shock response increases transposon-mediated mutagenesis and can drive organismal adaptation [ 89 ]. Most other regulated mutation mechanisms characterized to date have been in somatic cells, in which they might contribute to mosaicism. Somatic diversity may be important during development and contribute to organismal fitness, as is the case with antibody diversification during B-cell maturation. For example, neural development might require genetic complexity and plasticity as organisms get differently “wired” during development, based on their experiences. However, up-regulated mutagenesis is also likely to drive pathogenic somatic evolution, such as during cancer development. For example, hypoxic stress responses down-regulate mismatch repair and the HR DSB-repair proteins RAD51 and BRCA1, leaving only chromosome-rearranging nonhomologous or microhomologous DSB-repair mechanisms (reviewed, [ 90 ]). Hypoxic stress response–induced mutagenesis occurs in mouse and human, suggesting an adaptive function in addition to its probable relevance to tumor biology. Tumors become hypoxic and induce hypoxic stress responses, which promote angiogenesis. Hypoxic stress responses may also promote tumor evolution via mutagenesis. The transforming growth factor β (TGF-β) signaling pathway also induces genome rearrangement by reduction of HR DSB repair in human cancer cell lines, leading to increased copy number alterations and resistance to multiple chemotherapeutic drugs [ 91 , 92 ]. Stress-induced and localized mutagenesis in multicellular organisms and the relevance of these mechanisms to cancer are reviewed in more detail elsewhere [ 12 ].

Evolution and applications of stress-induced mutation

Mutations provide the raw material for evolution but can also decrease the fitness of an organism. Therefore, mutation rates have, presumably, been finely tuned, apparently through second-order selection. Constitutively high mutation rates are advantageous in rapidly changing environments but decrease fitness in more stable (or periodically changing) environments. By biasing mutation to times of stress and to particular genomic regions, perhaps such regions relevant to a specific stress, stress-induced mutagenesis mechanisms provide the benefits of high mutation rate, while mitigating the risks. The ubiquity of these mechanisms throughout the tree of life supports their crucial role in evolution.

Stress-induced mutation mechanisms, first discovered in bacteria, challenge historical assumptions about the constancy and uniformity of mutation but do not violate strict interpretations of the Modern Synthesis. Mutation is still viewed as probabilistic, not deterministic, but we argue that regulated mutagenesis mechanisms greatly increase the probability that useful mutations will occur at the right time and, possibly, in the right places, thus increasing an organism’s ability to evolve. Assumptions about the constant, gradual, clock-like, and environmentally blind nature of mutation are ready for retirement.

Stress-induced mutation mechanisms are likely to play important roles in human disease by promoting pathogen and tumor evolution and may drive evolution more generally. Mutation mechanisms may also be attractive drug targets for combatting infectious disease, cancer, and drug-resistance evolution in both [ 73 ]. Although many mechanisms of stress-inducible mutation have been identified in the past two decades [ 12 ], these are likely to be the tip of the iceberg. Some current pressing questions are highlighted below.

Open questions in mutation research

  • What fraction of total “spontaneous” mutagenesis results from mutagenesis up-regulated by stress responses? Do stress response–regulated mutation programs drive much of adaptive evolution in microbes? Multicellular organisms?
  • Are DSBs and the mutations they cause randomly distributed in genomic space? Or is DSB formation regulated, biased, or directed? By what mechanisms? Is this targeting adaptive?
  • Can stress response–regulated mutation mechanisms be targeted by anti-evolvability drugs that limit the generation of heritable diversity? Can these drugs prevent pathogens and cancers from out-evolving host responses and drugs?

Acknowledgments

We thank P.J. Hastings for comments on the manuscript and our colleagues in this bundle for extreme patience.

  • 1. Darwin CR. On the origin of species by means of natural selection, or the preservation of favoured races in the struggle for life. London: John Murray; 1859.
  • 6. Radman M. Phenomenology of an inducible mutagenic DNA repair pathway in Escherichia coli : SOS hypothesis. In: Prokash L, Sherman F, Miller M, Lawrence C, Tabor H, editors. Molecular and Environmental Aspects of Mutagenesis. Springfield, Illinois: Charles C. Thomas; 1974. p. 128–42.
  • 7. Radman M. SOS Repair Hypothesis: Phenomenology of an Inducible DNA Repair Which Is Accompanied by Mutagenesis. In: Hanawalt P, editor. Molecular Mechanisms for Repair of DNA. New York: Plenum Press; 1975.
  • 73. Pribis JP, García-Villada L, Zhai Y, Lewin-Epstein O, Wang A, Liu J, et al. Gamblers: an antibiotic-induced evolvable cell subpopulation differentiated by reactive-oxygen-induced general stress response. Molecular cell. 2019;74 (in press). https://doi.org/10.1016/j.molcel.2019.02.037
  • Open access
  • Published: 15 July 2015

Defining “mutation” and “polymorphism” in the era of personal genomics

  • Roshan Karki 1 ,
  • Deep Pandya 1 ,
  • Robert C. Elston 2 &
  • Cristiano Ferlini 1  

BMC Medical Genomics volume 8, Article number: 37 (2015)

The growing advances in DNA sequencing tools have made analyzing the human genome cheaper and faster. While such analyses are intended to identify complex variants, related to disease susceptibility and efficacy of drug responses, they have blurred the definitions of mutation and polymorphism.

In the era of personal genomics, it is critical to establish clear guidelines regarding the use of a reference genome. Nowadays DNA variants are called as differences in comparison to a reference. In a sequencing project Single Nucleotide Polymorphisms (SNPs) and DNA mutations are defined as DNA variants detectable in >1 % or <1 % of the population, respectively. The alternative use of the two terms mutation or polymorphism for the same event (a difference as compared with a reference) can lead to problems of classification. These problems can impact the accuracy of the interpretation and the functional relationship between a disease state and a genomic sequence.

We propose to solve this nomenclature dilemma by defining mutations as DNA variants obtained in a paired sequencing project including the germline DNA of the same individual as a reference. Moreover, the term mutation should be accompanied by a qualifying prefix indicating whether the mutation occurs only in somatic cells (somatic mutation) or also in the germline (germline mutation). We believe this distinction in definition will help avoid confusion among researchers and support the practice of sequencing the germline and somatic tissues in parallel to classify the DNA variants thus defined as mutations.

The human genome consists of over 3 billion base pairs which reside in every nucleated cell of the body [ 1 , 2 ]. The genome, which has remained well conserved throughout evolution, is at least 99.5 % identical between any two humans on the planet [ 3 ]. Modern genomic tools have revealed that it is more complex, diverse, and dynamic than previously thought, even though the genetic variation is limited to between 0.1 % [ 4 – 6 ] and 0.4 % [ 7 ] of the genome. Sequence variations, even in non-protein coding regions of the DNA, have begun to alter our understanding of the human genome. While some studies have linked certain variants to being predictive of disease susceptibility and drug response, the majority of diseases have a very complex genetic signature (reviewed in [ 8 , 9 ]). Biomedical research is shifting towards understanding the functional importance of many such variations and their association with human diseases.

At the heart of these novel discoveries are the modern DNA sequencing tools, which continue to evolve at a rapid pace. The new sequencing technologies continue to become cheaper and more precise, and facilitate novel medical and biological breakthroughs all over the world [ 10 , 11 ]. Scientific research has become nearly inconceivable without employing sequencing technology but, with the progress of technology and the increasing sequencing of individuals, a massive amount of data is being generated. However, any data without context and analysis is useless. The data from sequencing must be carefully annotated, securely stored, and easily accessible from repositories when needed. Such arduous tasks require functional collaboration among clinicians, researchers, and health professionals [ 12 ].

In a recent thread in the ResearchGate portal [ 13 ], an ongoing discussion on the difference between a mutation and a polymorphism elicited a response from more than three hundred participants from various scientific backgrounds. The variety of responses prompted us to write this document as a paper aimed at stimulating the discussion further and possibly finding a consensus on the usage of the terms mutation and polymorphism in the context of a reference sequence in a personal genome project.

The rise of genomics and its impact on human health

Established in 1990, the Human Genome Project was one of the most expensive and collaborative ventures ever undertaken in science. Ten years since its completion, it has continued to provide a wealth of novel information, the implications of which are not yet fully understood [ 8 ]. The open-access nature of the project has stimulated scientists, as well as scientific companies, to develop better sequencing tools and accompanying analytical software. The ensuing innovations have helped to mark down the price of whole genome sequencing over the years, from nearly $3 billion at its inception to under $3,000, making it accessible to researchers from different biomedical disciplines [ 14 ].

Sequencing tools will play an important role in the development of personalized medicine. Some sequencing technologies are already used in clinics to test genetic conditions, diagnose complex diseases, or screen patient samples for rare variants. These tests allow health professionals to accurately diagnose a disease and prescribe appropriate medication specific to the patient [ 15 , 16 ]. With the recent support of NIH grants in the US, neonatal sequencing is being explored to probe rare and complex disorders of newborn babies [ 17 , 18 ]. There are technologies in development that allow non-invasive ways of sequencing a genome of an unborn child [ 19 ]. Personalized genome sequencing will transform the future of the healthcare landscape. However, the rise in the number of sequenced genomes is creating new problems. In particular, the way the genome analysis software works is through comparison of the obtained sequences with a reference. Because the human genome is different between different individuals, what is the reference sequence? What is the threshold to distinguish common from rare DNA variants?

Amid all these interesting implications of genome sequencing, the debate concerning the correct use of scientific terminology remains. Specifically, the terms “mutation” and “polymorphism”, and also “point mutation” versus “SNP”, can be independently used to describe the same event, namely a difference in the sequence as compared with a reference. From a strictly grammatical and etymological point of view, a mutation is an event (of mutating) and a polymorphism is a condition or quality (of being polymorphic); but these terms by extension quickly came to mean the result of that event or condition. In principle, a point DNA variant can be labeled as a mutation or a SNP. Since no clear rules are available, current software tools for genome sequencing make no assignment and simply label the difference as a DNA variant, blurring the distinction between the two categories.

“Mutation” and “polymorphism”: earlier definitions

The effort toward a uniform and unequivocal description of sequence variants in human DNA and protein sequences (mutations, polymorphisms) was initiated by two papers published in 1993 [ 20 , 21 ]. In this context, any rare change in the nucleotide sequence, usually but not always with a disease-causing attribute, is termed a “mutation” [ 22 ]. This change in the nucleotide sequence may or may not cause phenotypic changes. Mutations can be inherited from parents (germline mutations) or acquired over the life of an individual (somatic mutations), the latter being the principal driver of human diseases like cancer. Germline mutations occur in the gametes. Since the offspring is initially derived from the fusion of an egg and a sperm, germline mutations of parents may also be found in each nucleated cell of their progeny. Mutations usually arise from unrepaired DNA damage, replication errors, or mobile genetic elements. There are several major classes of DNA mutations. A point mutation occurs when a single nucleotide is added, deleted or substituted. Along with point mutations, the whole structure of a chromosome can be altered, with chromosomal regions being flipped, deleted, duplicated, or translocated [ 23 ]. Another kind of DNA mutation is defined as “copy number variation”. In this case, the expression of a gene is amplified (or reduced) through increased (decreased) copy number of a locus allele [ 24 , 25 ].

A variation in the DNA sequence that occurs in a population with a frequency of 1 % or higher is termed a polymorphism [ 26 ]. The higher incidence in the population suggests that a polymorphism is naturally occurring, with either a neutral or beneficial effect. Like mutations, polymorphisms can involve one or more nucleotide changes. The SNP is the commonest type of polymorphism, thought to occur about once every 1,000 base pairs in the human genome, and is usually found in areas flanking protein-coding genes [ 27 ] – regions now recognized as critical for microRNA binding and regulation of gene/protein expression [ 28 ]. However, SNPs can also occur in coding sequences, introns, or in intergenic regions [ 27 ]. SNPs are used as genetic signatures in populations to study the predisposition to certain traits, including diseases [ 29 ].

The anatomy of the problem

In the era of advanced DNA sequencing tools and personal genomics, these earlier definitions of mutation and polymorphism are antiquated. Before multiple parallel sequencing was developed, it was impossible to sequence the genome of the same patient multiple times. For this reason, it was necessary at that time to use a reference sequence assembled from multiple genomes. In the preparation of the consensus sequence, an arbitrary threshold of 1 % was established to distinguish common (polymorphism) from rare (mutation) variants [ 26 ].

The 1 % or higher frequency associated with a polymorphism is an arbitrary number [ 30 ] recommended by scientists prior to the era of Next Gen Sequencing. The threshold being arbitrary, redefining the population itself may affect the classification, with rare variants becoming polymorphisms or polymorphisms becoming rare variants according to the population analyzed. For decades, the use of this frequency to develop population models was preferred to the use of sequencing tools, which at that time were error-prone and labor-intensive. With the advent of new sequencing technologies and the subsequent sequencing of individuals, a very different picture of population dynamics has begun to emerge. Mutations that were thought to be rare in a population have been found to exceed the frequency threshold set at 1 % [ 31 ]. Even more surprising, there is a lack of association of some of these rare mutations with human diseases. When comparing populations separated by geographic and physical barriers, a disease-causing mutation in one population is found to be harmless in another, and vice versa [ 32 ].
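
The dependence of the label on the population analyzed can be sketched in a few lines of Python. This is a minimal illustration, not a method from the paper: the function names, the toy cohorts, and the representation of a genotype as a pair of alleles are all assumptions made for the example, and only the 1 % threshold comes from the text.

```python
from collections import Counter

THRESHOLD = 0.01  # the conventional 1 % cut-off discussed in the text


def allele_frequency(genotypes, allele):
    """Frequency of `allele` among all allele copies in a list of diploid genotypes."""
    counts = Counter(a for genotype in genotypes for a in genotype)
    total = sum(counts.values())
    return counts[allele] / total if total else 0.0


def classify_by_frequency(genotypes, allele, threshold=THRESHOLD):
    """Label a variant allele using only its frequency in the cohort supplied."""
    freq = allele_frequency(genotypes, allele)
    return "polymorphism" if freq >= threshold else "rare variant"


# Hypothetical toy cohorts; each genotype is a pair of alleles at one site.
cohort_a = [("A", "A")] * 995 + [("A", "T")] * 5    # T frequency: 5 / 2000
cohort_b = [("A", "A")] * 900 + [("A", "T")] * 100  # T frequency: 100 / 2000

label_a = classify_by_frequency(cohort_a, "T")
label_b = classify_by_frequency(cohort_b, "T")
label_pooled = classify_by_frequency(cohort_a + cohort_b, "T")
print(label_a, label_b, label_pooled)
```

The same allele is a rare variant in one toy cohort and a polymorphism in the other, and pooling the cohorts changes the answer again, which is the classification problem the paragraph describes.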

For instance, sickle-cell anemia is caused by a nucleotide change (SNP rs334) in a gene coding for the beta chain of the hemoglobin protein [ 33 ]. In fact, rs334 is classified as a SNP, since its minor allele frequency in the population is >1 %. The disease manifests in people who have two copies of the mutated gene (rs334(T;T) genotype). Sickle cell anemia is usually rare (<1 %) in the populations of developed nations [ 34 ]. However, the heterozygous form of the gene (rs334(A;T) genotype) is persistent in populations of Africa, India, and other developing nations, where malaria is endemic [ 33 ]. In these geographic locations, heterozygote carriers of rs334 have a survival advantage against the malaria pathogen, and therefore this beneficial mutation is passed through the offspring to succeeding generations [ 35 – 37 ]. Here, a rare variant, which in one population (developed nations) causes a severe disease in homozygosis, can persist in another population to confer a survival advantage as a polymorphism in heterozygosis [ 38 ]. Such exceptions are increasing and show the need to redefine the terms mutation and polymorphism. The distinction between mutation and polymorphism on the basis of their disease-causing capacity is further complicated. Although thought to be naturally occurring, recent research into SNPs has shown that they can be associated with diseases like diabetes and cancers. At least 40 SNPs have been shown to associate with type-2 diabetes alone [ 39 ]. In short, it is not possible to classify the functional role of variations according to frequency in the population or their capability to cause a disease.
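
How an allele can pass the 1 % threshold while the disease genotype stays below it can be made concrete with the Hardy-Weinberg proportions. The sketch below is a simplification under stated assumptions: the allele frequency of 0.05 is a hypothetical value chosen for illustration, not a measured frequency of rs334, and Hardy-Weinberg equilibrium ignores the selection that malaria exerts in the populations discussed above.

```python
def hardy_weinberg(q):
    """Expected genotype frequencies for a biallelic site with minor-allele frequency q."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    p = 1.0 - q
    return {"hom_ref": p * p, "het": 2 * p * q, "hom_alt": q * q}


q = 0.05  # hypothetical minor-allele frequency, for illustration only
freqs = hardy_weinberg(q)
allele_is_common = q >= 0.01                        # allele meets the 1 % threshold
disease_genotype_is_rare = freqs["hom_alt"] < 0.01  # homozygotes fall below 1 %
print(freqs, allele_is_common, disease_genotype_is_rare)
```

Under these assumptions roughly one person in 400 is homozygous while almost one in ten is a carrier, so a frequency-based label for the allele says little about how common the disease itself is.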

Context of personal genomics

This debate on “mutation” and “polymorphism” needs urgent evaluation in the era of Next Gen Sequencing and precision medicine. Multiple international collaborative projects like ENCODE (Encyclopedia of DNA elements) and HapMap (Haplotype Map) have ensued to map all the genes, genetic variation, and regulatory elements of the genome, to find associations with human biology, personal traits, and diseases [ 40 ].

In this climate, commercial companies like Illumina and Roche are developing advanced and robust platforms that cater to the needs of both small and large research facilities. The increasing competition among these companies has resulted in many different technologies, which are now available to facilitate new insights into genomics [ 11 ]. Similarly, advanced genomic tools and analytical software have been developed that can function independently of the particular platform. Researchers using tools like CLC genomics, Next Gene and Geno Matrix can access and download sequencing datasets for their own streamlined research. The primary goal of such research is to look for subtle, complex, and dynamic sequence variations. The lack of consistent definitions and a uniform scientific language can hamper this upcoming field, where genomic platforms may formulate incorrect hypotheses and researchers may misinterpret data based on earlier definitions.

The problem is particularly important in the case of precision medicine and personalized treatments. For example, one of the main reasons to sequence the genome of a cancer is to identify unique genetic features of cancer cells which may then be targeted with a personalized treatment [ 41 ]. Accordingly, it is necessary to classify the somatic mutations of the cancer cells and use such knowledge to exploit therapeutically all the differences between cancer and noncancerous cells. Therefore, in order to be treated with a targeted agent a cancer patient needs to express the target originated by the specific mutation occurring in cancer cells. However, should a difference be misclassified, it becomes possible for a polymorphism (present in all the cells of the patient) to be taken as a somatic mutation. The result could be a toxic effect, since the targeted treatment will impact both cancer and noncancerous cells carrying the same genetic variant. This problem is prevented if both the germline and the somatic cancer genomes are sequenced in the same patient.

Another important reason underlying the need for such a distinction is that a disease may originate with two subsequent mutations according to the two-hit hypothesis [ 42 ]. Within a population, a germline mutation (first hit) may predispose a subset of patients to a second, somatic, mutation whose effects will create the diseased phenotype [ 43 ]. In this context, in order to identify populations at risk it would be extremely helpful to distinguish between somatic and germline mutations. For example, multiple meningiomas occur in <10 % of meningioma patients. A first germline mutation in the SMARCB1 gene will predispose to meningioma, but this will occur only when a somatic mutation in the NF2 gene intervenes [ 44 ]. In the absence of a clear distinction between somatic and germline variants this kind of pathogenic discovery may be impossible.

This approach is now supported by a recent study. Jones et al. evaluated 815 tumor-normal paired samples coming from 15 different tumor types [ 45 ] using Next Gen Sequencing. Library preparation was performed with two methods, whole exome preparation and targeted amplification, for 111 genes. Analyses were then conducted either as if only the cancer tissue had been sequenced (reference human genome assembly GRCh37-lite) or taking as reference the germline DNA of the same patient. With the first analysis, the authors reported a very high rate of false-positive variants (31 % and 65 % in exome and targeted libraries, respectively). Furthermore, they identified germline mutations in 3 % of the cancers, even if they came from a cohort without family history (sporadic cancer). Now that the new sequencing technologies have dramatically reduced the cost of sequencing, precision medicine and personal genomics require that the reference of the DNA sequencing project should be obtained from the germline DNA of the same patient.
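
One way to picture why a tumor-only analysis inflates the list of candidate somatic variants is to treat each sample as a set of differences from the reference assembly. The sketch below is a toy illustration and not the pipeline used by Jones et al.: the function name, the variant tuples, and the definition of a false positive as "a tumor call that the patient's germline also carries" are assumptions made for the example.

```python
def false_positive_fraction(tumor_variants, germline_variants):
    """Share of tumor-only 'somatic' candidates that the germline sample also carries."""
    if not tumor_variants:
        return 0.0
    inherited = tumor_variants & germline_variants
    return len(inherited) / len(tumor_variants)


# Hypothetical variants, written as (chromosome, position, ref, alt)
# relative to a reference assembly.
tumor = {
    ("chr1", 100, "A", "G"),
    ("chr2", 250, "C", "T"),
    ("chr7", 900, "G", "A"),
    ("chr17", 410, "T", "C"),
}
germline = {("chr1", 100, "A", "G"), ("chr9", 77, "G", "C")}

fraction = false_positive_fraction(tumor, germline)
truly_somatic = tumor - germline  # what a paired analysis would retain
print(fraction, sorted(truly_somatic))
```

In this toy case one of the four tumor calls is inherited, so a quarter of the tumor-only candidates would be misread as somatic; only the paired comparison removes it.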

Ongoing debate and HGVS (Human Genome Variation Society) recommendations

The ongoing debate among scientists to resolve the nomenclature of mutation and polymorphism is a step in the right direction. The HGVS, an alliance of 600 members from 34 countries, incorporates discussion and recommendations to establish consensus definitions and descriptions of generic terms that are accepted worldwide. Since the early 1990s, the HGVS has been instrumental in its push to standardize the mutation nomenclature. The recommendations of the HGVS have been based on extensive discussions among scientists over the years.

The papers published on this topic over the last 20 years show that the HGVS was visionary in recommending new changes and extensions based on discoveries of relatively complex variants. In 2002, several researchers tried to address this nomenclature problem and the challenges of making more inclusive definitions. A special article by Condit et al. found that the term mutation had become increasingly negative in connotation since its use in the biological sciences, and particularly over the course of the 20th century [ 22 ]. This negativity became entrenched with radiation experiments and the use of atomic weapons during the Second World War, and later with science fiction books and movies. The paper suggested that a better term like “variation” or “alteration” might be useful, but the inconsistent usage of such terms in the scientific world makes this problematic.

More recently, additional papers have highlighted the urgency of a “consensus” guiding the selection of the sequencing methods (data collection) and reporting. These studies point out that the accurate classification of pathogenic variants requires a standardized approach and the building of data repositories including all these data [ 46 ]. In this context, Richards et al. on the behalf of the American College of Medical Genetics and Genomics (ACMG) have noted that the terms “mutation” and “polymorphism” often lead to confusion because of incorrect assumptions of pathogenic and benign effects, respectively. Thus, they recommended that both terms be replaced by the term “variant” with the following modifiers: (i) pathogenic, (ii) likely pathogenic, (iii) uncertain significance, (iv) likely benign, or (v) benign [ 47 ].

Despite this rhetoric to better define the terms, there is no consensus in research papers or HGVS recommendations on how a mutation is different from a polymorphism. The lack of a consensus is creating a problem in the interpretation of data coming from personal genome software analysis, as described above. What is the reference? What is the threshold to distinguish common from rare DNA variants? This problem is not trivial when looking at the downstream effects. In fact, the term mutation is commonly conceived (wrongly) to carry an intrinsic negative impact on the function of a given gene.

We propose that the term “mutation” be used to indicate the result of a recent mutation event which has been detected using as a reference the germline DNA of the same individual. Therefore, a mutation would be a “DNA variant” acquired over the lifetime of an organism, i.e. a somatic mutation. In this sense, mutations are the principal causes of many diseases like cancer but are typically not inherited by offspring. Alterations in the DNA of germ cells – sperm and eggs – can be inherited by offspring and are currently called germline mutations. In this case, the term mutation should be used only if the germline “variant” has been detected using as a reference the germline DNA of the same individual. While germline mutations can also increase the likelihood of succumbing to certain diseases, the signature of such mutations is found in each and every cell of the offspring [ 48 ]. This is because the original embryo (first cell in the body) is formed through the fusion of germ cells, from where all the somatic cells arise. So in essence, while these alterations in the parents’ germ cells are appropriately termed germline mutations, calling both somatic and germline mutations simply “mutations” seems incongruent. The variation of genotypes among individuals, inherited from parents but still present in the DNA of each cell in the body, is the classic definition of a genetic polymorphism and we propose going back to this original definition: a polymorphism occurs in a population when the observed variation from individual to individual is not maintained by recurrent mutation.

Whereas it is perhaps not unreasonable to use the term mutation for the result of a mutation event, there is no analogy that would imply using the term polymorphism for a common variant because polymorphism is a condition found in a population, not an event. Genetic polymorphism, just like any other biological polymorphism (e.g. the siphonophores) occurs when members of a species differ in form [ 49 ]. When the notion that the different forms could be genotypes rather than phenotypes was first introduced [ 49 ], the focus was on the least frequent genotype not being due to recurrent mutation, and hence the arbitrary 1 % threshold; but a genetic locus with a thousand equi-frequent alleles would be considered extremely polymorphic. Most SNPs are tri-morphic, but are appropriately called polymorphisms in contrast to being mono-morphic.

Different from diseases associated with SNPs, which are expressed in all the cells of an organism, some diseases, like cancer, are caused by genetic variations typical of a small subset of somatic cells. In order to keep the difference between the two categories of DNA variants, we propose a clear distinction between a SNP and a somatic mutation. A tumor sample and a normal tissue sample from the same individual can be sequenced for analysis of genetic variations. For statistical and computing power, additional sequences coming from buccal swabs or peripheral blood DNA can be used to sequence the germline reference of a patient (paired approach). Since the tumor samples have additional genetic changes as compared with the specific individual’s germline reference, these changes will serve as key attributes to understanding the cancer of this specific individual.

It is possible that germline sequences between this individual and others also differ, and this would constitute a polymorphism in the population, as originally defined. The genotypes/alleles that constitute a polymorphism should be called variants but never, without attribute, simply “mutations”. In our proposal, the term “mutation” should be used only if the sequencing project used the germline reference (Fig. 1a). In this context, in order to have a mutation it is not only required to detect a variation as compared with the reference, but also the reference needs to be represented by the germline cells of the same individual. Accordingly, the term “mutation” should always be accompanied by a qualifying prefix indicating whether the “mutation” occurs only in somatic cells (somatic mutation) or also in the germline cells (germline mutation) (Fig. 1a). This would prevent mutations and polymorphisms from being incorrectly annotated in a sequencing project, with potential deleterious effects on the efficacy of genomics applied to precision medicine, as highlighted in recent studies [ 45 – 47 ].

Fig. 1. Nomenclature of variants according to sequencing design. In a paired approach (a), diseased (tumor) DNA and DNA from the germline (blood, saliva, or other non-diseased tissue) have been extracted and individually sequenced and mapped against a human genome reference assembly. If there are common variants found in both the tumor and germline DNA, they should be called germline mutations. If there are variants found only in tumor DNA, they should be called somatic mutations. In a non-paired approach of variant detection (b), only diseased DNA is extracted from the tissue of interest. The extracted DNA has been sequenced and mapped against a human genome reference assembly and differences as compared with the reference will be labeled as variants.

When a sequencing project does not include the germline DNA of the individual as a reference, the term “mutation” should not be used and should be replaced by the neutral term “variant” (Fig. 1b), as previously suggested [ 47 ]. Therefore, in the sequencing report the choice between the terms “mutation” and “variant” will also clarify which kind of reference was adopted. We anticipate that this approach will encourage the use of germline DNA as a reference in sequencing projects and will allow an immediate comparison between studies that used the same referencing method. Importantly, the term “polymorphism” should only be used in the context of a population. Accordingly, this term should not be used to classify variants in personal genomics.
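
The naming rule proposed here (paired design versus non-paired design, as in Fig. 1) can be written down as a short Python sketch. The function name `label_variants`, the tuple format for a variant, and the example data are hypothetical and serve only to illustrate the rule; the three labels are the ones proposed in the text.

```python
def label_variants(tumor_variants, germline_variants=None):
    """Name each variant found in the diseased sample according to the sequencing design.

    Paired design (germline_variants given): 'germline mutation' if the variant is
    also in the germline sample, otherwise 'somatic mutation'.
    Non-paired design (germline_variants is None): the neutral term 'variant'.
    """
    labels = {}
    for variant in sorted(tumor_variants):
        if germline_variants is None:
            labels[variant] = "variant"
        elif variant in germline_variants:
            labels[variant] = "germline mutation"
        else:
            labels[variant] = "somatic mutation"
    return labels


# Hypothetical differences from a reference assembly: (chromosome, position, ref, alt).
tumor = {("chr3", 120, "G", "T"), ("chr5", 640, "C", "A")}
germline = {("chr3", 120, "G", "T")}

paired = label_variants(tumor, germline)
unpaired = label_variants(tumor)
print(paired)
print(unpaired)
```

Keeping the non-paired case as a separate branch mirrors the proposal: without the individual's own germline sample the code has no basis for calling anything a mutation, so every difference stays a plain variant.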

Abbreviations

SNP: Single nucleotide polymorphism

HGVS: Human Genome Variation Society

Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG, et al. The sequence of the human genome. Science. 2001;291:1304–51.

Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, et al. Initial sequencing and analysis of the human genome. Nature. 2001;409:860–921.

Levy S, Sutton G, Ng PC, Feuk L, Halpern AL, Walenz BP, et al. The diploid genome sequence of an individual human. PLoS Biol. 2007;5:e254.

Sachidanandam R, Weissman D, Schmidt SC, Kakol JM, Stein LD, Marth G, et al. A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature. 2001;409:928–33.

Schneider JA, Pungliya MS, Choi JY, Jiang R, Sun XJ, Salisbury BA, et al. DNA variability of human genes. Mech Ageing Dev. 2003;124:17–25.

Jorde LB, Wooding SP. Genetic variation, classification and ‘race’. Nat Genet. 2004;36:S28–33.

Tishkoff SA, Kidd KK. Implications of biogeography of human populations for ‘race’ and medicine. Nat Genet. 2004;36:S21–7.

Lander ES. Initial impact of the sequencing of the human genome. Nature. 2011;470:187–97.

Gonzaga-Jauregui C, Lupski JR, Gibbs RA. Human genome sequencing in health and disease. Annu Rev Med. 2012;63:35–61.

Metzker ML. Emerging technologies in DNA sequencing. Genome Res. 2005;15:1767–76.

Metzker ML. Sequencing technologies - the next generation. Nat Rev Genet. 2010;11:31–46.

Kircher M, Kelso J. High-throughput DNA sequencing--concepts and limitations. Bioessays. 2010;32:524–36.

What is the difference between polymorphism and a mutation? [ http://www.researchgate.net/post/What_is_the_difference_between_polymorphism_and_a_mutation ]

Cordero P, Ashley EA. Whole-genome sequencing in personalized therapeutics. Clin Pharmacol Ther. 2012;91:1001–9.

Shendure J, Lieberman Aiden E. The expanding scope of DNA sequencing. Nat Biotechnol. 2012;30:1084–94.

Boyd SD. Diagnostic applications of high-throughput DNA sequencing. Annu Rev Pathol. 2013;8:381–410.

Knoppers BM, Senecal K, Borry P, Avard D. Whole-genome sequencing in newborn screening programs. Sci Transl Med. 2014;6:229cm222.

Landau YE, Lichter-Konecki U, Levy HL. Genomics in newborn screening. J Pediatr. 2014;164:14–9.

Snyder MW, Simmons LE, Kitzman JO, Santillan DA, Santillan MK, Gammill HS, et al. Noninvasive fetal genome sequencing: a primer. Prenat Diagn. 2013;33:547–54.

Beaudet AL, Tsui LC. A suggested nomenclature for designating mutations. Hum Mutat. 1993;2:245–8.

Beutler E. The designation of mutations. Am J Hum Genet. 1993;53:783–5.

Condit CM, Achter PJ, Lauer I, Sefcovic E. The changing meanings of “mutation:” A contextualized study of public discourse. Hum Mutat. 2002;19:69–75.

Vissers LE, de Vries BB, Osoegawa K, Janssen IM, Feuth T, Choy CO, et al. Array-based comparative genomic hybridization for the genomewide detection of submicroscopic chromosomal abnormalities. Am J Hum Genet. 2003;73:1261–70.

Iafrate AJ, Feuk L, Rivera MN, Listewnik ML, Donahoe PK, Qi Y, et al. Detection of large-scale variation in the human genome. Nat Genet. 2004;36:949–51.

Sebat J, Lakshmi B, Troge J, Alexander J, Young J, Lundin P, et al. Large-scale copy number polymorphism in the human genome. Science. 2004;305:525–8.

Brookes AJ. The essence of SNPs. Gene. 1999;234:177–86.

Aerts J, Wetzels Y, Cohen N, Aerssens J. Data mining of public SNP databases for the selection of intragenic SNPs. Hum Mutat. 2002;20:162–73.

Lee EK, Gorospe M. Coding region: the neglected post-transcriptional code. RNA Biol. 2011;8:44–8.

Chanock S. Candidate genes and single nucleotide polymorphisms (SNPs) in the study of human disease. Dis Markers. 2001;17:89–98.

Schildgen V, Schildgen O. How is a molecular polymorphism defined? Cancer. 2013;119:1608.

Auer PL, Johnsen JM, Johnson AD, Logsdon BA, Lange LA, Nalls MA, et al. Imputation of exome sequence variants into population- based samples and blood-cell-trait-associated loci in African Americans: NHLBI GO Exome Sequencing Project. Am J Hum Genet. 2012;91:794–808.

Myles S, Davison D, Barrett J, Stoneking M, Timpson N. Worldwide population differentiation at disease-associated SNPs. BMC Med Genomics. 2008;1:22.

Piel FB, Patil AP, Howes RE, Nyangiri OA, Gething PW, Williams TN, et al. Global distribution of the sickle cell gene and geographical confirmation of the malaria hypothesis. Nat Commun. 2010;1:104.

Hassell KL. Population estimates of sickle cell disease in the U.S. Am J Prev Med. 2010;38:S512–21.

Lanclos KD, Oner C, Dimovski AJ, Gu YC, Huisman TH. Sequence variations in the 5’ flanking and IVS-II regions of the G gamma- and A gamma-globin genes of beta S chromosomes with five different haplotypes. Blood. 1991;77:2488–96.

CAS   PubMed   Google Scholar  

Oner C, Dimovski AJ, Olivieri NF, Schiliro G, Codrington JF, Fattoum S, et al. Beta S haplotypes in various world populations. Hum Genet. 1992;89:99–104.

Lapoumeroulie C, Dunda O, Ducrocq R, Trabuchet G, Mony-Lobe M, Bodo JM, et al. A novel sickle cell mutation of yet another origin in Africa: the Cameroon type. Hum Genet. 1992;89:333–7.

Salih NA, Hussain AA, Almugtaba IA, Elzein AM, Elhassan IM, Khalil EA, et al. Loss of balancing selection in the betaS globin locus. BMC Med Genet. 2010;11:21.

McCarthy MI. Genomics, type 2 diabetes, and obesity. N Engl J Med. 2010;363:2339–50.

Consortium TEP. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489:57–74.

Tuxen IV, Jonson L, Santoni-Rugiu E, Hasselby JP, Nielsen FC, Lassen U. Personalized oncology: genomic screening in phase 1. Apmis. 2014;122:723–33.

Knudson Jr AG. Mutation and cancer: statistical study of retinoblastoma. Proc Natl Acad Sci U S A. 1971;68:820–3.

Knudson AG. Two genetic hits (more or less) to cancer. Nat Rev Cancer. 2001;1:157–62.

Christiaans I, Kenter SB, Brink HC, van Os TA, Baas F, van den Munckhof P, et al. Germline SMARCB1 mutation and somatic NF2 mutations in familial multiple meningiomas. J Med Genet. 2011;48:93–7.

Jones S, Anagnostou V, Lytle K, Parpart-Li S, Nesselbush M, Riley DR, et al. Personalized genomic analyses for cancer mutation discovery and interpretation. Sci Transl Med. 2015;7:283ra253.

MacArthur DG, Manolio TA, Dimmock DP, Rehm HL, Shendure J, Abecasis GR, et al. Guidelines for investigating causality of sequence variants in human disease. Nature. 2014;508:469–76.

Richards S, Aziz N, Bale S, Bick D, Das S, Gastier-Foster J, et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet Med. 2015;17:405–23.

Hoenigsberg H. Cell biology, molecular embryology, Lamarckian and Darwinian selection as evolvability. Genet Mol Res. 2003;2:7–28.

Ford EB. Polymorphism and taxonomy. Oxford: Clarendon; 1940.

Google Scholar  

Download references

Acknowledgements

We thank Dr. Oliver Schildgen for prompting us to write this white paper, which aims to form a consensus on these terms, and for editing portions of the manuscript. We also thank all the participants of the ResearchGate thread for their useful and stimulating contributions, which we have tried to summarize in this manuscript.

Author information

Authors and affiliations.

Danbury Hospital Research Institute, Western Connecticut Health Network, 131 West Street, Danbury, CT, 06810, USA

Roshan Karki, Deep Pandya & Cristiano Ferlini

Department of Epidemiology and Biostatistics, Case Western Reserve University School of Medicine, Cleveland, OH, USA

Robert C. Elston

Corresponding author

Correspondence to Cristiano Ferlini.

Additional information

Competing interests.

The authors declare that they have no competing interests.

Authors’ contributions

RK, DP, RCE and CF conceived and wrote the manuscript. All authors read and approved the final manuscript.

Rights and permissions

Open Access  This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.

The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article.

Karki, R., Pandya, D., Elston, R.C. et al. Defining “mutation” and “polymorphism” in the era of personal genomics. BMC Med Genomics 8, 37 (2015). https://doi.org/10.1186/s12920-015-0115-z

Received: 02 October 2014

Accepted: 06 July 2015

Published: 15 July 2015

DOI: https://doi.org/10.1186/s12920-015-0115-z

  • Personal genomics
  • Precision medicine
  • DNA sequencing
  • DNA variants
  • Human genome

BMC Medical Genomics

ISSN: 1755-8794

Hazard ratio for obesity was modeled according to mean daily step counts and 25th, 50th, and 75th percentile PRS for body mass index. Shaded regions represent 95% CIs. Model is adjusted for age, sex, mean baseline step counts, cancer status, coronary artery disease status, systolic blood pressure, alcohol use, educational level, and a PRS × mean steps interaction term.

Mean daily steps and polygenic risk score (PRS) for higher body mass index are independently associated with hazard for obesity. Hazard ratios model the difference between the 75th and 25th percentiles for continuous variables. CAD indicates coronary artery disease; and SBP, systolic blood pressure.

Each point estimate is indexed to a hazard ratio for obesity of 1.00 (BMI [calculated as weight in kilograms divided by height in meters squared] ≥30). Error bars represent 95% CIs.

eTable. Cumulative Incidence Estimates of Obesity Based on Polygenic Risk Score for Body Mass Index and Mean Daily Steps at 1, 3, and 5 Years

eFigure 1. CONSORT Diagram

eFigure 2. Risk of Incident Obesity Modeled by Mean Daily Step Count and Polygenic Risk Scores Adjusted for Baseline Body Mass Index

Data Sharing Statement

Brittain EL, Han L, Annis J, et al. Physical Activity and Incident Obesity Across the Spectrum of Genetic Risk for Obesity. JAMA Netw Open. 2024;7(3):e243821. doi:10.1001/jamanetworkopen.2024.3821

Physical Activity and Incident Obesity Across the Spectrum of Genetic Risk for Obesity

  • 1 Division of Cardiovascular Medicine, Vanderbilt University Medical Center, Nashville, Tennessee
  • 2 Center for Digital Genomic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee
  • 3 Division of Genetic Medicine, Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, Tennessee
  • 4 Vanderbilt Institute of Clinical and Translational Research, Vanderbilt University Medical Center, Nashville, Tennessee
  • 5 Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee
  • 6 Department of Pharmacology, Vanderbilt University Medical Center, Nashville, Tennessee
  • 7 Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee
  • 8 Department of Biomedical Engineering, Vanderbilt University Medical Center, Nashville, Tennessee
  • 9 Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee
  • 10 Department of Psychiatry and Behavioral Sciences, Vanderbilt University Medical Center, Nashville, Tennessee

Question   Does the degree of physical activity associated with incident obesity vary by genetic risk?

Findings   In this cohort study of 3124 adults, individuals at high genetic risk of obesity needed higher daily step counts to reduce the risk of obesity than those at moderate or low genetic risk.

Meaning   These findings suggest that individualized physical activity recommendations that incorporate genetic background may reduce obesity risk.

Importance   Despite consistent public health recommendations, obesity rates in the US continue to increase. Physical activity recommendations do not account for individual genetic variability, which may leave some individuals at increased risk of obesity.

Objective   To use activity, clinical, and genetic data from the All of Us Research Program (AoURP) to explore the association of genetic risk of higher body mass index (BMI) with the level of physical activity needed to reduce incident obesity.

Design, Setting, and Participants   In this US population–based retrospective cohort study, participants were enrolled in the AoURP between May 1, 2018, and July 1, 2022. Enrollees in the AoURP who were of European ancestry, owned a personal activity tracking device, and did not have obesity up to 6 months into activity tracking were included in the analysis.

Exposure   Physical activity, expressed as daily step counts, and a polygenic risk score (PRS) for BMI (calculated as weight in kilograms divided by height in meters squared).

Main Outcome and Measures   Incident obesity (BMI ≥30).

Results   A total of 3124 participants met inclusion criteria. Among 3051 participants with available data, 2216 (73%) were women, and the median age was 52.7 (IQR, 36.4-62.8) years. The total cohort of 3124 participants walked a median of 8326 (IQR, 6499-10 389) steps/d over a median of 5.4 (IQR, 3.4-7.0) years of personal activity tracking. The incidence of obesity over the study period increased from 13% (101 of 781) to 43% (335 of 781) in the lowest and highest PRS quartiles, respectively (P = 1.0 × 10⁻²⁰). The BMI PRS demonstrated an 81% increase in obesity risk (P = 3.57 × 10⁻²⁰) while mean step count demonstrated a 43% reduction (P = 5.30 × 10⁻¹²) when comparing the 75th and 25th percentiles, respectively. Individuals with a PRS in the 75th percentile would need to walk a mean of 2280 (95% CI, 1680-3310) more steps per day (11 020 total) than those at the 50th percentile to have a comparable risk of obesity. To have a comparable risk of obesity to individuals at the 25th percentile of PRS, those at the 75th percentile with a baseline BMI of 22 would need to walk an additional 3460 steps/d; with a baseline BMI of 24, an additional 4430 steps/d; with a baseline BMI of 26, an additional 5380 steps/d; and with a baseline BMI of 28, an additional 6350 steps/d.

Conclusions and Relevance   In this cohort study, the association between daily step count and obesity risk across genetic background and baseline BMI was quantified. Population-based recommendations may underestimate physical activity needed to prevent obesity among those at high genetic risk.

In 2000, the World Health Organization declared obesity the greatest threat to the health of Westernized nations. 1 In the US, obesity accounts for over 400 000 deaths per year and affects nearly 40% of the adult population. Despite the modifiable nature of obesity through diet, exercise, and pharmacotherapy, rates have continued to increase.

Physical activity recommendations are a crucial component of public health guidelines for maintaining a healthy weight, with increased physical activity being associated with a reduced risk of obesity. 2 - 4 Fitness trackers and wearable devices have provided an objective means to capture physical activity, and their use may be associated with weight loss. 5 Prior work leveraging these devices has suggested that taking around 8000 steps/d substantially mitigates risk of obesity. 3 , 4 However, current recommendations around physical activity do not take into account other contributors such as caloric intake, energy expenditure, or genetic background, likely leading to less effective prevention of obesity for many people. 6

Obesity has a substantial genetic contribution, with heritability estimates ranging from 40% to 70%. 7 , 8 Prior studies 9 - 11 have shown an inverse association between genetic risk and physical activity with obesity, whereby increasing physical activity can help mitigate higher genetic risk for obesity. These results have implications for physical activity recommendations on an individual level. Most of the prior work 9 - 11 focused on a narrow set of obesity-associated variants or genes and relied on self-reported physical activity, and more recent work using wearable devices has been limited to 7 days of physical activity measurements. 12 Longer-term capture in large populations will be required to accurately estimate differences in physical activity needed to prevent incident obesity.

We used longitudinal activity monitoring and genome sequencing data from the All of Us Research Program (AoURP) to quantify the combined association of genetic risk for body mass index (BMI; calculated as weight in kilograms divided by height in meters squared) and physical activity with the risk of incident obesity. Activity monitoring was quantified as daily step counts obtained from fitness tracking devices. Genetic risk was quantified by using a polygenic risk score (PRS) from a large-scale genomewide association study (GWAS) of BMI. 13 We quantified the mean daily step count needed to overcome genetic risk for increased BMI. These findings represent an initial step toward personalized exercise recommendations that integrate genetic information.

Details on the design and execution of the AoURP have been published previously. 14 The present study used the AoURP Controlled Tier dataset, version 7 (C2022Q4R9), with data from participants enrolled between May 1, 2018, and July 1, 2022. Participants who provided informed consent could share data from their own activity tracking devices from the time their accounts were first created, which may precede the enrollment date in AoURP. We followed the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) reporting guideline. In this study, only the authorized authors who completed All of Us Responsible Conduct of Research training accessed the deidentified data from the Researcher Workbench (a secured cloud-based platform). Since the authors were not directly involved with the participants, institutional review board review was exempted in compliance with AoURP policy.

Activity tracking data for this study came from the Bring Your Own Device program, which allowed individuals who already owned a tracking device (Fitbit, Inc) to consent to link their activity data with other data in the AoURP. By registering their personal device on the AoURP patient portal, participants could share all activity data collected since the creation of their personal device account. For many participants, this allowed us to examine fitness activity data collected prior to enrollment in the AoURP. Activity data in AoURP are reported as daily step counts. The initial personal activity device cohort consisted of 12 766 individuals. Consistent with our prior data curation approach, we removed days with fewer than 10 hours of wear time (to enrich our cohort for individuals with consistently high wear time), fewer than 100 steps, or more than 45 000 steps, as well as days on which the participant was younger than 18 years. For time-varying analyses, mean daily steps were calculated on a monthly basis for each participant. Months with fewer than 15 valid days of monitoring were removed.
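The curation rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: the record layout `(date, steps, wear_hours, age_years)` and the function names are hypothetical, and only the thresholds (10 hours, 100 to 45 000 steps, 18 years, 15 valid days per month) come from the text.

```python
from collections import defaultdict
from datetime import date

# Thresholds quoted in the text; everything else here is a hypothetical layout.
MIN_WEAR_HOURS = 10
MIN_STEPS = 100
MAX_STEPS = 45_000
MIN_AGE_YEARS = 18
MIN_VALID_DAYS_PER_MONTH = 15


def is_valid_day(steps, wear_hours, age_years):
    """Return True if a day survives the wear-time, step-count, and age filters."""
    return (
        wear_hours >= MIN_WEAR_HOURS
        and MIN_STEPS <= steps <= MAX_STEPS
        and age_years >= MIN_AGE_YEARS
    )


def monthly_mean_steps(days):
    """Compute mean daily steps per calendar month for one participant.

    `days` is an iterable of (date, steps, wear_hours, age_years) tuples.
    Months with fewer than 15 valid days are dropped.
    """
    by_month = defaultdict(list)
    for day, steps, wear_hours, age_years in days:
        if is_valid_day(steps, wear_hours, age_years):
            by_month[(day.year, day.month)].append(steps)
    return {
        month: sum(values) / len(values)
        for month, values in by_month.items()
        if len(values) >= MIN_VALID_DAYS_PER_MONTH
    }
```

Calling `monthly_mean_steps` on one participant's records returns a dictionary keyed by `(year, month)`, which is the shape a time-varying covariate would need before being joined to follow-up intervals.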

The analytic cohort included only individuals with a BMI of less than 30 at the time activity monitoring began. The primary outcome was incident obesity, defined as a BMI of 30 or greater documented in the medical record at least 6 months after initiation of activity monitoring. The latter stipulation reduced the likelihood that having obesity predated the beginning of monitoring but had not yet been clinically documented. We extracted BMI values and clinical characteristics from longitudinal electronic health records (EHRs) for the consenting participants who were associated with a health care provider organization funded by the AoURP. The EHR data have been standardized using the Observational Medical Outcomes Partnership Common Data Model. 15 In the AoURP, upon consent, participants are asked to complete the Basics survey, in which they may self-report demographic characteristics such as race, ethnicity, and sex at birth.
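As a rough illustration of the outcome definition, the following Python sketch computes BMI and finds the first qualifying obesity record. The helper names are hypothetical, and approximating "6 months" as 183 days is an assumption made here; the source does not state how the interval was counted.

```python
from datetime import date, timedelta

OBESITY_BMI = 30.0
LAG_DAYS = 183  # assumption: "6 months" approximated as 183 days


def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by height in meters squared."""
    return weight_kg / height_m ** 2


def first_incident_obesity(monitoring_start, bmi_records):
    """Return the date of the first BMI >= 30 recorded at least LAG_DAYS after
    monitoring began, or None if there is no such record.

    `bmi_records` is an iterable of (date, bmi_value) pairs from the health record.
    """
    earliest = monitoring_start + timedelta(days=LAG_DAYS)
    qualifying = [
        day for day, value in bmi_records
        if day >= earliest and value >= OBESITY_BMI
    ]
    return min(qualifying) if qualifying else None
```

The sketch covers only the outcome rule; excluding participants who already had obesity at baseline would be a separate step applied before it.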

We filtered the data to include only biallelic, autosomal single-nucleotide variants (SNVs) that had passed AoURP initial quality control. 16 We then removed duplicate-position SNVs and kept only individual genotypes with a genotype quality greater than 20. We further filtered the SNVs based on their Hardy-Weinberg equilibrium P value (>1.0 × 10⁻¹⁵) and missing rate (<5%) across all samples. Next, we divided the samples into 6 groups (Admixed American, African, East Asian, European, Middle Eastern, and South Asian) based on their estimated ancestral populations 16 , 17 and further filtered the SNVs within each population based on minor allele frequency (MAF) (>0.01), missing rate (<0.02), and Hardy-Weinberg equilibrium P value (>1.0 × 10⁻⁶). SNV coordinates were mapped from Genome Reference Consortium Human Build 38 to Build 37. Because the existing PRS models have limited transferability across ancestry groups and to ensure appropriate power of the subsequent PRS analysis, we limited our analysis to the populations who had a sample size of greater than 500, resulting in 5964 participants of European ancestry with 5 515 802 common SNVs for analysis.
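The two filtering stages can be expressed as simple predicates over per-variant summary statistics. The sketch below is illustrative only: the `SnvStats` record and function names are hypothetical, and in practice such filters are usually applied with dedicated genetics tooling rather than hand-written code. The thresholds are the ones quoted above.

```python
from dataclasses import dataclass


@dataclass
class SnvStats:
    """Hypothetical per-variant summary statistics."""
    maf: float           # minor allele frequency
    missing_rate: float  # fraction of samples with a missing genotype
    hwe_p: float         # Hardy-Weinberg equilibrium P value


def passes_global_qc(snv):
    """Filter applied across all samples: HWE P > 1e-15 and missing rate < 5%."""
    return snv.hwe_p > 1e-15 and snv.missing_rate < 0.05


def passes_population_qc(snv):
    """Filter applied within one ancestry group:
    MAF > 0.01, missing rate < 0.02, and HWE P > 1e-6."""
    return snv.maf > 0.01 and snv.missing_rate < 0.02 and snv.hwe_p > 1e-6
```

Keeping the stages as separate functions mirrors the text, where the second set of thresholds is evaluated on statistics recomputed within each ancestry group.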

To generate principal components, we excluded the regions with high linkage disequilibrium, including chr5:44-51.5 megabase (Mb), chr6:25-33.5 Mb, chr8:8-12 Mb, and chr11:45-57 Mb. We then pruned the remaining SNVs using the pairwise independence function in PLINK, version 1.9 (Harvard University), with a 1-kilobase window shifted by 50 base pairs and requiring r² < 0.05 between any pair, resulting in 100 983 SNPs for further analysis. 18 Principal component analysis was run using PLINK, version 1.9. The European ancestry linkage disequilibrium reference panel from the 1000 Genomes Project phase 3 was downloaded, and nonambiguous SNPs with MAF greater than 0.01 were kept in the largest European ancestry GWAS summary statistics of BMI. 13 We manually harmonized the strand-flipping SNPs among the SNP information file, GWAS summary statistics files, and the European ancestry PLINK extended map files (.bim).

We used PRS–continuous shrinkage to infer posterior SNP effect sizes under continuous shrinkage priors with a scaling parameter set to 0.01, reflecting the polygenic architecture of BMI. GWAS summary statistics of BMI measured in 681 275 individuals of European ancestry were used to estimate the SNP weights. 19 The scoring command in PLINK, version 1.9, was used to produce the genomewide scores of the AoURP European individuals with their quality-controlled SNP genotype data and these derived SNP weights. 20 Finally, by using the genomewide scores as the dependent variable and the 10 principal components as the independent variables, we performed linear regression, and the obtained residuals were kept for the subsequent analysis. To check the performance of the PRS estimate, we first fit a generalized regression model with obesity status as the dependent variable and the PRS as the independent variable with age, sex, and the top 10 principal components of genetic ancestry as covariates. We then built a subset logistic regression model, which only uses the same set of covariates. By comparing the full model with the subset model, we measured the incremental Nagelkerke R² value to quantify how much variance in obesity status was explained by the PRS.
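The incremental Nagelkerke R² described above can be computed from the log-likelihoods of three fitted logistic models: an intercept-only model, the covariates-only (subset) model, and the full model with the PRS. The following is a minimal sketch using the standard Nagelkerke definition; the function names are hypothetical, and the source does not say which software produced the reported value.

```python
import math


def nagelkerke_r2(loglik_null, loglik_model, n):
    """Nagelkerke R^2 from the log-likelihoods of an intercept-only model
    and a fitted model, for n observations."""
    # Cox-Snell R^2, then rescaled by its maximum attainable value.
    cox_snell = 1.0 - math.exp(2.0 * (loglik_null - loglik_model) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * loglik_null / n)
    return cox_snell / max_cox_snell


def incremental_r2(loglik_null, loglik_covariates, loglik_full, n):
    """Difference in Nagelkerke R^2 between the full model (covariates + PRS)
    and the covariates-only model."""
    return (
        nagelkerke_r2(loglik_null, loglik_full, n)
        - nagelkerke_r2(loglik_null, loglik_covariates, n)
    )
```

The difference isolates the share of explained variation attributable to the PRS beyond age, sex, and the ancestry principal components.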

Differences in clinical characteristics across PRS quartiles were assessed using the Wilcoxon rank sum or Kruskal-Wallis test for continuous variables and the Pearson χ² test for categorical variables. Cox proportional hazards regression models were used to examine the association among daily step count (considered as a time-varying variable), PRS, and the time to event for obesity, adjusting for age, sex, mean baseline step counts, cancer status, coronary artery disease status, systolic blood pressure, alcohol use, educational level, and an interaction term of PRS × mean steps. We presented these results stratified by baseline BMI and provided a model including baseline BMI in eFigure 2 in Supplement 1 as a secondary analysis due to collinearity between BMI and PRS.

Cox proportional hazards regression models were fit on a multiply imputed dataset. Multiple imputation was performed for baseline BMI, alcohol use, educational status, systolic blood pressure, and smoking status using bootstrap and predictive mean matching with the aregImpute function in the Hmisc package of R, version 4.2.2 (R Project for Statistical Computing). Continuous variables were modeled as restricted cubic splines with 3 knots, unless the nonlinear term was not significant, in which case it was modeled as a linear term. Fits and predictions of the Cox proportional hazards regression models were obtained using the rms package in R, version 4.2.2. The Cox proportional hazards regression assumptions were checked using the cox.zph function from the survival package in R, version 4.2.2.

To identify the combinations of PRS and mean daily step counts associated with a hazard ratio (HR) of 1.00, we used a 100-knot spline function to fit the HR estimates from the Cox proportional hazards regression model across a range of mean daily step counts for each PRS percentile. We then computed the inverse of the fitted spline function to determine the mean daily step count where the HR equals 1.00 for each PRS percentile. We repeated this process for multiple PRS percentiles to generate a plot of mean daily step counts as a function of PRS percentiles where the HR was 1.00. To estimate the uncertainty around these estimations, we applied a similar spline function to the upper and lower estimated 95% CIs of the Cox proportional hazards regression model to find the 95% CIs for the estimated mean daily step counts at each PRS percentile. Two-sided P < .05 indicated statistical significance.
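The inversion step can be illustrated with a simplified stand-in. The sketch below replaces the 100-knot spline with piecewise-linear interpolation over a grid of model-predicted HRs, which is a deliberate simplification and not the authors' method. The function name and grid layout are hypothetical.

```python
def steps_at_target_hr(steps_grid, hr_grid, target_hr=1.0):
    """Find the step count at which a predicted HR curve crosses target_hr.

    `steps_grid` is an increasing sequence of mean daily step counts and
    `hr_grid` holds the model-predicted HR at each grid point for one PRS
    percentile. Returns the first crossing found by linear interpolation
    between neighboring grid points, or None if the curve never crosses.
    """
    points = list(zip(steps_grid, hr_grid))
    for (s0, h0), (s1, h1) in zip(points, points[1:]):
        crosses = (h0 - target_hr) * (h1 - target_hr) <= 0
        if crosses and h0 != h1:
            # Linear interpolation within the bracketing segment.
            return s0 + (target_hr - h0) * (s1 - s0) / (h1 - h0)
    return None
```

Applying the same function to the grids of lower and upper 95% confidence limits would give an interval for the crossing point, in the spirit of the procedure described above.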

We identified 3124 participants of European ancestry without obesity at baseline who agreed to link their personal activity data and EHR data and had available genome sequencing. Among those with available data, 2216 of 3051 (73%) were women and 835 of 3051 (27%) were men, and the median age was 52.7 (IQR, 36.4-62.8) years. In terms of race and ethnicity, 2958 participants (95%) were White compared with 141 participants (5%) who were of other race or ethnicity (which may include Asian, Black or African American, Middle Eastern or North African, Native Hawaiian or Other Pacific Islander, multiple races or ethnicities, and unknown race or ethnicity) (Table). The analytic sample was restricted to individuals assigned European ancestry based on the All of Us Genomic Research Data Quality Report. 16 A study flowchart detailing the creation of the analytic dataset is provided in eFigure 1 in Supplement 1. The BMI-based PRS explained 8.3% of the phenotypic variation in obesity (β = 1.76; P = 2 × 10⁻¹⁶). The median follow-up time was 5.4 (IQR, 3.4-7.0) years and participants walked a median of 8326 (IQR, 6499-10 389) steps/d. The incidence of obesity over the study period was 13% (101 of 781 participants) in the lowest PRS quartile and 43% (335 of 781 participants) in the highest PRS quartile (P = 1.0 × 10⁻²⁰). We observed a decrease in median daily steps when moving from lowest (8599 [IQR, 6751-10 768]) to highest (8115 [IQR, 6340-10 187]) PRS quartile (P = .01).

We next modeled obesity risk stratified by PRS percentile with the 50th percentile indexed to an HR for obesity of 1.00 (Figure 1). The association between PRS and incident obesity was direct (P = .001) and linear (chunk test for nonlinearity was nonsignificant [P = .07]). The PRS and mean daily step count were both independently associated with obesity risk (Figure 2). The 75th percentile BMI PRS demonstrated an 81% increase in obesity risk (HR, 1.81 [95% CI, 1.59-2.05]; P = 3.57 × 10⁻²⁰) when compared with the 25th percentile BMI PRS, whereas the 75th percentile median step count demonstrated a 43% reduction in obesity risk (HR, 0.57 [95% CI, 0.49-0.67]; P = 5.30 × 10⁻¹²) when compared with the 25th percentile step count. The PRS × mean steps interaction term was not significant (χ² = 1.98; P = .37).

Individuals with a PRS at the 75th percentile would need to walk a mean of 2280 (95% CI, 1680-3310) more steps per day (11 020 total) than those at the 50th percentile to reduce the HR for obesity to 1.00 ( Figure 1 ). Conversely, those in the 25th percentile PRS could reach an HR of 1.00 by walking a mean of 3660 (95% CI, 2180-8740) fewer steps than those at the 50th percentile PRS. When assuming a median daily step count of 8740 (cohort median), those in the 75th percentile PRS had an HR for obesity of 1.33 (95% CI, 1.25-1.41), whereas those at the 25th percentile PRS had an obesity HR of 0.74 (95% CI, 0.69-0.79).

The mean daily step count required to achieve an HR for obesity of 1.00 across the full PRS spectrum and stratified by baseline BMI is shown in Figure 3 . To reach an HR of 1.00 for obesity, when stratified by baseline BMI of 22, individuals at the 50th percentile PRS would need to achieve a mean daily step count of 3290 (additional 3460 steps/d); for a baseline BMI of 24, a mean daily step count of 7590 (additional 4430 steps/d); for a baseline BMI of 26, a mean daily step count of 11 890 (additional 5380 steps/d); and for a baseline BMI of 28, a mean daily step count of 16 190 (additional 6350 steps/d).

When baseline BMI was added to the full Cox proportional hazards regression model, daily step count and BMI PRS both remained associated with obesity risk. When comparing individuals at the 75th percentile with those at the 25th percentile, the BMI PRS was associated with a 61% increased risk of obesity (HR, 1.61 [95% CI, 1.45-1.78]). Similarly, when comparing the 75th with the 25th percentiles, daily step count was associated with a 38% lower risk of obesity (HR, 0.62 [95% CI, 0.53-0.72]) (eFigure 2 in Supplement 1).

The cumulative incidence of obesity increases over time and with fewer daily steps and higher PRS. The cumulative incidence of obesity would be 2.9% at the 25th percentile, 3.9% at the 50th percentile, and 5.2% at the 75th percentile for PRS in year 1; 10.5% at the 25th percentile, 14.0% at the 50th percentile, and 18.2% at the 75th percentile for PRS in year 3; and 18.5% at the 25th percentile, 24.3% at the 50th percentile, and 30.9% at the 75th percentile for PRS in year 5 ( Figure 4 ). The eTable in Supplement 1 models the expected cumulative incidence of obesity at 1, 3, and 5 years based on PRS and assumed mean daily steps of 7500, 10 000, and 12 500.

We examined the combined association of daily step counts and genetic risk for increased BMI with the incidence of obesity in a large national sample with genome sequencing and long-term activity monitoring data. Lower daily step counts and higher BMI PRS were both independently associated with increased risk of obesity. As the PRS increased, the number of daily steps associated with lower risk of obesity also increased. By combining these data sources, we derived an estimate of the daily step count needed to reduce the risk of obesity based on an individual’s genetic background. Importantly, our findings suggest that genetic risk for obesity is not deterministic but can be overcome by increasing physical activity.

Our findings align with those of prior literature 9 indicating that engaging in physical activity can mitigate genetic obesity risk and highlight the importance of genetic background for individual health and wellness. Using the data from a large population-based sample, Li et al 9 characterized obesity risk by genotyping 12 susceptibility loci and found that higher self-reported physical activity was associated with a 40% reduction in genetic predisposition to obesity. Our study extends these results in 2 important ways. First, we leveraged objectively measured longitudinal activity data from commercial devices to focus on physical activity prior to and leading up to a diagnosis of obesity. Second, we used a more comprehensive genomewide risk assessment in the form of a PRS. Our results indicate that daily step count recommendations to reduce obesity risk may be personalized based on an individual’s genetic background. For instance, individuals with higher genetic risk (ie, 75th percentile PRS) would need to walk a mean of 2280 more steps per day than those at the 50th percentile of genetic risk to have a comparable risk of obesity.

These results suggest that population-based recommendations that do not account for genetic background may not accurately represent the amount of physical activity needed to reduce the risk of obesity. Population-based exercise recommendations may overestimate or underestimate physical activity needs, depending on one’s genetic background. Underestimation of physical activity required to reduce obesity risk has the potential to be particularly detrimental to public health efforts to reduce weight-related morbidity. As such, integration of activity and genetic data could facilitate personalized activity recommendations that account for an individual’s genetic profile. The widespread use of wearable devices and the increasing demand for genetic information from both clinical and direct-to-consumer sources may soon permit testing the value of personalized activity recommendations. Efforts to integrate wearable devices and genomic data into the EHR further support the potential future clinical utility of merging these data sources to personalize lifestyle recommendations. Thus, our findings support the need for a prospective trial investigating the impact of tailoring step counts by genetic risk on chronic disease outcomes.

The most important limitation of this work is the lack of diversity and inclusion only of individuals with European ancestry. These findings will need validation in a more diverse population. Our cohort only included individuals who already owned a fitness tracking device and agreed to link their activity data to the AoURP dataset, which may not be generalizable to other populations. We cannot account for unmeasured confounding, and the potential for reverse causation still exists. We attempted to diminish the latter concern by excluding prevalent obesity and incident cases within the first 6 months of monitoring. Genetic risk was simplified to be specific to increased BMI; however, genetic risk for other cardiometabolic conditions could also inform obesity risk. Nongenetic factors that contribute to obesity risk such as dietary patterns were not available, reducing the explanatory power of the model. It is unlikely that the widespread use of drug classes targeting weight loss affects the generalizability of our results, because such drugs are rarely prescribed for obesity prevention, and our study focused on individuals who were not obese at baseline. Indeed, less than 0.5% of our cohort was exposed to a medication class targeting weight loss (phentermine, orlistat, or glucagonlike peptide-1 receptor agonists) prior to incident obesity or censoring. Finally, some fitness activity tracking devices may not capture nonambulatory activity as well as triaxial accelerometers.

This cohort study used longitudinal activity data from commercial wearable devices, genome sequencing, and clinical data to support the notion that higher daily step counts can mitigate genetic risk for obesity. These results have important clinical and public health implications and may offer a novel strategy for addressing the obesity epidemic by informing activity recommendations that incorporate genetic information.

Accepted for Publication: January 30, 2024.

Published: March 27, 2024. doi:10.1001/jamanetworkopen.2024.3821

Open Access: This is an open access article distributed under the terms of the CC-BY License . © 2024 Brittain EL et al. JAMA Network Open .

Corresponding Author: Evan L. Brittain, MD, MSc ( [email protected] ) and Douglas M. Ruderfer, PhD ( [email protected] ), Vanderbilt University Medical Center, 2525 West End Ave, Suite 300A, Nashville, TN 37203.

Author Contributions: Drs Brittain and Ruderfer had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.

Concept and design: Brittain, Annis, Master, Roden, Ruderfer.

Acquisition, analysis, or interpretation of data: Brittain, Han, Annis, Master, Hughes, Harris, Ruderfer.

Drafting of the manuscript: Brittain, Han, Annis, Master, Ruderfer.

Critical review of the manuscript for important intellectual content: All authors.

Statistical analysis: Brittain, Han, Annis, Master.

Obtained funding: Brittain, Harris.

Administrative, technical, or material support: Brittain, Annis, Master, Roden.

Supervision: Brittain, Ruderfer.

Conflict of Interest Disclosures: Dr Brittain reported receiving a gift from Google LLC during the conduct of the study. Dr Ruderfer reported serving on the advisory board of Illumina Inc and Alkermes PLC and receiving grant funding from PTC Therapeutics outside the submitted work. No other disclosures were reported.

Funding/Support: The All of Us Research Program is supported by grants 1 OT2 OD026549, 1 OT2 OD026554, 1 OT2 OD026557, 1 OT2 OD026556, 1 OT2 OD026550, 1 OT2 OD026552, 1 OT2 OD026553, 1 OT2 OD026548, 1 OT2 OD026551, 1 OT2 OD026555, IAA AOD21037, AOD22003, AOD16037, and AOD21041 (regional medical centers); grant HHSN 263201600085U (federally qualified health centers); grant U2C OD023196 (data and research center); 1 U24 OD023121 (Biobank); U24 OD023176 (participant center); U24 OD023163 (participant technology systems center); grants 3 OT2 OD023205 and 3 OT2 OD023206 (communications and engagement); and grants 1 OT2 OD025277, 3 OT2 OD025315, 1 OT2 OD025337, and 1 OT2 OD025276 (community partners) from the National Institutes of Health (NIH). This study is also supported by grants R01 HL146588 (Dr Brittain), R61 HL158941 (Dr Brittain), and R21 HL172038 (Drs Brittain and Ruderfer) from the NIH.

Role of the Funder/Sponsor: The NIH had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.

Data Sharing Statement: See Supplement 2 .

Additional Contributions: The All of Us Research Program would not be possible without the partnership of its participants.

  • Research Highlight
  • Published: 12 April 2024

Development

A developmental exit from totipotency

  • Henry Ertl 1  

Nature Reviews Genetics (2024)

  • Chromatin remodelling
  • Totipotent stem cells

The earliest embryonic cells have the potential to produce all differentiated tissues, but this totipotency is lost after zygotic genome activation (ZGA), via unknown mechanisms. A recent study in Nature Genetics by Vega-Sendino et al. has identified how the homeobox transcription factor DUXBL regulates development after ZGA and out of stages marked by totipotency in mice.

Next, the authors tested whether DUXBL has a function similar to that of DUX in activating the 2C-stage transcriptional programme that entails ZGA. They generated embryonic stem cells with inducible expression of DUXBL and found that overexpression of DUXBL resulted in fewer embryonic stem cells becoming 2C-like cells and decreased levels of DUX-induced transcription. These findings suggest that DUXBL antagonizes the function of DUX to drive transcription during the 2C stage that is associated with totipotency.


Original article

Vega-Sendino, M. et al. The homeobox transcription factor DUXBL controls exit from totipotency. Nat. Genet. https://doi.org/10.1038/s41588-024-01692-z (2024)


Author information

Authors and affiliations.

Nature Reviews Genetics http://www.nature.com/nrg/


Corresponding author

Correspondence to Henry Ertl .


Cite this article.

Ertl, H. A developmental exit from totipotency. Nat Rev Genet (2024). https://doi.org/10.1038/s41576-024-00730-0


DOI : https://doi.org/10.1038/s41576-024-00730-0

Veterinary Genetics Laboratory


New Test Available: Equine Juvenile Spinocerebellar Ataxia (EJSCA)

  • by Liza Crissiuma Gershony
  • April 09, 2024

Equine Juvenile Spinocerebellar Ataxia (EJSCA) is an inherited neurologic disease that causes ataxia in American Quarter Horses. The variant causing this disease was identified at UC Davis by Dr. Carrie Finno, Gregory L. Ferraro Endowed Director of the UC Davis Center for Equine Health (CEH), and colleagues, and the scientific paper describing this finding is currently in progress.

Dr. Finno found that affected foals developed ataxia, or incoordination, between 1 and 4 weeks of age. In most affected foals, the hind limbs appeared to be more severely affected than the front limbs. As the disease progressed, these foals would turn the hind limbs to one side, with the front limbs planted on the ground, causing them to appear to walk sideways. Within a few days, the affected foals were unable to stand without assistance and had to be euthanized.

The disease is inherited as an autosomal recessive trait and genetic testing can determine whether a horse is a carrier of the causative variant. Breeding of two carrier animals has a 25% chance of producing an affected foal.
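The 25% figure follows from a standard single-locus cross. As a minimal sketch, the Python below enumerates the four equally likely allele combinations from two carriers; the allele labels "N" (normal) and "n" (causative variant) are hypothetical labels chosen for illustration, not the laboratory's reporting notation.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def cross(parent_a: str, parent_b: str) -> dict[str, Fraction]:
    """Return offspring genotype probabilities for a single-locus cross.

    Each parent is a two-character genotype such as "Nn". Every parent
    passes on one of its two alleles with equal probability.
    """
    # Sort each pair so that "nN" and "Nn" count as the same genotype.
    counts = Counter(
        "".join(sorted(pair)) for pair in product(parent_a, parent_b)
    )
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


offspring = cross("Nn", "Nn")  # carrier x carrier
for genotype, probability in sorted(offspring.items()):
    print(genotype, probability)
```

For a carrier-by-carrier mating this gives one quarter "NN" (clear), one half "Nn" (carrier), and one quarter "nn" (affected). Crossing a carrier with a clear horse, `cross("Nn", "NN")`, produces no affected genotype, which is the practical use of carrier testing.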

To read more about the test, visit https://vgl.ucdavis.edu/test/equine-juvenile-spinocerebellar-ataxia-ejs…

  • v.205(4); 2017 Apr

The Evolving Definition of the Term “Gene”

Petter Portin

* Laboratory of Genetics, Department of Biology, University of Turku, 20014, Finland

Adam Wilkins

† Institute of Theoretical Biology, Humboldt Universität zu Berlin, 10115, Germany

This paper presents a history of the changing meanings of the term “gene,” over more than a century, and a discussion of why this word, so crucial to genetics, needs redefinition today. In this account, the first two phases of 20th century genetics are designated the “classical” and the “neoclassical” periods, and the current molecular-genetic era the “modern period.” While the first two stages generated increasing clarity about the nature of the gene, the present period features complexity and confusion. Initially, the term “gene” was coined to denote an abstract “unit of inheritance,” to which no specific material attributes were assigned. As the classical and neoclassical periods unfolded, the term became more concrete, first as a dimensionless point on a chromosome, then as a linear segment within a chromosome, and finally as a linear segment in the DNA molecule that encodes a polypeptide chain. This last definition, from the early 1960s, remains the one employed today, but developments since the 1970s have undermined its generality. Indeed, they raise questions about both the utility of the concept of a basic “unit of inheritance” and the long implicit belief that genes are autonomous agents. Here, we review findings that have made the classic molecular definition obsolete and propose a new one based on contemporary knowledge.

In 1866, Gregor Mendel, a Moravian scientist and Augustinian friar, working in what is today the Czech Republic, laid the foundations of modern genetics with his landmark studies of heredity in the garden pea (Pisum sativum) (Mendel 1866). He did not speak of “genes,” a term that first appeared decades later, but rather of elements, and even “cell elements” (original German Zellelemente, p. 42). Nevertheless, it is clear that Mendel was hypothesizing the hereditary behavior of minuscule hidden factors or determinants underlying the stably inherited visible characteristics of an organism, which today we would call genes. This is apparent throughout his publication in his use of abstract letter symbols for hereditary determinants to denote the physical factors underlying the inheritance of characteristics. There is no doubt that he considered the mediators of heredity to be material entities, though he made no conjectures about their nature.

The word “gene” was not coined until early in the 20th century, by the Danish botanist Johannsen (1909) , but it rapidly became fundamental to the then new science of genetics, and eventually to all of biology. Its meaning, however, has been evolving since its birth. In the beginning, the concept was used as a mere abstraction. Indeed, Johannsen thought of the gene as some form of calculating element (a point to which we will return), but deliberately refrained from speculating about its physical attributes ( Johannsen 1909 ). By the second decade of the 20th century, however, a number of genes had been localized to specific positions on specific chromosomes, and could, at least, be treated, if not thought of precisely, as dimensionless points on chromosomes. Furthermore, groups of genes that showed some degree of coinheritance could be placed in “linkage groups,” which were the epistemic equivalent of the cytological chromosome. We term this phase the “classical period” of genetics. By the early 1940s, certain genes had been shown to have internal structure, and to be dissectable by genetic recombination; thus, the gene, at this point, had conceptually acquired a single dimension, length. Twenty years later, by the early 1960s, the gene had achieved what seemed like a definitive physical identity as a discrete sequence on a DNA molecule that encodes a polypeptide chain. At this point, the gene had a visualizable three-dimensional structure as a particular kind of molecule. We will call this period—from roughly the end of the 1930s to the early 1960s—the “neoclassical period.”

The 1960s definition of the gene is the one most geneticists employ today, but it is clearly out-of-date for deoxyribonucleic acid (DNA)-based organisms. (We will deal only with the latter; RNA viruses and their genes will not be discussed.) Here, we review the older history of the terminology, and then the findings from the 1970s onwards that have undermined the generality of the 1960s definition. We will then propose a contemporary definition of the “gene” that accounts for the complexities revealed in recent decades. This publication is a follow-on paper to an earlier paper by one of us ( Portin 2015 ).

The Classical Period of Genetics

The development of modern genetics began in 1900, when three botanists—the German Carl Correns, the Dutchman Hugo de Vries, and the Austrian Erich von Tschermak—independently cited and discussed the experiments of Mendel as basic to understanding the nature of heredity. They presented results similar to Mendel’s though using different plants as experimental material ( Correns 1900 ; de Vries 1900 ; Tschermak 1900 ). Their conceptual contributions as “rediscoverers” of Mendel, however, were probably not equivalent. De Vries and Correns claimed that they had discovered the essential facts and developed their interpretation before they found Mendel’s article, and they demonstrated that they fully understood the essential aspects of Mendel’s theory ( Stern 1970 ). In contrast, Tschermak’s analysis of his own data was inadequate, and his paper lacked an interpretation. Thus, while he sensed the significance of Mendel’s work, Tschermak should not be given credit equal to that due to de Vries and Correns.

In 1900, chromosomes were already known, and they were soon seen to provide a concrete basis for Mendel’s abstract hereditary factors. This postulated connection between genes and chromosomes, which later came to be known as the chromosome theory of inheritance, was initially provided by the German biologist T. H. Boveri and the American geneticist and physician W. S. Sutton during the years 1902–1903. Boveri first demonstrated the individuality of chromosomes with microscopic observations on the sea urchin Paracentrotus lividus (Boveri 1902). He went on to demonstrate the continuity of chromosomes through cell divisions with studies of Ascaris megalocephala, a parasitic nematode worm (Boveri 1903). These two characteristics, individuality and continuity, are necessary, although not sufficient, characteristics of the genetic material. Sutton’s contribution (Sutton 1903), on the basis of his studies on the spermatogenesis of Brachystola magna, a large grasshopper, was to demonstrate a clear equivalence between the behavior of chromosomes at the meiotic divisions and Mendel’s postulated separation and independent inheritance of character differences at gamete formation. Thus, this early version of the chromosome theory of inheritance suggests an explanation for Mendel’s laws of inheritance: the law of segregation and the law of independent assortment. It was not until 1916, however, that it could be considered to be proven. In that year, C. B. Bridges, an American geneticist, showed in Drosophila melanogaster that nondisjunction, a rare exceptional behavior of genetic markers (lack of segregation) during gamete formation, was always associated with an analogous exceptional behavior of a given chromosome pair during meiosis (Bridges 1916).

Shortly after the birth of the chromosome theory, however, a new phenomenon had been discovered that appeared to contradict Mendel’s law of independent assortment. This was the phenomenon of linkage, initially found in the sweet pea ( Lathyrus odoratus ), in which some genes were found to exhibit “coupling,” violating independent assortment ( Bateson et al. 1905a , b ). This exception to the rule, however, became the basis of an essential extension of the chromosome theory when it was realized that genes showing linkage are located on the same chromosome, and genes showing independent assortment are located on different chromosomes.

According to the canonical history of genetics, it was the American geneticist T. H. Morgan who first proposed this extension of the chromosome theory, in 1910 (Morgan 1910, 1917). Recent studies on the history of genetics (Edwards 2013), however, show that Morgan was most likely influenced by the first English-language textbook of genetics, published in 1906 by R. H. Lock, a British botanist associated with Bateson and Punnett, where the possibility that linkage might result from genes lying on the same chromosome was first suggested (Lock 1906). Thus, it is Lock to whom the credit for explaining linkage must be given.

It was soon understood that genes sufficiently far apart on the chromosome can also show independent assortment, due to extensive genetic recombination during meiosis, while genes that are closer to each other show a degree of coinheritance, the frequency of their separation by recombination being directly related to the distance between them. Owing to the work of Morgan and his group on the fruit fly ( D. melanogaster ), the phenomenon of linkage and its breakdown via crossing over became the essential basis for the mapping of genes ( Morgan 1919 , 1926 ; Morgan et al. 1915 ). The first map, of the Drosophila X-chromosome, was constructed by Alfred Sturtevant, one of Morgan’s students ( Sturtevant 1913 ). The linear sequence of genes he diagrammed was the abstract genetic epistemic equivalent of the chromosome itself.
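The mapping logic described here, in which recombination frequency serves as a proxy for distance, can be sketched in a few lines of Python. The gene names and offspring counts below are invented for illustration and are not Sturtevant's data; the conversion of 1% recombination to 1 map unit is the usual convention and is only approximately additive over short distances.

```python
def map_distance(recombinants: int, total: int) -> float:
    """Map distance in map units: percent of offspring that are recombinant."""
    if total <= 0 or not 0 <= recombinants <= total:
        raise ValueError("need total > 0 and 0 <= recombinants <= total")
    return 100.0 * recombinants / total


def order_three_genes(distances: dict[frozenset, float]) -> tuple[str, str, str]:
    """Infer linear order: the two most distant genes flank the third."""
    outer = max(distances, key=distances.get)  # pair with the largest distance
    genes = set().union(*distances)            # all gene names that appear
    (middle,) = genes - outer                  # the gene not in the outer pair
    left, right = sorted(outer)
    return (left, middle, right)


# Hypothetical two-point crosses: (recombinant offspring, total offspring).
counts = {
    ("a", "b"): (100, 1000),
    ("b", "c"): (50, 1000),
    ("a", "c"): (140, 1000),
}
distances = {frozenset(pair): map_distance(*c) for pair, c in counts.items()}
gene_order = order_three_genes(distances)
print(distances, gene_order)
```

In this made-up data the a-c distance (14 units) is slightly less than the sum of a-b and b-c (15 units), which is the pattern expected when double crossovers between the outer genes go undetected. Genes far enough apart approach 50% recombination and then behave as if they assort independently, as the paragraph notes.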

The genetic maps of the linkage groups were subsequently followed by cytological maps of the chromosomes. These were first constructed by showing that X-ray-induced changes of the order of the genes in the linkage groups, such as translocations and deletions, were associated with corresponding changes in the structure of chromosomes ( Dobzhansky 1929 ; Muller and Painter 1929 ; Painter and Muller 1929 ). This was followed by detailed cytological mapping of genes, made possible by the existence of the “giant” chromosomes of the salivary glands of the fruit fly, in which genes identified by their inheritance patterns could be localized to specific (visible) locations on chromosomes ( Painter 1934 ; Bridges 1935 , 1938 ).

Morgan conceived the cytological explanation for the genetical phenomenon of crossing over by adopting the chiasmatype theory of Frans Alfons Janssens, a Belgian cytologist, that was based on his observations of meiosis at spermatogenesis in the salamander Batrachoseps attenuatus (Janssens 1909; see also Koszul et al. 2012). Janssens observed cross figures at synapsis in meiotic chromosome preparations of this amphibian that resembled the Greek letter chi (χ). Accordingly, he called such a junction “chiasma” (pl. chiasmata). Janssens interpreted each of these as due to fusion at one point between two of the four strands of the tetrad of chromatids at the pachytene stage of the meiotic prophase. According to the chiasmatype theory, chiasmata were due to breakage and reunion of one maternal and one paternal chromatid of the tetrad. Consequently, the formation of each chiasma leads to an exchange of equal and corresponding regions of two of the four chromatids. This mechanism of exchange provided the needed physical explanation for the partial genetic linkage of genes that Morgan had observed. In other words, chiasmata are cytological counterparts of the genetical crossover points.

An alternative explanation for the origin of chiasmata was the so-called classical hypothesis, which did not require breakage and rejoining of chromosomes, but assumed that chiasmata were simply a result of the paternal and maternal chromatids lying across each other, forming a cross-like configuration at the pachytene stage of meiosis (McClung 1927; Sax 1932a, b). This hypothesis did not explain the phenomenon of genetic recombination, but was preferred by most cytologists of that time because it did not threaten the permanence and individuality of the chromosomes, which the chiasmatype theory initially seemed to do. During subsequent years, many cytological facts, as reviewed, for example, in Whitehouse (1973), supported the chiasmatype theory, but not the classical hypothesis.

Thus, by the early 1930s, the concept of the gene had become more concrete. Genes were regarded as indivisible units of inheritance, each located at a specific point on a specific chromosome. Furthermore, they could be defined in terms of their behavior as fundamental units on the basis of four criteria: (1) hereditary transmission, (2) genetic recombination, (3) mutation, and (4) gene function. Moreover, it was believed, albeit without any empirical evidence, that these four ways of defining the gene fully agreed with one another (reviewed in Portin 1993; Keller 2000). As A. Sturtevant and G. W. Beadle wrote in 1939, near the end of what we are calling the classical period of genetics, it was also clear that genes determine the nature of developmental reactions and thus, ultimately, the visible traits they generate. But how genes do these things was unknown; indeed, that was considered one of the major unsolved problems in biology, and it remained so for two decades (Sturtevant and Beadle 1962, p. 335). Further, it was believed that the integration of genetics with such fields as biochemistry, developmental physiology, and experimental embryology would lead to a deep understanding of the nature and role of genes, and that this integration would add to our understanding of those processes that make up development (Sturtevant and Beadle 1962, p. 357; see also Sturtevant 1965).

The significance of this perspective was initially elaborated by H. J. Muller, an American geneticist and a student of Morgan’s, who had done important work on several key aspects of the subject: the mapping of genes ( Muller 1920 ), the relation between genes and characteristics of organisms ( Muller 1922 ), and the nature of gene mutation ( Muller 1927 ; also see Carlson 1966 ). In his classic paper dealing with the effect of changes in individual genes on the variation of the organism, Muller (1922 ) published arguments that can be viewed as a theoretical summary of the essence of the classical period of genetics. On the basis of a considerable body of earlier work, he put forward an influential theory that genes are molecules with three essential capacities: autocatalysis (self-reproduction), heterocatalysis (production of nongenetic material or effects), and ability to mutate (while retaining the first two properties). In this view, genes were undoubted physical entities, three-dimensional ultramicroscopic ones, possessing individuated heritable structures, with some capacity for change that itself could be passed on.

In another visionary paper, Muller (1926) connected the concept of the gene to the theory of evolution, describing the gene as the basis of evolution and of the origin of life, indeed as the basis of life itself. These profound views of Muller strongly influenced the direction of much future research, not only in genetics, but in biology as a whole (Carlson 1966, p. 82).

The Neoclassical Period of Genetics

Whatever the speculations of Muller and a few others, the classical period of genetics was one in which the gene could be treated effectively as a dimensionless point on a chromosome. It was followed, however, by what we are calling the neoclassical period, in which the gene first acquired an unambiguous spatial dimension, namely length, and later a likewise linear chemical identity, in the form of the DNA molecule. This period of genetics involved two different, but complementary, research programs: on the one hand, it was demonstrated, using the classic genetic tool of recombinational mapping, that genes have an internal structure; on the other hand, the basic molecular nature of the gene and its function began to be revealed. These two streams fused in the late 1950s.

The neoclassical period began in the early 1940s, with work in formal genetics showing that genes could be dissected into contiguous segments by genetic recombination. Hence, they were not dimensionless points but entities with length. These observations were made first in D. melanogaster ( Oliver 1940 ; Lewis 1941 ; 1945 ; Green and Green 1949 ), and then in microbial fungi ( Bonner 1950 ; Giles 1952 ; Pontecorvo 1952 ; Pritchard 1955 ).

If genes had length, however, they must be long molecules of some sort, and the question was whether those molecules were proteins or DNA, the two major molecular constituents of the chromosomes. Critically important work in the early 1940s, in the laboratory of Oswald Avery at Rockefeller University, answered the question. Avery and his colleagues showed that DNA is the hereditary material by demonstrating that the causative agent in bacterial transformation, which entailed a heritable change in the morphology of the bacterial cells ( Griffith 1928 ), was DNA ( Avery et al. 1944 ). Though this work was published in 1944, it would take nearly a decade for this to become universally accepted. The experimental proof that convinced the scientific community was the experiment of Hershey and Chase (1952) , in which these authors showed that the DNA component of bacteriophages was the one responsible for their multiplication.

The most critical and final breakthrough for the DNA theory of inheritance, however, was the revelation of the double-helical structure of DNA (Watson and Crick 1953a, 1954), and the realization of its genetic implications (Watson and Crick 1953b). This was followed by demonstrations in the early 1960s that genes are first transcribed into messenger RNA (mRNA), which transmits the genetic information from the nucleus to the protein synthesis machinery in the cytoplasm (reviewed in Portin 1993; Judson 1996). Earlier work in the 1940s had established the connection between genes and proteins, in the “one gene-one enzyme” hypothesis of Beadle and Tatum (1941) (see also Srb and Horowitz 1944; reviewed in Strauss 2016). By the late 1950s, there was thus a satisfying molecular theory of both the nature of the gene, and the connections between genes and proteins.

Crucial further work involved the genetic fine structure mapping of genes, a research program that reached its culmination with work by S. Benzer and C. Yanofsky. Benzer, using the operational cis-trans test, originated by E. B. Lewis in Drosophila, defined the unit of genetic complementation, i.e., the basic unit of gene function, which he called the cistron (Box 1). He also defined the smallest units of genetic recombination and gene mutation: the recon and muton, respectively (Benzer 1955, 1959, 1961). The postulate of the classical period that the gene was a fundamental unit not only of function, but also of recombination and mutation, was definitively disproved by Benzer’s work showing that the “gene” had many mutons and recons. Yanofsky and his coworkers validated the material counterparts of these formal concepts of Benzer. The equivalent of the cistron is a sequence of nucleotide pairs in DNA that contains information for the synthesis of a polypeptide and determines its amino acid sequence, an idea known as the colinearity hypothesis. Furthermore, the physical DNA equivalent of the recon and the muton was shown to be one nucleotide pair (Crawford and Yanofsky 1958; Yanofsky and Crawford 1959; Yanofsky et al. 1964, 1967). The period of neoclassical genetics culminated in the cracking of the universal genetic code by several teams, revealing that nucleotide sequences specify the sequence of polypeptide chains (reviewed in Ycas 1969; Judson 1996).

The neoclassical concept of the gene, outlined above, can be summarized in the formulation “one gene—one mRNA—one polypeptide,” which combines the idea of mRNA, as developed by Jacob and Monod (1961a) ; Gros et al. (1961) ; Brenner et al. (1961) , and the earlier “one gene—one enzyme” hypothesis of Beadle and Tatum (1941) (and see Srb and Horowitz 1944 ). Another version of this hypothesis is that of “one cistron—one polypeptide” ( Crick 1963 ), which emerged as a slogan in the 1960s–1980s. Altogether, the conceptual journey from Johannsen’s totally abstract entities termed “genes” to a defined, molecular idea of what a gene is, and how it works, had taken a little over half a century.

The Breakdown of the Neoclassical Concept of the Gene and the Beginning of the Modern Period of Genetics

Deviations from the one gene—one mRNA—one polypeptide hypothesis

The hypothesis of “one gene—one mRNA—one polypeptide” as a general description of the gene and how it works started to expire, however, when it was realized that a single gene could produce more than one mRNA, and that one gene can be a part of several transcription units. This one-to-several relationship of genes to mRNAs occurs by means of complex promoters and/or alternative splicing of the primary transcript.

Multiple transcription initiation sites, i.e. , alternative promoters, have been found in all kingdoms of organisms, and they have been classified into six classes ( Schibler and Sierra 1987 ). All of them can produce transcripts that do not obey the rule of one-to-one correspondence between the gene and the transcription unit, since transcription can be initiated at different promoters. The result is that a single gene can produce more than one kind of transcript ( Schibler and Sierra 1987 ).

The discovery of alternative splicing as a way of producing different transcripts from one gene had a more complex history. In the late 1970s it was discovered, first in animal viruses and then in eukaryotes, that genes have a split structure. That is, genes are interrupted by introns (see review by Portin 1993 ). Split genes produce one pre-mRNA molecule, from which the introns are removed during the maturation of the mRNA by pre-mRNA splicing. Depending on the gene, the splicing pattern can be invariant (“constitutive”) or variable (“alternative”). In constitutive splicing, all the exons present in a transcript are incorporated into one mature mRNA through invariant ligation of consecutive exons, yielding a single kind of mRNA from the gene. In alternative splicing, nonconsecutive exons are joined by the processing of some, but not all, transcripts from a gene. In other words, individual exons can be excluded from the mature mRNA in some transcripts, but they can be included in others ( Leff et al. 1986 ; Black 2003 ). Alternative splicing is a regulated process, being tissue-specific and developmental-stage-specific. Nevertheless, the colinearity of the gene and the mRNA is preserved, since the order of the exons in the gene is not changed.
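The distinction between constitutive and alternative splicing, and the point that exon order is preserved in both, can be illustrated with a minimal sketch. The three-exon gene, its sequences, and the `splice` helper below are hypothetical and chosen only for illustration; real splicing involves splice-site recognition and regulation that this toy model ignores.

```python
def splice(exons, skipped=frozenset()):
    """Return (kept_indices, mature_mrna) for one transcript.

    Introns are assumed to be already absent from `exons`; the only choice
    modeled here is which exons are skipped. Exons are always joined in
    their genomic order, so skipping never reorders them.
    """
    kept = [i for i in range(len(exons)) if i not in skipped]
    return kept, "".join(exons[i] for i in kept)


def is_colinear(kept_indices):
    """True if the retained exons appear in the same order as in the gene."""
    return kept_indices == sorted(kept_indices)


# Hypothetical gene with three exons (made-up sequences).
EXONS = ["AUGGCU", "GAACCU", "UUGUAA"]

# Constitutive splicing: every exon is joined, giving a single kind of mRNA.
constitutive_kept, constitutive_mrna = splice(EXONS)

# Alternative splicing: the middle exon is excluded in some transcripts.
alternative_kept, alternative_mrna = splice(EXONS, skipped={1})

print(constitutive_mrna, is_colinear(constitutive_kept))
print(alternative_mrna, is_colinear(alternative_kept))
```

In both cases the retained exons keep their genomic order, which is the sense in which colinearity of gene and mRNA survives alternative splicing even though the one-to-one relation of gene to mRNA does not.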

In addition to alternative splicing, two other phenomena are now known that contradict a basic tenet of the neoclassical gene concept, namely that amino-acid sequences of proteins, and consequently their functions, are always derivable from the DNA of the corresponding gene. These are the phenomena of RNA editing (reviewed by Brennicke et al. 1999; Witzany 2011) and of gene sharing, originally found by J. Piatigorsky (reviewed in Piatigorsky 2007). The term RNA editing describes post-transcriptional molecular processes in which the structure of an RNA molecule is altered. Though a rare event, it has been observed to occur in eukaryotes, their viruses, archaea and prokaryotes, and involves several kinds of base modifications in RNA molecules. RNA editing in mRNAs effectively alters the amino acid sequence of the encoded protein so that it differs from that predicted by the genomic DNA sequence (Brennicke et al. 1999). The concept of gene sharing describes the fact that different cells contain identically sequenced polypeptides, derived from the same gene, but so differently configured in different cellular contexts that they perform entirely different functions. This phenomenon, facetiously called “protein moonlighting,” means that a gene may acquire and maintain a second function without gene duplication, and without loss of the primary function. Such genes are under two or more entirely different selective constraints (Piatigorsky and Wistow 1989).

Despite these observations, showing the potential one-to-many relationships of genes to mRNAs and their encoded proteins, the concept of the gene remained intact; the gene itself could still be seen as a defined and localized nucleotide sequence of DNA even though it could contain information for more than one kind of polypeptide chain. Matters changed, however, when the sequencing projects revealed still more bizarre phenomena.

Severe cracks in the concept of the gene

These new findings have shown that there are multiple possible relationships between DNA sequences and the molecular products they specify. The net result has been the realization that the basic concept of the gene as some form of generic, universal “unit of heredity” is too simple and, correspondingly, that a new definition or concept of “the gene” is needed (Keller 2000; Falk 2009; Portin 2009). Several observations have been crucial to this re-evaluation, and one of us has reviewed these relatively recently (Portin 2009). They are worth summarizing here:

  • 1. In eukaryotic organisms, there are few if any absolute boundaries to transcription, making it impossible to establish simple general relationships between primary transcripts and the ultimate products of those transcripts.

Hence, the structural boundaries of the gene as the unit of transcription are often far from clear, as documented particularly well in mammals (reviewed by Carninci 2006). In reality, whole chromosomes, if not the whole genome, seem to be continuums of transcription (Gingeras 2007). Furthermore, the genome is full of overlapping transcripts, thus making it impossible to draw a 1:1:1 relationship between specific DNA sequences, transcripts and functions (Pearson 2006). Indeed, convincing evidence indicates that the human genome is comprehensively transcribed from both DNA strands, so that the majority of its bases can be found in primary transcripts that extensively overlap one another (The FANTOM Consortium and RIKEN Genome Exploration Group 2005; The ENCODE Project Consortium 2007, 2012). Both protein coding and noncoding transcripts may be derived from either or both DNA strands, and they may be overlapping and interlaced. Furthermore, different transcripts often include the same coding sequences (Mattick 2005). The functional significance of these overlaps is still largely unclear, but there is an increasing number of examples in which both transcripts are known to have protein-coding exons from one position in the genome combined with exons from another part of the genome hundreds of thousands of nucleotides away (Kapranov et al. 2007). This was wholly unanticipated when the 1960s definition of the gene was formulated.

  • 2. Exons of different genes can be members of more than one transcript.

Gene fusion, at the level of transcripts, is a reality, and is completely at odds with the “one gene—one mRNA—one protein” hypothesis. Nor is this a rare phenomenon. It has been estimated that at least 4–5% of the tandem gene pairs in the human genome can be transcribed into a single RNA sequence, called a chimeric transcript, encoding a putative chimeric protein (Parra et al. 2006).

  • 3. Comparably, in the organelles of microbial eukaryotes, many examples of “encrypted” genes are known: genes are often in pieces that can be found as separate segments around the genome.

Hence, in addition to the fusion of two adjacent genes at the level of transcription, different building blocks of a given mRNA molecule can often be located, as modules, on different chromosomes (reviewed in Landweber 2007 ). Some evidence indicates that, even in multicellular eukaryotes, protein-coding transcripts are derived from different nonhomologous chromosomes (reviewed in Claverie 2005 ).

  • 4. In contradiction to the neoclassical definition of a gene, which posits that the hereditary information resides solely in DNA sequences, there is increasing evidence that the functional status of some genes can be inherited from one generation of individuals to the next, a phenomenon known as transgenerational epigenetic inheritance ( Holliday 1987 ; Gerhart and Kirschner 2007 ; Jablonka and Raz 2009 ).

One example is that of RNA-mediated epigenetic changes in the mouse that are inherited between generations in a non-Mendelian fashion (Rassoulzadegan et al. 2006). On the other hand, many epigenetic changes, or so-called epimutations, are otherwise inherited in a Mendelian fashion, except that, in contrast to conventional mutations, they are not always inherited with the same stability, but can be swept away over the course of some generations (e.g., Jablonka and Raz 2009).

  • 5. “Genetic restoration,” a mechanism of non-Mendelian inheritance of extragenomic information, first found in Arabidopsis thaliana, may also take place (Lolle et al. 2005).

It was observed that several independent mutant strains yielded apparently normal progeny at a high frequency of a few percent, which is higher than would be expected from random mutation. Nor does this seem to be a matter of epigenetic changes; rather, it appears to involve the healing of fixed mutations. Lolle et al. (2005) suggested that this is due to precise reversion of the original DNA by a mechanism that involves template-directed restoration of ancestral DNA passed on in an RNA cache. This proposal, called the “RNA cache” hypothesis, means that organisms can sometimes rewrite their DNA on the basis of RNA messages inherited from generations past (Lolle et al. 2005). The RNA cache hypothesis has, however, been disputed by several authors (Comai and Cartwright 2005; Mercier et al. 2008; Miyagawa et al. 2013).

  • 6. Finally, in addition to protein coding genes, there are many RNA-encoding genes that produce diverse RNA molecules that are not translated to proteins.

That there are special genes that specify only RNA products was recognized in the early 1960s; these are the ribosomal RNA and tRNA genes, vital for protein synthesis. Yet, it is now apparent that there are many transcripts that do not encode proteins, and that are not the classic structural RNAs of protein synthesis (tRNAs and rRNAs). Those sequences that specify long noncoding RNAs (lncRNAs), and which serve some biological function, surely deserve to be called genes. In contrast, sequences specifying lncRNAs or transcripts from defunct mobile elements, which are made constitutively in all or most cell types, probably do not have biological function and should not be designated as genes. The surprisingly large multitude of different noncoding RNA genes and their functions has been reviewed by several authors (e.g., Eddy 2001; Carninci and Hayashizaki 2007; Carninci et al. 2008).

Current Status and Future Perspectives Regarding the Concept of the Gene

The observations summarized above, together with many others, have created the interesting situation that the central term of genetics— “the gene”—can no longer be defined in simple terms. The neoclassical molecular definition of the gene does not capture the bewildering variety of hereditary elements, all based in DNA, that collectively specify the organism, and which therefore deserve the appellation of “genes.” Even the classical notion of the gene simply as a fundamental “unit of heredity” is itself problematic. After all, if it is difficult or impossible to generalize about the nature of such “units,” it is probably not very helpful to speak about them. Unsurprisingly, this realization has called forth various attempts to redefine the gene, in terms of both DNA sequence properties, and those of the products specified by those sequences. A number of proposed definitions are listed in Table 1 . A detailed discussion of these ideas will not be given here, but they have been summarized, classified, and characterized (see Waters 2013 ). These definitions, however, all tend to neglect one central, albeit implicit, aspect of the earlier notions of the “gene”: its presumed autonomy of action. We return to this matter below.

How should geneticists deal with this situation? Should we simply invoke a plurality of different kinds of genes and leave it at that? In effect, we could settle for using the collective term “the genes” as a synonym for the genome, and not fuss over the seeming impossibility of defining the singular form, the “gene.” This, however, would seem to be more of an evasion of the problem than its solution. Alternatively, would it be preferable to accept the inadequacy of the notion of a simple general “unit of heredity,” and foreswear the use of the term “gene” altogether?

The problem with that last suggestion, junking the term “gene,” is not just that the word is used ubiquitously by geneticists and laymen alike, but that it seems indispensable to the discipline’s discourse. This is apparent in the foundations of several subdisciplines of genetics, such as many fields of applied genetics, like medical genetics and plant and animal breeding, that frequently deal in genes identified solely by their nonmolecular mutant phenotypes. It also applies to quantitative genetics and population genetics, which operate using mathematical modeling, and in which the gene is often regarded merely as an abstract unit of calculation (not dissimilarly to the view of Johannsen described below), but one that is vital to conceptualizing the genetic compositions of populations and their changes. In those fields, the molecular intricacies and complications of the genetic material can be largely ignored, at least initially, but the term “gene” itself seems irreplaceable. It is hard to imagine those disciplines abandoning it, whatever the range of molecular complexities that the word both hides and embraces.

In other subdisciplines, such as developmental genetics and molecular genetics, however, there is an urgent need to redefine the gene because the molecular details are often crucial to understanding the phenomena being investigated. The definitions that have been attempted so far ( Table 1 ), however, seem inadequate; for the most part, they focus on either structural or functional aspects, yet it is ultimately meaningless to separate structure and function, even though both can initially be studied in isolation from one another. One attempt to unite the structural and functional aspects of the gene in a single definition has been made by P. E. Griffiths and E. M. Neumann-Held, who introduced the “molecular process” gene concept. In this idea, the word “gene” denotes not some structural “unit of heredity” but the recurring process that leads to the temporally and spatially regulated expression of a particular polypeptide product ( Griffiths and Neumann-Held 1999 ; Neumann-Held 1999 , 2001 ). One difficulty with this redefinition is that it neglects all the nonconventional genes that specify only RNA products. More fundamentally, it has nothing to say about hereditary transmission, which was the original and fundamental impetus for coining the term “gene.”

Perhaps the way forward is to take a step backward in history, and focus on the initial concerns of Johannsen. He not only coined the term “gene,” but was also responsible for the words “genotype” and “phenotype,” and the crucial distinction between them in heredity. Though he could say nothing about how genes (genotype) specified or determined traits (phenotype), he clearly saw this as a crucial question. Indeed, that issue has been at the heart of genetics since the 1930s, in contrast to the questions about how genes are transmitted in heredity, which dominated the first decades of 20th-century genetics. It is apparent, however, that Johannsen thought that the genotype is primary, and that genes are minute computational devices whose precise material nature could be left for solution to a later time. He wrote: “Our formulas, as used here for not directly observable genotypic factors—genes as we used to say—are and remain computational-formulas , placement-devices that should facilitate our overview. It is precisely therefore that the little word “gene” is in place; no imagination of the nature of this “construction” is prejudiced by it, rather the different possibilities remain open from case to case.” ( Johannsen 1926 p. 434, English translation in Falk 2009 p. 70).

The initial expectations were that the connections between genes and phenes would be fairly direct, an expectation bolstered initially by findings about pigmentation genetics, and later by mutations affecting nutritional requirements in microbial cells. In both situations, the connections between the mutant effects and the known biochemistry were often direct and easy to understand. Furthermore, the early success of Mendelian genetics had been based, in large part, on the fact that many of the genetic variants initially studied had constant, unambiguous effects; this was vital to the work of Mendel and to the early 20th-century Mendelians. As the field matured, however, it became apparent that the phenotypic effects of many alleles could be influenced by other genes, influencing both the degree of severity of a mutation’s expression (its “expressivity”), and the proportion of individuals possessing the mutation that expressed it at all (its “penetrance”).

To illustrate the differences in the manifestation of a given gene’s function caused by genetic background effects, take the various degrees of expression of the gene regulating the size and shape of incisors in man. Copies of one dominant gene, identical by descent, caused missing, or peg-shaped, or strongly mesio-distally reduced upper lateral incisors in subsequent generations ( Alvesalo and Portin 1969 ). Though the precise nature of the gene involved is not known, the example shows that the same gene can have different manifestations in different individuals, i.e. , in different genetic backgrounds. There is an enormous number of documented examples of such genetic background effects in all organisms that have been investigated genetically.

The phenomenon of genetic background effects was already well recognized by geneticists in the second decade of the 20th century, as illustrated, for example, in the multi-part series of papers dealing with coat color inheritance in mammals by S. Wright, published in Journal of Heredity (Wright 1917a, b). (Wright would later achieve eminence as one of the key founders of population genetics, but he started his career in what was then known as “physiological genetics.”) The whole matter, however, was raised to a new conceptual level in the 1930s, by C. H. Waddington, a British developmental biologist and geneticist, who called the totality of interactions among genes and between genes and the environment “the epigenotype.”

The epigenotype consists of the total developmental system lying between the genotype and the phenotype through which the adult form of an organism is realized ( Waddington 1939 ). Although a clear concept of “gene regulation” did not exist in the 1940s and 1950s, Waddington, with this concept, was clearly edging toward it. When the Jacob-Monod model of gene regulation came forth in the early 1960s, Waddington promptly saw its relevance for development ( Waddington 1962 ; 1966 ) as, of course, did Jacob and Monod themselves ( Jacob and Monod 1961a , b ; Monod and Jacob 1961 ). The crucial point, with respect to the definition of the gene, is that genes are not autonomous, independent agents—as was implicit in much of the early treatment of genes, and which indeed remains potent in much contemporary thinking, as exemplified in R. Dawkins’ still influential book, “The Selfish Gene” ( Dawkins 1976 ). Rather, they exert their effects within, or as the output of, complex systems of gene interactions. Today, we term such systems “genetic networks” or “genetic regulatory networks” (GRNs). Sewall Wright, along with Waddington, was an early exponent of such network thinking ( Wright 1968 ), but the modern concept of GRNs reached its fruition only in the late 1990s (reviewed in Davidson 2001 ; Wilkins 2002 ; Davidson and Erwin 2006 ; Wilkins 2007 ).

The conceptual consequences of viewing individual genes not as autonomous actors but as interactive elements or outputs of networks are profound. For one thing, it becomes relatively easy to think about the nature of genetic background effects in terms of the structure of GRNs ( Box 2 ). While much of the thinking of the 20th century about genes was based on the premise that the route from gene to phenotype was fairly direct, and often deducible from the nature of the gene product, the network perspective envisages far more complexity and indirectness of effects. In general, the path from particular genes to specific phenes is long, and the role of many gene products seems to be the activation or repression of the activities of other genes. As a result, for most of these interactive effects, the normal (wild-type) function of the gene can only rarely be deduced directly from the mutant phenotype, which often involves complicated secondary effects resulting from the disrupted operation of the GRN within which the gene acts. Hence, the widely held popular belief that particular genes govern or “determine” particular traits, including complex psychological ones ( e.g. , risk-taking, gender identity, autism), as inferred from studies of genetic variants, is a gross oversimplification, hence distortion, of a complex reality.

In effect, genes do not have independent “agency”; for the most part they are simply cogs in the complex machinery of GRNs, and interpreting their mutant phenotypes is often difficult. In contrast, the genes for which there is an obvious connection between the mutant form and an altered phenotype are usually ultimate outputs of GRNs, such as pigmentation genes, hemoglobins, and enzymes of intermediary metabolism. These genes, however, also lack true autonomy, being activated in response to the operation of GRNs. Therefore, to fully understand how a gene functions, one must comprehend the larger system in which it operates. Genetics, in this sense, is becoming systems biology, a point that has also been made by others (see, for example, Keller 2005). In effect, since genes can only be defined with respect to their products, and those products are governed by GRNs, the particular cellular and regulatory (GRN) contexts involved may be considered additional “dimensions” vital to specifying a gene’s function and identity. The examples of “gene sharing,” in which the function of the gene is wholly a function of its cellular context, illustrate this in a particularly vivid way. The “gene,” however it comes to be defined, can therefore be seen not as a three-dimensional entity but as a multi-dimensional one.

Putting it all Together: Toward a New Definition of the “Gene”

Where do all these considerations leave us? It took approximately half a century to go from Johannsen’s wholly abstract formulation of the term “gene” as a “unit of heredity,” to reach the early 1960s concept of the gene as a continuous segment of DNA sequence specifying a polypeptide chain. A further half century’s worth of experimental investigation has brought us to the realization that the 1960s definition is no longer adequate as a general one. Yet the term “gene” persists as a vaguely understood generic description. It is, to say the least, an anomalous situation that the central term of genetics should now be shrouded in confusion and ambiguity. That is not only intellectually unsatisfactory for the discipline, but has detrimental effects on the popular understanding of genetics. Such misunderstanding is seen most starkly in the situation noted earlier, the commonly held view that there are individual genes responsible “for” certain complex conditions, e.g. , schizophrenia, alcoholism, etc. A clearer definition of the term would thus help both the field of genetics, and, ultimately, public understanding.

Here, therefore, we will propose a definition that we believe comes closer to doing justice to the idea of the “gene,” in light of current knowledge. It makes no reference to “the unit of heredity”—the long-standing sense of the term—because we feel that it is now clear that no such generic universal unit exists. By referring to DNA sequences, however, our definition embodies the hereditary dimension of genes (in a way that pure “process”-centered definitions focused on gene expression do not). Furthermore, in its emphasis on the ultimate molecular products and reference to GRNs as both evokers and mediators of the actions of those products, it recognizes the long causal chains that often operate between genes and their effects. Our provisional definition is this:

A gene is a DNA sequence (whose component segments do not necessarily need to be physically contiguous) that specifies one or more sequence-related RNAs/proteins that are both evoked by GRNs and participate as elements in GRNs, often with indirect effects, or as outputs of GRNs, the latter yielding more direct phenotypic effects.

This is an explicitly “molecular” definition, but we think that is what is needed now. In contrast, “genes” that are identified purely by their phenotypic effects, as for example in genome-wide association study (GWAS) experiments, would, in our view, not deserve such a characterization until found to specify one or more RNAs/proteins. The genetic effects picked up in such work often identify purely regulatory elements, and these should not qualify as genes, only as part of genes. Our definition, like the classic 1960s’ formulation, makes identifying the product(s) crucial to delimiting, hence identifying, the genes themselves. It, however, also emphasizes the molecular and cellular context in which those products form and function. Those larger contexts, in effect, become necessary to define the function of the specifying gene(s).

The new definition, however, is slightly cumbersome. We therefore offer it only as a tentative solution, hence as a challenge to the field to find a better formulation but one that does justice to the complex realities of the genetic material uncovered in the past half-century.

Box 1. The cis-trans test

Of fundamental importance in the operational definition of the gene is the cis-trans test (Lewis 1951; Benzer 1957). To test whether mutations a and b belong to the same gene or cistron (Benzer 1957), or different cistrons, the cis-heterozygote a b/+ + and the trans-heterozygote a +/+ b are compared. If the cis-heterozygotes and the trans-heterozygotes are phenotypically similar (usually wild type), the mutations are said to “complement” one another, and are inferred to fall into different cistrons. If, however, the cis-heterozygotes and the trans-heterozygotes are phenotypically different, the trans-heterozygote being (usually) mutant, and the cis-heterozygote (usually) of wild type, the mutations do not complement, and are inferred to belong to the same cistron. The attached figure clarifies the idea.

The principle of the cis-trans test. If mutations a and b belong to the same cistron, the phenotypes of the cis- and trans-heterozygotes are different. If, however, the cis- and trans-heterozygotes are phenotypically similar, the mutations a and b belong to different cistrons. The notation “works” on the Figure means that the cistron is able to produce a functional polypeptide. Mutations a and b are recessive mutations that both affect the same phenotypic trait, such as the eye color of D. melanogaster, for example.
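The logic of the test can also be written out as a short sketch. It assumes, as in the figure, that both mutations are recessive and that a cistron “works” on a chromosome only if it carries no mutation there; the function names and the cistron labels `X` and `Y` are hypothetical and serve only to illustrate the inference rule.

```python
def phenotype(chrom1, chrom2, cistron_of):
    """Phenotype of a diploid carrying the given mutations on each homolog.

    A cistron "works" on a homolog if none of that homolog's mutations fall
    in it. With recessive mutations, the organism is wild type when every
    cistron works on at least one homolog.
    """
    for cistron in set(cistron_of.values()):
        works_on_1 = all(cistron_of[m] != cistron for m in chrom1)
        works_on_2 = all(cistron_of[m] != cistron for m in chrom2)
        if not (works_on_1 or works_on_2):
            return "mutant"
    return "wild type"


def cis_trans_test(a, b, cistron_of):
    """Compare the cis (a b / + +) and trans (a + / + b) heterozygotes."""
    cis = phenotype({a, b}, set(), cistron_of)
    trans = phenotype({a}, {b}, cistron_of)
    # Different phenotypes: no complementation, hence the same cistron.
    verdict = "same cistron" if cis != trans else "different cistrons"
    return cis, trans, verdict


# Hypothetical assignments of mutations to cistrons.
print(cis_trans_test("a", "b", {"a": "X", "b": "X"}))
print(cis_trans_test("a", "b", {"a": "X", "b": "Y"}))
```

In practice the experimenter observes only the two phenotypes and infers the cistron assignment; the sketch runs the reasoning in the forward direction to show why the two outcomes are diagnostic.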

Box 2. Interpreting “genetic background” effects in terms of GRNs

Genetic background effects typically exhibit either of two forms when a pre-existing mutation, with an associated phenotypic manifestation, is crossed into a different strain: the reduction (“suppression”) of the mutant phenotype or its increase (“enhancement”). The effects involve changes in the degree of the mutant effect (its “expressivity”), in the number of individuals affected (its “penetrance”), or both. When analyzed genetically, these effects could often be traced to specific “suppressor” or “enhancer” loci, which could be either tightly linked to, or distant in the genome from, the original mutant locus. Typically regarded as an unnecessary complication in analysis of the original mutation, they were usually not pursued further. Yet, in terms of current understanding of GRNs, they are not, in principle, mysterious. Each gene that is part of a GRN can be thought of as transmitting a signal for either the activation or the repression of one or more other “downstream” genes in that network. Given the hierarchical nature of GRNs, it follows that the effect of a mutational alteration in a specific gene in the network can be either strengthened or reduced by other mutational changes in the network, either upstream or downstream of the original mutation. The particular effect achieved will depend on the characteristics of each of the two mutations involved (whether they are loss-of-function or gain-of-function mutations) and the precise nature of their connectivity. Such effects are most readily illustrated with linear sequences of gene actions, genetic pathways (Wilkins 2007), but can be understood in networks when the network structure and the placement of the two genes within it are known. Some genetic background effects, in principle, however, might involve partially redundant networks, in which the effects of the two pathways are additive. In those cases, a mutant effect in one pathway may be either compensated, hence suppressed, or exacerbated, by a second mutation in the other pathway, the precise effects again depending upon the specific characteristics of the mutations and the degree of redundancy between the two GRNs.
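The additive, partially redundant case lends itself to a small numerical sketch. The model below is an assumption made purely for illustration: each pathway contributes an arbitrary activity of 1.0 in the wild type, the phenotype-relevant output is their sum, and a second mutation is classed as a suppressor or an enhancer according to whether it moves the output toward or away from the wild-type value. The activity values and function names are hypothetical.

```python
def network_output(pathway1=1.0, pathway2=1.0):
    """Additive model: output is the sum of the two pathway activities."""
    return pathway1 + pathway2


def classify_modifier(single_mutant, double_mutant, wild_type):
    """Classify a second mutation by its effect on the original mutant phenotype."""
    single_deviation = abs(wild_type - single_mutant)
    double_deviation = abs(wild_type - double_mutant)
    if double_deviation < single_deviation:
        return "suppressor"
    if double_deviation > single_deviation:
        return "enhancer"
    return "no effect"


wild_type = network_output()

# Original mutation: partial loss of function in pathway 1.
single = network_output(pathway1=0.25)

# Second mutation in pathway 2: gain of function compensates for the loss.
with_gain = network_output(pathway1=0.25, pathway2=1.5)

# Second mutation in pathway 2: loss of function exacerbates the defect.
with_loss = network_output(pathway1=0.25, pathway2=0.5)

print(classify_modifier(single, with_gain, wild_type))
print(classify_modifier(single, with_loss, wild_type))
```

The same second locus can thus act as a suppressor or an enhancer depending on whether its allele is a gain-of-function or a loss-of-function mutation, which is the dependence on mutation type described above.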

Acknowledgments

We thank Mark Johnston and Richard Burian for many helpful suggestions, both editorial and substantive, on previous drafts. A.W. would also like to acknowledge earlier conversations with Jean Deutsch on the subject of this article; we disagreed on much, but the process was stimulating and helpful. P.P. wants to thank his friends Marja Vieno, M.Sc., for linguistic aid at the very first stages of this project; Harri Savilahti, Ph.D., for a fruitful discussion; and Docent Mikko Frilander, Ph.D., for consultation. The authors declare no conflict of interest.

Communicating editor: M. Johnston

Literature Cited

  • Alvesalo L., Portin P., 1969.   The inheritance pattern of missing, peg-shaped, and strongly mesio-distally reduced upper lateral incisors. Acta Odontol. Scand. 27 : 563–575. [ PubMed ] [ Google Scholar ]
  • Avery O. T., MacLeod C. M., McCarty M., 1944.   Studies on the chemical nature of the substance inducing transformation of Pneumococcal types. Induction of transformation by a deoxyribonucleic acid fraction isolated from Pneumococcus type III. J. Exp. Med. 79 : 137–159. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Bateson W., Saunders E. R., Punnett R. C., 1905a  Experimental studies in the physiology of heredity. Reports to the Evolution Committee of the Royal Society, Report II. pp. 4–99 1–55.
  • Bateson W., Saunders E. R., Punnett R. C., 1905b  Experimental studies in the physiology of heredity. Reports to the Evolution Committee of the Royal Society, Report II. pp. 80–99.
  • Beadle G. W., Tatum E. L., 1941.   Genetic control of biochemical reactions in Neurospora . Proc. Natl. Acad. Sci. USA 27 : 499–506. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Benzer S., 1955.   Fine structure of a genetic region in bacteriophage. Proc. Natl. Acad. Sci. USA 41 : 344–354. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Benzer S., 1957.   The elementary units of heredity , pp. 70–93 in The Chemical Basis of Heredity , edited by McElroy W. D., Glass B. Johns Hopkins Press, Baltimore. [ Google Scholar ]
  • Benzer S., 1959.   On the topology of the genetic fine structure. Proc. Natl. Acad. Sci. USA 45 : 1607–1620. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Benzer S., 1961.   On the topography of the genetic fine structure. Proc. Natl. Acad. Sci. USA 47 : 403–415. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Black D. L., 2003.   Mechanisms of alternative pre-messenger RNA splicing. Annu. Rev. Biochem. 72 : 291–336. [ PubMed ] [ Google Scholar ]
  • Bonner D. M., 1950.   The Q locus of Neurospora. Genetics 35 : 655–656. [ Google Scholar ]
  • Boveri T., 1902.   Über mehrpolige Mitosen als Mittel zur Analyse des Zellkerns. Verh. phys-med. Ges. Würzb. 35 : 60–90. [ Google Scholar ]
  • Boveri T., 1903.   Über die Konstitution der chromatischen Kernsubstanz. Verh. deutsch. zool. Ges. Würzb. 13 : 10–33. [ Google Scholar ]
  • Brenner S., Jacob F., Meselson M., 1961.   An unstable intermediate carrying information from genes to ribosomes for protein synthesis. Nature 190 : 576–581. [ PubMed ] [ Google Scholar ]
  • Brennicke A., Marchfelder A., Binder S., 1999.   RNA editing. FEMS Microbiol. Rev. 23 : 297–316. [ PubMed ] [ Google Scholar ]
  • Bridges C. B., 1916.   Non-disjunction as proof of the chromosome theory of heredity. Genetics 1 : 1–52, 107–163. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Bridges C. B., 1935.   Salivary chromosome maps with a key to the banding of the chromosomes of Drosophila melanogaster . J. Hered. 26 : 60–64. [ Google Scholar ]
  • Bridges C. B., 1938.   A revised map of the salivary gland X-chromosome of Drosophila melanogaster . J. Hered. 29 : 11–13. [ Google Scholar ]
  • Burian R. M., 2004.   Molecular epigenesis, molecular pleiotropy, and molecular gene definitions. Hist. Philos. Life Sci. 26 : 59–80. [ PubMed ] [ Google Scholar ]
  • Carlson E. A., 1966.   The Gene: A Critical History . W. B. Saunders Company, Philadelphia. [ Google Scholar ]
  • Carninci P., 2006.   Tagging the mammalian transcription complexity. Trends Genet. 22 : 501–510. [ PubMed ] [ Google Scholar ]
  • Carninci P., Hayashizaki Y., 2007.   Noncoding RNA transcription beyond annotated genes. Curr. Opin. Genet. Dev. 17 : 139–144. [ PubMed ] [ Google Scholar ]
  • Carninci P., Yasuda J., Hayashizaki Y., 2008.   Multifaceted mammalian transcriptome. Curr. Opin. Cell Biol. 20 : 274–280. [ PubMed ] [ Google Scholar ]
  • Claverie J.-M., 2005.   Fewer genes, more noncoding RNA. Science 309 : 1529–1530. [ PubMed ] [ Google Scholar ]
  • Comai L., Cartwright R. A., 2005.   A toxic mutator and selection alternative to the non-Mendelian RNA cache hypothesis for hothead reversions. Plant Cell 17 : 2856–2858. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Correns C. G., 1900.   Mendels Regel über das Verhalten der Nachkommenschaft der Rassenbastarde. Ber. Deut. Bot. Ges. 18 : 158–168. [ Google Scholar ]
  • Crawford I. P., Yanofsky C., 1958.   On the separation of the tryptophan synthetase of Escherichia coli into two protein components. Proc. Natl. Acad. Sci. USA 44 : 1161–1170. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Crick F. H. C., 1963.   On the genetic code. Science 139 : 461–464. [ PubMed ] [ Google Scholar ]
  • Davidson E. H., 2001.   Genomic Regulatory Systems: Development and Evolution . Academic Press, San Diego. [ Google Scholar ]
  • Davidson E. H., Erwin D. H., 2006.   Gene regulatory networks and the evolution of animal body plans. Science 311 : 796–800. [ PubMed ] [ Google Scholar ]
  • Dawkins R., 1976.   The Selfish Gene . Oxford University Press, Oxford. [ Google Scholar ]
  • de Vries H., 1900.   Sur la loi de disjonction des hybrides. CR. Acad. Sci. Paris. 130 : 845–847. [ Google Scholar ]
  • Dobzhansky Th., 1929.   Genetical and cytological proof of translocations involving the third and fourth chromosome in Drosophila melanogaster . Biol. Zentralbl. 49 : 408–419. [ Google Scholar ]
  • Eddy S. R., 2001.   Non-coding RNA genes and the modern RNA world. Nat. Rev. Genet. 2 : 919–929. [ PubMed ] [ Google Scholar ]
  • Edwards A. W. F., 2013.   Robert Heath Lock and his textbook of genetics, 1906. Genetics 194 : 529–537. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Falk R., 2009.   Genetic Analysis. A History of Genetic Thinking . Cambridge University Press, Cambridge. [ Google Scholar ]
  • Gerhart J., Kirschner M., 2007.   The theory of facilitated variation. Proc. Natl. Acad. Sci. USA 104 ( Suppl. 1 ): 8582–8589. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Giles N. H., 1952.   Studies on the mechanism of reversion in biochemical mutants of Neurospora crassa. Cold Spring Harb. Symp. Quant. Biol. 16 : 283–313. [ PubMed ] [ Google Scholar ]
  • Gingeras T. R., 2007.   Origin of phenotypes: genes and transcripts. Genome Res. 17 : 682–690. [ PubMed ] [ Google Scholar ]
  • Green M. M., Green K. C., 1949.   Crossing over between alleles of the lozenge locus in Drosophila melanogaster . Proc. Natl. Acad. Sci. USA 35 : 586–591. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Griffith F., 1928.   The significance of pneumococcal types. J. Hyg. (Lond.) 27 : 113–159. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Griffiths P. E., Neumann-Held E. M., 1999.   The many faces of the gene. Bioscience 49 : 656–662. [ Google Scholar ]
  • Griffiths P. E., Stotz K., 2006.   Genes in the postgenomic era. Theor. Med. Bioeth. 27 : 499–521. [ PubMed ] [ Google Scholar ]
  • Gros F., Gilbert W., Hiatt H. H., Attardi G., Spahr P. F., et al., 1961.   Molecular and biological characterization of messenger RNA. Cold Spring Harb. Symp. Quant. Biol. 26 : 111–132. [ PubMed ] [ Google Scholar ]
  • Hershey A. D., Chase M., 1952.   Independent functions of viral protein and nucleic acid in growth of bacteriophage. J. Gen. Physiol. 36 : 39–56. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Holliday R., 1987.   The inheritance of epigenetic defects. Science 238 : 163–170. [ PubMed ] [ Google Scholar ]
  • Jablonka E., Raz G., 2009.   Transgenerational epigenetic inheritance: prevalence, mechanisms, and implications for the study of heredity and evolution. Q. Rev. Biol. 84 : 131–176. [ PubMed ] [ Google Scholar ]
  • Jacob F., Monod J., 1961a  Genetic regulatory mechanisms in the synthesis of proteins. J. Mol. Biol. 3 : 318–356. [ PubMed ] [ Google Scholar ]
  • Jacob F., Monod J., 1961b  On the regulation of gene activity. Cold Spring Harb. Symp. Quant. Biol. 26 : 193–211. [ PubMed ] [ Google Scholar ]
  • Janssens F. A., 1909.   La théorie de la chiasmatypie, nouvelle interpretation des cinéses de maturation. Cellule 25 : 387–406. [ Google Scholar ]
  • Johannsen W., 1909.   Elemente der exakten Erblichkeitslehre . Gustav Fischer, Jena. [ Google Scholar ]
  • Johannsen W., 1926.   Elemente der exakten Erblichkeitslehre , Ed. 3. Gustav Fischer, Jena. [ Google Scholar ]
  • Judson H. F., 1996.   The Eighth Day of Creation: Makers of the Revolution in Biology. Expanded Edition . Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York. [ Google Scholar ]
  • Kapranov P., Willingham A. T., Gingeras T. R., 2007.   Genome-wide transcription and the implications for genomic organization. Nat. Rev. Genet. 8 : 413–423. [ PubMed ] [ Google Scholar ]
  • Keller E. F., 2000.   The Century of the Gene . Harvard University Press, Cambridge, MA. [ Google Scholar ]
  • Keller E. F., 2005.   The century of the gene. J. Biosci. 30 : 3–10. [ PubMed ] [ Google Scholar ]
  • Keller E. F., Harel D., 2007.   Beyond the gene. PLoS One 2 ( 11 ): e1231. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Koszul R., Meselson M., Van Doninck K., Vandenhaute J., Zickler D., 2012.   The centenary of Janssens’s chiasmatype theory. Genetics 191 : 309–317. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Landweber L. F., 2007.   Why genomes in pieces? Science 318 : 406–407. [ PubMed ] [ Google Scholar ]
  • Leff S. E., Rosenfeld M. G., Evans R. M., 1986.   Complex transcriptional units: diversity in gene expression by alternative RNA processing. Annu. Rev. Biochem. 55 : 1091–1117. [ PubMed ] [ Google Scholar ]
  • Lewis E. B., 1941.   Another case of unequal crossing over in Drosophila melanogaster . Proc. Natl. Acad. Sci. USA 27 : 31–34. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Lewis E. B., 1945.   The relation of repeats to position effect in Drosophila melanogaster . Genetics 30 : 137–166. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Lewis E. B., 1951.   Pseudoallelism and gene evolution. Cold Spring Harb. Symp. Quant. Biol. 16 : 159–174. [ PubMed ] [ Google Scholar ]
  • Lock R. H., 1906.   Recent Progress in the Study of Variation , Heredity and Evolution . Murray, London. [ Google Scholar ]
  • Lolle S. J., Victor J. L., Young J. M., Pruitt R. E., 2005.   Genome-wide non-Mendelian inheritance of extra-genomic information in Arabidopsis . Nature 434 : 505–509. [ PubMed ] [ Google Scholar ]
  • Mattick J. S., 2005.   The functional genomics of noncoding RNA. Science 309 : 1527–1528. [ PubMed ] [ Google Scholar ]
  • McClung C. E., 1927.   The chiasmatype theory of Janssens. Q. Rev. Biol. 2 : 344–366. [ Google Scholar ]
  • Mendel G., 1866.   Versuche über Pflanzen-Hybriden. Verh. naturf. Ver. Brünn 4 : 3–47. [ Google Scholar ]
  • Mercier R., Jolivet S., Vignard J., Durand S., Drouaud J., et al., 2008.   Outcrossing as an explanation of the apparent unconventional genetic behavior of Arabidopsis thaliana hth mutants. Genetics 180 : 2295–2297. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Miyagawa Y., Ogawa J., Iwata Y., Koizumi N., Mishiba K.-I., 2013.   An attempt to detect siRNA-mediated genomic DNA modification by artificially induced mismatch siRNA in Arabidopsis . PLoS One 8 ( 11 ): e81326. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Monod J., Jacob F., 1961.   General conclusions: teleonomic mechanisms in cellular metabolism, growth and differentiation. Cold Spring Harb. Symp. Quant. Biol. 26 : 389–401. [ PubMed ] [ Google Scholar ]
  • Morgan T. H., 1910.   Chromosomes and heredity. Am. Nat. 44 : 449–496. [ Google Scholar ]
  • Morgan T. H., 1917.   The theory of the gene. Am. Nat. 51 : 513–544. [ Google Scholar ]
  • Morgan T. H., 1919.   The Physical Basis of Heredity . Yale University Press, New Haven. [ Google Scholar ]
  • Morgan T. H., 1926.   The Theory of the Gene . Yale University Press, New Haven. [ Google Scholar ]
  • Morgan T. H., Sturtevant A. H., Muller H. J., Bridges C. B., 1915.   The Mechanism of Mendelian Heredity . Henry Holt, New York. [ Google Scholar ]
  • Moss L., 2003.   What Genes Can’t Do . MIT Press, Cambridge, MA. [ Google Scholar ]
  • Muller H. J., 1920.   Are the factors of heredity arranged in a line? Am. Nat. 54 : 97–121. [ Google Scholar ]
  • Muller H. J., 1922.   Variation due to change in the individual gene. Am. Nat. 56 : 32–50. [ Google Scholar ]
  • Muller H. J., 1926.   The gene as the basis of life. Proc. Internat. Cong. Plant Sci. 1 : 897–921. [ Google Scholar ]
  • Muller H. J., 1927.   Artificial transmutation of the gene. Science 66 : 84–87. [ PubMed ] [ Google Scholar ]
  • Muller H. J., Painter T. S., 1929.   The cytological expression of changes in gene alignment produced by X-rays in Drosophila . Am. Nat. 63 : 193–200. [ Google Scholar ]
  • Neumann-Held E. M., 1999.   The gene is dead – Long live the gene: Conceptualizing genes the constructionist way , pp. 105–137 in Sociobiology and Bioeconomics. The Theory of Evolution in Biological and Economic Theory , edited by Koslowski P. Springer-Verlag, Berlin. [ Google Scholar ]
  • Neumann-Held E. M., 2001.   Let’s talk about genes: The process molecular gene concept and its context , pp. 69–73 in Cycles of Contingency , edited by Oyama S., Griffiths P. E., Gray R. D. Bradford, MIT Press, Cambridge, MA. [ Google Scholar ]
  • Oliver P., 1940.   A reversion to wild type associated with crossing over in Drosophila melanogaster . Proc. Natl. Acad. Sci. USA 26 : 452–454. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Painter T. S., 1934.   A new method for the study of chromosome aberrations and the plotting of chromosome maps in Drosophila melanogaster . Genetics 19 : 175–188. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Painter T. S., Muller H. J., 1929.   Parallel cytology and genetics of induced translocations and deletions in Drosophila. J. Hered. 20 : 287–298. [ Google Scholar ]
  • Parra G., Reymond A., Dabbouseh N., Dermitzakis E. T., Castelo R., et al., 2006.   Tandem chimerism as a means to increase protein complexity in the human genome. Genome Res. 16 : 37–44. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Pearson H., 2006.   What is a gene? Nature 441 : 399–401. [ PubMed ] [ Google Scholar ]
  • Pesole G., 2008.   What is a gene? An updated operational definition. Gene 417 : 1–4. [ PubMed ] [ Google Scholar ]
  • Piatigorsky J., 2007.   Gene Sharing and Evolution: The Diversity of Protein Functions . Harvard University Press, Cambridge, MA. [ Google Scholar ]
  • Piatigorsky J., Wistow G. J., 1989.   Enzyme/crystallins: gene sharing as an evolutionary strategy. Cell 57 : 197–199. [ PubMed ] [ Google Scholar ]
  • Pontecorvo G., 1952.   The genetic formulation of gene structure and action. Adv. Enzym. 13 : 121–149. [ PubMed ] [ Google Scholar ]
  • Portin P., 1993.   The concept of the gene: short history and present status. Q. Rev. Biol. 68 : 173–223. [ PubMed ] [ Google Scholar ]
  • Portin P., 2009.   The elusive concept of the gene. Hereditas 146 : 112–117. [ PubMed ] [ Google Scholar ]
  • Portin P., 2015.   The development of genetics in the light of Thomas Kuhn’s theory of scientific revolutions. Recent Adv. DNA Gene Seq. 9 : 14–25. [ PubMed ] [ Google Scholar ]
  • Pritchard R. H., 1955.   The linear arrangement of a series of alleles of Aspergillus nidulans . Heredity 9 : 343–371. [ Google Scholar ]
  • Rassoulzadegan M., Grandjean V., Gounon P., Vincent S., Gillot I., 2006.   RNA-mediated non-Mendelian inheritance of an epigenetic change in the mouse. Nature 441 : 469–474. [ PubMed ] [ Google Scholar ]
  • Sax K., 1932a  The cytological mechanism of crossing over. J. Arnold Arbor. 13 : 180–212. [ Google Scholar ]
  • Sax K., 1932b  Meiosis and chiasma formation in Paeonia suffruticosa . J. Arnold Arbor. 13 : 375–384. [ Google Scholar ]
  • Scherrer K., Jost J., 2007.   Gene and genon concept: coding vs. regulation. A conceptual and information-theoretic analysis of genetic storage and expression in the light of modern molecular biology. Theory Biosci. 126 : 65–113. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Schibler U., Sierra F., 1987.   Alternative promoters in developmental gene expression. Annu. Rev. Genet. 21 : 237–257. [ PubMed ] [ Google Scholar ]
  • Snyder M., Gerstein M., 2003.   Defining genes in the genomics era. Science 300 : 258–260. [ PubMed ] [ Google Scholar ]
  • Srb A. M., Horowitz N. H., 1944.   The ornithine cycle in Neurospora and its genetic control. J. Biol. Chem. 154 : 129–139. [ Google Scholar ]
  • Stadler P. F., Prohaska S. J., Frost C. V., Krakauer D. C., 2009.   Defining genes: a computational framework. Theory Biosci. 128 : 165–170. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Stern C., 1970.   The continuity of genetics. Daedalus 99 : 882–908. [ PubMed ] [ Google Scholar ]
  • Strauss B. S., 2016.   Biochemical genetics and molecular biology: the contributions of George Beadle and Edward Tatum. Genetics 203 : 13–20. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Sturtevant A. H., 1913.   The linear arrangement of the six sex-linked factors in Drosophila , as shown by their mode of association. J. Exp. Zool. 14 : 43–59. [ Google Scholar ]
  • Sturtevant A. H., 1965.   A History of Genetics . Harper & Row, New York. [ Google Scholar ]
  • Sturtevant A. H., Beadle G. W., 1939.   An Introduction to Genetics . W. B. Saunders, Philadelphia. [ Google Scholar ]
  • Sturtevant A. H., Beadle G. W., 1962.   An Introduction to Genetics. The Dover edition of the work first published by W. B. Saunders Company in 1939 . Dover Publications, New York. [ Google Scholar ]
  • Sutton W. S., 1903.   The chromosomes in heredity. Biol. Bull. 4 : 231–251. [ Google Scholar ]
  • The ENCODE Project Consortium , 2007.   Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 447 : 799–816. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • The ENCODE Project Consortium , 2012.   An integrated encyclopedia of DNA elements in the human genome. Nature 489 : 57–74. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • The FANTOM Consortium and RIKEN Genome Exploration Research Group (Genome Network Project Core Group) , 2005.   The transcriptional landscape of the mammalian genome. Science 309 : 1559–1563. [ PubMed ] [ Google Scholar ]
  • Tschermak E., 1900.   Über künstliche Kreuzung bei Pisum sativum . Ber. Deut. Bot. Ges. 18 : 232–239. [ Google Scholar ]
  • Waddington C. H., 1939.   An Introduction to Modern Genetics . Allen & Unwin, London. [ Google Scholar ]
  • Waddington C. H., 1962.   New Patterns in Genetics and Development . Columbia University Press, New York. [ Google Scholar ]
  • Waddington C. H., 1966.   Principles of Development and Differentiation . Macmillan Company, New York. [ Google Scholar ]
  • Waters C. K., 1994.   Genes made molecular. Philos. Sci. 61 : 163–185. [ Google Scholar ]
  • Waters K., 2013.   Molecular genetics, in The Stanford Encyclopedia of Philosophy (Fall 2013 Edition) , edited by Zalta E. N. Stanford University Press, Redwood City, CA. Available at: < http://plato.stanford.edu/archives/fall2013/entries/molecular-genetics/ >. Accessed: October 27, 2015.
  • Watson J. D., Crick F. H. C., 1953a  Molecular structure of nucleic acids. A structure for deoxyribose nucleic acid. Nature 171 : 737–738. [ PubMed ] [ Google Scholar ]
  • Watson J. D., Crick F. H. C., 1953b  Genetical implications of the structure of deoxyribonucleic acid. Nature 171 : 964–967. [ PubMed ] [ Google Scholar ]
  • Watson J. D., Crick F. H. C., 1954.   The structure of DNA. Cold Spring Harb. Symp. Quant. Biol. 18 : 123–131. [ PubMed ] [ Google Scholar ]
  • Whitehouse H. L. K., 1973.   Towards an Understanding of the Mechanism of Heredity , Ed. 3. Edward Arnold, London. [ Google Scholar ]
  • Wilkins A. S., 2002.   The Evolution of Developmental Pathways . Sinauer Associates, Sunderland, MA. [ Google Scholar ]
  • Wilkins A. S., 2007.   Between “design” and “bricolage”: genetic networks, levels of selection, and adaptive evolution. Proc. Natl. Acad. Sci. USA 104 : 8590–8596. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Witzany G., 2011.   The agents of natural genome editing. J. Mol. Cell Biol. 3 : 181–189. [ PubMed ] [ Google Scholar ]
  • Wright S., 1917a  Color inheritance in mammals. III: the rat—few variations of factors known until recently—castle’s selection experiment—any interpretation of it demonstrates the efficacy of Darwinian selection. J. Hered. 8 : 426–430. [ Google Scholar ]
  • Wright S., 1917b  Color inheritance in mammals. V. The guinea-pig—great diversity in coat-pattern, due to interaction of many factors in development—some factors hereditary, others of the nature of accidents in development. J. Hered. 8 : 476–480. [ Google Scholar ]
  • Wright S., 1968.   Evolution and the Genetics of Populations. Vol. 1. Genetic and Biometric Foundations . University of Chicago Press, Chicago. [ Google Scholar ]
  • Yanofsky C., Crawford I. P., 1959.   The effects of deletions, point mutations on the two components of the tryptophan synthetase of Escherichia coli. Proc. Natl. Acad. Sci. USA 45 : 1016–1026. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Yanofsky C., Carlton B. C., Guest J. R., Helinski D. R., Henning U., 1964.   On the colinearity of gene structure and protein structure. Proc. Natl. Acad. Sci. USA 51 : 266–272. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Yanofsky C., Drapeau G. R., Guest J. R., Carlton B. C., 1967.   The complete amino acid sequence of the tryptophan synthetase A protein (α subunit) and its colinear relationship with the genetic map of the A gene. Proc. Natl. Acad. Sci. USA 57 : 296–298. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Ycas M., 1969.   The Biological Code . North-Holland Publishing Company, Amsterdam. [ Google Scholar ]
