Re: [asa] Nature editorial

From: Rich Blinne
Date: Wed Jun 13 2007 - 22:32:40 EDT

On Jun 13, 2007, at 7:31 PM, Randy Isaac wrote:

> That same issue has an intriguing article in it:
> Genome project turns up evolutionary surprises p760
> Findings reveal how DNA is conserved across animals
> Erika Check
> Whoever has access to this issue might be able to let us know about
> these surprises.

This must be a big deal, because the issue carries a news story, a
review article, and the study itself.

News story

> The latest studies of the instructions embedded in the human genome
> are revealing how evolution has shaped our species.
> On page 799 of this issue [1, 2], and in a themed issue of Genome
> Research [3], scientists report the first findings from a project
> called ENCODE. This 'encyclopedia of DNA elements' attempts to
> discover how our cells make sense of the DNA sequence in the human
> genome. Already, ENCODE is up-ending one piece of conventional
> scientific wisdom: the idea that biologically relevant DNA resists
> change over evolutionary time.
> ENCODE aims to catalogue all the "functional elements" in the
> genome: the DNA sequences that control how and when our cells use
> our genes. Most of these controls seem to be written into so-called
> non-coding DNA, which does not make a detectable protein product.
> Because organisms depend on functional elements working correctly,
> scientists have long thought that such elements should not change
> much over evolutionary time. So researchers have mostly looked for
> key functional elements in non-coding DNA that is the same across
> species, known as conserved or constrained DNA.
> But ENCODE is the first project to compare long stretches of non-
> coding DNA across many mammals, from mice to monkeys to humans.
> This comparison suggests that evolutionary processes don't always
> freeze functional DNA in place.
> "The fact that we found so much functional sequence that did not
> seem to be evolutionarily constrained across all mammals is really
> surprising," says Elliott Margulies of the National Human Genome
> Research Institute in Bethesda, Maryland, who co-chaired one of the
> ENCODE analysis groups.
> The finding comes from the ENCODE pilot project, which used
> multiple methods to collect and analyse data on just 1% of the
> human genome, not an easy task (see 'Scaling up to a monumental
> task'). In one part of the project, groups of experimental
> biologists used a suite of laboratory techniques to find out what
> portions of the genome might be functional. Meanwhile, groups of
> computational biologists compared the ENCODE sequences across
> humans and 28 other animals to find constrained regions of DNA that
> had changed little throughout evolution.
> But when the different groups compared their results, they found
> that their predictions about key portions of the genome didn't
> always agree: the biologists' list of functional sequences didn't
> match the computational group's list of constrained sequences.
> At first, many were sceptical of this result, says John
> Stamatoyannopoulos of the University of Washington in Seattle, a co-
> chair of one of the ENCODE analysis groups. "It raised some
> eyebrows," he says. "But eventually all the ENCODE groups started
> coming out with the same thing." Overall, biologists found no
> evidence of function for about 40% of the constrained ENCODE
> regions. On the flipside, about half of the functional elements
> found in non-coding DNA were totally unconstrained.
> The finding that many constrained regions weren't considered to be
> functional is not too surprising, because it is unlikely that
> ENCODE included enough tests on enough different types of cells to
> capture every major aspect of biology. But the idea that important
> DNA might also be unstable is newer, and intriguing, because it
> undermines the assumption that biological function requires
> evolutionary constraint.
> "We're generalizing this principle over mammals, and over many
> functional elements," says Ewan Birney, head of genome annotation
> at the European Bioinformatics Institute in Cambridge, UK, and a
> leader of ENCODE. "We're coming out quite strongly that this is not
> merely a curiosity of our genome; it's a really important part of
> the way our genome works."
> But how can major components of the mammalian genome change
> essentially randomly over time? That is not entirely clear. The
> authors of the ENCODE paper speculate that the unconstrained
> genomic regions are evolving "neutrally"; that is, they are
> constantly changing in ways that are neither good nor bad for the
> individual. This means that, on the whole, many genetic changes
> simply don't affect overall biology.
> This has major consequences for understanding the relationship
> between genetics and biology, Birney says. "It means, for example,
> that if you look at some conserved piece of biology (say, how the
> kidneys work in mice and humans), not all of those bits of biology
> will be conserved or constrained at the level of the DNA bases, and
> that's quite a strong shift."
> But not everyone agrees with that take. For example, John Mattick
> at the University of Queensland in Brisbane, Australia, argues that
> the widely accepted calculation of the baseline, or neutral, rate
> of mammalian evolution is flawed. Because measurements of
> constraint rely on a comparison with the neutral rate, it is
> possible that many of ENCODE's so-called unconstrained regions
> really aren't unconstrained, Mattick argues.
> "I would have said that this finding suggests that many regions of
> our genome are evolving under weak selection pressure, or that our
> measurements of the neutral rate of evolution are incorrect," says
> Mattick, who is an author on the ENCODE paper.
> In fact, Mattick thinks scientists are vastly underestimating how
> much of the genome is functional. He and Birney have placed a bet
> on the question. Mattick thinks at least 20% of possible functional
> elements in our genome will eventually be proven useful. Birney
> thinks fewer are functional. The loser will buy the winner a case
> of the beverage of his choice.
> Meanwhile, other scientists are gathering data to answer new
> questions raised by ENCODE. Many hope that other ongoing studies,
> such as comparable genome sequences from additional primate
> species, will help decide which parts of the ENCODE data to study
> first.

Review article

> Researchers of the ENCODE consortium have analysed 1% of the human
> genome. Their findings bring us a step closer to understanding the
> role of the vast amount of obscure DNA that does not function as
> genes.
> We usually think of the functional sequences in the genome solely
> in terms of genes, the sequences transcribed to messenger RNA to
> generate proteins. This perception is really the result of
> effective publicity by the genes, who take all of the credit even
> though their function is basically limited to communicating genomic
> information to the outside world. They have even managed to have
> the entire DNA sequence referred to as the 'genome', as if the
> collective importance of genes is all you need to know about the
> DNA in a cell.
> We should have guessed that this was merely prima-donna behaviour
> on the part of narcissist genes when the sequencing of the human
> genome revealed that they comprise only a small percentage of the
> DNA. And our confidence should have been shaken when some sequences
> located far from any genes were found to be strikingly conserved [1],
> indicating that they have some important function. Now, on page 799
> of this issue [2], the ENCODE Project Consortium shows through the
> analysis of 1% of the human genome that the humble, unpretentious
> non-gene sequences have essential regulatory roles (Fig. 1).
> We are increasingly being forced to pay attention to our non-gene
> DNA sequences. For example, in attempts to find the cause of an
> inherited disease, investigators study hundreds of thousands of
> sequence variations in the genome, known as single nucleotide
> polymorphisms (SNPs), to see which ones are non-randomly associated
> with the disease. Recently, such studies [3, 4, 5, 6] revealed several
> sequence variations associated with type 2 diabetes and its related
> manifestations, but only a minority of the identified SNPs were
> located within genes. So, if we are to understand this insight into
> the causes of human diseases, we need to know what the functions of
> the non-gene majority are.
> The aim of the ENCODE (encyclopaedia of DNA elements) project is
> exactly that: to identify every sequence with functional
> properties in the human genome [7]. The results of the pilot phase of
> this project [2], which involved an analysis of 1% (30 megabases) of
> the human genome, are not good news for genes, which will no longer
> be able to hog the limelight. Even this preliminary study reveals
> that the genome is much more than a mere vehicle for genes, and
> sheds light on the extensive molecular decision-making that takes
> place before a gene is expressed.
> A valuable aspect of the project is that, when possible, ENCODE
> researchers addressed the same question using several different
> techniques. For example, combining microarray and sequencing
> approaches with computational analyses, they found that many more
> sections of the genome are transcribed into RNA than had previously
> been recognized. The concordance of these approaches lends strength
> to the conclusions, which are challenging to those accustomed to
> the idea that only protein-coding sequences are expressed. If you
> look hard enough, you can find evidence that most of the human
> genome is transcribed as RNA at some time or another.
> This pervasive transcription could be due to either many
> unrecognized sites in the genome that initiate transcription, or a
> previously unsuspected tendency for the RNA polymerase enzyme to
> stay on the DNA and keep going when it has already finished its job
> of expressing a gene. This question is addressed by the second
> component of the ENCODE report [2], which looks at the regulation of
> transcription. These results are much more satisfying, probably
> because they are less challenging to our preconceptions on the
> subject.
> First, the authors identified the locations of the transcription
> start sites, and correlated these with how the DNA is packaged
> around histone proteins to form chromatin. They then identified
> where within the chromatin certain histones are marked by chemical
> modifications, and identified positions at which transcription-
> regulatory proteins were binding to the DNA. They found evidence of
> regulatory functions for sequences at transcription start sites, as
> expected, but also at other sites in the DNA. The combinations of
> regulatory marks differed between those at promoters (sequences
> upstream of genes that regulate their expression) and those
> elsewhere, indicating that the non-promoter sequences have distinct
> functions that are yet to be defined.
> The transcriptional regulators identified by the ENCODE
> investigators clearly occur in patterns within very short genomic
> regions. When one zooms out to look at more DNA at a time, it
> becomes apparent that there is also a larger-scale organization to
> the genome. Although the significance of this organization remains
> unclear, it parallels the scale of organization of another genomic
> phenomenon: the regulation of DNA replication. Sequences
> associated with this process have long been known to be organized
> into domains of hundreds of thousands of base pairs [8]. A strength of
> the coordinated ENCODE approach is the ability to correlate DNA
> replication with large-scale patterns of the organization of
> transcriptional regulators in the same cell types. Accordingly, the
> authors show that early-replicating regions are enriched in histone
> modifications associated with gene activation, and late-replicating
> regions are marked with repressive modifications.
> The final question the ENCODE researchers addressed related to
> sequences that are known to be highly conserved [1], but that are not
> part of recognized genes. Having defined regulatory sequences
> within the 1% of the genome, the authors could ask whether these
> sequences are unusually well conserved and, conversely, whether
> highly conserved sequences are functional. The answer to both
> questions was a qualified 'yes'. Some very conserved sequences are
> not obviously functional, but this isn't of much concern, as it is
> possible that looking at more tissues would have revealed cell-type-
> specific functions. In addition, a minority of obviously functional
> sequences is not very well conserved, but the authors acknowledge
> that sequence conservation studies may not be sensitive enough to
> detect retention of a transcription-factor binding site in the
> context of extensive local sequence divergence. This offers a
> glimpse of a potentially fascinating area of research: studying
> sites of conserved function but with mostly diverged sequences, and
> looking for the preservation of very small sequence motifs that
> would elsewhere be considered noise and disregarded.
> So how much have we learnt about the functional regions of this 1%
> of the genome through the ENCODE study [2]? The significance of the
> different parts of the study varied. For example, the ascribed
> function of long transcripts that are not translated into proteins
> is very speculative. It is interesting that these non-coding
> transcripts are found at sites of genomic imprinting [9] (in which the
> allele expressed is predetermined by the parent from which it
> originated), but this could be due to the intensive study of
> imprinted loci, at which non-coding RNA has yet to be shown to have
> consistent functional significance. Furthermore, the large-scale
> organization of the genome doesn't have much immediate effect on
> elucidating the function of individual sequences within them.
> Instead, the discrete, local, regulatory processes will be most
> relevant in the short term, for instance in the interpretation of
> association studies when the informative SNPs are not within, and
> are thus not altering, gene sequences [3, 4, 5, 6].
> The big question is how the ENCODE consortium will scale up its
> efforts to study the remaining 99% of the human genome. One problem
> to address is the choice of cell types for such studies. In the
> pilot phase, the consortium decided, for pragmatic reasons, to
> include certain cell lines, such as the HeLa and HL60 cells, that
> are easily grown in culture so that they could be distributed to
> geographically dispersed researchers. However, a disadvantage of
> using such cell types is that they often have broken chromosomes
> that have unusual additions and losses of DNA from different
> regions of the genome. So whether the regulatory processes in these
> cell lines are representative of those in primary cells from the
> human body is, to say the least, questionable.
> The other concern is the possibility that certain regulatory
> processes are cell-type specific. If the ENCODE project is to
> succeed in defining every regulatory element in the genome, the
> investigators will have to study the entire genome of every
> possible cell type, which is a daunting task. So although the
> glimpse we are provided by the ENCODE consortium into the ordered
> complexity of 1% of the human genome is tantalizing, the insights
> only confirm the challenges that lie ahead. However, as long as we
> continue to implicate the function of regions containing non-gene
> sequences in human disease, we will have to embrace this challenge.

Study (abstract)

> We report the generation and analysis of functional data from
> multiple, diverse experiments performed on a targeted 1% of the
> human genome as part of the pilot phase of the ENCODE Project.
> These data have been further integrated and augmented by a number
> of evolutionary and computational analyses. Together, our results
> advance the collective knowledge about human genome function in
> several major areas. First, our studies provide convincing evidence
> that the genome is pervasively transcribed, such that the majority
> of its bases can be found in primary transcripts, including non-
> protein-coding transcripts, and those that extensively overlap one
> another. Second, systematic examination of transcriptional
> regulation has yielded new understanding about transcription start
> sites, including their relationship to specific regulatory
> sequences and features of chromatin accessibility and histone
> modification. Third, a more sophisticated view of chromatin
> structure has emerged, including its inter-relationship with DNA
> replication and transcriptional regulation. Finally, integration of
> these new sources of information, in particular with respect to
> mammalian evolution based on inter- and intra-species sequence
> comparisons, has yielded new mechanistic and evolutionary insights
> concerning the functional landscape of the human genome. Together,
> these studies are defining a path for pursuit of a more
> comprehensive characterization of human genome function.

Received on Wed Jun 13 22:33:25 2007
