Sunday, June 17, 2007

The Genome Is Miscellaneous

Hopefully by now you have read David Weinberger's Everything Is Miscellaneous: The Power of the New Digital Disorder. It's quite an interesting and absorbing read, one of those books that makes you look at the world just a bit differently. I seem to be doing that an awful lot lately, finding unexpected applications of Weinberger's thesis all over the place. The latest? The human genome!

The ENCODE Project just published its findings from a detailed investigation of 1% of the human genome, and it looks like it's waaaaaaaaaay more complex and interesting than we thought. There's the main article (DOI: 10.1038/nature05874) in the current issue of the journal Nature, and a whole slew of additional articles in this month's Genome Research. I've been working through Gerstein et al.'s "What is a gene, post-ENCODE? History and updated definition" (DOI: 10.1101/gr.6339607), an engrossing look at how our notion of a "gene" has changed dramatically since Mendel and his peas, and where our understanding of "gene" stands in light of this exciting new data from ENCODE.

It looks like the genome, far from being a nicely organized library of genetic building blocks, is a messy snarl of bits of coding DNA, all mixed up together in a pile. There is of course some physical structure to it all, but it seems pretty well jumbled up; the parts of a gene don't even need to be on the same chromosome. It reminded me of Weinberger's big miscellaneous pile, into which all our information goes, waiting to be organized by users and searchers according to their needs and desires. In the Miscellaneous Genome, the users and searchers are the complex regulatory networks of the cell, which seek out and assemble the bits they need to create the machinery and processes of life. They know how to read the genomic metadata that we are trying to grasp; once we can read the metadata, we'll be able to sift through the Miscellaneous Genome with ease.

Go read the book; go read the articles. Good stuff.

1 comment:

Brian H said...

An '80s book called "Artificial Intelligence," a collection of reports and articles, ended with one that made an interesting comparison: it seems the repeating chunks of non-coding DNA, the introns, have about the same frequency distribution pattern as words in a natural language. The implication was that they laid out an operating manual for the coding bits (genes), and even for their modification, i.e. an evolutionary strategy. The latter seems plausible once you try to account for the differential patterns of mutability throughout the coding genome.

Such a manual or strategy pool would be powerful and strongly favored for survival, since it would bypass the huge statistical hurdle of relying solely on the straight "random error hypothesis" that is assumed to underlie mutation and evolution. Once even a single such "rule" had evolved and been conserved, there would be tremendous selection pressure for more of the same.