Science & Technology

Year of the Genome


Forget the screwed-up presidential election. History will remember 2000 as the year in which the genomes of three important multicellular organisms were completely sequenced.

March saw the publication of the genetic sequence of that great laboratory stalwart, the fruit fly Drosophila melanogaster, which weighed in with 13,601 genes.

In June, Craig Venter of Celera Genomics and Francis Collins of the National Human Genome Research Institute announced at the White House that a draft version of the human genome had been completed.

And just this past week, researchers published the first analysis of the complete genome of a flowering plant, the diminutive weedy mustard plant, Arabidopsis thaliana. Its gene count is the largest yet officially published, coming in at 25,498 genes located on five chromosome pairs. However, since 70 percent of its genome appears to have been duplicated, Arabidopsis is believed to have fewer than 15,000 different genes. It turns out that this kind of extensive genomic duplication is common in plants and seems to provide some evolutionary insurance against unpredictable environments. If a plant has many variants of a gene, one of those variants might give it the edge it needs to survive a disease or a drought.

All this genomic information will spur the pace of biotechnological progress. The more gene sequences researchers have to compare, the more quickly they can learn the function of specific genes. For example, researchers in the Arabidopsis Genome Initiative compared a list of 289 human disease genes to the whole Arabidopsis genome and found that nearly 50 percent had matches in the plant's genome. This kind of information can help scientists figure out more quickly what the specific genes do and how they go wrong.

Why sequence Arabidopsis, as opposed to a more obviously useful plant like rice or maize? Arabidopsis is a tiny plant that grows and reproduces rapidly, making it ideal for laboratory work. Scientists can easily vary environmental conditions such as pest attacks, salinity, and water availability to find out which of the plant's genes do what. In fact, researchers plan to figure out exactly what each of Arabidopsis' 25,498 genes does by 2010. Another advantage is that Arabidopsis' genome is only 115 million base pairs long, compared to around 400 million for rice.

Despite obvious surface differences between a mustard plant like Arabidopsis and grasses like rice and wheat, flowering plants are remarkably similar. As Stanford University biologist Virginia Walbot explains in Nature, "Most of developmental and physiological processes in Arabidopsis, as well as the genes controlling them, will have counterparts in crop plants." "Rice is wheat is maize" is a truism among plant physiologists. The International Rice Genome Sequencing Project, launched just two years ago, plans to complete sequencing the rice genome by 2004. Last August, Monsanto gave a boost to the rice project when it announced that it would allow researchers to have free access to its already completed draft sequence of the rice genome.

The next big genomic milestone will be the publication of the complete human genome. Both Celera Genomics and the International Human Genome Project submitted their human genome sequence manuscripts in the first week of December. Joint publication of those papers will most likely occur in early February. "It's a heck of an interesting manuscript," says the IHGP's Francis Collins.

The exact number of genes that it takes to make a human being will not be known for a couple more years, but the usual estimates of somewhere between 80,000 and 100,000 genes are way off the mark. It appears now that the human genome consists of between 30,000 and 40,000 genes. Consider that it takes 19,099 genes to make the tiny nematode worm, Caenorhabditis elegans. "We're twice a worm," says Collins.

This is fascinating because a full-grown C. elegans is made up of exactly 959 cells, while humans have trillions of cells. The difference between a human and a worm clearly depends not on the number of genes, but on the number of proteins that go into making up an organism. For decades, the "Central Dogma" of biology has declared "one gene makes one protein."

Now, genomic science is about to overthrow the Central Dogma. Human beings are made up of some 120,000 different proteins, which means that on average one human gene must be translated into three or four different proteins. The difference between simpler organisms and complex organisms appears to be that the genes of more complex organisms can be read in various ways rather than in only a single way. It's findings like these that promise to make February an exciting month.

A co-author of the Arabidopsis paper in Nature, Athanasios Theologis, told Science News, "Every ten years there are advances, and this is one of the greatest." Theologis added, "It's a beautiful time." And a beautiful way to end 2000.