

AN INTERNATIONAL COLLABORATION TO SEQUENCE THE RICE GENOME




There is strong interest among many cereal biologists in sequencing the rice genome. Because the genome is relatively small, sequencing it is feasible with present technology. Nevertheless, with a genome size of approximately 400 Mb, the task is so great that it is unlikely that any one country can devote the resources to sequence the rice genome in the next ten years. In any case, an international effort will accelerate the process and ensure public access to the data. Such a collaboration will greatly benefit from tools that have already been developed in key laboratories working on rice genomics and from parallel collaborative efforts on other genomes.

The purpose of this document is to summarize the current state of an international collaboration to sequence the rice genome and to form the basis for future decisions. Members of the Working Group solicit your comments and suggestions.

Why sequence rice?

Rice as a model cereal

Genome size: Oryza sativa ssp. japonica is reported to have a 2C value of 0.88 pg, three times the size of the Arabidopsis thaliana genome. The predicted gene density is one gene every 15 kb. As such, rice has the smallest genome of the major cereals.

Well-mapped genome: The rice molecular map with over 2300 markers has already been useful in helping align physical maps. Over 30,000 ESTs have been reported and many are mapped. At least 200 mapped SSTs have been published. A YAC library that has been fingerprinted and ordered with mapped markers currently covers 52% of the rice genome. Several BAC libraries have been described. A recent report suggests that 92% of the genome is covered by ordered BACs in many contigs.

Molecular genetics: With the introduction of new methods for Agrobacterium tumefaciens transformation, rice is the easiest of all cereal plants to transform genetically. This tool permits geneticists to complement mutations or confer dominant phenotypes to verify gene function.
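
The genome-size figures quoted under "Genome size" above can be checked with a short calculation. This is only a sketch: the conversion factor of roughly 978 Mb per picogram of DNA is a commonly used approximation that does not appear in this document, and the function names are illustrative.

```python
# Rough genome-size arithmetic; all names here are illustrative.
MB_PER_PG = 978.0  # assumption: ~978 Mb of double-stranded DNA per picogram


def haploid_genome_mb(two_c_pg: float) -> float:
    """Convert a 2C DNA content (pg) to a haploid (1C) genome size in Mb."""
    return (two_c_pg / 2.0) * MB_PER_PG


def estimated_gene_count(genome_mb: float, kb_per_gene: float) -> int:
    """Estimate gene number from genome size and average spacing between genes."""
    return round(genome_mb * 1000.0 / kb_per_gene)


rice_mb = haploid_genome_mb(0.88)            # 2C value quoted above
genes = estimated_gene_count(rice_mb, 15.0)  # one gene every 15 kb
print(f"haploid genome ~{rice_mb:.0f} Mb, ~{genes} genes")
```

Under this assumption the haploid genome comes out at about 430 Mb, in line with the figure of approximately 400 Mb quoted in the introduction, and the predicted gene density implies a gene count in the high twenty thousands.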

Synteny

While grass genomes differ markedly in the amount of total DNA, they share a common set of genes. Recent work indicates that the grass genomes - wheat, rye, barley, maize, sorghum, millet, and rice - have similar genetic maps over large blocks of the chromosomes. When examined in detail, local gene order has been found to be preserved, but the genes are separated by greater amounts of repetitive DNA - mostly retrotransposons - in the species with larger genomes. This syntenic relationship can be exploited, for instance, by geneticists who are interested in the map-based cloning of a gene controlling chromosome pairing that has been mapped in wheat, but who must work with a genome 37 times larger than that of rice. By selecting tightly linked single-copy markers in wheat, the wheat geneticists will be able to screen the homologous region in rice for their candidate homologue. The syntenic relations can be exploited in the other direction as well. For example, mapping data can be taken from maize, where there is extensive work in both transmission and molecular genetics, to predict the location of a homologue in rice.

Commercial Value

Rice, wheat, and maize account for approximately half of the world's food production. Rice itself is the principal food of half of the world's population. Over the last 30 years world rice production has doubled as the result of the introduction of new varieties and improved technology. However, the annual growth in rice production has slowed to the point that it is no longer keeping pace with the growth in the number of consumers. Rice production in the next fifty years faces even greater challenges. On the one hand, with a larger and more affluent population there will be greater demands for higher production and better quality rice. On the other hand, the same trends mean that there will be less land, water, and labor to produce the crop. In short, there will be great demands on biotechnology to improve rice production.

Map-based sequence information: The objective of plant breeding is the selection of favorable combinations of genes. In recent years, plant breeding has been enhanced by molecular marker technology that permits one to screen larger populations with less progeny testing. Knowledge of the location of all genes in a genome extends molecular marker technology because it becomes possible to identify candidate genes controlling specific traits. The genes then become the markers and the process becomes more accurate and more efficient. For example, knowing the location and sequence of candidate genes makes it possible to design allele specific markers which readily lend themselves to automation.

Models

International alliances formed to sequence yeast, C. elegans, humans, and Arabidopsis provide examples of how to manage an international collaboration to sequence rice. Fortunately, government agencies are already familiar with writing and approving memoranda of agreement in this area. Some lessons we can learn from other genome efforts are:

1) Shared tools and information. In other projects, it has proven useful for all groups to have access to and work from the same few libraries. All data - physical mapping information and sequences - should be released in a timely fashion. These are principles that the participants in the rice genome sequencing collaboration have already agreed to.

2) Scientists initiate the collaboration. Scientific rather than political decisions should dictate the specifics of the collaboration. Individual sequencing projects will be funded nationally, locally managed, and subject to oversight of their respective funding agencies. Nevertheless, a system of peer oversight should guide these projects.

3) Sequencing should be done in the most efficient manner based on the science. The effort should not be diluted by peripheral projects.

Rice Genome Workshop

On September 23, 1997, scientists interested in the genomic sequencing of rice met to participate in a workshop held in conjunction with the International Symposium on Plant Molecular Biology in Singapore. The meeting was chaired by Ben Burr, Brookhaven National Laboratory, NY, USA, and Mike Gale, John Innes Centre, Norwich, UK.

The participants who spoke were:

Dr. Takuji Sasaki, Rice Genome Program, NIAR/STAFF, Tsukuba, Japan

Dr. Moo Young Eun, National Institute of Agricultural Science and Technology, Suweon, Korea    

Dr. Rod Wing, Clemson University, Clemson, SC, USA

Dr. Guo-liang Wang, Institute of Molecular Agrobiology, Singapore

Dr. Michael Roberts, John Innes Centre, Norwich, UK

Dr. John McPherson, Washington University, St. Louis, MO, USA

Dr. Jo Messing, Waksman Institute, Rutgers University, Piscataway, NJ, USA

Dr. Andy Pereira, CPRP-DLO, Wageningen, The Netherlands

Dr. John Bennett, International Rice Research Institute, Los Banos, The Philippines

Dr. Apichart Vanavichit, Kasetsart University, Nakorn Pathom, Thailand

Dr. Cliff Gabriel, Office of Science and Technology Policy, Washington DC, USA

Dr. Zhi-Hong Xu, Chinese Academy of Sciences, Beijing, China

Dr. Gary Toenniessen, The Rockefeller Foundation, NY, USA


In addition, there was discussion from the floor.

At that meeting, the participants agreed to join an international collaboration to sequence the rice genome. They explicitly agreed to share materials, including libraries, and to release physical mapping information and annotated DNA sequences to public databases in a timely manner.


Furthermore, general agreement was reached on the initial steps in methodology:

1) The cultivar, Nipponbare, also known as GA3, will be sequenced. Seed from a single plant will be distributed by Dr. Sasaki for the purpose of making libraries. The primary reasons for choosing this cultivar are that more than 10,000 EST sequences from the strain have been released to DDBJ and that a physical map based on YACs that covers over 50% of the genome has been published. Sequencing other cultivars is strongly discouraged as genetic polymorphisms cannot be distinguished from sequencing errors. Moreover, groups not sequencing from one of the shared libraries would not benefit from the associated accumulated knowledge and the other advantages of collaboration.

2) The RGP will make a PAC library. Dr. Rod Wing will make three BAC libraries using partial digests of different enzymes to generate the inserts. 60,000 BAC clones will be isolated to provide a 20-fold coverage of the genome.

3) The BACs and PACs will be fingerprinted for the purposes of preparing contigs and checking the integrity (deletions or rearrangements) of the clones. The information generated will also be invaluable where repeated sequences make BAC and PAC end sequences ambiguous.

4) In parallel with fingerprinting, the BAC and PAC clones will be subjected to end-sequencing. This should provide an STS every 3 to 5 kb on average, allow genome sequencers to pick the clones with minimum overlap, and provide further information for the physical map.
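
The numbers in steps 2 and 4 can be checked with simple arithmetic. The sketch below assumes the genome size of approximately 400 Mb quoted in the introduction. The implied average insert size is derived from the stated figures rather than given in the text, and the function names are illustrative.

```python
GENOME_KB = 400_000      # ~400 Mb genome, as quoted in the introduction
N_CLONES = 60_000        # BAC clones to be isolated (step 2)
TARGET_COVERAGE = 20     # desired fold coverage (step 2)


def implied_insert_kb(genome_kb, n_clones, coverage):
    """Average insert size needed for n_clones to give the target coverage."""
    return coverage * genome_kb / n_clones


def end_sequence_spacing_kb(genome_kb, n_clones, ends_per_clone=2):
    """Average distance between end-sequence tags if every clone end is read."""
    return genome_kb / (n_clones * ends_per_clone)


insert_kb = implied_insert_kb(GENOME_KB, N_CLONES, TARGET_COVERAGE)
spacing_kb = end_sequence_spacing_kb(GENOME_KB, N_CLONES)
print(f"implied average insert: {insert_kb:.0f} kb")
print(f"one end sequence every {spacing_kb:.1f} kb")
```

Under these assumptions, 20-fold coverage implies inserts averaging about 133 kb, and reading both ends of every clone gives a tag roughly every 3.3 kb, at the lower end of the 3 to 5 kb range in step 4. Failed reads would widen the spacing.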

It is important that none of these early steps delay large scale sequencing. Preparation of the PAC library is currently underway and preparation of the BAC libraries will begin shortly. Both types of libraries will be available before the end sequencing can begin. It is estimated that with the participation of several laboratories, end sequencing could be completed within six months. The analyses of fingerprinted Arabidopsis libraries are expected to be completed by the end of 1997. These results will indicate what we might expect for the rice project in terms of speed, cost, and the degree of closure.

The Workshop concluded with the nomination of a provisional Working Group chosen to direct the collaboration and to decide future directions. This document will appear on Web sites viewed by rice researchers and comments are solicited. The next meeting of the Working Group will be held in conjunction with the Rice Genome Forum, February 5, 1998, in Tsukuba, Japan. Members of the Working Group are:

Dr. Takuji Sasaki, Japan

Dr. Zhi-Hong Xu, China

Dr. Moo Young Eun, S. Korea

Dr. Jo Messing, USA

Dr. Mike Bevan, Europe

Dr. Ben Burr, representing the Rockefeller Foundation


The Rockefeller Foundation has offered to facilitate administration of the collaboration.

Future Decisions

Membership in the International Rice Genome Sequencing Initiative

Any group willing to sequence large stretches of contiguous genomic DNA is welcome to join the collaborative effort as long as it is willing to follow the agreed-upon guidelines. In Singapore there was some discussion about the minimum amount of sequence a group would have to contribute annually to maintain membership.


The Rice Genome Working Group

The Working Group is the body that will make decisions that pertain to the goals, strategies, and coordination of the collaborative effort. The Working Group will be responsible for planning the most efficient means of completing the project. Among its responsibilities will be assigning regions to be sequenced that will avoid duplication and maximize overall progress.

The Working Group is envisioned as being composed of representatives of the major groups participating in rice genome sequencing. The current group is provisional and it is recognized that some of the major contributors to the effort might change. Rules for deciding membership in the Working Group need to be established.


Sequencing strategy

It has been implicit in the discussions, but never stated, that once the BAC- and PAC-end sequencing is completed and the relevant fingerprinting data are available, the most efficient strategy for sequencing complete BACs or PACs will be to sequence from random subclone libraries. It will be useful to standardize this technology to ensure high quality libraries that are completely randomized with non-chimeric inserts of a uniform size. Sequencing in a specific region of the genome should not start until a sufficient number of tiled BACs or PACs are available to ensure an unfragmented sequence.

In the Human Genome Project it has been found that assembly of shotgun sequences leads to contigs of about 30 kb. Sequence closure is the most difficult step in the sequencing process because it cannot be automated. Closure will be aided by restriction site information available from fingerprinting and possibly sequence information from overlapping BACs or PACs. Should ambiguities remain, they should be marked on the final sequence. The final product of this phase will be a single contiguous sequence representing the entire PAC or BAC.


Accuracy

The Rice Genome sequencing project, which will serve as a model for all other grasses, will cost an estimated $200M. Given the significant costs in material and manpower, it is imperative that the results be of the highest quality.

In part, this problem has been addressed by agreeing to sequence DNA from the same cultivar, if not the same plant, to minimize variation due to genetic polymorphism. The Human Genome Project has agreed to accept a standard of less than one error in 10,000 bp. While the level of accuracy is difficult to verify, this standard is achievable by a combination of high quality shotgun sequence reads, a seven-fold redundancy, and the requirement that every base be sequenced on both strands. Rice is expected to have 50% repetitive DNA. Because of this, the accuracy of final assembly of shotgun sequences will be dependent on the length and quality of individual sequence reads. The Working Group might wish to establish some guidelines here.
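
To make the accuracy standard concrete, the sketch below converts an error rate into a Phred-scaled quality score and estimates how many reads seven-fold redundancy requires for one clone. The Phred scale is a widely used convention, not something this document specifies. The 500-base read length and the 133 kb clone size are assumptions chosen for illustration, not figures from the Working Group.

```python
import math


def phred_quality(error_rate: float) -> float:
    """Phred-scaled quality: Q = -10 * log10(error probability)."""
    return -10.0 * math.log10(error_rate)


def reads_needed(clone_bp: int, redundancy: float, read_bp: int) -> int:
    """Number of reads of length read_bp giving the requested average redundancy."""
    return math.ceil(clone_bp * redundancy / read_bp)


q = phred_quality(1 / 10_000)        # one error in 10,000 bp
n = reads_needed(133_000, 7, 500)    # assumed clone size and read length
print(f"Q{q:.0f}, about {n} reads per clone")
```

With these assumed values, one error in 10,000 bp corresponds to a quality score of 40, and a single clone needs on the order of 1,900 reads. Longer reads reduce that number and, as noted above, also help assembly through repetitive DNA.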


Annotation and Sequence Release

In other genomic sequencing efforts, it has been recognized that the most useful releases are large contiguous stretches of annotated sequence. A uniform standard of annotation must be agreed upon that checks the integrity of the sequence, assigns and identifies regions of homologies, and delineates potential open reading frames. This should not preclude individual groups from publishing unannotated sequences on their local web sites.

In Singapore the participants agreed to timely release of the sequence information. It might be useful for each participating group to agree to release of the complete annotated sequence of a BAC or PAC within three to six months after beginning to sequence the clone.


Rice Genome Database

An integrated database will facilitate collaboration and data sharing. Sequences will be released to one of the public databases, DDBJ, EMBL, or GenBank, but a central database for the project will be required to store and manage the annotation information. With ever expanding databases, annotation is never complete. It may be advisable to assign the task of periodic update of the annotation of rice genomic sequence to the centralized rice genome database.

The database should also be linked with other rice and cereal databases, serve as a means of coordinating sequencing work, and provide methods for submitting and using information.


Functional Genomics

To date at least 50% of newly discovered open reading frames do not have homologues with identifiable function. The use of populations with transposable element-induced knockout mutations has been a powerful tool for identifying the function of some of these unknown genes. While it is beyond the scope of this project, it should be recognized that a consortium of international laboratories has formed to develop knockout populations of rice for the purpose of discovering gene function. This consortium will provide useful tools for the downstream analysis of genomic sequence information.


Intellectual Property Rights

Intellectual property rights issues will be raised because of the obvious commercial interest in the sequence for rice and other cereals. In the Human Genome Project, as well as other international sequencing efforts, withholding data for patent application is recognized as being incompatible with the policy of immediate release. Patent issues are regarded as being downstream of data generation and release. These are issues that must be confronted, but they are probably beyond the scope of the Working Group and should be discussed at a meeting called for that specific purpose.


Outreach

To be successful, this large sequencing effort needs the broad support of scientists working on rice and other cereals who will be the potential end-users of the sequence information. They must believe that the project is worthwhile, well-organized and credible. There are a number of ways that this support might be engendered. Roles for the general community to influence general strategies and policies should be considered. Outside scientists can serve as peer reviewers of individual projects. Timely release of finished, annotated sequence blocks, as well as the availability of mapped BACs and YACs, increases end-user support. Periodic progress reports, similar to the RGP's RICE GENOME newsletter, and internet access to a useful database, will engender awareness and utility of the project. Interested members of the community can begin to influence the project by commenting and making suggestions on this document.


Last modified on November 14, 1997 by B. Burr.