A good explanation of structural genomics, and of the role of the Joint Center for Structural Genomics, can be found in an article by Alisa Zapp Machalek of NIGMS. The full article can be read here; the following is an excerpt.
Beyond the unfolding of complete genetic sequences lies the challenge of identifying and deciphering all the proteins that make up living organisms. Structural genomics—a new field catapulted into feasibility by the success of gene-sequencing projects and advances in the tools of structural biology—approaches that task through the large-scale determination of three-dimensional protein structures.
A protein’s genetic sequence can provide clues about its function, but a protein’s structure can better illuminate its biological action and its role in health and disease. A solved, high-resolution structure maps all the protein’s atoms, exposes surface topology and inner architecture, reveals electrochemical properties, and presents a testing ground for possible molecular partners. It paves the way for advances in structure-based drug design and the development of new medical devices and materials.
Determining high-resolution protein structures is often difficult and time-consuming, however. The essential tools of structural biology—X-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy—each have their drawbacks. The former requires crystallization of the proteins, a laborious task, and the latter, though it uses proteins in solution, is usually slower and is limited to solving the structures of small and medium-sized molecules.
Structural genomics focuses on cranking out, at industrial speed, thousands of carefully selected structures from which most others can be predicted computationally with a reasonable degree of accuracy.
This approach relies on a belief in nature’s economy—that the countless different proteins in nature fold into a limited number of shapes and that all natural protein structures are a subset or combination of these shapes.
The key to structural genomics is to group proteins into families of similar structures based on their sequences. Then, based on the known structure of at least one protein in a family and using a computational technique called homology modeling, a good guess can be made about the shapes of other proteins in the family. Estimates of the number of protein structure families range from 30,000 to 50,000—orders of magnitude smaller than the total number of proteins in nature.
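The grouping step described above can be illustrated with a toy sketch. It is not the method used by any structural genomics center: the 30 percent identity cutoff, the function names, and the example sequences are all hypothetical, and the sequences are assumed to be already aligned to equal length (real pipelines rely on dedicated alignment and clustering tools). The sketch groups sequences into families greedily, then reports for each family whether a member with a known structure is available as a homology-modeling template or whether a structure still has to be determined.

```python
from typing import Dict, List, Optional, Set


def percent_identity(a: str, b: str) -> float:
    """Percent of matching positions in two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    if not a:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


def group_into_families(seqs: Dict[str, str], threshold: float = 30.0) -> List[List[str]]:
    """Greedy single-pass grouping (hypothetical cutoff).

    Each sequence joins the first family whose founding member it matches
    at or above the threshold; otherwise it founds a new family.
    """
    families: List[List[str]] = []
    for name, seq in seqs.items():
        for family in families:
            if percent_identity(seq, seqs[family[0]]) >= threshold:
                family.append(name)
                break
        else:
            families.append([name])
    return families


def pick_template(family: List[str], solved: Set[str]) -> Optional[str]:
    """Return a family member with a known structure, or None if none exists."""
    return next((member for member in family if member in solved), None)


# Toy data: names and sequences are invented for illustration only.
SEQS = {
    "protA": "MKTAYIAKQR",
    "protB": "MKTAYIAKQL",
    "protC": "GSHWLLPEEV",
    "protD": "GSHWLLPDEV",
    "protE": "QQQQQQQQQQ",
}
SOLVED = {"protB", "protC"}  # members assumed to have a solved structure

FAMILIES = group_into_families(SEQS)
TEMPLATES = [pick_template(family, SOLVED) for family in FAMILIES]

for family, template in zip(FAMILIES, TEMPLATES):
    status = f"model from {template}" if template else "needs a solved structure"
    print(family, "->", status)
```

In this toy run, the family with no solved member is the kind of target a structural genomics project would prioritize, since one new structure there makes every other member of the family amenable to modeling.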
Thinking Globally, Acting Locally
Currently, there is funding for structural genomics projects in the U.S., the European Union, Japan, China, Canada, and Israel. (In early April, representatives from four continents gathered in Virginia to discuss goals, progress, and policy issues; see "International Airing.")
Pharmaceutical companies and biotech start-ups are also committing to structural genomics, primarily to aid drug discovery. The publicly funded U.S. effort is spearheaded by NIGMS, which last September launched the Protein Structure Initiative and will spend $150 million over the next five years on seven structural genomics pilot research centers, including one co-funded by NIAID (see "The First Seven."). NIGMS expects to fund a few additional centers this September.
These pilot centers will develop new techniques to streamline and accelerate every step in structural genomics, from choosing which protein structures to solve to cloning and purifying the proteins, determining the structures, and depositing the data into the Protein Data Bank (PDB), an online database of macromolecular structures maintained by the Research Collaboratory for Structural Bioinformatics.
In five years, each of the centers will ramp up to a production level of 100 to 200 structures annually at a significantly reduced cost per structure. Using traditional techniques, it takes weeks to months—and an average of more than $100,000—to solve the structure of a single globular, soluble protein. More recalcitrant proteins, such as membrane proteins, are even more challenging.
One long-term goal of the NIGMS project is to develop a public library of nature’s protein shapes that integrates sequence, structural, and functional information. This library should enable researchers to use genetic sequences to predict the approximate structures—and possibly the function—of any protein.
To build this public resource, NIGMS is enlisting its pilot centers to determine the structures of one or two representative proteins from each of thousands of different structural families. Ten thousand unique protein structures should be solved over 10 years, which includes the current five-year scale-up phase, then five more years at full speed.
Currently, of the 15,000 structures that have been deposited in the PDB, fewer than 4,000 are of unique proteins, defined as those whose sequences are less than 90 percent identical. And the solved PDB structures represent only about 1,500 families. By determining 10,000 protein structures from almost as many families, the Protein Structure Initiative would more than triple the number of unique structures available and would provide more thorough coverage of structural families.
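The "more than triple" claim can be checked with a few lines of arithmetic. The sketch below uses only figures quoted in this excerpt. The family-coverage range is a rough upper bound under two assumptions that the article does not state: that each of the 10,000 new structures comes from a distinct family, and that none of those families is among the roughly 1,500 already represented.

```python
# Figures quoted in the excerpt.
deposited = 15_000                 # structures in the PDB
unique_now = 4_000                 # upper bound on unique structures today
planned_new = 10_000               # unique structures planned over 10 years
families_now = 1_500               # families represented today
families_total = (30_000, 50_000)  # estimated range of structure families

# Share of current PDB entries that are unique (at most about 27 percent).
unique_share = unique_now / deposited

# Growth in unique structures. Because unique_now is an upper bound,
# the true growth factor is at least this value.
growth_factor = (unique_now + planned_new) / unique_now

# Upper-bound family coverage (ASSUMPTION: every new structure adds a new family).
families_after = families_now + planned_new
coverage_low = families_after / families_total[1]   # against 50,000 families
coverage_high = families_after / families_total[0]  # against 30,000 families

print(f"unique share today: {unique_share:.1%}")
print(f"growth factor: {growth_factor:.1f}x")
print(f"family coverage (upper bound): {coverage_low:.0%} to {coverage_high:.0%}")
```

Even under these optimistic assumptions, well under half of the estimated families would have a solved representative, which is consistent with the article describing the goal as more thorough coverage rather than complete coverage.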
One catch at this early stage is that there are many different ways to group proteins into families. The five-year pilot period should provide time to determine whether any particular method is better than the others.
The project also seeks to identify new folds. Proteins with the same fold have similar overall shapes but no detectable sequence similarity. Such proteins have the same types of structural components connected in the same order. Studying folds could reveal the physical and chemical principles that determine how proteins form their three-dimensional structures.
Scientists estimate there are only a few thousand folds, considerably fewer than the number of structure families, and only 700 of these are represented in the PDB.