JCSG Target Selection Strategy in PSI-2

JCSG is a large-scale center within the Protein Structure Initiative (PSI) Network: The target selection strategy at JCSG is aligned with the main scientific mission of PSI: making the three-dimensional atomic-level structures of most proteins easily obtainable from knowledge of their corresponding DNA sequences. This mission is divided into a primary goal of determining structures of representatives of large protein families and a secondary goal of solving multiple representatives from specific families of significant biomedical importance. Because the main PSI goal can be achieved only by the synergistic development of high-throughput experimental structure determination methods and theoretical protein structure prediction algorithms, an important goal of target selection is to provide examples for the development and verification of the latter. Target selection at JCSG is closely coordinated with other PSI centers through the Target Selection Committee, which is part of the PSI-BIG4.

Understanding the Central Machinery of Life: Within these overall goals, JCSG focuses on determining structures of proteins from families with broad phylogenetic distribution, especially proteins conserved between prokaryotic and eukaryotic organisms. Such proteins perform the most fundamental functions in living organisms, and mutations or deletions of proteins from this group are usually lethal or lead to serious diseases. Building a full molecular catalog of structures from the central machinery of life would significantly add to our understanding of life and of the molecular mechanisms of fundamental biological processes. In addition, studying the evolution of the structures and functions of such proteins would allow us to understand mechanisms of change, adaptation and divergence in living organisms.

Aiming at the full structural coverage of a simple model organism: In the first phase of PSI, JCSG made significant progress in genome-wide structural coverage of the hyperthermophilic bacterium Thermotoga maritima. Thanks to the efforts of JCSG and other structural genomics centers, T. maritima is now one of the genomes with the most complete structural coverage. A continuously updated report on the structural coverage of T. maritima is available online at http://ffas.burnham.org/ffas-cgi/cgi/tm_cov.pl.


We plan to continue our T. maritima effort and, in collaboration with genome sequencing centers and individual research groups interested in this organism, to complete the structural coverage by targeting homologs from other Thermotoga species.

JCSG target genomes: To make full use of high-throughput structure determination technology, JCSG targets homologs from alternative model organisms to achieve structural coverage of targeted protein families. The constantly growing list of target genomes currently includes over 100 Bacteria and Archaea. The list of JCSG target genomes is available here.

Primary goal of PSI - structural coverage of large protein families: The bioinformatics groups of the PSI centers, in collaboration with a broad community of researchers from the sequence and structure analysis fields, are working on a structure-centric definition of a protein family. The first community-wide meeting on this topic will take place in Bethesda, MD, on June 26-27, 2006 (http://www.nigms.nih.gov/News/Meetings/PSI-TargetSelection2006.htm). In the meantime, 1269 PFAM families without structural coverage have been selected as PSI structure determination targets and divided between the four large-scale PSI centers. JCSG was assigned 271 families from this group and has so far solved 22 of them. An additional group of 397 new families has been identified by the bioinformatics groups of the PSI centers and will be divided between the centers in May 2006. Many of the families from this group have homologs in both prokaryotes and eukaryotes and/or in T. maritima, and thus also fit the specific research goals of JCSG.
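The family counts quoted above can be put into proportion with a few lines of arithmetic. The sketch below uses only the numbers given in the text; the variable names are illustrative and not part of any JCSG software.

```python
# Counts quoted in the text above.
total_families = 1269   # PFAM families without structural coverage selected as PSI targets
jcsg_assigned = 271     # families assigned to JCSG
jcsg_solved = 22        # families JCSG has solved so far

# Derived quantities (illustrative names, not JCSG terminology).
jcsg_share = jcsg_assigned / total_families
solved_fraction = jcsg_solved / jcsg_assigned
remaining = jcsg_assigned - jcsg_solved

print(f"JCSG share of selected families: {jcsg_share:.1%}")
print(f"Assigned families solved so far: {solved_fraction:.1%}")
print(f"Assigned families still unsolved: {remaining}")
```

In other words, JCSG holds roughly a fifth of the selected families and has solved a little under a tenth of its allocation.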

Secondary goal of PSI - fine-grained structural coverage of specific protein families: Multiple proteins from selected protein families are targeted to provide more detailed information about structural divergence within the family. The main reason for this is to gain more information about biomedically important protein families, but also to provide material for the improvement of modeling methods. JCSG aims to spend approximately 30% of its effort on the fine-grained structural coverage goals.


JCSG target selection strategy - optimizing for success: The Protein Structure Initiative has revolutionized many aspects of structural biology research, among them access to data on structure determination, including information on failed attempts. This in turn has allowed large-scale data mining and machine learning to identify physicochemical features of proteins that correlate with success in structure determination. Our analysis suggests that in most protein families only a small percentage of proteins can be successfully crystallized without extensive sequence modification. In selecting individual protein targets for structure determination, JCSG attempts to identify the proteins most likely to yield well-diffracting crystals. Such proteins are distributed relatively broadly across different organisms, and while some organisms have a much higher percentage of crystallizable proteins, a large number of target genomes is needed to assure an optimal choice of targets from every interesting protein family (see JCSG target genomes).
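The idea of choosing, for each family, the homolog most likely to crystallize from a pool of genomes can be sketched as follows. This is a minimal illustration rather than the JCSG implementation: the `Candidate` record, the `pick_targets` function, the feasibility score and all sample values are hypothetical, and the score simply stands in for whatever predictor is derived from the physicochemical features mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    family: str      # protein family identifier (hypothetical)
    genome: str      # source genome (hypothetical)
    protein_id: str  # protein identifier (hypothetical)
    score: float     # assumed crystallization-feasibility score, higher is better


def pick_targets(candidates):
    """Return the highest-scoring candidate for each family."""
    best = {}
    for cand in candidates:
        current = best.get(cand.family)
        if current is None or cand.score > current.score:
            best[cand.family] = cand
    return best


# Made-up example data: two families, homologs spread over three genomes.
candidates = [
    Candidate("FAM_A", "genome_1", "p1", 0.20),
    Candidate("FAM_A", "genome_2", "p2", 0.75),
    Candidate("FAM_B", "genome_1", "p3", 0.40),
    Candidate("FAM_B", "genome_3", "p4", 0.35),
]

targets = pick_targets(candidates)
for family, cand in sorted(targets.items()):
    print(family, cand.genome, cand.protein_id, cand.score)
```

With a larger genome pool, each family has more candidates to choose from, which is why a broad set of target genomes improves the chance of finding a tractable representative.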

Target feasibility categories based on the analysis of TargetDB


Distribution of target feasibility categories in JCSG genome pool

Development of automated HT technologies for all steps, from gene to structure:
In the first phase of PSI, the JCSG team developed a robust high-throughput pipeline for target selection, expression, purification, crystallization and structure determination through the development and application of a wide range of new technologies. The success of the JCSG pipeline rests on high-throughput approaches for every experimental and computational step in the structural genomics process. This pipeline is now being improved and expanded.

JCSG production pipeline

Strategy: The T. maritima full-genome analysis has provided sufficient flow through the pipeline to facilitate its assembly and testing. This high volume of targets has helped the development not only of automation for the individual process stages but also of global pipeline processes such as target tracking, process optimization, and information management.

