JCSG Technologies

Our high-throughput Structural Genomics Pipeline has to date delivered more than 1,500 structures to the community. Targets are processed through an extensive combination of bioinformatics and biophysical analyses that efficiently characterize and optimize each target prior to selection for structure determination, with parallel processing methods used at almost every step in the process. The pipeline can adapt to a wide range of targets from bacteria to human, including challenging targets such as eukaryotic proteins and protein-protein and other macromolecular complexes. Processing such large numbers of targets, and the enormous amounts of associated data, through the multiple stages of our experimental pipeline has resulted in the development of innovative methods and tools at strategic stages in our gene-to-structure platform and has led to the functional characterization of numerous targets. These resources, when feasible, have been converted to free-access, web-based tools and applications that include the XtalPred, Structure Validation and Ligand Database servers and the TOPSAN annotation portal (www.topsan.org). We believe that these resources are of high value to the general scientific community, and we welcome feedback and comments.

From the outset, the JCSG has been committed to the development of new technologies and methodologies that facilitate high-throughput structural biology and push the frontiers of structural genomics. The areas of development include hardware, software, new experimental methods, and adaptation of existing technologies to advance genome research. In the hardware arena, our commitment is to the development of technologies that accelerate structure solution by increasing throughput rates at every stage of the production pipeline. Therefore, one major area of hardware development has been the implementation of robotics. In the software arena, we have developed enterprise resource software that tracks successes, failures, and sample histories from target selection to PDB deposition, as well as annotation and target management tools and helper applications aimed at facilitating and automating multiple steps in the pipeline.


TARGET SELECTION

Genome Pool Strategy: Even closely homologous proteins often have different crystallization properties and propensities. This observation can be used to introduce an additional dimension into crystallization trials by simultaneously targeting multiple homologs in what we call a genome pool strategy. We show that this strategy works because protein physicochemical properties correlated with crystallization success have a surprisingly broad distribution within most protein families. There are also easy and difficult families where this distribution is tilted in one direction. This leads to uneven structural coverage of protein families, with more of the easy ones solved. Increasing the size of the genome pool can improve the chances of solving the difficult ones. In contrast, our analysis does not indicate that any specific genomes are easy or difficult. Finally, we show that the group of proteins with known 3D structures is systematically different from the general pool of known proteins and we assess the structural consequences of these differences.

Publication: Lukasz Jaroszewski, Lukasz Slabinski, John Wooley, Ashley M. Deacon, Scott A. Lesley, Ian A. Wilson and Adam Godzik, "Genome Pool Strategy for Structural Coverage of Protein Families", Structure 11, 1659-1667 (2008). Pubmed: 19000818

XtalPred Server: XtalPred is a web server for prediction of protein crystallizability. The prediction is made by comparing several features of the protein with distributions of these features in TargetDB and combining the results into an overall probability of crystallization. XtalPred provides: (1) a detailed comparison of the protein's features to the corresponding distribution from TargetDB; (2) a summary of protein features and predictions that indicate problems that are likely to be encountered during protein crystallization; (3) prediction of ligands; and (4) (optional) lists of close homologs from complete microbial genomes that are more likely to crystallize.

WebSite: http://ffas.burnham.org/XtalPred-cgi/xtal.pl

Publication: Slabinski L, Jaroszewski L, Rychlewski L, Wilson IA, Lesley SA, & Godzik A., "XtalPred: a web server for prediction of protein crystallizability", Bioinformatics 23, 3403-3405 (2007). Pubmed: 17921170
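The idea of comparing a protein feature against a reference distribution and combining per-feature results can be illustrated with a short sketch. This is not the XtalPred algorithm: the two features (sequence length and Kyte-Doolittle GRAVY), the bin edges, the per-bin success rates and the geometric-mean combination are all hypothetical choices made for illustration only.

```python
from bisect import bisect_right

# Kyte-Doolittle hydropathy scale (standard published values).
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}

def gravy(seq):
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    return sum(KD[aa] for aa in seq) / len(seq)

def binned_rate(value, edges, rates):
    """Look up the rate of the bin that contains value.

    edges are ascending bin boundaries; rates has len(edges) + 1 entries.
    """
    return rates[bisect_right(edges, value)]

# HYPOTHETICAL reference data: fraction of targets per bin that crystallized.
# Real values would have to be derived from a target database.
LENGTH_EDGES = [100, 300, 600]
LENGTH_RATES = [0.30, 0.45, 0.35, 0.15]
GRAVY_EDGES = [-0.5, 0.0]
GRAVY_RATES = [0.40, 0.35, 0.15]

def combined_score(seq):
    """Combine the per-feature rates with a geometric mean (an assumption)."""
    p_len = binned_rate(len(seq), LENGTH_EDGES, LENGTH_RATES)
    p_gravy = binned_rate(gravy(seq), GRAVY_EDGES, GRAVY_RATES)
    return (p_len * p_gravy) ** 0.5
```

Calling `combined_score(sequence)` returns a single number between 0 and 1 for a protein sequence written in one-letter code; with real reference data, more features and a calibrated combination rule would be needed.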


The Polymerase Incomplete Primer Extension (PIPE) cloning: Successful protein expression, purification, and crystallization for challenging targets typically requires evaluation of a multitude of expression constructs. Often many iterations of truncations and point mutations are required to identify a suitable derivative for recombinant expression. Making and characterizing these variants is a significant barrier to success. We have developed a rapid and efficient cloning process and combined it with a protein microscreening approach to characterize protein suitability for structural studies. The Polymerase Incomplete Primer Extension (PIPE) cloning method was used to rapidly clone 448 protein targets and then to generate 2143 truncations from 96 targets with minimal effort. Proteins were expressed, purified, and characterized via a microscreening protocol, which incorporates protein quantification, liquid chromatography mass spectrometry and analytical size exclusion chromatography (AnSEC) to evaluate suitability of the protein products for X-ray crystallography. The results suggest that selecting expression constructs for crystal trials based primarily on expression solubility is insufficient. Instead, AnSEC scoring as a measure of protein polydispersity was found to be predictive of ultimate structure determination success and essential for identifying appropriate boundaries for truncation series. Overall structure determination success was increased by at least 38% by applying this combined PIPE cloning and microscreening approach to recalcitrant targets.

Publication: Heath E. Klock, Eric J. Koesema, Mark W. Knuth, Scott A. Lesley, "Combining the polymerase incomplete primer extension method for cloning and mutagenesis with microscreening to accelerate structural genomics efforts", Proteins: Structure, Function, and Bioinformatics, 71(2), 982-994 (2008). Pubmed: 18004753
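The figures above (2143 truncations from 96 targets) correspond to roughly 22 constructs per target. A truncation series of that size can be enumerated with a few lines of code. The sketch below only lists construct boundaries; it does not design PIPE primers, and the step size, maximum trim and minimum construct length are hypothetical parameters, not values from the publication.

```python
def truncation_series(seq_len, step=5, max_trim=20, min_len=50):
    """Enumerate (start, end) residue ranges for N- and C-terminal truncations.

    Residues are numbered from 1. The full-length construct is excluded,
    and constructs shorter than min_len residues are skipped.
    All default parameter values are hypothetical.
    """
    constructs = []
    for n_trim in range(0, max_trim + 1, step):
        for c_trim in range(0, max_trim + 1, step):
            start, end = 1 + n_trim, seq_len - c_trim
            is_truncated = n_trim > 0 or c_trim > 0
            if is_truncated and end - start + 1 >= min_len:
                constructs.append((start, end))
    return constructs
```

With these defaults a 200-residue target yields 24 truncation constructs, close to the per-target average implied by the numbers quoted above.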

Microexpression System: Small-scale expression provides enhanced screening capability, as many more clones can be evaluated to identify targets, as well as truncations or mutations, that either fail to express or express in the insoluble fraction. A low-cost, high-velocity incubating commercial shaker has been adapted for high-throughput E. coli expression screening to accurately predict large-scale protein behavior. Cultures (~750 µL) are grown in deep-well 96-well blocks to optical densities (O.D.) of up to 10-20, which enables evaluation of expression and solubility via small-scale purification by IMAC. Moreover, this screening strategy can be adapted for SeMet or 15N/13C-labeled expression. Of the soluble targets produced in the micro-expression device, 97% correlate with successful expression in large-scale fermentation. This device is suited for both nanocrystallization trials and NMR screening for protein folding.

Publication: Heath E. Klock, Eric J. Koesema, Mark W. Knuth & Scott A. Lesley, "Combining the polymerase incomplete primer extension method for cloning and mutagenesis with microscreening to accelerate structural genomics efforts", Proteins: Structure, Function, and Bioinformatics 71, 982-994 (2008). Pubmed: 18004753

Cloning Robotics: A large number of expression clones must be generated within the pipeline to accommodate the number of targets, expression systems and variants for each gene targeted. Many options for creating such expression clones were evaluated, including recombinatorial (Gateway/Echo) and topoisomerase treated systems. To maximize flexibility and minimize cost, we chose to automate a conventional cloning approach. We developed a robotic platform, which incorporates liquid and plate handling, with thermocyclers and a plate reader, and demonstrated the capacity to provide up to 384 validated expression clones per week, which is sufficient to meet our pipeline needs. To date, over 2500 total expression clones have been generated with this system by a single operator.

Large-scale bacterial expression: Protein expression has primarily been performed in E. coli. To allow expression at a scale sufficient for crystallization trials, we developed a parallel fermentation system (GNFermentor) for 96-culture high-density cell growth that produces 2-4 g of cell pellet. Pre-induction O.D. values vary by only 5% between individual cultures, highlighting the importance of the tightly regulated expression system (arabinose) that we employ. To date, over 30,000 individual samples have been processed through this system, demonstrating its robust nature.

Publication: Kreusch, A. and Lesley, S. A., "High-Throughput Cloning, Expression, and Purification Technologies.", Genomics, Proteomics, and Vaccines, ed. G. Grandi, Wiley Press, UK, 171-184 (2004).

Automated mammalian and insect cell expression: A system for moderate-scale expression and purification of proteins from mammalian and insect cells provides an effective and economic means for evaluating construct diversity. Expression of eukaryotic proteins often benefits from the inclusion of mammalian and baculovirus expression platforms. These platforms are tedious, expensive and difficult to run in parallel. The Protein Expression and Purification Platform (PEPP) developed at GNF provides a robust high-throughput vehicle for evaluating expression constructs for crystallization.

Automated affinity purification: Processing of the resulting cell pellets through affinity purification is performed with custom automation (GNFuge). Fermentation tubes are directly processed in the GNFuge, for the steps of lysis, removal of cell debris and affinity purification. The resulting affinity purified proteins can then be processed by secondary purification or can be advanced directly to crystallization screening.

Publication: Lesley, SA, "High-throughput proteomics: protein expression and purification in the postgenomic world.", Protein Expr. Purif. 22(2):159-64, 2001. Pubmed: 11437590

Secondary purification: Purification beyond affinity steps is achieved using standard commercial instrumentation, which has been configured for automated large-scale purification. By integrating a custom valve configuration and an air sensor with the Akta Purifier systems (GE Healthcare), we can achieve automatic loading and processing of up to 12 samples, without the limitations on initial sample volume imposed by commercial autosamplers. With three such systems online, our demonstrated capacity for secondary purification is approximately 48-96 proteins per week at a 10-50 mg scale.


Biophysical characterization of samples is a critical component of our pipeline process that provides guidance for target strategies, and metrics for evaluating the various pipeline components. However, performing such characterization on a large number of targets has serious implications for pipeline throughput. The JCSG has devoted significant effort towards developing HT approaches to protein characterization and to gathering and tracking this information for thousands of samples. The volume of data is enormous and has emphasized the need for active target management to take advantage of such knowledge as it arises. These biophysical data are also of tremendous value to the scientific community and for collaborative functional studies.

Multiparametric Biophysical Protein Characterization: Biophysical parameters currently collected for each target are:

Parameter                     Methodology
Toxicity during expression    Final optical density
Cofactor binding              UV/Vis absorbance scan
Protein concentration         Bradford
Protein purity                SDS-PAGE
Isoelectric point             IEF gel electrophoresis
Protein fingerprinting        Tryptic Mass Spectrometry
Thermostability               Differential Scanning Calorimetry
Polydispersity/Native Mw      Analytic Size Exclusion Chromatography
Metal binding                 X-ray Absorption Fine-Structure Spectroscopy


Nano-drop crystallization and crystallization robotics: Nano-drop crystallization technologies were pioneered by members of the JCSG using custom robotics initially developed at Syrrx/GNF, and the JCSG was the first center worldwide to apply such technology. Although many researchers were skeptical that nanoliter volumes would yield diffraction-quality crystals, we have routinely utilized these technologies in PSI-1 to screen crystallization conditions and generate diffraction-quality crystals. Both custom and commercial instrumentation are currently in use for our crystallization trials. Currently, at GNF, our imaging is performed using two custom robotic platforms located in constant-temperature 4°C and 20°C rooms with capacity for 1536 plates. Plates are assigned an imaging schedule and are automatically screened, typically at 7, 14 and 28 days. To date, over 3,000,000 images have been generated from these imagers. The TSRI facility utilizes a Veeco imager and plates are manually tracked for imaging. The new Robodesign platform to be installed at TSRI will have capacity for 4000 plates at up to 6 temperatures and will utilize a fully automated imaging schedule and image analysis software package.
A fully automated crystallization platform, the CrystalMation crystallization robot from Rigaku Automation, is in use at The Scripps Research Institute (see below).

Publication: Santarsiero BD, Yegian DT, Lee CC, Spraggon G, Gu J, Scheibe D, Uber DC, Cornell EW, Nordmeyer RA, Kolbe WF, Jin J, Jones AL, Jaklevic JM, Schultz PG, Stevens RC (2002) "An approach to rapid protein crystallization using nanodroplets.", Journal of Applied Crystallography, 35, 278-281.


Rigaku Automation CrystalMation Platform: We have extended our leadership role in crystallization methodology by aiding in the development of a fully integrated, next-generation crystallization robot in collaboration with Rigaku. Our CrystalMation system is the largest fully integrated HT crystallization platform in the U.S. and covers all steps, from custom screen making through automated crystallization trials to imaging and analysis, with high reliability and reproducibility. The system can set up 100 96-well crystallization plates every 8 hours, with a total capacity of 4000 plates/month on the current imaging schedule. Many crystallization conditions are screened using only 120 µl of sample for 384 solution conditions at 2 temperatures. Analysis of crystallization results and crystal harvesting remain largely manual tasks, yet we have screened >9,000,000 crystallization images and harvested >124,000 crystals.

CrystalMation™: Protein crystal growth automation process

CrystalTrak™ database application and automation control
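The throughput figures quoted for the CrystalMation platform can be turned into per-drop and per-hour numbers with simple arithmetic. The sketch below assumes that the 120 µl of sample covers all 384 conditions at both temperatures and ignores dead volume, so the volume per drop is an upper bound; neither assumption is stated in the text.

```python
# Figures quoted in the text.
PLATES_PER_RUN = 100      # plates set up per run
RUN_HOURS = 8             # duration of one run in hours
WELLS_PER_PLATE = 96
SAMPLE_UL = 120           # sample volume per target in microliters
CONDITIONS = 384          # solution conditions screened per target
TEMPERATURES = 2

def drops_per_target():
    """Number of drops set up for one target (one per condition and temperature)."""
    return CONDITIONS * TEMPERATURES

def max_nl_per_drop():
    """Upper bound on protein volume per drop in nanoliters (no dead volume)."""
    return SAMPLE_UL * 1000 / drops_per_target()

def plates_per_target():
    """96-well plates needed for one target across both temperatures."""
    return drops_per_target() // WELLS_PER_PLATE

def wells_per_hour():
    """Crystallization wells set up per hour during a run."""
    return PLATES_PER_RUN * WELLS_PER_PLATE / RUN_HOURS
```

Under these assumptions each target uses 768 drops of at most about 156 nl on 8 plates, and the robot sets up 1200 wells per hour.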


To fulfill the demands of the JCSG HT structure determination pipeline, it was clear at the outset that an automated crystal screening capability would be a vital asset. The JCSG pipeline is currently producing in excess of 500 crystals per month for diffraction screening. X-ray screening forms a critical feedback loop, which is used by the Crystallomics Core (CC) to identify promising targets and crystallization conditions. Manual mounting and dismounting of crystal samples at the beam line is a labor-intensive task, which wastes significant beam time and is prone to human error. The Structure Determination Core (SDC) has co-developed a completely automated crystal screening system in close collaboration with the core Structural Molecular Biology group at SSRL, which meets the needs of both the JCSG and the wider structural biology community. The key features are:

Compact crystal cassette: Secure crystal transport and storage is accomplished via a compact, cylindrical, aluminum crystal cassette, which holds 96 crystals. Crystals are mounted on standard Hampton Research sample pins. Two cassettes can fit inside a standard vapor shipping dewar and twenty cassettes can be held inside a Taylor-Wharton HC-35 storage dewar. JCSG crystals are shipped exclusively using these cassettes. This system has been very robust and reliable. Kits of cassettes with loading and handling tools have been fabricated and distributed to SSRL users.

WebSite: http://smb.slac.stanford.edu/public/facilities/hardware/cassette_kit/

Publication: Cohen AE, Ellis PJ, Miller MD, Deacon AM, Phizackerley RP. "An automated system to mount cryo-cooled protein crystals on a synchrotron beam line, using compact sample cassettes and a small-scale robot.", J Appl Crystallogr, 35: 720-726 (2002).


Stanford Auto-Mounter (SAM): The Stanford Auto-Mounter (SAM) was developed to allow automated screening of crystals at the synchrotron. Individual crystals are mounted onto the beam line for screening using the SAM system. Three sample cassettes are held under liquid nitrogen in a dispensing dewar, which is located close to the goniometer, inside the experimental hutch. A commercial Epson ES553S 4-axis robot, outfitted with a pneumatically operated cryo-tong, removes samples from the cassette and places them on the goniometer. The SAM system also allows sorting of crystals from one cassette to another. Thus, the most promising crystals can be consolidated into a single cassette prior to data collection. The sorting facility is now in a prototype stage and will be developed into a full user system in the near future. SDC has fully integrated the SAM system with the existing macromolecular crystallography beam line environment by implementing a user interface within the BLU-ICE data collection software. The system also communicates with the JCSG database via a "beam line report", which is an Excel spreadsheet describing the crystals in each shipment. By the end of 2008, more than 85% of SSRL PX experimenters were using the SAM system.

WebSite: http://smb.slac.stanford.edu/public/facilities/hardware/SAM/

Publication: Cohen AE, Ellis PJ, Miller MD, Deacon AM, Phizackerley RP. "An automated system to mount cryo-cooled protein crystals on a synchrotron beam line, using compact sample cassettes and a small-scale robot.", J Appl Crystallogr, 35: 720-726 (2002).

Sample visualization and loop alignment system: Reliable centering of the sample in the X-ray beam is an essential step for automatic screening and requires good sample illumination and imaging. A high-quality visualization system was developed by SDC on BL11-1 at SSRL and replicated on all other beam lines. The system is composed of a Navitar 12x lens system with a large depth of field. The lens system is coupled to an Optronics CCD camera and images are digitized via an Axis 2400 web-based image server. A bright, diffuse backlight provides high-contrast images for loop alignment. However, the long working distance creates shadows inside the loop, which sometimes make it difficult to visualize the actual crystal. In the future, we plan to upgrade the lighting system. SDC has developed a software protocol, which uses standard edge detection techniques to align the sample and its loop with the X-ray beam. Since a fairly large beam (0.25 x 0.25 mm) is used for crystal screening, this approximate alignment of the actual crystal is adequate for automated screening. The entire alignment procedure takes ~30 seconds with >95% reliability. Each crystal is mounted and aligned with the X-ray beam. A visual JPEG image of the crystal and a corresponding diffraction image (typically 15 seconds exposure) are collected at two crystal orientations, 90° apart. A cassette of 96 crystals can be screened without human intervention in ~5 hours.
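The text does not say which edge detection technique the alignment protocol uses. As a minimal sketch of the general idea, the code below thresholds a central-difference intensity gradient on a backlit grayscale image, takes the centroid of the edge pixels as the loop position, and reports the offset from the beam position. The threshold value and the centroid heuristic are assumptions made for illustration.

```python
def edge_pixels(img, threshold):
    """Return (x, y) pixels whose central-difference gradient exceeds threshold.

    img is a list of rows of grayscale intensities; border pixels are skipped.
    """
    height, width = len(img), len(img[0])
    edges = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = img[y][x + 1] - img[y][x - 1]
            gy = img[y + 1][x] - img[y - 1][x]
            if (gx * gx + gy * gy) ** 0.5 > threshold:
                edges.append((x, y))
    return edges

def centering_offset(img, beam_xy, threshold=50):
    """Offset in pixels from the beam position to the centroid of the edges.

    Returns None when no edge is found (for example, an empty pin).
    The default threshold is a hypothetical value.
    """
    edges = edge_pixels(img, threshold)
    if not edges:
        return None
    cx = sum(x for x, _ in edges) / len(edges)
    cy = sum(y for _, y in edges) / len(edges)
    return (cx - beam_xy[0], cy - beam_xy[1])
```

The returned pixel offset would still have to be converted to goniometer motor moves using the camera calibration, which is outside the scope of this sketch.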


The majority of the JCSG data collection has been conducted on the macromolecular crystallography beam lines at SSRL. The SSRL storage ring, the Stanford Positron Electron Asymmetric Ring (SPEAR), was recently upgraded to 3rd generation synchrotron capabilities and now offers increased brightness and higher operating ring current. All protein crystallography beam lines have benefited from the upgrade and typical exposure times have been significantly reduced. During the SPEAR-3 upgrade from April 2003 to March 2004 and also during shorter SSRL maintenance shutdowns, JCSG data were collected at the Advanced Light Source (ALS) and the Advanced Photon Source (APS). A program proposal provided time at APS (distributed over: SBC-CAT, BIO-CARS and NE-CAT) and a Memorandum of Understanding provided regular access at ALS. During these shutdown periods, the SAM system was used with an X-ray microsource generator to pre-screen crystals before trips to remote beamlines.

Automated MAD data collection with BLU-ICE: JCSG has contributed to the ongoing development of the BLU-ICE data collection software at SSRL. In addition to the new crystal screening capabilities (described above), BLU-ICE now supports completely automated execution of MAD data collection. Suitable energies for the MAD experiment are derived automatically from a Kramers-Kronig analysis of the fluorescence scan. The energies are imported directly into the Data Collection Tab in BLU-ICE. All wavelength changes are conducted automatically and the X-ray beam intensity is optimized at each change. In addition, hardware upgrades on the wiggler side-station beam lines now support MAD experiments. The experimental table is mounted on a reproducible slide that can track the deflection of the X-ray beam at different energies. A dose mode exposure time normalizes the beam intensity across all wavelengths and data collection is paused automatically if the storage ring beam is lost.

WebSite: http://smb.slac.stanford.edu/public/facilities/software/blu-ice/

Publication: McPhillips TM, McPhillips SE, Chiu HJ, Cohen AE, Deacon AM, Ellis PJ, Garman E, Gonzalez A, Sauter NK, Phizackerley RP, Soltis SM, Kuhn P. "Blu-Ice and the Distributed Control System: software for data acquisition and instrument control at macromolecular crystallography beamlines". Pubmed: 12409628
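Once the anomalous scattering factors f' and f'' have been derived from the fluorescence scan, choosing MAD energies follows a common convention: the peak energy is where f'' is largest, the inflection energy is where f' is most negative, and a remote energy is taken away from the edge. The sketch below applies that convention to precomputed curves. It does not perform the Kramers-Kronig transform itself, it is not the BLU-ICE implementation, and the remote offset is a hypothetical default.

```python
def choose_mad_energies(energies, f_prime, f_double_prime, remote_offset_ev=300.0):
    """Pick peak, inflection and remote energies (eV) from f' and f'' curves.

    energies, f_prime and f_double_prime are equal-length sequences sampled
    across the absorption edge. remote_offset_ev is a hypothetical default.
    """
    if not (len(energies) == len(f_prime) == len(f_double_prime)):
        raise ValueError("input sequences must have the same length")
    indices = range(len(energies))
    peak_i = max(indices, key=lambda i: f_double_prime[i])   # maximum f''
    inflection_i = min(indices, key=lambda i: f_prime[i])    # minimum f'
    return {
        "peak": energies[peak_i],
        "inflection": energies[inflection_i],
        "remote": energies[peak_i] + remote_offset_ev,
    }
```

For example, with toy values sampled near an absorption edge, `choose_mad_energies([12650, 12655, 12658, 12660, 12665], [-5.0, -7.5, -10.2, -8.0, -6.0], [0.5, 1.0, 3.5, 5.8, 4.0])` selects 12660 eV as the peak and 12658 eV as the inflection point.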

Remote data collection: With the SAM system in full operation, the complete diffraction experiment can be initiated remotely. Thus, JCSG can capitalize on remote-access developments, which were mainly funded through NIH-NCRR support for the Structural Molecular Biology activity at SSRL. A staff member is required at the beam line only to change one of the three crystal cassettes or to perform manual hardware maintenance. Live video feeds from the beam line are now incorporated into BLU-ICE, which further helps diagnose problems remotely. As a result, it is now possible to run and monitor the beam line from a remote location, such as an office or home. These features greatly reduce the personnel requirements for JCSG data collection experiments. These facilities are also available to the general PX users at SSRL and have become very popular, such that >75% of PX experiments are now conducted remotely.

WebSite: http://smb.slac.stanford.edu/public/facilities/hardware/cassette_kit/

Publication: Soltis SM, Cohen AE, Deacon A, Eriksson T, González A, McPhillips S, Chui H, Dunten P, Hollenbeck M, Mathews I, Miller M, Moorhead P, Phizackerley RP, Smith C, Song J, van den Bedem H, Ellis P, Kuhn P, McPhillips T, Sauter N, Sharp K, Tsyba I, Wolf G. "New paradigm for macromolecular crystallography experiments at SSRL: automated crystal screening and remote data collection." Acta Crystallogr D Biol Crystallogr, 64: 1210-1221 (2008). Pubmed: 19018097


SDC has developed tools to automate the analysis of crystallographic data. The system includes an electronic notebook, which records all diffraction experiments, and Xsolve, a Linux-based parallel processing environment.

Xsolve: Xsolve can execute all crystallographic data processing and MAD structure determination steps. Xsolve also prepares a standard set of files for upload to the Structure Solution Tracking System (SSTS), which provides a direct interface to the JCSG database. Xsolve allows parallel processing of structure determination tasks using a variety of established crystallographic applications. The Xsolve system has a flexible and open architecture so that new versions of applications can readily be upgraded and newly emerging programs can easily be incorporated. In this way, SDC can quickly capitalize on developments made by the wider crystallographic community. Xsolve performs all processing steps including initial indexing of a diffraction image, integration, scaling, phase determination, phase improvement and initial model building. The system has been optimized to provide high quality results for direct upload to the JCSG central database.
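The combination of a fixed sequence of processing steps with interchangeable applications and parallel execution can be sketched as follows. This is not Xsolve code: the step names are taken from the list above, but the `tools` mapping of step name to callable, the state dictionary and the use of a thread pool are hypothetical design choices for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Processing steps in the order listed in the text.
STEPS = ["index", "integrate", "scale", "phase", "improve_phases", "build_model"]

def run_pipeline(dataset, tools):
    """Run every step in order on one dataset.

    tools maps each step name to a callable that takes and returns a state
    dict; swapping a callable swaps the underlying application (hypothetical).
    """
    state = {"dataset": dataset, "log": []}
    for step in STEPS:
        state = tools[step](state)
        state["log"].append(step)
    return state

def run_all(datasets, tools, workers=4):
    """Process several datasets in parallel; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda d: run_pipeline(d, tools), datasets))
```

In a real system each callable would launch an external crystallographic program and parse its output; keeping the steps behind a simple mapping is one way to make it easy to upgrade or replace individual applications.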

Customized scripts: SDC has also developed several in-house scripts to prototype new programs and allow rapid data processing at various remote synchrotron sources. These scripts are made available to regular users at SSRL. One script provides automatic data reduction and structure solution via XDS and Solve, and another provides an easy interface to structure determination via SHELX and Solve.

Molecular Replacement pipeline: The JCSG has also developed a highly parallelized Molecular Replacement (MR) pipeline that facilitates all steps in MR structure solution, including homology detection, model preparation, MR searches and automated refinement and rebuilding. Processed diffraction data are fed into the MR system directly from Xsolve. Search models are based on sequence alignments generated using the profile-profile alignment method implemented in the FFAS03 system. In collaboration with the research groups at Burnham and UCSD, the JCSG team has used improved alignment and modeling tools and massive computer power to push MR beyond the traditional limits. In general, MR solutions are seldom attempted (and are even less often successful) against templates with less than 35% sequence identity. To date, the JCSG MR pipeline was successfully applied to over 26 cases with less than 35% sequence identity, 10 cases with less than 30% and several cases where sequence identity was close to 15%. Our analysis shows that fold recognition models have a significantly higher success rate, especially when the unknown structure and the search model share less than 35% sequence identity. Using MOLREP and EPMR, 3 out of 26 MR targets under 35% sequence identity could only be solved with models derived from fold recognition methods and 6 showed significantly better statistics and behavior in subsequent refinement.

Publication: Schwarzenbacher R, Godzik A, Grzechnik SK, Jaroszewski L (2004). "The importance of alignment accuracy for molecular replacement.", Acta Crystallogr D Biol Crystallogr, 60: 1229-1236. Pubmed: 15213384

Publication: Schwarzenbacher R, Godzik A, Jaroszewski L. (2008) "The JCSG MR pipeline: optimized alignments, multiple models and parallel searches.", Acta Crystallogr D Biol Crystallogr., 64(Pt 1): 133-140. Pubmed: 18094477
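The 35% sequence identity threshold discussed above depends on how identity is computed from the target-template alignment. The sketch below uses one common definition, identical residues divided by the number of aligned (non-gap) columns; other definitions exist, and the text does not state which one the JCSG pipeline uses. The `is_low_identity` helper and its default cutoff simply restate the 35% figure from the text.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither sequence has a gap.

    aligned_a and aligned_b are rows of a pairwise alignment and must have
    the same length. Returns 0.0 if no column is aligned.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != gap and b != gap]
    if not pairs:
        return 0.0
    identical = sum(1 for a, b in pairs if a == b)
    return 100.0 * identical / len(pairs)

def is_low_identity(aligned_a, aligned_b, cutoff=35.0):
    """True when a template falls below the traditional MR identity limit."""
    return percent_identity(aligned_a, aligned_b) < cutoff
```

For the toy alignment `ACDE-G` against `ACDFHG`, four of the five aligned columns match, giving 80% identity.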


As the JCSG structure solution rate has increased, a bottleneck has developed at the model building and refinement stages. A collaboration with Anastassis Perrakis and the ARP/wARP development team is improving the initial models built by Xsolve, and an internal methods development effort at SDC is addressing subsequent model completion. A network of JCSG scientists was established to perform structure refinement. In order to ensure uniform quality standards for all JCSG structures, a formal internal Quality Control (QC) step was introduced prior to structure deposition in the PDB. From the early structures submitted for QC analysis, a detailed set of refinement guidelines was developed, which has standardized the refinement protocol for all JCSG structures. All JCSG refinement is carried out with the latest version of Refmac. TLS parameters, a riding hydrogen model and NCS restraints are evaluated for their impact on the R-free. Experimental phase restraints are always included when available. Whatcheck, ADIT and PDB deposition tools, and Molprobity are used to validate the structure. Missing atoms and unknown ligands are treated in a uniform way. Residue numbering is standardized and PDB REMARK cards are generated. Finally, before PDB deposition, all other crystals and datasets from the same target are checked for any “added value,” such as a new crystal or dataset with improved resolution or a bound ligand. Through the implementation of these refinement guidelines, both the quality and the refinement time for JCSG structures have improved and the PDB deposition process has been streamlined. QC has become an integral part of the pipeline and is no longer simply a stage related to the preparation of files for deposition to the PDB. As a result of these extensive efforts, the average quality of the JCSG structures is significantly better than the average for both the PDB as a whole and for the PSI structural genomics centers.
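Evaluating a refinement option such as TLS or NCS restraints by its impact on R-free amounts to comparing the residual on the test-set reflections with and without the option. The sketch below shows the standard R-factor formula and a simple keep-or-discard rule. It assumes the calculated amplitudes are already on the same scale as the observed ones, and the `min_gain` threshold is hypothetical; in practice these values come from the refinement program's output rather than being recomputed.

```python
def r_factor(f_obs, f_calc):
    """R = sum(|Fobs - Fcalc|) / sum(Fobs) over a set of reflections.

    Applied to the test-set reflections this gives R-free. Amplitudes are
    assumed to be on a common scale already.
    """
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    return sum(abs(o - c) for o, c in zip(f_obs, f_calc)) / sum(f_obs)

def keep_option(r_free_without, r_free_with, min_gain=0.0):
    """Keep a refinement option only if it lowers R-free by more than min_gain."""
    return (r_free_without - r_free_with) > min_gain
```

For example, an option that changes R-free from 0.245 to 0.238 would be kept under the default rule, while one that raises it from 0.240 to 0.241 would not.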

Validation Suite: Prior to deposition in the Protein Data Bank, the quality of JCSG crystal structures is validated using the JCSG Quality Control Server. This server processes the coordinates and data through a variety of validation tools including AutoDepInputTool (Yang et al., 2004), MolProbity (Davis et al., 2004), WHATIF 5.0 (Vriend, 1990), RESOLVE (Terwilliger, 2003) and MOLEMAN2 (Kleywegt, 2000), as well as several in-house scripts, and summarizes the results.

WebSite: http://smb.slac.stanford.edu/jcsg/QC


The final stages of the JCSG pipeline involve deposition of the coordinates and associated data into the PDB. Coordinates of structures that have passed QC are combined with database-derived information about the history of the targets and the specific protocols used in structure determination, and are parsed into two mmCIF files for direct deposition with the PDB. The first mmCIF file contains all data needed to generate the release version of the PDB coordinate file. The second file contains the structure factors, the unmerged reflection intensities for all datasets used for refinement and phasing, and the experimental phases and density-modified experimental phases. The structure deposition process is largely automated and uses mmCIF-writers (command line scripts) to generate the two mmCIF files directly from data captured in the JCSG database. The process still requires some manual oversight, mostly for checking completeness and internal consistency of the annotations. All data required to complete the PDB deposition are captured in the JCSG database, and software is currently under development to complete the automation.
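The idea of an mmCIF-writer that turns database fields into a data block can be sketched in a few lines. This is not the JCSG script: the function names and the example record are hypothetical, and only simple key-value items are handled (no loops, and no values containing quote characters). The item names in the usage note are taken from the public mmCIF dictionary.

```python
def cif_value(value):
    """Format one value for a key-value mmCIF item.

    Empty values become '?', and values containing whitespace are wrapped in
    single quotes. Values that themselves contain quotes are not handled.
    """
    text = str(value)
    if text == "":
        return "?"
    if any(ch.isspace() for ch in text):
        return "'" + text + "'"
    return text

def write_block(block_id, items):
    """Return an mmCIF data block with one line per item, names left-aligned."""
    width = max(len(name) for name in items)
    lines = ["data_" + block_id]
    for name, value in items.items():
        lines.append(name.ljust(width) + "  " + cif_value(value))
    return "\n".join(lines) + "\n"
```

A call such as `write_block("TEST", {"_entry.id": "TEST", "_struct.title": "Hypothetical protein", "_refine.ls_R_factor_R_free": 0.238})` produces a small block that starts with `data_TEST`; a production writer would also need looped categories for atoms and reflections.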



In collaboration with UCSD, Burnham and ANL bioinformatics groups, JCSG has developed a unified protein structure and sequence analysis system that includes predictions about the function of proteins solved by the experimental pipeline. Elements of the system include structure similarity analysis performed by DALI, CE and FATCAT structure alignment programs, distant homology analysis performed by the FFAS profile-profile alignment program, and genome context and pathway analysis performed by the SEED system. These annotations are manually analyzed and subjected to internal discussions using a unique system of interactive annotation pages developed at JCSG. Through application of this system, functional annotations of over half of the proteins solved by JCSG, including several previously unannotated “hypothetical proteins,” have been established with high reliability and have now been entered into public databases. In addition, a functional annotation page has been created for each target, which instantly allows JCSG scientists to curate and update biological information generated during the structure determination process.

Protein Sequence Comparative Analysis System (PSCA): Target annotations can be accessed through the PSCA system. Annotations from public databases, links, and preprocessed target information are available through a tabbed user interface. Data such as fold similarity, sequence similarity, domain organization and physicochemical properties are periodically precalculated, which greatly speeds up access to the large collection of data for each target.

Wiki-based Collaborative Annotation: TOPSAN offers a combination of automatically generated and comprehensive, expert-curated annotations provided by JCSG personnel and members of the research community. TOPSAN is an experiment in open, collaborative research on proteins whose structures are being determined by Protein Structure Initiative Centers. While built on a wiki platform, TOPSAN differs significantly from the conventional application of wiki technology. Instead of performing an encyclopedic distillation of established information, TOPSAN focuses on creating new knowledge by enabling instant collaborations among distributed participants. The immediate goal of TOPSAN is to enhance the impact of the vast numbers of structures being determined in structural genomics. We also anticipate that TOPSAN as a model could facilitate the development of novel forms of continuous scientific communication and knowledge creation.

WebSite: http://www.topsan.org

Public tracking system and website: The central JCSG database provides high-level tracking of targets and production metrics. Integral to this database is an extensive body of bioinformatics data on individual targets. The public tracking system provides access to the data contained in the JCSG database and allows the extraction and filtering of specific subsets according to user-defined criteria. The JCSG website is the main public outreach and data dissemination tool. The website also plays a crucial role as an internal data dissemination and communication tool between the JCSG cores, and is one of the entry points for experimental data deposition in the JCSG database. The innovative visualization tools available via the JCSG website include a graphical view of the complete history of every target in the JCSG pipeline.

Customized tracking lists: The public tracking interface at www.jcsg.org and the XML target list deposited weekly to TargetDB are generated automatically from the database; however, they highlight only a small fraction of the total data collected by JCSG. Users can register to obtain e-mail alerts on individual targets and create personalized views of the JCSG database that focus on groups of proteins of interest.

Structure Notes: JCSG structures are shared with the scientific community not only through deposition in the PDB, but also through publication of "structure notes." Structure notes are short papers describing the annotation, biology, structure and functional implications of each protein. The process of collecting all relevant data from all stages of the JCSG pipeline has been streamlined through the central JCSG database, which includes information on the sequence, annotation, cloning, purification, crystallization, data collection, structure solution, tracing, refinement and structural evaluation. The structure note automatically captures any functional information in the JCSG annotation system (see above). The paper introduction, for example, includes annotation information, with a brief biological background taken and curated from the PFAM, Interpro, SwissProt, BRENDA, and SEED databases. Methodological and experimental data, as well as all crystallographic statistics, are automatically harvested from the JCSG database and assembled into purification, crystallization, structure solution and refinement paragraphs. The structure description and the preparation of figures are done manually using PYMOL. Structures are analyzed, compared and evaluated for biological significance using a range of structure analysis tools, including structural homology searches (DALI, CE, FATCAT), together with extensive literature searches.

Downloadable datasets: JCSG has created a unique repository of X-ray crystallographic datasets for the structures it has solved and deposited in the PDB. This archive contains the experimental and analysis data from data collection, data reduction, phasing, density modification, model building and refinement. These datasets are available as test data to the crystallographic methods development community.

WebSite: http://www.jcsg.org/datasets-info.shtml


A dedicated database was developed by the JCSG programming team. The computational development was carried out in parallel with the development of the physical production pipeline. Currently, the JCSG database connects all experimental elements in the pipeline. It interactively analyzes data at each stage and provides up-to-date information to facilitate the optimal course of action for each individual target.

Tracking Database: The central JCSG tracking database was developed from scratch in Oracle and contains >140 tables that describe 32 production stages and track >530 parameters. The interface, written mostly in Perl, includes 1,800 custom scripts, 100 user interfaces, and 30 different reports that are prepared daily in both XML and Excel formats; altogether it comprises about 360,000 lines of code.

WebSite: http://www.jcsg.org

Publication: Godzik A, Canaves J, Grzechnik S, Jaroszewski L, Morse A, Ouyang J, Wang X, West B, Wooley J. "Challenges of structural genomics: bioinformatics." Biosilico 1: 36-41 (2003).

Laboratory Information Management System: The JCSG database contains a Laboratory Information Management System (LIMS) capable of tracking every step from target activation to structure solution, refinement and deposition. This system has submenus specifically tailored to the needs of each core. The LIMS collects information, tracks materials, provides data entry and visualization interfaces, and functions as a central hub that directs the flow of information within JCSG.


The ability to mine data from a consistent process is invaluable for optimizing our pipeline. Since our targets are processed using similar methods and materials, often in parallel, more insightful comparisons can be made than from extracting equivalent data from the literature. Furthermore, the large number of targets processed, as well as their diverse nature, makes identification of general principles more valid.

Analysis of PCR amplification success rates: The feedback from analysis of success rates was used to improve the primer generation system. As a result, a scoring function that selects primers with optimal GC clamps within the specified melting temperature and length range was added to the system. In its present form, the optimized system is capable of generating primer sets with success rates as high as 98%.
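The idea of scoring candidate primers by their GC clamp, subject to melting temperature and length limits, can be sketched as follows. This is an illustrative toy, not the JCSG primer generation system: the scoring weights, the default ranges, and the function names are assumptions, and the melting temperature uses the simple Wallace rule (2 °C per A/T, 4 °C per G/C), which is only a rough estimate for short oligonucleotides.

```python
def wallace_tm(primer):
    """Rough melting temperature (deg C) by the Wallace rule."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def clamp_score(primer):
    """Hypothetical GC-clamp score based on the five bases at the 3' end."""
    primer = primer.upper()
    gc_tail = sum(base in "GC" for base in primer[-5:])
    score = 0
    if primer[-1] in "GC":
        score += 2  # reward a G or C at the very 3' end
    if 1 <= gc_tail <= 3:
        score += 1  # moderate GC content near the 3' end
    else:
        score -= 1  # no clamp at all, or an overly GC-rich 3' end
    return score


def pick_forward_primer(template, min_len=18, max_len=30, tm_min=52, tm_max=60):
    """Pick the best-scoring primer starting at the 5' end of the template.

    Only candidates inside the length and Tm ranges are considered;
    ties are broken in favor of the shorter primer.
    """
    template = template.upper()
    best = None
    for length in range(min_len, min(max_len, len(template)) + 1):
        primer = template[:length]
        if not tm_min <= wallace_tm(primer) <= tm_max:
            continue
        candidate = (clamp_score(primer), -length, primer)
        if best is None or candidate > best:
            best = candidate
    return None if best is None else best[2]


# Made-up template sequence, for illustration only.
template = "ATGAAAGCTTTAACTGCAGATCGTACCGGTTAA"
primer = pick_forward_primer(template)
print(primer, wallace_tm(primer), clamp_score(primer))
```

A production system would also need to handle reverse primers, cloning overhangs, and secondary-structure checks, none of which are attempted here.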

Analysis of crystallization screen: An analysis of over 340,000 individual crystallization trials has led to the creation of a new minimal coarse screen (GNF96), which is highly effective in identifying targets that crystallize readily and in providing crystal leads for fine screen optimization. The realization that a significant number of coarse screen crystallization conditions never yielded any crystals, whereas some proteins crystallized under many different conditions, led to the development of this minimal screen. Our large number of crystallization trials (>500,000) and our consistent processing approach allowed us to analyze and optimize our crystallization strategy. Redundancy in the commercial conditions, particularly in the high molecular weight PEGs, skews the statistics on the relative efficacy of different crystallization conditions. In a review of our Tier 1 screening using the 480 available screening conditions, we defined a small subset of 67 conditions that optimally samples crystallization space and would have encompassed 84% of the proteins that ultimately crystallized. This subset was expanded slightly to 96 conditions (GNF96) and forms our basic screen for testing whether a particular protein construct will readily crystallize.

WebSite: JCSG Screens are available from Qiagen: JCSG+ Suite and JCSG Core Suites

Publication: Lesley SA, Wilson IA. "Protein production and crystallization at the Joint Center for Structural Genomics." J Struct Funct Genomics 6 (2-3): 71-9 (2005). Pubmed: 16211502
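One way to think about reducing a large crystallization screen to a small subset that still captures most crystallizable proteins is as a set-cover problem. The sketch below uses a standard greedy heuristic on made-up data. It is an assumption that this framing resembles the analysis described above; the source does not state which selection method was actually used, and the condition and protein names are hypothetical.

```python
def greedy_minimal_screen(hits, max_conditions=None):
    """Greedily choose conditions that cover the most not-yet-covered proteins.

    hits maps a condition name to the set of proteins that crystallized in it.
    Returns the chosen conditions (in selection order) and the fraction of
    crystallizable proteins they cover.
    """
    all_proteins = set()
    for proteins in hits.values():
        all_proteins |= proteins
    uncovered = set(all_proteins)
    chosen = []
    while uncovered and (max_conditions is None or len(chosen) < max_conditions):
        # Sorting the names makes tie-breaking deterministic.
        best = max(sorted(hits), key=lambda cond: len(hits[cond] & uncovered))
        gain = hits[best] & uncovered
        if not gain:
            break
        chosen.append(best)
        uncovered -= gain
    coverage = 1 - len(uncovered) / len(all_proteins) if all_proteins else 0.0
    return chosen, coverage


# Toy data: four conditions, four proteins. Condition D never gave crystals.
hits = {
    "A": {"p1", "p2", "p3"},
    "B": {"p3", "p4"},
    "C": {"p4"},
    "D": set(),
}
print(greedy_minimal_screen(hits))
print(greedy_minimal_screen(hits, max_conditions=1))
```

In this toy example, conditions that never produce crystals (D) or that are redundant with others (C) drop out of the selection, which mirrors the two observations that motivated the minimal screen.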
