Protein modeling workshop at UC Riverside

Sequence and structure databases – exercises

Contact: Lukasz Jaroszewski lukasz@sdsc.edu

(Time-consuming steps are shown in red)

o       Exercise 1 – simple function prediction:

§         Find the sequence of conserved hypothetical protein NP_229443 from T. maritima in NCBI database http://www.ncbi.nlm.nih.gov/.

§         Display it in Fasta format, copy the sequence.

§         Submit it to Blast at NCBI.

§         Submit it to 2 iterations of Psi-blast at NCBI.

§         Check homoserine dehydrogenase description in Enzyme database http://ca.expasy.org/enzyme/ (use its EC code).

§         Do we have two different predictions? Can we now guess the function of NP_229443?
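
The retrieval steps of Exercise 1 can also be scripted. The sketch below is not part of the workshop material: it assumes the NCBI E-utilities efetch endpoint, and the helper names efetch_fasta_url and parse_fasta are made up for illustration. It only builds the request URL and parses FASTA text; downloading the record is left to the reader.

```python
from urllib.parse import urlencode

# Assumption: the public NCBI E-utilities efetch endpoint.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession):
    """Build a URL that asks efetch for a protein record in FASTA format."""
    query = urlencode(
        {"db": "protein", "id": accession, "rettype": "fasta", "retmode": "text"}
    )
    return EFETCH + "?" + query


def parse_fasta(text):
    """Parse FASTA text into a dict mapping header line -> sequence string."""
    parts = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            parts[header] = []
        elif header is not None:
            # Sequence lines are concatenated without the line breaks.
            parts[header].append(line)
    return {name: "".join(chunks) for name, chunks in parts.items()}


if __name__ == "__main__":
    print(efetch_fasta_url("NP_229443"))
```

The printed URL can be opened in a browser, and the returned text passed to parse_fasta before pasting the sequence into Blast.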

o       Exercise 2 – distant homology prediction:

§         Find the sequence of CED-4 protein CED4_CAEEL from C. elegans in NCBI database http://www.ncbi.nlm.nih.gov/.

§         Display it in Fasta format, copy the sequence.

§         Submit it to Blast against the NR database. CDD recognizes known domains in the CED-4 sequence.

§         Click on CARD domain and then click on NB-ARC domain.

§         Click on CDART summary link to see other proteins containing NB-ARC domain.

§         The CARD domain structure is solved, but we will try to predict the structure of the NB-ARC domain.

§         Get the subsequence of the NB-ARC domain from CED-4 at NCBI.

§         Submit it to Psi-blast at NCBI.

§         Submit it to HMMER search at Pfam against Pfam A database http://pfam.wustl.edu/.

§         Submit it to Superfamily search against SCOP 1.59 database http://scop.mrc-lmb.cam.ac.uk/scop/.
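
Cutting a domain subsequence out of a full-length sequence, as done for the NB-ARC domain above, can be sketched as follows. The function names are hypothetical, and the coordinates in the usage line are placeholders, not the real NB-ARC boundaries; take the actual start and end positions from the CDD hit. Coordinates are assumed to be 1-based and inclusive, as they are displayed at NCBI.

```python
def subsequence(seq, start, end):
    """Return residues start..end of seq (1-based, inclusive coordinates)."""
    if not (1 <= start <= end <= len(seq)):
        raise ValueError("coordinates outside the sequence")
    return seq[start - 1:end]


def domain_fasta(header, seq, start, end, width=60):
    """Format a domain subsequence as a FASTA record with wrapped lines."""
    sub = subsequence(seq, start, end)
    lines = [">%s/%d-%d" % (header, start, end)]
    lines += [sub[i:i + width] for i in range(0, len(sub), width)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Toy sequence and placeholder coordinates, for illustration only.
    print(domain_fasta("demo", "MKTAYIAKQR", 3, 6))
```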

o       Exercise 3 – ambiguity of sequence alignments:

§         Find the sequence of human Apaf-1 protein in NCBI http://www.ncbi.nlm.nih.gov/.

§         Display it in Fasta format, select and copy CARD domain subsequence.

§         Find sequence of CED-4 protein CED4_CAEEL from C. elegans.

§         Display it in Fasta format, copy the sequence.

§         Align both sequences with Blast2Seq http://www.ncbi.nlm.nih.gov/BLAST/ and needle algorithm from Emboss package http://csc-fserve.hh.med.ic.ac.uk/emboss.html and Align server at CGH http://xylian.igh.cnrs.fr/bin/align-guess.cgi.

§         Compare results.
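
One source of ambiguity in Exercise 3 is that several different alignments can share the same optimal score. The sketch below illustrates this with a minimal Needleman-Wunsch global alignment. It is a simplification, not a reimplementation of the Emboss needle program: it assumes a plain match/mismatch score and a linear gap penalty instead of a substitution matrix with affine gaps, and the function names are made up for illustration.

```python
def nw_matrix(a, b, match=1, mismatch=-1, gap=-1):
    """Fill the score matrix and count the co-optimal paths into each cell."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    ways = [[1] * (m + 1) for _ in range(n + 1)]  # border cells: one path
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            moves = [
                (score[i - 1][j - 1] + sub, ways[i - 1][j - 1]),  # align pair
                (score[i - 1][j] + gap, ways[i - 1][j]),          # gap in b
                (score[i][j - 1] + gap, ways[i][j - 1]),          # gap in a
            ]
            best = max(s for s, _ in moves)
            score[i][j] = best
            ways[i][j] = sum(w for s, w in moves if s == best)
    return score, ways


def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned a, aligned b, number of co-optimal alignments)."""
    score, ways = nw_matrix(a, b, match, mismatch, gap)
    top, bottom = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return (score[len(a)][len(b)], "".join(reversed(top)),
            "".join(reversed(bottom)), ways[len(a)][len(b)])


if __name__ == "__main__":
    print(global_align("AAC", "AC"))
```

For the toy pair AAC and AC there are two equally good alignments (the gap can face either A), and the traceback reports only one of them. Different programs may break such ties differently, which is one reason their outputs can disagree.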

o       Exercise 4 – structural comparison:

§         Find the structure of uronate isomerase from T. maritima in PDB database http://www.rcsb.org/pdb/.

§         Download its structure and extract chain A (using WordPad).

§         Get its sequence by clicking Sequence details and then FASTA.

§         Submit it to Blast at NCBI.

§         Download the structure of phosphotriesterase 1PSC from PDB http://www.rcsb.org/pdb/ and get its chain A (using WordPad); examine the structures with Chime.

§         Align 1PSCA with uronate isomerase from T. maritima using CE http://cl.sdsc.edu/ce.html.

§         Save structural alignment in PDB format and view it using Browser (Chime) or other PDB viewer.

§         Submit uronate isomerase from T. maritima to structure similarity search in Dali http://www.ebi.ac.uk/services/index.html.
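
Extracting chain A by hand in WordPad can be replaced by a few lines of code. The sketch below is one possible way to do it, assuming standard fixed-column PDB files in which the chain identifier sits in column 22; the function name extract_chain and the file names in the usage example are hypothetical.

```python
def extract_chain(pdb_text, chain_id="A"):
    """Keep only ATOM, HETATM and TER records that belong to one chain."""
    kept = []
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        # Column 22 of the PDB format (index 21) holds the chain identifier.
        if record in ("ATOM", "HETATM", "TER") and len(line) > 21 \
                and line[21] == chain_id:
            kept.append(line)
    kept.append("END")
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    import sys
    # Hypothetical usage: python extract_chain.py input.pdb > chainA.pdb
    with open(sys.argv[1]) as handle:
        sys.stdout.write(extract_chain(handle.read(), "A"))
```

Header records such as HEADER or REMARK are dropped here on purpose; keep them if the downstream server needs them.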

o       Exercise 5 – simple prediction of residues crucial for activity:

§         Find 1VHR structure in Structure database at NCBI http://www.ncbi.nlm.nih.gov/.

§         Click on dual-specificity phosphatase catalytic domain.

§         View 3D structure using All atoms option.

§         Examine most conserved tyrosine residues.

§         Which one seems to be the most probable phosphorylation target?

o       Exercise 6 – sequence based prediction of interaction regions:

§         Find Barstar-ribonuclease complex (1BRS) in NCBI Structure database http://www.ncbi.nlm.nih.gov/.

§         Open Barstar and Ribonuclease domains in two separate Cn3D windows (select All atoms).

§         Examine sequence conservation in both structures by viewing them in Tubes and Space Fill rendering styles.

§         Can we guess which side of Barstar interacts with Ribonuclease? Can we guess the approximate arrangement of the complex?

§         View the complex by selecting Show everything in Barstar Window.

o       Exercise 7 – prediction of coiled-coil regions:

§         Find Kinesin-like protein C- [gi:1170621] from A. thaliana in NCBI Protein database http://www.ncbi.nlm.nih.gov/.

§         Submit it to prediction of coiled-coil regions on the Coils server from Expasy center http://www.ch.embnet.org/.

§         Compare results with annotation of this protein in NCBI.

o       Exercise 8 – prediction of transmembrane regions:

§         Find Potential calcium-transporting ATPase 10 from A. thaliana [gi:12643856] in NCBI Protein database http://www.ncbi.nlm.nih.gov/.

§         Display it in Fasta format, copy the sequence.

§         Submit it to prediction of transmembrane regions on TMHMM server at CBS http://www.cbs.dtu.dk/services/.

§         Compare results with annotation in the description of this protein in NCBI.
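
As a rough cross-check of the TMHMM output, a hydropathy plot can be computed locally. The sketch below swaps in a Kyte-Doolittle sliding-window average, which is a different and much cruder technique than the hidden Markov model used by TMHMM. The window of 19 residues and the cutoff of 1.6 are the values conventionally used with this scale; the function names are made up for illustration, and unknown residue codes are scored as 0.0 by assumption.

```python
# Kyte-Doolittle hydropathy scale.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Average hydropathy of every full window along the sequence."""
    values = [KD.get(aa, 0.0) for aa in seq.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def candidate_tm_windows(seq, window=19, threshold=1.6):
    """1-based start positions of windows at or above the threshold."""
    profile = hydropathy_profile(seq, window)
    return [i + 1 for i, avg in enumerate(profile) if avg >= threshold]


if __name__ == "__main__":
    # Toy sequence: a leucine stretch flanked by lysines.
    toy = "K" * 20 + "L" * 19 + "K" * 20
    print(candidate_tm_windows(toy))
```

Runs of consecutive window starts point to a hydrophobic stretch long enough to span a membrane; compare their positions with the helices reported by TMHMM and with the NCBI annotation.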

o       Exercise 9 – prediction of sub-cellular location:

§         Find Tocopherol cyclase from A. thaliana [gi:24212569] in NCBI Protein database http://www.ncbi.nlm.nih.gov/.

§         Display it in Fasta format, copy the sequence.

§         Submit it to prediction of subcellular location on TargetP server at CBS http://www.cbs.dtu.dk/services/.

§         Compare results with annotation in the description of this protein in NCBI.

o       Exercise 10 – protein-protein docking:

§         Download example input files for GRAMM http://www.jcsg.org/lukasz/ucr/data/1brs.zip and unzip them.

§         Create two Windows shortcuts to the gramm executable with different command line parameters – one with “scan” and one with “coord”.

§         Make sure that the GRAMMDAT environment variable points to the directory containing the gramm program and data files.

§         Run gramm with “scan” command line parameter.

§         Run gramm with “coord” command line parameter.

§         Visually compare final gramm results (“R-L_1-5.pdb”) with the real structure of the complex (1BRS.pdb) with Chime. Note that the original PDB file contains a trimer of complexes.
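
Instead of Windows shortcuts, the two gramm runs can be launched from a small script. This is only a sketch under assumptions: the executable is assumed to be named gramm and to live in the directory that GRAMMDAT points to, and the directory names in the usage example are hypothetical. Only the GRAMMDAT variable and the “scan” and “coord” parameters come from the steps above.

```python
import os
import subprocess


def gramm_environment(gramm_dir, base_env=None):
    """Copy the environment and point GRAMMDAT at the gramm directory."""
    env = dict(os.environ if base_env is None else base_env)
    env["GRAMMDAT"] = str(gramm_dir)
    return env


def run_gramm(gramm_dir, work_dir, mode):
    """Run gramm with one command line parameter inside work_dir."""
    if mode not in ("scan", "coord"):
        raise ValueError("mode must be 'scan' or 'coord'")
    # Assumption: the executable is called 'gramm' and sits in gramm_dir.
    exe = os.path.join(str(gramm_dir), "gramm")
    return subprocess.run(
        [exe, mode], cwd=work_dir, env=gramm_environment(gramm_dir), check=True
    )


if __name__ == "__main__":
    # Hypothetical locations; adjust to where the files were unzipped.
    for step in ("scan", "coord"):
        run_gramm(r"C:\gramm", r"C:\gramm\1brs", step)
```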

o       Exercise 11 – finding unknown genes in human genome:

§         Find the sequence of human CD20 receptor [gi:23110989] in NCBI database http://www.ncbi.nlm.nih.gov/.

§         Display it in Fasta format, copy the sequence.

§         Click on Human Genome Resources at NCBI and then on Blast.

§         Submit it to Blast against human genome.

§         Cut and paste the sequence of a non-trivial hit (around 70% sequence identity).

§         Check the fragment found against known proteins (NR database).

§         Is it a known protein?
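
The "around 70% sequence identity" criterion can be checked on any pairwise alignment copied from the Blast output. The sketch below uses one common definition, identical residues divided by aligned (gap-free) columns; Blast itself reports identities over the full alignment length including gaps, so the numbers can differ slightly. The function name is made up for illustration.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of gap-free alignment columns that hold identical residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x != gap and y != gap
    ]
    if not columns:
        return 0.0
    identical = sum(1 for x, y in columns if x == y)
    return 100.0 * identical / len(columns)


if __name__ == "__main__":
    # Toy alignment: 5 identical residues in 6 gap-free columns.
    print(round(percent_identity("ACDE-FG", "ACDQWFG"), 1))
```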

o       Exercise 12 – simple homology modeling:

§         Download example input files from http://www.jcsg.org/lukasz/ucr/data/d1de3a_.zip and unzip them. They contain: template structure of d1rgea_ domain from SCOP database, and alignment of d1de3a_ sequence with d1rgea_ sequence. As required by Whatif, the alignment is saved in two separate files d1de3a_.txt and d1rgea_.txt.

§         Open Whatif webserver page and submit modeling request http://www.cmbi.kun.nl/gv/servers/WIWWWI/.

§         Compare the model with the real structure of d1de3a_ (download it from SCOP database) http://scop.mrc-lmb.cam.ac.uk/scop/.
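
The comparison of the model with the real structure can be made quantitative with a C-alpha RMSD. The sketch below assumes that both PDB files are already superimposed in the same coordinate frame (for example by a structure alignment tool such as CE) and that residue numbering matches between them; it does not perform any superposition itself. The function names are made up for illustration.

```python
import math


def ca_coordinates(pdb_text):
    """Map residue number -> (x, y, z) of the CA atom from ATOM records."""
    coords = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            resseq = int(line[22:26])
            coords[resseq] = (
                float(line[30:38]), float(line[38:46]), float(line[46:54])
            )
    return coords


def ca_rmsd(model_text, reference_text):
    """RMSD over CA atoms of residues present in both files (no fitting)."""
    model = ca_coordinates(model_text)
    reference = ca_coordinates(reference_text)
    shared = sorted(set(model) & set(reference))
    if not shared:
        raise ValueError("no common residues")
    total = 0.0
    for res in shared:
        total += sum((p - q) ** 2 for p, q in zip(model[res], reference[res]))
    return math.sqrt(total / len(shared))
```

Without prior superposition the value mostly reflects the different placement of the two structures in space, so the number is only meaningful after fitting.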
