Protein modeling workshop at
UC Riverside
Sequence and structure databases exercises
Contact:
Lukasz Jaroszewski lukasz@sdsc.edu
(Time-consuming steps are shown in red.)
o Exercise 1 simple function prediction:
§
Find the sequence of conserved hypothetical protein NP_229443 from
T. maritima in NCBI database http://www.ncbi.nlm.nih.gov/.
§
Display it in Fasta format, copy the sequence.
§
Submit it to Blast at NCBI.
§
Submit it to 2 iterations of Psi-blast at NCBI.
§
Check the description of homoserine dehydrogenase in the Enzyme database http://ca.expasy.org/enzyme/ (search
by its EC code).
§
Do we have two different predictions? Can we guess the function of NP_229443
now?
o Exercise 2 distant homology prediction:
§
Find the sequence of CED-4 protein CED4_CAEEL from C. elegans in NCBI
database http://www.ncbi.nlm.nih.gov/.
§
Display it in Fasta format, copy the sequence.
§
Submit it to Blast against NR; CDD recognizes known domains in the CED-4
sequence.
§
Click on CARD domain and then click on NB-ARC domain.
§
Click on CDART summary link to see other proteins containing NB-ARC
domain.
§
The structure of the CARD domain has been solved, but we will try to predict
the structure of the NB-ARC domain.
§
Get the subsequence of the NB-ARC domain from CED-4 at NCBI.
§
Submit it to Psi-blast at NCBI.
§
Submit it to HMMER search at Pfam against the Pfam A database http://pfam.wustl.edu/.
§
Submit it to Superfamily search against SCOP 1.59 database http://scop.mrc-lmb.cam.ac.uk/scop/.
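The "get subsequence" step above can also be done outside the browser. The following is a minimal sketch, assuming the FASTA record has been copied into a string. The function names read_fasta and subsequence, the toy sequence and the coordinates are all illustrative; the real NB-ARC boundaries have to be read from the CDD result.

```python
def read_fasta(text):
    """Return (header, sequence) for the first record in FASTA-formatted text."""
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                break  # a second record starts here; stop after the first one
            header = line[1:]
        elif header is not None:
            chunks.append(line)
    return header, "".join(chunks)


def subsequence(sequence, start, end):
    """Return residues start..end using 1-based, inclusive coordinates."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("coordinates fall outside the sequence")
    return sequence[start - 1:end]


# Toy record: NOT the real CED-4 sequence and NOT real domain boundaries.
example = """>toy_protein hypothetical example
MKTAYIAKQR
QISFVKSHFS
"""

header, seq = read_fasta(example)
print(header)
print(subsequence(seq, 3, 8))
```

Domain boundaries reported by sequence databases are normally 1-based and inclusive, which is why the slice subtracts one from the start only.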
o Exercise 3 ambiguity of sequence alignments:
§
Find the sequence of human Apaf-1 protein in NCBI http://www.ncbi.nlm.nih.gov/.
§
Display it in Fasta format, select and copy the CARD domain
subsequence.
§
Find the sequence of CED-4 protein CED4_CAEEL from C. elegans.
§
Display it in Fasta format, copy the sequence.
§
Align both sequences with three tools: Blast2Seq http://www.ncbi.nlm.nih.gov/BLAST/,
the needle algorithm from the Emboss package http://csc-fserve.hh.med.ic.ac.uk/emboss.html,
and the Align server at CGH http://xylian.igh.cnrs.fr/bin/align-guess.cgi.
§
Compare results.
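To see why different programs can return different alignments for the same pair, it helps to look at the kind of global alignment that needle performs. The sketch below is a simplified Needleman-Wunsch implementation with assumed scores (match 1, mismatch -1, linear gap -2). The real needle program uses a substitution matrix and separate gap opening and extension penalties, so its results will differ from this toy version.

```python
def global_alignment(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b). Scoring values are illustrative.
    """
    n, m = len(a), len(b)

    def pair(i, j):
        # Score for aligning residue a[i-1] with residue b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the dynamic programming matrix.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),  # align the two residues
                score[i - 1][j] + gap,             # gap in b
                score[i][j - 1] + gap,             # gap in a
            )

    # Trace back one optimal path (ties are broken in a fixed order).
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


result = global_alignment("ACGT", "AGT")
print(result[0])
print(result[1])
print(result[2])
```

Changing the gap or mismatch values, or the order in which ties are broken in the traceback, can change the reported alignment even when the score stays the same. This is one source of the ambiguity the exercise asks you to observe.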
o Exercise 4 structural comparison:
§
Find the structure of uronate isomerase from T. maritima in
PDB database http://www.rcsb.org/pdb/.
§
Download its structure and extract its chain A (using WordPad).
§
Get its sequence by clicking Sequence details and then FASTA.
§
Submit it to Blast at NCBI.
§
Download the structure of phosphotriesterase 1PSC from PDB http://www.rcsb.org/pdb/ and extract its chain
A (using WordPad); examine the structures with Chime.
§
Align 1PSCA with uronate isomerase from T. maritima using
CE http://cl.sdsc.edu/ce.html.
§
Save the structural alignment in PDB format and view it in the browser (Chime) or another
PDB viewer.
§
Submit uronate isomerase from T. maritima to a structure
similarity search in Dali http://www.ebi.ac.uk/services/index.html.
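Extracting chain A by hand in WordPad means keeping only the coordinate lines whose chain identifier is A. A minimal sketch of the same operation is shown below, assuming the standard fixed-column PDB format, in which the chain identifier is in column 22. The helper toy_atom_line and the toy coordinates-free records are purely illustrative and are not taken from any real entry.

```python
def extract_chain(pdb_text, chain_id="A"):
    """Keep the ATOM/HETATM records of one chain from PDB-format text."""
    kept = []
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        # Column 22 of the fixed-width format (index 21) holds the chain ID.
        if record in ("ATOM", "HETATM") and len(line) > 21 and line[21] == chain_id:
            kept.append(line)
    kept.append("END")
    return "\n".join(kept) + "\n"


def toy_atom_line(serial, chain_id):
    """Build a shortened, hypothetical ATOM record with correct column positions."""
    return "ATOM  %5d  CA  GLY %s%4d" % (serial, chain_id, serial)


toy_pdb = "\n".join([
    "HEADER    TOY EXAMPLE",
    toy_atom_line(1, "A"),
    toy_atom_line(2, "B"),
    "TER",
    toy_atom_line(3, "A"),
])

print(extract_chain(toy_pdb, "A"))
```

For a real file you would read the downloaded text with open(...).read() and write the returned string to a new .pdb file before submitting it to CE.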
o Exercise 5 simple prediction of residues crucial for activity:
§
Find 1VHR structure in Structure database at NCBI http://www.ncbi.nlm.nih.gov/.
§
Click on dual-specificity phosphatase catalytic domain.
§
View 3D structure using All atoms option.
§
Examine most conserved tyrosine residues.
§
Which one seems to be the most probable phosphorylation target?
o Exercise 6 sequence based prediction of interaction regions:
§
Find Barstar-ribonuclease complex (1BRS) in NCBI Structure database http://www.ncbi.nlm.nih.gov/.
§
Open Barstar and Ribonuclease domains in two separate Cn3D
windows (select All atoms).
§
Examine sequence conservation in both structures by viewing them in the Tubes and Space Fill
rendering styles.
§
Can we guess which side of Barstar interacts with Ribonuclease?
Can we guess the approximate arrangement of the complex?
§
View the complex by selecting Show everything in Barstar Window.
o Exercise 7 prediction of coiled-coils regions:
§
Find Kinesin-like protein C- [gi:1170621] from A. thaliana
in the NCBI Protein database http://www.ncbi.nlm.nih.gov/.
§
Submit it for prediction of coiled-coil regions on the Coils server from the Expasy center http://www.ch.embnet.org/.
§
Compare results with annotation of this protein in NCBI.
o Exercise 8 prediction of trans-membrane regions:
§
Find Potential calcium-transporting ATPase 10 from A. thaliana [gi:12643856]
in the NCBI Protein database http://www.ncbi.nlm.nih.gov/.
§
Display it in Fasta format, copy the sequence.
§
Submit it to prediction of transmembrane regions on TMHMM server at CBS http://www.cbs.dtu.dk/services/.
§
Compare results with annotation in the description of this protein in
NCBI.
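For intuition about what a transmembrane predictor looks for, a much simpler technique than the one TMHMM uses is a sliding-window hydropathy profile. The sketch below uses the Kyte-Doolittle hydropathy scale with a 19-residue window and a threshold of 1.6, which are the commonly quoted settings for this scale. TMHMM itself is based on a hidden Markov model, so this is not its algorithm, and the function names and the toy sequence are illustrative.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Return the mean hydropathy of every full window along the sequence."""
    values = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    if len(values) < window:
        return []
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def candidate_tm_windows(sequence, window=19, threshold=1.6):
    """Return 1-based start positions of windows above the threshold."""
    profile = hydropathy_profile(sequence, window)
    return [i + 1 for i, value in enumerate(profile) if value > threshold]


# Toy sequence: a hydrophobic stretch flanked by charged residues.
toy = "D" * 10 + "L" * 19 + "D" * 10
print(candidate_tm_windows(toy))
```

Consecutive start positions above the threshold mark one hydrophobic stretch; comparing such stretches with the TMHMM output shows where the simple profile and the HMM-based prediction agree.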
o Exercise 9 prediction of sub-cellular location:
§
Find Tocopherol cyclase from A. thaliana [gi:24212569] in the NCBI
Protein database http://www.ncbi.nlm.nih.gov/.
§
Display it in Fasta format, copy the sequence.
§
Submit it to prediction of subcellular location on TargetP server at CBS http://www.cbs.dtu.dk/services/.
§
Compare results with annotation in the description of this protein in
NCBI.
o Exercise 10 protein-protein docking:
§
Download example input files for GRAMM http://www.jcsg.org/lukasz/ucr/data/1brs.zip
and unzip them.
§
Create two Windows shortcuts to the gramm executable with different command-line
parameters: one with scan and one with coord.
§
Make sure that the GRAMMDAT environment variable points to the directory
containing the gramm program and data files.
§
Run gramm with scan command line parameter.
§
Run gramm with coord command line parameter.
§
Using Chime, visually compare the final gramm results (R-L_1-5.pdb) with the real structure
of the complex (1BRS.pdb). Note that the original PDB file contains
a trimer of complexes.
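A common stumbling block in this exercise is a missing or mistyped GRAMMDAT variable. The following is a small sketch of a check you could run before starting gramm. It only verifies that the variable is set and names an existing directory; the function name check_grammdat is illustrative, and no assumption is made about which files gramm expects to find there.

```python
import os


def check_grammdat(environ=os.environ):
    """Return the GRAMMDAT directory, or raise RuntimeError with a readable message."""
    path = environ.get("GRAMMDAT")
    if not path:
        raise RuntimeError("GRAMMDAT is not set")
    if not os.path.isdir(path):
        raise RuntimeError("GRAMMDAT does not point to an existing directory: %s" % path)
    return path


try:
    print("GRAMMDAT points to", check_grammdat())
except RuntimeError as problem:
    print("Problem:", problem)
```

On Windows, the variable has to be set before the shortcut or command prompt that runs gramm is opened, otherwise the running program will not see the new value.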
o Exercise 11 finding unknown genes in human genome:
§
Find the sequence of human CD20 receptor [gi:23110989] in NCBI
database http://www.ncbi.nlm.nih.gov/.
§
Display it in Fasta format, copy the sequence.
§
Click on Human Genome Resources at NCBI and then on Blast.
§
Submit it to Blast against the human genome.
§
Copy and paste the sequence of a non-trivial hit (around 70% sequence
identity).
§
Check the fragment you found against known proteins (NR database).
§
Is it a known protein?
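The "around 70% sequence identity" criterion refers to the identity reported for each hit. As a rough illustration of how such a number arises, the sketch below counts identical columns in a pairwise alignment and divides by the alignment length including gap columns, which is the convention assumed here for the Blast "Identities" line. The function name and the toy aligned strings are illustrative, not real Blast output.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of alignment columns holding the same residue in both rows."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        return 0.0
    identical = sum(
        1
        for x, y in zip(aligned_a, aligned_b)
        if x != gap and x.upper() == y.upper()
    )
    # Denominator: every alignment column, gap columns included.
    return 100.0 * identical / len(aligned_a)


# Toy alignment rows, not taken from a real search.
print(percent_identity("ACDEFGHIKL", "ACDEYGHI-L"))
```

Other conventions divide by the length of the shorter sequence or by the number of ungapped columns, so identity values from different programs are not always directly comparable.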
o Exercise 12 simple homology modeling:
§
Download example input files from http://www.jcsg.org/lukasz/ucr/data/d1de3a_.zip
and unzip them. They contain the template structure of the d1rgea_ domain from the SCOP
database and an alignment of the d1de3a_ sequence with the d1rgea_ sequence. As required
by Whatif, the alignment is saved in two separate files, d1de3a_.txt and
d1rgea_.txt.
§
Open the Whatif web server page http://www.cmbi.kun.nl/gv/servers/WIWWWI/ and submit a modeling request.
§
Compare the model with the real structure of d1de3a_ (download it from the SCOP
database http://scop.mrc-lmb.cam.ac.uk/scop/).