Goals
- Predict the specificity of a peptide recognition domain from its primary amino acid sequence.
- Analyze PDZ, WW and then SH3 domains
Strategy
Status
- [wiki:/Log Status Log]
Team
- Shirley Hui
- Gary Bader
Tools/Resources
Domains
- [wiki:/PDZ PDZ Domain]
Databases
[http://www.ensembl.org/ Ensembl]
- Software system that produces and maintains automatic annotation of selected eukaryotic genomes.
[http://www.ebi.ac.uk/interpro/ InterPro]
- Database of protein families, domains and functional sites in which identifiable features found in known proteins can be applied to unknown protein sequences.
[http://www.biomart.org/ BioMart]
- Query-oriented data management system that simplifies creating and maintaining advanced query interfaces backed by a relational database. It is particularly suited to data-mining-style searches of complex descriptive (e.g. biological) data.
Sequence Alignment
Multiple
Hierarchical Methods
[http://www.compbio.dundee.ac.uk/Software/Amps/amps.html/ AMPS] 1990
- Calculates Z-scores through pairwise sequence comparison with randomization
- Generates alignments without having to generate trees
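The Z-score idea above can be sketched as follows. This is a minimal illustration, not the AMPS implementation: the scoring values (`match`, `mismatch`, `gap`), the number of shuffles, and the function names are all assumptions chosen for the example. The observed global alignment score of a pair is compared with the scores obtained after shuffling both sequences, and the difference is expressed in standard deviations.

```python
import random
import statistics


def global_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    The scoring values are illustrative assumptions, not AMPS defaults.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def z_score(a, b, n_shuffles=100, seed=0):
    """Z-score of the observed score against scores of shuffled sequences."""
    rng = random.Random(seed)
    observed = global_score(a, b)
    shuffled_scores = []
    for _ in range(n_shuffles):
        sa, sb = list(a), list(b)
        rng.shuffle(sa)  # shuffling keeps length and composition
        rng.shuffle(sb)
        shuffled_scores.append(global_score(sa, sb))
    mean = statistics.mean(shuffled_scores)
    sd = statistics.pstdev(shuffled_scores)
    if sd == 0:
        return 0.0  # no spread in the shuffled scores; Z-score undefined
    return (observed - mean) / sd
```

A pair with a high Z-score scores much better than would be expected from sequences of the same length and composition, which is the signal used to decide which pairs are worth aligning first.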
[http://www.ebi.ac.uk/clustalw/ ClustalW] 1997
- Uses a series of different pair-score matrices
- Biases location of gaps based on secondary structure mask
- Allows for realigning to refine the alignment
- Can infer phylogeny
- Problem: the initial all-against-all comparison needed to create the guide tree is time-consuming
[http://www.drive5.com/muscle/ MUSCLE] 2004
- MUltiple Sequence Comparison by Log-Expectation
- Uses a quick hashing (k-mer counting) comparison based on identical matches to estimate pairwise distances
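The quick comparison based on identical matches can be illustrated by counting shared k-mers (short identical words) between two sequences. The sketch below is a simplified assumption of how such a distance might be computed; the word length `k=3`, the normalization, and the function names are hypothetical and are not taken from the MUSCLE source.

```python
from collections import Counter


def kmer_counts(seq, k=3):
    """Count every overlapping word of length k in the sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_similarity(a, b, k=3):
    """Fraction of k-mers shared by both sequences (counting repeats),
    normalized by the number of k-mers in the shorter sequence."""
    shared = sum((kmer_counts(a, k) & kmer_counts(b, k)).values())
    denom = min(len(a), len(b)) - k + 1
    return shared / denom if denom > 0 else 0.0


def kmer_distance(a, b, k=3):
    """Distance in [0, 1]; 0 means every k-mer of the shorter sequence is shared."""
    return 1.0 - kmer_similarity(a, b, k)
```

Because no alignment is computed, each pairwise distance costs time roughly linear in the sequence lengths, which is why this kind of comparison is attractive for building a first guide tree.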
[http://www.biophys.kyoto-u.ac.jp/~katoh/programs/align/mafft/ MAFFT] 2005
- Calculates the guide tree faster by applying a fast Fourier transform to amino acid properties to identify regions of similarity
- Uses these regions to guide dynamic programming alignment of the sequences
Non-Hierarchical Methods
[http://www.ncbi.nlm.nih.gov/BLAST/ PSI-BLAST] 1997
- Searches a database with a single sequence
- High scoring sequences are built into a multiple alignment which is used to derive a search profile for subsequent search of the database
- Repeats until no new sequences are added to the profile or a specified number of iterations has been performed
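The search-profile-search loop described above can be sketched as a toy example. It is not PSI-BLAST itself: it assumes ungapped sequences of equal length, a uniform background distribution, a fixed score threshold instead of an E-value cutoff, and hypothetical function names. It only shows how a profile built from the current hits can pull in sequences that the original query alone would miss.

```python
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(seqs, pseudocount=1.0):
    """Per-column log-odds scores against a uniform background (an assumption)."""
    background = 1.0 / len(ALPHABET)
    profile = []
    for col in range(len(seqs[0])):
        counts = {aa: pseudocount for aa in ALPHABET}
        for s in seqs:
            counts[s[col]] += 1
        total = sum(counts.values())
        profile.append({aa: math.log((c / total) / background)
                        for aa, c in counts.items()})
    return profile


def score(profile, seq):
    """Sum of the column scores for the residues of an ungapped sequence."""
    return sum(col[aa] for col, aa in zip(profile, seq))


def iterative_search(query, database, threshold, max_iterations=5):
    """Rebuild the profile from the hits until the hit set stops changing."""
    members = [query]
    for _ in range(max_iterations):
        profile = build_profile(members)
        hits = [s for s in database if score(profile, s) >= threshold]
        new_members = [query] + [s for s in hits if s != query]
        if set(new_members) == set(members):
            break  # converged: no sequences added or dropped
        members = new_members
    return members
```

For example, with the query `"ACDEFGHI"`, the database `["ACDEFGHK", "ACDEFGKK", "ACDWYGKK", "WWWWWWWW"]` and a threshold of 3.0, the third sequence scores below the threshold against the query alone but is picked up once the first two hits have been added to the profile.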
[http://tcoffee.vital-it.ch/cgi-bin/Tcoffee/tcoffee_cgi/index.cgi T-Coffee] 2000
- Builds a library of pairwise alignments for the sequences of interest
- Uses library to inform hierarchical method to find a multiple alignment that preserves consistency between the pairwise alignments
- Can align sequences of varying lengths
[http://baboon.math.berkeley.edu/amap/ AMAP] 2007
- Multiple sequence alignment by sequence annealing
Probabilistic Methods
[http://probcons.stanford.edu/ ProbCons] 2005
[http://probalign.njit.edu/probalign/login ProbAlign] 2006
- Estimates amino acid posterior probabilities using a partition function of the alignments.
- Computes the maximum expected accuracy alignment after applying the probabilistic consistency transformation of ProbCons.
- Improvements are most evident on datasets containing long sequences of variable length.
Viewers
[http://www.jalview.org/ Jalview]
- Multiple alignment viewer/editor written in Java
Background Literature
[http://www.connotea.org/rss/user/s2hui?download=view Literature List on Connotea]
Textbook
[http://www.baderlab.org/DomainSpecificityPredictionProject/Reading Molecular Biology of the Cell]