Sunday, September 23, 2012

This same cutoff is still used


This same cutoff is still used by the Protein Information Resource (PIR). A protein family comprises proteins with the same function in different organisms (orthologous sequences) but may also include proteins in the same organism (paralogous sequences) derived from gene duplication and rearrangements.
If a multiple sequence alignment of a protein family reveals a common level of similarity throughout the lengths of the proteins, PIR refers to the family as a homeomorphic family. The aligned region is referred to as a homeomorphic domain, and this region may comprise several smaller homology domains that are shared with other families.
Families may be further subdivided into subfamilies or grouped into superfamilies based on respective higher or lower levels of sequence similarity. The SCOP database reports 1296 families and the CATH database (version 1.7 beta) reports 1846 families.
When the sequences of proteins with the same function are examined in greater detail, some are found to share high sequence similarity.
They are obviously members of the same family by the above criteria. However, others are found that have very little, or even insignificant, sequence similarity with other family members. In such cases, the family relationship between two distant family members A and C can often be demonstrated by finding an additional family member B that shares significant similarity with both A and C. Thus, B provides a connecting link between A and C. Another approach is to examine distant alignments for highly conserved matches.
At a level of identity of 50%, proteins are likely to have the same three-dimensional structure, and the identical atoms in the sequence alignment will also superimpose within approximately 1 Å in the structural model. Thus, if the structure of one member of a family is known, a reliable prediction may be made for a second member of the family, and the higher the identity level, the more reliable the prediction.
Protein structural modeling can be performed by examining how well the amino acid substitutions fit into the core of the three-dimensional structure.
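The identity level mentioned above can be computed directly from a pairwise alignment. The following is a minimal sketch, assuming the two sequences are already aligned with "-" as the gap character and that identity is counted over columns where neither sequence has a gap (other denominators are also in use). The function names and the default 50% cutoff argument are illustrative only.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip gapped columns (an assumption; conventions differ)
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


def likely_same_structure(aligned_a: str, aligned_b: str, cutoff: float = 50.0) -> bool:
    """Apply the 50% identity rule of thumb quoted in the text."""
    return percent_identity(aligned_a, aligned_b) >= cutoff


if __name__ == "__main__":
    print(percent_identity("ACDE-G", "ACDFHG"))
```

Here four of the five ungapped columns match, so the pair would pass the 50% rule of thumb.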
family (structural context)
as used in the FSSP database (Families of structurally similar proteins) and the DALI/FSSP Web site, refers to two structures that have a significant level of structural similarity but not necessarily significant sequence similarity.
fold
a term with similar meaning to structural motif, but in general refers to a somewhat larger combination of secondary structural units in the same configuration.
Thus, proteins sharing the same fold have the same combination of secondary structures that are connected by similar loops. An example is the Rossmann fold comprising several alternating α helices and parallel β strands. In the SCOP, CATH, and FSSP databases, the known protein structures have been classified into hierarchical levels of structural complexity with the fold as a basic level of classification.
homologous domain (sequence context)
refers to an extended sequence pattern, generally found by sequence alignment methods, that indicates a common evolutionary origin among the aligned sequences.
A homology domain is generally longer than a motif. The domain may include all of a given protein sequence or only a portion of the sequence. Some domains are complex and made up of several smaller homology domains that became joined to form a larger one during evolution.
A domain that covers an entire sequence is called the homeomorphic domain by PIR (Protein Information Resource).
module
a region of conserved amino acid patterns comprising one or more motifs and considered to be a fundamental unit of structure or function.
The presence of a module has also been used to classify proteins into families.
motif (sequence context)
refers to a conserved pattern of amino acids that is found in two or more proteins. In the Prosite catalog, a motif is an amino acid pattern that is found in a group of proteins that have a similar biochemical activity, and that often is near the active site of the protein.
Examples of sequence motif databases are the Prosite catalog (http://www.expasy.ch/prosite) and the Stanford Motifs Database (http://dna.stanford.edu/emotif/).
motif (structural context)
refers to a combination of several secondary structural elements produced by the folding of adjacent sections of the polypeptide chain into a specific three-dimensional configuration.
An example is the helix-loop-helix motif. Structural motifs are also referred to as supersecondary structures and folds.
position-specific scoring matrix (sequence context, also known as weight or scoring matrix)
represents a conserved region in a multiple sequence alignment with no gaps.
Each matrix column represents the variation found in one column of the multiple sequence alignment.
position-specific scoring matrix, 3D (structural context)
represents the amino acid variation found in an alignment of proteins that fall into the same structural class.
Matrix columns represent the amino acid variation found at one amino acid position in the aligned structures.
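To make the sequence-context scoring matrix concrete, the sketch below builds one from a small ungapped alignment and slides it along a target sequence. It is an illustration under stated assumptions rather than any particular tool's method: the log-odds scores use a uniform background frequency and a pseudocount of 1, and the names `build_pssm` and `best_window` are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Return one dict per alignment column mapping amino acid -> log2 odds score."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences in an ungapped alignment must have equal length")
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background frequencies
    total = len(alignment) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment)
        column = {}
        for aa in AMINO_ACIDS:
            freq = (counts[aa] + pseudocount) / total
            column[aa] = math.log2(freq / background)
        pssm.append(column)
    return pssm


def best_window(pssm, target):
    """Slide the matrix along the target (no gaps) and return (start, score) of the best window."""
    width = len(pssm)
    best_start, best_score = -1, float("-inf")
    for start in range(len(target) - width + 1):
        score = sum(pssm[i][target[start + i]] for i in range(width))
        if score > best_score:
            best_start, best_score = start, score
    return best_start, best_score


if __name__ == "__main__":
    matrix = build_pssm(["ACD", "ACE", "ACD"])
    print(best_window(matrix, "GGACDGG"))
```

Because the matrix has no gap states, the scan is a simple sliding window; allowing gaps is what turns this into the profile search described under "profile (sequence context)".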
primary structure
refers to the linear amino acid sequence of a protein, which chemically is a polypeptide chain composed of amino acids joined by peptide bonds.
profile (sequence context)
a scoring matrix that represents a multiple sequence alignment of a protein family. The profile is usually obtained from a well-conserved region in a multiple sequence alignment. The profile is in the form of a matrix with each column representing a position in the alignment and each row one of the amino acids.
Matrix values give the likelihood of each amino acid at the corresponding position in the alignment. The profile is moved along the target sequence to locate the best scoring regions by a dynamic programming algorithm. Gaps are allowed during matching and a gap penalty is included in this case as a negative score when no amino acid is matched.
A sequence profile may also be represented by a hidden Markov model, referred to as a profile HMM.
profile (structural context)
a scoring matrix that represents which amino acids should fit well and which should fit poorly at sequential positions in a known protein structure.
Profile columns represent sequential positions in the structure, and profile rows represent the 20 amino acids. As with a sequence profile, the structural profile is moved along a target sequence to find the highest possible alignment score by a dynamic programming algorithm. Gaps may be included and receive a penalty.
The resulting score provides an indication as to whether or not the target protein might adopt such a structure.
quaternary structure
the three-dimensional configuration of a protein molecule comprising several independent polypeptide chains.
secondary structure
refers to the interactions that occur between the C, O, and NH groups on amino acids in a polypeptide chain to form α-helices, β-sheets, turns, loops, and other forms, and that facilitate the folding into a three-dimensional structure.
superfamily
a group of protein families of the same or different lengths that are related by distant yet detectable sequence similarity.
Members of a given superfamily thus have a common evolutionary origin. Originally, Dayhoff defined the cutoff for superfamily status as a chance of 10⁻⁶ that the sequences are not related, on the basis of an alignment score (Dayhoff et al. 1978).
Proteins with few identities in an alignment of the sequences but with a convincingly common number of structural and functional features are placed in the same superfamily. At the level of three-dimensional structure, superfamily proteins will share common structural features such as a common fold, but there may also be differences in the number and arrangement of secondary structures. The PIR resource uses the term homeomorphic superfamilies to refer to superfamilies that are composed of sequences that can be aligned from end to end, representing a sharing of a single sequence homology domain, a region of similarity that extends throughout the alignment.
This domain may also comprise smaller homology domains that are shared with other protein families and superfamilies. Although a given protein sequence may contain domains found in several superfamilies, thus indicating a complex evolutionary history, sequences will be assigned to only one homeomorphic superfamily based on the presence of similarity throughout a multiple sequence alignment.
The superfamily alignment may also include regions that do not align either within or at the ends of the alignment. In contrast, sequences in the same family align well throughout the alignment.
supersecondary structure
a term with similar meaning to a structural motif.
tertiary structure
the three-dimensional or globular structure formed by the packing together or folding of secondary structures of a polypeptide chain.
Secondary structure prediction is a set of techniques in bioinformatics that aim to predict the local secondary structures of proteins and RNA sequences based only on knowledge of their primary structure (amino acid or nucleotide sequence, respectively). For proteins, a prediction consists of assigning regions of the amino acid sequence as likely alpha helices, beta strands (often noted as "extended" conformations), or turns. The success of a prediction is determined by comparing it to the results of the DSSP algorithm applied to the crystal structure of the protein; for nucleic acids, it may be determined from the hydrogen bonding pattern. Specialized algorithms have been developed for the detection of specific well-defined patterns such as transmembrane helices and coiled coils in proteins, or canonical microRNA structures in RNA.[1]
The best modern methods of secondary structure prediction in proteins reach about 80% accuracy;[2] this high accuracy allows the use of the predictions in fold recognition and ab initio protein structure prediction, classification of structural motifs, and refinement of sequence alignments.
The accuracy of current protein secondary structure prediction methods is assessed in weekly benchmarks such as LiveBench and EVA.
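Accuracy figures like those quoted above are usually per-residue agreement between a predicted string and a DSSP-derived string. The sketch below shows one way to compute it, assuming the common three-state reduction (H, G, I to helix; E, B to strand; everything else to coil). That mapping is one convention among several, and the function names are illustrative.

```python
# Assumed 8-state to 3-state reduction; other groupings are also used in the literature.
DSSP_TO_THREE_STATE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def reduce_dssp(dssp_string: str) -> str:
    """Collapse DSSP codes to helix (H), strand (E) or coil (C)."""
    return "".join(DSSP_TO_THREE_STATE.get(code, "C") for code in dssp_string)


def q3_accuracy(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted three-state label matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("prediction and reference must have equal length")
    if not observed:
        raise ValueError("cannot score an empty sequence")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


if __name__ == "__main__":
    reference = reduce_dssp("HHHEBT")      # -> "HHHEEC"
    print(q3_accuracy("HHHECC", reference))
```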
Early methods of secondary structure prediction, introduced in the 1960s and early 1970s,[3] focused on identifying likely alpha helices and were based mainly on helix-coil transition models.[4] Significantly more accurate predictions that included beta sheets were introduced in the 1970s and relied on statistical assessments based on probability parameters derived from known solved structures.
These methods, applied to a single sequence, are typically at most about 60-65% accurate, and often underpredict beta sheets.[1] The evolutionary conservation of secondary structures can be exploited by simultaneously assessing many homologous sequences in a multiple sequence alignment, by calculating the net secondary structure propensity of an aligned column of amino acids. In concert with larger databases of known protein structures and modern machine learning methods such as neural nets and support vector machines, these methods can achieve up to 80% overall accuracy in globular proteins.[5]
The theoretical upper limit of accuracy is around 90%,[5] partly due to idiosyncrasies in DSSP assignment near the ends of secondary structures, where local conformations vary under native conditions but may be forced to assume a single conformation in crystals due to packing constraints.
Limitations are also imposed by secondary structure prediction's inability to account for tertiary structure; for example, a sequence predicted as a likely helix may still be able to adopt a beta-strand conformation if it is located within a beta-sheet region of the protein and its side chains pack well with their neighbors.
Dramatic conformational changes related to the protein's function or environment can also alter local secondary structure.
The Chou-Fasman method was among the first secondary structure prediction algorithms developed and relies predominantly on probability parameters determined from relative frequencies of each amino acid's appearance in each type of secondary structure.
The original Chou-Fasman parameters, determined from the small sample of structures solved in the mid-1970s, produce poor results compared to modern methods, though the parameterization has been updated since it was first published. The Chou-Fasman method is roughly 50-60% accurate in predicting secondary structures.[1]
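The "relative frequency" parameters can be illustrated with a short calculation. The sketch below estimates, for one secondary structure state, the ratio of an amino acid's frequency within that state to its overall frequency. It covers only this propensity step; the nucleation and extension rules of the full method are not shown, and the toy data and the function name are made up for illustration.

```python
from collections import Counter


def propensities(sequences, structures, state):
    """Propensity of each amino acid for `state`: (fraction within state) / (overall fraction)."""
    aa_total = Counter()
    aa_in_state = Counter()
    for seq, ss in zip(sequences, structures):
        if len(seq) != len(ss):
            raise ValueError("sequence and structure strings must have equal length")
        for aa, label in zip(seq, ss):
            aa_total[aa] += 1
            if label == state:
                aa_in_state[aa] += 1
    n_total = sum(aa_total.values())
    n_state = sum(aa_in_state.values())
    if n_state == 0:
        raise ValueError("state does not occur in the training data")
    return {
        aa: (aa_in_state[aa] / n_state) / (aa_total[aa] / n_total)
        for aa in aa_total
    }


if __name__ == "__main__":
    # Toy data: one four-residue chain labelled helix (H) or coil (C).
    print(propensities(["AAGG"], ["HHHC"], "H"))
```

A value above 1 means the residue is found in that state more often than expected from its overall abundance; a value below 1 means less often.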
The GOR method, named for the three scientists who developed it (Garnier, Osguthorpe, and Robson), is an information theory-based method developed not long after Chou-Fasman. It uses the more powerful probabilistic techniques of Bayesian inference.[7]
The method is a specific optimized application of mathematics and algorithms developed in a series of papers by Robson and colleagues (e.g. [8] and [9]). The GOR method is capable of continued extension by such principles, and has gone through several versions. The GOR method takes into account not only the probability of each amino acid having a particular secondary structure, but also the conditional probability of the amino acid assuming each structure given the contributions of its neighbors (it does not assume that the neighbors have that same structure).
The approach is both more sensitive and more accurate than that of Chou and Fasman because amino acid structural propensities are only strong for a small number of amino acids such as proline and glycine.
Weak contributions from each of many neighbors can add up to a strong effect overall. The original GOR method was roughly 65% accurate and was dramatically more successful in predicting alpha helices than beta sheets, which it frequently mispredicted as loops or disorganized regions.[1]
Later GOR methods also considered pairs of amino acids, significantly improving performance. The major difference from the neural network methods described below is perhaps that the weights in an implied network of contributing terms are assigned a priori, from statistical analysis of proteins of known structure, rather than by feedback to optimize agreement with a training set of such structures.
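The idea that many weak neighbor contributions add up can be sketched in code. The example below is written in the spirit of the GOR approach, not as the published parameterization: it estimates a log-odds information term for each (window offset, residue, state) combination from labelled data, then sums those terms over a window around each position. The pseudocount, the three-state alphabet, the default half-window of 8 residues (the value commonly quoted for the original method) and all function names are assumptions.

```python
import math
from collections import defaultdict

STATES = "HEC"


def _log_odds(p: float) -> float:
    return math.log(p / (1.0 - p))


def train_gor(sequences, structures, half_window=8, pseudocount=1.0):
    """Return info(offset, residue, state): log-odds of the state given a residue at that offset,
    minus the prior log-odds of the state."""
    state_counts = defaultdict(float)
    pair_counts = defaultdict(float)     # (offset, residue, state) -> count
    context_counts = defaultdict(float)  # (offset, residue) -> count
    for seq, ss in zip(sequences, structures):
        if len(seq) != len(ss):
            raise ValueError("sequence and structure strings must have equal length")
        for j, state in enumerate(ss):
            state_counts[state] += 1
            for m in range(-half_window, half_window + 1):
                k = j + m
                if 0 <= k < len(seq):
                    pair_counts[(m, seq[k], state)] += 1
                    context_counts[(m, seq[k])] += 1
    total = sum(state_counts.values())
    n_states = len(STATES)

    def info(offset, residue, state):
        prior = (state_counts[state] + pseudocount) / (total + pseudocount * n_states)
        cond = (pair_counts[(offset, residue, state)] + pseudocount) / (
            context_counts[(offset, residue)] + pseudocount * n_states
        )
        return _log_odds(cond) - _log_odds(prior)

    return info


def predict_gor(seq, info, half_window=8):
    """Assign each position the state with the largest summed information over its window."""
    prediction = []
    for j in range(len(seq)):
        scores = {}
        for state in STATES:
            scores[state] = sum(
                info(m, seq[j + m], state)
                for m in range(-half_window, half_window + 1)
                if 0 <= j + m < len(seq)
            )
        prediction.append(max(STATES, key=lambda s: scores[s]))
    return "".join(prediction)


if __name__ == "__main__":
    info = train_gor(["AAAAGGGG"], ["HHHHCCCC"], half_window=1)
    print(predict_gor("AAAA", info, half_window=1))
```

Note that the terms are fixed once from counts, with no feedback loop against a training objective, which is the contrast with trained networks drawn in the text.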
Neural network methods use training sets of solved structures to identify common sequence motifs associated with particular arrangements of secondary structures. These methods are over 70% accurate in their predictions, although beta strands are still often underpredicted due to the lack of three-dimensional structural information that would allow assessment of hydrogen bonding patterns that can promote formation of the extended conformation required for the presence of a complete beta sheet.[1]
Support vector machines have proven particularly useful for predicting the locations of turns, which are difficult to identify with statistical methods.
The requirement of relatively small training sets has also been cited as an advantage to avoid overfitting to existing structural data.[11]
Extensions of machine learning techniques attempt to predict more fine-grained local properties of proteins, such as backbone dihedral angles in unassigned regions.
Both SVMs[12] and neural networks[13] have been applied to this problem.[10] More recently, real-value torsion angles can be accurately predicted by SPINE-X and successfully employed for ab initio structure prediction.[14]
Secondary structure formation is reported to depend on factors other than the protein sequence.
For example, secondary structure tendencies also depend on local environment,[15] solvent accessibility of residues,[16] protein structural class,[17] and even the organism from which the proteins are obtained.[18] Based on such observations, some studies have shown that secondary structure prediction can be improved by addition of information about protein structural class,[19] residue accessible surface area[20][21] and also contact number information.[22]
Sequence covariation methods rely on the existence of a data set composed of multiple homologous RNA sequences with related but dissimilar sequences.
These methods analyze the covariation of individual base sites in evolution; maintenance at two widely separated sites of a pair of base-pairing nucleotides indicates the presence of a structurally required hydrogen bond between those positions. The general problem of pseudoknot prediction has been shown to be NP-complete.[23]
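One common way to quantify covariation between two alignment columns is their mutual information. The sketch below uses that measure as an illustration; the text does not say which statistic a given method uses, so treat the choice, the toy alignment and the function name as assumptions.

```python
import math
from collections import Counter


def mutual_information(alignment, i, j):
    """Mutual information (in bits) between columns i and j of an ungapped alignment."""
    n = len(alignment)
    pair_counts = Counter((seq[i], seq[j]) for seq in alignment)
    col_i = Counter(seq[i] for seq in alignment)
    col_j = Counter(seq[j] for seq in alignment)
    mi = 0.0
    for (a, b), count in pair_counts.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((col_i[a] / n) * (col_j[b] / n)))
    return mi


if __name__ == "__main__":
    # Toy alignment: columns 0 and 3 always form a Watson-Crick pair, columns 1 and 2 never vary.
    rna = ["GAAC", "CAAG", "AAAU", "UAAA"]
    print(mutual_information(rna, 0, 3))
    print(mutual_information(rna, 1, 2))
```

Columns that vary together while staying complementary score high; invariant columns score zero even if they are paired, which is why the method needs sequences that are related but dissimilar.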
The practical role of protein structure prediction is now more important than ever. Massive amounts of protein sequence data are produced by modern large-scale DNA sequencing efforts such as the Human Genome Project. Despite community-wide efforts in structural genomics, the output of experimentally determined protein structures, typically by time-consuming and relatively expensive X-ray crystallography or NMR spectroscopy, is lagging far behind the output of protein sequences.
Protein structure prediction remains an extremely difficult and unresolved undertaking.
The two main problems are calculation of protein free energy and finding the global minimum of this energy. A protein structure prediction method must explore the space of possible protein structures, which is astronomically large. These problems can be partially bypassed in "comparative" or homology modeling and fold recognition methods, in which the search space is pruned by the assumption that the protein in question adopts a structure that is close to the experimentally determined structure of another homologous protein. On the other hand, the de novo or ab initio protein structure prediction methods must explicitly resolve these problems.
The progress and challenges in protein structure prediction have been reviewed in Zhang 2008.
Ab initio or de novo protein modelling methods seek to build three-dimensional protein models "from scratch", i.e., based on physical principles rather than (directly) on previously solved structures.
There are many possible procedures that either attempt to mimic protein folding or apply some stochastic method to search possible solutions (i.e., global optimization of a suitable energy function). These procedures tend to require vast computational resources, and have thus only been carried out for tiny proteins.
Predicting protein structure de novo for larger proteins will require better algorithms and larger computational resources like those afforded by either powerful supercomputers (such as Blue Gene or MDGRAPE-3) or distributed computing (such as Folding@home, the Human Proteome Folding Project and Rosetta@Home).
Although these computational barriers are vast, the potential benefits of structural genomics (by predicted or experimental methods) make ab initio structure prediction an active research field.[24]
As an intermediate step towards predicted protein structures, contact map predictions have been proposed.
As of 2009, a 50-residue protein could be simulated atom-by-atom on a supercomputer for 1 millisecond.[25] As of 2012, comparable stable-state sampling could be done on a standard desktop with a new graphics card and more sophisticated algorithms.[26]
Comparative protein modelling uses previously solved structures as starting points, or templates. This is effective because it appears that although the number of actual proteins is vast, there is a limited set of tertiary structural motifs to which most proteins belong. It has been suggested that there are only around 2,000 distinct protein folds in nature, though there are many millions of different proteins.
Accurate packing of the amino acid side chains represents a separate problem in protein structure prediction. Methods that specifically address the problem of predicting side-chain geometry include dead-end elimination and the self-consistent mean field methods.
The side chain conformations with low energy are usually determined on the rigid polypeptide backbone and using a set of discrete side chain conformations known as "rotamers." The methods attempt to identify the set of rotamers that minimize the model's overall energy.
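The optimization problem described above can be stated compactly in code. The sketch below is a plain exhaustive search over rotamer combinations on a fixed backbone, not dead-end elimination or a mean field method; those exist precisely because this brute-force search grows exponentially with the number of residues. The energy tables and the `pair_energy` callback are hypothetical placeholders for a real energy function.

```python
from itertools import product


def best_rotamer_assignment(self_energy, pair_energy):
    """Exhaustively find the rotamer choice per residue that minimizes total energy.

    self_energy[i][r] is the energy of rotamer r at residue i against the fixed backbone;
    pair_energy(i, r, j, s) is the interaction energy between rotamer r at i and rotamer s at j.
    """
    n = len(self_energy)
    best_assignment, best_energy = None, float("inf")
    for assignment in product(*(range(len(options)) for options in self_energy)):
        energy = sum(self_energy[i][r] for i, r in enumerate(assignment))
        for i in range(n):
            for j in range(i + 1, n):
                energy += pair_energy(i, assignment[i], j, assignment[j])
        if energy < best_energy:
            best_assignment, best_energy = assignment, energy
    return best_assignment, best_energy


if __name__ == "__main__":
    # Two residues with two rotamers each; illustrative energies in arbitrary units.
    toy_self = [[0.0, 1.0], [0.5, 0.0]]

    def toy_pair(i, r, j, s):
        # Hypothetical steric clash between rotamer 0 of residue 0 and rotamer 1 of residue 1.
        return 5.0 if (r, s) == (0, 1) else 0.0

    print(best_rotamer_assignment(toy_self, toy_pair))
```

In the toy case the individually best rotamers clash, so the lowest-energy combination is not the one obtained by optimizing each residue alone.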
These methods use rotamer libraries, which are collections of favorable conformations for each residue type in proteins.
Rotamer libraries may contain information about the conformation, its frequency, and the standard deviations about mean dihedral angles, which can be used in sampling.[29] Rotamer libraries are derived from structural bioinformatics or other statistical analysis of side-chain conformations in known experimental structures of proteins, such as by clustering the observed conformations for tetrahedral carbons near the staggered (60°, 180°, -60°) values.
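As a rough illustration of how observed dihedral angles map onto the staggered values, the sketch below assigns each angle to the nearest of 60°, 180° and -60° (respecting the 360° wrap-around) and tallies relative frequencies. Real libraries use more careful clustering and curated data; the sample angles and function names here are invented for the example.

```python
from collections import Counter

STAGGERED_CENTRES = (60.0, 180.0, -60.0)


def angular_difference(a: float, b: float) -> float:
    """Smallest absolute difference between two angles in degrees, accounting for wrap-around."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def nearest_staggered(chi_degrees: float) -> float:
    """Return the staggered value (60, 180 or -60 degrees) closest to the observed angle."""
    return min(STAGGERED_CENTRES, key=lambda centre: angular_difference(chi_degrees, centre))


def rotamer_frequencies(chi_angles):
    """Relative frequency of each staggered bin among the observed angles."""
    counts = Counter(nearest_staggered(chi) for chi in chi_angles)
    total = sum(counts.values())
    return {centre: counts[centre] / total for centre in STAGGERED_CENTRES}


if __name__ == "__main__":
    observed = [55.0, 65.0, -170.0, 175.0, -70.0]  # made-up chi1 angles in degrees
    print(rotamer_frequencies(observed))
```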
Rotamer libraries can be backbone-independent, secondary-structure-dependent, or backbone-dependent. Backbone-independent rotamer libraries make no reference to backbone conformation, and are calculated from all available side chains of a certain type (for instance, the first example of a rotamer library, done by Ponder and Richards at Yale in 1987).[30] Secondary-structure-dependent libraries present different dihedral angles and/or rotamer frequencies for α-helix, β-sheet, or coil secondary structures.
Backbone-dependent rotamer libraries present conformations and/or frequencies dependent on the local backbone conformation as defined by the backbone dihedral angles φ and ψ, regardless of secondary structure.[33]
The modern versions of these libraries as used in most software are presented as multidimensional distributions of probability or frequency, where the peaks correspond to the dihedral-angle conformations considered as individual rotamers in the lists. Some versions are based on very carefully curated data and are used primarily for structure validation, while others emphasize relative frequencies in much larger data sets and are the form used primarily for structure prediction, such as the Dunbrack rotamer libraries.
Side-chain packing methods are most useful for analyzing the protein's hydrophobic core, where side chains are more closely packed; they have more difficulty addressing the looser constraints and higher flexibility of surface residues, which often occupy multiple rotamer conformations rather than just one.
