Thursday 28 March 2013

QSLiMFinder at Cold Spring Harbor Laboratory "Systems Biology: Networks" 2013

This month saw another successful "Systems Biology: Networks" meeting held at Cold Spring Harbor Laboratory, New York. SLiMSuite was well represented with two posters, which you can now view online:

1. Palopoli N & Edwards RJ. Improved computational prediction of Short Linear Motifs using specific protein-protein interaction data.
Short Linear Motifs (SLiMs) are short segments of proteins that mediate numerous domain-motif interactions (DMI). In spite of the crucial role that they play in many biological pathways, their features and diversity remain understudied. The limited size and degenerate nature of SLiMs hinder their identification by pure de novo prediction methods, which must deal with a very large motif search space entirely determined by the parameters used to build the motifs.

The most successful methods are built on an explicit model of convergent evolution for detecting over-represented motifs in unrelated proteins that share a common attribute. We have previously presented SLiMFinder[1] which accounts for the motif search space to statistically model the probability of observing a given prediction by chance. SLiMFinder greatly benefits from the incorporation of prior knowledge that reduces the sequence search space and increases sensitivity.
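To make the idea of correcting for the motif search space concrete, here is a minimal sketch. It is not SLiMFinder's actual statistical model; it assumes a simple binomial chance model and a hypothetical per-protein occurrence probability `p`, and the function names are illustrative only. It shows why a larger motif space makes any single over-represented motif less convincing.

```python
from math import comb


def binomial_tail(k, n, p):
    """Probability of seeing a motif in k or more of n unrelated proteins,
    assuming (hypothetically) it occurs in each protein independently
    with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def space_corrected(p_motif, space_size):
    """Probability that at least one of `space_size` motifs reaches a
    per-motif probability of p_motif by chance (independence assumed)."""
    return 1 - (1 - p_motif) ** space_size


# Illustrative numbers only: a motif found in 4 of 10 proteins.
p_single = binomial_tail(4, 10, 0.05)
print(p_single, space_corrected(p_single, 100), space_corrected(p_single, 10000))
```

With the same per-motif probability, the corrected value rises as the number of motifs tested grows, which is why reducing the search space can increase sensitivity.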

More recently we have extended the standard algorithm to develop QSLiMFinder, a query-focused method of SLiM discovery. In QSLiMFinder the search space is not built from the whole set of proteins but rather from one specific query protein or region thereof. By considering only those putative motifs in the query that may be shared by the rest, the motif space is significantly reduced and the sensitivity is increased. Moreover, DMI data can be used to focus on a specific query region rather than on the complete protein. A major advantage of QSLiMFinder is its ability to incorporate this information from three-dimensional structures of interacting proteins, like those in the database of 3D Interaction Domains (3DID)[2] or as predicted from structural data[3].
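The query-focused idea can be illustrated with a toy sketch. This is not the QSLiMFinder implementation: the motif model (fixed-length patterns with at most one interior wildcard), the function names and the example sequences are all assumptions made for illustration. Candidate motifs are enumerated from the query only and then counted in the remaining proteins.

```python
import re
from itertools import combinations


def query_motifs(query, length=4, max_wild=1):
    """Enumerate fixed-length patterns from the query sequence, allowing up
    to max_wild interior positions to become wildcards ('.')."""
    motifs = set()
    for i in range(len(query) - length + 1):
        kmer = query[i:i + length]
        motifs.add(kmer)
        for n in range(1, max_wild + 1):
            for positions in combinations(range(1, length - 1), n):
                chars = list(kmer)
                for pos in positions:
                    chars[pos] = "."
                motifs.add("".join(chars))
    return motifs


def support(motif, proteins):
    """Number of proteins containing the motif at least once.
    Sequences are assumed to contain amino acid letters only."""
    pattern = re.compile(motif)
    return sum(1 for seq in proteins if pattern.search(seq))


def query_focused_search(query, others, length=4, min_support=2):
    """Keep only query-derived motifs shared by min_support other proteins."""
    hits = {}
    for motif in query_motifs(query, length):
        count = support(motif, others)
        if count >= min_support:
            hits[motif] = count
    return hits


# Hypothetical toy sequences; the query could equally be a sub-region.
query = "AAPTSPKK"
others = ["GGPTSPGG", "LLPASPLL", "QQQQQQ"]
print(query_focused_search(query, others))  # {'P.SP': 2}
```

Only the 15 patterns present in the query are tested here, rather than every pattern found in any of the proteins, which is the reduction in motif space described above.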

A thorough comparative benchmark of SLiMFinder and QSLiMFinder performance on datasets of known motifs has confirmed that the latter typically returns motifs with higher significance and produces more results that are enriched against expectation. As expected, QSLiMFinder improves sensitivity by ‘zooming in’ on the region of interest and paves the way to mine interaction data for novel SLiMs.
1. Edwards RJ, Davey NE, Shields DC. (2007) SLiMFinder: a probabilistic method for identifying over-represented, convergently evolved, short linear motifs in proteins. PLoS One; 2(10):e967.
2. Stein A, Ceol A, Aloy P. (2011) 3did: identification and classification of domain-based interactions of known three-dimensional structure. Nucleic Acids Res; 39:D718-723.
3. Stein A, Aloy P. (2010) Novel peptide-mediated interactions derived from high-resolution 3-dimensional structures. PLoS Comput Biol. 6(5):e1000789.

2. Edwards RJ & Palopoli N. Computational prediction of short linear motifs mediating host-pathogen protein-protein interactions.
Short Linear Motifs (SLiMs) are short functional protein sequences that act as ligands to mediate transient protein-protein interactions (PPI) in critical biological pathways and signaling networks. SLiMs are short (3-15aa), generally tolerate considerable sequence variation and typically have fewer than five residues critical for function. These features result in a degree of evolutionary plasticity not seen in domains and SLiMs often add new functions to proteins by convergent evolution. This is particularly prevalent in viruses, which often exploit SLiMs to manipulate the molecular machinery of host cells[1].

In recent years, the number of tools and algorithms for SLiM discovery has increased dramatically. Of these, SLiMFinder[2], which exploits a statistical model of convergent evolution to predict novel over-represented motifs with high specificity, repeatedly performs well in comparative studies. The size and degeneracy of SLiMs present a challenge for computational identification, making it difficult to differentiate biological signal from stochastic patterns. SLiMs generally occur in structurally disordered regions of proteins and exhibit evolutionary conservation relative to other disordered residues, which can be exploited by SLiMFinder to reduce the sequence search space and improve predictions. We have recently developed QSLiMFinder (“Query SLiMFinder”), an extended version of the algorithm that can incorporate specific interaction data to restrict the motif search space and improve both the sensitivity and biological relevance of predictions. Whereas SLiMFinder can ask the general question of which motifs are enriched in a set of proteins that interact with a common partner[3], QSLiMFinder can specifically ask which of the motifs present in a viral protein are enriched in the set of host proteins that interact with the same host partner. By applying this to combined interactomes of host-host and host-pathogen PPI, it should be possible to identify novel candidates for viral mimicry of host SLiMs.

1. Davey NE, Travé G, Gibson TJ (2011) How viruses hijack cell regulation. Trends Biochem. Sci. 36 (3): 159–69.
2. Edwards RJ, Davey NE, Shields DC. (2007) SLiMFinder: a probabilistic method for identifying over-represented, convergently evolved, short linear motifs in proteins. PLoS One; 2(10):e967.
3. Edwards RJ, Davey NE, O'Brien K & Shields DC (2012): Interactome-wide prediction of short, disordered protein interaction motifs in humans. Molecular Biosystems 8: 282-95.