Help Page


Table of Contents:

0.- Introduction
1.- Submitting a Job
2.- Analysing the Results
3.- Improving the Results
4.- Examples & Tutorial
5.- References
6.- Contact Us


0.- Introduction

This server is intended for the interactive prediction of the mapping between the members of two families of interacting proteins (i.e. to predict which ligand within one family interacts with which receptor within the other family). The method builds on previous works (1, 2, 3) showing that interacting protein families tend to have similar phylogenetic trees. The extension implemented in this server assumes that the right mapping between the members of the two families is the one that produces the highest similarity between the two trees. Since the number of possible mappings grows at least factorially with family size, it is impossible to explore all of them, and current approaches use heuristics to avoid exploring the complete space of solutions (4). Consequently, these approaches do not guarantee the best solution and can be trapped in a local optimum. This is why it is important to explore the proposed solution(s) interactively and to test changes to the mapping in order to improve it.
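As an illustration of why heuristics are needed: for two families of n proteins paired one-to-one there are n! possible mappings, which for n = 20 is already about 2.4 × 10^18. The sketch below shows the general shape of a Monte Carlo search over mappings, assuming one-to-one pairings, a random partner-swap move and Metropolis acceptance. It is a hypothetical illustration, not TAG_TSEMA's actual implementation; `monte_carlo_mapping`, `temperature` and the pluggable `score` function are assumptions introduced here.

```python
import math
import random
from typing import Callable, List, Tuple

# mapping[i] = index of the Family II partner assigned to Family I protein i
Mapping = List[int]


def monte_carlo_mapping(
    n: int,
    score: Callable[[Mapping], float],
    iterations: int = 10_000,
    temperature: float = 0.1,
    seed: int = 0,
) -> Tuple[Mapping, float]:
    """Search one-to-one mappings by swapping partners (Metropolis acceptance)."""
    rng = random.Random(seed)
    current = list(range(n))
    rng.shuffle(current)                  # random starting mapping
    current_score = score(current)
    best, best_score = current[:], current_score
    if n < 2:                             # nothing to swap
        return best, best_score
    for _ in range(iterations):
        i, j = rng.sample(range(n), 2)    # two Family I proteins
        current[i], current[j] = current[j], current[i]  # exchange their partners
        new_score = score(current)
        delta = new_score - current_score
        # Always accept improvements; accept worse moves with probability exp(delta / T).
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current_score = new_score
            if current_score > best_score:
                best, best_score = current[:], current_score
        else:
            current[i], current[j] = current[j], current[i]  # rejected: undo the swap
    return best, best_score


if __name__ == "__main__":
    # Toy score: number of Family I proteins paired with their "true" partner (identity).
    def toy_score(m: Mapping) -> float:
        return sum(1 for i, p in enumerate(m) if i == p)

    print(math.factorial(20))  # size of the one-to-one search space for n = 20
    print(monte_carlo_mapping(6, toy_score, iterations=2000))
```

In a real run `score` would compare the distance matrices of the two trees under the current mapping. Keeping the best mapping seen separately from the current one is what allows the search to accept occasional worse moves (to escape local optima) without losing good solutions.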

Using this server involves two main steps:

i) Initial submission of a job. The user provides the two protein families as input (either multiple sequence alignments or phylogenetic trees), and the server produces an initial mapping with an associated set of scores. The results are returned by email. The user can stop here and analyse/process these raw results as they are, or inspect and change them interactively in the second step.

ii) Interactive analysis of the results. In this stage, the user uploads the files obtained in the previous step to the server. A new interface is presented where the user can visualize the proposed mapping and change it interactively.



1.- Submitting a Job

Although submitting a job to TAG_TSEMA is a very simple and nearly automatic process, you need to provide some basic information about your job, such as:
- JOB NAME: Give your job a name. This compulsory field will help you identify the job once it finishes, as it will be included in the subject of subsequent emails.
- EMAIL: A valid email address to which the results will be sent as attachments (along with any other message regarding your job).
NOTE FOR WINDOWS USERS: Please ensure that the returned results file is saved as *.tar.gz instead of the default *.tar.tar
- TREE/MSA for Family I/II: Path to the file containing the distance information between the proteins within each family.
Please note that each input can be either a Newick tree or a multiple sequence alignment (MSA). If an MSA is submitted, the server converts it to a Newick tree using ClustalW (5).

IMPORTANT NOTE: The main difference from our previous server, TSEMA, is that when submitting a job to TAG_TSEMA the input sequences can be grouped into classes at will by the user (e.g. subfamilies, organisms, known interacting groups...). Only members of the same class are mapped together during the automatic Monte Carlo heuristic search performed herein. To assign a sequence to a class, the name of the sequence MUST follow the format NAME_CLASS, for example CDK2_HUMAN, BRCA1_MYGROUP1... If no class is given to a sequence, the UNDEF (Undefined) class is automatically assigned.
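A minimal sketch of how the NAME_CLASS convention could be parsed and used to restrict candidate pairs. It assumes the class is the text after the last underscore and that UNDEF sequences may be paired with each other (so a job with no classes at all behaves as a single class). The helper names are hypothetical and not part of the server.

```python
from typing import Tuple

UNDEFINED_CLASS = "UNDEF"


def split_name_class(sequence_name: str) -> Tuple[str, str]:
    """Split a NAME_CLASS identifier; names without a usable class get UNDEF."""
    # Assumption: the class is whatever follows the LAST underscore.
    name, sep, cls = sequence_name.rpartition("_")
    if not sep or not name or not cls:
        return sequence_name, UNDEFINED_CLASS
    return name, cls


def can_be_paired(seq_a: str, seq_b: str) -> bool:
    """Only proteins of the same class are candidates for pairing."""
    return split_name_class(seq_a)[1] == split_name_class(seq_b)[1]


if __name__ == "__main__":
    print(split_name_class("CDK2_HUMAN"))               # name and class
    print(split_name_class("CDK2"))                     # no class given
    print(can_be_paired("CDK2_HUMAN", "BRCA1_MYGROUP1"))  # different classes
```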

Advanced users might want to change the default parameters, such as:
- NUMBER OF MONTECARLO RUNS: The number of times a Monte Carlo heuristic search is performed (one million iterations each). Please remember that this is a CPU-intensive process, so change this parameter responsibly.
- SUBMITTED DATA TYPE: Although the submitted data type is detected automatically, you can also force the type if you receive unexpected errors regarding file formats.
- SCORING FUNCTION: The default scoring function for measuring the similarity between the trees (distance matrices) is Pearson's T Correlation Coefficient. However, you can also use Pearson's R or RMSD (Root Mean Square Deviation) as alternative scoring functions. See the references for more information on the methodology for measuring similarities between phylogenetic trees (1, 2, 3).
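The sketch below shows one common way, in the spirit of the tree-similarity approaches cited above (1, 2, 3), to compare two trees once a mapping is fixed: collect the distances between every two mapped proteins in both families' distance matrices and compute Pearson's correlation coefficient or the RMSD over those paired values. It is an assumption-based illustration; it does not reproduce the server's default "Pearson's T" variant, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def paired_distances(
    dist_i: Matrix, dist_ii: Matrix, mapping: Sequence[int]
) -> Tuple[List[float], List[float]]:
    """Pair up the distances of every two mapped proteins in both families.

    mapping[k] is the Family II index of the partner of Family I protein k.
    """
    xs: List[float] = []
    ys: List[float] = []
    n = len(mapping)
    for a in range(n):
        for b in range(a + 1, n):  # upper triangle only: distance matrices are symmetric
            xs.append(dist_i[a][b])
            ys.append(dist_ii[mapping[a]][mapping[b]])
    return xs, ys


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Plain Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant distances")
    return cov / (sx * sy)


def rmsd(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Root mean square deviation between paired distances (lower is better)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))
```

Note that RMSD depends on the absolute scale of the distances in each tree, whereas Pearson's correlation does not, which is one reason a correlation coefficient is a natural default when the two families evolve at different rates.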


2.- Analysing the Results

Once your job has finished, the results will be returned to you by email. You can then either use these raw results directly or submit them to the analysis part of the server to visualize and change them interactively.
As with job submission, some information must be provided, such as:
- ANALYSIS NAME: Give your analysis a name. This compulsory field will help you identify the analysis once it finishes, as it will be included in the subject of any subsequent email regarding it.
- EMAIL: A valid email address to which messages regarding the analysis will be sent.
- RESULTS FILE: The file sent to your email by TAG_TSEMA in the previous step. It is highly recommended to submit it compressed as a *.tar.gz file to reduce network load.
NOTE FOR WINDOWS USERS: Please ensure that the returned results file is saved as *.tar.gz instead of the default *.tar.tar
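If a results file was saved with the wrong extension, a small local script can repair it before uploading. This sketch assumes the downloaded file is still gzip-compressed and only its name is wrong, so it checks the gzip magic bytes before renaming. `fix_results_extension` is a hypothetical helper, not something provided by TAG_TSEMA.

```python
import os
import tarfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def fix_results_extension(path: str) -> str:
    """Rename a gzip-compressed 'results.tar.tar' to 'results.tar.gz'."""
    with open(path, "rb") as handle:
        is_gzip = handle.read(2) == GZIP_MAGIC
    if is_gzip and path.endswith(".tar.tar"):
        new_path = path[: -len(".tar.tar")] + ".tar.gz"
        os.rename(path, new_path)
        return new_path
    return path  # already fine, or not a gzip file: leave it untouched


def list_results(path: str) -> list:
    """List the members of a (repaired) results archive."""
    with tarfile.open(path, "r:gz") as archive:
        return archive.getnames()
```

After renaming, the archive should open normally with `tarfile.open(path, "r:gz")` or any standard archive tool.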

Advanced users might want to change the default parameters, such as:
- TOLERANCE: This parameter controls the percentage of solutions (ranked by score) that are discarded before calculating the coincidence. More restrictive analyses are less noisy, but might be incomplete in cases of high promiscuity.
- SCORING FUNCTION: The default scoring function is Pearson's Correlation Coefficient. However, you can also use RMSD (Root Mean Square Deviation) as an alternative scoring function (1, 2, 3).
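The TOLERANCE filter can be pictured as ranking the solutions by score and discarding the worst-scoring fraction before counting coincidences. This is a hypothetical sketch: it assumes that the discarded solutions are the worst-scoring ones, that each solution is a (score, mapping) tuple, and that for RMSD lower scores are better, which is why `higher_is_better` is exposed.

```python
import math
from typing import Any, List, Tuple

Solution = Tuple[float, Any]  # (score, mapping)


def apply_tolerance(
    solutions: List[Solution], tolerance_pct: float, higher_is_better: bool = True
) -> List[Solution]:
    """Discard the worst-scoring `tolerance_pct` percent of the solutions."""
    if not 0 <= tolerance_pct < 100:
        raise ValueError("tolerance must be in [0, 100)")
    # Best solutions first: descending for correlations, ascending for RMSD.
    ranked = sorted(solutions, key=lambda s: s[0], reverse=higher_is_better)
    n_rejected = math.floor(len(ranked) * tolerance_pct / 100)
    return ranked[: len(ranked) - n_rejected]
```

A higher tolerance keeps fewer, better solutions, which matches the trade-off described above: less noise, but proteins with several plausible partners may lose their weaker links.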


3.- Improving the Results

Results from the server can be improved through a human-supervised process.
In this interface, the user can interactively change some of the pairings predicted by the automatic process described above and see how the scores change accordingly.

The "coincidence table" shows in how many of the Monte Carlo runs each pair of proteins is linked in the final mapping. Pairs of proteins with a high value in this table may or may not be linked in the overall best mapping reported. Thus, a good starting point is to force some of these pairings and check how the new mapping looks in the trees, which correlation it produces, and how it affects the rest of the scores.
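Counting a coincidence table is straightforward once each Monte Carlo run's final mapping is available as a set of (Family I, Family II) pairs. A minimal sketch under that assumed data layout; `coincidence_table` is a hypothetical name.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[str, str]  # (Family I protein, Family II protein)


def coincidence_table(solutions: Iterable[Set[Pair]]) -> Dict[Pair, int]:
    """Count in how many final mappings each (Family I, Family II) pair is linked."""
    table: Counter = Counter()
    for mapping in solutions:
        table.update(set(mapping))  # each pair counted at most once per run
    return dict(table)


if __name__ == "__main__":
    runs = [
        {("A", "B"), ("D", "C")},
        {("A", "B"), ("D", "E")},
        {("A", "C"), ("D", "B")},
    ]
    table = coincidence_table(runs)
    for pair, count in sorted(table.items(), key=lambda kv: -kv[1]):
        print(pair, count)
```

Sorting the table by count gives a natural shortlist of pairs worth forcing in the interactive step.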
Meaning of the scores:

Reliability: The *Reliability* score for a pair A-B (Rel_AB) is the number of the 500 solutions in which A is linked to B, divided by the total number of pairings for A (as a percentage). Note that the total number of pairings for A may differ from 500, since in some solutions A might not be linked to any protein. For the same reason, Rel_AB can differ from Rel_BA, because the numbers of pairings for the two proteins can differ; this is why each pair has two reliability values. Rel_AB = 100% means that in every solution where A was linked, it was linked to B. This score indicates the consistency of a given link.
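The Reliability definition translates directly into counts over the stored solutions. A minimal sketch, assuming each solution is a set of (Family I, Family II) pairs; `reliabilities` is a hypothetical helper, and its second dictionary holds Rel_BA keyed by the same (A, B) tuple.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[str, str]  # (Family I protein A, Family II protein B)


def reliabilities(
    solutions: Iterable[Set[Pair]],
) -> Tuple[Dict[Pair, float], Dict[Pair, float]]:
    """Return (Rel_AB, Rel_BA) in percent for every pair seen in the solutions."""
    pair_counts: Counter = Counter()
    pairings_i: Counter = Counter()   # total pairings of each Family I protein
    pairings_ii: Counter = Counter()  # total pairings of each Family II protein
    for mapping in solutions:
        for a, b in mapping:
            pair_counts[(a, b)] += 1
            pairings_i[a] += 1
            pairings_ii[b] += 1
    rel_ab = {(a, b): 100 * n / pairings_i[a] for (a, b), n in pair_counts.items()}
    rel_ba = {(a, b): 100 * n / pairings_ii[b] for (a, b), n in pair_counts.items()}
    return rel_ab, rel_ba
```

For example, if A is linked to B in 2 of the 4 solutions where A is linked at all, Rel_AB is 50%; if B is linked in only 3 solutions, 2 of them to A, Rel_BA is about 66.7%.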
Segregation: The *Segregation* score (Seg) for a pair measures the difference between the Reliability of that pair and the next best Reliability:
Seg_AB = (Rel_AB - 2nd_best_Rel_A) * 100 / SUM_i(Rel_Ai)
If Rel_AB is not the highest Rel_Ai, the highest Rel_Ai is used instead of 2nd_best_Rel_A, and hence Seg_AB is negative. A high value of this parameter indicates not only that the pair appears in many simulations, but also that it is far ahead of the next most frequent pair for that protein. A negative value indicates that the pair is not the most frequent one for that protein. Both scores are shown on a color scale from red (worst) to blue (best).
Note that both scores can be computed not only on the coincidence matrix but also on its transpose, i.e. from the point of view of either family (the results are two-dimensional).
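The Segregation formula, applied to the Reliability values of a single protein A, can be sketched as follows. When A has only one partner, the sketch assumes a second-best Reliability of 0, a case the text does not specify; the function name is hypothetical.

```python
from typing import Dict


def segregation(rel_a: Dict[str, float]) -> Dict[str, float]:
    """Seg_AB for every partner B of one protein A, given its Rel_AB values (percent)."""
    if not rel_a:
        return {}
    total = sum(rel_a.values())
    ranked = sorted(rel_a.values(), reverse=True)
    best = ranked[0]
    second = ranked[1] if len(ranked) > 1 else 0.0  # assumption for a lone partner
    seg = {}
    for partner, rel in rel_a.items():
        # The top partner is compared with the runner-up; every other partner with the top one.
        reference = second if rel == best else best
        seg[partner] = (rel - reference) * 100 / total
    return seg
```

For example, Reliabilities of 60%, 30% and 10% for partners B, C and D give Seg values of 30, -30 and -50 respectively, so only the top partner gets a positive score.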

This GUI is divided into three areas:
1.- Mapping Correction: This area provides a graphical interface to easily change the interacting partners. It also shows the reliability and the segregation (for both Family I and Family II) of the different protein pairs, and a control panel to recalculate the distance correlation for the modified mapping or, if desired, to undo changes.
2.- Tree Representation: A downloadable image shows the mappings over a representation of both trees. A color scale shows the reliability of the mapping taking into account the whole stack of solutions.
3.- Correlation Plots: Two plots are shown in this area. The first shows the current and the previously calculated correlation, and the second shows the differences between the current mapping and the default one suggested by the server. A color scheme indicates whether a specific position appears in, remains in, or disappears from the plot.


4.- Examples & Tutorial

This page contains some examples for the user to get familiar with the server. For each one of the examples, the user can either take the original multiple sequence alignments of the families (or trees) and start the whole process from the beginning, or just take the precomputed raw results and run the second part of the server only (interactive analysis).
IMPORTANT: New users should start with this TUTORIAL to get familiar with TAG_TSEMA's interface and calculations.


5.- References

Izarzugaza JM, Juan D, Pons C, Valencia A, Pazos F.
"TAG_TSEMA: interactive prediction of protein pairings between interacting families."
In Press


1.- Izarzugaza JM, Juan D, Pons C, Ranea JA, Valencia A, Pazos F. "TSEMA: interactive prediction of protein pairings between interacting families." Nucleic Acids Res. 2006;34(Web Server issue):W315-9.
2.- Goh, Bogan, Joachimiak, Walther and Cohen. "Coevolution of Proteins with their Interaction Partners." J. Mol. Biol. 2000.
3.- Pazos and Valencia. "Similarity of phylogenetic trees as indicator of protein-protein interaction." Protein Eng. 2001.
4.- Ramani and Marcotte. "Exploiting the Co-evolution of Interacting Proteins to Discover Interaction Specificity." J. Mol. Biol. 2003.
5.- Thompson, Higgins and Gibson. "CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice." Nucleic Acids Res. 1994.


6.- Contact Us

Jose Maria González-Izarzugaza (jmgonzalez(AT)cnio(DOT)es)
Spanish National Cancer Research Center - Centro Nacional de Investigaciones Oncológicas (CNIO)
Structural Bioinformatics Group - Grupo de Bioinformática Estructural
C/Melchor Fernández Almagro, 3
28029 Madrid (Spain)
Phone: + (34) 917 328 000
Fax: + (34) 912 246 980