Frequently Asked Questions about MemPype
- What is MemPype?
- Why should I use MemPype?
- Which predictors are included in MemPype and what are they used for?
- How can I submit a prediction?
- How should I interpret the MemPype result page?
- The prediction is taking a long time, do I have to wait online?
- How can I retrieve the results of my old predictions?
- Why did you call it MemPype?
- Are there limitations in using the server?
- Why should I register?
What is MemPype?
MemPype is a Python-based pipeline that integrates several tools for predicting the topology and the subcellular localization of eukaryotic membrane proteins.
Why should I use MemPype?
MemPype predicts the topology and the subcellular localization of a membrane protein starting from its residue sequence.
More specifically, MemPype predicts:
- the presence of a signal peptide and the corresponding cleavage site;
- the presence of a GPI-anchor and the corresponding cleavage site;
- the presence and the position of transmembrane helices along the sequence;
- the orientation of the membrane protein with respect to the lipid bilayer;
- the subcellular localization in a three class model, comprising cell membrane, organelle membranes, and internal membranes.
Moreover, annotated proteins sharing high similarity with the query sequence are retrieved from SwissProt (> 50% sequence identity, > 50% sequence coverage, E-value < 1 E-05) and the corresponding experimental annotations concerning subcellular localization and topology are extracted.
Which predictors are included in the pipeline and what are they used for?
The MemPype prediction pipeline is currently composed of four prediction methods:
- SPEP is a neural network-based tool for predicting the presence of an N-terminal signal peptide and the position of the cleavage site. The false positive and false negative rates for eukaryotes are as high as 4% and 3%, respectively (Fariselli et al., 2003).
- PredGPI is a tool based on Support Vector Machines and Hidden Markov Models for predicting the presence of a GPI-anchor and the position of the omega (cleavage) site (Pierleoni et al., 2008). It identifies up to 89% of the known GPI-anchored proteins with a false positive rate of 0.15%.
- ENSEMBLE 3.0 is a new version of the ENSEMBLE tool for predicting the topology of all-alpha membrane proteins (Martelli et al., 2003). In its present version, the training database has been updated. ENSEMBLE 3.0 predicts the correct location of alpha-helices along the sequence for 91% of proteins and the correct topology for 86% of proteins. For the classification between transmembrane and globular proteins, the false positive and false negative rates are below 2%.
- MemLoci is a Support Vector Machine-based predictor specifically trained to discriminate the localization of membrane proteins in three classes: cell membrane, internal membranes, and organelle membranes (Pierleoni et al., 2011). On this three-class discrimination problem, MemLoci reaches an accuracy of 70% and a generalized correlation coefficient of 0.50.
References
- Fariselli P, Finocchiaro G, Casadio R - SPEPlip: the detection of signal peptide and lipoprotein cleavage sites - Bioinformatics 19:2498-2499 (2003)
- Martelli PL, Fariselli P, Casadio R - An ENSEMBLE machine learning approach for the prediction of all-alpha membrane proteins - Bioinformatics 19:I205-I211 (2003)
- Pierleoni A, Martelli PL, Casadio R - PredGPI: a GPI anchor predictor - BMC Bioinformatics 9:392 (2008)
- Pierleoni A, Martelli PL, Casadio R - MemLoci: predicting subcellular localization of membrane proteins in Eukaryotes - Bioinformatics, DOI: 10.1093/bioinformatics/BTR108 (2011)
How can I submit a prediction?
You can submit one or more protein sequences (up to 5, for registered users) on the prediction page.
The webserver requires you to submit sequences in FASTA format, like this one:
>sp|O97148|MTH_DROME G-protein coupled receptor Mth
MKTLLVLRISTVILVVLVIQKSYADILECDYFDTVDISAAQKLQNGSYLFEGLLVPAILT
GEYDFRILPDDSKQKVARHIRGCVCKLKPCVRFCCPHDHIMDNGVCYDNMSDEELAELDP
FLNVTLDDGSVSRRHFKNELIVQWDLPMPCDGMFYLDNREEQDKYTLFENGTFFRHFDRV
TLRKREYCLQHLTFADGNATSIRIAPHNCLIVPSITGQTVVMISSLICMVLTIAVYLFVK
KLQNLHGKCFICYMVCLFMGYLFLLLDLWQISISFCKPAGFLGYFFVMAAFFWLSVISLH
LWNTFRGSSHKANRFLFEHRFLAYNTYAWGMAVVLTGITVLADNIVENQDWNPRVGHEGH
CWIYTQAWSAMLYFYGPMVFLIAFNITMFILTAKRILGVKKDIQNFAHRQERKQKLNSDK
QTYTFFLRLFIIMGLSWSLEIGSYFSQSNQTWANVFLVADYLNWSQGIIIFILFVLKRST
WRLLQESIRGEGEEVNNSEEEISLENTTTRNVLL
Sequences must be at least 50 residues long.
Please avoid using non-standard residue codes such as B, U, Z, O, J, X and *.
You can choose a name for the job submission for further reference.
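The input requirements above (FASTA format, at least 50 residues, no non-standard residue codes) can be checked locally before submitting. The following is a minimal sketch, not part of MemPype itself: the function names `parse_fasta` and `check_sequence` are hypothetical, and the server may apply additional checks not described here.

```python
# Hypothetical pre-submission check; this is not MemPype code.
MIN_LENGTH = 50                 # minimum sequence length stated in this FAQ
NON_STANDARD = set("BUZOJX*")   # residue codes the FAQ asks you to avoid


def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_sequence(sequence):
    """Return a list of problems found in one sequence (empty if none)."""
    problems = []
    if len(sequence) < MIN_LENGTH:
        problems.append(
            f"too short: {len(sequence)} residues, need at least {MIN_LENGTH}"
        )
    bad = sorted(set(sequence) & NON_STANDARD)
    if bad:
        problems.append("non-standard residue codes: " + ", ".join(bad))
    return problems
```

Calling `check_sequence` on each sequence returned by `parse_fasta` gives an empty list when the sequence meets the requirements listed above.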
How should I interpret the MemPype result page?
MemPype webserver reports prediction outputs at several levels of detail.
For each submitted protein three levels of annotation are reported:
Prediction summary
This section summarizes the prediction results, reporting whether transmembrane helices are predicted, their number, and the membrane to which the protein is associated.
Prediction summary:
Cell Membrane, 7 Transmembrane helices
Detailed prediction results
This section details the output of the predictors included in the pipeline. In particular the following features are reported:
- the subcellular localization in the three-class partition (cell membrane, organelle membranes, and internal membranes). For each predicted localization, a likelihood score ranging from -100% to 100% is given. A positive score means the protein is predicted to be located in the given compartment; the higher the value, the higher the likelihood. The highest-scoring localization is chosen as the final prediction.
  The three localization classes are derived from a grouping of all the membranes available in eukaryotes, and correspond to:
  - Cell membrane: cell membrane
  - Organelle membranes: mitochondrial or plastidial membranes
  - Internal membranes: endoplasmic reticulum, nucleus, Golgi apparatus, vesicles, vacuoles, lysosomes, peroxisome, microsomes, and endosome membranes
- the presence of the signal peptide and the position of the cleavage site
- the position of the transmembrane helices and topological domains
- the presence of GPI-anchors and the position of the cleavage (or omega) site. Please note that the position of the cleaved peptide is indicated here, and the GPI-anchor is attached to the last remaining residue of the mature protein.
Clicking the "view on sequence" link opens a new page where the predicted region is highlighted on the sequence.
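The relationship between the detailed results and the prediction summary can be illustrated with a short sketch. It assumes the scores and sequence features have already been copied from a result page into plain Python structures. The function names and data layout are hypothetical and do not reflect MemPype's internal code or any download format.

```python
# Illustrative only: data layout and function names are assumptions.

def pick_localization(scores):
    """Return the class with the highest score, as described above."""
    return max(scores, key=scores.get)


def count_tm_helices(features):
    """Count features labelled as transmembrane regions."""
    return sum(1 for name, _start, _end in features
               if name == "Transmembrane region")


def summarize(scores, features):
    """Build a one-line summary in the spirit of the prediction summary."""
    return f"{pick_localization(scores)}, {count_tm_helices(features)} transmembrane helices"


# Example values in the style of a detailed result page (scores in percent).
EXAMPLE_SCORES = {
    "Cell membrane": 15,
    "Internal membranes": 95,
    "Organelle membranes": -1,
}
EXAMPLE_FEATURES = [
    ("Cytoplasmic region", 1, 2265),
    ("Transmembrane region", 2266, 2292),
    ("Non cytoplasmic region", 2293, 2305),
    ("Transmembrane region", 2306, 2326),
    ("Cytoplasmic region", 2327, 2332),
    ("Transmembrane region", 2333, 2370),
    ("Non cytoplasmic region", 2371, 2385),
    ("Transmembrane region", 2386, 2421),
    ("Cytoplasmic region", 2422, 2438),
    ("Transmembrane region", 2439, 2462),
    ("Non cytoplasmic region", 2463, 2564),
    ("Transmembrane region", 2565, 2589),
    ("Cytoplasmic region", 2590, 2749),
]
```

With these example values, the internal membranes class has the highest score and six transmembrane regions are counted.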
Detailed prediction results
Predicted membrane localization (MemLoci): Internal membranes
Cell membrane score: 15%
Internal membranes score: 95%
Organellar membranes: -1%
Predicted sequence features:
| Prediction | Presence | Start | End | Detail |
| Signal peptide (Spep): | NO | - | - | not present |
| Cytoplasmic region (Ensemble): | - | 1 | 2265 | view on sequence |
| Transmembrane region (Ensemble): | YES | 2266 | 2292 | view on sequence |
| Non cytoplasmic region (Ensemble): | - | 2293 | 2305 | view on sequence |
| Transmembrane region (Ensemble): | YES | 2306 | 2326 | view on sequence |
| Cytoplasmic region (Ensemble): | - | 2327 | 2332 | view on sequence |
| Transmembrane region (Ensemble): | YES | 2333 | 2370 | view on sequence |
| Non cytoplasmic region (Ensemble): | - | 2371 | 2385 | view on sequence |
| Transmembrane region (Ensemble): | YES | 2386 | 2421 | view on sequence |
| Cytoplasmic region (Ensemble): | - | 2422 | 2438 | view on sequence |
| Transmembrane region (Ensemble): | YES | 2439 | 2462 | view on sequence |
| Non cytoplasmic region (Ensemble): | - | 2463 | 2564 | view on sequence |
| Transmembrane region (Ensemble): | YES | 2565 | 2589 | view on sequence |
| Cytoplasmic region (Ensemble): | - | 2590 | 2749 | view on sequence |
| GPI anchor (PredGPI): | NO | - | - | not present |
Annotation of similar proteins in SwissProt
The most similar proteins with experimental annotations in SwissProt are retrieved with a BLAST search, considering only hits with E-value < 1E-5, sequence identity > 50% and sequence coverage > 50%. Up to 25 highly similar proteins are retrieved, and the corresponding experimental annotations are analyzed and reported in tag cloud diagrams. Tag relevance is color-coded, ranging from red (for infrequent tags) to green (for tags shared by many similar proteins). Up to five types of annotation can be reported: the SwissProt localization and topology annotations, plus the associated GO terms divided by the three GO domains.
Pointing the mouse over a tag shows its numeric occurrence, and clicking on the tag opens a new page linking to the corresponding similar proteins in UniProtKB.
Annotation of similar proteins in SwissProt
6 similar entries found
SwissProt experimental localization
(6 annotated entries)
SwissProt experimental topology
(1 annotated entries)
GO experimental cellular compartment
(1 annotated entries)
GO experimental molecular function
(2 annotated entries)
GO experimental biological process
(3 annotated entries)
Color legend:
Percent of similar proteins carrying an annotation
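The similarity thresholds used for this section (E-value < 1E-5, sequence identity > 50%, sequence coverage > 50%, at most 25 proteins) can be expressed as a simple filter. The sketch below is an illustration under stated assumptions, not MemPype's actual code: the `Hit` class and `select_similar` function are hypothetical, and ranking hits by E-value before truncating to 25 is an assumption, since the ordering used by the server is not described here.

```python
from dataclasses import dataclass

# Thresholds taken from the description above.
MAX_EVALUE = 1e-5
MIN_IDENTITY = 50.0   # percent
MIN_COVERAGE = 50.0   # percent
MAX_HITS = 25


@dataclass
class Hit:
    """One BLAST hit (hypothetical container, not a MemPype type)."""
    accession: str
    evalue: float
    identity: float   # percent sequence identity
    coverage: float   # percent sequence coverage


def select_similar(hits):
    """Keep hits passing all three thresholds, then return at most 25."""
    kept = [
        h for h in hits
        if h.evalue < MAX_EVALUE
        and h.identity > MIN_IDENTITY
        and h.coverage > MIN_COVERAGE
    ]
    # Assumption: the most significant hits (lowest E-value) come first.
    kept.sort(key=lambda h: h.evalue)
    return kept[:MAX_HITS]
```

All three comparisons are strict, so a hit with exactly 50% identity or coverage is excluded in this sketch.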
The prediction is taking a long time, do I have to wait online?
Normally each prediction should be completed in less than 30 seconds. However, if the webserver is overloaded, you might wait longer to get the prediction results. In this case you can bookmark your result page, and come back later. You can also simply take note of the provided job code and use it to retrieve the prediction later.
How can I retrieve the results of my old predictions?
A list of the latest 10 anonymously submitted predictions is always displayed for convenience on the recent jobs page. On the same page you can provide an old job code to retrieve prediction results. Your results are available for a week after the prediction completes.
If you are a registered user, you can retrieve your predictions from the my jobs page. Results for registered users are available for at least a month after the prediction completes, and will be kept online as long as possible.
Why did you call it MemPype?
MemPype is a PIPEline of predictors for MEMbrane protein localization, and is completely implemented in PYthon (even the web framework, currently using web2py).
Are there limitations in using the server?
This website is free and open to all users, and there is no login requirement. However, to avoid abuse and ensure fair usage for every user, anonymous users are allowed a maximum of 10 requests (one sequence per request) per hour. A registration procedure is also available. Registered users are allowed to submit up to 30 requests (with up to 5 sequences per request) per hour.
If you need faster access to the webserver please contact us.
Why should I register?
MemPype predictions can be accessed without any registration.
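However, registered users have some benefits: they can submit up to 30 requests per hour with up to 5 sequences per request, and their results remain available for at least a month from the my jobs page.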
All predictions are free of charge. Registration is intended to maintain a fair usage policy and to contact you if necessary.