Biocomputing group - University of Bologna

Frequently Asked Questions about MemPype

What is MemPype?

MemPype is a Python-based pipeline that integrates several tools for predicting the topology and the subcellular localization of eukaryotic membrane proteins.

Why should I use MemPype?

MemPype predicts the topology and the subcellular localization of a membrane protein starting from its residue sequence.

More specifically, MemPype predicts:

  1. the presence of a signal peptide and the corresponding cleavage site;
  2. the presence of a GPI-anchor and the corresponding cleavage site;
  3. the presence and the position of transmembrane helices along the sequence;
  4. the orientation of the membrane protein with respect to the lipid bilayer;
  5. the subcellular localization in a three-class model, comprising cell membrane, organelle membranes, and internal membranes.

Moreover, annotated proteins sharing high similarity with the query sequence are retrieved from SwissProt (> 50% sequence identity, > 50% sequence coverage, E-value < 1E-05) and the corresponding experimental annotations concerning subcellular localization and topology are extracted.
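The similarity criteria above amount to a simple filter over BLAST hits. The sketch below is only an illustration, not MemPype's code: the `Hit` record and `select_similar` function are hypothetical, coverage is assumed to be measured on the query sequence, and ranking by E-value is an assumption. The cap of 25 hits is the one stated in the results-page answer of this FAQ.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One BLAST hit against SwissProt (hypothetical record layout)."""
    accession: str
    identity: float  # percent sequence identity, 0-100
    coverage: float  # percent of the query covered by the alignment (assumed), 0-100
    evalue: float


def select_similar(hits, max_hits=25):
    """Keep hits with E-value < 1e-5, identity > 50% and coverage > 50%.

    Survivors are ranked by E-value (an assumption) and capped at max_hits.
    """
    kept = [
        h for h in hits
        if h.evalue < 1e-5 and h.identity > 50.0 and h.coverage > 50.0
    ]
    kept.sort(key=lambda h: h.evalue)
    return kept[:max_hits]
```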

Which predictors are included in the pipeline and what are they used for?

The MemPype prediction pipeline currently consists of four prediction methods:

  • SPEP [1] is a neural network-based tool for predicting the presence of an N-terminal signal peptide and the position of its cleavage site (Fariselli et al., 2003). For eukaryotes, its false-positive and false-negative rates are 4% and 3%, respectively.
  • PredGPI [3] is a tool based on Support Vector Machines and Hidden Markov Models for predicting the presence of a GPI-anchor and the position of the omega (cleavage) site (Pierleoni et al., 2008). It identifies up to 89% of the known GPI-anchored proteins with a false-positive rate of 0.15%.
  • ENSEMBLE 3.0 [2] is a new version of the ENSEMBLE tool for predicting the topology of all-alpha membrane proteins (Martelli et al., 2003), with an updated training database. ENSEMBLE 3.0 predicts the correct location of the alpha-helices along the sequence for 91% of proteins and the correct topology for 86% of proteins. In discriminating transmembrane from globular proteins, both the false-positive and the false-negative rates are below 2%.
  • MemLoci [4] is a Support Vector Machine-based predictor specifically trained to discriminate the localization of membrane proteins among three classes: cell membrane, internal membranes, and organelle membranes (Pierleoni et al., 2011). On this three-class problem, MemLoci reaches an accuracy of 70% and a generalized correlation coefficient of 0.50.
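One way to picture the pipeline is as four independent stages applied to the same query sequence. The following minimal sketch assumes exactly that; `run_pipeline`, the `Predictor` type, and any predictors you plug in are hypothetical and do not reproduce the real interfaces of SPEP, PredGPI, ENSEMBLE, or MemLoci. Recording a failure per tool instead of aborting the whole job is a design choice of this sketch, not documented MemPype behaviour.

```python
from typing import Any, Callable, Dict

# A predictor is modelled as a function from a sequence to a result dict
# (hypothetical shape, not the real tools' output format).
Predictor = Callable[[str], Dict[str, Any]]


def run_pipeline(sequence: str, predictors: Dict[str, Predictor]) -> Dict[str, Dict[str, Any]]:
    """Run every predictor on the same sequence and collect results by tool name."""
    results: Dict[str, Dict[str, Any]] = {}
    for name, predict in predictors.items():
        try:
            results[name] = {"ok": True, "output": predict(sequence)}
        except Exception as exc:  # one failing tool does not stop the others
            results[name] = {"ok": False, "error": str(exc)}
    return results
```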

References

  1. Fariselli P, Finocchiaro G, Casadio R - SPEPlip: the detection of signal peptide and lipoprotein cleavage sites - Bioinformatics 19:2498-2499 (2003)
  2. Martelli PL, Fariselli P, Casadio R - An ENSEMBLE machine learning approach for the prediction of all-alpha membrane proteins - Bioinformatics 19:I205-I211 (2003)
  3. Pierleoni A, Martelli PL, Casadio R - PredGPI: a GPI anchor predictor - BMC Bioinformatics 9:392 (2008)
  4. Pierleoni A, Martelli PL, Casadio R - MemLoci: predicting subcellular localization of membrane proteins in Eukaryotes - Bioinformatics, DOI: 10.1093/bioinformatics/BTR108 (2011)

How can I submit a prediction?

You can submit one or more protein sequences for prediction on the prediction page (one sequence per request for anonymous users, up to 5 for registered users).

The webserver requires you to submit sequences in FASTA format, like this one:

>sp|O97148|MTH_DROME G-protein coupled receptor Mth
MKTLLVLRISTVILVVLVIQKSYADILECDYFDTVDISAAQKLQNGSYLFEGLLVPAILT
GEYDFRILPDDSKQKVARHIRGCVCKLKPCVRFCCPHDHIMDNGVCYDNMSDEELAELDP
FLNVTLDDGSVSRRHFKNELIVQWDLPMPCDGMFYLDNREEQDKYTLFENGTFFRHFDRV
TLRKREYCLQHLTFADGNATSIRIAPHNCLIVPSITGQTVVMISSLICMVLTIAVYLFVK
KLQNLHGKCFICYMVCLFMGYLFLLLDLWQISISFCKPAGFLGYFFVMAAFFWLSVISLH
LWNTFRGSSHKANRFLFEHRFLAYNTYAWGMAVVLTGITVLADNIVENQDWNPRVGHEGH
CWIYTQAWSAMLYFYGPMVFLIAFNITMFILTAKRILGVKKDIQNFAHRQERKQKLNSDK
QTYTFFLRLFIIMGLSWSLEIGSYFSQSNQTWANVFLVADYLNWSQGIIIFILFVLKRST
WRLLQESIRGEGEEVNNSEEEISLENTTTRNVLL

Sequences must be at least 50 residues long.
Please avoid non-standard residue codes such as B, U, Z, O, J, X, and *.
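Before submitting, you may want to check your input locally. The sketch below is a hypothetical pre-check, not the server's own validation: `parse_fasta` and `check_sequence` are illustrative names, and it is deliberately stricter than the rule above, flagging any character outside the 20 standard one-letter amino acid codes (which includes B, U, Z, O, J, X, and *).

```python
MIN_LENGTH = 50
# The 20 standard one-letter amino acid codes.
STANDARD = set("ACDEFGHIKLMNPQRSTVWY")


def parse_fasta(text):
    """Split FASTA text into (header, sequence) pairs; sequences are upper-cased."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_sequence(seq):
    """Return a list of problems; an empty list means the sequence looks acceptable."""
    seq = seq.upper()
    problems = []
    if len(seq) < MIN_LENGTH:
        problems.append(f"too short: {len(seq)} residues (minimum {MIN_LENGTH})")
    bad = sorted(set(seq) - STANDARD)
    if bad:
        problems.append("non-standard residue codes: " + ", ".join(bad))
    return problems
```

For example, running `check_sequence` on each sequence returned by `parse_fasta` lets you fix short or non-standard entries before they reach the webserver.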

You can optionally give the job a name for later reference.

How should I interpret the MemPype results page?

MemPype webserver reports prediction outputs at several levels of detail.

For each submitted protein, three levels of annotation are reported:

  • Prediction summary

    This section summarizes the prediction results, reporting whether transmembrane helices are predicted, their number, and the membrane to which the protein is associated.

    Prediction summary:

    Cell Membrane, 7 Transmembrane helices

  • Detailed prediction results

    This section details the output of the predictors included in the pipeline. In particular, the following features are reported:

    1. the subcellular localization in the three-class partition (cell membrane, organelle membranes, and internal membranes). For each localization, a likelihood score ranging from -100% to 100% is given. A positive score means that the protein is predicted to be located in that compartment; the higher the value, the higher the likelihood. The highest-scoring localization is chosen as the final prediction.
      The three localization classes are derived by grouping all the membranes found in eukaryotes, and correspond to:
      • Cell membrane: Cell membrane
      • Organelle membranes: mitochondrial or plastidial membranes
      • Internal membranes: endoplasmic reticulum, nucleus, Golgi apparatus, vesicles, vacuoles, lysosomes, peroxisomes, microsomes, and endosome membranes
    2. the presence of the signal peptide and the position of the cleavage site
    3. the position of the transmembrane helices and topological domains
    4. the presence of a GPI-anchor and the position of the cleavage (omega) site. Note that the position of the cleaved peptide is reported here; the GPI-anchor is attached to the last remaining residue of the mature protein.

    Clicking the "view on sequence" link opens a new page where the predicted region is highlighted on the sequence.

    Detailed prediction results

    Predicted membrane localization (MemLoci): Internal membranes

    Cell membrane score: 15%
    Internal membranes score: 95%
    Organellar membranes: -1%

    Predicted sequence features:

    Prediction                          Presence  Start  End   Detail
    Signal peptide (Spep)               NO        -      -     not present
    Cytoplasmic region (Ensemble)       -         1      2265  view on sequence
    Transmembrane region (Ensemble)     YES       2266   2292  view on sequence
    Non cytoplasmic region (Ensemble)   -         2293   2305  view on sequence
    Transmembrane region (Ensemble)     YES       2306   2326  view on sequence
    Cytoplasmic region (Ensemble)       -         2327   2332  view on sequence
    Transmembrane region (Ensemble)     YES       2333   2370  view on sequence
    Non cytoplasmic region (Ensemble)   -         2371   2385  view on sequence
    Transmembrane region (Ensemble)     YES       2386   2421  view on sequence
    Cytoplasmic region (Ensemble)       -         2422   2438  view on sequence
    Transmembrane region (Ensemble)     YES       2439   2462  view on sequence
    Non cytoplasmic region (Ensemble)   -         2463   2564  view on sequence
    Transmembrane region (Ensemble)     YES       2565   2589  view on sequence
    Cytoplasmic region (Ensemble)       -         2590   2749  view on sequence
    GPI anchor (PredGPI)                NO        -      -     not present
  • Annotation of similar proteins in SwissProt

    The most similar proteins with experimental annotations in SwissProt are retrieved with a BLAST search, considering only hits with E-value < 1E-5, sequence identity > 50%, and sequence coverage > 50%. Up to 25 highly similar proteins are retrieved, and their experimental annotations are analyzed and reported in tag cloud diagrams. Tag relevance is color-coded from red (for infrequent tags) to green (for tags shared by many similar proteins). Up to five types of annotation can be reported: the SwissProt localization and topology annotations, plus the associated GO terms divided into the three GO domains.
    Hovering the mouse over a tag shows its number of occurrences, and clicking the tag opens a new page linking to the corresponding similar proteins in UniProtKB.

    Annotation of similar proteins in SwissProt

    6 similar entries found

    SwissProt experimental localization (6 annotated entries)
    SwissProt experimental topology (1 annotated entries)
    GO experimental cellular compartment (1 annotated entries)
    GO experimental molecular function (2 annotated entries)

    [Tag cloud diagrams not reproduced. Color legend: percent of similar proteins carrying an annotation]
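The relation between the detailed results and the one-line prediction summary can be sketched as follows: the final localization is the highest-scoring MemLoci class, and the helix count is the number of transmembrane regions. This is a hypothetical reconstruction for illustration only; `summarize` and the `(label, start, end)` row format are assumptions, and the data are the example detailed results shown above (with the tool names dropped from the labels).

```python
def summarize(scores, features):
    """Build a one-line summary in the style of the 'Prediction summary' section.

    scores:   MemLoci localization class -> score in percent (-100 to 100)
    features: (label, start, end) rows from the detailed results, 1-based inclusive
    """
    # The highest-scoring class is taken as the final localization.
    localization = max(scores, key=scores.get)
    # Each "Transmembrane region" row corresponds to one predicted helix.
    n_tm = sum(1 for label, _start, _end in features if label == "Transmembrane region")
    noun = "helix" if n_tm == 1 else "helices"
    return f"{localization}, {n_tm} Transmembrane {noun}"


scores = {"Cell membrane": 15, "Internal membranes": 95, "Organelle membranes": -1}
features = [
    ("Cytoplasmic region", 1, 2265),
    ("Transmembrane region", 2266, 2292),
    ("Non cytoplasmic region", 2293, 2305),
    ("Transmembrane region", 2306, 2326),
    ("Cytoplasmic region", 2327, 2332),
    ("Transmembrane region", 2333, 2370),
    ("Non cytoplasmic region", 2371, 2385),
    ("Transmembrane region", 2386, 2421),
    ("Cytoplasmic region", 2422, 2438),
    ("Transmembrane region", 2439, 2462),
    ("Non cytoplasmic region", 2463, 2564),
    ("Transmembrane region", 2565, 2589),
    ("Cytoplasmic region", 2590, 2749),
]
print(summarize(scores, features))  # Internal membranes, 6 Transmembrane helices
```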

The prediction is taking a long time. Do I have to wait online?

Normally each prediction should be completed in less than 30 seconds. However, if the webserver is overloaded, you might wait longer to get the prediction results. In this case you can bookmark your result page, and come back later. You can also simply take note of the provided job code and use it to retrieve the prediction later.

How can I retrieve the results of my old predictions?

A list of the latest 10 anonymously submitted predictions is always displayed for convenience on the recent jobs page. On the same page, you can enter an old job code to retrieve its prediction results. Results remain available for a week after the prediction is completed.

If you are a registered user, you can retrieve your predictions from the my jobs page. Results for registered users remain available for at least a month after the prediction is completed, and will be kept online as long as possible.

Why did you call it MemPype?

MemPype is a PIPEline of predictors for MEMbrane protein localization, and is completely implemented in PYthon (even the web framework, currently using web2py).

Are there limitations in using the server?

This web site is free and open to all users, and there is no login requirement. However, to avoid abuse and ensure fair usage for every user, anonymous users are limited to 10 requests per hour (one sequence per request). A registration procedure is also available: registered users can submit up to 30 requests per hour, with up to 5 sequences per request.
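If you have many sequences to process, you can plan your submissions around these limits. The sketch below is hypothetical and does not talk to the MemPype server (no endpoint or client API is assumed): `make_batches` groups sequences according to the per-request limit, and `HourlyLimiter` tracks submissions over a sliding one-hour window. The sliding window is a conservative assumption, since the FAQ does not say how the hour is counted.

```python
from collections import deque

# Limits stated in the FAQ.
LIMITS = {
    "anonymous": {"per_request": 1, "per_hour": 10},
    "registered": {"per_request": 5, "per_hour": 30},
}


def make_batches(sequences, user_type):
    """Group sequences into requests that respect the per-request limit."""
    size = LIMITS[user_type]["per_request"]
    return [sequences[i:i + size] for i in range(0, len(sequences), size)]


class HourlyLimiter:
    """Sliding one-hour window over submission timestamps (in seconds)."""

    def __init__(self, per_hour, window=3600.0):
        self.per_hour = per_hour
        self.window = window
        self.sent = deque()

    def wait_time(self, now):
        """Seconds to wait before another request may be sent at time `now`."""
        # Forget submissions that have left the window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) < self.per_hour:
            return 0.0
        # Wait until the oldest submission in the window expires.
        return self.window - (now - self.sent[0])

    def record(self, now):
        """Remember that a request was sent at time `now`."""
        self.sent.append(now)
```

With 11 sequences, a registered user needs three requests (5 + 5 + 1), whereas an anonymous user needs eleven, which exceeds one hour's quota.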

If you need faster access to the webserver please contact us.

Why should I register?

MemPype predictions can be accessed without any registration.
However, registered users have some benefits:

  • a repository of their job requests
  • results guaranteed to be online for at least a month
  • up to 5 sequences per job request
  • up to 30 submissions per hour
  • queue priority over anonymous requests

All predictions are free of charge. Registration is intended to maintain a fair usage policy and to allow us to contact you if necessary.