Introduction

Peptide sequences derived from MS/MS spectra, whether via database searching, spectral library matching, or de novo sequence analysis, need to be mapped to the reference proteome in order to determine the protein content of the sample being analyzed.

Proteotypic peptides, which uniquely identify a protein, are of special interest both for confidently identifying proteoforms and for generating assays for targeted experiments such as selected reaction monitoring (SRM). Naturally occurring variants in protein sequences complicate this mapping by increasing the likelihood that a given peptide sequence is shared among different protein forms.

While the nascent PEFF format allows for the representation of such variants, software is needed to efficiently map observed sequences to all possible variants.

ProteoMapper is a set of software tools to perform this mapping.

Publication

Technical Note in Journal of Proteome Research: Mendoza L. et al., J. Proteome Res. DOI: 10.1021/acs.jproteome.8b00544

Basic Information

ProteoMapper has two components: an indexer ("clips") and a mapper ("promast"). A protein sequence database in either FASTA or PEFF format must first be indexed by the indexer. Once the index is built, the mapper can quickly and efficiently find all locations of the input peptide sequence(s) in the proteome. Multiple parallel indices are supported, and input can be a pepXML file, a simple text file of peptide sequences, or a single sequence given on the command line. There are also options for mapping with wildcards and for fuzzy mapping (where one or more amino acids and their positions within the peptide sequence are unknown).
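
The index-then-map split described above can be illustrated with a small sketch. The Python below is not ProteoMapper's code or its index file format. It is a toy in-memory version, assuming an index that maps every fixed-length protein segment to the places where it occurs (the poster title mentions full inverted indices, and clips accepts a segment size). The names build_index and map_peptide, the segment size of 4, and the three example proteins are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_index(proteins, segment_size=4):
    """Map each segment of length segment_size to its (protein_id, offset) occurrences."""
    index = defaultdict(list)
    for protein_id, sequence in proteins.items():
        for offset in range(len(sequence) - segment_size + 1):
            segment = sequence[offset:offset + segment_size]
            index[segment].append((protein_id, offset))
    return index


def map_peptide(peptide, proteins, index, segment_size=4):
    """Return sorted (protein_id, 0-based offset) pairs where peptide matches exactly."""
    if not peptide:
        return []
    if len(peptide) < segment_size:
        # Shorter than one segment: the index cannot help, so scan directly.
        return sorted(
            (protein_id, i)
            for protein_id, sequence in proteins.items()
            for i in range(len(sequence) - len(peptide) + 1)
            if sequence.startswith(peptide, i)
        )
    hits = []
    # Look up the first segment, then confirm the rest of the peptide in place.
    for protein_id, offset in index.get(peptide[:segment_size], []):
        if proteins[protein_id].startswith(peptide, offset):
            hits.append((protein_id, offset))
    return sorted(hits)


# Hypothetical toy "proteome" (not real protein sequences).
proteins = {
    "P1": "MKPEPTIDERAA",
    "P2": "GGPEPTIDERKK",
    "P3": "MSTNPKPQR",
}
index = build_index(proteins)

shared = map_peptide("PEPTIDER", proteins, index)  # found in P1 and P2
unique = map_peptide("NPKPQR", proteins, index)    # found only in P3
```

In this toy data, PEPTIDER is shared by two proteins, while NPKPQR maps to a single protein and would therefore be a candidate proteotypic peptide.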

More information can be found in our published technical note (see above), in the poster presented at the 66th ASMS Conference in San Diego: Fast and Efficient Mapping of Peptide Sequences and their Variants to Proteome Databases Using Full Inverted Indices, as well as in this early presentation (pptx).

Availability

ProteoMapper can be downloaded and run locally (see below).

It is a standard component of the Trans-Proteomic Pipeline (TPP), as of version 5.2.0.

An online version of this tool performs a mapping of an input peptide sequence or list of sequences to databases of various species used by PeptideAtlas.

A web services API is also available at PeptideAtlas.

Download

The software is open-source and freely available. It is written in Perl and has no other requirements for basic use. The XML::Parser and XML::Twig modules are needed only for reading and updating pepXML files.

The following .zip file contains the indexer, clips (Create Lookup Index of Protein Segments), and the mapper, promast (PROtein MApping and Search Tool).

Usage

You can get a full usage statement and list of options by typing the name of the command on the command line and pressing [ return ].

Note: depending on your local system set-up, you may need to adjust the first line of each of these programs to point to the correct path to perl.

Indexing

To index a protein database:
clips.pl myproteinfile.fasta
To index using a segment size of 6, and excluding PEFF variants:
clips.pl -s 6 -V anotherfile.peff

Mapping

Note: all examples below assume that an index file has already been generated (i.e. myproteinfile.fasta.pep.idx)

To find the mapping of a specific peptide sequence (e.g. PEPTIDER):
promast.pl myproteinfile.fasta PEPTIDER
To get original sequence context in output:
promast.pl -c myproteinfile.fasta PEPTIDER
Read input from a text file (one peptide per line):
promast.pl myproteinfile.fasta mypeptidelist.txt
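
A peptide list in the one-peptide-per-line layout mentioned above can be produced with a few lines of Python. This is only a convenience sketch: the helper names normalize_peptides and write_peptide_list are hypothetical. Whether promast itself tolerates blank lines, lowercase letters, or duplicates is not stated here, so the clean-up is purely defensive.

```python
from pathlib import Path


def normalize_peptides(raw_lines):
    """Strip whitespace, uppercase, drop blanks and duplicates (keeping first-seen order)."""
    seen = set()
    peptides = []
    for line in raw_lines:
        peptide = line.strip().upper()
        if peptide and peptide not in seen:
            seen.add(peptide)
            peptides.append(peptide)
    return peptides


def write_peptide_list(raw_lines, path):
    """Write the cleaned peptides to path, one per line, and return how many were written."""
    peptides = normalize_peptides(raw_lines)
    Path(path).write_text("".join(peptide + "\n" for peptide in peptides))
    return len(peptides)
```

For example, write_peptide_list(["peptider", "PEPTIDER", "ELVISK"], "mypeptidelist.txt") writes two lines, and the resulting file can be passed to promast.pl as in the command above.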
Fuzzy mapping with 3 unknown amino acids, leaving out unmappable sequences, and restricting results to those whose mass is within +/- 0.1 Da of the mass of the input sequence:
promast.pl -f 3 -U -m 0.1 myproteinfile.fasta PEPTIDER
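
To show how a mass restriction narrows fuzzy matches, here is a rough Python sketch. It is not promast's algorithm. It assumes that a fuzzy match is a same-length stretch of a protein that differs from the input at no more than a given number of positions, and it compares summed monoisotopic residue masses. The function fuzzy_map, its parameters, and the example proteins are hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def residue_mass(sequence):
    """Sum of residue masses; assumes only the 20 standard amino acids."""
    return sum(MONO[aa] for aa in sequence)


def fuzzy_map(peptide, proteins, max_unknown, mass_tol=None):
    """Return sorted (protein_id, offset, matched_sequence) for same-length windows
    with at most max_unknown differing positions, optionally within mass_tol Da."""
    target_mass = residue_mass(peptide)
    length = len(peptide)
    hits = []
    for protein_id, sequence in proteins.items():
        for offset in range(len(sequence) - length + 1):
            window = sequence[offset:offset + length]
            mismatches = sum(a != b for a, b in zip(peptide, window))
            if mismatches > max_unknown:
                continue
            if mass_tol is not None:
                # Skip windows containing residues without a known mass.
                if any(aa not in MONO for aa in window):
                    continue
                if abs(residue_mass(window) - target_mass) > mass_tol:
                    continue
            hits.append((protein_id, offset, window))
    return sorted(hits)


# Hypothetical toy sequences: P2 swaps I for L (same mass), P3 swaps E for Q.
example_proteins = {
    "P1": "MKPEPTIDERAA",
    "P2": "GGPEPTLDERKK",
    "P3": "AAPEPTIDQRAA",
}

all_fuzzy = fuzzy_map("PEPTIDER", example_proteins, max_unknown=3)
mass_filtered = fuzzy_map("PEPTIDER", example_proteins, max_unknown=3, mass_tol=0.1)
```

Without the mass filter, all three toy proteins produce a hit. With a 0.1 Da tolerance, the E to Q substitution in P3 (a shift of about 0.98 Da) is rejected, while the isobaric I to L substitution in P2 is kept.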

Tutorials





Last update: 3 March 2023, Luis Mendoza