Single molecule peptide sequencing

One of the greatest breakthroughs of the past decade was the development of next-generation sequencing (NGS), of DNA, of course. DNA is relatively easy to sequence: the polymerase does the work for you, and you simply supply fluorescently labeled nucleotides. To sequence RNA, we simply convert it into DNA by reverse transcription, and we now even have a method for in situ sequencing of RNA. Proteins, however, pose a challenge. Maybe this challenge can now be overcome with a new-old method for sequencing peptides.

There are no protein polymerases that can “read” the amino-acid sequence of a protein and copy it with labeled amino acids of our choice for easy detection, nor is there a reverse ribosome that translates a protein sequence back into RNA [developing such proteins through synthetic biology would be very useful!].

So biochemists turn to (bio)chemistry. A particular protein can be detected within the total proteome through its enzymatic activity, or by immunodetection with an antibody that recognizes that specific protein. The Edman reaction (published in 1950) can accurately determine the amino-acid sequence of any protein, provided that the protein is pure and that you can spare several days (it takes ~1-1.5 hours per amino acid). This, of course, is useful when studying one or a few proteins that are known in advance. To detect unknown proteins in a sample, and to get a full proteome map, we turn to mass spectrometry (MS): the proteins are chopped into peptides, which are ionized, and their mass and other parameters are measured. From these characteristic values, the amino-acid composition of each peptide can be inferred, and a database search finds the matching protein. SILAC and similar methods can also provide comparative quantitative data. MS is mostly used for bulk analysis of proteins from whole cell populations, although recent advances are bringing us closer to single-cell proteomics.
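To make the MS database-matching idea a bit more concrete, here is a minimal Python sketch under simplifying assumptions: it computes the neutral monoisotopic mass of an unmodified peptide from standard residue masses and returns the database entries that fall within a mass tolerance. The function names, the 0.01 Da tolerance and the toy database are hypothetical; real MS search engines also use fragmentation spectra, charge states and modifications.

```python
from typing import Dict, List

# Standard monoisotopic residue masses in daltons (free amino acid minus one H2O).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one H2O per peptide, for the free N- and C-termini


def peptide_mass(seq: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def match_mass(observed: float, database: Dict[str, str],
               tol_da: float = 0.01) -> List[str]:
    """Names of database peptides whose computed mass lies within tol_da of observed."""
    return [name for name, seq in database.items()
            if abs(peptide_mass(seq) - observed) <= tol_da]


# Hypothetical toy database: peptide name -> sequence.
toy_db = {"pep1": "PEPTIDE", "pep2": "PEPTLDE", "pep3": "GGGGK"}
print(match_mass(peptide_mass("PEPTIDE"), toy_db))  # ['pep1', 'pep2']
```

PEPTIDE and PEPTLDE come out with identical masses because leucine and isoleucine are isobaric; more generally, any reordering of the same residues has the same total mass, which is why a mass on its own points to a composition rather than an order.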

The new approach to sequencing peptides, developed in Edward Marcotte’s lab and published in Nature Biotechnology, offers a combination of old and new. The old: they use the Edman reaction to sequence the peptides. The new: they run it on glass, using fluorescent labeling and TIRF (total internal reflection fluorescence) imaging to sequence hundreds to millions of molecules in parallel.


The peptide sequencing pipeline. Adapted from Swaminathan et al. (2018).


As shown in the figure above, the proteins are digested into short peptides, and the cysteines and lysines are labeled with fluorophores (the authors screened 25 dyes to find ones that survive both the Edman chemistry and repeated imaging). The peptides are then immobilized on the glass surface. The Edman reaction is performed in the microscope chamber (adapted to withstand the harsh chemicals); each Edman cycle removes a single amino acid from the N-terminus, and the molecules are imaged after every cycle. The N- and C-termini are known (since the digestion can be amino-acid specific), and the positions of Cys and Lys are determined from the cycle at which fluorescence drops. The result is a partial sequence that, the authors claim, is sufficient to identify the source protein in most cases (depending, of course, on the peptide and on the source proteome).
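The readout logic described above can be illustrated with an idealized Python simulation. It assumes that every Edman cycle removes exactly one N-terminal residue, that each labeled residue type (here C and K) carries its own distinguishable dye, and that no dye is lost and no cycle is missed. The function names are hypothetical and this is not the authors' software. A labeled residue at position k shows up as a drop in its channel at cycle k.

```python
from typing import Dict, List, Optional


def dye_counts(peptide: str, labeled: str = "CK",
               n_cycles: Optional[int] = None) -> Dict[str, List[int]]:
    """Dyes remaining per channel: index 0 is before cycle 1, index k is after cycle k."""
    n = len(peptide) if n_cycles is None else min(n_cycles, len(peptide))
    # After k cycles the first k residues are gone, so count dyes left in peptide[k:].
    return {aa: [peptide[k:].count(aa) for k in range(n + 1)] for aa in labeled}


def drop_cycles(trace: List[int]) -> List[int]:
    """Cycles (1-indexed) at which a channel's signal fell, i.e. a labeled residue was cleaved."""
    return [k for k in range(1, len(trace)) if trace[k] < trace[k - 1]]


def fluorosequence(peptide: str, labeled: str = "CK",
                   n_cycles: Optional[int] = None) -> str:
    """Partial sequence read: labeled residues at their positions, '.' for unlabeled ones."""
    n = len(peptide) if n_cycles is None else min(n_cycles, len(peptide))
    read = ["."] * n
    for aa, trace in dye_counts(peptide, labeled, n_cycles).items():
        for cycle in drop_cycles(trace):
            read[cycle - 1] = aa
    return "".join(read)


# Hypothetical peptide with Lys at positions 2 and 6 and Cys at position 4.
print(dye_counts("AKGCLKE"))      # {'C': [1, 1, 1, 1, 0, 0, 0, 0], 'K': [2, 2, 1, 1, 1, 1, 0, 0]}
print(fluorosequence("AKGCLKE"))  # .K.C.K.
```

In a real experiment a molecule goes dark once its last dye has been cleaved, so unlabeled residues after that point cannot actually be observed; the simulation keeps them only because the peptide is known in advance.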

Another cool feature is that this works with a very small number of molecules (they demonstrate it with zeptomole amounts of peptides, which is just a few hundred molecules!). So, in principle, this could detect and quantify minute amounts of protein: certainly from a single cell, and probably even from a single large protein complex. This could be a very useful way to accurately analyze variation in the composition of individual ribosomes from the same cell, for instance.
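The “few hundred molecules” figure follows directly from Avogadro's number: one zeptomole is 10^-21 mol, and 10^-21 mol × 6.022 × 10^23 molecules/mol ≈ 602 molecules. A tiny Python helper for the conversion (the function names are illustrative):

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)
ZEPTO = 1e-21             # 1 zeptomole = 1e-21 mol


def zeptomoles_to_molecules(amount_zmol: float) -> float:
    """Number of molecules in an amount given in zeptomoles."""
    return amount_zmol * ZEPTO * AVOGADRO


def molecules_to_zeptomoles(n_molecules: float) -> float:
    """Inverse conversion: molecule count to zeptomoles."""
    return n_molecules / (ZEPTO * AVOGADRO)


for zmol in (1, 2, 5):
    print(f"{zmol} zmol ~ {zeptomoles_to_molecules(zmol):.0f} molecules")
```

So a single zeptomole is roughly 602 molecules, consistent with the “few hundred” estimate.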

Unfortunately, this paper analyzes a mixture of only three different peptides, and otherwise only shows how a partial sequence of a single peptide can be identified through a database search of the proteome of a simple organism. The authors hope to add at least one or two more colors to the system to obtain more accurate sequences.
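The identification step can be sketched as a lookup: digest every protein of a reference proteome in silico, compute the labeled-residue pattern each peptide would produce, and index proteins by that pattern. This is a toy under stated assumptions, not the authors' algorithm: the protease is hypothetically modeled as cutting after every Glu (E), Cys and Lys are the labeled residues, reads are error-free, and the four-protein proteome is invented. Because a molecule goes dark once its last dye is cleaved, trailing unlabeled positions are dropped from each read, and peptides without any label are invisible.

```python
from collections import defaultdict
from typing import Dict, List, Set


def digest(protein: str, cut_after: str = "E") -> List[str]:
    """Idealized specific protease: cleave after every residue listed in cut_after."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        if aa in cut_after:
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal fragment of the protein
    return peptides


def fluorosequence(peptide: str, labeled: str = "CK", n_cycles: int = 10) -> str:
    """Labeled residues at their positions, '.' elsewhere; trailing dark positions dropped."""
    read = "".join(aa if aa in labeled else "." for aa in peptide[:n_cycles])
    return read.rstrip(".")


def build_index(proteome: Dict[str, str], cut_after: str = "E",
                labeled: str = "CK", n_cycles: int = 10) -> Dict[str, Set[str]]:
    """Map each observable read to the set of proteins that could have produced it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for name, seq in proteome.items():
        for pep in digest(seq, cut_after):
            read = fluorosequence(pep, labeled, n_cycles)
            if read:  # peptides with no labeled residue are never seen
                index[read].add(name)
    return index


def identify(read: str, index: Dict[str, Set[str]]) -> List[str]:
    """Candidate source proteins for an observed read (empty list if none)."""
    return sorted(index.get(read, set()))


# Hypothetical toy proteome.
proteome = {
    "protA": "AKGCLKEMSTPE",
    "protB": "GCAKEWWKLLE",
    "protC": "PKAAELCCE",
    "protD": "SKE",
}
index = build_index(proteome)
print(identify(".K.C.K", index))  # ['protA']
print(identify(".K", index))      # ['protC', 'protD']
```

The short read ".K" matches two proteins, which illustrates the caveat above: whether a partial sequence is unique depends on the peptide and on the proteome being searched. Real data would also need a probabilistic model of errors such as missed Edman cycles or dye loss, which this sketch ignores.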

If this technology matures, it could truly revolutionize proteomics, just as next-generation sequencing revolutionized genomics and transcriptomics.



