Search: in
Byte pair encoding
Byte pair encoding in Encyclopedia Encyclopedia
  Tutorials     Encyclopedia     Videos     Books     Software     DVDs  
       





Byte pair encoding

Byte pair encoding[1] or digram coding[2] is a simple form of data compression in which the most common pair of consecutive bytes of data is replaced with a byte that does not occur within that data. A table of the replacements is required to rebuild the original data. The algorithm was first described publicly by Philip Gage in a February 1994 article "A New Algorithm for Data Compression" in the C Users Journal.[3]

Byte pair encoding example

Suppose we wanted to encode the data

aaabdaaabac 

The byte pair "aa" occurs most often, so it will be replaced by a byte that is not used in the data, "Z". Now we have the following data and replacement table:

ZabdZabac Z=aa 

Then we repeat the process with byte pair "ab", replacing it with Y:

ZYdZYac Y=ab Z=aa 

We could stop here, as the only literal byte pair left occurs only once. Or we could continue the process and use recursive byte pair encoding, replacing "ZY" with "X":

XdXac X=ZY Y=ab Z=aa 

This data cannot be compressed further by byte pair encoding because there are no pairs of bytes that occur more than once.

To decompress the data, simply perform the replacements in the reverse order.

References

ar: ja:BPE zh:






Source: Wikipedia | The above article is available under the GNU FDL. | Edit this article



Search for Byte pair encoding in Tutorials
Search for Byte pair encoding in Encyclopedia
Search for Byte pair encoding in Videos
Search for Byte pair encoding in Books
Search for Byte pair encoding in Software
Search for Byte pair encoding in DVDs
Search for Byte pair encoding in Store




Advertisement




Byte pair encoding in Encyclopedia
Byte_pair_encoding top Byte_pair_encoding

Home - Add TutorGig to Your Site - Disclaimer

©2011-2013 TutorGig.info All Rights Reserved. Privacy Statement