This is an old revision of the document!
Run Length Encoding (RLE) Data Compression Algorithm
Run-length encoding (RLE) algorithm is a lossless compression of runs of data, where repeated sequences of data are stored as a representation of a single data value and its count (how many times it repeats consecutively). RLE encoding is commonly used in JPEG, TIFF, and PDF files, to name a few examples.
For anybody interested in editing the wiki page, here is the dokuwiki user guide: https://www.dokuwiki.org/wiki:syntax#basic_text_formatting -Ash
The RLE encoding algorithm first takes a file that contains a run of data in the following way:
Original:
aaaabbcdefgggg
Suppose this data sequence is contained within a .txt file.
The RLE algorithm will encode this data by replacing the repeated characters with a count number and a single value.
4a2b1c1d1e1f5g
There are 4 a's (0x61), 2 b's(0x62), 1 c(0x63), 1 d(0x64), 1 e(0x65), 1 f (0x66), and 5 g's (0x67).
Encoded:
04 61 02 62 01 63 01 64 01 65 01 66 05 67 ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ 4 a 2 b 1 c 1 d 1 e 1 f 5 g
Our task is to ask questions on Discord or in class, document our findings on this wiki page, and implement the RLE encoding/decoding algorithms in our own encoder and decoder programs.
Header Format:
byte 0: 0x72
byte 1: 0x6c
byte 2: 0x65
byte 3: 0x58
byte 4: 0x20
byte 5: 0x52
byte 6: 0x4c
byte 7: 0x45
byte 8: 0x00 (reserved)
byte 9: 0x01 (version)
byte 10: 0x01 (stride value)
byte 11: 0xArgv source file name length (not including NULL terminator)
byte 12: source file name + through byte 11 value minus 1
Byte 8 - Reserved for a future project. Byte 9 - Version number, this week is version 1. Byte A - Width of comparison. 1 byte for this week's project. Byte B - 'bob' would be 3.
In the case of 'bob', byte C would be 0x62('b'), byte D would be 0x6F('o'), and byte E would be 0x62('b').
encode program will take two arguments:
./encode INFILE OUTFILE argv[0] argv[1] argv[2] ./encode sample0.txt sample0.txt.rle
decode program will take one argument:
./decode INFILE argv[0] arfv[1] ./decode sample0.txt.rle
All programs should be able to both encode/decode with interoperability in mind such that person_a's decoder must be able to decode a file that has been encoded by person_b's encoder, and vise versa.
eg.
some_file = "Hello World" person_a -> encode(some_file) person_b -> decode(some_file) print(some_file) >> Hello World