This is an old revision of the document!
Run Length Encoding (RLE) Data Compression Algorithm
Run-length encoding (RLE) algorithm is a lossless compression of runs of data, where repeated sequences of data are stored as a representation of a single data value and its count (how many times it repeats consecutively). RLE encoding is commonly used in JPEG, TIFF, and PDF files, to name a few examples.
For anybody interested in editing the wiki page, here is the dokuwiki user guide: https://www.dokuwiki.org/wiki:syntax#basic_text_formatting -Ash
The RLE encoding algorithm first takes a file that contains a run of data in the following way:
Original:
aaaabbcdefgggg
Suppose this data sequence is contained within a .txt file.
The RLE algorithm will encode this data by replacing the repeated characters with a count number and a single value.
4a2b1c1d1e1f5g
There are 4 a's (0x61), 2 b's(0x62), 1 c(0x63), 1 d(0x64), 1 e(0x65), 1 f (0x66), and 5 g's (0x67).
Encoded:
04 61 02 62 01 63 01 64 01 65 01 66 05 67 ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ 4 a 2 b 1 c 1 d 1 e 1 f 5 g
Please reference the image below to find the hexadecimal value of the ASCII symbols:
Our task is to ask questions on Discord or in class, document our findings on this wiki page, and implement the RLE encoding/decoding algorithms in our own encoder and decoder programs.
While fetching data from the source file, use the function feof() from stdlib.h to ensure the end of file has been found; This will help you avoid any false positives resulting in unpredictable behavior.
Example:
while ( !feof( in ) ) { data = fgetc( in ); fprintf ( out, "%c", data); }
Header Format:
byte 0: 0x72
byte 1: 0x6c
byte 2: 0x65
byte 3: 0x58
byte 4: 0x20
byte 5: 0x52
byte 6: 0x4c
byte 7: 0x45
byte 8: 0x00 (reserved)
byte 9: 0x01 (version)
byte 10: 0x01 (stride value)
byte 11: 0xArgv The length of the source file name, not including NULL terminator
(how many characters in Argv - 1)
byte 12: The name of the source file, not including the NULL terminator
Byte 8 - Reserved for a future project.
Byte 9 - Version number, this week is version 1.
Byte A - Width of comparison. 1 byte for this week's project.
Byte B - 'bob' would be 3.
In the case of 'bob', byte C would be 0x62('b'), byte D would be 0x6F('o'), and byte E would be 0x62('b').
Encode program will take two arguments:
./encode INFILE OUTFILE argv[0] argv[1] argv[2] ./encode sample0.txt sample0.txt.rle
Encoder will output a file with a name equivalent to the second argument.
Decode program will take one argument:
./decode INFILE argv[0] arfv[1] ./decode sample0.txt.rle
Decode will output a file of similar name without the .rle extension after running your decoder.
Program should be able to encode a file such as sample0.txt and successfully output sample0.txt.rle in a similar manner to the file given to you when you initially grabbed the project. Checksums should be identical to each other. Similar to encoding, decoding should be identical with an input of sample0.txt.rle for example, it should output a file sample0.txt with a checksum equivalent to that of sample0.txt when you first grabbed the project.
Output some data after a successful run of the program with information of how many characters were encoded and how many characters were printed into the output file with a comparison percentage in regards to file compression.
Encoded: xxx bytes Decoded: xxx bytes File compression: %
All programs should be able to both encode/decode with interoperability in mind such that person_a's decoder must be able to decode a file that has been encoded by person_b's encoder, and vise versa.
eg.
some_file = "Hello World" person_a -> encode(some_file) person_b -> decode(some_file) print(some_file) >> Hello World