
BACKGROUND

Run Length Encoding (RLE) Data Compression Algorithm

Run-length encoding (RLE) is a lossless compression technique for runs of data: each sequence of consecutively repeated values is stored as a single data value together with its count (how many times it repeats in a row). RLE is commonly used in formats such as JPEG, TIFF, and PDF, to name a few examples.

For anybody interested in editing the wiki page, here is the dokuwiki user guide: https://www.dokuwiki.org/wiki:syntax#basic_text_formatting -Ash

ALGORITHM: RLE

The RLE encoder takes as input a file containing runs of data, such as the following:

Original:

aaaabbcdefgggg

Suppose this data sequence is contained within a .txt file.

The RLE algorithm encodes this data by replacing each run of repeated characters with a count followed by a single copy of the value:

4a2b1c1d1e1f4g

There are 4 a's (0x61), 2 b's (0x62), 1 c (0x63), 1 d (0x64), 1 e (0x65), 1 f (0x66), and 4 g's (0x67).

Encoded:

04 61 02 62 01 63 01 64 01 65 01 66 04 67
^  ^  ^  ^  ^  ^  ^  ^  ^  ^  ^  ^  ^  ^
4  a  2  b  1  c  1  d  1  e  1  f  4  g
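The encoding step described above can be sketched as a small C function that works on a buffer already held in memory. This is only an illustration, not the project's reference implementation: the name rle_encode and its parameters are made up for this page, and splitting runs longer than 255 into several pairs is an assumption, since the count occupies a single byte and the page does not say how longer runs should be handled.

```c
#include <stddef.h>

/* Encode n bytes from src as (count, value) pairs into dst.
 * Returns the number of bytes written to dst, or 0 if dst is too
 * small (an empty input also yields 0). Hypothetical helper. */
size_t rle_encode(const unsigned char *src, size_t n,
                  unsigned char *dst, size_t cap)
{
    size_t in = 0, out = 0;

    while (in < n) {
        unsigned char value = src[in];
        unsigned char count = 0;

        /* Extend the run while the same byte repeats.
         * Assumption: a run is cut at 255 so the count fits in one byte. */
        while (in < n && src[in] == value && count < 255) {
            count++;
            in++;
        }

        if (out + 2 > cap)
            return 0;

        dst[out++] = count;   /* how many times the byte repeats */
        dst[out++] = value;   /* the byte itself */
    }
    return out;
}
```

Fed the 14 bytes of aaaabbcdefgggg, this sketch writes 14 bytes, matching the 4a2b1c1d1e1f4g form shown earlier. Note that a file with few repeats can come out no smaller than it went in.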

NOTE: While this example uses characters that are readable to our eyes, the other samples in the project (everything besides sample0) will not be as readable to you and me. Rest assured, however, that the same algorithm applies to all the samples. In the case of the encoder, the program's job is still to identify each character and how many times it repeats, and to write that in hex to the output file.

Please refer to an ASCII table to find the hexadecimal value of each ASCII symbol.

NOTE: Use chars, as each one gives a single hex-byte value, as opposed to strings or integer values converted from other data types.
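To see the byte value behind a character, a helper along these lines may be useful while debugging. The name byte_to_hex is hypothetical. The parameter is an unsigned char on purpose: on systems where plain char is signed, printing a byte such as 0xff with %02x without converting it first can sign-extend and show ffffffff instead of ff.

```c
#include <stdio.h>

/* Write the two-digit lowercase hex form of one byte into buf.
 * buf must have room for 3 chars: two digits plus the terminator.
 * Hypothetical helper for illustration. */
void byte_to_hex(unsigned char byte, char buf[3])
{
    snprintf(buf, 3, "%02x", (unsigned int)byte);
}
```

For example, passing 'a' fills buf with "61", and passing a newline fills it with "0a".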

SPECIFICATIONS

Our task is to ask questions on Discord or in class, document our findings on this wiki page, and implement the RLE encoding/decoding algorithms in our own encoder and decoder programs. The output of the encoder should be in hex, that is, raw byte values rather than printable text. To check that it is, use xxd, either on lab46 or from within vim. You can also use your own program or another command if you prefer some other way to view files in hex. The encoder should be able to encode all characters according to the algorithm, including any newlines you might see.

To use xxd within lab46:

USERNAME@lab46:~$ xxd FILENAME

To use xxd inside vim:

:%!xxd

The decoder should be able to take the output of the encoder and give back the file that was originally input to the encoder. The decoder should use the header to find the filename, and the contents that follow the header to decode the file. Furthermore, the decoder must strip the header after extracting the necessary data from it (i.e., the filename) and write only the decoded data to the new file.

NOTE: The encoder and decoder are two separate programs.
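The decoding direction can be sketched the same way as encoding. The function below is hypothetical (rle_decode is not a name from the project) and assumes it is handed only the (count, value) pairs, meaning the header has already been read and skipped by the caller.

```c
#include <stddef.h>

/* Expand (count, value) pairs from src into dst.
 * Returns the number of bytes written, or 0 if dst is too small.
 * A trailing unpaired byte in src is ignored. Hypothetical helper. */
size_t rle_decode(const unsigned char *src, size_t n,
                  unsigned char *dst, size_t cap)
{
    size_t out = 0;

    for (size_t in = 0; in + 1 < n; in += 2) {
        unsigned int count = src[in];       /* how many copies to emit */
        unsigned char value = src[in + 1];  /* the byte to repeat */

        if (out + count > cap)
            return 0;

        for (unsigned int i = 0; i < count; i++)
            dst[out++] = value;
    }
    return out;
}
```

Given the bytes 04 61 02 62 01 63, this writes aaaabbc (7 bytes).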

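While fetching data from the source file, use the function feof() from stdio.h to detect when the end of the file has been reached; this will help you avoid false positives that result in unpredictable behavior. Note that feof() only reports true after a read has already failed, so check it after each read and before using the data.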

Example:

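data = fgetc( in );              /* read first, then test for end of file */
while ( !feof( in ) )
{
     fprintf ( out, "%c", data );
     data = fgetc( in );         /* a failed read here ends the loop      */
}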
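Another common way to write the same read loop in C is to test the return value of fgetc() against EOF directly. The sketch below is an alternative, not a requirement of the project, and copy_stream is a made-up name. The variable holding the result is an int rather than a char so that EOF stays distinguishable from a real 0xff byte, which matters for the binary samples.

```c
#include <stdio.h>

/* Copy every byte from in to out; return how many bytes were copied.
 * Hypothetical helper for illustration. */
long copy_stream(FILE *in, FILE *out)
{
    int c;            /* int, not char, so EOF differs from byte 0xff */
    long copied = 0;

    while ((c = fgetc(in)) != EOF) {
        fputc(c, out);
        copied++;
    }
    return copied;
}
```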

DATA HEADER SPECIFICATIONS

Header Format:

byte 0: 0x72
byte 1: 0x6c
byte 2: 0x65
byte 3: 0x58
byte 4: 0x20
byte 5: 0x52
byte 6: 0x4c
byte 7: 0x45
byte 8: 0x00 (reserved)
byte 9: 0x01 (version)
byte 10: 0x01 (stride value)
byte 11: The length of the source file name, not including the NULL terminator
(how many characters are in the file name argument)
byte 12 onward: The name of the source file, not including the NULL terminator

Info

Byte 8 - Reserved for a future project.
Byte 9 - Version number; this week is version 1.
Byte 10 (0x0A) - Stride value: the width of comparison. 1 byte for this week's project.
Byte 11 (0x0B) - File name length; 'bob' would be 3.

In the case of 'bob', byte 12 (0x0C) would be 0x62 ('b'), byte 13 (0x0D) would be 0x6F ('o'), and byte 14 (0x0E) would be 0x62 ('b').
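The header layout above can be assembled as in the following sketch. The function name build_header and the choice to build the header in a caller-supplied buffer are assumptions made for illustration; the byte values come straight from the format listed above. Bytes 0 through 7 spell "rleX RLE" in ASCII.

```c
#include <stddef.h>
#include <string.h>

/* Fill dst with the header for the given source file name.
 * Returns the header length (12 + name length), or 0 if the name
 * does not fit in the one-byte length field or dst is too small.
 * Hypothetical helper. */
size_t build_header(const char *name, unsigned char *dst, size_t cap)
{
    /* bytes 0-7: "rleX RLE" */
    static const unsigned char magic[8] = {
        0x72, 0x6c, 0x65, 0x58, 0x20, 0x52, 0x4c, 0x45
    };
    size_t len = strlen(name);

    if (len > 255 || cap < 12 + len)
        return 0;

    memcpy(dst, magic, 8);
    dst[8]  = 0x00;                 /* reserved */
    dst[9]  = 0x01;                 /* version */
    dst[10] = 0x01;                 /* stride value */
    dst[11] = (unsigned char)len;   /* name length, no terminator */
    memcpy(dst + 12, name, len);    /* name bytes, no terminator */
    return 12 + len;
}
```

For the name 'bob' this produces a 15-byte header whose last three bytes are 0x62, 0x6F, 0x62.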

PROGRAM

The encode program takes two arguments:

./encode INFILE  OUTFILE
 argv[0] argv[1] argv[2]

./encode sample0.txt sample0.txt.rle

Encoder will output a file with a name equivalent to the second argument.

The decode program takes up to two arguments (the second is optional):

./decode INFILE  OUTFILE
 argv[0] argv[1] argv[2]

./decode sample0.txt.rle sample0.txt

The decoder should be able to read the header and take the output filename from it if a second argument is not given. If a second argument is given, the decoder uses it instead of the filename from the header. When the name comes from the header, the decoder outputs a file with the original name, that is, the input name without the .rle extension.
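The filename rule above might be handled along these lines. Both function names are hypothetical, and the sketch assumes the start of the encoded file has already been read into a buffer. It checks the eight leading bytes before trusting the length field.

```c
#include <stddef.h>
#include <string.h>

/* Copy the file name stored in an rle header into name, adding a
 * terminator. Returns 0 on success, -1 if the header is malformed
 * or name is too small. Hypothetical helper. */
int header_filename(const unsigned char *hdr, size_t n,
                    char *name, size_t cap)
{
    size_t len;

    if (n < 12 || memcmp(hdr, "rleX RLE", 8) != 0)
        return -1;

    len = hdr[11];                    /* byte 11: name length */
    if (n < 12 + len || cap < len + 1)
        return -1;

    memcpy(name, hdr + 12, len);      /* byte 12 onward: the name */
    name[len] = '\0';
    return 0;
}

/* Prefer the second command-line argument when it is present;
 * otherwise fall back to the name found in the header. */
const char *pick_output_name(int argc, char **argv,
                             const char *from_header)
{
    return (argc >= 3) ? argv[2] : from_header;
}
```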

OUTPUT SPECIFICATIONS

The program should be able to encode a file such as sample0.txt and output a sample0.txt.rle that matches the reference file provided when you initially grabbed the project; the checksums of the two should be identical. Likewise, decoding sample0.txt.rle should output a file sample0.txt whose checksum matches that of the sample0.txt provided with the project.
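Matching checksums mean the two files hold the same bytes. As a rough stand-in for a checksum tool while debugging, a direct byte-by-byte comparison can be sketched as follows. This is a different technique from computing a checksum, and streams_identical is a made-up name.

```c
#include <stdio.h>

/* Return 1 if both streams hold exactly the same bytes from their
 * current positions to the end, 0 otherwise. Hypothetical helper. */
int streams_identical(FILE *a, FILE *b)
{
    int ca, cb;

    do {
        ca = fgetc(a);
        cb = fgetc(b);
        if (ca != cb)      /* also catches one stream ending early */
            return 0;
    } while (ca != EOF);

    return 1;
}
```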

After a successful run, the program should output how many characters were encoded and how many were written to the output file, along with a percentage comparing the two to show the file compression.

Encoded: xxx bytes
Decoded: xxx bytes
File compression: %
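The page does not pin down how the percentage is calculated. One possible reading, shown here purely as an assumption, is the output size expressed as a percentage of the input size. The function name size_ratio_percent is hypothetical.

```c
/* Output size as a percentage of input size. This is an assumed
 * meaning of the "File compression" figure, not a confirmed one.
 * Returns 0.0 for an empty input to avoid dividing by zero. */
double size_ratio_percent(unsigned long in_bytes, unsigned long out_bytes)
{
    if (in_bytes == 0)
        return 0.0;
    return 100.0 * (double)out_bytes / (double)in_bytes;
}
```

Under this reading, a 200-byte input that encodes to 50 bytes gives 25, and a value above 100 means the encoded file grew.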

VERIFICATION

All programs should encode and decode with interoperability in mind, such that person_a's decoder can decode a file that has been encoded by person_b's encoder, and vice versa.

e.g.

some_file = "Hello World"
encoded = person_a -> encode(some_file)
decoded = person_b -> decode(encoded)

print(decoded)

>> Hello World

Verify your program's capabilities by running make check, provided you have the Makefile and the verify file in your rle0 directory.

NOTE: This is not the full make check output; it is pending a fully successful make check, so if one is achieved feel free to delete this note. Verification prepends in/ to the input file name argument and out/ to the output file name argument.

USERNAME@lab46:~/src/fall2022/rle0$ make check
=================================================
= PHASE 0: Raw -> Encode data verification test =
=================================================
in/sample0.txt -> o0/sample0.txt.rle: OK
in/sample1.txt -> o0/sample1.txt.rle: OK
in/sample2.bmp -> o0/sample2.bmp.rle: OK
in/sample3.wav -> o0/sample3.wav.rle: OK

=================================================
= PHASE 1: Decode -> Raw data verification test =
=================================================
in/sample0.txt.rle -> o1/sample0.txt: OK
in/sample1.txt.rle -> o1/sample1.txt: OK
in/sample2.bmp.rle -> o1/sample2.bmp: OK
in/sample3.wav.rle -> o1/sample3.wav: OK

================================================
= PHASE 2: Raw -> Encode -> Decode -> Raw test =
================================================
in/sample0.txt -> m2/sample0.txt.rle -> o2/sample0.txt: OK
in/sample1.txt -> m2/sample1.txt.rle -> o2/sample1.txt: OK
in/sample2.bmp -> m2/sample2.bmp.rle -> o2/sample2.bmp: OK
in/sample3.wav -> m2/sample3.wav.rle -> o2/sample3.wav: OK

=============================================
= PHASE 3: Decode -> Raw -> Encode Raw test =
=============================================
in/sample0.txt.rle -> m3/sample0.txt -> o3/sample0.txt.rle: OK
in/sample1.txt.rle -> m3/sample1.txt -> o3/sample1.txt.rle: OK
in/sample2.bmp.rle -> m3/sample2.bmp -> o3/sample2.bmp.rle: OK
in/sample3.wav.rle -> m3/sample3.wav -> o3/sample3.wav.rle: OK