User Tools

Site Tools


notes:discrete:fall2022:projects:rle1

BACKGROUND

Version 2 of our RLE algorithm.

This time we will be encoding and decoding similar data focusing on more than one byte.


Run Length Encoding (RLE) Data Compression Algorithm

Run-length encoding (RLE) algorithm is a lossless compression of runs of data, where repeated sequences of data are stored as a representation of a single data value and its count (how many times it repeats consecutively). RLE encoding is commonly used in JPEG, TIFF, and PDF files, to name a few examples.

SPECIFICATIONS

Please reference the image below to find the hexadecimal value of the ASCII symbols:

*Our task is to ask questions on Discord or in class and document our findings on this wiki page collaboratively, regarding the functionality of this project.

*For anybody interested in editing the wiki page, here is the dokuwiki user guide: https://www.dokuwiki.org/wiki:syntax#basic_text_formatting -Ash

DATA HEADER SPECIFICATIONS

(mostly the same as rle0)

Header Format:

byte 0: 0x72
byte 1: 0x6c
byte 2: 0x65
byte 3: 0x58
byte 4: 0x20
byte 5: 0x52
byte 6: 0x4c
byte 7: 0x45
byte 8: 0x00 (reserved)
byte 9: 0x02 (version)
byte 10: 0x(stride value) – changes depending on input
byte 11: 0xArgv The length of the source file name, not including NULL terminator
(how many characters in Argv - 1)
byte 12: The name of the source file, not including the NULL terminator

CUSTOMISATION: variable stride

The stride will determine the workings of the the encoding/decoding process. In rle0, the stride was always 1, thus we had ababcc→ 1a1b1a1b2c, where we compressed runs of a single byte. For rle1, the stride given will determine how many bytes are in each run, for example a stride of two will mean we check for runs of two, which conducts as ababcc→ 2ab1cc. Since the strides are chars, they will hold any value between 1-255, and so each of these values should be valid for run length.

PROGRAM

Encode program will take two arguments:

./encode INFILE  STRIDE
 argv[0] argv[1] argv[2]

./encode sample0.txt 2

For ENCODE, the second argument no longer indicates the destination file name, the destination file name is provided by byte 12 of the header ( the name of the original file), and appending .rle at the end.

For ENCODE, the second argument is now a STRIDE indicator, with a min of 0 and max of 255. No default stride, if no stride is indicated, display error. Store stride on byte 10 of the header.

DECODE program will take only one argument, the encoded .rle file:

./decode INFILE 
 argv[0] argv[1]

./decode sample0.txt.rle

The decoder should be able to read the header and find out the original filename. Decode will output a file with the name of the original file, without the .rle.

The decoder should be able to read the header and find out the version of the encoded file. Depending on the version it should decode the provided file accordingly, making it backward-compatible.

The decoder should be able to read the header and find out the stride and should decode the provided file accordingly.

OUTPUT SPECIFICATIONS

Just like what we did in rle0, the rle1's encoder should output the original file's length, the encoded file length, followed by the compression ratio. The output is not strict, as long as you have the correct input file length, output file length, and compression ratio. The compression ratio can be calculated by the following equation: 'Output Length / Input Length'. Make sure in your code to set the type of the variable holding the compression ratio to 'double' or 'float'. You may also want to take a look at the C output format specifiers for a prettier output.

Example:

       Input File length: 100 
       Output File length: 40
       Compression ratio: .40 or 40%

VERIFICATION

NOTE Verification using ./check may not work. If this is the case, then run manual checks. To do this, run the sample and use the stride that is in the header of the sample output you are comparing to (Byte 10). The stride used is also in the file name.

To run the check file provided to you when you grabbed the project, run ./check (linux-based system, may vary on different OS).

You can also manually verify by encoding/decoding a file, and checking if it has the same md5sum as the file with or without the .rle extension depending on if you encoded or decoded. To do this, run your encoder/decoder, and enter md5sum (output file), again, linux-based system, may vary on a different OS.

notes/discrete/fall2022/projects/rle1.txt · Last modified: 2022/10/06 15:38 by bolsen1