Corning Community College
CSCS2330 Discrete Structures
To continue to explore the realm of algorithmic encoding/decoding of information, potentially achieving data compression in ideal scenarios, and collaboratively authoring and documenting the project and its specifications.
To assist with consistency across all implementations, data files for use with this project are available on lab46 via the grabit tool. Be sure to obtain it and ensure your implementation properly works with the provided data.
lab46:~/src/SEMESTER/DESIG$ grabit DESIG PROJECT
You will want to go here to edit and fill in the various sections of the document:
Version 2 of our RLE algorithm.
This time we will be encoding and decoding similar data focusing on more than one byte.
Run Length Encoding (RLE) Data Compression Algorithm
Run-length encoding (RLE) algorithm is a lossless compression of runs of data, where repeated sequences of data are stored as a representation of a single data value and its count (how many times it repeats consecutively). RLE encoding is commonly used in JPEG, TIFF, and PDF files, to name a few examples.
Please reference the image below to find the hexadecimal value of the ASCII symbols:
*Our task is to ask questions on Discord or in class and document our findings on this wiki page collaboratively, regarding the functionality of this project.
*For anybody interested in editing the wiki page, here is the dokuwiki user guide: https://www.dokuwiki.org/wiki:syntax#basic_text_formatting -Ash
(mostly the same as rle0)
Header Format:
byte 0: 0x72
byte 1: 0x6c
byte 2: 0x65
byte 3: 0x58
byte 4: 0x20
byte 5: 0x52
byte 6: 0x4c
byte 7: 0x45
byte 8: 0x00 (reserved)
byte 9: 0x02 (version)
byte 10: 0x(stride value) – changes depending on input
byte 11: 0xArgv The length of the source file name, not including NULL terminator
(how many characters in Argv - 1)
byte 12: The name of the source file, not including the NULL terminator
The stride will determine the workings of the the encoding/decoding process. In rle0, the stride was always 1, thus we had ababcc→ 1a1b1a1b2c, where we compressed runs of a single byte. For rle1, the stride given will determine how many bytes are in each run, for example a stride of two will mean we check for runs of two, which conducts as ababcc→ 2ab1cc. Since the strides are chars, they will hold any value between 1-255, and so each of these values should be valid for run length.
Encode program will take two arguments:
./encode INFILE STRIDE argv[0] argv[1] argv[2] ./encode sample0.txt 2
For ENCODE, the second argument no longer indicates the destination file name, the destination file name is provided by byte 12 of the header ( the name of the original file), and appending .rle
at the end.
For ENCODE, the second argument is now a STRIDE indicator, with a min of 0 and max of 255. No default stride, if no stride is indicated, display error. Store stride on byte 10 of the header.
DECODE program will take only one argument, the encoded .rle
file:
./decode INFILE argv[0] argv[1] ./decode sample0.txt.rle
The decoder should be able to read the header and find out the original filename. Decode will output a file with the name of the original file, without the .rle
.
The decoder should be able to read the header and find out the version of the encoded file. Depending on the version it should decode the provided file accordingly, making it backward-compatible.
The decoder should be able to read the header and find out the stride and should decode the provided file accordingly.
Just like what we did in rle0, the rle1's encoder should output the original file's length, the encoded file length, followed by the compression ratio. The output is not strict, as long as you have the correct input file length, output file length, and compression ratio. The compression ratio can be calculated by the following equation: 'Output Length / Input Length'. Make sure in your code to set the type of the variable holding the compression ratio to 'double' or 'float'. You may also want to take a look at the C output format specifiers for a prettier output.
Example:
Input File length: 100 Output File length: 40 Compression ratio: .40 or 40%
NOTE Verification using ./check may not work. If this is the case, then run manual checks. To do this, run the sample and use the stride that is in the header of the sample output you are comparing to (Byte 10). The stride used is also in the file name.
To run the check file provided to you when you grabbed the project, run ./check (linux-based system, may vary on different OS).
You can also manually verify by encoding/decoding a file, and checking if it has the same md5sum as the file with or without the .rle extension depending on if you encoded or decoded. To do this, run your encoder/decoder, and enter md5sum (output file), again, linux-based system, may vary on a different OS.
To be successful in this project, the following criteria (or their equivalent) must be met:
Let's say you have completed work on the project, and are ready to submit, you would do the following (assuming you have a program called uom0.c):
lab46:~/src/SEMESTER/DESIG/PROJECT$ make submit
You should get some sort of confirmation indicating successful submission if all went according to plan. If not, check for typos and or locational mismatches.
I'll be evaluating the project based on the following criteria:
117:rle1:final tally of results (117/117) *:rle1:used grabit to obtain project by the Sunday prior to duedate [13/13] *:rle1:clean compile, no compiler messages [13/13] *:rle1:implementation passes verification tests [26/26] *:rle1:adequate modifications to code from template [26/26] *:rle1:program operations conform to project specifications [26/26] *:rle1:code tracked in lab46 semester repo [13/13]