Corning Community College
CSCS2330 Discrete Structures
To explore the realm of algorithmic encoding/decoding of information, potentially achieving data compression in ideal scenarios, and collaboratively authoring and documenting the project and its specifications.
To assist with consistency across all implementations, data files for use with this project are available on lab46 via the grabit tool. Be sure to obtain it and ensure your implementation properly works with the provided data.
lab46:~/src/SEMESTER/DESIG$ grabit DESIG PROJECT
You will want to go here to edit and fill in the various sections of the document:
Run Length Encoding (RLE) Data Compression Algorithm
Run-length encoding (RLE) algorithm is a lossless compression of runs of data, where repeated sequences of data are stored as a representation of a single data value and its count (how many times it repeats consecutively). RLE encoding is commonly used in JPEG, TIFF, and PDF files, to name a few examples.
For anybody interested in editing the wiki page, here is the dokuwiki user guide: https://www.dokuwiki.org/wiki:syntax#basic_text_formatting -Ash
The RLE encoding algorithm first takes a file that contains a run of data in the following way:
Original:
aaaabbcdefgggg
Suppose this data sequence is contained within a .txt file.
The RLE algorithm will encode this data by replacing the repeated characters with a count number and a single value.
4a2b1c1d1e1f4g
There are 4 a's (0x61), 2 b's(0x62), 1 c(0x63), 1 d(0x64), 1 e(0x65), 1 f (0x66), and 5 g's (0x67).
Encoded:
04 61 02 62 01 63 01 64 01 65 01 66 05 67 ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ 4 a 2 b 1 c 1 d 1 e 1 f 5 g
NOTE: While this example is in readable characters in our eyes, outside of sample0 in the project other samples will not be as readable to you and I. However, rest be assured that the same algorithm will still apply throughout all the samples. The program's job is still to identify the characters and the amount of characters and output that in hex to the output file in the case of the encoder.
Please reference the image below to find the hexadecimal value of the ASCII symbols:
NOTE: Use chars as they will give a hex-byte value as opposed to any kind of string or integer values that they may be converted to from other data types.
Our task is to ask questions on Discord or in class, document our findings on this wiki page, and implement the RLE encoding/decoding algorithms in our own encoder and decoder programs. The output of the encoder should be in hex. In order to see if it does use either xxd in lab46 or xxd within vim. You can also use your own program or command if you have some other way you prefer to view the files in hex. The encoder should be able to encode all characters according to the algorithm. This includes any newlines that you might see.
To use xxd within lab46:
USERNAME@lab46:~$ xxd FILENAME
To use xxd inside vim:
:%!xxd
The decoder should be able to take the output of the encoder and give back the file initially inputted to the encoder. The decoder should make use of the header to find the filename alongside the contents of the file to decode the file. Furthermore, the decoder must remove the header after finding necessary data in the input file (i.e the filename) and write only the decoded data to the new file.
NOTE: The encoder and decoder are two separate programs.
While fetching data from the source file, use the function feof() from stdlib.h to ensure the end of file has been found; This will help you avoid any false positives resulting in unpredictable behavior.
Example:
while ( !feof( in ) ) { data = fgetc( in ); fprintf ( out, "%c", data); }
Header Format:
byte 0: 0x72
byte 1: 0x6c
byte 2: 0x65
byte 3: 0x58
byte 4: 0x20
byte 5: 0x52
byte 6: 0x4c
byte 7: 0x45
byte 8: 0x00 (reserved)
byte 9: 0x01 (version)
byte 10: 0x01 (stride value)
byte 11: 0xArgv The length of the source file name, not including NULL terminator
(how many characters in Argv - 1)
byte 12: The name of the source file, not including the NULL terminator
Byte 8 - Reserved for a future project.
Byte 9 - Version number, this week is version 1.
Byte A - Width of comparison. 1 byte for this week's project.
Byte B - 'bob' would be 3.
In the case of 'bob', byte C would be 0x62('b'), byte D would be 0x6F('o'), and byte E would be 0x62('b').
Encode program will take two arguments:
./encode INFILE OUTFILE argv[0] argv[1] argv[2] ./encode sample0.txt sample0.txt.rle
Encoder will output a file with a name equivalent to the second argument.
Decode program will take two arguments:
./decode INFILE OUTFILE argv[0] argv[1] argv[2] ./decode sample0.txt.rle sample0.txt
The decoder should be able to read the header and find out the filename if a second argument is not given. If a second argument is given it will use the second argument instead of taking the filename from the header. Decode will output a file of similar name without the .rle extension after running your decoder.
Program should be able to encode a file such as sample0.txt and successfully output sample0.txt.rle in a similar manner to the file given to you when you initially grabbed the project. Checksums should be identical to each other. Similar to encoding, decoding should be identical with an input of sample0.txt.rle for example, it should output a file sample0.txt with a checksum equivalent to that of sample0.txt when you first grabbed the project.
Output some data after a successful run of the program with information of how many characters were encoded and how many characters were printed into the output file with a comparison percentage in regards to file compression.
Encoded: xxx bytes Decoded: xxx bytes File compression: %
All programs should be able to both encode/decode with interoperability in mind such that person_a's decoder must be able to decode a file that has been encoded by person_b's encoder, and vise versa.
eg.
some_file = "Hello World" person_a -> encode(some_file) person_b -> decode(some_file) print(some_file) >> Hello World
Verify your program's capabilities by running make check as long as you have the makefile and verify file in your rle0 directory.
NOTE: This is not the full make check output, this is pending a full successful make check so if one is achieved feel free to delete this. Verification adds the in/ to the input file name argument, as well as the out/ for the output file name argument.
USERNAME@lab46:~/src/fall2022/rle0$ make check ================================================= = PHASE 0: Raw -> Encode data verification test = ================================================= in/sample0.txt -> o0/sample0.txt.rle: OK in/sample1.txt -> o0/sample1.txt.rle: OK in/sample2.bmp -> o0/sample2.bmp.rle: OK in/sample3.wav -> o0/sample3.wav.rle: OK ================================================= = PHASE 1: Decode -> Raw data verification test = ================================================= in/sample0.txt.rle -> o1/sample0.txt: OK in/sample1.txt.rle -> o1/sample1.txt: OK in/sample2.bmp.rle -> o1/sample2.bmp: OK in/sample3.wav.rle -> o1/sample3.wav: OK ================================================ = PHASE 2: Raw -> Encode -> Decode -> Raw test = ================================================ in/sample0.txt -> m2/sample0.txt.rle -> o2/sample0.txt: OK in/sample1.txt -> m2/sample1.txt.rle -> o2/sample1.txt: OK in/sample2.bmp -> m2/sample2.bmp.rle -> o2/sample2.bmp: OK in/sample3.wav -> m2/sample3.wav.rle -> o2/sample3.wav: OK ============================================= = PHASE 3: Decode -> Raw -> Encode Raw test = ============================================= in/sample0.txt.rle -> m3/sample0.txt -> o3/sample0.txt.rle: OK in/sample1.txt.rle -> m3/sample1.txt -> o3/sample1.txt.rle: OK in/sample2.bmp.rle -> m3/sample2.bmp -> o3/sample2.bmp.rle: OK in/sample3.wav.rle -> m3/sample3.wav -> o3/sample3.wav.rle: OK
To be successful in this project, the following criteria (or their equivalent) must be met:
Let's say you have completed work on the project, and are ready to submit, you would do the following (assuming you have a program called uom0.c):
lab46:~/src/SEMESTER/DESIG/PROJECT$ make submit
You should get some sort of confirmation indicating successful submission if all went according to plan. If not, check for typos and or locational mismatches.
I'll be evaluating the project based on the following criteria:
78:rle0:final tally of results (78/78) *:rle0:used grabit to obtain project by the Sunday prior to duedate [13/13] *:rle0:clean compile, no compiler messages [13/13] *:rle0:implementation passes verification tests [13/13] *:rle0:adequate modifications to code from template [13/13] *:rle0:program operations conform to project specifications [13/13] *:rle0:code tracked in lab46 semester repo [13/13]