notes:discrete:fall2022:projects:rle2

BACKGROUND

Version 3 of our RLE algorithm.

This time we will focus on the conditional encoding and decoding of runs of similar data whose length is greater than one.


Run Length Encoding (RLE) Data Compression Algorithm

Run-length encoding (RLE) is a lossless compression algorithm for runs of data: each repeated sequence is stored as a single data value and its count (how many times it repeats consecutively). RLE is commonly used in JPEG, TIFF, and PDF files, to name a few examples.
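To make the idea of "a value and its count" concrete, here is a minimal sketch in C that splits a buffer into runs. It is generic RLE with a stride of 1, not this project's output format; the names struct run and find_runs are made up for illustration, and capping the count at 255 (so it fits in one byte) is an assumption.

```c
#include <stddef.h>

/* One run: a byte value and how many times it repeats consecutively. */
struct run {
    unsigned char value;
    unsigned char count;
};

/*
 * Split `len` bytes of `data` into runs and store them in `out`.
 * `out` must have room for `len` entries (worst case: no repeats).
 * Counts are capped at 255 (an assumption) so each fits in one byte;
 * a longer run is simply split into several runs.
 * Returns the number of runs written.
 */
size_t find_runs(const unsigned char *data, size_t len, struct run *out)
{
    size_t n = 0;
    size_t i = 0;

    while (i < len) {
        unsigned char value = data[i];
        unsigned char count = 0;

        /* Consume bytes while they match the first byte of the run. */
        while (i < len && data[i] == value && count < 255) {
            count++;
            i++;
        }
        out[n].value = value;
        out[n].count = count;
        n++;
    }
    return n;
}
```

For the six bytes "aaabcc", find_runs() reports three runs: 'a' repeated 3 times, 'b' once, and 'c' twice.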

SPECIFICATIONS

If the control byte appears inside your data on its own, that is, as an ordinary data byte with no encoded data attached to it, you will have to make modifications to your encode/decode process. If this happens in your encoder, the output should be: control_byte 01 01 control_byte. Your decoder should recognize this sequence and properly decode it as “control_byte”, with control_byte being whatever control byte belongs to that file.
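The rule above can be sketched as a pair of small helpers. This is only an illustration: the function names are hypothetical, and the code writes and recognizes the four bytes exactly as listed without assuming what the two 01 bytes stand for.

```c
#include <stddef.h>

/*
 * Encoder side: write the four-byte sequence the specification gives
 * for a control byte that appears as plain data.
 * `out` must have room for 4 bytes. Returns the number of bytes written.
 */
size_t emit_lone_control(unsigned char control, unsigned char *out)
{
    out[0] = control;
    out[1] = 0x01;
    out[2] = 0x01;
    out[3] = control;
    return 4;
}

/*
 * Decoder side: return 1 if the `len` bytes at `in` begin with
 * control 01 01 control, otherwise 0.
 */
int is_lone_control(const unsigned char *in, size_t len, unsigned char control)
{
    return len >= 4 &&
           in[0] == control &&
           in[1] == 0x01 &&
           in[2] == 0x01 &&
           in[3] == control;
}
```

A decoder that sees is_lone_control() return 1 would write the control byte itself to its output and skip past those four input bytes.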

REFERENCES

STRIDES and CONTROL BYTES for given files:
sample0.txt: Stride: 1, Control Byte: 0x62(98, 'b')
sample1.txt: Stride: 30, Control Byte: 0xff(255)
sample2.bmp: Stride: 6, Control Byte: 0x25(37)
sample3.wav: Stride: 73, Control Byte: 0x17(23)
sample5.txt: Stride: 4, Control Byte: 0x29(41)

Please reference the image below to find the hexadecimal value of the ASCII symbols:

*Our task is to ask questions on Discord or in class and document our findings on this wiki page collaboratively, regarding the functionality of this project.

*For anybody interested in editing the wiki page, here is the dokuwiki user guide: https://www.dokuwiki.org/wiki:syntax#basic_text_formatting -Ash

OUTPUT SPECIFICATIONS

Determine a unified means of output so that all submissions have an identical format.

PROGRAM

Arguments

./encode INFILE  OUTPATH STRIDE  CONTROL
 argv[0] argv[1] argv[2] argv[3] argv[4]

./encode sample0.txt some/other/directory/ 2 5

./decode INFILE  OUTPATH
 argv[0] argv[1] argv[2]

./decode sample0.txt some/other/directory/

NOTE The control value is a single byte (0-255). When running encode for this project, it should be input as the hexadecimal digits of that byte: '61' for a, '62' for b, etc. So you will need to do some conversion within rle2 to read the string "61" as hexadecimal 0x61 rather than as decimal 61. One way to do this is possibly sscanf. The same is true of the stride.

NOTE To convert a string argument written in decimal into a usable integer, pass the string to atoi() and assign the return value to your variable. This works with unsigned chars as well as ints, as long as the value fits in the type. If you atoi() into an unsigned char, it is easy to put it into the header array with a simple assignment.
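The two notes above mention sscanf and atoi(); a minimal sketch of both conversions follows. The helper names are hypothetical, and the range check against one byte is an added assumption. Which of the two applies to a given argument depends on how the class agrees the value should be typed.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Read hexadecimal digits such as "61" into one byte using sscanf().
 * Returns 0 on success, -1 if the text is not hex or does not fit in a byte.
 */
int parse_hex_byte(const char *arg, unsigned char *out)
{
    unsigned int value;

    if (sscanf(arg, "%x", &value) != 1 || value > 0xff)
        return -1;
    *out = (unsigned char)value;
    return 0;
}

/*
 * Read decimal digits such as "30" into one byte using atoi().
 * Note that atoi() cannot report errors: non-numeric text yields 0.
 * Returns 0 on success, -1 if the value does not fit in a byte.
 */
int parse_decimal_byte(const char *arg, unsigned char *out)
{
    int value = atoi(arg);

    if (value < 0 || value > 255)
        return -1;
    *out = (unsigned char)value;
    return 0;
}
```

With these helpers, parse_hex_byte("61", &c) stores 0x61 (the letter a) in c, while parse_decimal_byte("30", &s) stores 30 in s.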

Explanation

Input-file→the name of the file that is being encoded/decoded

Outpath→The path to the output file NOT including the filename (e.g. '~/rle2/x.rle' → '~/rle2/')

Stride→How many characters are bundled together as one unit of a run (e.g. 1ab is a stride of 2, 1abc is a stride of 3, etc.)

Control→A value chosen to signal that the following bytes are compressed, and thus will need to be decoded. Preferably the least common byte in the file, as false positives are a possibility
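Since OUTPATH excludes the filename, the encoder has to build the full output name itself. The sketch below assumes the encoded file is named after the input file with ".rle" appended (suggested by the sample0.txt.rle name used in the verification section) and that INFILE is a bare filename with no directory part; build_outfile is a hypothetical helper.

```c
#include <stdio.h>
#include <string.h>

/*
 * Join `outpath` and `infile` into `dst` as "<outpath>/<infile>.rle".
 * A '/' is added only when `outpath` does not already end with one.
 * Returns 0 on success, -1 if the result does not fit in `size` bytes.
 */
int build_outfile(char *dst, size_t size, const char *outpath, const char *infile)
{
    size_t n = strlen(outpath);
    const char *sep = (n > 0 && outpath[n - 1] != '/') ? "/" : "";
    int written = snprintf(dst, size, "%s%s%s.rle", outpath, sep, infile);

    /* snprintf() returns the length it wanted to write; detect truncation. */
    if (written < 0 || (size_t)written >= size)
        return -1;
    return 0;
}
```

Calling build_outfile(buf, sizeof buf, "some/other/directory/", "sample0.txt") leaves "some/other/directory/sample0.txt.rle" in buf.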

DATA HEADER SPECIFICATIONS

(mostly the same as rle0)

Header Format:

byte 0: 0x72
byte 1: 0x6c
byte 2: 0x65
byte 3: 0x58
byte 4: 0x20
byte 5: 0x52
byte 6: 0x4c
byte 7: 0x45
byte 8: 0x(control byte)
byte 9: 0x03 (version)
byte 10: 0x(stride value) – changes depending on input
byte 11: 0x(length of the source file name) – the number of characters in argv[1], not including the NULL terminator
byte 12 onward: The name of the source file, not including the NULL terminator
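The header layout above can be filled in with a short helper. This is a sketch rather than a reference implementation: build_header and RLE_VERSION are hypothetical names, and the caller is assumed to supply a buffer large enough for 12 bytes plus the file name. Bytes 0-7 are the ASCII characters "rleX RLE".

```c
#include <stddef.h>
#include <string.h>

#define RLE_VERSION 0x03  /* byte 9 of the header */

/*
 * Fill `hdr` with the header for source file `name`.
 * Returns the header length (12 + name length), or 0 if the name is
 * longer than 255 characters or `hdr` (of `size` bytes) is too small.
 */
size_t build_header(unsigned char *hdr, size_t size, unsigned char control,
                    unsigned char stride, const char *name)
{
    /* Bytes 0-7: "rleX RLE" */
    static const unsigned char magic[8] = {
        0x72, 0x6c, 0x65, 0x58, 0x20, 0x52, 0x4c, 0x45
    };
    size_t name_len = strlen(name);

    if (name_len > 255 || size < 12 + name_len)
        return 0;

    memcpy(hdr, magic, sizeof magic);
    hdr[8]  = control;                   /* control byte */
    hdr[9]  = RLE_VERSION;               /* version */
    hdr[10] = stride;                    /* stride value */
    hdr[11] = (unsigned char)name_len;   /* name length, no terminator */
    memcpy(hdr + 12, name, name_len);    /* byte 12 onward: the name */
    return 12 + name_len;
}
```

For sample0.txt with control byte 0x62 and stride 1, the header is 23 bytes long: 12 fixed bytes followed by the 11 characters of the file name.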

EXAMPLES

VERIFICATION

The eval script is inside; however, it doesn't seem to work and segfaults when run. To verify manually, you can compare the checksums of your output files against those of the given files. Example: decoding with ./decode sample0.txt.rle should produce output with the same checksum as the sample0.txt inside the data file. You can do this for all given files.
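As an alternative to comparing checksums by hand, two files can be compared directly. The sketch below swaps in a byte-for-byte comparison instead of a checksum; files_identical is a hypothetical helper, not part of the provided eval script.

```c
#include <stdio.h>

/*
 * Compare two files byte for byte.
 * Returns 1 if they are identical, 0 if they differ,
 * and -1 if either file cannot be opened.
 */
int files_identical(const char *path_a, const char *path_b)
{
    FILE *a = fopen(path_a, "rb");
    FILE *b = fopen(path_b, "rb");
    int result = 1;
    int ca, cb;

    if (a == NULL || b == NULL) {
        if (a != NULL)
            fclose(a);
        if (b != NULL)
            fclose(b);
        return -1;
    }

    /* Stop at the first difference, or when both files reach EOF together. */
    do {
        ca = fgetc(a);
        cb = fgetc(b);
        if (ca != cb)
            result = 0;
    } while (result && ca != EOF);

    fclose(a);
    fclose(b);
    return result;
}
```

A decoded file passes this check when files_identical() returns 1 against the original sample file.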

Derive a set of tests that all submissions should perform to ascertain correctness (state the tests, the inputs, and the expected outputs). In conjunction with conforming output specifications, all submissions should match (this is the basis for writing a verification script that can automate the process).

That being said, once output specifications and verification tests have been established, anyone who writes a verification script to automate this can be eligible to receive bonus points.

PSEUDOCODE

notes/discrete/fall2022/projects/rle2.txt · Last modified: 2022/11/03 02:53 by dkienenb