Corning Community College

CSCS2330 Discrete Structures

PROJECT: Run-Length Encoding (RLE2)

OBJECTIVE

To continue to explore the realm of algorithmic encoding/decoding of information, potentially achieving data compression in ideal scenarios, and collaboratively authoring and documenting the project and its specifications.

OVERVIEW

In rle0, you implemented fixed encoding with a single-byte stride.

In rle1, you implemented fixed encoding with a variable byte stride that stays constant for the whole encoding.

In rle2, we explore the use of a control byte: one of the 256 possible byte values, reserved to kick off a conditional encoding sequence. The byte stride is again variable but constant for the whole encoding.

GRABIT

To assist with consistency across all implementations, data files for use with this project are available on lab46 via the grabit tool. Be sure to obtain them and ensure your implementation works properly with the provided data.

lab46:~/src/SEMESTER/DESIG$ grabit DESIG PROJECT

EDIT

You will want to go here to edit and fill in the various sections of the document:

BACKGROUND

Version 3 of our RLE algorithm.

This time we will be focusing on conditional encoding and decoding of runs of identical data longer than one.


Run Length Encoding (RLE) Data Compression Algorithm

Run-length encoding (RLE) is a lossless compression algorithm for runs of data: a sequence of consecutive repeated values is stored as a single data value and its count (how many times it repeats consecutively). RLE is commonly used in JPEG, TIFF, and PDF files, to name a few examples.
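As a rough illustration of the value-and-count idea only (this is not the rle2 format), here is a minimal sketch of a single-byte-stride encoder in C. The function name rle_pairs and the choice to write the count before the value are assumptions made for this example.

```c
#include <stddef.h>

/* Hypothetical helper: encode n input bytes as (count, value) pairs.
 * Returns the number of bytes written to out.
 * out must have room for 2 * n bytes (worst case: no byte repeats). */
size_t rle_pairs(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t i = 0, o = 0;

    while (i < n) {
        unsigned char value = in[i];
        unsigned char count = 0;

        /* Extend the run while the byte repeats, capping at 255 so the
         * count still fits in one byte. */
        while (i < n && in[i] == value && count < 255) {
            count++;
            i++;
        }
        out[o++] = count;
        out[o++] = value;
    }
    return o;
}
```

For the six input bytes "aaabcc" this writes 3 'a' 1 'b' 2 'c', which is also six bytes: data with few runs gains nothing from this scheme, which is part of the motivation for conditional encoding.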

SPECIFICATIONS

If the control byte occurs in your data as an ordinary byte, rather than as the start of an encoded run, your encode/decode process has to treat it specially. When the encoder meets such a byte, the output should be: control_byte 01 01 control_byte. Your decoder should recognize that sequence and decode it as the single byte “control_byte”, where control_byte is whatever control byte belongs to that file.
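The escape sequence described above could be sketched as follows. The function names are hypothetical, and the meaning of the two 01 bytes is not spelled out on this page, so the sketch simply writes and checks them verbatim.

```c
#include <stddef.h>

/* Hypothetical helper: write the four-byte sequence that stands for one
 * literal control byte in the input. Returns the number of bytes written. */
size_t emit_literal_control(unsigned char control, unsigned char *out)
{
    out[0] = control;
    out[1] = 0x01;
    out[2] = 0x01;
    out[3] = control;
    return 4;
}

/* Hypothetical helper: returns 1 if the bytes at p (with `remaining`
 * bytes available) are exactly that sequence, 0 otherwise. */
int is_literal_control(const unsigned char *p, size_t remaining,
                       unsigned char control)
{
    return remaining >= 4 &&
           p[0] == control && p[1] == 0x01 &&
           p[2] == 0x01 && p[3] == control;
}
```

A decoder that sees this sequence would write one control byte to its output and skip ahead four bytes. How this interacts with strides larger than 1 is something to confirm in class or on Discord.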

REFERENCES

STRIDES and CONTROL BYTES for given files:
sample0.txt: Stride: 1, Control Byte: 0x62(b)
sample1.txt: Stride: 30, Control Byte: 0xff(255)
sample2.bmp: Stride: 6, Control Byte: 0x25(37)
sample3.wav: Stride: 73, Control Byte: 0x17(23)
sample5.txt: Stride: 4, Control Byte: 0x29(41)

Please reference the image below to find the hexadecimal value of the ASCII symbols:

*Our task is to ask questions on Discord or in class and document our findings on this wiki page collaboratively, regarding the functionality of this project.

*For anybody interested in editing the wiki page, here is the dokuwiki user guide: https://www.dokuwiki.org/wiki:syntax#basic_text_formatting -Ash

OUTPUT SPECIFICATIONS

Determine a unified means of output so that all submissions have an identical format.

PROGRAM

Arguments
./encode INFILE  OUTPATH STRIDE  CONTROL
 argv[0] argv[1] argv[2] argv[3] argv[4]

./encode sample0.txt some/other/directory/ 2 5

./decode INFILE  OUTPATH
 argv[0] argv[1] argv[2]

./decode sample0.txt some/other/directory/

NOTE The control value is a single byte (0-255). When running encode for this project, the control value should be input as '61' for a, '62' for b, etc., which are the hexadecimal ASCII values of those characters. Because the argument arrives as a string, rle2 has to convert that text into a numeric byte value; reading '61' as decimal would give 61 ('='), not 0x61 ('a'). One way to do this is sscanf with the %x conversion. The stride argument likewise arrives as a string and needs converting.

NOTE To convert a decimal string argument into a usable number, assign the result of atoi() called on that argument to your variable, for example stride = atoi(argv[3]). atoi() returns an int, so the result can be stored in an int or, for values 0-255, in an unsigned char. Once the value is in an unsigned char, it is easy to put it into the header array with a simple assignment.
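The notes above mention sscanf and atoi. As an alternative, here is a minimal sketch using strtol, which can also reject bad input and out-of-range values. The function name parse_byte is hypothetical, and which base to pass for the control argument (10 or 16) depends on what the class settles on.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse text as an integer in the given base
 * (10 or 16) and store it in *out if it fits in one byte.
 * Returns 0 on success, -1 on failure. */
int parse_byte(const char *text, int base, unsigned char *out)
{
    char *end = NULL;
    long value;

    errno = 0;
    value = strtol(text, &end, base);
    if (errno != 0 || end == text || *end != '\0')
        return -1;              /* not a number, or trailing junk */
    if (value < 0 || value > 255)
        return -1;              /* does not fit in an unsigned char */

    *out = (unsigned char)value;
    return 0;
}
```

With this sketch, parse_byte("61", 16, &control) stores 0x61 ('a'), while parse_byte("61", 10, &control) stores decimal 61 ('='), which shows why the choice of base matters.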

Explanation

Input-file→The name of the file that is being encoded/decoded

Outpath→The path to the output file, NOT including the filename (e.g. '~/rle2/x.rle' → '~/rle2/')

Stride→How long the chain of bundled characters is, in bytes (e.g. 1ab is 2, 1abc is 3, etc.)

Control→A byte value chosen to signal that the bytes that follow are encoded, and thus will need to be decoded. Preferably the least common byte value in the input, as false positives are a possibility
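To make the stride idea concrete, here is a minimal sketch that counts how many times a stride-sized chunk repeats back to back. The function name count_run and its signature are assumptions for illustration, not part of the project template.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count how many times the stride-sized chunk at
 * data[pos] occurs consecutively, including the chunk itself.
 * Returns 0 if fewer than stride bytes remain at pos. */
size_t count_run(const unsigned char *data, size_t len,
                 size_t pos, size_t stride)
{
    size_t count;

    if (stride == 0 || pos > len || len - pos < stride)
        return 0;

    count = 1;
    /* Keep going while another full chunk remains and it matches
     * the first chunk byte for byte. */
    while (len - (pos + count * stride) >= stride &&
           memcmp(data + pos, data + pos + count * stride, stride) == 0)
        count++;

    return count;
}
```

For the bytes "abababc" with a stride of 2, the chunk "ab" at position 0 has a run of 3. An encoder that stores the count in one byte would also need to cap the run at 255.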

DATA HEADER SPECIFICATIONS

(mostly the same as rle0)

Header Format:

byte 0: 0x72
byte 1: 0x6c
byte 2: 0x65
byte 3: 0x58
byte 4: 0x20
byte 5: 0x52
byte 6: 0x4c
byte 7: 0x45
byte 8: 0x(control byte)
byte 9: 0x03 (version)
byte 10: 0x(stride value) – changes depending on input
byte 11: 0x(length of the source file name, not including the NULL terminator)
(the number of characters in the file name given in argv[1])
byte 12 onward: the name of the source file, not including the NULL terminator
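The header layout above can be sketched as a small C function. The name build_header and its signature are assumptions for illustration. The first eight bytes spell "rleX RLE" in ASCII.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: fill buf with the header described above and
 * return its total length, or 0 if the file name does not fit in a
 * one-byte length field. buf must hold at least 12 + strlen(name) bytes. */
size_t build_header(unsigned char *buf, unsigned char control,
                    unsigned char stride, const char *name)
{
    /* bytes 0-7: "rleX RLE" */
    static const unsigned char magic[8] = {
        0x72, 0x6c, 0x65, 0x58, 0x20, 0x52, 0x4c, 0x45
    };
    size_t name_len = strlen(name);   /* excludes the NULL terminator */

    if (name_len > 255)
        return 0;

    memcpy(buf, magic, sizeof magic);
    buf[8]  = control;                /* control byte */
    buf[9]  = 0x03;                   /* version */
    buf[10] = stride;                 /* stride value */
    buf[11] = (unsigned char)name_len;
    memcpy(buf + 12, name, name_len); /* file name, no terminator */

    return 12 + name_len;
}
```

For the name "sample0.txt" (11 characters) the header is 23 bytes long, and the encoded data would start immediately after it.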

EXAMPLES

VERIFICATION

An eval script is included; however, it doesn't seem to work and segfaults when run. To verify manually, you can compare the checksums of your output files against those of the provided files. Example: the output of ./decode sample0.txt.rle should have the same checksum as the sample0.txt in the provided data. You can do this for all given files.

Derive a set of tests that all submissions should perform to ascertain correctness (state the tests, the inputs, and the expected outputs). In conjunction with conforming output specifications, all submissions should match (this is the basis for writing a verification script that can automate the process).

That being said, once output specifications and verification tests have been established, anyone who writes a verification script to automate this can be eligible to receive bonus points.

PSEUDOCODE

 

SUBMISSION

To be successful in this project, the following criteria (or their equivalent) must be met:

Submit Tool Usage

Once you have completed work on the project and are ready to submit, you would do the following:

lab46:~/src/SEMESTER/DESIG/PROJECT$ make submit

You should get some sort of confirmation indicating successful submission if all went according to plan. If not, check for typos and/or location mismatches.

RUBRIC

I'll be evaluating the project based on the following criteria:

156:rle2:final tally of results (156/156)
*:rle2:used grabit to obtain project by the Sunday prior to due date [13/13]
*:rle2:clean compile, no compiler messages [13/13]
*:rle2:implementation passes verification tests [39/39]
*:rle2:adequate modifications to code from template [39/39]
*:rle2:program operations conform to project specifications [39/39]
*:rle2:code tracked in lab46 semester repo [13/13]

Pertaining to the collaborative authoring of project documentation

Additionally