User Tools

Site Tools


Sidebar

projects

wcp1 (due 20220824)
ntr0 (due 20220831)
pct1 (bonus; due 20220831)
uom0 (due 20220831)
wcp2 (due 20220831)
pct2 (due 20220907)
pct3 (bonus; due 20220907)
rle0 (due 20220907)
wcp3 (due 20220907)
bdb0 (due 20220914)
wcp4 (due 20220914)
pct4 (due 20220915)
cnv0 (due 20220921)
wcp5 (due 20220921)
pct5 (bonus; due 20220922)
gfo0 (due 20220928)
lmr0 (due 20220928)
wcp6 (due 20220928)
pct6 (due 20220929)
pct7 (bonus; due 20221005)
rle1 (due 20221005)
wcp7 (due 20221005)
bwp1 (bonus; due 20221019)
pct8 (due 20221019)
wcp8 (due 20221019)
cnv1 (due 20221020)
pct9 (bonus; due 20221026)
wcp9 (due 20221026)
lmr1 (due 20221027)
gfo1 (due 20221102)
pctA (due 20221102)
wcpA (due 20221102)
rle2 (due 20221104)
pctB (bonus; due 20221109)
wcpB (due 20221109)
cnv2 (due 20221110)
pctC (due 20221116)
wcpC (due 20221116)
lmr2 (due 20221117)
bwp2 (bonus; due 20221201)
pctD (bonus; due 20221201)
wcpD (bonus; due 20221201)
pctE (bonus; due 20221207)
wcpE (bonus; due 20221207)
gfo2 (due 20221208)
EoCE (due 20221219)
haas:fall2022:discrete:projects:rle0

Corning Community College

CSCS2330 Discrete Structures

PROJECT: Run-Length Encoding (RLE0)

OBJECTIVE

To explore the realm of algorithmic encoding/decoding of information, potentially achieving data compression in ideal scenarios, and collaboratively authoring and documenting the project and its specifications.

GRABIT

To assist with consistency across all implementations, data files for use with this project are available on lab46 via the grabit tool. Be sure to obtain it and ensure your implementation properly works with the provided data.

lab46:~/src/SEMESTER/DESIG$ grabit DESIG PROJECT

EDIT

You will want to go here to edit and fill in the various sections of the document:

BACKGROUND

Run Length Encoding (RLE) Data Compression Algorithm

Run-length encoding (RLE) algorithm is a lossless compression of runs of data, where repeated sequences of data are stored as a representation of a single data value and its count (how many times it repeats consecutively). RLE encoding is commonly used in JPEG, TIFF, and PDF files, to name a few examples.

For anybody interested in editing the wiki page, here is the dokuwiki user guide: https://www.dokuwiki.org/wiki:syntax#basic_text_formatting -Ash

ALGORITHM: RLE

The RLE encoding algorithm first takes a file that contains a run of data in the following way:

Original:

aaaabbcdefgggg

Suppose this data sequence is contained within a .txt file.

The RLE algorithm will encode this data by replacing the repeated characters with a count number and a single value.

4a2b1c1d1e1f4g

There are 4 a's (0x61), 2 b's(0x62), 1 c(0x63), 1 d(0x64), 1 e(0x65), 1 f (0x66), and 5 g's (0x67).

Encoded:

04 61 02 62 01 63 01 64 01 65 01 66 05 67
^  ^  ^  ^  ^  ^  ^  ^  ^  ^  ^  ^  ^  ^
4  a  2  b  1  c  1  d  1  e  1  f  5  g

NOTE: While this example is in readable characters in our eyes, outside of sample0 in the project other samples will not be as readable to you and I. However, rest be assured that the same algorithm will still apply throughout all the samples. The program's job is still to identify the characters and the amount of characters and output that in hex to the output file in the case of the encoder.

Please reference the image below to find the hexadecimal value of the ASCII symbols:

NOTE: Use chars as they will give a hex-byte value as opposed to any kind of string or integer values that they may be converted to from other data types.

SPECIFICATIONS

Our task is to ask questions on Discord or in class, document our findings on this wiki page, and implement the RLE encoding/decoding algorithms in our own encoder and decoder programs. The output of the encoder should be in hex. In order to see if it does use either xxd in lab46 or xxd within vim. You can also use your own program or command if you have some other way you prefer to view the files in hex. The encoder should be able to encode all characters according to the algorithm. This includes any newlines that you might see.

To use xxd within lab46:

USERNAME@lab46:~$ xxd FILENAME

To use xxd inside vim:

:%!xxd

The decoder should be able to take the output of the encoder and give back the file initially inputted to the encoder. The decoder should make use of the header to find the filename alongside the contents of the file to decode the file. Furthermore, the decoder must remove the header after finding necessary data in the input file (i.e the filename) and write only the decoded data to the new file.

NOTE: The encoder and decoder are two separate programs.

While fetching data from the source file, use the function feof() from stdlib.h to ensure the end of file has been found; This will help you avoid any false positives resulting in unpredictable behavior.

Example:

while ( !feof( in ) )
{
     data = fgetc( in );
     fprintf ( out, "%c", data);
}
  • The in_file should be taken from the user as argv[1].
  • The out_file should also be taken from the user as argv[2].

DATA HEADER SPECIFICATIONS

Header Format:

byte 0: 0x72
byte 1: 0x6c
byte 2: 0x65
byte 3: 0x58
byte 4: 0x20
byte 5: 0x52
byte 6: 0x4c
byte 7: 0x45
byte 8: 0x00 (reserved)
byte 9: 0x01 (version)
byte 10: 0x01 (stride value)
byte 11: 0xArgv The length of the source file name, not including NULL terminator
(how many characters in Argv - 1)
byte 12: The name of the source file, not including the NULL terminator

Info

Byte 8 - Reserved for a future project.
Byte 9 - Version number, this week is version 1.
Byte A - Width of comparison. 1 byte for this week's project.
Byte B - 'bob' would be 3.

In the case of 'bob', byte C would be 0x62('b'), byte D would be 0x6F('o'), and byte E would be 0x62('b').

PROGRAM

Encode program will take two arguments:

./encode INFILE  OUTFILE
 argv[0] argv[1] argv[2]

./encode sample0.txt sample0.txt.rle

Encoder will output a file with a name equivalent to the second argument.

Decode program will take two arguments:

./decode INFILE OUTFILE
 argv[0] argv[1] argv[2]

./decode sample0.txt.rle sample0.txt

The decoder should be able to read the header and find out the filename if a second argument is not given. If a second argument is given it will use the second argument instead of taking the filename from the header. Decode will output a file of similar name without the .rle extension after running your decoder.

OUTPUT SPECIFICATIONS

Program should be able to encode a file such as sample0.txt and successfully output sample0.txt.rle in a similar manner to the file given to you when you initially grabbed the project. Checksums should be identical to each other. Similar to encoding, decoding should be identical with an input of sample0.txt.rle for example, it should output a file sample0.txt with a checksum equivalent to that of sample0.txt when you first grabbed the project.

Output some data after a successful run of the program with information of how many characters were encoded and how many characters were printed into the output file with a comparison percentage in regards to file compression.

Encoded: xxx bytes
Decoded: xxx bytes
File compression: %

VERIFICATION

All programs should be able to both encode/decode with interoperability in mind such that person_a's decoder must be able to decode a file that has been encoded by person_b's encoder, and vise versa.

eg.

some_file = "Hello World"
person_a -> encode(some_file)
person_b -> decode(some_file)

print(some_file)

>> Hello World

Verify your program's capabilities by running make check as long as you have the makefile and verify file in your rle0 directory.

NOTE: This is not the full make check output, this is pending a full successful make check so if one is achieved feel free to delete this. Verification adds the in/ to the input file name argument, as well as the out/ for the output file name argument.

USERNAME@lab46:~/src/fall2022/rle0$ make check
=================================================
= PHASE 0: Raw -> Encode data verification test =
=================================================
in/sample0.txt -> o0/sample0.txt.rle: OK
in/sample1.txt -> o0/sample1.txt.rle: OK
in/sample2.bmp -> o0/sample2.bmp.rle: OK
in/sample3.wav -> o0/sample3.wav.rle: OK

=================================================
= PHASE 1: Decode -> Raw data verification test =
=================================================
in/sample0.txt.rle -> o1/sample0.txt: OK
in/sample1.txt.rle -> o1/sample1.txt: OK
in/sample2.bmp.rle -> o1/sample2.bmp: OK
in/sample3.wav.rle -> o1/sample3.wav: OK

================================================
= PHASE 2: Raw -> Encode -> Decode -> Raw test =
================================================
in/sample0.txt -> m2/sample0.txt.rle -> o2/sample0.txt: OK
in/sample1.txt -> m2/sample1.txt.rle -> o2/sample1.txt: OK
in/sample2.bmp -> m2/sample2.bmp.rle -> o2/sample2.bmp: OK
in/sample3.wav -> m2/sample3.wav.rle -> o2/sample3.wav: OK

=============================================
= PHASE 3: Decode -> Raw -> Encode Raw test =
=============================================
in/sample0.txt.rle -> m3/sample0.txt -> o3/sample0.txt.rle: OK
in/sample1.txt.rle -> m3/sample1.txt -> o3/sample1.txt.rle: OK
in/sample2.bmp.rle -> m3/sample2.bmp -> o3/sample2.bmp.rle: OK
in/sample3.wav.rle -> m3/sample3.wav -> o3/sample3.wav.rle: OK
 

SUBMISSION

To be successful in this project, the following criteria (or their equivalent) must be met:

  • Project must be submit on time, by the deadline.
    • Late submissions will lose 33% credit per day, with the submission window closing on the 3rd day following the deadline.
  • All code must compile cleanly (no warnings or errors)
    • Compile with the -Wall and –std=gnu18 compiler flags
    • all requested functionality must conform to stated requirements (either on this document or in a comment banner in source code files themselves).
  • Executed programs must display in a manner similar to provided output
    • output formatted, where applicable, must match that of project requirements
  • Processing must be correct based on input given and output requested
  • Output, if applicable, must be correct based on values input
  • Code must be nicely and consistently indented
  • Code must be consistently written, to strive for readability from having a consistent style throughout
  • Code must be commented
    • Any “to be implemented” comments MUST be removed
      • these “to be implemented” comments, if still present at evaluation time, will result in points being deducted.
      • Sufficient comments explaining the point of provided logic MUST be present
  • No global variables (without instructor approval), no goto statements, no calling of main()!
  • Track/version the source code in your lab46 semester repository
  • Submit a copy of your source code to me using the submit tool (make submit on lab46 will do this) by the deadline.

Submit Tool Usage

Let's say you have completed work on the project, and are ready to submit, you would do the following (assuming you have a program called uom0.c):

lab46:~/src/SEMESTER/DESIG/PROJECT$ make submit

You should get some sort of confirmation indicating successful submission if all went according to plan. If not, check for typos and or locational mismatches.

RUBRIC

I'll be evaluating the project based on the following criteria:

78:rle0:final tally of results (78/78)
*:rle0:used grabit to obtain project by the Sunday prior to duedate [13/13]
*:rle0:clean compile, no compiler messages [13/13]
*:rle0:implementation passes verification tests [13/13]
*:rle0:adequate modifications to code from template [13/13]
*:rle0:program operations conform to project specifications [13/13]
*:rle0:code tracked in lab46 semester repo [13/13]

Pertaining to the collaborative authoring of project documentation

  • each class member is to participate in the contribution of relevant information and formatting of the documentation
    • minimal member contributions consist of:
      • near the class average edits (a value of at least four productive edits)
      • near the average class content change average (a value of at least 256 bytes (absolute value of data content change))
      • near the class content contribution average (a value of at least 1kiB)
      • no adding in one commit then later removing in its entirety for the sake of satisfying edit requirements
    • adding and formatting data in an organized fashion, aiming to create an informative and readable document that anyone in the class can reference
    • content contributions will be factored into a documentation coefficient, a value multiplied against your actual project submission to influence the end result:
      • no contributions, co-efficient is 0.50
      • less than minimum contributions is 0.75
      • met minimum contribution threshold is 1.00

Additionally

  • Solutions not abiding by spirit of project will be subject to a 50% overall deduction
  • Solutions not utilizing descriptive why and how comments will be subject to a 25% overall deduction
  • Solutions not utilizing indentation to promote scope and clarity or otherwise maintaining consistency in code style and presentation will be subject to a 25% overall deduction
  • Solutions not organized and easy to read (assume a terminal at least 90 characters wide, 40 characters tall) are subject to a 25% overall deduction
haas/fall2022/discrete/projects/rle0.txt · Last modified: 2022/08/28 21:39 by 127.0.0.1