Corning Community College

CSCS2330 Discrete Structures

PROJECT: Run-Length Encoding (RLE1)

OBJECTIVE

To continue to explore the realm of algorithmic encoding/decoding of information, potentially achieving data compression in ideal scenarios, and collaboratively authoring and documenting the project and its specifications.

GRABIT

To assist with consistency across all implementations, data files for use with this project are available on lab46 via the grabit tool. Be sure to obtain it and ensure your implementation properly works with the provided data.

lab46:~/src/SEMESTER/DESIG$ grabit DESIG PROJECT

EDIT

You will want to go here to edit and fill in the various sections of the document:

BACKGROUND

Version 2 of our RLE algorithm.

This time we will encode and decode similar data, but each run may span more than one byte.


Run Length Encoding (RLE) Data Compression Algorithm

Run-length encoding (RLE) is a lossless compression technique for runs of data: a sequence that repeats consecutively is stored as a single data value plus its count (how many times it repeats in a row). RLE is commonly used in JPEG, TIFF, and PDF files, to name a few examples.

SPECIFICATIONS

Please reference the image below to find the hexadecimal value of the ASCII symbols:

*Our task is to ask questions on Discord or in class and document our findings on this wiki page collaboratively, regarding the functionality of this project.

*For anybody interested in editing the wiki page, here is the dokuwiki user guide: https://www.dokuwiki.org/wiki:syntax#basic_text_formatting -Ash

DATA HEADER SPECIFICATIONS

(mostly the same as rle0)

Header Format:

byte 0: 0x72
byte 1: 0x6c
byte 2: 0x65
byte 3: 0x58
byte 4: 0x20
byte 5: 0x52
byte 6: 0x4c
byte 7: 0x45
byte 8: 0x00 (reserved)
byte 9: 0x02 (version)
byte 10: stride value (changes depending on input)
byte 11: length of the source file name (argv[1]), not including the NULL terminator
byte 12 onward: the name of the source file, not including the NULL terminator
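
Read as ASCII, bytes 0 through 7 spell "rleX RLE". The header layout above could be written out along these lines. This is only a sketch: the helper name write_header and its return convention are made up for illustration, and it assumes the file name length must fit in the single byte at offset 11.

```c
#include <stdio.h>
#include <string.h>

/* Fixed part of the rle1 header: bytes 0-7 spell "rleX RLE" in ASCII,
 * byte 8 is reserved (0x00) and byte 9 is the version (0x02). */
static const unsigned char RLE_MAGIC[10] = {
    0x72, 0x6c, 0x65, 0x58, 0x20, 0x52, 0x4c, 0x45, 0x00, 0x02
};

/* Hypothetical helper: writes the header to 'out' and returns the
 * number of bytes written, or 0 on failure. */
size_t write_header(FILE *out, unsigned char stride, const char *name)
{
    size_t len = strlen(name);            /* excludes the NUL terminator */

    if (len == 0 || len > 255)            /* byte 11 is a single byte */
        return 0;

    if (fwrite(RLE_MAGIC, 1, sizeof RLE_MAGIC, out) != sizeof RLE_MAGIC)
        return 0;
    if (fputc(stride, out) == EOF)        /* byte 10: stride */
        return 0;
    if (fputc((int) len, out) == EOF)     /* byte 11: name length */
        return 0;
    if (fwrite(name, 1, len, out) != len) /* byte 12 onward: name */
        return 0;

    return sizeof RLE_MAGIC + 2 + len;
}
```

For a source file named sample0.txt (11 characters), this would write 10 + 2 + 11 = 23 header bytes before any encoded data.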

CUSTOMISATION: variable stride

The stride determines how the encoding/decoding process works. In rle0, the stride was always 1, so ababcc → 1a1b1a1b2c, where we compressed runs of a single byte. For rle1, the given stride determines how many bytes make up each unit of a run. For example, a stride of two means we check for repeats of two-byte groups, which gives ababcc → 2ab1cc. Since the stride is stored in a char, it can hold any value between 1-255, and so each of these values should be a valid stride.
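
The stride idea can be sketched on an in-memory buffer as follows. Several things here are assumptions rather than project requirements: the function name rle_encode, storing each count as one raw byte (so a run is capped at 255 repeats), and emitting a trailing group shorter than the stride with a count of 1. The caller is assumed to supply an output buffer of at least 2 * n bytes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: encodes 'n' bytes of 'in' as count/chunk pairs,
 * where each chunk is 'stride' bytes long. Returns bytes written. */
size_t rle_encode(const unsigned char *in, size_t n,
                  unsigned char stride, unsigned char *out)
{
    size_t pos = 0;    /* read position in 'in'  */
    size_t used = 0;   /* bytes written to 'out' */

    if (stride == 0)   /* a zero-byte chunk could never advance */
        return 0;

    while (pos < n) {
        /* The last chunk may be shorter than the stride (assumption). */
        size_t chunk = (n - pos < stride) ? n - pos : stride;
        unsigned char count = 1;

        /* Extend the run while the next full chunk is identical. */
        while (chunk == stride && count < 255 &&
               pos + (size_t) (count + 1) * stride <= n &&
               memcmp(in + pos,
                      in + pos + (size_t) count * stride, stride) == 0)
            count++;

        out[used++] = count;                 /* run count as a raw byte */
        memcpy(out + used, in + pos, chunk); /* one copy of the chunk   */
        used += chunk;
        pos += (size_t) count * chunk;
    }

    return used;
}
```

With the input ababcc and a stride of 2, this produces the six bytes 0x02 'a' 'b' 0x01 'c' 'c', matching the 2ab1cc example; with a stride of 1 it produces the ten-byte 1a1b1a1b2c form.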

PROGRAM

Encode program will take two arguments:

./encode INFILE  STRIDE
 argv[0] argv[1] argv[2]

./encode sample0.txt 2

For ENCODE, the second argument no longer indicates the destination file name. The destination file name is instead built from the name stored at byte 12 of the header (the name of the original file), with .rle appended at the end.

For ENCODE, the second argument is now a STRIDE indicator, with a min of 0 and a max of 255. There is no default stride: if no stride is given, display an error. Store the stride in byte 10 of the header.
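
One possible way to handle the two ENCODE rules above is sketched here. The helper names parse_stride and output_name are hypothetical. The sketch accepts 1 through 255, following the CUSTOMISATION section; rejecting 0 is an assumption made because a zero-byte stride could never advance through the input, so confirm the intended behaviour before relying on it.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: converts argv[2] into a stride.
 * Returns 1 and stores the value on success, 0 on invalid input. */
int parse_stride(const char *text, unsigned char *stride)
{
    char *end = NULL;
    long value;

    if (text == NULL || *text == '\0')
        return 0;

    errno = 0;
    value = strtol(text, &end, 10);

    /* Reject trailing junk, overflow, and anything outside 1-255
     * (treating 0 as invalid is an assumption of this sketch). */
    if (errno != 0 || *end != '\0' || value < 1 || value > 255)
        return 0;

    *stride = (unsigned char) value;
    return 1;
}

/* Hypothetical helper: builds "<infile>.rle" into 'dst'.
 * Returns 1 if the name fit in 'size' bytes, 0 otherwise. */
int output_name(const char *infile, char *dst, size_t size)
{
    int n = snprintf(dst, size, "%s.rle", infile);

    return n > 0 && (size_t) n < size;
}
```

In main(), a missing argument (argc < 3) or a failed parse_stride() call would be the place to print the error message and exit with a non-zero status.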

DECODE program will take only one argument, the encoded .rle file:

./decode INFILE 
 argv[0] argv[1]

./decode sample0.txt.rle

The decoder should be able to read the header and find out the original filename. Decode will output a file with the name of the original file, without the .rle.

The decoder should be able to read the header and find out the version of the encoded file. Depending on the version it should decode the provided file accordingly, making it backward-compatible.

The decoder should be able to read the header and find out the stride and should decode the provided file accordingly.
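
A rough sketch of the decoder side follows, under the same assumptions as the header format above. The struct and function names are hypothetical, each run count is assumed to be a single raw byte followed by one stride-sized chunk, and a final chunk shorter than the stride is written out as read. Handling of older versions is not shown, since the rle0 header layout is not reproduced on this page.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the header fields the decoder needs. */
struct rle_header {
    unsigned char version;
    unsigned char stride;
    char name[256];        /* up to 255 characters plus NUL */
};

/* Reads the header from 'in'. Returns 1 if well formed, 0 otherwise. */
int read_header(FILE *in, struct rle_header *h)
{
    unsigned char fixed[12];   /* bytes 0-11 have a fixed layout */

    if (fread(fixed, 1, sizeof fixed, in) != sizeof fixed)
        return 0;
    if (memcmp(fixed, "rleX RLE", 8) != 0)   /* bytes 0-7: signature */
        return 0;

    h->version = fixed[9];
    h->stride  = fixed[10];

    /* Byte 11 says how many name bytes follow (no NUL in the file). */
    if (fread(h->name, 1, fixed[11], in) != fixed[11])
        return 0;
    h->name[fixed[11]] = '\0';

    return 1;
}

/* Expands count/chunk pairs from 'in' into 'out'. Returns 1 on success. */
int decode_body(FILE *in, FILE *out, unsigned char stride)
{
    unsigned char chunk[255];
    int count;

    if (stride == 0)
        return 0;

    while ((count = fgetc(in)) != EOF) {
        size_t got = fread(chunk, 1, stride, in);

        if (got == 0)          /* a count byte with no data after it */
            return 0;
        for (int i = 0; i < count; i++)
            if (fwrite(chunk, 1, got, out) != got)
                return 0;
    }

    return 1;
}
```

The decoder would call read_header() first, open h.name for writing, and then pass h.stride to decode_body(); a version check on h.version is where backward-compatible behaviour would branch.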

OUTPUT SPECIFICATIONS

Just as in rle0, the rle1 encoder should output the original file's length, the encoded file's length, and the compression ratio. The output format is not strict, as long as it shows the correct input file length, output file length, and compression ratio. The compression ratio can be calculated with the following equation: 'Output Length / Input Length'. Make sure the variable holding the compression ratio has type 'double' or 'float'. You may also want to look at the C output format specifiers for prettier output.

Example:

       Input File length: 100 
       Output File length: 40
       Compression ratio: .40 or 40%
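
A minimal sketch of that report, assuming the two lengths have already been counted as long values. The function names are made up for illustration. Casting to double before dividing matters: dividing two integers first would truncate 40 / 100 to 0.

```c
#include <stdio.h>

/* Ratio as defined above: output length divided by input length. */
double compression_ratio(long in_len, long out_len)
{
    if (in_len <= 0)
        return 0.0;   /* avoid dividing by zero on an empty input */

    return (double) out_len / (double) in_len;
}

/* Prints the three required values; the exact layout is not strict. */
void print_report(FILE *dest, long in_len, long out_len)
{
    double ratio = compression_ratio(in_len, out_len);

    fprintf(dest, "Input File length:  %ld\n", in_len);
    fprintf(dest, "Output File length: %ld\n", out_len);
    fprintf(dest, "Compression ratio:  %.2f or %.0f%%\n",
            ratio, ratio * 100.0);
}
```

Calling print_report(stdout, 100, 40) would report the same lengths and ratio as the example above.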

VERIFICATION

NOTE: Verification using ./check may not work. If this is the case, run manual checks: encode the sample using the stride found in the header (byte 10) of the sample output you are comparing against. The stride used is also in the file name.

To run the check file provided to you when you grabbed the project, run ./check (linux-based system, may vary on different OS).

You can also verify manually by encoding or decoding a file and checking that the result has the same md5sum as the corresponding provided file (with or without the .rle extension, depending on whether you encoded or decoded). To do this, run your encoder/decoder, then run md5sum on the output file. Again, this assumes a Linux-based system and may vary on a different OS.

 

SUBMISSION

To be successful in this project, the following criteria (or their equivalent) must be met:

  • Project must be submitted on time, by the deadline.
    • Late submissions will lose 33% credit per day, with the submission window closing on the 3rd day following the deadline.
  • All code must compile cleanly (no warnings or errors)
    • Compile with the -Wall and -std=gnu18 compiler flags
    • all requested functionality must conform to stated requirements (either on this document or in a comment banner in source code files themselves).
  • Executed programs must display in a manner similar to provided output
    • output formatted, where applicable, must match that of project requirements
  • Processing must be correct based on input given and output requested
  • Output, if applicable, must be correct based on values input
  • Code must be nicely and consistently indented
  • Code must be consistently written, to strive for readability from having a consistent style throughout
  • Code must be commented
    • Any “to be implemented” comments MUST be removed
      • these “to be implemented” comments, if still present at evaluation time, will result in points being deducted.
      • Sufficient comments explaining the point of provided logic MUST be present
  • No global variables (without instructor approval), no goto statements, no calling of main()!
  • Track/version the source code in your lab46 semester repository
  • Submit a copy of your source code to me using the submit tool (make submit on lab46 will do this) by the deadline.

Submit Tool Usage

Once you have completed work on the project and are ready to submit, do the following (assuming you have a program called uom0.c):

lab46:~/src/SEMESTER/DESIG/PROJECT$ make submit

You should get some sort of confirmation indicating successful submission if all went according to plan. If not, check for typos and/or location mismatches.

RUBRIC

I'll be evaluating the project based on the following criteria:

117:rle1:final tally of results (117/117)
*:rle1:used grabit to obtain project by the Sunday prior to duedate [13/13]
*:rle1:clean compile, no compiler messages [13/13]
*:rle1:implementation passes verification tests [26/26]
*:rle1:adequate modifications to code from template [26/26]
*:rle1:program operations conform to project specifications [26/26]
*:rle1:code tracked in lab46 semester repo [13/13]

Pertaining to the collaborative authoring of project documentation

  • each class member is to participate in the contribution of relevant information and formatting of the documentation
    • minimal member contributions consist of:
      • near the class average edits (a value of at least four productive edits)
      • near the class content change average (a value of at least 256 bytes (absolute value of data content change))
      • near the class content contribution average (a value of at least 1kiB)
      • no adding in one commit then later removing in its entirety for the sake of satisfying edit requirements
    • adding and formatting data in an organized fashion, aiming to create an informative and readable document that anyone in the class can reference
    • content contributions will be factored into a documentation coefficient, a value multiplied against your actual project submission to influence the end result:
      • no contributions, coefficient is 0.50
      • less than minimum contributions is 0.75
      • met minimum contribution threshold is 1.00

Additionally

  • Solutions not abiding by the spirit of the project will be subject to a 50% overall deduction
  • Solutions not utilizing descriptive why and how comments will be subject to a 25% overall deduction
  • Solutions not utilizing indentation to promote scope and clarity or otherwise maintaining consistency in code style and presentation will be subject to a 25% overall deduction
  • Solutions not organized and easy to read (assume a terminal at least 90 characters wide, 40 characters tall) are subject to a 25% overall deduction
haas/fall2024/discrete/projects/rle1.txt · Last modified: 2022/09/26 15:26 by 127.0.0.1