User Tools

Site Tools


notes:discrete:fall2022:projects:rle0

This is an old revision of the document!


BACKGROUND

Run Length Encoding (RLE) Data Compression Algorithm

Run-length encoding (RLE) algorithm is a lossless compression of runs of data, where repeated sequences of data are stored as a representation of a single data value and its count (how many times it repeats consecutively). RLE encoding is commonly used in JPEG, TIFF, and PDF files, to name a few examples.

For anybody interested in editing the wiki page, here is the dokuwiki user guide: https://www.dokuwiki.org/wiki:syntax#basic_text_formatting -Ash

ALGORITHM: RLE

The RLE encoding algorithm first takes a file that contains a run of data in the following way:

Original:

aaaabbcdefgggg

Suppose this data sequence is contained within a .txt file.

The RLE algorithm will encode this data by replacing the repeated characters with a count number and a single value.

4a2b1c1d1e1f5g

There are 4 a's (0x61), 2 b's(0x62), 1 c(0x63), 1 d(0x64), 1 e(0x65), 1 f (0x66), and 5 g's (0x67).

Encoded:

04 61 02 62 01 63 01 64 01 65 01 66 05 67
^  ^  ^  ^  ^  ^  ^  ^  ^  ^  ^  ^  ^  ^
4  a  2  b  1  c  1  d  1  e  1  f  5  g

Please reference the image below to find the hexadecimal value of the ASCII symbols:

SPECIFICATIONS

Our task is to ask questions on Discord or in class, document our findings on this wiki page, and implement the RLE encoding/decoding algorithms in our own encoder and decoder programs.

DATA HEADER SPECIFICATIONS

Header Format:

byte 0: 0x72
byte 1: 0x6c
byte 2: 0x65
byte 3: 0x58
byte 4: 0x20
byte 5: 0x52
byte 6: 0x4c
byte 7: 0x45
byte 8: 0x00 (reserved)
byte 9: 0x01 (version)
byte 10: 0x01 (stride value)
byte 11: 0xArgv source file name length (not including NULL terminator)
byte 12: source file name + through byte 11 value minus 1

Info

Byte 8 - Reserved for a future project. Byte 9 - Version number, this week is version 1. Byte A - Width of comparison. 1 byte for this week's project. Byte B - 'bob' would be 3.

In the case of 'bob', byte C would be 0x62('b'), byte D would be 0x6F('o'), and byte E would be 0x62('b').

PROGRAM

encode program will take two arguments:

./encode INFILE  OUTFILE
 argv[0] argv[1] argv[2]

./encode sample0.txt sample0.txt.rle

decode program will take one argument:

./decode INFILE
 argv[0] arfv[1]

./decode sample0.txt.rle

OUTPUT SPECIFICATIONS

VERIFICATION

All programs should be able to both encode/decode with interoperability in mind such that person_a's decoder must be able to decode a file that has been encoded by person_b's encoder, and vise versa.

eg.

some_file = "Hello World"
person_a -> encode(some_file)
person_b -> decode(some_file)

print(some_file)

>> Hello World
notes/discrete/fall2022/projects/rle0.1662428047.txt.gz · Last modified: 2022/09/06 01:34 by abarbcal