Page haas:fall2017:discrete:projects:dcf0, current revision 2017/09/06 17:19 ([Decode], by wedge); previous revision 2017/08/20 23:17 ([Grabit Integration], by wedge).
======Project: RUN-LENGTH ENCODING - DATA COMPRESSION FUN (dcf0)======
  
=====Errata=====
Any changes that have been made.

  * Revision 0.1: Enhanced included 'check' script (20170901)
  * Revision 0.2: Filled out the "Verify Results" section (20170903)
  * Revision 0.3: Added a "Check Results" section (20170904)
  * Revision 0.4: Further enhanced 'check' script, and updated the project page's verify section to reflect improved functionality (20170904)
  * Revision 0.5: After some absolutely incredible IRC shenanigans in what I've come to call the "discrete IRC Labor Day power hours", some 1337 h4xxing was done, and it was discovered that the included in/sample2.bmp.rle file was actually incorrectly encoded. A seemingly working copy was eventually created, only for us to later discover that, while not incorrect, it was also not optimal. So, as of 8pm Monday, Sept 4th, a more optimal and correct in/sample2.bmp.rle file has been placed in the project directory. Please run 'make getdata' to share in the goodness. And thank you to everyone who showed up: THAT is why I do what I do. (20170904)
=====Objective=====
To apply your skills in implementing an encoding scheme that, in ideal circumstances, will lead to a smaller storage footprint.
  
Your program should:
  * obtain 2 parameters from the command-line (see **command-line arguments** section below):
    * argv[1]: name of the input file
    * argv[2]: name of the output file
    * the input file should exist, but you should do appropriate error checking and bail out if it cannot be accessed
    * for **encode**, the output file will be in RLE format (ideally with an "**.rle**" suffix appended to the end). This adds a certain universal aspect to how we'll go about naming things (**tar** and **gzip** do this too).
    * for **decode**, the input file should be an RL-encoded file.
      * be sure to perform appropriate error checking and bail out as needed.
  * implement the specified algorithm in both encoding and decoding forms.
    * please be sure to test it against varying types of data, to make sure it works no matter what you throw at it.
      * I mean it: don't just test it on some small ASCII example; be sure to test it against the full set of sample files in the **in/** directory.
  * calculate and display some statistics gleaned during the process.
    * for example, **encode** should display information on:
  * your RL-encoded data **MUST** be conformant to the project specifications described above.
    * you should be able to encode/decode a set of data with a 100% retrieval rate. No data should be lost.
    * remember, you are encoding/decoding **binary** data, NOT ASCII. Don't fall prey to your misconceptions.
  * **decode** should validate the header information (is it encoded in version 1? If not, complain to STDERR with "version mismatch!" and exit).
    * if the first 8 bytes of the header do not check out, error out with an "invalid data format detected! aborting process..." message to STDERR.
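Since the data must be treated as **binary**, it helps to work on byte buffers with explicit lengths rather than C strings. Below is a minimal sketch of a stride-1 encode/decode core under two assumptions: each run is stored as a (count, value) byte pair with the count first (the layout visible in the **xxd** dump of **out/sample0.txt.rle** on this page), and runs longer than 255 bytes are split so that each count fits in one byte. Header handling and file I/O are left out, **rle_encode()** and **rle_decode()** are hypothetical names, and treating a zero count as malformed is this sketch's choice; check every detail against the project's format specification.

<code c>
#include <stdio.h>
#include <stddef.h>

/* Encode len bytes of src as (count, value) pairs into dst.
 * dst must have room for 2 * len bytes (worst case: no runs at all).
 * Returns the number of bytes written. Works on raw bytes, so 0x00
 * and 0x0A are treated like any other value (no strlen, no '\0'). */
size_t rle_encode(const unsigned char *src, size_t len, unsigned char *dst)
{
    size_t in = 0, out = 0;

    while (in < len)
    {
        unsigned char value = src[in];
        size_t run = 1;

        /* extend the run, but never past what one count byte can hold */
        while (in + run < len && src[in + run] == value && run < 255)
            run++;

        dst[out++] = (unsigned char) run;
        dst[out++] = value;
        in += run;
    }
    return out;
}

/* Expand (count, value) pairs from src into dst (capacity cap).
 * Returns bytes written, or (size_t) -1 on malformed input:
 * an odd length, a zero count, or output that would overflow dst. */
size_t rle_decode(const unsigned char *src, size_t len,
                  unsigned char *dst, size_t cap)
{
    size_t out = 0;

    if (len % 2 != 0)
        return (size_t) -1;

    for (size_t i = 0; i < len; i += 2)
    {
        unsigned char count = src[i];
        unsigned char value = src[i + 1];

        if (count == 0 || out + count > cap)
            return (size_t) -1;

        for (unsigned char k = 0; k < count; k++)
            dst[out++] = value;
    }
    return out;
}

int main(void)
{
    /* binary data on purpose: starts with 0x00 bytes a string API would stop at */
    const unsigned char data[] = { 0x00, 0x00, 0x00, 'a', 'b', 'b', 0x0A };
    unsigned char enc[2 * sizeof data];
    unsigned char dec[sizeof data];

    size_t n = rle_encode(data, sizeof data, enc);
    size_t m = rle_decode(enc, n, dec, sizeof dec);

    printf("raw: %zu bytes, encoded: %zu bytes, decoded: %zu bytes\n",
           sizeof data, n, m);
    return 0;
}
</code>

Running it prints **raw: 7 bytes, encoded: 8 bytes, decoded: 7 bytes**, a reminder that data without long runs can actually grow under RLE.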
  
<code c>
    if (argc < 3)  // if fewer than 3 arguments (program name, input file, output file) have been provided
    {
        fprintf(stderr, "Not enough arguments!\n");
        exit(1);   // bail out; exit() is declared in stdlib.h
    }
</code>
  
<cli>
lab46:~/src/discrete/dcf0$ ./encode in/sample0.txt out/sample0.txt.rle
 input name length: 14 bytes
    input filename: in/sample0.txt
...
      stride value: 1 byte
           read in: 82 bytes
         wrote out: 62 bytes
  compression rate: 24.39%
lab46:~/src/discrete/dcf0$ 
</cli>
  
<cli>
lab46:~/src/discrete/dcf0$ mkdir tmp
lab46:~/src/discrete/dcf0$ ./decode out/sample0.txt.rle tmp/sample0.txt
 input name length: 19 bytes
    input filename: out/sample0.txt.rle
output name length: 15 bytes
   output filename: tmp/sample0.txt
       header text: dcfX RLE v1
      stride value: 1 byte
           read in: 62 bytes
         wrote out: 82 bytes
    inflation rate: 32.26%
lab46:~/src/discrete/dcf0$ 
</cli>
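The two percentages in the sample runs are consistent with measuring the size change relative to the number of bytes read in: (82 - 62) / 82 ≈ 24.39% for **encode**, and (82 - 62) / 62 ≈ 32.26% for **decode**. A minimal sketch of that arithmetic follows; the helper names are hypothetical, and reporting 0% for an empty input is an assumption of this sketch, not something the project specifies.

<code c>
#include <stdio.h>

/* Encode statistic: how much smaller the output is than the input, in %.
 * A negative result means the "compressed" file actually grew. */
double compression_rate(unsigned long read_in, unsigned long wrote_out)
{
    if (read_in == 0)          /* empty input: nothing to compare against */
        return 0.0;
    return 100.0 * ((double) read_in - (double) wrote_out) / (double) read_in;
}

/* Decode statistic: how much larger the output is than the input, in %. */
double inflation_rate(unsigned long read_in, unsigned long wrote_out)
{
    if (read_in == 0)
        return 0.0;
    return 100.0 * ((double) wrote_out - (double) read_in) / (double) read_in;
}

int main(void)
{
    /* byte counts from the sample encode and decode runs on sample0.txt */
    printf("  compression rate: %.2f%%\n", compression_rate(82, 62));
    printf("    inflation rate: %.2f%%\n", inflation_rate(62, 82));
    return 0;
}
</code>

This prints the same 24.39% and 32.26% figures shown above. Note that the two rates use different denominators, which is why they are not equal even though the same 20 bytes are gained or lost.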
  
=====Check Results=====

A good way to test that both encode and decode are working is to encode data, then immediately turn around and decode that same data. If the decoded file is identical to the original, pre-encoded file, you know things are working.

====diff compare====
A quick way to check if two files are identical is to run the **diff(1)** command on them. Assuming the original file is in **in/sample0.txt**, and the decoded version (which should be the same thing) is in **tmp/sample0.txt**:

<cli>
lab46:~/src/discrete/dcf0$ diff in/sample0.txt tmp/sample0.txt
lab46:~/src/discrete/dcf0$ 
</cli>

Just getting your prompt back (with no output) indicates that no differences were found.

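If you want the same check from inside C (for example, in a small test driver of your own), comparing two streams byte by byte is enough. This is a minimal sketch in the spirit of **cmp(1)**, not part of the project requirements; **streams_identical()** is a hypothetical helper name. Both files are opened in binary mode and read with **fgetc()**, so 0x00 bytes and newlines get no special treatment, and a read error is treated like end of file.

<code c>
#include <stdio.h>

/* Return 1 if the two streams hold identical bytes, 0 otherwise. */
int streams_identical(FILE *a, FILE *b)
{
    int ca, cb;

    do
    {
        ca = fgetc(a);
        cb = fgetc(b);
        if (ca != cb)
            return 0;      /* differing byte, or one file ended early */
    } while (ca != EOF);

    return 1;              /* both streams reached EOF together */
}

int main(int argc, char **argv)
{
    if (argc < 3)
    {
        fprintf(stderr, "usage: %s FILE1 FILE2\n", argv[0]);
        return 2;
    }

    FILE *a = fopen(argv[1], "rb");
    FILE *b = fopen(argv[2], "rb");
    if (a == NULL || b == NULL)
    {
        fprintf(stderr, "could not open input files\n");
        if (a != NULL) fclose(a);
        if (b != NULL) fclose(b);
        return 2;
    }

    int same = streams_identical(a, b);
    printf("%s\n", same ? "identical" : "DIFFERENT");
    fclose(a);
    fclose(b);
    return same ? 0 : 1;
}
</code>

Run against **in/sample0.txt** and **tmp/sample0.txt**, it should print "identical" and exit with status 0 when the round trip worked.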
====MD5sum compare====
If you'd like to be REALLY sure, generate MD5sum hashes and compare:

<cli>
lab46:~/src/discrete/dcf0$ md5sum in/sample0.txt tmp/sample0.txt
10f9bc85023dcf37be2b04638cb45ee2  in/sample0.txt
10f9bc85023dcf37be2b04638cb45ee2  tmp/sample0.txt
lab46:~/src/discrete/dcf0$ 
</cli>

As you can see, both hashes match (MD5sum hashes are computed from the file contents, NOT the name/location).

====Hex Dump/Visualization====
You may want to check and see exactly what your program is generating.

This can be done by performing a hex dump (or visualization) of the raw data in the output file.

The tool I'd recommend for quick viewing is **xxd(1)**; please see the following example:

<cli>
lab46:~/src/discrete/dcf0$ xxd out/sample0.txt.rle
0000000: 6463 6658 2052 4c45 0001 010e 696e 2f73  dcfX RLE....in/s
0000010: 616d 706c 6530 2e74 7874 0161 0262 0363  ample0.txt.a.b.c
0000020: 0464 0565 0666 0767 0868 0969 086a 076b  .d.e.f.g.h.i.j.k
0000030: 066c 056d 046e 036f 0270 0171 010a       .l.m.n.o.p.q..
lab46:~/src/discrete/dcf0$ 
</cli>

With this output, we can confirm, byte by byte, what has been placed in our encoded file. You'll see three fields:

  * leftmost: byte offset from the start of the file (in hex)
  * middle: the hex data, grouped two bytes at a time (shown in file order, i.e. big endian by default, so as you'd expect to read it)
  * rightmost: the ASCII representation of the middle data (non-printable bytes are shown as a **.**)

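Reading the dump above field by field suggests a header layout that **decode** can validate. The sketch below assumes that layout, which is inferred from this one dump rather than taken from the header specification: bytes 0 to 7 hold the "dcfX RLE" signature, bytes 8 and 9 a big-endian version number (0x0001), byte 10 the stride, byte 11 the length of the original filename, followed by the filename itself and then the (count, value) pairs. The struct and function names are hypothetical; the two error messages are the ones the project requires on STDERR.

<code c>
#include <stdio.h>
#include <string.h>

/* Header fields, per the layout ASSUMED from the sample xxd dump. */
struct rle_header
{
    unsigned version;
    unsigned stride;
    unsigned name_len;
    char     name[256];   /* name_len fits in one byte, so <= 255 chars */
    size_t   data_offset; /* where the (count, value) pairs begin */
};

/* Returns 0 on success, -1 on a bad signature or truncated header,
 * -2 on a version mismatch. Error messages go to stderr. */
int parse_header(const unsigned char *buf, size_t len, struct rle_header *h)
{
    if (len < 12 || memcmp(buf, "dcfX RLE", 8) != 0)
    {
        fprintf(stderr, "invalid data format detected! aborting process...\n");
        return -1;
    }

    h->version = ((unsigned) buf[8] << 8) | buf[9];   /* big-endian */
    if (h->version != 1)
    {
        fprintf(stderr, "version mismatch!\n");
        return -2;
    }

    h->stride   = buf[10];
    h->name_len = buf[11];
    if (len < 12 + (size_t) h->name_len)             /* name cut short */
    {
        fprintf(stderr, "invalid data format detected! aborting process...\n");
        return -1;
    }

    memcpy(h->name, buf + 12, h->name_len);          /* not NUL-terminated on disk */
    h->name[h->name_len] = '\0';
    h->data_offset = 12 + h->name_len;
    return 0;
}

int main(void)
{
    /* first 26 bytes of out/sample0.txt.rle, copied from the xxd dump */
    const unsigned char hdr[] = {
        0x64, 0x63, 0x66, 0x58, 0x20, 0x52, 0x4c, 0x45,   /* "dcfX RLE" */
        0x00, 0x01, 0x01, 0x0e,                           /* version, stride, name length */
        'i', 'n', '/', 's', 'a', 'm', 'p', 'l', 'e', '0', '.', 't', 'x', 't'
    };
    struct rle_header h;

    if (parse_header(hdr, sizeof hdr, &h) == 0)
        printf("version %u, stride %u, name \"%s\" (%u bytes), data at offset %zu\n",
               h.version, h.stride, h.name, h.name_len, h.data_offset);
    return 0;
}
</code>

It prints **version 1, stride 1, name "in/sample0.txt" (14 bytes), data at offset 26**. The 36 bytes after offset 26 are the 18 (count, value) pairs, which together with the 26-byte header account for the 62 bytes reported by **encode**.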
=====Verify Results=====
If you'd like to verify your implementations, a **check** script is included when you use the **grabit** tool to obtain the skeleton files and data.

**NOTE:** As this script has been updated since the project was first released, you may want to manually obtain a copy to ensure you have the latest and greatest:

<cli>
lab46:~/src/discrete/dcf0$ cp /var/public/fall2017/discrete/dcf0/check .
</cli>

To run it, you need functioning **encode** and **decode** programs (although it does its best otherwise).

It runs through four separate tests, storing the results in a corresponding **o#/** directory (and, for the two-step tests, intermediate results in a corresponding **m#/** directory):

  * test 0: takes the raw data files in **in/** and encodes them (**o0/**)
  * test 1: takes the pre-encoded data files in **in/** and decodes them (**o1/**)
  * test 2: takes the raw data files in **in/**, encodes them (**m2/**), then decodes them (**o2/**)
  * test 3: takes the pre-encoded data files in **in/**, decodes them (**m3/**), then encodes them (**o3/**)

How it works:

  - Depending on the test, a file in the **in/** directory is encoded or decoded.
    * if single-step, the result is placed in the **o#/** directory
    * if multi-step, the intermediate result is placed in the **m#/** directory, and the second operation puts its result into **o#/**
  - A checksum is taken of the corresponding reference file in **in/** (for example, **in/sample0.txt.rle** when checking **o0/sample0.txt.rle**).
  - Another checksum is taken of the new file in **o#/**.
  - The checksums are compared. If they match, "OK" is displayed; if they do not match, a corresponding "FAIL" message appears.

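The checksum-and-compare steps above can also be sketched in C. This sketch swaps MD5 for the much simpler 32-bit FNV-1a hash, which is good at catching accidental differences but is not cryptographic; the function names are hypothetical, and it is not meant to reproduce how the provided **check** script works internally, only the "checksum both, compare, print OK/FAIL" idea.

<code c>
#include <stdio.h>
#include <stdint.h>

/* 32-bit FNV-1a hash of an entire stream, read byte by byte.
 * A simple stand-in for md5sum(1); NOT a cryptographic hash. */
uint32_t fnv1a_stream(FILE *fp)
{
    uint32_t hash = 2166136261u;      /* FNV-1a 32-bit offset basis */
    int c;

    while ((c = fgetc(fp)) != EOF)
    {
        hash ^= (uint32_t) c;         /* fgetc() yields 0..255 here */
        hash *= 16777619u;            /* FNV 32-bit prime, wraps mod 2^32 */
    }
    return hash;
}

/* Hash a file by name; returns 0 and stores the hash on success. */
int hash_file(const char *path, uint32_t *out)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;
    *out = fnv1a_stream(fp);
    fclose(fp);
    return 0;
}

int main(int argc, char **argv)
{
    uint32_t h1, h2;

    if (argc < 3)
    {
        fprintf(stderr, "usage: %s REFERENCE RESULT\n", argv[0]);
        return 2;
    }
    if (hash_file(argv[1], &h1) != 0 || hash_file(argv[2], &h2) != 0)
    {
        fprintf(stderr, "could not read one of the files\n");
        return 2;
    }

    /* report in the same style as the check script's status lines */
    printf("%s -> %s: %s\n", argv[1], argv[2],
           h1 == h2 ? "OK" : "FAIL: checksums do not match");
    return h1 == h2 ? 0 : 1;
}
</code>

As with MD5, only the file contents feed the hash, so a file and its correctly round-tripped copy produce the same value regardless of name or directory.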
====Successful operation====
If all goes according to plan, you'll see "OK" status messages displayed.

<cli>
lab46:~/src/discrete/dcf0$ ./check
=================================================
= PHASE 0: Raw -> Encode data verification test =
=================================================
in/sample0.txt -> o0/sample0.txt.rle: OK
in/sample1.txt -> o0/sample1.txt.rle: OK
in/sample2.bmp -> o0/sample2.bmp.rle: OK
in/sample3.wav -> o0/sample3.wav.rle: OK

=================================================
= PHASE 1: Decode -> Raw data verification test =
=================================================
in/sample0.txt.rle -> o1/sample0.txt: OK
in/sample1.txt.rle -> o1/sample1.txt: OK
in/sample2.bmp.rle -> o1/sample2.bmp: OK
in/sample3.wav.rle -> o1/sample3.wav: OK

================================================
= PHASE 2: Raw -> Encode -> Decode -> Raw test =
================================================
in/sample0.txt -> m2/sample0.txt.rle -> o2/sample0.txt: OK
in/sample1.txt -> m2/sample1.txt.rle -> o2/sample1.txt: OK
in/sample2.bmp -> m2/sample2.bmp.rle -> o2/sample2.bmp: OK
in/sample3.wav -> m2/sample3.wav.rle -> o2/sample3.wav: OK

=============================================
= PHASE 3: Decode -> Raw -> Encode Raw test =
=============================================
in/sample0.txt.rle -> m3/sample0.txt -> o3/sample0.txt.rle: OK
in/sample1.txt.rle -> m3/sample1.txt -> o3/sample1.txt.rle: OK
in/sample2.bmp.rle -> m3/sample2.bmp -> o3/sample2.bmp.rle: OK
in/sample3.wav.rle -> m3/sample3.wav -> o3/sample3.wav.rle: OK

</cli>

====Unsuccessful operation====
Should something not work correctly, you'll see a "FAIL" message:

<cli>
lab46:~/src/discrete/dcf0$ ./check
=================================================
= PHASE 0: Raw -> Encode data verification test =
=================================================
in/sample0.txt -> o0/sample0.txt.rle: OK
in/sample1.txt -> o0/sample1.txt.rle: OK
in/sample2.bmp -> o0/sample2.bmp.rle: FAIL: checksums do not match
in/sample3.wav -> o0/sample3.wav.rle: OK

=================================================
= PHASE 1: Decode -> Raw data verification test =
=================================================
in/sample0.txt.rle -> o1/sample0.txt: OK
in/sample1.txt.rle -> o1/sample1.txt: OK
in/sample2.bmp.rle -> o1/sample2.bmp: FAIL: checksums do not match
in/sample3.wav.rle -> o1/sample3.wav: OK

================================================
= PHASE 2: Raw -> Encode -> Decode -> Raw test =
================================================
in/sample0.txt -> m2/sample0.txt.rle -> o2/sample0.txt: OK
in/sample1.txt -> m2/sample1.txt.rle -> o2/sample1.txt: OK
in/sample2.bmp -> m2/sample2.bmp.rle -> o2/sample2.bmp: FAIL: checksums do not match
in/sample3.wav -> m2/sample3.wav.rle -> o2/sample3.wav: OK

=============================================
= PHASE 3: Decode -> Raw -> Encode Raw test =
=============================================
in/sample0.txt.rle -> m3/sample0.txt -> o3/sample0.txt.rle: OK
in/sample1.txt.rle -> m3/sample1.txt -> o3/sample1.txt.rle: OK
in/sample2.bmp.rle -> m3/sample2.bmp -> o3/sample2.bmp.rle: FAIL: checksums do not match
in/sample3.wav.rle -> m3/sample3.wav -> o3/sample3.wav.rle: OK

</cli>

====Incomplete operation====
Should something not work at all (like a missing decode binary, or one that failed to compile), you'll see a "MISSING" message:

<cli>
lab46:~/src/discrete/dcf0$ ./check
...

=================================================
= PHASE 1: Decode -> Raw data verification test =
=================================================
in/sample0.txt.rle -> o1/sample0.txt: MISSING: decode
in/sample1.txt.rle -> o1/sample1.txt: MISSING: decode
in/sample2.bmp.rle -> o1/sample2.bmp: MISSING: decode
in/sample3.wav.rle -> o1/sample3.wav: MISSING: decode
...
</cli>
  
=====Submission=====
haas/fall2017/discrete/projects/dcf0.1503271056.txt.gz · Last modified: 2017/08/20 23:17 by wedge