haas:fall2017:discrete:projects:dcf1 [2017/10/09 14:21] (current) wedge (previous revision: [2017/09/08 11:31], "[Other specification details]", wedge)
<WRAP><fs 150%>CSCS2330 Discrete Structures</fs></WRAP>
</WRAP>
  
======Project: RUN-LENGTH ENCODING - DATA COMPRESSION FUN (dcf1)======

=====Errata=====
This section lists any changes that have been made to the project since its release.

  * Revision 0.1: Updated the dcfX v2 spec and added some additional implementation constraints (20170907)
  * Revision 0.2: Finalized project data files, adapted the included 'check' script for dcf1 (20170909)
  * Revision 0.3: Updated the check script so it no longer gives false negatives. Run **make getdata** to grab the updated copy (20170921)
  
=====Objective=====
====Header====
It is actually **identical** to the specifications of last week, save for four changes:
  - we're no longer hard-coding the **stride** value to 1 (byte 10), but instead obtaining it from the command-line (argv[3]); any value from 1 to 255 (inclusive) is valid.
  - we're placing a 2 in the version byte (byte 9)
  - the embedded source file name will now be stripped of any path (i.e., "in/sample0.txt" should now be stored as just "sample0.txt")
  - the destination argument (argv[2]) is now merely a path, NOT a path+filename (i.e., "out/sample0.txt.rle" should now just be "out")
    * the destination file is a combination of the destination path + source filename + ".rle" extension (for encode).
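Taken together, the argument-handling changes above might look something like the following. This is a minimal sketch, not the reference solution. parse_stride(), strip_path(), and build_output_name() are hypothetical helper names. The sketch assumes '/' as the path separator and joins the destination path and filename with a single '/', as in the sample output "out/sample3.txt.rle".

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert argv[3] into a stride value.
 * Returns 0 on success, -1 unless arg is a whole number from 1 to 255. */
int parse_stride(const char *arg, int *stride)
{
    char *end = NULL;
    long value = strtol(arg, &end, 10);

    /* reject empty input, trailing junk, and out-of-range values */
    if (end == arg || *end != '\0' || value < 1 || value > 255)
        return -1;

    *stride = (int) value;
    return 0;
}

/* Hypothetical helper: return the part of path after the last '/',
 * so "in/sample0.txt" yields "sample0.txt" (assumes '/' separators). */
const char *strip_path(const char *path)
{
    const char *slash = strrchr(path, '/');

    return (slash != NULL) ? slash + 1 : path;
}

/* Hypothetical helper: destination path + "/" + source filename + ".rle".
 * Returns 0 on success, -1 if the result would not fit in out. */
int build_output_name(const char *dest_dir, const char *name,
                      char *out, size_t size)
{
    int len = snprintf(out, size, "%s/%s.rle", dest_dir, name);

    return (len < 0 || (size_t) len >= size) ? -1 : 0;
}
```

In encode, one would call parse_stride(argv[3], &stride), then build_output_name(argv[2], strip_path(argv[1]), ...). The strlen() of the stripped name gives the embedded name length (11 bytes for "sample3.txt").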

Specifically for **decode**, the source filename will be retrieved from the post-header information at the start of the encoded file.
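On the decode side, reading the header and the embedded filename could be sketched as below. The byte layout is an assumption drawn from the **xxd** dump in the Check Results section: bytes 0-7 hold "dcfX RLE", byte 9 the version, byte 10 the stride, and byte 11 the length of the embedded name that immediately follows the 12-byte header. read_header() is a hypothetical name.

```c
#include <stdio.h>
#include <string.h>

#define HEADER_SIZE 12   /* fixed-size header at the start of every .rle file */

/* Hypothetical helper: read the header and the embedded filename that follows.
 * Layout ASSUMED from the sample hex dump: bytes 0-7 = "dcfX RLE",
 * byte 9 = version, byte 10 = stride, byte 11 = embedded name length.
 * Returns 0 on success, -1 on a short read, wrong magic, or too-small buffer. */
int read_header(FILE *in, int *version, int *stride, char *name, size_t size)
{
    unsigned char header[HEADER_SIZE];
    size_t name_len;

    if (fread(header, 1, HEADER_SIZE, in) != HEADER_SIZE)
        return -1;
    if (memcmp(header, "dcfX RLE", 8) != 0)
        return -1;

    *version = header[9];
    *stride = header[10];
    name_len = header[11];

    /* the stored name has no NUL terminator, so leave room for one */
    if (name_len + 1 > size)
        return -1;
    if (fread(name, 1, name_len, in) != name_len)
        return -1;
    name[name_len] = '\0';

    return 0;
}
```

After a successful call, the stream is positioned at the first encoded record, ready for the data to be decoded.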
  
Every RL-encoded file will start with the following 12-byte header:
    * in fact, I'd recommend avoiding things like **fseek(3)** and **ftell(3)** altogether (or anything that artificially adjusts the current file position); your solutions will benefit from the resulting simplicity (but again, that simplicity will only come with an understanding of the process).
  * NO solutions centrally based on knowledge of the file's length. Do not compute the file length and then encode/decode (with the encode/decode process being somehow based on that length). You can (hint: and should) gradually accumulate the file lengths via the process, but the process itself should in no way be based on some fixed, known length. Again, this is further preparing us for letting the computer take control of what it is good at, and freeing us to focus on the conceptual crafting of solutions.
  * when computing compression/inflation rates, only count the DATA; omit the header and filename information. However, you'll also want to keep track of the total bytes read/written in order to display such statistics in the info table (see examples).
    * I will be looking for the output prompts, but not necessarily the precise values (for example, if your compression/inflation equation yields different results than mine, that won't count against you, provided you have at least tried to compute something reasonable).
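One reasonable equation treats the rate as the fraction of DATA bytes saved. The total written out is then the 12-byte header plus the embedded name plus the data. This formula is an assumption, but it reproduces the 4.88% and 26.78% figures in the encode examples. The function names are hypothetical.

```c
/* Hypothetical helpers for the statistics table. ASSUMED formula:
 * rate = (data in - data out) / data in * 100, counting DATA bytes only. */
double compression_rate(unsigned long data_in, unsigned long data_out)
{
    if (data_in == 0)
        return 0.0;     /* empty input: nothing to compress */

    return ((double) data_in - (double) data_out) / (double) data_in * 100.0;
}

/* Total bytes written = 12-byte header + embedded filename + encoded data. */
unsigned long total_written(unsigned long name_len, unsigned long data_out)
{
    return 12UL + name_len + data_out;
}
```

For in/sample3.txt this gives (82 - 78) / 82 * 100, printed with "%.2f%%" as 4.88%, and 12 + 11 + 78 = 101 total bytes written out. A negative rate simply means the data grew rather than shrank.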
=====Grabit Integration=====
For those familiar with the **grabit** tool on lab46, I have made some skeleton files and a custom **Makefile** available for this project.
‘/var/public/SEMESTER/discrete/dcf1/encode.c’ -> ‘/home/USERNAME/src/discrete/dcf1/encode.c’
‘/var/public/SEMESTER/discrete/dcf1/decode.c’ -> ‘/home/USERNAME/src/discrete/dcf1/decode.c’
...
make: Leaving directory '/var/public/SEMESTER/discrete/dcf1'
lab46:~/src/discrete$ cd dcf1
  
<cli>
lab46:~/src/discrete/dcf1$ ./encode in/sample3.txt out 3
   input name length: 14 bytes
      input filename: in/sample3.txt
embedded name length: 11 bytes
  embedded file name: sample3.txt
  output name length: 19 bytes
     output filename: out/sample3.txt.rle
        stride value: 3 bytes
             read in: 82 bytes
    data written out: 78 bytes
   total written out: 101 bytes
    compression rate: 4.88%
lab46:~/src/discrete/dcf1$ 
</cli>
  
Similarly, if we encode the **sample2.bmp** data file from dcf0 with the right stride, we can actually achieve a notable amount of compression (unlike our dcf0 results, where the stride was fixed at 1 byte):

<cli>
lab46:~/src/discrete/dcf1$ ./encode ../dcf0/in/sample2.bmp out 37
   input name length: 22 bytes
      input filename: ../dcf0/in/sample2.bmp
embedded name length: 11 bytes
  embedded file name: sample2.bmp
  output name length: 19 bytes
     output filename: out/sample2.bmp.rle
        stride value: 37 bytes
             read in: 250934 bytes
    data written out: 183730 bytes
   total written out: 183753 bytes
    compression rate: 26.78%
lab46:~/src/discrete/dcf1$ 
</cli>

With various formats, you'll likely want to play with the stride in order to find better compression results.
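The encoding pass itself can be done as a single streaming loop that never needs the file's length, in keeping with the constraints listed earlier. Below is a minimal sketch. The record layout is inferred from the hex dump further down: a count byte, then one copy of a stride-sized chunk, with a shorter final chunk written as-is (the trailing "01 0a"). Capping a run at 255, so the count fits in one byte, is an assumption. rle_encode() is a hypothetical name.

```c
#include <stdio.h>
#include <string.h>

#define MAX_RUN 255   /* ASSUMED cap: the largest count a single byte can hold */

/* Hypothetical sketch: encode everything left in `in`, writing records of
 * [count byte][one copy of the chunk] to `out`. Chunks are `stride` bytes,
 * except a shorter final chunk at end-of-file, which is written as-is.
 * Stores the bytes read in *bytes_read; returns the DATA bytes written. */
unsigned long rle_encode(FILE *in, FILE *out, int stride, unsigned long *bytes_read)
{
    unsigned char current[255], next[255];
    size_t cur_len, next_len;
    unsigned int count;
    unsigned long written = 0;

    *bytes_read = 0;
    if (stride < 1 || stride > 255)
        return 0;                       /* invalid stride: encode nothing */

    cur_len = fread(current, 1, (size_t) stride, in);
    while (cur_len > 0) {
        *bytes_read += cur_len;
        count = 1;

        /* absorb following chunks for as long as they repeat this one */
        next_len = fread(next, 1, (size_t) stride, in);
        while (next_len == cur_len && count < MAX_RUN &&
               memcmp(next, current, cur_len) == 0) {
            *bytes_read += next_len;
            count++;
            next_len = fread(next, 1, (size_t) stride, in);
        }

        /* emit the record: the count, then the chunk once */
        fputc((int) count, out);
        fwrite(current, 1, cur_len, out);
        written += 1 + cur_len;

        /* whatever broke the run (if anything) starts the next one */
        memcpy(current, next, next_len);
        cur_len = next_len;
    }

    return written;
}
```

If the inferred layout is right, feeding it the 82-byte in/sample3.txt with a stride of 3 should reproduce the 20 records (78 data bytes) visible in that dump.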
====Decode====

<cli>
lab46:~/src/discrete/dcf1$ ./decode in/sample0.txt.rle out
    input filename: in/sample0.txt.rle
output name length: 11 bytes
   output filename: sample5.txt
</cli>
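Decoding reverses the process. After the header and embedded name, it reads a count byte, reads up to one stride's worth of data, and writes that chunk out count times, repeating until end-of-file. Below is a minimal sketch. It assumes the layout suggested by the hex dump in the Check Results section: a count byte, then a stride-sized chunk, with a possibly shorter final chunk. rle_decode() is a hypothetical name.

```c
#include <stdio.h>

/* Hypothetical sketch: expand [count byte][chunk] records from `in`, which is
 * ASSUMED to be positioned just past the header and embedded filename.
 * Each chunk is `stride` bytes, except possibly the last, which ends at EOF.
 * Returns the number of bytes written, or -1 on a malformed record. */
long rle_decode(FILE *in, FILE *out, int stride)
{
    unsigned char chunk[255];
    int count;
    size_t len;
    long written = 0;

    if (stride < 1 || stride > 255)
        return -1;

    /* fgetc() returns EOF once no further record begins */
    while ((count = fgetc(in)) != EOF) {
        len = fread(chunk, 1, (size_t) stride, in);
        if (len == 0)
            return -1;          /* a count byte with no data after it */

        /* write the chunk out `count` times */
        for (int i = 0; i < count; i++)
            fwrite(chunk, 1, len, out);
        written += (long) count * (long) len;
    }

    return written;
}
```

Like the encode side, this accumulates its byte counts as it goes, so the inflation statistics come out of the loop without ever asking for the file's length.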
=====Check Results=====

A good way to test that both encode and decode are working is to encode data, then immediately turn around and decode that same data. If the decoded file is in the same state as the original, pre-encoded file, you know things are working.
  
====diff compare====
A quick way to check whether two files are identical is to run the **diff(1)** command on them. For example, assuming the original file is in **in/sample1.txt** and the decoded version (which should be the same thing) is in **tmp/sample1.txt**:

<cli>
lab46:~/src/discrete/dcf1$ diff in/sample1.txt tmp/sample1.txt
lab46:~/src/discrete/dcf1$ 
</cli>

If you simply get your prompt back, no differences were found.
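If you'd rather perform the same check from C (for example, in a test harness of your own), a byte-for-byte comparison is enough, roughly what **cmp(1)** does. same_contents() is a hypothetical helper. In the spirit of the project constraints, it reads both files in lockstep rather than relying on their lengths.

```c
#include <stdio.h>

/* Hypothetical helper: return 1 if both streams hold exactly the same bytes,
 * 0 otherwise. Reads the two files in lockstep, so no length is ever needed. */
int same_contents(FILE *a, FILE *b)
{
    int ca, cb;

    do {
        ca = fgetc(a);
        cb = fgetc(b);
        if (ca != cb)
            return 0;       /* a differing byte, or one file ended early */
    } while (ca != EOF);

    return 1;               /* both reached EOF together */
}
```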

====MD5sum compare====
If you'd like to be REALLY sure, generate MD5sum hashes and compare them:

<cli>
lab46:~/src/discrete/dcf1$ md5sum in/sample1.txt tmp/sample1.txt
10f9bc85023dcf37be2b04638cb45ee2  in/sample1.txt
10f9bc85023dcf37be2b04638cb45ee2  tmp/sample1.txt
lab46:~/src/discrete/dcf1$ 
</cli>

As you can see, both hashes match (the MD5sum hashes are computed from the file contents, NOT the name/location).

====Hex Dump/Visualization====
You may want to check exactly what your program is generating.

This can be done by performing a hex dump (or visualization) of the raw data in the output file.

The tool I'd recommend for quick viewing is **xxd(1)**; please see the following example:

<cli>
lab46:~/src/discrete/dcf1$ xxd out/sample3.txt.rle
0000000: 6463 6658 2052 4c45 0002 030b 7361 6d70  dcfX RLE....samp
0000010: 6c65 332e 7478 7401 6162 6201 6363 6301  le3.txt.abb.ccc.
0000020: 6464 6401 6465 6501 6565 6502 6666 6602  ddd.dee.eee.fff.
0000030: 6767 6701 6768 6802 6868 6803 6969 6902  ggg.ghh.hhh.iii.
0000040: 6a6a 6a01 6a6a 6b02 6b6b 6b02 6c6c 6c01  jjj.jjk.kkk.lll.
0000050: 6d6d 6d01 6d6d 6e01 6e6e 6e01 6f6f 6f01  mmm.mmn.nnn.ooo.
0000060: 7070 7101 0a                             ppq..
lab46:~/src/discrete/dcf1$ 
</cli>

With this output, we can confirm, byte-by-byte, what has been placed in our encoded file. You'll see three fields:

  * leftmost: byte offset (from the start of the file)
  * middle: hex data (in 2-byte groups; big-endian by default, i.e. in file order, so it reads as you'd expect)
  * rightmost: the ASCII-ized representation of the middle data (non-printable bytes appear as '.')
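To see how those three fields are assembled, here is a minimal sketch that formats one line in the same layout as the dump above: a 7-digit offset, eight 2-byte groups, two spaces, then the ASCII column. hex_line() is hypothetical. Newer xxd versions may print a wider offset, so treat the exact widths as an assumption.

```c
#include <stdio.h>

/* Hypothetical helper: format up to 16 bytes as one line in the layout of the
 * dump above: 7-digit hex offset, eight 2-byte groups, two spaces, then the
 * ASCII column ('.' for non-printable bytes). `out` needs at least 67 chars. */
void hex_line(unsigned long offset, const unsigned char *buf, size_t n, char *out)
{
    size_t i;
    int pos = sprintf(out, "%07lx:", offset);

    if (n > 16)
        n = 16;                              /* one line covers 16 bytes */

    for (i = 0; i < 16; i++) {
        if (i % 2 == 0)
            out[pos++] = ' ';                /* a space opens each 2-byte group */
        if (i < n)
            pos += sprintf(out + pos, "%02x", (unsigned int) buf[i]);
        else
            pos += sprintf(out + pos, "  "); /* pad a short final line */
    }

    pos += sprintf(out + pos, "  ");
    for (i = 0; i < n; i++)
        out[pos++] = (buf[i] >= 0x20 && buf[i] <= 0x7e) ? (char) buf[i] : '.';
    out[pos] = '\0';
}
```

Given the first 16 bytes of out/sample3.txt.rle, this produces the first line of the dump above, and the padding keeps the ASCII column aligned on the short last line.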

=====Verify Results=====
If you'd like to verify your implementations, there is a **check** script included when you use the **grabit** tool to obtain the skeleton files and data.

To run it, you need functioning **encode** and **decode** programs (although it does its best otherwise).

It runs through four separate tests, storing the results in a corresponding **o#/** directory (and, where applicable, intermediate results in a corresponding **m#/** directory):

  * test 0: takes the raw data files in **in/** and encodes them (**o0/**)
  * test 1: takes pre-encoded data files in **in/** and decodes them (**o1/**)
  * test 2: takes the raw data files in **in/**, encodes them (**m2/**), then decodes them (**o2/**)
  * test 3: takes pre-encoded data files in **in/**, decodes them (**m3/**), then encodes them (**o3/**)

How it works:

  - depending on the test, a file in the **in/** directory is encoded or decoded.
    * if single-step, the result is placed in the **o#/** directory
    * if multi-step, the intermediate result is placed in the **m#/** directory, then the second operation puts its result into **o#/**
  - A checksum is taken of the original file in **in/**
  - Another checksum is taken of the new file in **o#/**
  - The checksums are compared. If they match, "OK" is displayed; if they do not match, a corresponding "FAIL" message appears.
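The OK/FAIL logic of those last three steps can be sketched in C as follows. The text doesn't say which checksum the script uses. To stay short and self-contained, this sketch swaps in the 32-bit FNV-1a hash, which is fine for spotting accidental differences but is not necessarily what **check** computes. fnv1a_stream() and report() are hypothetical names.

```c
#include <stdint.h>
#include <stdio.h>

/* Stand-in checksum (32-bit FNV-1a); the real check script's checksum may differ.
 * Streams the file a byte at a time, so its length never needs to be known. */
uint32_t fnv1a_stream(FILE *fp)
{
    uint32_t hash = 2166136261u;            /* FNV-1a 32-bit offset basis */
    int c;

    while ((c = fgetc(fp)) != EOF) {
        hash ^= (uint32_t) (unsigned char) c;
        hash *= 16777619u;                  /* FNV-1a 32-bit prime */
    }
    return hash;
}

/* Mirrors the last steps above: checksum both files, compare, report. */
int report(const char *label, FILE *original, FILE *result)
{
    int ok = (fnv1a_stream(original) == fnv1a_stream(result));

    printf("%s: %s\n", label, ok ? "OK" : "FAIL: checksums do not match");
    return ok;
}
```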

====Successful operation====
If all goes according to plan, you'll see "OK" status messages displayed.

<cli>
lab46:~/src/discrete/dcf1$ ./check
=================================================
= PHASE 0: Raw -> Encode data verification test =
=================================================
in/ascii1.art -> o0/ascii1.art.rle: OK
in/ascii3.art -> o0/ascii3.art.rle: OK
in/ascii7.art -> o0/ascii7.art.rle: OK
in/ascii8.art -> o0/ascii8.art.rle: OK
in/blunders2.mp3 -> o0/blunders2.mp3.rle: OK
in/blunders4.mp3 -> o0/blunders4.mp3.rle: OK
in/blunders7.mp3 -> o0/blunders7.mp3.rle: OK
in/blunders93.mp3 -> o0/blunders93.mp3.rle: OK
in/sample1.txt -> o0/sample1.txt.rle: OK
in/sample2.txt -> o0/sample2.txt.rle: OK
in/sample3.txt -> o0/sample3.txt.rle: OK
in/sample4.txt -> o0/sample4.txt.rle: OK
in/sprite13.png -> o0/sprite13.png.rle: OK
in/sprite1.png -> o0/sprite1.png.rle: OK
in/sprite2.png -> o0/sprite2.png.rle: OK
in/sprite7.png -> o0/sprite7.png.rle: OK
...
</cli>

====Unsuccessful operation====
Should something not work correctly, you'll see a "FAIL" message:

<cli>
lab46:~/src/discrete/dcf1$ ./check
=================================================
= PHASE 0: Raw -> Encode data verification test =
=================================================
in/ascii1.art -> o0/ascii1.art.rle: OK
in/ascii3.art -> o0/ascii3.art.rle: OK
in/ascii7.art -> o0/ascii7.art.rle: OK
in/ascii8.art -> o0/ascii8.art.rle: OK
in/blunders2.mp3 -> o0/blunders2.mp3.rle: OK
in/blunders4.mp3 -> o0/blunders4.mp3.rle: OK
in/blunders7.mp3 -> o0/blunders7.mp3.rle: OK
in/blunders93.mp3 -> o0/blunders93.mp3.rle: FAIL: checksums do not match
in/sample1.txt -> o0/sample1.txt.rle: OK
in/sample2.txt -> o0/sample2.txt.rle: OK
in/sample3.txt -> o0/sample3.txt.rle: OK
in/sample4.txt -> o0/sample4.txt.rle: OK
in/sprite13.png -> o0/sprite13.png.rle: OK
in/sprite1.png -> o0/sprite1.png.rle: OK
in/sprite2.png -> o0/sprite2.png.rle: OK
in/sprite7.png -> o0/sprite7.png.rle: OK
...
</cli>

====Incomplete operation====
Should something not work at all (such as a missing decode binary, or one that fails to compile), you'll see a "Missing" message:

<cli>
lab46:~/src/discrete/dcf1$ ./check
...
=================================================
= PHASE 1: Decode -> Raw data verification test =
=================================================
Missing 'decode', skipping test.
...
</cli>
  
=====Submission=====
To successfully complete this project, the following criteria must be met:

  * Code must compile cleanly (no warnings or errors)
  * Output must be correct, and match the form given in the sample output above.
  * Implementations must be compliant with the dcfX v2 spec, and pass all tests in the check tool.
  * Code must be nicely and consistently indented (you may use the **indent** tool)
  * Code must implement the algorithm(s) presented above.
    * **encode.c**
    * **decode.c**
  * Indicated error conditions must be identified and reported, along with the expected program behavior
  * Code must be commented
    * comments must be meaningful and descriptive of the process (tell me how/why you're doing what you're doing)
    * have a properly filled-out comment banner at the top
      * be sure to include any compiling instructions, if they differ from just typing 'make'
  * Track/version the source code in a repository
  * Submit a copy of your source code to me using the **submit** tool.
To submit this program to me using the **submit** tool, run the following command at your lab46 prompt:
  
  
You should get some sort of confirmation indicating successful submission if all went according to plan. If not, check for typos and/or locational mismatches.
  
haas/fall2017/discrete/projects/dcf1.1504870282.txt.gz · Last modified: 2017/09/08 11:31 by wedge