This shows you the differences between two versions of the page.
haas:fall2019:discrete:projects:cnv0 [2019/11/05 12:16] – created wedge | haas:fall2019:discrete:projects:cnv0 [2019/11/05 12:30] (current) – [Specifications] wedge | ||
---|---|---|---|
Line 4: | Line 4: | ||
</ | </ | ||
- | ======Project: | + | ======Project: |
- | =====Errata===== | + | =====Objective===== |
- | Any changes | + | To create a program |
- | * < | + | =====Background===== |
+ | In mathematics, | ||
- | =====Objective===== | + | Expanding our view on the situation, when considering factors of a number, we have the presence of a " |
- | To apply your skills in implementing an encoding scheme that, in ideal circumstances, will lead to a smaller storage footprint. | + | |
- | ====This week's algorithm: RLE+control_sequences==== | + | For 17, a prime number, we have just ONE factor pair: 1 and 17: |
- | Last week's project dealt with the second version of this algorithm, implementing a configurable yet global stride value; this week we add another feature into the mix which should further improve overall efficiency, and perhaps even reduce wasted space. | + | * 17 % 1 == 0 |
+ | * 17 % 17 == 0 | ||
- | The addition? The use of a control sequence byte. | + | Dividing 17 by any other value (2-16) leaves a non-zero remainder. |
- | What will happen here, is that instead of assuming the fixed count, value sequences throughout | + | In this way, prime, or primary, numbers, have exactly ONE factor pair. To further simplify matters, we can call it an N-ary(1) or nary(1) |
- | In dcf0, our stride value was fixed to 1 byte. We could only count up sequences of single byte runs, which in some cases yielded compression; | + | A secondary, or nary(2) number, on the other hand, has exactly TWO factor pairs. |
- | In dcf1, we added a configurable stride, which can then start counting up new sorts of data runs (such as when groups of two bytes may see strings of repetition, or 5 bytes, or 11 bytes). | + | Take the number 6, for instance: |
- | In this project (dcf2), we still have a specified stride, but we eliminate unnecessary encodings when there' | + | * factor pair of 1 and 6 |
+ | * factor pair of 2 and 3 | ||
- | The control sequence consists of a special byte, designated at run-time (for encode), or read out of an encoded file's header (for decode). And is followed by a 1 byte count and 1 byte stride. In a similar manner to how we read the file name (read the file name length byte in the header, then proceed to count out the bytes that follow), we do the same here with our control sequence. | + | Where 17 was a primary number, 6 is a secondary number. |
- | To demonstrate what RLE+control_sequence does, let us look at the following data: | + | ====Determining factor pairs==== |
+ | We are going to be exploring a basic, brute force, method of determining factors for a number, and that is the "trial by division" | ||
- | aaaaaabcdbcdbcdddddddefghijklmnnnnnnowxyz (41 bytes total) | + | Here, we successively divide a number by potential factors, to see if the factor evenly divides into the number. For convenience, |
- | Encoding with our new algorithm, we would get the following (our control sequence byte will be a 2A, then followed by the count byte, then the stride byte, then the encoded data): | + | So, the number 5: |
- | * 2A 06 01 61 62 63 64 62 63 64 62 63 2A 07 01 64 65 66 67 68 69 6A 6B 6C 6D 2A 06 01 6E 6F 77 78 79 7A | + | * 5 % 2 == 1 |
- | * 34 bytes | + | * 5 % 3 == 2 |
+ | * 5 % 4 == 1 | ||
- | The advantage here is that if we were to have a long sequence of non-patterned data, we don't have to bother | + | No other evenly divisible factors |
- | =====dcfX RLE v3 specification===== | + | The number 14: |
- | You'll be writing an **encode** and a **decode** program implementing RLE+control_sequences, | + | |
- | ====Header==== | + | * 14 % 2 == 0 |
- | It is actually | + | * 14 % 3 == 2 |
- | | + | * 14 % 4 == 2 |
- | | + | * 14 % 5 == 4 |
+ | * 14 % 6 == 2 | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | * 14 % 13 == 1 | ||
- | Every RL-encoded file will start with the following 12-byte header: | + | Because factor pairs ALWAYS come in a set of 2, we have the factor pairs of 1 and 14, along with 2 and 7. |
- | * byte 0: 0x64 | + | How about 12: |
- | * byte 1: 0x63 | + | |
- | * byte 2: 0x66 | + | |
- | * byte 3: 0x58 | + | |
- | * byte 4: 0x20 | + | |
- | * byte 5: 0x52 | + | |
- | * byte 6: 0x4c | + | |
- | * byte 7: 0x45 | + | |
- | * byte 8: (previously reserved, now control byte) | + | |
- | * byte 9: 0x03 (version of our RLE specification) | + | |
- | * byte 10: 1 byte for stride value | + | |
- | * byte 11: 1 byte for source file name's length (doesn' | + | |
- | * bytes 12 through length-1 indicated in byte 11: ASCII string of original filename to write. This will also be the name of the file **decode** creates, leaving the RL-encoded file intact. | + | |
- | Following this we will have a repeating sequence of **count** and **value** fields (where | + | |
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
- | =====Program===== | + | There are 4 additional factors discovered here, giving us a total of 6 factors, or three factor pairs: |
- | It is your task to write an encoder and decoder for this specification | + | |
- | | + | * 1, 12 |
- | | + | |
+ | * 3, 4 | ||
- | Your program should: | + | Notice also how the factors are nested: 1 and 12 are the outermost, 2 and 6 are encapsulated within that, and inside there, 3 and 4. |
- | * for encode, obtain 4 parameters from the command-line (see **command-line arguments** section below): | + | |
- | * argv[1]: name of path + source file | + | |
- | * this should be a file that exists, but you should do appropriate error checking and bail out if the file cannot be accessed | + | |
- | * argv[2]: path of destination location (NOT file NAME, just PATH). The destination file name will be generated, based on the source file and whether we are encoding or decoding: | + | |
- | * if encoding, destination is argv[2] + argv[1] (minus source path) + " | + | |
- | * if decoding, destination is argv[2] + embedded file name (which shouldn' | + | |
- | * argv[3]: stride value | + | |
- | * this is a value between 1 and 255 (inclusive). | + | |
- | * be sure to do error checking to make sure it is in this range. | + | |
- | * argv[4]: control byte | + | |
- | * this is a value between 0 and 255 (inclusive). | + | |
- | * for **encode**, | + | |
- | * for **decode**, only accept | + | |
- | * the output file name will be obtained from parsing the file's header data. | + | |
- | * be sure to perform appropriate error checking | + | |
- | * if the version byte in the header is 1, assume a 1 byte stride | + | |
- | * this way, v3 decode will be backwards compatible with v1 encode | + | |
- | * if the version byte in the header is 2, check the stride byte and decode according to RLE v2 (report as version 2 in output) | + | |
- | * once again, baking in backwards compatibility | + | |
- | * implement the specified algorithm in both encoding and decoding forms. | + | |
- | * please be sure to test it against varying types of data, to make sure it works no matter what you throw at it. | + | |
- | * calculate and display some statistics gleaned during the performance of the process. | + | |
- | * for example, **encode** should display information on: | + | |
- | * how many bytes read in | + | |
- | * how many bytes written out | + | |
- | * control byte used (display the hex value) | + | |
- | * stride | + | |
- | * compression rate | + | |
- | * **decode** should also display: | + | |
- | * RLE header information | + | |
- | * filename information | + | |
- | * control byte used (display the hex value) | + | |
- | * stride | + | |
- | * how many bytes read in | + | |
- | * how many bytes written out | + | |
- | * decompression rate | + | |
- | * see the sample program outputs below | + | |
- | * display errors to STDERR | + | |
- | * display run-time information to STDOUT | + | |
- | * your RL-encoded data **MUST** be conformant to the project specifications described above. | + | |
- | * you should be able to encode/ | + | |
- | * **decode** should validate the header information (is it encoded in version 1, 2, or 3? if not, complain to STDERR of " | + | |
- | * if the first 8 bytes of the header do not check out, error out with an " | + | |
- | * if the file only contains a header (and no encoded data), report to STDERR "empty data segment" | + | |
- | ====Other specification details==== | + | |
- | * end of file bytes should no longer be tricky in v3, simply treat them as unencoded data | + | Because there are 3 factor pairs, 12 would be considered an nary(3) value (or a tertiary number). |
- | * your program should | + | |
- | * your program also should avoid writing the EOF byte as a data byte (or reading it and processing it as anything other than a marker to stop reading). | + | =====Program===== |
- | * Since we are using single bytes to store our counts, one needs to be mindful not to allow the byte value to "roll over"; limit your counts to 255. | + | It is your task to write a program that, upon accepting various pieces of input from the command line, computes the number of factor pairs of a given number or range of numbers, displaying to STDOUT all the numbers in that range that qualify as an nary number of the specified kind. |
- | * the **feof(3)** function can be of great use. | + | |
- | * what if the data contains as legitimate data the same byte we're using as our control byte? Simple, escape it with the control byte, then a count, stride, and the data. Your code should not be checking for control bytes in the middle of an encoded data packet, merely in raw data. It should be the only instance of a 01 count, 01 stride control sequence... | + | |
- | =====Grabit Integration===== | + | |
- | For those familiar with the **grabit** tool on lab46, I have made some skeleton files and a custom **Makefile** available for this project. | + | |
- | To " | + | =====Program run-time usage===== |
+ | Your program should accept command-line arguments as follows: | ||
<cli> | <cli> | ||
- | lab46:~/src/ | + | $ ./cnv0 NARY START END |
</ | </ | ||
- | Just another "nice thing" we deserve. | + | All are mandatory. If any are lacking or incorrect, display an error and exit with a non-zero value. |
- | NOTE: You do NOT want to do this on a populated dcf2 project directory-- it will overwrite files. Only do this on an empty directory. | + | =====Specifications===== |
+ | Your program should: | ||
- | =====Makefile fun===== | + | * have valid, descriptive variable names of length //no shorter than// 4 symbols |
- | With the Makefile, we have your basic compile | + | * have consistent, well-defined indentation (no less than 4 spaces per level of indentation) |
+ | * all code within the same scope aligned to its indentation level | ||
+ | * have proximal comments explaining | ||
+ | * to STDERR, prompt for the number (range appropriate of an unsigned long int) | ||
+ | * properly store this in a variable of type **unsigned long int** | ||
+ | * process the arguments, check to make sure the numbers provided are positive numbers greater than or equal to 1 or 2 (depending); | ||
+ | * the nary value must be a value greater than or equal to 1 | ||
+ | * the starting and ending values must be greater than or equal to 2 | ||
+ | * proceed to evaluate the appropriate number range, determining whether or not it is an nary number as specified. | ||
+ | * if it is, display the value to STDOUT in space-separated form (see execution section below for message) | ||
+ | * if it is not, do not display anything related to that value (again, see execution section below) | ||
+ | * using a single return statement at the conclusion of the code, return a 0 indicating successful operation | ||
- | * **make**: compile everything | + | Some additional points |
- | * **make debug**: compile everything with debug support | + | * Note that the driving variables in your loops need to be at least of type **short int**, otherwise you may get a warning when you compile |
- | * **make clean**: remove all binaries | + | |
- | * **make getdata**: re-obtain a fresh copy of project data files | + | |
- | * **make save**: make a backup of your project | + | |
- | * **make submit**: submit project (uses submit tool) | + | |
- | + | ||
- | =====Command-Line Arguments===== | + | |
- | + | ||
- | ====setting up main()==== | + | |
- | To accept (or rather, to gain access) to arguments given to your program at runtime, we need to specify two parameters to the main() function. While the names don't matter, the types do.. I like the traditional | + | |
- | + | ||
- | Please declare your main() function as follows: | + | |
- | + | ||
- | <code c> | + | |
- | int main(int argc, char **argv) | + | |
- | </ | + | |
- | + | ||
- | The arguments are accessible via the argv array, in the order they were specified: | + | |
- | + | ||
- | * argv[0]: program invocation (path + program name) | + | |
- | * argv[1]: our input file | + | |
- | * argv[2]: our output path | + | |
- | * argv[3]: our stride value (1-255) | + | |
- | * argv[4]: our control sequence byte (0-255) | + | |
- | + | ||
- | ====Simple argument checks==== | + | |
- | Although I'm not going to require extensive argument parsing or checking for this project, we should check to see if the minimal number of arguments has been provided: | + | |
- | + | ||
- | <code c> | + | |
- | if (argc < 3) // if less than 3 arguments have been provided | + | |
- | { | + | |
- | fprintf(stderr, | + | |
- | exit(1); | + | |
- | } | + | |
- | </ | + | |
=====Execution===== | =====Execution===== | ||
- | Your program output should be as follows (given the specific input): | ||
- | |||
- | ====Encode==== | ||
+ | ====Secondary number output==== | ||
<cli> | <cli> | ||
- | lab46: | + | lab46: |
- | dcfX v3 encode details | + | 4 6 8 9 10 |
- | ================================== | + | lab46: |
- | input name length: 11 bytes | + | |
- | input filename: sample2.bmp | + | |
- | output filename: ./ | + | |
- | | + | |
- | | + | |
- | read in: 250934 bytes | + | |
- | wrote out: 112390 bytes | + | |
- | | + | |
- | lab46: | + | |
</ | </ | ||
- | With various formats, you'll likely want to play with the stride in order to find better compression scenarios. | + | The execution of the program is short and simple: obtain the input, do the processing, produce the output, and then terminate. |
- | ====Decode==== | + | |
+ | =====Compiling===== | ||
+ | As we have been doing all along, use the following options to gcc when compiling: | ||
<cli> | <cli> | ||
- | lab46: | + | lab46: |
- | input filename: sample5.txt.rle | + | lab46: |
- | output name length: 11 bytes | + | |
- | | + | |
- | | + | |
- | control byte: 0x29 | + | |
- | stride value: 4 bytes | + | |
- | read in: 2734 bytes | + | |
- | wrote out: 3600 bytes | + | |
- | inflation rate: 24.06% | + | |
- | lab46: | + | |
</ | </ | ||
- | =====Check Results===== | ||
- | A good way to test that both encode and decode are working is to encode data then immediately turn around and decode that same data. If the decoded file is in the same state as the original, pre-encoded file, you know things are working. | ||
- | If you'd like to verify your implementations beyond simply encoding (and moving the original file out of the way), and then decoding, one can use the **md5sum** tool to verify an exact match. | + | =====Submission===== |
+ | To successfully complete this project, the following criteria must be met: | ||
- | Run it on the original unencoded file, then run it on the decoded file... the md5sum hashes should match. | + | * Code must compile cleanly (no notes, warnings, nor errors) |
+ | * Output must be correct, and match the form given in the sample output above. | ||
+ | * Code must be nicely and consistently indented | ||
+ | * Code must be well commented | ||
+ | * Do NOT double space your code. Group like statements together. | ||
+ | * Output Formatting (including spacing) of program must conform to the provided output (see above). | ||
+ | * Track/ | ||
+ | * Submit a copy of your source code to me using the **submit** tool. | ||
- | The **diff(1)** tool will also likely work well enough for our endeavors here. | ||
- | |||
- | =====Submission===== | ||
- | ====Project Submission==== | ||
To submit this program to me using the **submit** tool, run the following command at your lab46 prompt: | To submit this program to me using the **submit** tool, run the following command at your lab46 prompt: | ||
<cli> | <cli> | ||
- | lab46: | + | $ submit |
- | removed ' | + | |
- | removed ' | + | |
- | removed ' | + | |
- | + | ||
- | Project backup process commencing | + | |
- | + | ||
- | Taking snapshot of current project (cnv0) ... OK | + | |
- | Compressing snapshot of cnv0 project archive | + | |
- | Setting secure permissions on cnv0 archive | + | |
- | + | ||
- | Project backup process complete | + | |
Submitting discrete project " | Submitting discrete project " | ||
- | -> ../cnv0-DATESTRING-HOUR.tar.gz(OK) | + | -> cnv0.c(OK) |
SUCCESSFULLY SUBMITTED | SUCCESSFULLY SUBMITTED | ||
Line 250: | Line 160: | ||
You should get some sort of confirmation indicating successful submission if all went according to plan. If not, check for typos and/or location mismatches. | You should get some sort of confirmation indicating successful submission if all went according to plan. If not, check for typos and/or location mismatches.
+ | What I'll be looking for: | ||
+ | |||
+ | < | ||
+ | 78: | ||
+ | *: | ||
+ | *: | ||
+ | *: | ||
+ | *: | ||
+ | *:cnv0:no negative compiler messages for program [13/13] | ||
+ | *:cnv0:code is pushed to lab46 repository [13/13] | ||
+ | </ | ||
+ | Additionally: | ||
+ | * Solutions not abiding by the spirit of the project will be subject to a 25% overall deduction | ||
+ | * Solutions not utilizing descriptive why and how comments will be subject to a 25% overall deduction | ||
+ | * Solutions not utilizing indentation to promote scope and clarity will be subject to a 25% overall deduction | ||
+ | * Solutions not organized and easy to read are subject to a 25% overall deduction |