This shows you the differences between two versions of the page.
haas:fall2017:discrete:projects:bdt1 [2017/10/02 15:11] – [Background] wedge | haas:fall2017:discrete:projects:bdt1 [2017/10/19 14:28] (current) – wedge | ||
---|---|---|---|
Line 3: | Line 3: | ||
< | < | ||
</ | </ | ||
- | |||
- | ~~TOC~~ | ||
======Project: | ======Project: | ||
Line 18: | Line 16: | ||
It would be uniquely useful if we had a way to highlight the first point (byte) of difference in two files, so we can then focus on why/how they are different, vs. devoting far too much time to discovering what is different. | It would be uniquely useful if we had a way to highlight the first point (byte) of difference in two files, so we can then focus on why/how they are different, vs. devoting far too much time to discovering what is different. | ||
=====Task===== | =====Task===== | ||
- | Your task is to write a hex viewer, along the lines of the **xxd(1)** | + | Your task is to write a custom binary difference visualizer, in a format not unlike that of **xxd(1)**, but certainly different from the output format we strove for in **bdt0**. |
- | An insightful exploration | + | The idea is to take 2 files as input, parse through those (ideally similar) files, until the first point of difference is found, at which point your tool will display: |
- | =====Experiencing xxd===== | + | * the bytes leading up to the difference (in both files) |
- | If we don't know what it is we are implementing, | + | * the byte of difference |
+ | | ||
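As a rough illustration of the core of this task, here is a minimal sketch of locating the first byte of difference between two open streams. The function name `first_difference` and its return convention (-1 for "no difference") are assumptions made for this example, not part of the project specification. It reads one byte at a time from each file and lets the loop condition decide when to stop.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper (name and return convention are assumptions):
 * returns the offset of the first byte at which the two streams differ,
 * or -1 if they match all the way to end-of-file. If one stream ends
 * early, the difference is reported at the offset where it stops.
 */
long first_difference(FILE *a, FILE *b)
{
    long offset = 0;
    int  ca;
    int  cb;

    assert(a != NULL && b != NULL);

    ca = fgetc(a);
    cb = fgetc(b);

    /* The loop condition does all the work: no break, no while(1). */
    while (ca == cb && ca != EOF)
    {
        offset = offset + 1;
        ca     = fgetc(a);
        cb     = fgetc(b);
    }

    /* Equal here means both streams reached EOF together. */
    return (ca == cb) ? -1 : offset;
}
```

Both streams are consumed strictly front to back, so nothing here requires seeking. A real tool would also need to remember the bytes around the difference in order to display them.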
- | < | + | =====Thought empowerment vs. thought slavery===== |
- | lab46: | + | Something I've noticed with many, who are so used to conforming and following authority, is that the thought of questioning why things are the way they are rarely comes into the picture. |
- | > | + | |
- | [abcdefghijklmnopqrstuvwxyz] | + | |
- | 01: BINARY | + | |
- | 01234567: | + | |
- | 0123456789: | + | |
- | 0123456789ABCDEF: | + | |
- | )!@# | + | |
- | . | + | |
- | lab46: | + | |
- | </ | + | |
- | Note how it is filled with ASCII text- many of our recognizable symbols we use when using a text editor. | + | I've certainly seen plenty |
- | But, to illustrate how text is just a form of binary, witness what we are shown when we peel away a layer, and view the binary data (represented in hex for convenience) of that same file: | + | The thing is, **we** are very much in control, and if the universe doesn' |
- | < | + | So here, while debugging binary data... instead of just going with the flow and inconveniencing ourselves, losing our place and wasting time elongating our debugging process, we will be writing a specialized tool that should assist us greatly in the dcfX debugging process. |
- | lab46: | + | |
- | 0000000: 3e41 4243 4445 4647 4849 4a4b 4c4d 4e4f > | + | |
- | 0000010: 5051 5253 5455 5657 5859 5a3c 0a5b 6162 PQRSTUVWXYZ< | + | |
- | 0000020: 6364 6566 6768 696a 6b6c 6d6e 6f70 7172 cdefghijklmnopqr | + | |
- | 0000030: 7374 7576 7778 797a 5d0a 3031 3a09 0920 stuvwxyz].01:.. | + | |
- | 0000040: 4249 4e41 5259 0a30 3132 3334 3536 373a BINARY.01234567: | + | |
- | 0000050: 0920 4f43 5441 4c0a 3031 3233 3435 3637 . OCTAL.01234567 | + | |
- | 0000060: 3839 3a09 2044 4543 494d 414c 0a30 3132 89:. DECIMAL.012 | + | |
- | 0000070: 3334 3536 3738 3941 4243 4445 463a 4845 3456789ABCDEF: | + | |
- | 0000080: 5841 4445 4349 4d41 4c0a 2921 4023 2425 XADECIMAL.)!@# | + | |
- | 0000090: 5e26 2a28 0a2e 0a ^& | + | |
- | lab46: | + | |
- | </ | + | |
- | The EXACT same file, with the EXACT same arrangement of data, only represented more as the computer looks at it (sequentially, one byte immediately following the next). | + | The key is to identify an inconvenience. If we have a tool that helps, but is limited, is that a limitation we can live with, or can we improve our overall process by improving |
- | The output of **xxd(1)** | + | We've done this a bit with pipes... |
- | - the address or offset (from the start of file). This is a hexadecimal address, starting at 0 (beginning of the file), and increments according to the number of bytes displayed. You'll notice that there are (at maximum) the same number | + | |
- | - the actual data (represented in hex); here we see 8 columns | + | |
- | - the ASCII rendering (far right field); if we are viewing an ASCII file, we will easily see the ASCII contents of this file. If we are viewing a non-ASCII file, we may still see random ASCII values, but that is just that the value stored in the particular byte maps to that ASCII value, and should NOT be considered actual ASCII data. | + | |
- | This is one of those conceptual roadblocks many develop- they think that binary is somehow more complicated than it is, and create all sorts of obstacles to effective access. Here we will try to break down some of those walls, because this is really | + | So please, be on the lookout for limitations in the process- ANY process. Sometimes there is nothing we can really |
- | Your task is to write a C program that takes a file name as a command-line argument, opens that file, reads its contents, and displays that data to the screen in the manner | + | * does it suit you? |
+ | * is it effective/ | ||
+ | * what is detracting from ideal efficiency? | ||
+ | * what might improve | ||
+ | * is there an existing tool that could be brought into the fold? | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
- | Your program must: | + | There are constantly opportunities for enhancement of process. It is our job to identify strategic ones that can make significant gains. That's why we automate things with shell scripts, that's why we learn to solve problems, that's why we learn about different approaches to algorithms. |
- | | + | So these **bdt#** projects are a specific foray into this special case study of writing our own custom tool that can get a certain job done, faster. Reducing OUR particular need to keep tabs on something |
- | | + | |
- | | + | |
- | * Where **filename** is the name of the file specified on the command-line (make sure the quotes surround it in the output). | + | |
- | * no further processing should be done if the file is not able to be accessed. | + | |
- | * Detect the current size of the terminal (see " | + | |
- | * If the terminal your program is being run in is **less than** 80 columns, display an error message and exit. | + | |
- | * error message should be of the form: **Error: Terminal width is less than 80 columns!** | + | |
- | * Your program will only be displaying to an area up to 80 characters wide, so a wider terminal will not influence program output. | + | |
- | * Similarly, if the number of lines in the terminal is **less than** 20, display a similar error message and exit. | + | |
- | * error message should be of the form: **Error: Terminal height is less than 20 lines!** | + | |
- | * Unlike the width, the height | + | |
- | * The second command-line argument is a sizing throttle (controlling the number of lines your program will display). If no argument, or a **0** is given, assume autosize (use the detected height | + | |
- | * Each row will display: | + | |
- | * a 7-digit hex offset (referring to the first data byte on a given line) | + | |
- | * followed by a colon and a single space | + | |
- | * then eight space separated groups of two bytes | + | |
- | * however you arrive at it: two total spaces following | + | |
- | * a 16-character ASCII representation field (no separating spaces between the values) | + | |
- | * all printable characters should be displayed. | + | |
- | * all non-printable (and various whitespace) characters should be substituted with a ' | + | |
- | * A newline will be the last character on each line. | + | |
- | * The hex values and rendered ASCII displayed will be sourced from the file specified on the command-line. While the target files for this project are less than 512 bytes, your program should be able to handle larger and smaller files, and update its display accordingly. | + | |
- | * If a line throttle | + | |
- | * Once the data in the file has been exhausted, you need to wrap up as appropriate; | + | |
- | * Don't forget to **fclose()** any open file pointers! And **free()** any **malloc()**' | + | |
- | * If provided (via command-line arguments), highlight the offset field and the specified address + length (see below). | + | |
- | * If the last pair is not complete (ie only an address given), ignore that request. | + | |
- | =====Detecting Terminal Size===== | + | =====Implementation Restrictions===== |
- | To detect the current size of your terminal, you may make use of the following code, provided in the form of a complete program for you to test, and then adapt into your code as appropriate. | + | |
- | It makes use of a pre-existing **structure**, | + | As our goal is not only to explore the more subtle concepts |
- | <code c> | + | As such, the following implementation restrictions are also in place: |
- | #include < | + | |
- | #include < | + | |
- | #include < | + | |
- | int main (void) | + | * use any **break** or **continue** statements sparingly. I am not forbidding their use, but I also don't want this to turn into a lazy solution free-for-all. I am letting you use them, but with **justification**. |
- | { | + | * absolutely **NO** infinite loops (**while(1)**). |
- | | + | * no forced redirection of the flow of the process (no seeking to the end of the file to grab a max size only to zip back somewhere else: deal with the data as you naturally encounter it). |
- | | + | * All " |
+ | | ||
- | printf (" | + | Basically, I am going to loosen my implementation restriction grip for this project: I would like you NOT to disappoint me. Write clean, effective |
- | printf (" | + | |
- | return (0); | + | |
- | } | + | |
- | </code> | + | |
- | An **ioctl(2)** is a method (and system/ | + | =====Program Specifications===== |
+ | For this project, I am looking for a minimum subset | ||
- | Here we are accessing the information on our terminal file, retrieving the width and height so that we can make use of them productively in our programs. | + | ====Basic functionality==== |
+ | Your program should: | ||
- | Compile and run the above code to see how it works. Try it in different size terminals. Then incorporate | + | * accept two files as command-line arguments (these would be the files you'd like to compare) |
+ | * display | ||
+ | * display the row preceding the first identified byte of difference | ||
+ | * display the row containing (and coloring/ | ||
+ | * display the row following the identified byte of difference for the first, then second file | ||
- | =====Selection highlighting===== | + | The focus is the FIRST byte of difference. The algorithm could get considerably trickier when dealing with additional differences (especially if extra bytes are involved in the difference). |
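Showing the row before, the row containing, and the row after the difference, without seeking backward, suggests holding on to the previous row while scanning. The following is one possible sketch under stated assumptions: rows are 16 bytes wide (as with **xxd(1)**), and `struct window`, `row_mismatch`, and `scan_rows` are hypothetical names invented for this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define ROW_BYTES 16   /* assumed row width, matching xxd(1) */

/* Hypothetical container: the three rows of interest for ONE file. */
struct window
{
    unsigned char prev[ROW_BYTES];   /* row before the difference     */
    unsigned char curr[ROW_BYTES];   /* row containing the difference */
    unsigned char next[ROW_BYTES];   /* row after the difference      */
    size_t        prev_len;
    size_t        curr_len;
    size_t        next_len;
};

/* Index of the first mismatch between two rows, or -1 if they agree. */
static int row_mismatch(const unsigned char *x, size_t nx,
                        const unsigned char *y, size_t ny)
{
    size_t shared = (nx < ny) ? nx : ny;
    size_t i      = 0;

    while (i < shared && x[i] == y[i])
    {
        i = i + 1;
    }

    return (i == shared && nx == ny) ? -1 : (int) i;
}

/*
 * Reads both files one row at a time, keeping the previous row around.
 * Returns the absolute offset of the first difference, or -1 if none.
 */
long scan_rows(FILE *fa, FILE *fb, struct window *wa, struct window *wb)
{
    long row_offset = 0;
    int  hit;

    assert(fa != NULL && fb != NULL && wa != NULL && wb != NULL);

    wa->prev_len = 0;
    wb->prev_len = 0;
    wa->curr_len = fread(wa->curr, 1, ROW_BYTES, fa);
    wb->curr_len = fread(wb->curr, 1, ROW_BYTES, fb);
    hit = row_mismatch(wa->curr, wa->curr_len, wb->curr, wb->curr_len);

    /* hit < 0 implies both rows have the same length, so one check suffices. */
    while (hit < 0 && wa->curr_len == ROW_BYTES)
    {
        memcpy(wa->prev, wa->curr, ROW_BYTES);
        memcpy(wb->prev, wb->curr, ROW_BYTES);
        wa->prev_len = ROW_BYTES;
        wb->prev_len = ROW_BYTES;
        row_offset   = row_offset + ROW_BYTES;
        wa->curr_len = fread(wa->curr, 1, ROW_BYTES, fa);
        wb->curr_len = fread(wb->curr, 1, ROW_BYTES, fb);
        hit = row_mismatch(wa->curr, wa->curr_len, wb->curr, wb->curr_len);
    }

    /* Only fetch the following rows when a difference was actually found. */
    wa->next_len = (hit < 0) ? 0 : fread(wa->next, 1, ROW_BYTES, fa);
    wb->next_len = (hit < 0) ? 0 : fread(wb->next, 1, ROW_BYTES, fb);

    return (hit < 0) ? -1 : row_offset + hit;
}
```

When the difference falls in the very first row, `prev_len` stays 0, so the caller can tell there is no preceding row to display. Likewise, a `next_len` of 0 means the file ended with the row of difference.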
- | The following adds a nice visual twist to things: | + | |
- | * Enhance the program | + | ====Bonus opportunities==== |
- | * For any line containing this colorized text, highlight the address in bold white. | + | Some ideas to enhance your program |
- | ====Sample output==== | + | * accept some sort of mode argument, a number, that would alter the behavior of your tool. Such as: |
+ | * 0: display as project specifies | ||
+ | * 1: display on separate lines, vs. the same line of difference (first file, newline, second file). | ||
+ | * additional modes as justified | ||
+ | * accept numeric offset arguments, 1 for each file, to instruct your tool where they should start reading/ | ||
+ | * this would be a way for your tool to natively support " | ||
+ | * this would likely require displaying the pertinent offsets for each file. | ||
+ | * you could endeavor to explore some algorithmic enhancements to automatically detect additional points of difference. Note that this could be rather fragile, depending on the identified differences. | ||
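For the offset-argument bonus idea, a small sketch of turning a command-line string into a starting offset, and then reaching that offset by reading forward rather than seeking. Both `parse_offset` and `skip_bytes` are hypothetical helpers made up for this example. Accepting decimal, hex, and octal input via base 0 is a design assumption, not a stated requirement.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: converts text such as "144" or "0x90" into a
 * non-negative offset. Returns 0 on success, -1 on malformed input.
 */
int parse_offset(const char *text, long *offset)
{
    char *end   = NULL;
    long  value = 0;
    int   bad   = 0;

    assert(text != NULL && offset != NULL);

    errno = 0;
    value = strtol(text, &end, 0);   /* base 0: decimal, 0x hex, 0 octal */

    /* Reject overflow, empty input, trailing junk, and negative values. */
    bad = (errno != 0 || end == text || *end != '\0' || value < 0);

    if (!bad)
    {
        *offset = value;
    }

    return bad ? -1 : 0;
}

/*
 * Hypothetical helper: advances a stream by reading and discarding bytes,
 * so no seeking is involved. Returns how many bytes were actually skipped,
 * which is less than count if the file ends first.
 */
long skip_bytes(FILE *fp, long count)
{
    long skipped = 0;

    assert(fp != NULL);

    while (skipped < count && fgetc(fp) != EOF)
    {
        skipped = skipped + 1;
    }

    return skipped;
}
```

A caller might treat a short skip (return value less than the requested count) as an error, since the requested starting offset lies beyond the end of that file.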
- | As an example, running the program with the following arguments could produce results like this: | + | =====Output===== |
- | + | A basic mockup | |
- | {{: | + | |
- | + | ||
- | ====ANSI escape sequences for color==== | + | |
- | This probably isn't very portable, and depending on the terminal, it may not work for some people. | + | |
- | + | ||
- | It may be most convenient to set up preprocessor #define statements near the top of your code, as follows: | + | |
- | + | ||
- | <code c> | + | |
- | # | + | |
- | # | + | |
- | # | + | |
- | # | + | |
- | # | + | |
- | # | + | |
- | # | + | |
- | # | + | |
- | # | + | |
- | # | + | |
- | # | + | |
- | # | + | |
- | # | + | |
- | # | + | |
- | # | + | |
- | # | + | |
- | # | + | |
- | # | + | |
- | </ | + | |
- | + | ||
- | To use, you output them: | + | |
- | + | ||
- | < | + | |
- | fprintf(stdout, | + | |
- | fprintf(stdout, | + | |
- | fprintf(stdout, | + | |
- | </ | + | |
- | + | ||
- | You have to remember to turn the color or setting off (resetting it) to revert back to the original color. | + | |
- | + | ||
- | You can mix and match as well: | + | |
- | + | ||
- | < | + | |
- | fprintf(stdout, | + | |
- | fprintf(stdout, | + | |
- | fprintf(stdout, | + | |
- | fprintf(stdout, | + | |
- | fprintf(stdout, | + | |
- | </ | + | |
- | + | ||
- | While there are 8 available foreground colors, bolding can double that range to 16. | + | |
- | + | ||
- | =====Implementation Restrictions===== | + | |
- | + | ||
- | As our goal is not only to explore the more subtle concepts of computing but to promote different methods of thinking | + | |
- | + | ||
- | As such, the following implementation restrictions are also in place: | + | |
- | + | ||
- | * absolutely **NO** switch/ | + | |
- | * absolutely **NO** infinite loops (**while(1)**, | + | |
- | * no forced redirection of the flow of the process (no seeking to the end of the file to grab a max size only to zip back somewhere else: deal with the data as you naturally encounter it). | + | |
- | * With the exception of any negative values, all numbers should be transacted in hexadecimal (as in the values you assign and compare and manipulate in your code). | + | |
- | * No line must exceed 80 characters in width. | + | |
- | * All " | + | |
- | * For the highlighted address and lengths, store them in an array of structs (containing the //address// and //length// members). | + | |
- | * **NO** logic shunts (ie having an if statement nested inside a loop to bypass an undesirable iteration)- this should be handled by the loop condition! | + | |
+ | <cli> | ||
+ | lab46: | ||
+ | 0000090: 0011 2233 4455 6677 8899 aabb ccdd eeff | 0011 2233 4455 6677 8899 aabb ccdd eeff | ||
+ | 00000a0: 55aa 66bb 0401 77cc 88dd 99ee aaff 89af | 55aa 66bb 0501 77cc 88dd 99ee aaff 89af | ||
+ | 00000b0: 9988 7766 5544 3322 1100 ffee ddcc bbaa | 9988 7766 5544 3322 1100 ffee ddcc bbaa | ||
+ | lab46: | ||
+ | </ | ||
=====Submission===== | =====Submission===== | ||
To successfully complete this project, the following criteria must be met: | To successfully complete this project, the following criteria must be met: |