Corning Community College
CSCS2330 Discrete Structures
======Project: Binary Data Tool (bdt2)======
=====Objective=====
To continue our binary data explorations in the creation of a useful tool for aiding us in the debugging of our dcfX endeavors.
=====Background=====
With our recent **xxd(1)** tool implementation in **bdt0**, and our rudimentary "first point of difference" "binary diff" that was implemented in **bdt1**, we are going to continue down this rabbit hole further by writing a tool to allow us to explore multiple points of difference (within reason).
One of the things I noticed while helping debug was a frequent loss of place while looking at the hex output of encoded data. Seconds were lost relocating the points of difference, and over time, those lost seconds add up.
It would be uniquely useful if we had a way to highlight the point (byte) of difference in two files, so we can then focus on why/how they are different, vs. devoting far too much time to discovering what is different.
=====Task=====
Your task is to write a custom binary difference visualizer, in a format not unlike that of **xxd(1)**, but certainly different from the output format we strove for in **bdt0** and **bdt1**.
The idea is to take 2 files as input and parse through those (ideally similar) files until a point of difference is found, at which point your tool will display:
* in the typical units of 16 bytes per line, show all the bytes on the line of difference (in both files)
* the byte(s) of difference are to be highlighted/coloured in some fashion
=====Thought empowerment vs. thought slavery=====
Something I've noticed with many who are so used to conforming and following authority, is that the thought of "questioning why things are" rarely comes into the picture.
I've certainly seen plenty of examples... of people messing something up, and then proceeding to live with the mistake, maybe bothered by the inconvenience, but seemingly powerless to fix it.
The thing is, **we** are very much in control, and if the universe doesn't conform to our demands, we must simply realign the universe.
So here, while debugging binary data... instead of just going with the flow and inconveniencing ourselves, losing our place and wasting time elongating our debugging process, we will be writing a specialized tool that should assist us greatly in the dcfX debugging process.
The key is to identify an inconvenience. If we have a tool that helps, but is limited, is that a limitation we can live with, or can we improve our overall process by improving the tool (either by extending it, or by writing a new tool altogether)?
We've done this a bit with pipes... **xxd(1)** doesn't natively support capping its lines of display, so we've been using UNIX pipes to have commands like **head(1)** and **tail(1)** greatly enhance the utility of our **xxd(1)** output (versus haplessly scrolling through hundreds of lines of hex values). Thing is, how many would have done this if I never showed you examples?
So please, be on the lookout for limitations in the process: ANY process. Sometimes there is nothing we can really do, but other times, we definitely can. Don't just go with some mindless flow; constantly evaluate whatever process you are following:
* does it suit you?
* is it effective/efficient?
* what is detracting from ideal efficiency?
* what might improve the process?
* is there an existing tool that could be brought into the fold?
* have you investigated?
* have you asked?
* is there a new tool that can be written that would fill this niche?
* what would it do?
* would it be a burden to write?
There are constantly opportunities for enhancement of process. It is our job to identify strategic ones that can make significant gains. That's why we automate things with shell scripts, that's why we learn to solve problems, that's why we learn about different approaches to algorithms.
So these **bdt#** projects are a specific foray into this special case study of writing our own custom tool that can get a certain job done faster, reducing OUR particular need to keep tabs on something the computer is very much better at doing.
=====Implementation Restrictions=====
As our goal is not only to explore the more subtle concepts of computing but to promote different methods of thinking (and arriving at solutions seemingly in different ways), one of the themes I have been harping on is the stricter adherence to the structured programming philosophy. It isn't just good enough to be able to crank out a solution if you remain blind to the many nuances of the tools we are using, so we will at times be going out of our way to emphasize focus on certain areas that may see less exposure (or avoidance due to it being less familiar).
As such, the following implementation restrictions are also in place:
* use any **break** or **continue**, or other flow redirection statements sparingly. I am not forbidding their use, but I also don't want this to turn into a lazy solution free-for-all. I am letting you use them, but with **justification**.
* **justification** implies some thoughtful why/how style comments explaining how a particular use of one of these statements is effective and efficient (not: "I couldn't think of any other way to do it").
* absolutely **NO** infinite loops (**while(1)** or the like).
* ALL variables must be well and pertinently named, and be no fewer than 4 symbols in length
* no forced redirection of the flow of the process (no seeking to the end of the file to grab a max size only to zip back somewhere else: deal with the data as you naturally encounter it; no telling; no "ungetting" data back into the file).
* All "arrays" must be declared and referenced using ONLY pointer notation, NO square brackets.
* **NO** logic shunts (i.e., having an if statement nested inside a loop to bypass an undesirable iteration); this should be handled by the loop condition!
* at most, only **one** return() statement per function. Error terminations should use **exit()**
Basically, I am going to loosen my implementation restriction grip for this project: I would like you NOT to disappoint me. Write clean, well-indented, well-commented, effective code... show me that you have learned something from your programming experience.
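As a minimal sketch of what the pointer-notation and loop-condition restrictions can look like in practice, consider the hypothetical helper below. The name **first_difference** and its parameters are illustrative assumptions, not part of the project. It scans two in-memory buffers for the first mismatch: the stop condition lives entirely in the loop header, so there is no **break**, no nested if, and no square brackets.

```c
#include <stddef.h>

/*
 * Hypothetical helper (name and signature are assumptions for
 * illustration): return the position of the first byte that differs
 * between two buffers of the same length, or length if they match.
 */
int first_difference (const unsigned char *first,
                      const unsigned char *second,
                      int                  length)
{
    int index = 0;

    /* Both exit conditions are in the loop header: running out of
     * data, or meeting a mismatch. No break or if-shunt is needed. */
    while ((index < length) && (*(first + index) == *(second + index)))
    {
        index = index + 1;
    }

    return (index);
}
```

A result equal to **length** means no difference was found in that span; any smaller value is the offset of the first differing byte.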
=====Program Specifications=====
====Basic functionality====
Your program should:
* accept two files as command-line arguments (these would be the files you'd like to compare)
* display the address/offset on the left just as **xxd(1)** does
* highlight the address
* display the row containing (and colouring/highlighting) the identified byte(s) of difference for the first, then second file
* default to the first line of difference
* with the presence of a set **THROTTLE** variable, allow your bdt2 program to display the specified number of matches (1 or more). NOTE: there is no "0 for unlimited" functionality here.
* utilize flag logic to set important functionality states (like displaying a line containing differences)
In bdt1, the focus was the FIRST byte of difference. Here in bdt2, we are looking at supporting multiple bytes of difference, perhaps even occurring on the same line.
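One possible way to handle several differences on the same line is to compare the two 16-byte rows position by position and colour only the bytes that disagree. The sketch below is not the reference implementation: the function names, the one-space-per-byte layout, and the specific ANSI colour codes are all assumptions chosen for illustration. The reference binaries and the expected output remain the authority on the real format.

```c
#include <stdio.h>

/* Assumed ANSI escape sequences; the reference colours may differ. */
#define COLOUR_ADDR  "\033[1;33m"   /* bold yellow for the offset   */
#define COLOUR_DIFF  "\033[1;31m"   /* bold red for differing bytes */
#define COLOUR_RESET "\033[0m"

/* Hypothetical helper: count positions where the two rows disagree. */
int count_differences (const unsigned char *first,
                       const unsigned char *second,
                       int                  length)
{
    int index = 0;
    int total = 0;

    for (index = 0; index < length; index++)
    {
        /* a comparison yields 1 when the bytes differ, 0 otherwise */
        total = total + (*(first + index) != *(second + index));
    }

    return (total);
}

/* Hypothetical helper: print one row, colouring each byte that
 * differs from the byte at the same position in the other file. */
void show_row (FILE                *stream,
               unsigned long        offset,
               const unsigned char *shown,
               const unsigned char *other,
               int                  length)
{
    int index = 0;

    fprintf (stream, COLOUR_ADDR "%08lx" COLOUR_RESET ":", offset);

    for (index = 0; index < length; index++)
    {
        if (*(shown + index) != *(other + index))
        {
            fprintf (stream, " " COLOUR_DIFF "%02x" COLOUR_RESET,
                     (unsigned int) *(shown + index));
        }
        else
        {
            fprintf (stream, " %02x", (unsigned int) *(shown + index));
        }
    }

    fprintf (stream, "\n");
}
```

A nonzero result from **count_differences** could serve as the flag that decides whether a row gets displayed at all; calling **show_row** twice with the two buffers swapped would then print the first file's row followed by the second's.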
=====Access variables in your C program=====
Setting a variable in your terminal (or even on the command-line prefixing the execution of your bdt2 tool) allows you to communicate information to your program, enabling it to change or alter its default behaviour (just as command-line arguments do).
For this project, you can make use of the **getenv(3)** function, provided in the C standard library (symbols included via the **stdlib.h** file).
Excerpted from the **getenv(3)** manual page:
NAME
getenv - get an environment variable
SYNOPSIS
#include <stdlib.h>
char *getenv (const char *name);
DESCRIPTION
The getenv() function searches the environment list to find
the environment variable name, and returns a pointer to the
corresponding value string.
RETURN VALUE
The getenv() function returns a pointer to the value in the
environment, or NULL if there is no match.
Your "const char *name" is the name of the variable set (for the purposes of this project, call it **THROTTLE**). Check the example execution to see it in use.
You may notice similarity in parameter usage to that of other standard library functions that you've utilized, like **atoi(3)**.
**NOTE:** You may **not** want to nest your **getenv(3)** call as a parameter to **atoi(3)**; in the event of an error (or no such variable being set), a NULL tends to cause a segmentation fault.
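To make the NULL concern concrete, here is a minimal sketch of keeping the **getenv(3)** result in its own pointer and checking it before conversion. The helper names **parse_throttle** and **read_throttle** are hypothetical, and falling back to 1 for non-numeric or below-1 values is an assumption (the specification only says the default is the first line of difference and that 0 does not mean unlimited).

```c
#include <stdlib.h>

/*
 * Hypothetical helper: turn the text of a THROTTLE-style variable
 * into a count of differences to display. A NULL pointer (variable
 * not set) or a value below 1 falls back to 1; that fallback is an
 * assumption, not something the project text dictates.
 */
int parse_throttle (const char *text)
{
    int limit = 1;                  /* default: one difference only */

    if (text != NULL)               /* only give atoi() a real string */
    {
        limit = atoi (text);

        if (limit < 1)              /* no "0 for unlimited" mode */
        {
            limit = 1;
        }
    }

    return (limit);
}

/* Hypothetical helper: look up THROTTLE and interpret its value. */
int read_throttle (void)
{
    char *value = getenv ("THROTTLE");  /* NULL when unset */

    return (parse_throttle (value));
}
```

With something like this in place, running the tool with **THROTTLE=3** prefixed on the command line would yield a limit of 3, while running it with no **THROTTLE** set would quietly yield 1 instead of crashing.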
=====Reference Implementations=====
To assist you in your off-system development efforts, the **bin/** directory in the bdt2 project (available on lab46 via grabit) has 2 compiled binaries:
* bin/ref_bdt2.aarch64 (use on the Raspberry Pi)
* bin/ref_bdt2.x86_64 (use on lab46, or an intel-compatible system)
Since the Raspberry Pi uses an ARM processor and lab46 uses an Intel processor, binaries are not compatible across architectures.
If you are not sure what architecture your development system uses, run the **uname -m** command:
lab46:~$ uname -m
x86_64
yourpi:~$ uname -m
aarch64
Your bdt2 implementation should match the behaviour and appearance of these reference implementations.
=====Output=====
An example of expected output:
{{ :haas:fall2020:discrete:projects:bdt2_output.jpg |}}
=====Submission=====
To successfully complete this project, the following criteria must be met:
* Code must compile cleanly (no warnings or errors)
* Use the **-Wall** and **--std=gnu99** flags when compiling.
* Code must be nicely and consistently indented (you may use the **indent** tool)
* Code must utilize the algorithm/approach presented above
* Output **must** match the specifications presented above (when given the same inputs)
* Code must be commented
* be sure your comments reflect the **how** and **why** of what you are doing, not merely the **what**.
* Track/version the source code in a repository
* Submit a copy of your source code to me using the **submit** tool.
To submit this program to me using the **submit** tool, run the following command at your lab46 prompt:
lab46:~/src/desig/bdt2$ submit discrete bdt2 bdt2.c
Submitting discrete project "bdt2":
-> bdt2.c(OK)
SUCCESSFULLY SUBMITTED
You should get some sort of confirmation indicating successful submission if all went according to plan. If not, check for typos and/or locational mismatches.
What I will be looking for:
195:bdt2:final tally of results (195/195)
*:bdt2:bdt2.c compiles cleanly, no compiler messages [13/13]
*:bdt2:bdt2.c implements only specified algorithm [26/26]
*:bdt2:bdt2.c code conforms to project specifications [26/26]
*:bdt2:bdt2.c implementation free from restrictions [26/26]
*:bdt2:bdt2.c implements and utilizes THROTTLE usage [13/13]
*:bdt2:bdt2.c implements and utilizes a flag in logic [13/13]
*:bdt2:bdt2 runtime output conforms to reference [26/26]
*:bdt2:bdt2 runtime output matches reference [26/26]
*:bdt2:bdt2 committed, pushed to lab46 repository [26/26]
Additionally:
* Solutions not abiding by spirit of project will be subject to a 25% overall deduction
* Solutions not utilizing descriptive why and how comments will be subject to a 25% overall deduction
* Solutions not utilizing indentation to promote scope and clarity will be subject to a 25% overall deduction
* Solutions not organized and easy to read are subject to a 25% overall deduction