Corning Community College

CSCS1320 C/C++ Programming

Project: C Binary Fun (cbf0)

Objective

To practice manipulating binary data in a C program (for fun and glory).

Background

We've had a newfound exposure to data this semester, and how programming languages like C interpret different forms of data; we look at things like data type and corresponding storage allocated, and we use these as ingredients in our programmatic solutions.

Yet- we've also inserted a layer of abstraction between us and the computer: integers and floating point values and ASCII characters… each with its own unique ways of accessing and manipulating.

The thing is: to the computer all data is largely the same: sequences of 1s and 0s, accessed in units of bytes.

This project will expose us to some of the underlying aspects of the realm of data that lies closer to the computer, that of “binary data”, where we often come in contact with hexadecimal values to aid us in the interaction.

So, as we are here to learn more about the computer, it only makes sense to steer some of our activities towards the manipulation of binary data as well- one cannot effectively solve a whole domain of problems if they have no idea how to work with it.

This project aims to ameliorate that.

Binary data merely refers to data as the computer stores it. The computer is a binary device, so its raw data (as it exists on various forms of storage and media) is often referred to as binary data, to reflect the 1s and 0s being represented.

The data we have become familiar with is textual data. We read from and write to files (even with those files commonly being the keyboard and screen) with the express purpose of retrieving or storing text with them. And with the use of various text processing tools, we can easily manipulate these text files.

But: did you know that all text data is also binary data?

The trick to remember is that the converse is not always true: not all binary data is text. In fact, most of it isn't. Text represents a very narrow range of possible data values, and then only within a certain context. You may “see” random letters when viewing binary data, but there is no continuity. The data values that we utilize when interacting with text are also valid combinations of binary values, which can mean almost anything.

So, text is really ONE (of many) possible representations of binary data. We need to gain a wider perspective and get more familiar with this more expansive and general notion of binary data.

The computer works in units of bytes, which these days means groups of 8 bits. C has the ability to arbitrarily read and write individual bytes of data, and we will want to make use of that to aid us in our current task.

Opening and reading from files

The nice thing about C is that it tends to embody the “everything is a file” mantra from UNIX.

What this means, basically, is that interacting with data in a file is really no different than interacting with data from the keyboard or data to the screen. We merely need a FILE pointer and appropriate resources allocated.

To interact with a file, we must first declare a pointer to type FILE that will be our point of transaction.

Common names for our file pointer variable are fp, fPtr, input, and inp, but in reality it can be anything you want.

The intention is that, of course, you name variables so they are meaningful in the context of the overall implementation.

    FILE *input  = NULL;

Opening a file with fopen()

To attach a file stream to a FILE pointer, we utilize a file opening function such as fopen().

It takes two arguments:

  1. the path and name of file we wish to open (provided as a string)
  2. the mode we wish to open the file as (provided as a string)

There are 3 common file opening modes (and combinations thereof, among others, sometimes dependent on the particular operating system being run). For now, I highly recommend just sticking to ONE mode of operation per FILE pointer. This can avoid messy things like data corruption and indirect logic/runtime errors.

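The 3 file modes:

  1. "r": read (the file must already exist)
  2. "w": write (the file is created, or emptied if it already exists)
  3. "a": append (writes go to the end of the file, which is created if needed)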

If we wanted to open the file “sample0.txt” in the current directory for reading, using the file pointer input, we would do the following:

    input  = fopen ("sample0.txt", "r");

Note the double quotes around each argument. They both need to be strings (i.e. arrays of char ending with the null terminator character, '\0'), and the double quotes enable this.

Reading from the file

If the file is filled with a set format of data you'd like to retrieve, such as one short integer per line (basically, a text file filled with numbers), we can just use our trusty and familiar fscanf() function. We merely have to indicate the correct file pointer:

    short int value  = 0;
 
    ...
 
    fscanf (input, "%hd", &value);

If there is no simple universal “format” to the file, or if the raw information in the file is the information we are interested in, we need to instead look at it as a consecutive collection of bytes, and we can grab a char's worth of data (I would recommend starting out by looking at a file like this as a byte-by-byte or char-by-char endeavor… ignore trying to transact with groups of them until you get the process down with individual chars).

The fscanf() function is still viable here, but if all we're after is a char value, there's a special purpose input function we can use instead: fgetc()

To read a byte of data from a file and store it in our variable (called byte), we would do the following:

    int byte  = 0;    /* int rather than char: fgetc() returns an int so that EOF stays distinct from data */
 
    ...
 
    byte  = fgetc (input);

The fgetc() function takes the intended FILE pointer it is to read from as its argument, so input should be a FILE pointer AND should have previously been fopen()'ed (and for reading!) prior to calling fgetc().

To make things easier, placing logic to read from a file in a loop can be a very powerful combination.

Task

Your task is to write a hex viewer, along the lines of the xxd(1) tool found on the system.

Experiencing xxd

If we don't know what it is we are implementing, we won't be all that successful. So, here's a quick overview of the xxd(1) tool we will be simulating aspects of; first up, a plain text look at a data file we will be processing:

lab46:~/src/cprog/cbf0$ cat sample0.txt
>ABCDEFGHIJKLMNOPQRSTUVWXYZ<
[abcdefghijklmnopqrstuvwxyz]
01:              BINARY
01234567:        OCTAL
0123456789:      DECIMAL
0123456789ABCDEF:HEXADECIMAL
)!@#$%^&*(
.
lab46:~/src/cprog/cbf0$ 

Note how it is filled with ASCII text- many of our recognizable symbols we use when using a text editor.

But, to illustrate how text is just a form of binary, witness what we are shown when we peel away a layer, and view the binary data (represented in hex for convenience) of that same file:

lab46:~/src/cprog/cbf0$ xxd sample0.txt
00000000: 3e41 4243 4445 4647 4849 4a4b 4c4d 4e4f  >ABCDEFGHIJKLMNO
00000010: 5051 5253 5455 5657 5859 5a3c 0a5b 6162  PQRSTUVWXYZ<.[ab
00000020: 6364 6566 6768 696a 6b6c 6d6e 6f70 7172  cdefghijklmnopqr
00000030: 7374 7576 7778 797a 5d0a 3031 3a20 2020  stuvwxyz].01:   
00000040: 2020 2020 2020 2020 2020 2042 494e 4152             BINAR
00000050: 590a 3031 3233 3435 3637 3a20 2020 2020  Y.01234567:     
00000060: 2020 204f 4354 414c 0a30 3132 3334 3536     OCTAL.0123456
00000070: 3738 393a 2020 2020 2020 4445 4349 4d41  789:      DECIMA
00000080: 4c0a 3031 3233 3435 3637 3839 4142 4344  L.0123456789ABCD
00000090: 4546 3a48 4558 4144 4543 494d 414c 0a29  EF:HEXADECIMAL.)
000000a0: 2140 2324 255e 262a 280a 2e0a            !@#$%^&*(...
lab46:~/src/cprog/cbf0$ 

The EXACT same file, with the EXACT same arrangement of data, only represented more as the computer looks at it (sequentially, one byte immediately following the next).

The output of xxd(1) has 3 distinct sections:

  1. the address or offset (from the start of file). This is a hexadecimal address, starting at 0 (beginning of the file), and increments according to the number of bytes displayed. You'll notice that there are (at maximum) the same number of bytes on each line, so the offset increments by that amount with each new line it displays.
  2. the actual data (represented in hex); here we see 8 columns of hex values, grouped together in pairs of two bytes (other hex viewers may separate into 16 columns, isolating each byte for better viewing).
  3. the ASCII rendering (far right field); if we are viewing an ASCII file, we will easily see the ASCII contents of this file. If we are viewing a non-ASCII file, we may still see random ASCII values, but that is just that the value stored in the particular byte maps to that ASCII value, and should NOT be considered actual ASCII data.

This is one of those conceptual roadblocks many develop- they think that binary is somehow more complicated than it is, and create all sorts of obstacles to effective access. Here we will try to break down some of those walls, because this is really important stuff to know.

Your task is to write a C program that takes a file name as a command-line argument, opens that file, reads its contents, and displays that data to the screen in the manner that the xxd(1) tool does in the above example (note that while the xxd(1) tool has other features, we are not looking to implement them; only this simple rendering view).

Your program must:

Sample output of your program should be as follows (compared to the xxd(1) output above):

00000000: 3e 41 42 43 44 45 46 47 48 49 4a 4b 4c 4d 4e 4f  >ABCDEFGHIJKLMNO
00000010: 50 51 52 53 54 55 56 57 58 59 5a 3c 0a 5b 61 62  PQRSTUVWXYZ<.[ab
00000020: 63 64 65 66 67 68 69 6a 6b 6c 6d 6e 6f 70 71 72  cdefghijklmnopqr
00000030: 73 74 75 76 77 78 79 7a 5d 0a 30 31 3a 20 20 20  stuvwxyz].01:   
00000040: 20 20 20 20 20 20 20 20 20 20 20 42 49 4e 41 52             BINAR
00000050: 59 0a 30 31 32 33 34 35 36 37 3a 20 20 20 20 20  Y.01234567:     
00000060: 20 20 20 4f 43 54 41 4c 0a 30 31 32 33 34 35 36     OCTAL.0123456
00000070: 37 38 39 3a 20 20 20 20 20 20 44 45 43 49 4d 41  789:      DECIMA
00000080: 4c 0a 30 31 32 33 34 35 36 37 38 39 41 42 43 44  L.0123456789ABCD
00000090: 45 46 3a 48 45 58 41 44 45 43 49 4d 41 4c 0a 29  EF:HEXADECIMAL.)
000000a0: 21 40 23 24 25 5e 26 2a 28 0a 2e 0a              !@#$%^&*(...

Detecting Terminal Size

To detect the current size of your terminal, you may make use of the following code, provided in the form of a complete program for you to test, and then adapt into your code as appropriate.

It makes use of a structure, which we have not extensively covered yet, but the example shows you how you can make use of an existing struct, which is all you have to do in the program (we're just using it to retrieve information to help us on our program).

#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
 
int main (void)
{
    struct winsize terminal;            /* filled in by the ioctl() call below  */
    ioctl  (0, TIOCGWINSZ, &terminal);  /* 0 is standard input's file descriptor */
 
    printf ("lines:   %d\n", terminal.ws_row);
    printf ("columns: %d\n", terminal.ws_col);
    return (0);
}

An ioctl(2) is a method (and system/library call) for manipulating underlying device parameters of special files (for the UNIX people: everything is a file, including your keyboard, and terminal screen). We are basically querying the screen (or accessing lower level information made possible by communicating with the driver of the device) to obtain some useful information.

Here we are accessing the information on our terminal file, retrieving the width and height so that we can make use of them productively in our programs.

Compile and run the above code to see how it works. Try it in different size terminals. Then incorporate the logic into your hex viewer for this project.

Submission

To successfully complete this project, the following criteria must be met:

To submit this program to me using the submit tool, run the following command at your lab46 prompt:

$ submit cprog cbf0 cbf0.c
Submitting cprog project "cbf0":
    -> cbf0.c(OK)

SUCCESSFULLY SUBMITTED

You should get some sort of confirmation indicating successful submission if all went according to plan. If not, check for typos and/or location mismatches.

Evaluation Criteria

What I will be looking for:

52:cbf0:final tally of results (52/52)
*:cbf0:cbf0.c compiles cleanly, no compiler messages [13/13]
*:cbf0:cbf0.c implements only specified algorithm [13/13]
*:cbf0:cbf0.c code conforms to project specifications [13/13]
*:cbf0:cbf0 runtime output conforms to specifications [13/13]

Additionally: