Corning Community College

CSCS2320 Data Structures


Project: NODE0

Errata

This section will document any updates applied to the project since original release:

revision 1: Makefile enhancements
- fixed some operational typos in base Makefile
- tweaked appearance of update banner
- removed unnecessary blank lines that made the compile output longer than needed
revision 2: sanitize permissions
- just wanted to make sure we were dealing with consistent permissions across the class
- and, I wanted an update the whole class could apply so we could address any questions
revision 3: base Makefile improvements
- as I get ready to deploy the node0 project, I found some Makefile enhancements to backport
- it also somewhat continues the 2nd revision, ensuring consistent file permissions
revision 4: FIX for testing/node/app/node-app-display.c
- some of you may have noticed an “extra” value being displayed in your list (always a 0); this was due to a “one-too-many-times” off-by-one bug. Some of you may have worked around it; if so, great.
- if it is annoying you, this update fixes it. It will rename your existing code to node-app-display.c-3, so be sure to migrate your changes to the new file and proceed as usual.
- if you have already submitted the node0 project and all is fine, you don't need to bother with this; as I said, it can be easily worked around with logic.
revision 5: Base Makefile aesthetic enhancements (20141103)

Objective

To review structs and pointers, and see how these two concepts, when combined, produce an element central to our class explorations.

Reference

You absolutely, positively, MUST watch this video: http://www.youtube.com/watch?v=5VnDaHBi8dM

Structures

As we learned in C, there are two main composite data types available to us:

homogeneous (all elements are the same type). We call these arrays
heterogeneous (elements can differ). We call these structs

The struct effectively lets us design our own data type, by filling a container with all the types we need to aid us in solving some problem more effectively.

And what's more, structs are the basis for classes, which are essentially structs with some relaxed rules (i.e. they can have functions) plus additional syntax and abilities (constructors, access control).

Structure elements are accessed with either the “.” or “->” operator, depending on how the struct was declared: as an ordinary variable, or as a pointer (typically to dynamically allocated memory).

Structs accessed through a pointer use “->” (which I commonly see referred to as the structure pointer operator), whereas non-pointer structs use “.”

For example:

struct rectangle {
    int length;
    int width;
    float area;
};

Can be used as a non-pointer instance:

struct rectangle box;
 
box.length = 12;
box.width = 10;
box.area = box.length * box.width;

or as a pointer:

struct rectangle *box;
 
box = (struct rectangle *) malloc (sizeof(struct rectangle));
 
box -> length = 12;
box -> width = 10;
box -> area = box -> length * box -> width;
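
The two fragments above can be wrapped into small functions to make them compilable. This is only a sketch: the names make_box and new_box are made up for illustration, and the NULL check on malloc() is an addition not shown in the fragments.

```c
#include <stdlib.h>

struct rectangle {
    int length;
    int width;
    float area;
};

/* Hypothetical helper: fill in a rectangle held directly in a
 * variable, so members are reached with "." */
struct rectangle make_box(int length, int width)
{
    struct rectangle box;

    box.length = length;
    box.width  = width;
    box.area   = box.length * box.width;

    return box;
}

/* Hypothetical helper: allocate a rectangle and fill it in through a
 * pointer, so members are reached with "->".
 * Returns NULL if malloc() fails; the caller must free() the result. */
struct rectangle *new_box(int length, int width)
{
    struct rectangle *box;

    box = (struct rectangle *) malloc (sizeof(struct rectangle));
    if (box == NULL)
        return NULL;

    box -> length = length;
    box -> width  = width;
    box -> area   = box -> length * box -> width;

    return box;
}
```

Calling make_box(12, 10) hands back a struct whose area member is 120; new_box(12, 10) hands back a pointer to an equivalent struct, which should be passed to free() once it is no longer needed.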

Pointers

Pointers are commonly referred to as one of the main features that give C its power (and, by relation, C++ too).

Pointers are actually just another type of variable: a memory variable.

Now you might be one to say- but aren't all variables “memory variables”? After all, variables reside in memory.

The distinction lies in the primary purpose of the variable, which we will take a closer look at. In essence, variables have 3 attributes:

name
contents
address

Whether we use it or not, everything has an address. This is how the computer keeps track of things. We use the name, but that gets translated back to the address.

A pointer merely adds in another capability- the ability to dereference an address. You see, since pointers are “memory variables”, that is precisely what they store: a memory address.

However, because C is a typed language (all variables must have specific data types indicated at the time of declaration), certain properties are applied to a declared variable, and therefore memory. So, instead of having a unique data type for a pointer, it ends up being a modification of an existing variable declaration (an integer pointer, a char pointer, a pointer to an array of short ints, a pointer to a struct, and even- a function pointer).

By being associated with an existing type, the compiler knows what rules to apply to the dereferenced contents at some destination memory address.

So, with a pointer we have a name, we have the contents (a memory address, rather than a value in accordance with a data type), and we have the address of the pointer variable itself.

To make pointers useful, we dereference them: we access the memory located at the address stored in our pointer. This may be described as “following the pointer”, since a pointer variable doesn't contain the data; it merely contains the address where the data lives (one level of indirection away).

There are two operators in C associated with memory access:

* - dereference
& - address of

We've used both of these before: to pass by address to a function, we pass the address of a variable (effectively turning it into a pointer for the uses of the called function). We dereference to get at the contents of what a pointer points to. If we forget to dereference, we instead get the memory address itself; if we weren't expecting this, it looks like “gibberish” (it would make more sense displayed as a hexadecimal value, but it still isn't the data we were looking for).
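
As a quick refresher, here is a minimal sketch of both operators at work. The function names double_it and pointer_demo are invented for this example; nothing here comes from the project code.

```c
/* Hypothetical helper: doubles the int stored at the given address.
 * The caller passes an address; we dereference it to reach the data. */
void double_it(int *number)
{
    *number = *number * 2;
}

/* Walks through address-of and dereference on a local variable. */
int pointer_demo(void)
{
    int  value = 21;
    int *ptr   = &value;   /* ptr's contents: the address of value */

    double_it(ptr);        /* equivalent to double_it(&value)      */

    return *ptr;           /* follow the pointer to read value     */
}
```

After the call to double_it(), value holds 42, and so pointer_demo() returns 42. Returning ptr itself (without the *) would hand back the address rather than the data.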

To declare a pointer, we merely specify a * before a variable's name.

int *var;

Now, a common mistake is to try and either store a value in our variable (again, as a pointer it is a memory variable– we don't store values, we store memory addresses), or to dereference it before it has been assigned.

As memory access is one of C's strengths, and memory protection is incorporated into most modern operating systems, it isn't uncommon to experience some unique errors when dealing with pointers inappropriately in C. Specifically, we are likely to encounter a Segmentation Fault.

All a Segmentation Fault (aka Seg Fault, SIGSEGV, Signal 11) happens to be is an intervention from the operating system informing us we tried to access memory that we did not have control over. (You don't expect to get away consequence free by randomly wandering into some stranger's home, right? Same idea here.)

When initializing a pointer, we will either set it equal to the address of another variable, or, as will commonly be the case in Data Structures, we will allocate a block of memory and set our pointer's contents equal to the starting address of that otherwise unnamed memory region. Our pointer then also becomes the only means of referencing that memory– if we change the contents of our pointer, and nothing else is pointing to our allocated memory, that memory is lost and inaccessible. This is called a “memory leak”, and enough of them can exhaust the memory resources of a machine.
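
Both ways of initializing a pointer, along with the cleanup that prevents a memory leak, might look like the following sketch. The function name init_demo and the values used are assumptions made for illustration only.

```c
#include <stdlib.h>

/* Hypothetical demo: points a pointer at an existing variable, then at
 * allocated memory. Returns the sum of the two values read through the
 * pointer, or -1 if the allocation fails. */
int init_demo(void)
{
    int  number = 7;
    int *ptr    = NULL;    /* a known "points nowhere" starting state */
    int  total  = 0;

    ptr   = &number;       /* option 1: address of another variable   */
    total = *ptr;

    ptr = (int *) malloc (sizeof(int));   /* option 2: allocated memory */
    if (ptr == NULL)
        return -1;

    *ptr   = 35;
    total += *ptr;

    free(ptr);             /* release the memory while we can still   */
    ptr = NULL;            /* reach it, then forget the stale address */

    return total;
}
```

Had we reassigned ptr before calling free(), the allocated int would have become unreachable: exactly the leak described above.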

Nodes

Now let us look at the basic underlying unit that we will be spending the semester playing with in Data Structures: the node.

There's nothing special about a node- it is something we make, it is something we choose to call a node (we could easily call them something else). It contains both useful content and the ability to reference (or point to) other nodes. It is that referencing ability that makes it so viable and valuable in Data Structures.

Nodes are the building blocks of linked lists, a peer to the array.

But as we know, arrays are limited to a fixed size: we must settle on the size by first usage. If we don't use it all, memory is wasted; if we didn't allocate enough, we're out of luck (short of allocating a new array that is larger, and copying all values from old to new; some languages have this as a “feature”, and tend to call it a dynamic array or vector).
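
To get a feel for that allocate-larger-and-copy workaround, here is a rough sketch. The name grow_array is hypothetical, and the sketch assumes old points to memory obtained from malloc() and that new_size is at least old_size. (The standard library's realloc() exists to do this kind of resizing for us.)

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a new array of new_size ints holding the
 * first old_size values of old, and frees old. Returns NULL (leaving
 * old untouched) if the allocation fails.
 * Assumes old is non-NULL and new_size >= old_size. */
int *grow_array(int *old, size_t old_size, size_t new_size)
{
    int *bigger = (int *) malloc (new_size * sizeof(int));

    if (bigger == NULL)
        return NULL;

    memcpy(bigger, old, old_size * sizeof(int));  /* copy old to new */
    free(old);                                    /* old is now gone */

    return bigger;
}
```

Every time the array fills up, all existing values get copied again, which is the cost a linked list avoids by allocating one node at a time.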

The linked list, on the other hand, is only allocated a node at a time. It therefore allows a best fit for the data set we are working with. For this reason (and because we are allocating memory) linked lists are associated with dynamic memory programming… as the actual memory needs are determined during runtime.

What is / What is in the node

So what exactly IS this node thing?

For one, it is something we create. A self-created variable, eh? Anything that might come to mind as far as a packaging agent? A variable that can contain other variables?

If you are thinking struct, you'd be absolutely right.

We will use struct to package its contents.

What is in the node? Again, this answer depends on our particular needs. For now, lacking any specific application, let us put some value in our node (we'll call it value, as our node contains a value).

The node, to make it practical and useful, also needs the ability to reference other nodes. Pointers are the key for referencing other variables (via memory address storage and dereferencing), so our simple node will also contain a pointer.

What type of pointer? Well, a pointer to the type that defines our node- our struct.

Here it is:

struct node {
    char value;
    struct node *next;
};

To make our lives easier, we'll also follow up our node struct definition with a typedef, merely to save on typing (it isn't needed, it is only added for our convenience). So then we'd have:

struct node {
    char value;
    struct node *next;
};
typedef struct node Node;

So now, when we wish to create a new node (let's say first, which we'll also make a pointer), we have two ways we can declare it:

struct node *first;

or:

Node *first;

The two are identical; one is just more convenient for us to type, due to the added typedef.

Allocation

Now, for any given node to be useful, we need to allocate memory for it. Otherwise, the first attempted access to any of its members is a segfault waiting to happen.

To allocate memory, we use malloc(), as follows (we'll assign memory for our first node, declared above, to reference):

first = (Node *) malloc (sizeof(Node));

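malloc() (declared in stdlib.h) is type generic: it returns a void * pointer to raw memory, and that memory needs to be associated with a specific type before we can use it. In C, assigning the void * to first performs that conversion implicitly, so the (Node *) cast is optional (C++ requires it); we write it out here to make the intent explicit. Since we're allocating memory for a node, the compiler will apply the rules of the node struct we just created.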

Putting a value in the node

Assigning (and retrieving) information from the node is merely a struct operation:

// to assign
first -> value = 12; // first is a pointer to a struct, so we use the structure pointer arrow
 
// to retrieve
printf("first's value is %d\n", first -> value);
 
// dealing with next- set it to an initial sane value
first -> next = NULL;
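
Allocation and initialization tend to travel together, so one could bundle them into a small function. The following is a sketch under that assumption; new_node is a made-up name, not something provided by the project, and the check for a NULL return from malloc() is an addition to what was shown above.

```c
#include <stdlib.h>

struct node {
    char value;
    struct node *next;
};
typedef struct node Node;

/* Hypothetical helper: allocate one node, store value in it, and start
 * its next pointer at a sane NULL. Returns NULL if malloc() fails. */
Node *new_node(char value)
{
    Node *fresh = (Node *) malloc (sizeof(Node));

    if (fresh != NULL)     /* only touch members if we got memory */
    {
        fresh -> value = value;
        fresh -> next  = NULL;
    }

    return fresh;
}
```

With this in place, first = new_node(12); would leave first pointing at a node whose value is 12 and whose next is NULL. As with any malloc()'ed memory, the node should eventually be handed to free().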

Linking our node to another node

The key to everything is linking one node to the next. This is also the part where confusion sets in, due to the level of abstraction at play.

I highly recommend drawing pictures to help you in tracing out what is going on, especially as you are first learning this… and likely throughout the semester. And by drawing pictures, I mean take out a sheet of paper and pen/pencil, and draw nodes (use circles), write the “value” in the circle, and draw one way arrows to identify where any pointers point.

At present, we have a node called first that we've allocated memory to and whose contents we've altered. Drawn on one line (circles shown as parentheses), that diagram would look as follows:

first -> (12) -> NULL

Notice how all the elements of the node are dealt with (both value and the next pointer). And first, being a mere pointer to a struct, is a name that points to (because it merely contains the address of) the memory region we malloc()'ed and are storing our struct in.

Now, to link to another node (and put in, say, a 37 for its value) we'd do something along these lines:

first -> next = (Node *) malloc (sizeof(Node));
first -> next -> value = 37;
first -> next -> next = NULL;
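
The same linking step can be sketched as a function. The name link_after is hypothetical; unlike the three lines above, this version also checks malloc()'s return value and preserves whatever already followed the given node, so no existing nodes are lost.

```c
#include <stdlib.h>

struct node {
    char value;
    struct node *next;
};
typedef struct node Node;

/* Hypothetical helper: allocate a node holding value and link it in
 * directly after node. Returns the new node, or NULL if node is NULL
 * or the allocation fails. */
Node *link_after(Node *node, char value)
{
    Node *fresh;

    if (node == NULL)
        return NULL;

    fresh = (Node *) malloc (sizeof(Node));
    if (fresh == NULL)
        return NULL;

    fresh -> value = value;
    fresh -> next  = node -> next;  /* keep anything that followed node */
    node  -> next  = fresh;         /* node now points to the new node  */

    return fresh;
}
```

Calling link_after(first, 37) on our single-node list gives the same result as the three lines above: first's next points at a new node containing 37, whose own next is NULL.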

And don't forget to update your diagram:

first -> (12) -> (37) -> NULL

Use variables to make it easier to traverse the list

Since we need to keep a placeholder on our allocated memory, first is intended to be a more or less immovable aspect of our list (it is our link to everything- we don't want to adjust it unless we absolutely need to).

You may be noticing the potential for some very long code about to happen (what if we wanted to add a third node? Those next's would become next -> next, and so on). But there's a way to keep it simple (if ambiguous, at least without an updated diagram): just use another variable, whose job is to be more of a temporary placeholder. We shall call it tmp.

Here is that same node construction logic, redone using an additional tmp node pointer, and also adding in a third node (containing the value 8):

Node *first, *tmp = NULL;

first = (Node *) malloc (sizeof(Node));
tmp = first;

tmp -> value = 12;
tmp -> next = NULL;

tmp -> next = (Node *) malloc (sizeof(Node));
tmp = tmp -> next;

tmp -> value = 37;
tmp -> next = NULL;

tmp -> next = (Node *) malloc (sizeof(Node));
tmp = tmp -> next;

tmp -> value = 8;
tmp -> next = NULL;
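
Notice that the three stanzas above differ only in the value stored, which suggests a loop. Here is one possible sketch; build_list is an invented name, and its choice to stop early and return the partial list when malloc() fails is just one reasonable assumption.

```c
#include <stdlib.h>

struct node {
    char value;
    struct node *next;
};
typedef struct node Node;

/* Hypothetical helper: builds a list holding the first count entries
 * of values, in order. Returns a pointer to the first node, or NULL if
 * count is 0. If an allocation fails partway through, the list built
 * so far is returned. */
Node *build_list(const char *values, int count)
{
    Node *first = NULL, *tmp = NULL;
    int   i;

    for (i = 0; i < count; i++)
    {
        Node *fresh = (Node *) malloc (sizeof(Node));

        if (fresh == NULL)
            break;                 /* out of memory: keep what we have */

        fresh -> value = values[i];
        fresh -> next  = NULL;

        if (first == NULL)
            first = fresh;         /* the very first node anchors the list */
        else
            tmp -> next = fresh;   /* otherwise hang it off the last node  */

        tmp = fresh;               /* tmp always tracks the last node      */
    }

    return first;
}
```

Given an array holding 12, 37, and 8 and a count of 3, this produces the same three-node list as the code above. Each node remains allocated until it is individually passed to free().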

Our node diagram now looks as follows (but ideally, you'd have been updating it line by line as this program went along; do not wait “until the end”: build and update your diagram as changes are happening):

first -> (12) -> (37) -> (8) -> NULL

with tmp pointing at the last node (the one containing 8).

Procedure for bootstrapping

Obtain

On Lab46, in /var/public/data/fall2014/, is a project directory called node0. Change to that directory.

lab46:~$ cd /var/public/data/fall2014/node0
lab46:/var/public/data/fall2014/node0$

In it will be the skeleton structure of what we'll be using for many of our projects this semester.

First order of business will be to obtain your own copy. This can be done simply by typing make copy at the prompt. It will place a copy in your home directory under ~/src/data/node0/ (feel free to move it should you wish to store it elsewhere):

lab46:/var/public/data/fall2014/node0$ make copy
...

When done, be sure to return to your home directory and wander in to your local copy of the node0 project:

lab46:/var/public/data/fall2014/node0$ cd ~/src/data/node0
lab46:~/src/data/node0$

Overview

You'll see various files and directories located here (one regular file, Makefile, and 5 directories). The directory structure (note, not all these directories may yet be present) for the project is as follows:

bin: compiled, executable programs will reside here
inc: project-related header files (which we can #include) are here
lib: compiled, archived object files (aka libraries) will reside here
src: subdirectory tree containing our Data Structure implementations
- node: location of our node implementation
- list: location of our linked list implementation (manipulation of nodes)
- stack: location of our stack implementation (manipulation of lists)
- queue: location of our queue implementation (a different manipulation of lists)
- …
testing: subdirectory tree containing our test apps and unit tests
- node: node-related testing files will be here
  - app: end-user applications, demonstrating use
  - unit: unit tests, helping to verify correct implementation
- list: list-related testing files
  - app: end-user applications, demonstrating use
  - unit: unit tests, helping to verify correct implementation
- …

Operating

The project is driven by a fleet of optimized Makefiles, which will facilitate the compiling process for us.

Each Makefile plays a unique role (the closer the Makefile is to the source code, the more specialized it becomes).

The base-level Makefile is used to enact whole-project actions, such as initiating a compile, cleaning the project directory tree of compiled and object code, submitting projects, or applying bug-fixes or upgrading to other projects.

Running make help will give you a list of available options:

lab46:~/src/data/node0$ make help

****************[ Data Structures List Implementation ]*****************
** make                     - build everything (libs and testing)     **
** make debug               - build everything with debug symbols     **
**                                                                    **
** make libs                - build all supporting libraries          **
** make libs-debug          - build all libraries with debug symbols  **
** make testing             - build unit tests                        **
** make testing-debug       - build unit tests with debugging symbols **
**                                                                    **
** make save                - create a backup archive                 **
** make submit              - submit assignment (based on dirname)    **
** make update              - check for and apply updates             **
**                                                                    **
** make clean               - clean; remove all objects/compiled code **
** make help                - this information                        **
************************************************************************
lab46:~/src/data/node0$

In general, you will likely make the most frequent use of these options:

make: just go and try to build everything
make debug: build with debugging support
make clean: clean out everything so we can do a full compile

Most of what you do will be some combination of those 3 options.

Project Submission

When you are done with the project and are ready to submit it, you simply run make submit:

lab46:~/src/data/node0$ make submit
...

Bugfixes and Updates

Sometimes, a typo or other issue will be uncovered in the provided code you have. I will endeavor to release updates which will enable you to bring your code up-to-date with my copy.

When a new update is available, you will start seeing the following appear as you go about using make:

lab46:~/src/data/node0$ make
*********************************************************
*** NEW UPDATE AVAILABLE: Type 'make update' to apply ***
*********************************************************
...
lab46:~/src/data/node0$

When this occurs, you may want to perform a backup (and/or commit/push any changes to your repository)– certain files may be replaced, and you do not want to lose any modifications you have made:

lab46:~/src/data/node0$ make save
...

Once you have done that, go ahead and apply the update:

lab46:~/src/data/node0$ make update
Update 1 COMMENCING
Update 1 CHANGE SUMMARY: Fixed base and other Makefile typos
    Please reference errata section on project page for more information
Update 1 COMPLETE
Updated from revision 0 to revision 1
lab46:~/src/data/node0$

At this point your code is up to date (obviously the above output will reflect whatever the current revision is).

upgrades

As the semester progresses, additional projects will be made available. When this occurs, you may notice new entries appear on the make help display. For example, as the deadline for the node0 project approaches, you will see the following appear:

...
** make update              - check for and apply updates             **
** make upgrade-node1       - upgrade to next project (node1)         **
**                                                                    **
** make clean               - clean; remove all objects/compiled code **
...

By typing make upgrade-node1, your current work on node0 will be copied into a new node1 directory (peer to node0), and any new files will be copied in from its project directory in /var/public/data/fall2014/node1/

As such, it is most advisable to have completed work on node0 before upgrading to the node1 project, so any work you've done will be immediately available to build upon in the next project (the projects build on one another: node1 will rely on work completed in node0, sll0 (the project after node1) will rely on the work done in node1, etc.).

Project Task

In testing/node/app/, you will find a file called: node-app-display.c

Take a look at the code already there. Figure out what is going on, and make sure you understand it. It builds a list of nodes based on user input.

If you look at the bottom of the program, you'll see the following comment:

    // Display list from start to end

It is here I would like for you to add code that will display the contents of this arbitrary list of nodes, from beginning to end.

Building the code

You've made changes to node-app-display.c, and are ready to see your results. What do we do?

First, change back to the base of the project:

lab46:~/src/data/node0/testing/node/app$ cd ..
lab46:~/src/data/node0/testing/node$ cd ..
lab46:~/src/data/node0/testing$ cd ..
lab46:~/src/data/node0$

OR: You may want to have two terminals open- in one you are situated in ~/src/data/node0/testing/node/app/ editing away, and in the other you are in ~/src/data/node0/; this way you can take care of development activities AND easily check your results, without constantly navigating back and forth between various locations.

cleaning things out

If you've already done this a few times, you may want to clean things out and do a fresh compile (never hurts, and might actually fix some problems):

lab46:~/src/data/node0$ make clean

compile project

Next, compile the whole project:

lab46:~/src/data/node0$ make

Our binaries

Compiled executables go in the bin directory, so if we change into there and take a look around we see:

lab46:~/src/data/node0$ cd bin
lab46:~/src/data/node0/bin$ ls
node-app-display  node-app-test  node-app-test2
lab46:~/src/data/node0/bin$

Run the program

To run node-app-display, we'd do the following (specify a relative path to the executable):

lab46:~/src/data/node0/bin$ ./node-app-display

The program will now run, and do whatever it was programmed to do.

Sample Output

For example, let's say we ran the program and put the values 6, 17, 23, 4, 56, and 2 in the list. Your completed program would look like this when run:

lab46:~/src/data/node0/bin$ ./node-app-display 
Enter a value (-1 to quit): 6
Enter a value (-1 to quit): 17
Enter a value (-1 to quit): 23
Enter a value (-1 to quit): 4
Enter a value (-1 to quit): 56
Enter a value (-1 to quit): 2
Enter a value (-1 to quit): -1
6 -> 17 -> 23 -> 4 -> 56 -> 2 -> NULL
lab46:~/src/data/node0/bin$

NOTE: This is just example input. Your program should work not only with this, but with lists of any length, containing any arrangement of valid values.

Your task

You are specifically responsible for creating this line of output:

6 -> 17 -> 23 -> 4 -> 56 -> 2 -> NULL

It needs to work for whatever values are put in the list (which can hold anywhere from zero values to as many as memory allows).

You need to display the node's contents, and separate that from the next bit of information with a space-separated “->”, to help show the continuity of nodes we've joined together.

Finally, when you have exhausted your list, display a terminating “NULL” to visually signify the completion of the task.

Submission Criteria

To be successful in this project, the following criteria must be met:

Code must compile cleanly (no warnings or errors)
Executed program must display in a manner similar to provided output
Output must be correct (i.e. the list visualization) based on values input
Code must be nicely and consistently indented (you may use the indent tool)
Code must be commented
Track/version the source code in a repository
Submit a copy of your source code to me using the submit tool (make submit will do this) by the deadline.
