Corning Community College
CSCS2320 Data Structures
This section will document any updates applied to the project since original release:
To apply our recent list activities with structs and pointers, and to see how these two things, when combined, produce an element central to our class explorations.
You absolutely, positively, MUST watch this video: http://www.youtube.com/watch?v=5VnDaHBi8dM
As we learned in C, there are two main composite data types available to us:
The struct effectively lets us design our own data type, by filling a container with all the types we need to aid us in solving some problem more effectively.
What's more, structs are the basis for classes in C++, which are essentially structs with some relaxed rules (e.g. they can have functions) and additional syntax and abilities (constructors, access control).
Structure elements are accessed with either the “.” or “->” operator, depending on how the struct is declared (as an ordinary variable, or as a pointer, which is how dynamically allocated structs are reached).
Structs accessed through a pointer use “->” (commonly referred to as the structure pointer operator), whereas non-pointer structs use “.”
For example:
struct rectangle {
    int length;
    int width;
    float area;
};
Can be used as a non-pointer instance:
struct rectangle box;
box.length = 12;
box.width = 10;
box.area = box.length * box.width;
or as a pointer:
struct rectangle *box;
box = (struct rectangle *) malloc (sizeof(struct rectangle));
box -> length = 12;
box -> width = 10;
box -> area = box -> length * box -> width;
Pointers are commonly referred to as one of the main features that gives C its power (and, by relation, C++ too).
Pointers are actually just another type of variable: a memory variable.
Now you might be one to say- but aren't all variables “memory variables”? After all, variables reside in memory.
The distinction lies in the primary purpose of the variable, which we will take a closer look at. In essence, variables have 3 attributes:
Whether we use it or not, everything has an address. This is how the computer keeps track of things. We use the name, but that gets translated back to the address.
A pointer merely adds in another capability- the ability to dereference an address. You see, since pointers are “memory variables”, that is precisely what they store: a memory address.
However, because C is a typed language (all variables must have specific data types indicated at the time of declaration), certain properties are applied to a declared variable, and therefore memory. So, instead of having a unique data type for a pointer, it ends up being a modification of an existing variable declaration (an integer pointer, a char pointer, a pointer to an array of short ints, a pointer to a struct, and even- a function pointer).
By being associated with an existing type, the compiler knows what rules to apply to the dereferenced contents at some destination memory address.
So, with a pointer we have a name, we have the contents (which instead of a number in accordance with a data type, we instead store a memory address), and we have the address of the pointer variable itself.
To make pointers useful, we dereference them, or access the memory address stored in the contents of our pointer. This may be described as “following the pointer”, since a pointer variable doesn't contain the data, it merely contains the address which contains the data (a level of abstraction out).
There are two operators in C associated with memory access:
We've used both of these before: to pass by address to a function, we pass the address of a variable (effectively turning it into a pointer for the uses of the called function). We dereference to get at the contents of what a pointer points to. If we forget to dereference, we instead get the memory address itself (and if we weren't expecting this, we'd see “gibberish”, which would actually make more sense displayed as a hexadecimal value, although it still isn't the data we're looking for).
To declare a pointer, we merely specify a * before a variable's name.
int *var;
Now, a common mistake is to try and either store a value in our variable (again, as a pointer it is a memory variable– we don't store values, we store memory addresses), or to dereference it before it has been assigned.
As memory access is one of C's strengths, and memory protection is incorporated into most modern operating systems, it isn't uncommon to experience some unique errors when dealing with pointers inappropriately in C. Specifically, we are likely to encounter a Segmentation Fault.
A Segmentation Fault (aka Seg Fault, SIGSEGV, Signal 11) is simply an intervention from the operating system informing us that we tried to access memory we do not have control over. (You don't expect to get away consequence free by randomly wandering into some stranger's home, right? Same idea here.)
When initializing a pointer, we will either set it equal to the address of another variable, or, as will commonly be the case in Data Structures, we will allocate a block of memory and set our pointer's contents equal to the starting address of that otherwise unnamed memory region. Our pointer then also becomes the only means of referencing that memory– if we change the contents of our pointer, and nothing else is pointing to our allocated memory, that memory is lost and inaccessible. This is called a “memory leak”, and enough of them can exhaust the memory resources of a machine.
Now let us look at the basic underlying unit that we will be spending the semester playing with in Data Structures: the node.
There's nothing special about a node- it is something we make, it is something we choose to call a node (we could easily call them something else). It contains both useful content and the ability to reference (or point to) other nodes. It is that referencing ability that makes it so viable and valuable in Data Structures.
Nodes are the building blocks of linked lists, a peer to the array.
But as we know, arrays are limited to a fixed size: we must allocate the size by first usage. If we don't use it all, memory is wasted; if we didn't allocate enough, we're out of luck (short of allocating a new array that is larger, and copying all values from old to new; some languages have this as a “feature”, and tend to call it a dynamic array or vector).
The linked list, on the other hand, is only allocated a node at a time. It therefore allows a best fit for the data set we are working with. For this reason (and because we are allocating memory) linked lists are associated with dynamic memory programming… as the actual memory needs are determined during runtime.
So what exactly IS this node thing?
For one, it is something we create. A self-created variable, eh? Anything that might come to mind as far as a packaging agent? A variable that can contain other variables?
If you are thinking struct, you'd be absolutely right.
We will use struct to package its contents.
What is in the node? Again, this answer depends on our particular needs. For now, lacking any specific application, let us put some value in our node (we'll call it info, since it holds the information our node contains).
The node, to make it practical and useful, also needs the ability to reference other nodes. Pointers are the key for referencing other variables (via memory address storage and dereferencing), so our simple node will also contain a pointer.
What type of pointer? Well, a pointer to the type that defines our node- our struct.
Here it is:
struct node {
    int info;
    struct node *after;
};
To make our lives easier, we'll also follow up our node struct definition with a typedef, merely to save on typing (it isn't needed, it is only added for our convenience). So then we'd have:
struct node {
    int info;
    struct node *after;
};
typedef struct node Node;
So now, when we wish to create a new node (let's say first, which we'll also make a pointer), we have two ways we can declare it:
struct node *first;
or:
Node *first;
As I said, these are identical; one is just more convenient for us to type, due to the added typedef.
Now, for any given node to be useful, we need to allocate memory for it. Otherwise, the first attempted access to any of its members is a segfault waiting to happen.
To allocate memory, we use malloc(), as follows (we'll assign memory for our first node, declared above, to reference):
first = (Node *) malloc (sizeof(Node));
We cast the return value of malloc() because it is somewhat type generic: it returns a pointer to raw memory (a void *), and that memory needs to be associated with a specific format before we can use it. C will perform this conversion implicitly on assignment (C++ will not), but the explicit cast makes our intent clear. Since we're allocating memory for a node, we'll have the compiler apply the rules of the node struct we just created.
Assigning (and retrieving) information from the node is merely a struct operation:
// to assign
first -> info = 12; // first is a pointer to a struct, so we use the structure pointer arrow

// to retrieve
printf("first's value is %d\n", first -> info);

// dealing with the next node, the after pointer- set it to an initial sane value
first -> after = NULL;
The key to everything in a list is linking one node to the next. This is also the part where confusion sets in, due to the level of abstraction at play.
I highly recommend drawing pictures to help you in tracing out what is going on, especially as you are first learning this… and likely throughout the semester. And by drawing pictures, I mean take out a sheet of paper and pen/pencil, and draw nodes (use circles), write the “value” in the circle, and draw one way arrows to identify where any pointers point.
At present, we have a node called first that we've allocated memory for and whose contents we've set. That diagram would look as follows:
Notice how all the elements of the node are dealt with (both the info value and the after pointer). And first, being a mere pointer to a struct, is a name that points to (because it merely contains the address of) the memory region we malloc()'ed and are storing our struct in.
Now, to link to another node (and put in, say, a 37 for its value) we'd do something along these lines:
first -> after = (Node *) malloc (sizeof(Node));
first -> after -> info = 37;
first -> after -> after = NULL;
And don't forget to update your diagram:
Since we need to keep a placeholder on our allocated memory, first is intended to be a more or less immovable aspect of our list (it is our link to everything- we don't want to adjust it unless we absolutely need to).
You may be noticing the potential for some very long code about to happen (what if we wanted to add a third node? Those after's would become after -> after, and so on). But there's a way to keep it simple (but ambiguous, at least without an updated diagram), and that is to just use another variable, whose job is to be more of a temporary placeholder. We shall call it tmp.
Here is that same node construction logic, redone using an additional tmp node pointer, and also adding in a third node (containing the value 8):
Node *first, *tmp = NULL;

first = (Node *) malloc (sizeof(Node));
tmp = first;
tmp -> info = 12;
tmp -> after = NULL;

tmp -> after = (Node *) malloc (sizeof(Node));
tmp = tmp -> after;
tmp -> info = 37;
tmp -> after = NULL;

tmp -> after = (Node *) malloc (sizeof(Node));
tmp = tmp -> after;
tmp -> info = 8;
tmp -> after = NULL;
Our node diagram now looks as follows (but ideally, you'd have been updating it line by line as this program went along– do not wait “until the end”…. build and update your diagram as changes are happening):
On Lab46, in /var/public/fall2015/data/, is a project directory called sln1; you do not need to go there, but we will be referencing that path to obtain our own copy. In that path will be the skeleton structure of what we'll be using for many of our projects this semester.
Please type the following (you can be anywhere on lab46):
lab46:~$ make -C /var/public/fall2015/data/sln1/ copy
When done, be sure to return to your home directory and wander in to your local copy of the sln1 project:
lab46:~$ cd ~/src/data/sln1
lab46:~/src/data/sln1$
NOTE: You may move sln1 to a different directory structure of your choosing (should you be using different names for things), but you MUST retain the sln1 name for the directory– there's a lot of administrative logic helping to make our lives easier that is based on that specific name for the project directory.
You'll see various files and directories located here (one regular file, Makefile, and 6 directories). The directory structure (note, not all these directories may yet be present) for the project is as follows:
The project is driven by a fleet of optimized Makefiles, which will facilitate the compiling process for us.
Each Makefile plays a unique role (the closer the Makefile is to the source code, the more specialized it becomes).
The base-level Makefile is used to enact whole-project actions, such as initiating a compile, cleaning the project directory tree of compiled and object code, submitting projects, or applying bug-fixes or upgrading to other projects.
Running make help will give you a list of available options:
lab46:~/src/data/sln1$ make help
****************[ Data Structures Node Implementation ]*****************
** make                    - build everything (libs and testing)      **
** make debug              - build everything with debug symbols      **
**                                                                    **
** make libs               - build all supporting libraries           **
** make libs-debug         - build all libraries with debug symbols   **
** make testing            - build unit tests                         **
** make testing-debug      - build unit tests with debugging symbols  **
**                                                                    **
** make use-test-reference - use working implementation object files  **
** make use-your-own-code  - use your node/list implementation code   **
**                                                                    **
** make save               - create a backup archive                  **
** make submit             - submit assignment (based on dirname)     **
** make update             - check for and apply updates              **
**                                                                    **
** make clean              - clean; remove all objects/compiled code  **
** make help               - this information                         **
************************************************************************
lab46:~/src/data/sln1$
In general, you will likely make the most frequent use of these options:
Most of what you do will be some combination of those 3 options.
Sometimes, a typo or other issue will be uncovered in the provided code you have. I will endeavor to release updates which will enable you to bring your code up-to-date with my copy.
When a new update is available, you will start seeing the following appear as you go about using make:
lab46:~/src/data/sln1$ make
*********************************************************
*** NEW UPDATE AVAILABLE: Type 'make update' to apply ***
*********************************************************
...
lab46:~/src/data/sln1$
When this occurs, you may want to perform a backup (and/or commit/push any changes to your repository)– certain files may be replaced, and you do not want to lose any modifications you have made:
lab46:~/src/data/sln1$ make save ...
Once you have done that, go ahead and apply the update:
lab46:~/src/data/sln1$ make update
Update 1 COMMENCING
Update 1 CHANGE SUMMARY: Fixed base and other Makefile typos
Please reference errata section on project page for more information
Update 1 COMPLETE
Updated from revision 0 to revision 1
lab46:~/src/data/sln1$
At this point your code is up to date (obviously the above output will reflect whatever the current revision is).
As the semester progresses, additional projects will be made available. When this occurs, you may notice new entries appear on the make help display. For example, as the deadline for the sln1 project approaches, you will see the following appear:
...
** make update             - check for and apply updates              **
** make upgrade-sll0       - upgrade to next project (sll0)           **
**                                                                    **
** make clean              - clean; remove all objects/compiled code  **
...
By typing make upgrade-sll0, your current work on sln1 will be copied into a new sll0 directory (peer to sln1), and any new files will be copied in from its project directory in /var/public/fall2015/data/sll0/
As such, it is most advisable to have completed work on sln1 before upgrading to the sll0 project, so any work you've done will be immediately available to build upon in the next project (the projects will be comprehensive to one another– sll0 will rely on work completed in sln1, sll1 (the project after sll0) will rely on the work done in sll0, etc.).
To facilitate debugging and correction of errors and warnings in your code at compile time, such compiler messages will be redirected to a text file called errors in the base of the project directory.
You can view this file to ascertain what errors existed in the last build of the project.
With each new project build, this file is overwritten, so you always have the most up-to-date version of compile-time information.
For this project, you are responsible for the following:
In src/node/, you will find 4 files: mk.c, cp.c, rm.c, and a Makefile
Take a look at these files. These are currently the skeleton functions which will be compiled and archived into the node library (libnode.a) that we will be using in this and future projects.
Figure out what is going on, make sure you understand it.
There are 3 functions in the node library:
None of these files denote an entire runnable program. These are merely standalone functions. The various programs under the app/ and unit/ directories will use these functions in addition to their application logic to create complete executable programs.
You will also notice there are function prototypes for these node library functions in the node.h header file, located in the inc/ subdirectory, which you'll notice all the related programs you'll be playing with in this project are #includeing.
The prototypes (taken right from inc/node.h) are as follows:
Node *mknode(int );    // allocate new node containing value
Node *rmnode(Node *);  // deallocate node
Node *cpnode(Node *);  // duplicate node
This is your API for the node library. In order to use the node library three things need to happen:
In general, this is no different than what you've already done each and every time you've used printf(), scanf(), atoi(), sqrt(), etc. Only, until now, you haven't actually had the code right in front of you. But these functions all work the same way: these conditions have to be met for them to operate and be used.
The compiler does a lot of behind-the-scenes work (linking against the C standard library by default, so all you have to do is include stdio.h and/or stdlib.h).
If you've ever played with the math library, you've had a slightly closer look: such code wouldn't build with only an include of math.h; you also needed to add a -lm on the compiler command-line.
Again, same details apply here, only the Makefile system automates the library linking. All we have to do is #include the appropriate files.
Upon successful implementation of the node library, take a look in app/node/, which will have (among others) the following files:
Take a look at these programs; your task is to complete them according to the directions located in comments within.
This further works with the activities you've been doing on the first two projects- first with an array, then as a diagram/pseudocode, and now as syntactically correct C code.
As a means of testing your understanding, this program sets up a pre-existing array, filled with values, and displays it to STDOUT.
Your task is to add in logic that builds a list, one node at a time, containing the same values (and in the same order) as is found in that array, and to then display the linked list to STDOUT, where we should see identical information.
Sample output of completed code should look like:
lab46:~/src/data/sln1/bin$ ./node-app-arrtolist
Array: 3 1 4 1 5 9 2 6 5 3 5 8 9 7
List: 3 1 4 1 5 9 2 6 5 3 5 8 9 7
lab46:~/src/data/sln1/bin$
As the array is defined with set values, your output, when complete and correct, should always be the same. This tends to be a good exercise in demonstrating you understand conceptually what is going on and can perform the necessary node manipulations to pull it off.
Again, be sure to use node library functions (like mknode()) in this program.
In the node-app-display program you had to implement the list display functionality- effectively putting pseudocode written in the previous project to work as usable C code.
Here, we work with that same idea, only we change a few things around structurally in the program- the final output should still be the same, but the code to produce it will be different.
Basically, there are three items for you to address:
The output should be the same as experienced in node-app-display when completed.
In unit/node/, you will find 3 files (along with a Makefile):
These are complete runnable programs (when compiled, and linked against the node library, which is all handled for you by the Makefile system in place).
Of particular importance, I want you to take a close look at:
You've made changes to your node library implementation, or node-app-display.c, and are ready to see your results. What do we do?
First, change back to the base of the project:
lab46:~/src/data/sln1/src/node$ cd ~/src/data/sln1
lab46:~/src/data/sln1$
OR: You may want to have two terminals open- in one you are situated in ~/src/data/sln1/src/node/ editing away, and in the other you are in ~/src/data/sln1/; this way you can take care of development activities AND easily check your results, without constantly navigating back and forth between various locations.
If you've already done this a few times, you may want to clean things out and do a fresh compile (never hurts, and might actually fix some problems):
lab46:~/src/data/sln1$ make clean
Next, compile the whole project:
lab46:~/src/data/sln1$ make
or, compile with debugging support (use one or the other: make OR make debug):
lab46:~/src/data/sln1$ make debug
Compiled executables go in the bin directory, so if we change into there and take a look around we see:
lab46:~/src/data/sln1$ cd bin
lab46:~/src/data/sln1/bin$ ls
node-app-display  node-app-test     node-app-test2
unit-cpnode       unit-mknode       unit-rmnode
verify-cpnode.sh  verify-mknode.sh  verify-rmnode.sh
lab46:~/src/data/sln1/bin$
There may be others, and in time more and more files will appear here.
To run node-app-display, we'd do the following (specify a relative path to the executable):
lab46:~/src/data/sln1/bin$ ./node-app-display
The program will now run, and do whatever it was programmed to do.
For example, let's say we ran the program and put the values 6, 17, 23, 4, 56, and 2 in the list. Your completed program would look like this when run:
lab46:~/src/data/sln1/bin$ ./node-app-display
Enter a value (-1 to quit): 6
Enter a value (-1 to quit): 17
Enter a value (-1 to quit): 23
Enter a value (-1 to quit): 4
Enter a value (-1 to quit): 56
Enter a value (-1 to quit): 2
Enter a value (-1 to quit): -1
6 -> 17 -> 23 -> 4 -> 56 -> 2 -> NULL
lab46:~/src/data/sln1/bin$
NOTE: This is just example input. Not only should your program work with this, but lists of any length, containing any arrangement of valid values.
As the layers and complexities rise, narrowing down the source of errors becomes increasingly important.
If the unit-cpnode unit test isn't working, is it because of a problem there or in cpnode() itself?
To aid you in your development efforts, you now have the ability to import a functioning node implementation into your project for the purposes of testing unit test functionality (so you can see what you SHOULD be getting, then go back and continue working on your implementation).
You'll notice that, upon running make help in the base-level Makefile, the following new options appear (about halfway down):
**                                                                    **
** make use-test-reference - use working implementation object files  **
** make use-your-own-code  - use your node implementation code        **
**                                                                    **
In order to make use of it, you'll need to run make use-test-reference from the base of your sln1 project directory, as follows:
lab46:~/src/data/sln1$ make use-test-reference
...
NODE reference implementation in place, run 'make' to build everything.
lab46:~/src/data/sln1$
You'll see that final message indicating everything is in place (it automatically runs a make clean for you), and then you can go ahead and build everything with it:
lab46:~/src/data/sln1$ make ...
Debugging: When using the test reference implementation, you will not be able to debug the contents of the node and list functions (the files provided do not have debugging symbols added), so you'll need to take care not to step into these functions (it would be just like stepping into printf()). You can still compile the project with debugging support and debug (as usual) everything built from your own source.
If you were trying out the reference implementation to verify unit test functionality, and want to revert back to your own code, it is as simple as:
lab46:~/src/data/sln1$ make use-your-own-code
Local node implementation restored, run 'make clean; make' to build everything.
lab46:~/src/data/sln1$
Just to be clear: the reference implementation is not some magic shortcut getting you out of doing this project; it merely gives you a glimpse into how things are working, or should be working, provided your node library is complete and fully functional.
To assist you in verifying a correct implementation, here is what you should get from make check with a fully working implementation of the node library:
lab46:~/src/data/sln1$ make check
====================================================
= Verifying Singly-Linked Node Functionality =
====================================================
[mknode] Total: 4, Matches: 4, Mismatches: 0
[cpnode] Total: 5, Matches: 5, Mismatches: 0
[rmnode] Total: 2, Matches: 2, Mismatches: 0
====================================================
[RESULTS] Total: 11, Matches: 11, Mismatches: 0
====================================================
lab46:~/src/data/sln1$
Note that there are sub-scripts that can also be manually run (as well as the unit tests themselves)… the more specific you get, the more detailed information you will receive (useful for debugging).
This top-level make check action gives you the 30,000 foot view… what is the current status of your node library implementation? From there, you take whatever appropriate action is necessary.
When you are done with the project and are ready to submit it, you simply run make submit:
lab46:~/src/data/PROJECT$ make submit ...
To be successful in this project, the following criteria must be met: