Corning Community College
CSCS2320 Data Structures
~~TOC~~
======Project: NODES======
=====Objective=====
To review structs and pointers, and see how these two concepts, when combined, produces the core element of our class explorations.
=====Reference=====
You absolutely, positively, MUST watch this video: http://www.youtube.com/watch?v=5VnDaHBi8dM
=====Structures=====
As we learned in C, there are two main composite data types available to us:
* homogeneous (all elements are the same type). We call these **arrays**
* heterogeneous (elements can differ). We call these **structs**
The struct effectively lets us design our own data type, by filling a container with all the types we need to aid us in solving some problem more effectively.
And what's more, structs are the basis for classes (they are essentially structs with some relaxed rules (i.e. they can have functions) and additional syntax and abilities (constructors, access control).
Structure elements are accessed with either the "." or "->" operator, depending on the declared status of the struct.
Structs declared as pointers use the "->" (I commonly see it called the structure pointer), where non-pointered structs use the "."
For example:
struct rectangle {
int length;
int width;
float area;
};
Can be used as a non-pointer instance:
struct rectangle box;
box.length = 12;
box.width = 10;
box.area = box.length * box.width;
or as a pointer:
struct rectangle *box;
box = (struct rectangle *) malloc (sizeof(struct rectangle));
box -> length = 12;
box -> width = 10;
box -> area = box -> length * box -> width;
=====Pointers=====
Pointers are commonly referred to as one of the main features that gives C its power (and, by relation, C++ too).
Pointers are actually just another type of variable: a memory variable.
Now you might be one to say- but aren't all variables "memory variables"? After all, variables reside in memory.
The distinction lies in the primary purpose of the variable, which we will take a closer look at. In essence, variables have 3 attributes:
* name
* contents
* address
Whether we use it or not, everything has an address. This is how the computer keeps track of things. We use the name, but that gets translated back to the address.
A pointer merely adds in another capability- the ability to dereference. You see, since pointers are "memory variables", that is what they store-- a memory address.
However, because C is a typed language (variables have specific data types), certain properties are applied to a declared variable, and therefore memory. So, instead of having a unique data type for a pointer, it ends up being a modification of an existing variable declaration (an integer pointer, a char pointer, a pointer to an array of short ints, a pointer to a struct, and even- a function pointer).
By being associated with an existing type, the compiler knows what rules to apply to some destination memory address.
So, with a pointer we have a name, we have the contents (which instead of a number in accordance with a data type, we instead store a memory address), and we have the address.
To make pointers useful, we **dereference** them, or access the memory address stored in the contents of our pointer. This may be described as "following the pointer", since a pointer variable doesn't contain the data, it merely contains the address which contains the data (1 level of abstraction out).
There are two operators in C associated with memory access:
* * - dereference
* & - address of
We've used both of these before-- to pass by address to a function, we pass the address of a variable (effectively turning it into a pointer for the uses of the called function). We dereference to get at the contents of what a pointer points to, for if we forgot or didn't dereference, we'd instead just get the memory address (and if we weren't expecting this, we'd see "gibberish", which would actually make more sense if we displayed it as a hexadecimal value).
To declare a pointer, we merely specify a * before a variable's name.
int *var;
Now, a common mistake is to try and either store a value in our variable (again, as a pointer it is a memory variable-- we don't store values, we store memory addresses), or to dereference it before it has been assigned.
As memory access is one of C's strengths, and memory protection incorporated into most modern operating systems, it isn't uncommon to experience some unique errors when dealing with pointers inappropriately in C. Specifically, we are likely to encounter a **Segmentation Fault**.
All a Segmentation Fault (or Seg Fault, SIGSEGV) happens to be is an intervention from the operating system informing us we tried to access memory that we did not have control over.
When initializing a pointer, we will either set it equal to the address of another variable, or, as will commonly be the case in Data Structures, we will allocate a block of memory and set our pointer's contents equal to the starting address of that otherwise unnamed memory region. Our pointer then also becomes the only means of referencing that memory-- if we change the contents of our pointer, and nothing else is pointing to our allocated memory, it is lost and inaccessible. This is called a "memory leak", and enough of them can exhaust the memory resources of a machine.
=====Nodes=====
Now let us look at the basic underlying unit that we will be spending the semester playing with in Data Structures; the node.
There's nothing special about a node- it is something we make, containing both useful content and the ability to reference (or point to) other nodes. It is that referencing ability that makes it so viable.
nodes are the building blocks of linked lists, a peer to the array.
But as we know, arrays are limited to a fixed size- we must allocate the size by first usage. If don't use it all, memory is wasted; if we don't use enough, we're out of luck (short of allocating a new array that is larger, and copying all values from old to new-- some languages have this as a more basic feature, and tend to call it a dynamic array or vector).
The linked list, on the other hand, is only allocated a node at a time. It therefore allows a best fit for the data set we are working with. For this reason (and because we are allocating memory) linked lists are associated with dynamic memory programming... as the actual memory needs are determined during runtime.
====What is / What is in the node====
So what exactly **IS** this node thing?
For one, it is something we create. A self-created variable, eh? Anything that might come to mind as far as a packaging agent? A variable that can contain other variables?
If you are thinking **struct**, you'd be absolutely right.
We will use struct to package its contents.
What is in the node? Again, this answer depends on our particular needs. For now, lacking any specific application, let us put some generic integer value in our node (we'll call it **value**, as our node contains a value).
The node, to make it practical and useful, also needs the ability to reference other nodes. Pointers are they key for referencing other variables (via memory address storage and dereferencing), so our simple node will also contain a pointer.
What type of pointer? Well, a pointer to the type that defines our node- our struct.
Here it is:
struct node {
int value;
struct node *next;
};
To make our lives easier, we'll also follow up our node struct definition with a **typedef**, merely to save on typing (it isn't needed, it is only added for our convenience). So then we'd have:
struct node {
int value;
struct node *next;
};
typedef struct node Node;
So now, when we wish to create a new node (let's say first, which we'll also make a pointer), we have two ways we can declare it:
struct node *first;
or:
Node *first;
as I said, identical... just one is more convenient for us to type, due to the added typedef.
====Allocation====
Now, for any given node to be useful, we need to allocate memory for it (otherwise, the first attempted access to any of its members is a segfault waiting to happen).
To allocate memory, we use **malloc()**, as follows (we'll assign memory to our **first** node, declared above):
first = (Node *) malloc (sizeof(Node));
We have to cast the return type of **malloc()** because it is somewhat type generic- it returns a pointer to raw memory, and in C that needs to be associated with a specific format. Since we're allocating memory to a node, we'll have the compiler apply the rules of the node struct we just created.
====Putting a value in the node====
Assigning (and retrieving) information from the node is merely a struct operation:
// to assign
first -> value = 12; // first is a pointer to a struct, so we use the structure pointer arrow
// to retrieve
printf("first's value is %d\n", first -> value);
// dealing with next- set it to an initial sane value
first -> next = NULL;
====Linking our node to another node====
The key to everything is linking one node to the next. This is also the part where confusion sets in, due to the level of abstraction at play.
I highly recommend drawing pictures to help you in tracing out what is going on, especially as you are first learning this... and likely throughout the semester. And by drawing pictures, I mean take out a sheet of paper and pen/pencil, and draw nodes (use circles), write the "value" in the circle, and draw one way arrows to identify where any pointers point.
At present, we have a node called first that we've allocated memory to and altered its contents, that diagram would look as follows:
{{ :haas:spring2014:data:projects:initalnode.jpg |initial node diagram}}
Notice how all the elements of the node are dealt with (both value and the next pointer). And first, being a mere pointer to a struct, is a name that points to (because it merely contains the address of) the memory region we **malloc()**'ed and are storing our struct in.
Now,to link to another node (and put in, say, a 37 for its value) we'd do something along these lines:
first -> next = (Node *) malloc (sizeof(Node));
first -> next -> value = 37;
first -> next -> next = NULL;
And don't forget to update your diagram:
{{ :haas:spring2014:data:projects:nextnode.jpg |}}
====Use variables to make it easier to traverse the list====
Since we need to keep a placeholder on our allocated memory, **first** is more or less immovable in the current context (it is our link to everything).
You may be noticing the potential for some very long code about to happen (what if we wanted to add a third nod... those next's would become next -> next, and so on). But there's a way to keep it simple (but ambiguous, at least without an updated diagram)... and that is to just use another variable, whose job is to be more of a temporary placeholder. We shall call it **tmp**.
Here is that same node construction logic, redone using an additional **tmp** node pointer, and also adding in a third node (containing the value 8):
Node *first, *tmp = NULL;
first = (Node *) malloc (sizeof(Node));
tmp = first;
tmp -> value = 12;
tmp -> next = NULL;
tmp -> next = (Node *) malloc (sizeof(Node));
tmp = tmp -> next;
tmp -> value = 37;
tmp -> next = NULL;
tmp -> next = (Node *) malloc (sizeof(Node));
tmp = tmp -> next;
tmp -> value = 8;
tmp -> next = NULL;
Our node diagram now looks as follows (but ideally, you'd have been updating it line by line as this program went along-- do not wait "until the end".... build and update your diagram as changes are happening):
{{ :haas:spring2014:data:projects:thirdnodeandtmp.jpg |}}
=====Procedure=====
I would like for you to write a C program utilizing nodes to construct a simple list, joining new values onto the end of the last node.
The program should do the following:
* implement a node containing signed char's (for **value**)
* prompt the user to enter a value (-1 to exit)
* allocate a node (or the next node-- you'll have to deal with first vs. remaining case logic)
* obtain input from the user and assign it to the current node
* once a -1 has been provided, display the list from beginning to end and then exit
Sample output would look as follows:
lab46:~/src/data$ ./node
Enter a value (-1 to quit): 12
Enter a value (-1 to quit): 37
Enter a value (-1 to quit): 8
Enter a value (-1 to quit): 59
Enter a value (-1 to quit): 86
Enter a value (-1 to quit): 4
Enter a value (-1 to quit): -1
List: 12 -> 37 -> 8 -> 59 -> 86 -> 4 -> NULL
lab46:~/src/data$
NOTE: This is just example input. Not only should your program work with this, but lists of any length, containing any arrangement of valid values.
=====Submission=====
To successfully complete this project, the following criteria must be met:
* Handwritten diagram fully detailed for an 8 node list uploaded to your Opus
* insert the image into your journal, and have related commentary on what is going on
* submit the URL (to that week's journal entry) along with your program
* Code must compile cleanly (no warnings or errors)
* Executed program must display in a manner similar to provided output.
* Output must be correct
* Code must be nicely and consistently indented (you may use the **indent** tool)
* Code must be commented
* have a properly filled-out comment banner at the top
* have at least 20% of your program consist of **//**-style descriptive comments
* Track/version the source code in a repository
* Submit a copy of your source code to me using the **submit** tool.
To submit this program to me using the **submit** tool, run the following command at your lab46 prompt:
$ submit data nodes node.c http://lab46.corning-cc.edu/opus/SEMESTER/user/start#week_2
Submitting data project "nodes":
-> node.c(OK)
-> http://lab46.corning-cc.edu/opus/SEMESTER/user/start#week_2(OK)
SUCCESSFULLY SUBMITTED
You should get some sort of confirmation indicating successful submission if all went according to plan. If not, check for typos and or locational mismatches.