Corning Community College
CSCS2320 Data Structures

======Project: Singly Linked Nodes (sln1)======

=====Errata=====
This section will document any updates applied to the project since original release:

  * __revision #__: (DATESTAMP)

=====Objective=====
To apply our recent list activities with structs and pointers, and see how these two things, when combined, produce an element central to our class explorations.

=====Reference=====
You absolutely, positively, MUST watch this video: http://www.youtube.com/watch?v=5VnDaHBi8dM

=====Structures=====
As we learned in C, there are two main composite data types available to us:

  * homogeneous (all elements are the same type). We call these **arrays**
  * heterogeneous (elements can differ). We call these **structs**

The struct effectively lets us design our own data type, by filling a container with all the types we need to aid us in solving some problem more effectively.

What's more, structs are the basis for classes in C++: a class is essentially a struct with some relaxed rules (e.g. it can contain functions) and additional syntax and abilities (constructors, access control).

Structure elements are accessed with either the "." or "->" operator, depending on how the struct was declared (as a regular variable, or as a pointer to dynamically allocated memory). A struct accessed through a pointer uses "->" (commonly referred to as the structure pointer operator), whereas a struct accessed directly uses ".".

For example:

  struct rectangle {
      int length;
      int width;
      float area;
  };

Can be used as a non-pointer instance:

  struct rectangle box;
  box.length = 12;
  box.width = 10;
  box.area = box.length * box.width;

or as a pointer:

  // malloc() is declared in <stdlib.h>
  struct rectangle *box;
  box = (struct rectangle *) malloc (sizeof(struct rectangle));
  box -> length = 12;
  box -> width = 10;
  box -> area = box -> length * box -> width;

=====Pointers=====
Pointers are commonly referred to as one of the main features that give C its power (and, by relation, C++ too).
Pointers are actually just another type of variable: a memory variable. Now you might be one to say- but aren't all variables "memory variables"? After all, variables reside in memory. The distinction lies in the primary purpose of the variable, which we will take a closer look at. In essence, variables have 3 attributes: * name * contents * address Whether we use it or not, everything has an address. This is how the computer keeps track of things. We use the name, but that gets translated back to the address. A pointer merely adds in another capability- the ability to dereference an address. You see, since pointers are "//memory variables//", that is precisely what they store: **a memory address**. However, because C is a typed language (all variables must have specific data types indicated at the time of declaration), certain properties are applied to a declared variable, and therefore memory. So, instead of having a unique data type for a pointer, it ends up being a modification of an existing variable declaration (an integer pointer, a char pointer, a pointer to an array of short ints, a pointer to a struct, and even- a function pointer). By being associated with an existing type, the compiler knows what rules to apply to the dereferenced contents at some destination memory address. So, with a pointer we have a name, we have the contents (which instead of a number in accordance with a data type, we instead store a memory address), and we have the address of the pointer variable itself. To make pointers useful, we **dereference** them, or access the memory address stored in the contents of our pointer. This may be described as "following the pointer", since a pointer variable doesn't contain the data, it merely contains the address which contains the data (a level of abstraction out). 
There are two operators in C associated with memory access:

  * * - dereference
  * & - address of

We've used both of these before-- to pass by address to a function, we pass the address of a variable (effectively turning it into a pointer for the uses of the called function). We dereference to get at the contents of what a pointer points to; if we forget to dereference, we instead just get the memory address (and if we weren't expecting this, we'd see "gibberish", which would actually make more sense if we displayed it as a hexadecimal value, although it still wouldn't be the data we were looking for).

To declare a pointer, we merely specify a * before a variable's name.

  int *var;

Now, a common mistake is to try and either store a value in our variable (again, as a pointer it is a memory variable-- we don't store values, we store memory addresses), or to dereference it before it has been assigned.

As memory access is one of C's strengths, and memory protection is incorporated into most modern operating systems, it isn't uncommon to experience some unique errors when dealing with pointers inappropriately in C. Specifically, we are likely to encounter a **Segmentation Fault**.

All a Segmentation Fault (aka Seg Fault, SIGSEGV, Signal 11) happens to be is an intervention from the operating system informing us we tried to access memory that we did not have control over. (You don't expect to get away consequence free by randomly wandering into some stranger's home, right? Same idea here.)

When initializing a pointer, we will either set it equal to the address of another variable, or, as will commonly be the case in Data Structures, we will allocate a block of memory and set our pointer's contents equal to the starting address of that otherwise unnamed memory region. Our pointer then also becomes the only means of referencing that memory-- if we change the contents of our pointer, and nothing else is pointing to our allocated memory, that memory is lost and inaccessible.
This is called a "memory leak", and enough of them can exhaust the memory resources of a machine.

=====Nodes=====
Now let us look at the basic underlying unit that we will be spending the semester playing with in Data Structures: the node.

There's nothing special about a node- it is something we make, it is something we choose to call a node (we could easily call them something else). It contains both useful content and the ability to reference (or point to) other nodes. It is that referencing ability that makes it so viable and valuable in Data Structures.

Nodes are the building blocks of linked lists, a peer to the array. But as we know, arrays are limited to a fixed size- we must settle on the size before first use. If we don't use it all, memory is wasted; if we don't allocate enough, we're out of luck (short of allocating a new array that is larger, and copying all values from old to new-- some languages have this as a "feature", and tend to call it a dynamic array or vector).

The linked list, on the other hand, is only allocated a node at a time. It therefore allows a best fit for the data set we are working with. For this reason (and because we are allocating memory) linked lists are associated with dynamic memory programming... as the actual memory needs are determined during runtime.

====What is / What is in the node====
So what exactly **IS** this node thing? For one, it is something we create. A self-created variable, eh? Anything that might come to mind as far as a packaging agent? A variable that can contain other variables?

If you are thinking **struct**, you'd be absolutely right. We will use struct to package its contents.

What is **in** the node? Again, this answer depends on our particular needs. For now, lacking any specific application, let us put some value in our node (we'll call it **info**, as our node //contains// a value of information).

The node, to make it practical and useful, also needs the ability to reference other nodes.
Pointers are the key for referencing other variables (via memory address storage and dereferencing), so our simple node will also contain a pointer. What type of pointer? Well, a pointer to the type that defines our node- our struct.

Here it is:

  struct node {
      char info;
      struct node *right;
  };

To make our lives easier, we'll also follow up our node struct definition with a **typedef**, merely to save on typing (it isn't needed, it is only added for our convenience). So then we'd have:

  struct node {
      char info;
      struct node *right;
  };
  typedef struct node Node;

So now, when we wish to create a new node (let's say first, which we'll also make a pointer), we have two ways we can declare it:

  struct node *first = NULL;

or:

  Node *first = NULL;

As I said, identical... just one is more convenient for us to type, due to the added typedef.

====Allocation====
Now, for any given node to be useful, we need to allocate memory for it. Otherwise, the first attempted access to any of its members is a segfault waiting to happen.

To allocate memory, we use **malloc()**, as follows (we'll assign memory for our **first** node, declared above, to reference):

  first = (Node *) malloc (sizeof(Node));

**malloc()** is type generic: it returns a **void *** pointer to raw memory, and in C that memory needs to be associated with a specific format before we can use it. Strictly speaking, C converts a **void *** to a **Node *** implicitly, so the cast shown here is optional (C++, on the other hand, requires it); we write it out to make explicit that the compiler is to apply the rules of the node struct we just created to that memory.

====Putting a value in the node====
Assigning (and retrieving) information from the node is merely a struct operation:

  // to assign
  first -> info = 12;  // first is a pointer to a struct, so we use the structure pointer arrow
  // to retrieve
  printf("first's value is %hhd\n", first -> info);
  // dealing with the next node, the 'right' pointer- set it to an initial sane value
  first -> right = NULL;

====Linking our node to another node====
The key to everything in a list is linking one node to the next.
This is also the part where confusion sets in, due to the level of abstraction at play. I highly recommend drawing pictures to help you in tracing out what is going on, especially as you are first learning this... and likely throughout the semester. And by drawing pictures, I mean take out a sheet of paper and pen/pencil, and draw nodes (use circles), write the "value" in the circle, and draw one-way arrows to identify where any pointers point.

At present, we have a node called first that we've allocated memory to and whose contents we've altered. That diagram would look as follows:

{{ :haas:spring2015:data:projects:initalnode.jpg |initial node diagram}}

Notice how all the elements of the node are dealt with (both the value and the pointer to the next node, **right**). And //first//, being a mere pointer to a struct, is a name that points to (because it merely contains the address of) the memory region we **malloc()**'ed and are storing our struct in.

Now, to link to another node (and put in, say, a 37 for its value) we'd do something along these lines:

  first -> right = (Node *) malloc (sizeof(Node));
  first -> right -> info = 37;
  first -> right -> right = NULL;

And don't forget to update your diagram:

{{ :haas:spring2015:data:projects:nextnode.jpg |}}

====Use variables to make it easier to traverse the list====
Since we need to keep a placeholder on our allocated memory, **first** is intended to be a more or less immovable aspect of our list (it is our link to everything- we don't want to adjust it unless we absolutely need to).

You may be noticing the potential for some very long code about to happen (what if we wanted to add a third node... each first -> right would become first -> right -> right, and so on).

But there's a way to keep it simple (but ambiguous, at least without an updated diagram)... and that is to just use another variable, whose job is to be more of a temporary placeholder. We shall call it **tmp**.
Here is that same node construction logic, redone using an additional **tmp** node pointer, and also adding in a third node (containing the value 8):

  Node *first = NULL, *tmp = NULL;
  first = (Node *) malloc (sizeof(Node));
  tmp = first;
  tmp -> info = 12;
  tmp -> right = NULL;
  tmp -> right = (Node *) malloc (sizeof(Node));
  tmp = tmp -> right;
  tmp -> info = 37;
  tmp -> right = NULL;
  tmp -> right = (Node *) malloc (sizeof(Node));
  tmp = tmp -> right;
  tmp -> info = 8;
  tmp -> right = NULL;

Our node diagram now looks as follows (but ideally, you'd have been updating it line by line as this program went along-- do not wait "until the end".... build and update your diagram as changes are happening):

{{ :haas:spring2015:data:projects:thirdnodeandtmp.jpg |}}

=====Procedure for bootstrapping=====

====Grabit Integration====
To "grab" it, type the following when in your ~/src/data directory, or wherever you'd like to set up your project:

  lab46:~/src/data$ grabit data sln1
  make: Entering directory '/var/public/SEMESTER/data/sln1'
  Commencing copy process for SEMESTER data project sln1
   -> Creating project sln1 directory tree ... OK
   -> Copying sln1 project files ... OK
   -> Synchronizing sln1 project revision level ... OK
   -> Establishing sane file permissions for sln1 ... OK
  *** Copy Complete! You may now switch to the '/home/USER/src/data/sln1' directory
  make: Leaving directory '/var/public/SEMESTER/data/sln1'
  lab46:~/src/data$

NOTE: You do NOT want to do this on a populated sln1 project directory-- it will overwrite files.

And, of course, your basic compile and clean-up operations via the Makefile.
When done, be sure to return to your home directory and wander in to your local copy of the **sln1** project: lab46:$ cd ~/src/data/sln1 lab46:~/src/data/sln1$ **NOTE:** If you move **sln1** to a different directory, you **MUST** retain the **sln1** name for the directory-- there's a lot of administrative logic helping to make our lives easier that is based on that specific name for the project directory. ====Overview==== You'll see various files and directories located here (one regular file, **Makefile**, and 6 directories). The directory structure (note, not all these directories may yet be present) for the project is as follows: * **app**: subdirectory tree containing applications/demos using Data Structures implementations * **node**: location of node end-user applications * **bin**: compiled, executable programs will reside here * **inc**: project-related header files (which we can **#include**) are here * **lib**: compiled, archived object files (aka libraries) will reside here * **src**: subdirectory tree containing our Data Structure implementations * **node**: location of our node implementation * **list**: location of our linked list implementation (manipulation of nodes) * **stack**: location of our stack implementation (manipulation of lists) * **queue**: location of our queue implementation (a different manipulation of lists) * ... * **unit**: subdirectory tree containing unit tests, helping to verify correct implementation * **node**: node-related unit tests will be here * **list**: list-related unit tests will be here * ... =====Operating===== The project is driven by a fleet of optimized **Makefile**s, which will facilitate the compiling process for us. Each Makefile plays a unique role (the closer the Makefile is to the source code, the more specialized it becomes). 
The base-level Makefile is used to enact whole-project actions, such as initiating a compile, cleaning the project directory tree of compiled and object code, submitting projects, or applying bug-fixes or upgrading to other projects. Running **make help** will give you a list of available options: lab46:~/src/data/sln1$ make help ****************[ Data Structures Node Implementation ]***************** ** make - build everything (libs, units, apps) ** ** make debug - build everything with debug symbols ** ** make check - check implementation for validity ** ** ** ** make libs - build all supporting libraries ** ** make libs-debug - build all libraries with debug symbols ** ** make apps - build unit tests ** ** make apps-debug - build unit tests with debugging symbols ** ** make units - build unit tests ** ** make units-debug - build unit tests with debugging symbols ** ** ** ** make save - create a backup archive ** ** make submit - submit assignment (based on dirname) ** ** ** ** make update - check for and apply updates ** ** make reupdate - re-apply last revision ** ** make reupdate-all - re-apply all revisions ** ** ** ** make clean - clean; remove all objects/compiled code ** ** make help - this information ** ************************************************************************ lab46:~/src/data/sln1$ In general, you will likely make the most frequent use of these options: - **make**: build library and unit tests - **make apps**: build application programs - **make debug**: build library and unit tests with debugging support - **make clean**: clean out everything so we can do a full compile Most of what you do will be some combination of those 4 options. ====Bugfixes and Updates==== Sometimes, a typo or other issue will be uncovered in the provided code you have. I will endeavor to release updates which will enable you to bring your code up-to-date with my copy. 
When a new update is available, you will start seeing the following appear as you go about using make: lab46:~/src/data/sln1$ make ********************************************************* *** NEW UPDATE AVAILABLE: Type 'make update' to apply *** ********************************************************* ... lab46:~/src/data/sln1$ When this occurs, you may want to perform a backup (and/or commit/push any changes to your repository)-- certain files may be replaced, and you do not want to lose any modifications you have made: lab46:~/src/data/sln1$ make save ... Once you have done that, go ahead and apply the update: lab46:~/src/data/sln1$ make update Update 1 COMMENCING Update 1 CHANGE SUMMARY: Fixed base and other Makefile typos Please reference errata section on project page for more information Update 1 COMPLETE Updated from revision 0 to revision 1 lab46:~/src/data/sln1$ At this point your code is up to date (obviously the above output will reflect whatever the current revision is). ====upgrades==== As the semester progresses, additional projects will be made available. When this occurs, you may notice new entries appear on the **make help** display. For example, as the deadline for the **sln1** project approaches, you will see the following appear: ... ** make reupdate-all - re-apply all revisions ** ** ** ** make upgrade-sll0 - upgrade to next project (sll0) ** ** make clean - clean; remove all objects/compiled code ** ** make help - this information ** ************************************************************************ lab46:~/src/data/sln1$ By typing **make upgrade-sll0**, your current work on **sln1** will be copied into a new **sll0** directory (a peer/sibling directory to **sln1/**), and any new files will be copied in from its project directory. 
As such, it is most advisable to have completed work on **sln1** and submitted it before upgrading to the **sll0** project, so any work you've done will be immediately available to build upon in the next project (the projects are cumulative-- **sll0** will rely on work completed in **sln1**, **sll1** (the project following **sll0**) will rely on the work done in **sll0**, etc.).

====error reporting====
To facilitate debugging and correction of errors and warnings in your code at compile time, such compiler messages will be redirected to a text file called **errors** in the base of the project directory.

You can view this file to ascertain what errors existed in the last build of the project.

With each new project build, this file is overwritten, so you always have the most up-to-date version of compile-time information.

=====Project Task=====
For this project, you are responsible for the following:

  * implementing the node library (**src/node/**)
    * **mknode()**
    * **cpnode()**
    * **rmnode()**
  * completing some sample applications making use of the node library (**app/node/**)
    * **node-app-arrtolist.c**
    * **node-app-display.c**
    * **node-app-display2.c**

====node library====
In **src/node/**, you will find 4 files: **mk.c**, **cp.c**, **rm.c**, and a **Makefile**

Take a look at these files. These are currently the skeleton functions which will be compiled and archived into the node library (**libnode.a**) that we will be using in this and future projects.

Figure out what is going on, make sure you understand it.

There are 3 functions in the node library:

  * **mknode()** - creates and initializes a new node, eliminating your need to manually run **malloc()** for new nodes
  * **cpnode()** - duplicates an existing node
  * **rmnode()** - removes/deallocates (frees the memory allocated to) a node

None of these files denotes an entire runnable program. These are merely standalone functions.
The various programs under the **app/** and **unit/** directories will use these functions in addition to their application logic to create complete executable programs.

You will also notice there are function prototypes for these node library functions in the **node.h** header file, located in the **inc/** subdirectory, which all the related programs you'll be playing with in this project **#include**.

The prototypes (taken right from **inc/node.h**) are as follows:

  Node *mknode(char  );  // allocate new node containing value
  Node *rmnode(Node *);  // deallocate node
  Node *cpnode(Node *);  // duplicate node

This is your API for the node library. In order to use the node library three things need to happen:

  * you must **#include "node.h"** (generally already done for you in this project)
  * you must link against **lib/libnode.a** (the Makefiles will take care of this for you)
  * you must call the functions providing the appropriate arguments and handling the return values

In general, this is no different than what you've already done, each and every time you've used **printf()**, **scanf()**, **atoi()**, **sqrt()**, etc. The only difference is that, until now, you haven't actually had the code right in front of you. These functions all work the same way: the same conditions have to be met before they can be used.

The compiler does a lot of behind-the-scenes work (linking against the C standard library by default, so all you have to do is include **stdio.h** and/or **stdlib.h**).

If you've ever played with the math library, you've had a slightly closer look, as such code wouldn't compile with only an include of **math.h**; you also needed to add a **-lm** on the compiler command-line.

Again, same details apply here, only the Makefile system automates the library linking. All we have to do is **#include** the appropriate files.
====Node application programs==== Upon successful implementation of the node library, take a look in **app/node/**, which will have (among others) the following files: * **node-app-display.c** * **node-app-display2.c** * **node-app-arrtolist.c** Take a look at these programs; your task is to complete them according to the directions located in comments within. This further works with the activities you've been doing on the first two projects- first with an array, then as a diagram/pseudocode, and now as syntactically correct C code. ===node-app-arrtolist=== As a means of testing your understanding, this program sets up a pre-existing array, filled with values, and displays it to STDOUT. Your task is to add in logic that builds a list, one node at a time, containing the same values (and in the same order) as is found in that array, and to then display the linked list to STDOUT, where we should see identical information. Sample output of completed code should look like: lab46:~/src/data/sln1/bin$ ./node-app-arrtolist Array: 3 1 4 1 5 9 2 6 5 3 5 8 9 7 List: 3 1 4 1 5 9 2 6 5 3 5 8 9 7 lab46:~/src/data/sln1/bin$ As the array is defined with set values, your output, when complete and correct, should always be the same. This tends to be a good exercise in demonstrating you understand conceptually what is going on and can perform the necessary node manipulations to pull it off. Again, be sure to use node library functions (like **mknode()**) in this program. ===node-app-display2=== In the **node-app-display** program you had to implement the list display functionality- effectively putting pseudocode written in the previous project to work as usable C code. Here, we work with that same idea, only we change a few things around structurally in the program- the final output should still be the same, but the code to produce it will be different. 
Basically, there are three items for you to address:

  * convert your raw **malloc()** calls and node initializations to **mknode()** calls. There should be 0 instances of **malloc()** in your final code.
  * move your display code into a dedicated **display()** function, which takes as a parameter a Node pointer pointing to the preferred beginning of the list you'd like to display.
  * This starts warming us up to future projects, where we'll be using and calling lots of functions

The output should be the same as experienced in **node-app-display** when completed.

====Node library unit tests====
In **unit/node/**, you will find 3 files (along with a **Makefile**):

  * **unit-cpnode.c** - unit test for **cpnode()** library function
  * **unit-mknode.c** - unit test for **mknode()** library function
  * **unit-rmnode.c** - unit test for **rmnode()** library function

These are complete runnable programs (when compiled, and linked against the node library, which is all handled for you by the **Makefile** system in place).

Of particular importance, I want you to take a close look at:

  * the source code to each of these unit tests
    * the purpose of these programs is to validate the correct functionality of the respective library functions
    * follow the logic
    * make sure you understand what is going on
    * ask questions to get clarification!
  * the output from these programs once compiled and run
    * analyze the output
    * make sure you understand what is going on
    * ask questions to get clarification!

=====Building the code=====
You've made changes to your node library implementation, or **node-app-display.c**, and are ready to see your results. What do we do?
First, change back to the base of the project: lab46:~/src/data/sln1/src/node$ cd ~/src/data/sln1 lab46:~/src/data/sln1$ **OR:** You may want to have **two** terminals open- in one you are situated in **~/src/data/sln1/src/node/** editing away, and in the other you are in **~/src/data/sln1/**; this way you can take care of development activities AND easily check your results, without constantly navigating back and forth between various locations. ====cleaning things out==== If you've already done this a few times, you may want to clean things out and do a fresh compile (never hurts, and might actually fix some problems): lab46:~/src/data/sln1$ make clean ====compile project==== Next, compile the library and unit tests: lab46:~/src/data/sln1$ make or, compile with debugging support (**make** OR **make debug**): lab46:~/src/data/sln1$ make debug Once the library is built, you can then build the application programs. Trying to build them before you've built the library will cause errors. ====Our binaries==== Compiled executables go in the **bin** directory, so if we change into there and take a look around we see: lab46:~/src/data/sln1$ cd bin lab46:~/src/data/sln1/bin$ ls node-app-display node-app-test node-app-test2 unit-cpnode unit-mknode unit-rmnode verify-cpnode.sh verify-mknode.sh verify-rmnode.sh lab46:~/src/data/sln1/bin$ There may be others, and in time more and more files will appear here. ====Run the program==== To run **node-app-display**, we'd do the following (specify a relative path to the executable): lab46:~/src/data/sln1/bin$ ./node-app-display The program will now run, and do whatever it was programmed to do. ====Sample Output==== For example, let's say we ran the program and put the values 6, 17, 23, 4, 56, and 2 in the list. 
Your completed program would look like this when run: lab46:~/src/data/sln1/bin$ ./node-app-display Enter a value (-1 to quit): 6 Enter a value (-1 to quit): 17 Enter a value (-1 to quit): 23 Enter a value (-1 to quit): 4 Enter a value (-1 to quit): 56 Enter a value (-1 to quit): 2 Enter a value (-1 to quit): -1 6 -> 17 -> 23 -> 4 -> 56 -> 2 -> NULL lab46:~/src/data/sln1/bin$ **NOTE**: This is just example input. Not only should your program work with this, but lists of any length, containing any arrangement of valid values. =====Expected Results===== To assist you in verifying a correct implementation, a fully working implementation of the node library should resemble the following: ====node library==== Here is what you should get for the node library: lab46:~/src/data/sln1$ make check ==================================================== = Verifying Singly-Linked Node Functionality = ==================================================== [mknode] Total: 4, Matches: 4, Mismatches: 0 [cpnode] Total: 5, Matches: 5, Mismatches: 0 [rmnode] Total: 2, Matches: 2, Mismatches: 0 ==================================================== [RESULTS] Total: 11, Matches: 11, Mismatches: 0 ==================================================== lab46:~/src/data/sln1$ Note that there are sub-scripts that can also be manually run (as well as the unit tests themselves)... the more specific you get, the more detailed information you will receive (useful for debugging). This top-level **make check** action gives you the 30,000 foot view... what is the current status of your node library implementation? From there, you take whatever appropriate action is necessary. =====Submission===== {{page>haas:fall2018:common:submitblurb#DATA&noheader&nofooter}}