~~TOC~~

\\ \\ \\
Corning Community College \\
Data Structures \\
\\
Creating and Building Multi-File Sources

=====Objective=====
To look at how to split up a piece of monolithic C/C++ source code, and how to facilitate building the resulting executable.

=====Background=====
Up to this point, you have mostly been tasked with developing monolithic source file programs; that is, the entirety of your coding has taken place within a single source file.

While this works, and has served you well so far, the projects will get more and more involved as you progress in programming courses. Maintaining an entire codebase within a single file becomes inefficient: you constantly scroll down to the area you are working on, and when you work in a group, having everyone make distinct changes to a single source file creates a level of management you'd probably prefer to avoid.

Luckily, there are solutions to these problems, and we will be exploring them here.

First up, we will explore how to split up a monolithic source file into several distinct source files.

=====Program 1: singlystack.c=====
The other week, during our singly linked list explorations, we finally arrived at the implementation of a stack. This implementation made use of 2 functions in addition to **main()**: **pop()** and **push()**.

While this program is still relatively small, it serves as an excellent example for learning how to split up source code into multiple files.
====Step 0: Preparation====
To get prepared, do the following:

  - In **~/src/**, create a new subdirectory (I'll use **~/src/multifile/** in my examples)
  - Once created, place a fresh copy of **/var/public/data/singlystack.c** into your new **~/src/** subdirectory

An example set of command lines is given here:

<code>
lab46:~$ cd src
lab46:~/src$ mkdir multifile
lab46:~/src$ cd multifile
lab46:~/src/multifile$ cp /var/public/data/singlystack.c .
lab46:~/src/multifile$ 
</code>

====Step 1: Declarations in the header====
Looking at the code in **singlystack.c**, a good first step is to split off the global declarations (things that take the form of global variables and function prototypes) into a separate file.

What's more, any file containing just these declarations, and no initializations or processing code, can be used as a header file (just like **stdio.h**, **stdlib.h**, **iostream**, or **cmath**).

First order of business is to get it into a separate file. What I like to do is make another copy of the original source file, then go in and remove the non-declarative code.

In the case of **singlystack.c**, we'll create a file called **mystack.h**, and it will be made to contain only the following:

<code c>
/*
 * mystack.h - header file for singly linked list stack implementation
 *
 */
#include <stdio.h>
#include <stdlib.h>

struct node {
    int value;
    struct node *next;
};
typedef struct node Node;

// Function prototypes
//
void push(int);
Node * pop();

// Create Node pointers for our stack
//
Node *stack, *top;
</code>

As you can see, this is pretty much the beginning of the **singlystack.c** code we worked on in class the other day.

====Step 2: Prevent duplicate declarations====
Okay, we've //just// made our header file... we need to do one additional thing to it before we proceed, so as to avoid unnecessary compiler errors claiming "already declared" symbols.

What we have to do is add some preprocessor directives so that the contents of our header file are processed just once per source file when we get to the compilation process.
So our code gets the following before any code begins:

<code c>
#ifndef __MYSTACK_H
#define __MYSTACK_H
</code>

and the following at the very end:

<code c>
#endif
</code>

After we do this, our **mystack.h** header file should look like this:

<code c>
/*
 * mystack.h - header file for singly linked list stack implementation
 *
 */
#ifndef __MYSTACK_H
#define __MYSTACK_H

#include <stdio.h>
#include <stdlib.h>

struct node {
    int value;
    struct node *next;
};
typedef struct node Node;

// Function prototypes
//
void push(int);
Node * pop();

// Create Node pointers for our stack
//
Node *stack, *top;

#endif
</code>

What's happening here is that the preprocessor directive **#ifndef** checks whether the following (and arbitrary) symbol, **__MYSTACK_H**, has been defined. If it hasn't, we **#define** it, and proceed with allowing the compiler to process the rest of the code in our header file. In the event that **__MYSTACK_H** is already defined, the rest of the header file is skipped (well, from that point to the corresponding **#endif**).

It is traditional to define preprocessor symbols in CAPITALS, and to utilize underscores. For program-specific symbols, such as the one we just created, it is common to see two underscores prefixed to the symbol name. Instead of using periods to identify the filename, substitute an underscore (again, just what I've traditionally seen). (Strictly speaking, C and C++ reserve names that begin with two underscores for the compiler and its libraries, so a plain **MYSTACK_H** is the safer choice in code you write yourself.)

Now, there is nothing magical about having named our symbol after the file... we could have called it **BOB** if we had wanted, or even just **bob**. But adhering to traditional programming conventions will dramatically increase the readability of your code if anyone else ever looks at it, or if you are working in a group.

====Step 3: Use our header file====
Okay, now direct your attention back to **singlystack.c**... everything that now makes up our header file is still residing at the top of this file.

What we want to do is remove it, leaving just the remaining code, and add an **#include** preprocessor directive for our header file.
The results should look like this (beginning of code to the line that defines the **main()** function):

<code c>
/*
 * singlystack.c - singly linked list stack implementation
 *
 */
#include "mystack.h"

int main()
</code>

Make sense? We're just pulling out some code and using files to create a sense of modularity (smaller codebases to deal with when editing).

====Step 4: Functions into separate file====
So far in our effort to split up our monolithic codebase into multiple source files, we have two distinct files:

  * singlystack.c
  * mystack.h

**mystack.h** contains all the global declarations, where **singlystack.c** contains the initializations (or definitions, in the case of our 3 functions: **main()**, **push()**, and **pop()**).

Now, to make our codebase even more modular, we are going to split our functions into two files: one containing just **main()**, and the other containing the definitions of **push()** and **pop()**.

We will call our new file **stackops.c**, and it will end up looking as follows:

<code c>
/*
 * stackops.c - singly linked list stack implementation
 *
 */
#include "mystack.h"

void push(int value)
{
    if (top == NULL)
    {
        stack = (Node *) malloc (sizeof(Node));
        top = stack;
        stack -> value = value;
        stack -> next = NULL;
    }
    else
    {
        top -> next = (Node *) malloc (sizeof(Node));
        top -> next -> next = NULL;
        top = top -> next;
        top -> value = value;
    }
}

Node * pop()
{
    Node *tmp, *tmp2;

    tmp = top;      // tmp points at top of stack
    tmp2 = stack;   // point tmp2 at bottom of stack

    if (tmp2 != NULL)  // on empty stack, don't do anything
    {
        if (tmp2 -> next != NULL)  // two or more nodes in stack
        {
            while (tmp2 -> next -> next != NULL)  // iterate to 2nd to last
            {
                tmp2 = tmp2 -> next;
            }
            tmp2 -> next = NULL;  // cap off our stack
            top = tmp2;           // tmp2 is the new top
        }
        else  // only one node in stack
        {
            stack = top = NULL;
        }
    }
    return(tmp);
}
</code>

As you look at this file, notice how we literally just lifted the **push()** and **pop()** functions out of the original file, and merely added an:

<code c>
#include "mystack.h"
</code>
at the top. This is really all there is to it.

This also leaves our original **singlystack.c** containing only the following:

<code c>
/*
 * singlystack.c - singly linked list stack implementation
 *
 */
#include "mystack.h"

int main()
{
    Node *tmp;
    int input = 0;

    printf("Enter a value (-1 to stop): ");
    scanf("%d", &input);

    stack = top = NULL;

    while (input != -1)
    {
        push(input);
        printf("Enter a value (-1 to stop): ");
        scanf("%d", &input);
    }

    printf("Stack created. Popping time.\n");

    while ((tmp = pop()) != NULL)
    {
        printf("POPPED: %d\n", tmp -> value);
        free(tmp);
    }
    printf("POPPED: NULL\n");

    return(0);
}
</code>

So the overall aims of splitting your source code into multiple files include:

  * further modularization: allows easier modification amongst a group
  * easier manipulation: less code to scroll through
  * organization: an extra effort of thought has been put into what to put where

As an exercise for the reader, go even further and split **stackops.c** into two discrete files: one containing **push()** and one containing **pop()**.

====Step 5: Compiling the code====
Obviously, one of the immediate side effects of splitting up a codebase is the increase in complexity of the compile operation. No longer can we do a simple:

<code>
lab46:~/src/multifile$ gcc -o singlystack singlystack.c
</code>

and expect everything to be pulled in as needed.

This isn't to say that some variant won't work for our purposes, but I will show you the proper way to perform a compile, which will come in handy when dealing with much more complex code bases.

We will compile as follows:

<code>
lab46:~/src/multifile$ ls
mystack.h  singlystack.c  stackops.c
lab46:~/src/multifile$ gcc -c stackops.c
lab46:~/src/multifile$ gcc -c singlystack.c
lab46:~/src/multifile$ ls
mystack.h  singlystack.c  singlystack.o  stackops.c  stackops.o
lab46:~/src/multifile$ gcc -o singlystack singlystack.o stackops.o
</code>

As you can see, we have a couple of intermediate runs of the compiler (using **-c**) to produce object files (.o), which contain the compiled version of our code.
This compiled code is not executable on the system, because it has not yet been linked together and with the system libraries. The last (and more familiar) step performs that operation, leaving us with an executable named **singlystack** that can be run, just as before.

At this point, assuming no typos, you can run **singlystack** and see that, to the end user, it operates exactly as the monolithic code base did.

NOTE: If the final link step stops with a "multiple definition" error for **stack** and **top**, you are likely using gcc 10 or newer, which no longer merges the uninitialized globals that **mystack.h** places in every file that includes it. Adding **-fcommon** to the two **gcc -c** commands restores the older behavior.

And we're done!

=====Program 2: Dots and Dashes in C++=====
As a means of exploring multiple examples, here is another sample code base we can split into multiple files.

====Step 0: Obtaining the source====
In **/var/public/data** you will find a file called **dotdash.cc**... copy this into a directory under your **~/src/** directory:

<code>
lab46:~$ cd src
lab46:~/src$ mkdir dotdash
lab46:~/src$ cd dotdash
lab46:~/src/dotdash$ cp -v /var/public/data/dotdash.cc .
`/var/public/data/dotdash.cc' -> `./dotdash.cc'
lab46:~/src/dotdash$ 
</code>

====Step 1: Creating a header====
Looking through this file, we see a couple of class definitions and a sample implementation.
Just as with our example in C, we can split off declarations into their own header file (**class.h**):

<code cpp>
#ifndef _CLASS_H
#define _CLASS_H

using namespace std;

class dot {
    public:
        dot();
        void setSize(int);
        int getSize();

    private:
        int size;
};

class dash {
    public:
        dash();
        void setLength(int);
        int getLength();

    private:
        int length;
};

#endif
</code>

====Step 2: main() function in its own file====
We'll isolate **main()** next, placing it in its own file (**main.cc**):

<code cpp>
#include <iostream>
#include "class.h"

using namespace std;

int main()
{
    int dotsize1, dotsize2, dashlength1, dashlength2, tmp;
    dot myDot;
    dash myDash1, myDash2;
    dot myDot2;

    dotsize1 = 12;
    dotsize2 = 36;
    dashlength1 = 4;
    dashlength2 = 73;

    tmp = myDot.getSize();
    cout << "myDot's size is: " << tmp << endl;
    cout << "myDash1's length is: " << myDash1.getLength() << endl;

    myDot.setSize(dotsize1);
    cout << "myDot's size is now: " << myDot.getSize() << endl;

    return(0);
}
</code>

====Step 3: dot class implementation====
A file for the dot class (**dot.cc**):

<code cpp>
#include "class.h"

dot :: dot()
{
    size = 0;
}

void dot :: setSize(int mass)
{
    size = mass;
}

int dot :: getSize()
{
    return(size);
}
</code>

====Step 4: dash class implementation====
And a file for dash (**dash.cc**):

<code cpp>
#include "class.h"

dash :: dash()
{
    length = 0;
}

void dash :: setLength(int howlong)
{
    length = howlong;
}

int dash :: getLength()
{
    return(length);
}
</code>

====Step 5: Compiling the code====
Very similar to the C code, but this time we use **g++**:

<code>
lab46:~/src/dotdash$ g++ -c main.cc
lab46:~/src/dotdash$ g++ -c dot.cc
lab46:~/src/dotdash$ g++ -c dash.cc
lab46:~/src/dotdash$ g++ -o dotdash main.o dot.o dash.o
lab46:~/src/dotdash$ 
</code>

The intermediary object files are created, then combined together to form the final executable.

=====Automating compilation with Makefiles=====
Now that we've gone through the process of splitting up source into multiple files and getting the whole thing compiled, we will look at automating that process.
Since complex code bases are the norm rather than the exception, tools were designed long ago to facilitate the task of compiling multiple files. A popular one on UNIX/Linux systems is the **Makefile** system, made available through the command **make**.

To use a **Makefile**, we must create a text file called **Makefile** in the directory containing our source files.

====Program 1: A Makefile for singlystack====
Our **Makefile** will contain the following:

<code make>
CC = gcc $(OPTS) $(LIBS)
LIBS = 
OPTS = 
OBJ = singlystack.o stackops.o
BIN = singlystack
all: singlystack

singlystack: $(OBJ)
	$(CC) $(CFLAGS) -o $(BIN) $(OBJ) $(LIBS)

clean:
	rm -f *.o $(BIN) core

default: $(BIN)
</code>

What the **Makefile** does is establish a set of rules that **make** follows to build working code from our source files. Note that the command line under each rule must begin with a TAB character, not spaces.

To put it in action, we would run **make** at the command line. To witness it in all its glory, we should first clear out any prior object/binary files.

In the **Makefile**, we put in a rule called //clean//, which, when provided as an argument to **make**, will run its command (the **rm**) to remove all the specified files (in this case, all the products of compilation, leaving the original source).

Let's do that now:

<code>
lab46:~/src/multifile$ ls
Makefile   singlystack    singlystack.o  stackops.o
mystack.h  singlystack.c  stackops.c
lab46:~/src/multifile$ make clean
rm -f *.o singlystack core
lab46:~/src/multifile$ ls
Makefile  mystack.h  singlystack.c  stackops.c
lab46:~/src/multifile$ 
</code>

So as you can see, **make clean** clears the slate, leaving just our source files.

To see **make** in action building our code, simply run **make**:

<code>
lab46:~/src/multifile$ make
gcc -c -o singlystack.o singlystack.c
gcc -c -o stackops.o stackops.c
gcc -o singlystack singlystack.o stackops.o
lab46:~/src/multifile$ 
</code>

The executable **singlystack** once again exists, and we can run it, just as before.

Hopefully you will see that with the use of a **Makefile**, some of our development tasks can be automated.
And this is by no means the extent of what **make** can do... there is a whole syntax available for use in a **Makefile** that goes far beyond what we currently need for our purposes.

I encourage you to study this **Makefile**, and even modify it for use in other programs you are working on.

Additionally, everyone should already have a **Makefile** residing in the base of their **~/src/** directory... that **Makefile** is a bit simpler, and by reading the comments within it you can figure out how to use it with monolithic source files. Be sure to check out that file as well, as it includes comments on many of the options in our new **Makefile** created here.

**make** can be used to automate pretty much any code compilation. We just commonly use it with C and C++ programs.

NOTE: Unlike most conventions in UNIX, the name **Makefile** conventionally DOES start with a capital letter. GNU **make** will also pick up an all-lowercase **makefile**, but any other name has to be given explicitly with **make -f**, so sticking with **Makefile** keeps things simple.

====Program 2: A Makefile for dotdash====
Our **Makefile** will contain the following:

<code make>
CXX = g++ $(INC)
CFLAGS = 
OBJ = main.o dot.o dash.o
BIN = dotdash
all: dotdash

dotdash: $(OBJ)
	$(CXX) $(CFLAGS) -o $(BIN) $(OBJ) $(LIBS)

clean:
	rm -f *.o $(BIN) core

default: $(BIN)
</code>

Notice the **Makefile** appears pretty much the same, except we use the variable **CXX** here instead of **CC**, to indicate the code is in C++. Aside from that, the rules are pretty much identical.
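
=====Aside: sharing globals between files=====
Looking back at Program 1, **mystack.h** contains the line **Node *stack, *top;**, so every source file that includes the header gets its own copy of those two variables, and some toolchains refuse to link the result. A common way around this is to only //declare// the pointers in the header with **extern**, and to //define// them in exactly one source file. The following is a minimal sketch of that idea, not the version we wrote in class: it is shown as a single listing with comments marking which part would live in which file, the guard name **MYSTACK_H** is my own choice, and **push()** and **pop()** are lightly restructured (with an allocation check and an **assert()** added) while keeping the same behavior as in **stackops.c**.

```c
#include <assert.h>
#include <stdlib.h>

/* ---- this part would live in mystack.h ---- */
#ifndef MYSTACK_H            /* guard name is an assumption of this sketch */
#define MYSTACK_H

struct node {
    int value;
    struct node *next;
};
typedef struct node Node;

void  push(int);
Node *pop(void);

/* extern: a declaration only, no storage is reserved here */
extern Node *stack, *top;

#endif

/* ---- this part would live in stackops.c, after #include "mystack.h" ---- */

/* The one and only definition of the shared pointers */
Node *stack = NULL, *top = NULL;

void push(int value)
{
    Node *n = malloc(sizeof(Node));
    if (n == NULL)               /* allocation failed: leave stack untouched */
        return;
    n->value = value;
    n->next  = NULL;

    /* invariant: both pointers are NULL, or neither is */
    assert((stack == NULL) == (top == NULL));

    if (top == NULL)             /* empty stack: new node is also the bottom */
        stack = n;
    else                         /* otherwise hang it after the current top */
        top->next = n;
    top = n;
}

Node *pop(void)
{
    Node *tmp  = top;            /* node handed back to the caller */
    Node *walk = stack;          /* starts at the bottom of the stack */

    if (walk == NULL)            /* empty stack: nothing to pop */
        return NULL;

    if (walk->next == NULL) {    /* only one node in the stack */
        stack = top = NULL;
    } else {
        while (walk->next->next != NULL)   /* stop at second to last */
            walk = walk->next;
        walk->next = NULL;       /* cap off the stack */
        top = walk;              /* second to last is the new top */
    }
    return tmp;
}
```

With this arrangement, **singlystack.c** would still just **#include "mystack.h"** and use **stack**, **top**, **push()**, and **pop()** as before; the caller remains responsible for calling **free()** on each node that **pop()** returns.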