Corning Community College
Data Structures
Creating and Building Multi-File Sources
This document looks at how to split up a piece of monolithic C/C++ source code, and how to facilitate building the resulting executable.
Up to this point, you have predominantly developed monolithic, single-file programs; that is, the entirety of your coding has taken place within a single source file.
While this works, and has served you well so far, as you progress in programming courses, the projects will get more and more involved. Maintaining an entire codebase within a single file can start to get a little inefficient.
You may find yourself constantly scrolling way down to the area you are working on, or, when working in a group, having everyone make distinct changes to a single source file, which creates a level of management you'd probably prefer to avoid.
Luckily, there are solutions to these problems, and we will be exploring them here.
First up, we will explore how to split up a monolithic source file into several distinct source files.
The other week, during our singly linked list explorations, we finally arrived at the implementation of a stack. This implementation made use of two functions in addition to main(): pop() and push().
While this program is still relatively small, that makes it an excellent example for learning how to split source code into multiple files.
To get prepared, create a working directory under ~/src and copy singlystack.c into it from /var/public/data.
An example set of command lines is given here:
lab46:~$ cd src
lab46:~/src$ mkdir multifile
lab46:~/src$ cd multifile
lab46:~/src/multifile$ cp /var/public/data/singlystack.c .
lab46:~/src/multifile$
Looking at the code in singlystack.c, a good first step to take is to split off global declarations (things that take the form of global variables and function prototypes) into a separate file.
What's more, any file containing just these declarations, and no initializations or processing code, can be used as a header file (just like stdio.h, stdlib.h, iostream, or cmath).
First order of business is to get it in a separate file. What I like to do is just make another copy of the original source file, go in and remove the non-declarative code. In the case of singlystack.c, we'll create a file called mystack.h and it will be made to contain only the following:
/*
 * mystack.h - header file for singly linked list stack implementation
 *
 */
#include <stdio.h>
#include <stdlib.h>

struct node {
    int value;
    struct node *next;
};
typedef struct node Node;

// Function prototypes
//
void push(int);
Node * pop();

// Create Node pointers for our stack
//
Node *stack, *top;
As you can see, this is pretty much the beginning of the singlystack.c code we worked on in class the other day.
Okay, we've just made our header file… we need to do one additional thing to it before we proceed, so as to avoid compiler errors about symbols being defined more than once. What we have to do is add some preprocessor directives so that the contents of our header file are processed only once per source file, no matter how many times it ends up being included.
So our code gets the following before any code begins:
#ifndef __MYSTACK_H
#define __MYSTACK_H
and the following at the very end:
#endif
After we do this, our mystack.h header file should look like this:
/*
 * mystack.h - header file for singly linked list stack implementation
 *
 */
#ifndef __MYSTACK_H
#define __MYSTACK_H

#include <stdio.h>
#include <stdlib.h>

struct node {
    int value;
    struct node *next;
};
typedef struct node Node;

// Function prototypes
//
void push(int);
Node * pop();

// Create Node pointers for our stack
//
Node *stack, *top;

#endif
What's happening here is that the preprocessor directive #ifndef is checking to see if the following (and arbitrary) symbol, __MYSTACK_H, exists. If it doesn't, we #define it, and proceed with allowing the compiler to process the rest of the code in our header file.
In the event that __MYSTACK_H does exist, the rest of the header file is skipped (well, from that point to the corresponding #endif).
It is traditional to define preprocessor symbols in CAPITALS, and to utilize underscores. For program-specific symbols, such as the one we just created, I prefix two underscores to the symbol name, and substitute an underscore for the period in the filename (again, just what I've traditionally seen). Be aware that the C standard formally reserves identifiers beginning with two underscores for the compiler and its libraries, so many projects write the guard as MYSTACK_H instead.
Now, there is nothing magical about having named our symbol after the file… we could have called it BOB if we had wanted, or even just bob. But it is worth maintaining traditional programming conventions: if anyone else ever looks at your code, or you are working in a group, adhering to these conventions will dramatically increase the readability of your code.
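A minimal sketch of the guard mechanism, squeezed into a single file for illustration: the same guarded text appears twice, as if a header had been included twice. The names DEMO_POINT_H, struct point, and point_sum() are made up for this example and are not part of the stack code.

```c
/* First "inclusion": DEMO_POINT_H is not yet defined, so this text is kept. */
#ifndef DEMO_POINT_H
#define DEMO_POINT_H
struct point {
    int x;
    int y;
};
#endif

/* Second "inclusion": DEMO_POINT_H now exists, so everything up to the
 * matching #endif is skipped. Without the guard, struct point would be
 * defined twice and the compiler would reject the file. */
#ifndef DEMO_POINT_H
#define DEMO_POINT_H
struct point {
    int x;
    int y;
};
#endif

/* Hypothetical helper that uses the (singly defined) struct. */
int point_sum(struct point p)
{
    return p.x + p.y;
}
```

Deleting the two #ifndef/#define/#endif groups and compiling again should produce a redefinition error for struct point, which is the kind of complaint the guard prevents.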
Okay, now let's direct our attention back to singlystack.c… all the code that now makes up our header file is still residing at the top of this file.
What we want to do is remove it, leaving just the remaining code, and add an include preprocessor directive for our header file.
The results should look like this (beginning of code to the line that defines the main() function):
/*
 * singlystack.c - singly linked list stack implementation
 *
 */
#include "mystack.h"

int main()
Make sense? We're just pulling out some code and using files to create a sense of modularity (smaller codebases to deal with when editing).
So far in our effort to split up our monolithic codebase into multiple source files, we have two distinct files:
mystack.h contains all the global declarations, while singlystack.c contains the initializations (or definitions, in the case of our 3 functions: main(), push(), and pop()).
Now, to make our codebase even more modular, we are going to split our functions into two files: one containing just main(), and the other containing the definitions of push() and pop().
We will call our new file stackops.c, and it will end up looking as follows:
/*
 * stackops.c - singly linked list stack implementation
 *
 */
#include "mystack.h"

void push(int value)
{
    if (top == NULL)
    {
        stack = (Node *) malloc (sizeof(Node));
        top = stack;
        stack -> value = value;
        stack -> next = NULL;
    }
    else
    {
        top -> next = (Node *) malloc (sizeof(Node));
        top -> next -> next = NULL;
        top = top -> next;
        top -> value = value;
    }
}

Node * pop()
{
    Node *tmp, *tmp2;

    tmp = top;              // tmp points at top of stack
    tmp2 = stack;           // point tmp2 at bottom of stack

    if (tmp2 != NULL)       // on empty stack, don't do anything
    {
        if (tmp2 -> next != NULL)   // two or more nodes in stack
        {
            while (tmp2 -> next -> next != NULL)    // iterate to 2nd to last
            {
                tmp2 = tmp2 -> next;
            }
            tmp2 -> next = NULL;    // cap off our stack
            top = tmp2;             // tmp2 is the new top
        }
        else                        // only one node in stack
        {
            stack = top = NULL;
        }
    }

    return(tmp);
}
As you look at this file, notice how we literally just lifted the push() and pop() functions out of the original file, and merely added an:
#include "mystack.h"
at the top. This is really all there is to it.
This also leaves our original singlystack.c containing only the following:
/*
 * singlystack.c - singly linked list stack implementation
 *
 */
#include "mystack.h"

int main()
{
    Node *tmp;
    int input = 0;

    printf("Enter a value (-1 to stop): ");
    scanf("%d", &input);

    stack = top = NULL;

    while (input != -1)
    {
        push(input);
        printf("Enter a value (-1 to stop): ");
        scanf("%d", &input);
    }

    printf("Stack created. Popping time.\n");

    while ((tmp = pop()) != NULL)
    {
        printf("POPPED: %d\n", tmp -> value);
        free(tmp);
    }
    printf("POPPED: NULL\n");

    return(0);
}
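So the overall aims of splitting your source code into multiple files are the ones we started with: smaller files that are easier to navigate while editing, and less friction when several people are changing the same codebase.
As an exercise to the reader, go even further and split stackops.c into two discrete files: one containing push() and one containing pop().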
Obviously one of the immediate side effects of splitting up a codebase is the increase in complexity of the compile operation. No longer can we do a simple:
lab46:~/src/multi$ gcc -o singlystack singlystack.c
and expect everything to be pulled in as needed. This isn't to say that some variant won't work for our purposes, but I will show you the proper way to perform a compile, which will come in handy when dealing with much more complex code bases.
We will compile as follows:
lab46:~/src/multi$ ls
mystack.h  singlystack.c  stackops.c
lab46:~/src/multi$ gcc -c stackops.c
lab46:~/src/multi$ gcc -c singlystack.c
lab46:~/src/multi$ ls
mystack.h  singlystack.c  singlystack.o  stackops.c  stackops.o
lab46:~/src/multi$ gcc -o singlystack singlystack.o stackops.o
As you can see, we have a couple of intermediate runs of the compiler to produce object files (.o), which contain the compiled version of our code. This compiled code is not yet executable on the system, because it has not been linked, either with the other object files or with the system libraries. The last (and more familiar) step performs that operation, leaving us with an executable named singlystack that can be run, just as before.
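To see why each file can be compiled on its own, here is a minimal single-file sketch of the idea. The functions triple() and triple_plus_one() are hypothetical and exist only for this illustration; the point is that a prototype is all the compiler needs in order to compile a call, and the definition can be supplied later.

```c
/* Prototype: in a multi-file build this line would arrive via a header. */
int triple(int);

/* Compiles with only the prototype in view, just as singlystack.c compiles
 * its calls to push() and pop() without seeing their bodies. */
int triple_plus_one(int n)
{
    return triple(n) + 1;
}

/* Definition: in a multi-file build this would sit in a different .c file,
 * and the link step would connect the call above to this code. */
int triple(int n)
{
    return 3 * n;
}
```

In the stack example, mystack.h plays the role of the first line, and stackops.o supplies the definitions at link time.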
At this point, assuming no typos, you can run singlystack and see that, to the end user, it operates exactly as the monolithic version did.
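One caveat, offered as a hedged aside: mystack.h places the line creating the stack and top pointers in a header that two .c files include. Older GCC releases merge such duplicate tentative definitions at link time, but GCC 10 and later default to -fno-common and may report a "multiple definition" error at the link step. A common remedy, sketched here in one file and not part of the original code, is to declare the variables extern in the header and define them in exactly one source file. stack_is_empty() is a hypothetical helper added only for illustration.

```c
#include <stddef.h>

/* --- what would live in the header --- */
struct node {
    int value;
    struct node *next;
};
typedef struct node Node;

extern Node *stack, *top;   /* declaration only: no storage is created here */

/* --- what would live in exactly one .c file (an assumption: stackops.c) --- */
Node *stack = NULL, *top = NULL;    /* the single definition */

/* Hypothetical helper showing the shared variables in use. */
int stack_is_empty(void)
{
    return top == NULL;
}
```

Alternatively, passing -fcommon to gcc should restore the older merging behavior, though the extern form is the more portable choice.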
And we're done!
As a means of exploring multiple examples, here is another sample code base we can split into multiple files.
In /var/public/data you will find a file called dotdash.cc… copy this into a directory under your ~/src/ directory:
lab46:~$ cd src
lab46:~/src$ mkdir dotdash
lab46:~/src$ cd dotdash
lab46:~/src/dotdash$ cp -v /var/public/data/dotdash.cc .
`/var/public/data/dotdash.cc' -> `./dotdash.cc'
lab46:~/src/dotdash$
Looking through this file, we see a couple of class definitions and a sample implementation. Just as with our example in C, we can split off declarations into their own header file, class.h:
#ifndef _CLASS_H
#define _CLASS_H

using namespace std;

class dot
{
    public:
        dot();
        void setSize(int);
        int getSize();

    private:
        int size;
};

class dash
{
    public:
        dash();
        void setLength(int);
        int getLength();

    private:
        int length;
};

#endif
We'll isolate main() next, placing it in its own file, main.cc:
#include <iostream>
#include "class.h"

using namespace std;

int main()
{
    int dotsize1, dotsize2, dashlength1, dashlength2, tmp;
    dot myDot;
    dash myDash1, myDash2;
    dot myDot2;

    dotsize1 = 12;
    dotsize2 = 36;
    dashlength1 = 4;
    dashlength2 = 73;

    tmp = myDot.getSize();
    cout << "myDot's size is: " << tmp << endl;
    cout << "myDash1's length is: " << myDash1.getLength() << endl;

    myDot.setSize(dotsize1);
    cout << "myDot's size is now: " << myDot.getSize() << endl;

    return(0);
}
A file for the dot class, dot.cc:
#include "class.h"

dot :: dot()
{
    size = 0;
}

void dot :: setSize(int mass)
{
    size = mass;
}

int dot :: getSize()
{
    return(size);
}
And a file for dash, dash.cc:
#include "class.h"

dash :: dash()
{
    length = 0;
}

void dash :: setLength(int howlong)
{
    length = howlong;
}

int dash :: getLength()
{
    return(length);
}
The build is very similar to the C version, but this time we use g++:
lab46:~/src/dotdash$ g++ -c main.cc
lab46:~/src/dotdash$ g++ -c dot.cc
lab46:~/src/dotdash$ g++ -c dash.cc
lab46:~/src/dotdash$ g++ -o dotdash main.o dot.o dash.o
lab46:~/src/dotdash$
The intermediary object files are created, then combined together to form the final executable.
Now that we've gone through the process of splitting up source into multiple files and getting the whole thing compiled, we will look at automating that process.
Since complex code bases are the norm rather than the exception, tools were designed long ago that facilitate the task of compiling multiple files. A popular one used on UNIX/Linux systems is the Makefile system made available with the command make.
To use a Makefile, we must create a text file called Makefile in the directory containing our source files.
Our Makefile will contain the following:
CC = gcc $(OPTS) $(LIBS)
LIBS =
OPTS =
OBJ = singlystack.o stackops.o
BIN = singlystack

all: singlystack

singlystack: $(OBJ)
	$(CC) $(CFLAGS) -o $(BIN) $(OBJ) $(LIBS)

clean:
	rm -f *.o $(BIN) core

default: $(BIN)
What the Makefile does is establish a set of rules that make follows to invoke the compiler and produce working code. Each command line beneath a rule must begin with a tab character, not spaces. To put it in action, we would run make at the command line.
To witness it in all its glory, we should clear out any prior object/binary files. In the Makefile, we put in a rule called clean, which when provided as an argument to make, will run that command (the rm), to remove all the specified files (in this case, all the products of compilation, leaving the original source).
Let's do that now:
lab46:~/src/multi$ ls
Makefile   singlystack    singlystack.o  stackops.o
mystack.h  singlystack.c  stackops.c
lab46:~/src/multi$ make clean
rm -f *.o singlystack core
lab46:~/src/multi$ ls
Makefile  mystack.h  singlystack.c  stackops.c
lab46:~/src/multi$
So as you can see, make clean clears the slate, leaving just our source files.
To see make in action building our code, simply run make:
lab46:~/src/multi$ make
gcc -c -o singlystack.o singlystack.c
gcc -c -o stackops.o stackops.c
gcc -o singlystack singlystack.o stackops.o
lab46:~/src/multi$
The executable singlystack once again exists, and we can run it, just as before.
Hopefully you will see that with the use of a Makefile, some of our development tasks can be automated. And this is by no means the extent of what make can do… there is a whole syntax available for use in a Makefile, that can go far beyond what we currently need for our purposes.
I encourage you to study this Makefile, and even modify it for use in other programs you are working on.
Additionally, everyone should already have a Makefile residing in the base of their ~/src/ directory… this Makefile is a bit simpler, and by reading the comments within, one can figure out how to use it with monolithic source files. Be sure to check out that file as well, as it includes comments on many of the options in our new Makefile created here.
make can be used to automate pretty much any code compilation. We just commonly use it with C and C++ programs.
NOTE: Unlike most file names in UNIX, a Makefile conventionally DOES start with a capital letter. When run without arguments, GNU make looks for a file named GNUmakefile, makefile, or Makefile, in that order; a file with any other name will not be found unless you point make at it with the -f option.
For the dotdash example, our Makefile will contain the following:
CXX = g++ $(INC)
CFLAGS =
OBJ = main.o dot.o dash.o
BIN = dotdash

all: dotdash

dotdash: $(OBJ)
	$(CXX) $(CFLAGS) -o $(BIN) $(OBJ) $(LIBS)

clean:
	rm -f *.o $(BIN) core

default: $(BIN)
Notice the Makefile appears pretty much the same, except we use the variable CXX here instead of CC, since CXX is the variable make conventionally uses for the C++ compiler. Aside from that, the rules are pretty much identical.