



Corning Community College


Data Structures



Creating and Building Multi-File Sources

Objective

To look at how to split a monolithic piece of C/C++ source code into multiple files, and how to build the resulting executable.

Background

Predominantly, up to this point, you have been tasked with and have developed monolithic source file programs; that is, the entirety of your coding has taken place within a single source file.

While this works, and has served you well so far, as you progress in programming courses, the projects will get more and more involved. Maintaining an entire codebase within a single file can start to get a little inefficient.

Whether it is constantly scrolling way down to the area you are working on, or working in a group where everyone is making distinct changes to a single source file, the result is a level of management overhead you'd probably prefer to avoid.

Luckily, there are solutions to these problems, and we will be exploring them here.

First up, we will explore how to split up a monolithic source file into several distinct source files.

Program 1: singlystack.c

The other week, during our singly linked list explorations, we finally arrived at the implementation of a stack. This implementation made use of 2 functions in addition to main(): pop() and push()

While this program is still relatively small, it serves as an excellent example for learning how to split source code into multiple files.

Step 0: Preparation

So go and do the following to get prepared:

  1. In ~/src/, create a new subdirectory (I'll use ~/src/multi/ in my examples)
  2. Once created, place a fresh copy of /var/public/data/singlystack.c into your new ~/src/ subdirectory

An example set of command lines is given here:

lab46:~$ cd src
lab46:~/src$ mkdir multi
lab46:~/src$ cd multi
lab46:~/src/multi$ cp /var/public/data/singlystack.c .
lab46:~/src/multi$ 

Step 1: Declarations in the header

Looking at the code in singlystack.c, a good first step to take is to split off global declarations (things that take the form of global variables and function prototypes) into a separate file.

What's more, any file containing just these declarations, and no initializations or processing code, can be used as a header file (just like stdio.h, stdlib.h, iostream, or cmath).

First order of business is to get it into a separate file. What I like to do is make another copy of the original source file, then go in and remove the non-declarative code. In the case of singlystack.c, we'll create a file called mystack.h, and it will contain only the following:

/*
 * mystack.h - header file for singly linked list stack implementation
 *
 */
 
#include <stdio.h>
#include <stdlib.h>
 
struct node {
    int value;
    struct node *next;
};
typedef struct node Node;
 
// Function prototypes
//
void push(int);
Node * pop();
 
// Create Node pointers for our stack
//
Node *stack, *top;

As you can see, this is pretty much the beginning of the singlystack.c code we worked on in class the other day.

Step 2: Prevent duplicate declarations

Okay, we've just made our header file… we need to do one additional thing to it before we proceed, so as to avoid compiler errors complaining about “already declared” symbols. What we have to do is add some preprocessor directives so that the contents of our header file are processed only once per source file when we get to the compilation process.

So our code gets the following before any code begins:

#ifndef __MYSTACK_H
#define __MYSTACK_H

and the following at the very end:

#endif

After we do this, our mystack.h header file should look like this:

mystack.h
/*
 * mystack.h - header file for singly linked list stack implementation
 *
 */
#ifndef __MYSTACK_H
#define __MYSTACK_H
#include <stdio.h>
#include <stdlib.h>
 
struct node {
    int value;
    struct node *next;
};
typedef struct node Node;
 
// Function prototypes
//
void push(int);
Node * pop();
 
// Create Node pointers for our stack
//
Node *stack, *top;
#endif

What's happening here is that the preprocessor directive #ifndef is checking to see if the following (and arbitrary) symbol, __MYSTACK_H, exists. If it doesn't, we #define it, and proceed with allowing the compiler to process the rest of the code in our header file.

In the event that __MYSTACK_H does exist, the rest of the header file is skipped (well, from that point to the corresponding #endif).
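To see the guard at work without juggling files, the sketch below pastes the same guarded text twice into one source file, which is effectively what happens when a header gets included twice. The names DEMO_POINT_H, struct point, and point_sum() are made up for this illustration and are not part of singlystack.

```c
/* First "inclusion": DEMO_POINT_H is not defined yet, so this text is processed. */
#ifndef DEMO_POINT_H
#define DEMO_POINT_H
struct point {
    int x;
    int y;
};
#endif

/* Second "inclusion": DEMO_POINT_H now exists, so everything up to #endif is skipped. */
#ifndef DEMO_POINT_H
#define DEMO_POINT_H
struct point {      /* without the guard, this would be a redefinition error */
    int x;
    int y;
};
#endif

/* Uses the single surviving definition of struct point. */
int point_sum(struct point p)
{
    return p.x + p.y;
}
```

Removing the guard lines from either copy should make the compiler complain that struct point is being redefined, which is exactly the class of error the guard in mystack.h is there to prevent.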

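It is traditional to define preprocessor symbols in CAPITALS, and to utilize underscores. For program-specific symbols, such as the one we just created, I prefix two underscores to the symbol name, and substitute an underscore for the period in the filename (again, just what I've traditionally seen). Be aware, though, that the C and C++ standards reserve identifiers beginning with two underscores (or an underscore followed by a capital letter) for the compiler and system libraries, so a plain MYSTACK_H is the safer choice in new code.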

Now, there is nothing magical about having named our symbol after the file… we could have called it BOB if we had wanted, or even just bob. But it is worth maintaining traditional programming conventions: if anyone else ever looks at your code, or you are working in a group, adhering to these conventions will dramatically increase the readability of your code.

Step 3: Use our header file

Okay, now direct your attention back to singlystack.c… all the code that now makes up our header file is still sitting at the top of this file.

What we want to do is remove it, leaving just the remaining code, and add an #include preprocessor directive for our header file.

The results should look like this (beginning of code to the line that defines the main() function):

/*                                                                                        
 * singlystack.c - singly linked list stack implementation
 *
 */
#include "mystack.h"
 
int main()

Make sense? We're just pulling out some code and using files to create a sense of modularity (smaller codebases to deal with when editing).

Step 4: Functions into separate file

So far in our effort to split up our monolithic codebase into multiple source files, we have two distinct files:

mystack.h contains all the global declarations, while singlystack.c contains the initializations and the definitions of our three functions: main(), push(), and pop().
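As a small illustration of the declaration/definition split (the functions square() and sum_of_squares() are made-up examples, not part of singlystack): the prototype is the kind of line that would go in a header, and the body is what would go in a .c file.

```c
/* Declaration (prototype): says square() exists and what its types are.
 * This is the kind of line that belongs in a header file. */
int square(int n);

/* A caller only needs the declaration above in order to compile. */
int sum_of_squares(int a, int b)
{
    return square(a) + square(b);
}

/* Definition: the actual body. This belongs in exactly one .c file. */
int square(int n)
{
    return n * n;
}
```

Here both halves sit in one file for brevity; in a multi-file layout the caller and the definition could each live in their own .c file, sharing only the prototype through a header.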

Now, to make our codebase even more modular, we are going to split our functions into two files: one containing just main(), and the other containing the definitions of push() and pop().

We will call our new file stackops.c, and it will end up looking as follows:

stackops.c
/*                                                                                        
 * stackops.c - singly linked list stack implementation
 *
 */
#include "mystack.h"
 
void push(int value)
{
    if (top == NULL)
    {   
        stack = (Node *) malloc (sizeof(Node));
        top = stack;
        stack -> value = value;
        stack -> next = NULL;
    }   
    else
    {   
        top -> next = (Node *) malloc (sizeof(Node));
        top -> next -> next = NULL;
        top = top -> next;
        top -> value = value;
    }   
}
 
Node * pop()
{
    Node *tmp, *tmp2;
 
    tmp = top;      // tmp points at top of stack
    tmp2 = stack;   // point tmp2 at bottom of stack
 
    if (tmp2 != NULL)   // on empty stack, don't do anything
    {
        if (tmp2 -> next != NULL) // two or more nodes in stack
        {
            while (tmp2 -> next -> next != NULL) // iterate to 2nd to last
            {
                tmp2 = tmp2 -> next;
            }
            tmp2 -> next = NULL;    // cap off our stack
            top = tmp2;             // tmp2 is the new top
        }
        else            // only one node in stack
        {
            stack = top = NULL;
        }
    }
 
    return(tmp);
}

As you look at this file, notice how we literally just lifted the push() and pop() functions out of the original file, and merely added an:

#include "mystack.h"

at the top. This is really all there is to it.

This also leaves our original singlystack.c containing only the following:

singlystack.c
/*                                                                                        
 * singlystack.c - singly linked list stack implementation
 *
 */
#include "mystack.h"
 
int main()
{
    Node *tmp;
    int input = 0;
 
    printf("Enter a value (-1 to stop): ");
    scanf("%d", &input);
 
    stack = top = NULL;
    while (input != -1) 
    {   
        push(input);
        printf("Enter a value (-1 to stop): ");
        scanf("%d", &input);
    }   
 
    printf("Stack created. Popping time.\n");
 
    while ((tmp = pop()) != NULL)
    {   
        printf("POPPED: %d\n", tmp -> value);
        free(tmp);
    }   
    printf("POPPED: NULL\n");
 
    return(0);
}

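So the overall aims of splitting your source code into multiple files are the ones raised in the Background: smaller files to navigate when editing, and fewer collisions when several people are working on the same program.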

As an exercise for the reader, go even further and split stackops.c into two discrete files: one containing push() and one containing pop().

Step 5: Compiling the code

Obviously one of the immediate side effects of splitting up a codebase is the increase in complexity of the compile operation. No longer can we do a simple:

lab46:~/src/multi$ gcc -o singlystack singlystack.c

and expect everything to be pulled in as needed. This isn't to say that some variant won't work for our purposes, but I will show you the proper way to perform a compile, which will come in handy when dealing with much more complex code bases.

We will compile as follows:

lab46:~/src/multi$ ls
mystack.h    singlystack.c    stackops.c
lab46:~/src/multi$ gcc -c stackops.c
lab46:~/src/multi$ gcc -c singlystack.c
lab46:~/src/multi$ ls
mystack.h    singlystack.c    singlystack.o
stackops.c   stackops.o
lab46:~/src/multi$ gcc -o singlystack singlystack.o stackops.o

As you can see, we have a couple of intermediate runs of the compiler to produce object files (.o), which contain the compiled version of our code. This compiled code is not yet executable on the system, because the object files have not been linked with each other or with the system libraries. The last (and more familiar) step performs that linking, leaving us with an executable named singlystack that can be run, just as before.

At this point, assuming no typos, you can run singlystack and see that, from the end user's point of view, it behaves exactly as the monolithic version did.
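One caveat, in case the final link step fails with a "multiple definition" complaint about stack or top: the line Node *stack, *top; in mystack.h creates those variables in every .c file that includes the header. Older versions of gcc quietly merged the copies, but gcc 10 and later default to -fno-common and treat them as duplicates. A common remedy, sketched here in a single file for brevity, is to declare the variables with extern in the header and define them in exactly one .c file. The helper stack_is_empty() is a made-up addition for illustration only.

```c
#include <stddef.h>

/* --- would live in mystack.h: declarations only --- */
struct node {
    int value;
    struct node *next;
};
typedef struct node Node;

/* extern: "these variables exist somewhere", without creating them here */
extern Node *stack, *top;

/* --- would live in exactly one .c file (for instance stackops.c) --- */
/* The single definition that actually creates the variables. */
Node *stack = NULL, *top = NULL;

/* Hypothetical helper: returns 1 when the stack holds no nodes, 0 otherwise. */
int stack_is_empty(void)
{
    return top == NULL;
}
```

With this arrangement, every file that includes the header sees the same two variables, and the linker finds only one definition of each.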

And we're done!

Program 2: Dots and Dashes in C++

As a means of exploring multiple examples, here is another sample code base we can split into multiple files.

Step 0: Obtaining the source

In /var/public/data you will find a file called dotdash.cc… copy this into a directory under your ~/src/ directory:

lab46:~$ cd src
lab46:~/src$ mkdir dotdash
lab46:~/src$ cd dotdash
lab46:~/src/dotdash$ cp -v /var/public/data/dotdash.cc .
`/var/public/data/dotdash.cc' -> `./dotdash.cc'
lab46:~/src/dotdash$ 

Step 1: Creating a header

Looking through this file, we see a couple of class definitions and a sample implementation. Just as with our example in C, we can split off declarations into their own header file:

class.h
#ifndef _CLASS_H
#define _CLASS_H
 
class dot 
{
    public:
        dot();
        void setSize(int);
        int getSize();
 
    private:
        int size;
};
 
class dash
{
    public:
        dash();
        void setLength(int);
        int getLength();
 
    private:
        int length;
};
#endif

Step 2: main() function in its own file

We'll isolate main() next, placing it in a unique file:

main.cc
#include <iostream>                                                                       
#include "class.h"
 
using namespace std;
 
int main()
{
    int dotsize1, dotsize2, dashlength1, dashlength2, tmp;
    dot myDot;
    dash myDash1, myDash2;
    dot myDot2;
 
    dotsize1 = 12; 
    dotsize2 = 36; 
    dashlength1 = 4;
    dashlength2 = 73; 
 
    tmp = myDot.getSize();
    cout << "myDot's size is: " << tmp << endl;
 
    cout << "myDash1's length is: " << myDash1.getLength() << endl;
 
    myDot.setSize(dotsize1);
    cout << "myDot's size is now: " << myDot.getSize() << endl;
    return(0);
}

Step 3: dot class implementation

A file for the dot class:

dot.cc
#include "class.h"                                                                        
 
dot :: dot()
{
    size = 0;
}
 
void dot :: setSize(int mass)
{
    size = mass;
}
 
int dot :: getSize()
{
    return(size);
}

Step 4: dash class implementation

And a file for dash:

dash.cc
#include "class.h"                                                                        
 
dash :: dash()
{
    length = 0;
}
 
void dash :: setLength(int howlong)
{
    length = howlong;
}
 
int dash :: getLength()
{
    return(length);
}

Step 5: Compiling the code

This is much the same as with the C code, but this time we use g++:

lab46:~/src/dotdash$ g++ -c main.cc
lab46:~/src/dotdash$ g++ -c dot.cc
lab46:~/src/dotdash$ g++ -c dash.cc
lab46:~/src/dotdash$ g++ -o dotdash main.o dot.o dash.o
lab46:~/src/dotdash$ 

The intermediary object files are created, then combined together to form the final executable.

Automating compilation with Makefiles

Now that we've gone through the process of splitting up source into multiple files and getting the whole thing compiled, we will look at automating that process.

Since complex code bases are the norm rather than the exception, tools were designed long ago that facilitate the task of compiling multiple files. A popular one used on UNIX/Linux systems is the Makefile system made available with the command make.

To use a Makefile, we must create a text file called Makefile in the same directory as our source files.

Program 1: A Makefile for singlystack

Our Makefile will contain the following:

Makefile
CC = gcc $(OPTS) $(LIBS)
LIBS =                                                                                    
OPTS =
OBJ = singlystack.o stackops.o
BIN = singlystack
all: singlystack
 
singlystack: $(OBJ)
	$(CC) $(CFLAGS) -o $(BIN) $(OBJ) $(LIBS)
 
clean:
	rm -f *.o $(BIN) core
 
default: $(BIN)

What the Makefile does is establish a set of rules that make can follow to build working code from our source files. Each indented command line beneath a rule must begin with a tab character, not spaces. To put it in action, we would run make at the command line.

To witness it in all its glory, we should clear out any prior object/binary files. In the Makefile, we put in a rule called clean, which when provided as an argument to make, will run that command (the rm), to remove all the specified files (in this case, all the products of compilation, leaving the original source).

Let's do that now:

lab46:~/src/multi$ ls
Makefile    singlystack    singlystack.o    stackops.o
mystack.h   singlystack.c  stackops.c
lab46:~/src/multi$ make clean
rm -f *.o singlystack core
lab46:~/src/multi$ ls
Makefile    mystack.h    singlystack.c    stackops.c
lab46:~/src/multi$ 

So as you can see, make clean clears the slate, leaving just our source files.

To see make in action building our code, simply run make:

lab46:~/src/multi$ make
gcc      -c -o singlystack.o singlystack.c
gcc      -c -o stackops.o stackops.c
gcc    -o singlystack singlystack.o stackops.o 
lab46:~/src/multi$ 

The executable singlystack once again exists, and we can run it, just as before.

Hopefully you will see that with the use of a Makefile, some of our development tasks can be automated. And this is by no means the extent of what make can do… there is a whole syntax available for use in a Makefile, that can go far beyond what we currently need for our purposes.

I encourage you to study this Makefile, and even modify it for use in other programs you are working on.

Additionally, everyone should already have a Makefile residing in the base of their ~/src/ directory… this Makefile is a bit simpler, and by reading the comments within it you can figure out how to use it with monolithic source files. Be sure to check out that file as well, as it includes comments on many of the options in the new Makefile we created here.

make can be used to automate pretty much any code compilation. We just commonly use it with C and C++ programs.

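NOTE: Unlike most conventions in UNIX, a Makefile traditionally DOES start with a capital letter. GNU make looks for the names GNUmakefile, makefile, and Makefile, in that order, with Makefile being the conventional choice; if you give the file any other name, make will be unable to find your compilation rules unless you point it at the file with the -f option.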

Program 2: A Makefile for dotdash

Our Makefile will contain the following:

Makefile
CXX = g++ $(INC)
CXXFLAGS =
OBJ = main.o dot.o dash.o
BIN = dotdash
all: dotdash
 
dotdash: $(OBJ)
	$(CXX) $(CXXFLAGS) -o $(BIN) $(OBJ) $(LIBS)
 
clean:
	rm -f *.o $(BIN) core
 
default: $(BIN)

Notice the Makefile appears pretty much the same, except we use the variable CXX here instead of CC, to indicate the code is in C++. Aside from that, the rules are pretty much identical.