Corning Community College
CSCS1730 UNIX/Linux Fundamentals
Lab 0x8: The UNIX Programming Environment
~~TOC~~
======Objective======
Exploration of the development environment tools available in UNIX operating systems.
======Reference======
* gcc HOWTO: http://www.faqs.org/docs/Linux-HOWTO/GCC-HOWTO.html
* Bell Labs' The Creation of the UNIX Operating System: http://www.bell-labs.com/history/unix/
* Linux Assembly HOWTO: http://tldp.org/HOWTO/Assembly-HOWTO/
======Background======
=====C and UNIX=====
Soon after UNIX was first implemented in assembly language on that "little-used" PDP-7, the need arose to make it portable, so it could run on other machines (including the PDP-11). An attempt was made using a language called B, but B proved unsuitable for the job. Soon thereafter, C was developed and UNIX was rewritten in it.
Since then, C and UNIX have been tightly related to one another: UNIX facilities either appear very "C-like" or are easily accessible from C. In fact, the typical UNIX kernel is written almost entirely in C, aside from the architecture-specific assembly language needed to interface with the basic hardware operations.
Because of this, UNIX is a very portable operating system, and as decades of its existence have shown, a port of one form or another can be found on nearly every computer architecture.
=====Terminology=====
* __Source Code__: plain ASCII text written using the programming language's structures and grammar, along with comments.
* __Assembly Code__: plain ASCII text containing the assembly mnemonics equivalent of your source code. (There is a one-to-many relationship between a programming language statement and the corresponding assembly instructions.)
* __Object Code__: a binary file containing the machine code equivalent of the assembly mnemonics. (There is a direct one-to-one relationship between an assembly mnemonic and its corresponding machine instruction.) This code is not executable by itself, as it still relies on functionality from the various system libraries and must first be linked with them.
* __Machine Code__: the final product of the compilation process. It is in machine language, the form the computer can natively understand.
* __Architecture__: a processor design together with its machine language (instruction set). Each processor family has different instructions, so machine language is not portable between different architectures/processors. Examples of architectures include: SPARC, AXP (Alpha), MIPS, x86 (Intel compatible), PowerPC/G3/G4/G5, m68k, and many others.
* __Platform__: an implementation of a computer architecture. Platforms of an identical architecture share the same machine language, but the rest of the system may be constructed so differently that most functionality is incompatible. The term is also a bit fuzzy, as it is sometimes used to refer to hardware and other times to software.
Examples of hardware platforms: Sun's SPARCstations, Silicon Graphics' Indigo2 workstations, m68k Macintosh vs. m68k Amiga.
Stretching the term a bit, examples of software platforms include: Microsoft Windows 9x, Microsoft Windows NT, Linux, OpenBSD, MacOS X, etc.
This is where binary compatibility comes into play: an executable built for a Sun SPARC system is **unable** to run natively on an Amiga, and vice versa. The same goes for trying to natively run an x86 Linux binary on an x86 Windows system; even though the processor understands the instructions, the operating system does not know how to load and run a program built for a different environment.
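The binary-compatibility problem is recorded right in the executable file. On Linux and most modern UNIX systems, executables use the ELF format (Windows uses a different format, PE, which is part of why an x86 Linux binary won't run there). An ELF header begins with the magic bytes 0x7f 'E' 'L' 'F', and its 16-bit e_machine field at offset 18 names the target architecture: for example, 2 for SPARC, 3 for 32-bit x86, and 62 for x86-64. The sketch below assumes the first 20 bytes of a file have already been read into memory; **elf_machine** is a hypothetical helper, not a system call, and real tools such as **file**(**1**) examine much more of the header.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns the e_machine field of an ELF header,
   or -1 if the buffer does not look like ELF. Only the first 20 bytes
   are examined. */
int elf_machine(const unsigned char *hdr, size_t len)
{
    assert(hdr != NULL);

    /* e_ident[0..3] must be the magic number 0x7f 'E' 'L' 'F'. */
    if (len < 20 || hdr[0] != 0x7f || hdr[1] != 'E' ||
        hdr[2] != 'L' || hdr[3] != 'F')
        return -1;

    /* e_ident[5] (EI_DATA): 1 = little-endian, 2 = big-endian.
       e_machine is the 2-byte field at offset 18, stored in that order. */
    switch (hdr[5]) {
    case 1:
        return hdr[18] | (hdr[19] << 8);
    case 2:
        return (hdr[18] << 8) | hdr[19];
    default:
        return -1;
    }
}
```

To try it, read the first 20 bytes of an executable with fread() and pass them in; a program compiled on an x86-64 Linux machine should report 62.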
=====The Programming Model=====
The basic programming model of going from source code (what you write) to machine code (what the machine understands) will typically be as follows:
- Source Code - higher-level, English-like statements you write
- Run the Compiler
* Compiling Phase Begins ...
- Lexical Analyzer - breaks your code into pieces, or "tokens".
- Syntax Analyzer - checks that the tokens form grammatically correct statements.
- Semantic Analyzer - checks for detectable meaning errors, such as type mismatches or undeclared variables.
- Intermediate Code Generator - converts the high-level code into a lower-level intermediate form.
- Optimization - looks for patterns that can be simplified.
- Code Generation - creates assembly code.
* ... Compiling Phase Ends.
- Assembler - translates assembly code into system-specific Object Code
- Linker - links the Object Code with other Object Code from System Libraries
* Finally: Machine Executable Code
For most intents and purposes, the source code is portable among architectures: if the language is high-level enough, it abstracts away the specific details of the underlying hardware, so the same source can be recompiled on a different architecture.
For this lab we will become familiar with the UNIX Programming Environment. The system is very diverse, and as such is host to a number of different programming languages and tools. Some of the common languages you will encounter on a UNIX system are: C, C++, Assembly, and Java. For the purposes of this lab, we will focus almost exclusively on C, with some attention paid to C++ and Assembly Language. Sample programs will be created, and development tools such as the **make** utility will be explored.
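Byte order is a good example of a hardware detail that well-written source code can avoid depending on. x86 stores multi-byte integers little-endian (least significant byte first), while SPARC and m68k are big-endian. Code that inspects an integer's bytes in memory gets different answers on each, whereas code that builds the bytes with shifts behaves the same everywhere. Both function names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Portable (hypothetical helper): writes v most-significant byte first
   using shifts, so the output is identical on any architecture. */
void put_be32(uint8_t out[4], uint32_t v)
{
    assert(out != NULL);
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
}

/* Not portable (hypothetical helper): returns whichever byte the hardware
   stores first, e.g. 0x44 for 0x11223344 on little-endian x86 but 0x11
   on a big-endian machine. */
uint8_t first_byte_in_memory(uint32_t v)
{
    uint8_t b;
    memcpy(&b, &v, 1);
    return b;
}
```

Compiling and running code that uses both on an x86 machine and on a big-endian machine would show put_be32() producing the same bytes on each, while first_byte_in_memory() changes its answer.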
=====The GNU C compiler=====
To compile a single source file, you would do the following:
lab46:~$ gcc -o binary source.c
Where **binary** is the name of the desired executable, and **source.c** is the name of the text file containing the program source.
The **-o** argument to gcc specifies the name of the output file. By default, gcc runs the entire pipeline for us: it compiles the source, then automatically performs the assembling and linking steps, so this one command produces a finished executable.
=====The GNU C++ compiler=====
To compile a single source file using C++, you would do the following:
lab46:~$ g++ -o binary source.cc
Where **binary** is the name of the desired executable, and **source.cc** is the name of the text file containing the program source.
**g++** is essentially a front end to the same GNU compiler system, so options like **-o** are identical between the C and C++ compilers.
=====The GNU assembler=====
Unlike higher-level source code such as C or C++, assembly language corresponds directly to the low-level implementation of a particular machine. Because of this, assembly language, like machine language, is machine dependent. This means that an assembly language program written for an x86 machine will not work on an Alpha machine. Not only will the binaries be incompatible (as described above), but the source code itself will be incompatible as well.
Assembly language affords you an extra level of flexibility and control over your programs that may be required in extreme circumstances. In any event, being familiar with assembly language will improve your knowledge of the computer and help you write better high-level language programs.
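Assembly's machine dependence shows up even inside C programs. GCC can embed assembly instructions through its asm extension, but those instructions only make sense on one architecture, so portable code guards them with the compiler's predefined architecture macros (such as __x86_64__) and falls back to plain C elsewhere. This is a minimal sketch assuming GCC on x86-64, which uses AT&T syntax (destination operand last); **asm_add** is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: adds two 32-bit integers. With GCC on x86-64 the
   addition is done by an inline "addl" instruction; on any other target
   it falls back to plain C. */
int32_t asm_add(int32_t a, int32_t b)
{
    /* Signed overflow is undefined in C, so require that it can't happen;
       this keeps the assembly path and the C path in agreement. */
    assert(!(b > 0 && a > INT32_MAX - b));
    assert(!(b < 0 && a < INT32_MIN - b));

#if defined(__GNUC__) && defined(__x86_64__)
    /* AT&T syntax: "addl src, dst" computes dst = dst + src.
       %0 is a (read and written), %1 is b; "cc" marks the flags as changed. */
    __asm__("addl %1, %0" : "+r"(a) : "r"(b) : "cc");
    return a;
#else
    return a + b;
#endif
}
```

On an x86-64 machine, running **gcc -S** on a file containing this function should show the addl instruction in the generated .s file, right alongside the code the compiler produced itself.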
To assemble a single assembly file, you would do the following:
lab46:~$ as -o object.o assembly.s
Where **object.o** is the name of the desired object file, and **assembly.s** is the name of the source text file containing the assembly code.
This leaves your code at an intermediate stage: in order to run it on a system, you will need to use the system's linker to resolve symbols from the appropriate system libraries and produce an executable. To do that, you would do at least the following:
lab46:~$ ld -o binary object.o
======Procedure======
Create a new directory in your home directory called **devel/** and do the work for the next several steps in this directory.
In the **devel/** subdirectory of the UNIX Public Directory, you will find three files: **helloC.c**, **helloCPP.cc**, and **helloASM.S** (or helloASM2.asm). Copy these into your own **devel/** directory to work on in this lab.
=====Compile=====
^ 1. ^|Do the following:|
| ^ a.|Compile each of the source and assembly files into binary executables.|
|:::^ b.|Show me how you accomplished this.|
|:::^ c.|Verify that they all indeed work.|
Make sure you compile each file to a uniquely named executable, so you can compare the three of them.
NOTE: If you encounter any warnings or errors during the compile, chances are there is a typo in your source code. Please include the exact message you receive (and source code) if reporting problems on the class mailing list or IRC.
=====Execute=====
^ 2. ^|Do the following:|
| ^ a.|At the **lab46:~$** prompt, type: **helloC**|
|:::^ b.|Do you get an error? If so, what is it?|
|:::^ c.|At the **lab46:~$** prompt, type: **./helloC**|
|:::^ d.|Does this work? Explain why this makes a difference compared to the prior entry.|
=====Putting it together, piece by piece=====
Now that you've compiled and run the program, let's look at what we take for granted: each individual step.
We'll be picking apart the procedure for compiling and linking **helloC.c**, but similar methods are employed when dealing with C++ and other programming languages.
^ 3. ^|Focus on **helloC.c** and do the following:|
| ^ a.|What type of file does **helloC.c** appear to be?|
|:::^ b.|What does **file**(**1**) say about it?|
|:::^ c.|How about the other source files?|
So, let's start taking a tour of all these various steps involved in compilation. First up, we need to convert the source to assembly.
^ 4. ^|Do the following:|
| ^ a.|Do a directory listing to take a quick inventory of files present.|
|:::^ b.|At the **lab46:~/devel$** prompt type **gcc -S helloC.c**|
|:::^ c.|Do a directory listing. Anything new?|
|:::^ d.|What type of file is it? (See what **file**(**1**) says about it)|
|:::^ e.|At the **lab46:~/devel$** prompt use **cat**(**1**) to display the contents of the new file.|
|:::^ f.|What is it that you are looking at? How does it relate to **helloC.c**?|
So, now that we've got it converted to assembly, let's assemble it.
^ 5. ^|Do the following:|
| ^ a.|Do a directory listing to take a quick inventory of files present.|
|:::^ b.|At the **lab46:~/devel$** prompt type **as -o hello.o helloC.s**|
|:::^ c.|Do a directory listing. Anything new?|
|:::^ d.|What type of file is it? (See what **file**(**1**) says about it)|
|:::^ e.|What is it that you are looking at? How does it relate to **helloC.s**?|
So, we're almost there: we've got the program in object form and just need to link it with the system libraries. Usually we'd just feed it to **ld**, the linker, but since the code makes use of the C library, we've got to include a couple of libraries. To save you lots of typing, I've thrown together a little script that has all this information typed out for you.
To complete this next task, please make sure you copy the **link.sh** script from the **devel/** subdirectory of the UNIX Public Directory into your **devel/** directory.
^ 6. ^|Do the following:|
| ^ a.|Copy the "**link.sh**" script from the **devel/** subdirectory of the UNIX Public Directory.|
|:::^ b.|View this script.|
|:::^ c.|See that long "**ld**" line? That's what the compiler is doing for you.|
|:::^ d.|Go ahead and run this script, following the instructions in it, to link together your final executable.|
You should now have an executable.
^ 7. ^|Do the following:|
| ^ a.|At the "lab46:~/devel$" prompt type: **file hello**|
|:::^ b.|What type of file is it?|
|:::^ c.|At the "lab46:~/devel$" prompt type: **file helloC**|
|:::^ d.|Does the output match that of the previous command?|
|:::^ e.|Go ahead and execute your new binary. Does it run? Show me what you typed and what happens.|
=====Makefiles=====
A very popular tool used in program development is **make**. This tool comes in very handy when dealing with multiple source files that need compiling, and when determining whether or not a particular object file needs to be recompiled.
It works by allowing the programmer to set up dependencies between the source and object files with a series of rules pertinent to the particular project. These rules are usually placed in a file called **Makefile**.
Every account on Lab46 is equipped with a customized **Makefile** in the **src/** subdirectory of the home directory.
^ 8. ^|Do the following:|
| ^ a.|Copy **helloC.c** into your **src/** subdirectory. How did you do it?|
|:::^ b.|Do a directory listing. Do you see at least **Makefile** and **helloC.c**?|
|:::^ c.|View the contents of **Makefile**|
|:::^ d.|Compile **helloC.c** with the **Makefile**. How did you do this?|
^ 9. ^|Do the following:|
| ^ a.|In **/var/public/unix/devel** there is a subdirectory called "**multifile/**". Copy this directory (and its contents) to your own **devel/** in your home directory. How did you do this?|
|:::^ b.|View the various files in this directory, and try to trace the flow of logic between them.|
|:::^ c.|Read through the **Makefile**, and determine how to build this code.|
|:::^ d.|How did you do this?|
=====Code Efficiency: comparing file sizes=====
An interesting benchmark is to create programs that perform identical operations, and then compare the file sizes and execution times of the resulting executables.
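For the file-size half of this comparison, **ls -l** reports each file's size from its metadata, which a program can also read with stat(); for the timing half, clock_gettime() supplies timestamps to subtract. The two helpers below are a sketch assuming a POSIX system; **file_size** and **elapsed_ns** are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical helper: size in bytes of the file at path, as ls -l
   reports it, or -1 if the file cannot be examined. */
long long file_size(const char *path)
{
    struct stat st;

    assert(path != NULL);
    if (stat(path, &st) != 0)
        return -1;
    return (long long)st.st_size;
}

/* Hypothetical helper: nanoseconds elapsed between two
   clock_gettime() readings. */
long long elapsed_ns(const struct timespec *start, const struct timespec *end)
{
    assert(start != NULL && end != NULL);
    return (long long)(end->tv_sec - start->tv_sec) * 1000000000LL
         + (end->tv_nsec - start->tv_nsec);
}
```

To time a section of code, call clock_gettime(CLOCK_MONOTONIC, &start) before it and clock_gettime(CLOCK_MONOTONIC, &end) after it, then pass both to elapsed_ns(). To time a whole program from the outside, the shell's **time** command does the same job.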
^ 10. ^|Looking back on the original **helloC**, **helloCPP**, and **helloASM** binaries, do the following:|
| ^ a.|What size are each of the executables?|
|:::^ b.|What observations can you make regarding differences in file size or execution speed?|
|:::^ c.|View/compile **helloJAVA.java** and run the result (**java helloJAVA**). What is its size?|
|:::^ d.|With Java being a higher level language (as C++ is, when compared to C and assembly), what do you think about the resulting compiled file? Is there perhaps more here than meets the eye?|
=====Conclusions=====
This assignment has activities which you should tend to: document and summarize the knowledge you've gained on your Opus.
As always, the class mailing list and class IRC channel are available for assistance, but not answers.