Monday, February 03, 2003

Multifile compilation (K&R §4.5)

So far, all the programs we have studied have been contained in a single source file. For nontrivial programs, it is helpful to break up the program into several source files. Each file (sometimes called a module or a translation unit) would typically contain a collection of logically related functions, some of which would be callable from other modules.

For example, consider the source code for Assignment #3. The source code is distributed over several different source files:

Filename	Description
`main.c`	contains the `main()` function, which calls functions defined in other files. It is common for programs to have a `main.c` module that contains the `main()` function.
`create.c`	contains all the functions that are responsible for reading the floor plan from the file and storing it in memory. This module also analyzes the floor plan and makes sure that it is correctly defined.
`destroy.c`	contains a function that deallocates the memory that the functions in `create.c` allocate dynamically.
`solve.c`	contains a function that attempts to find a trail that solves the maze.
`show.c`	contains a function that displays the floor plan and the current trail under investigation.

(The partitioning of source files in this particular example is a little overly aggressive, as some of the files define only one function. In more practical cases, file modules may contain tens of functions.)

Makefiles and the `make` program

The assignment uses a Makefile to actually compile the program. The Makefile contains the names of the source files and rules on how the program is to be built. The program make uses the Makefile to determine what source files need to be recompiled. If only a few source files have changed, then, generally speaking, only those will have to be recompiled.

For each compiled file, the compiler generates an object file, which has the same name as the original source file, but has a .o or .obj extension. When all the files have been compiled to object files, another program called the linker puts all the object files together to generate the final executable. The make program uses a series of explicit and implicit rules to determine how to compile and build the final executable.

A more detailed description of make is beyond the scope of these notes, but if you are curious, you can read a tutorial on the web.

Most current Integrated Development Environments (IDEs) provide automated program building through the use of project files. These project files are certainly easier to create and modify in the context of an IDE. Unfortunately, they tend to be large, proprietary, binary files that are nearly impossible to use outside of the IDE, which makes portability difficult (even on the same operating system and hardware architecture). Their formats also change frequently, so moving from one version of a vendor's IDE to the next can be problematic. Curiously, an IDE may allow one to generate a Makefile from a project file.

Header files

As we learned earlier, before a function is called, it should be declared, either with a prototype or by having the function definition occur in the file before its invocation. This is generally pretty easy to do when we are dealing with a single file. However, in a program that has multiple files, things become a bit trickier. For example, in the assignment, consider the function show_maze(). This function is defined in the file show.c, but is called by both the main() function (defined in main.c) and by solve_maze() (defined in solve.c). How (and where) do we declare this function so that these two other functions can see show_maze()'s prototype? One option would be to declare the function in each file that uses it. Unfortunately, this would be tedious and error prone. If we later changed the function's interface (i.e., its return type and/or arguments), then we would have to change every occurrence of the function prototype in every file that declared it.

Instead, we declare the function once in a header file and #include that header file in every file that calls this function. In our example, we declare show_maze() in maze.h and #include "maze.h" in all the relevant files. Note that the filename being included is enclosed in double quotes rather than angle brackets. This tells the preprocessor to look for the file first in the directory of the including source file (typically the current directory) before falling back to the "standard" place (typically /usr/include on Unix systems). We include maze.h in show.c too because this forces us to keep the function definition and declaration consistent. If we change show_maze()'s interface and try to recompile show.c without changing the prototype in the header file, the compiler will report an error.

This strategy is used for all functions that are defined in one file but called from another file. We collect all these function declarations and store them in maze.h, which is then included by all the files. The declarations in this file are listed below:

extern char *read_maze(t_room *room, const char *filename);
extern void  solve_maze(t_room *room, t_pair loc);
extern void  show_maze(const t_room *room);
extern void  free_maze(const t_room *room);

Note that the function bodies for the above functions require knowledge of the t_floor and t_pair structure types. Therefore, we declare these types in maze.h as well. (maze.h also contains a couple of macro #defines.)
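A minimal sketch of how such a header and one of its source files might fit together is shown below, collapsed into a single file so that it can be compiled on its own. The show_maze() prototype is taken from the list above; everything else is an assumption made for illustration: the fields of t_room and t_pair, the MAX_DIM macro, the in_bounds() function, and the MAZE_H include guard (a common convention that these notes do not otherwise cover).

```c
#include <assert.h>
#include <stdio.h>

/* ---- this part would live in maze.h ---- */
#ifndef MAZE_H            /* include guard: a convention assumed here */
#define MAZE_H

#define MAX_DIM 8         /* hypothetical size limit */

typedef struct {          /* hypothetical fields */
    int row, col;
} t_pair;

typedef struct {          /* hypothetical fields */
    int  rows, cols;
    char cell[MAX_DIM][MAX_DIM];
} t_room;

extern void show_maze(const t_room *room);
extern int  in_bounds(const t_room *room, t_pair loc);  /* hypothetical */

#endif /* MAZE_H */

/* ---- this part would live in show.c, after #include "maze.h" ---- */

/* Print the room one row per line. Because the prototype above is
   visible here, a mismatch between it and this definition would be
   reported by the compiler. */
void show_maze(const t_room *room)
{
    int r, c;

    assert(room != NULL);
    for (r = 0; r < room->rows; r++) {
        for (c = 0; c < room->cols; c++)
            putchar(room->cell[r][c]);
        putchar('\n');
    }
}

/* Return 1 if loc lies inside the room, 0 otherwise. */
int in_bounds(const t_room *room, t_pair loc)
{
    assert(room != NULL);
    return loc.row >= 0 && loc.row < room->rows &&
           loc.col >= 0 && loc.col < room->cols;
}
```

In a real multifile build, the first part would be saved as maze.h, and every source file that needs these names would begin with #include "maze.h".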

Because all the files in the program rely on maze.h, whenever this header file changes, we must recompile all the source files before a new executable can be built. (Incidentally, a Makefile provides a way to specify the reliance of one file on another through dependencies.)

Linkage

The keyword extern signifies the linkage of the function. By default, functions have external linkage anyway, so declaring a function extern is redundant in most cases, but is still helpful from a consistency point of view. Notice that all functions that are not declared in the header file are defined as static functions in their respective source files (e.g. add_horizontal_wall(), scan_row() and store_maze() in create.c). By declaring a function to be static, we are saying that it has internal linkage (i.e. no function outside this source file can refer to it by name). Using static functions helps to make the source files more encapsulated. You can think of the static functions as being "helper" functions for the other (non-static) functions in the source file.
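As a rough illustration of the two kinds of linkage, the sketch below pairs a static helper with a non-static function that uses it. Both names (is_wall_char() and count_walls()) are hypothetical and are not taken from the assignment.

```c
#include <assert.h>
#include <stddef.h>

/* Internal linkage: only code in this source file can call this
   helper by name. Another file could define its own is_wall_char()
   without any conflict. */
static int is_wall_char(int ch)
{
    return ch == '#';
}

/* External linkage: this prototype is what would go in the header.
   The extern keyword is optional here but kept for consistency. */
extern int count_walls(const char *row);

/* Count the wall characters in one row of text. */
int count_walls(const char *row)
{
    int n = 0;

    assert(row != NULL);
    for (; *row != '\0'; row++)
        if (is_wall_char((unsigned char)*row))
            n++;
    return n;
}
```

Only count_walls() would be visible to the linker from other files. The helper stays private to the file that defines it.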

Another thing that some files have in common is the global variable marker that keeps track of the current letter of the alphabet when forming the trail. This variable is used by both create.c and solve.c. As a result, we declare this variable in maze.h with the declaration:

extern char  marker;

and define it in create.c with the definition:

char  marker = 'a';

Because all source files include maze.h, all of them can use the marker variable. The fact that only two of the source files actually use this variable while the rest do not is not an error. It's okay for a source file to include a header file that contains variable, structure, or function declarations that are not used by the source file.
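The arrangement for marker can be sketched in a single file as follows. The declaration and the definition are the ones given above. The next_marker() function and its wrap-around from 'z' back to 'a' are assumptions added only to show the variable being used from what would be a different source file.

```c
#include <assert.h>

/* Would live in maze.h: a declaration only, no storage is allocated. */
extern char marker;

/* Would live in create.c: the one and only definition. */
char marker = 'a';

/* Would live in solve.c (hypothetical function): return the current
   marker and advance to the next letter, wrapping after 'z'. */
char next_marker(void)
{
    char current;

    assert(marker >= 'a' && marker <= 'z');
    current = marker;
    marker = (marker == 'z') ? 'a' : (char)(marker + 1);
    return current;
}
```

Because both files would see the same extern declaration through maze.h, they both refer to the single variable defined in create.c.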

Definition vs. Declaration (K&R §1.10)

Notice the important distinction between declaring a variable and defining it. The declaration merely tells the compiler the type of the variable, whereas the definition actually allocates space for it (and initializes it, if an initializer was specified). Generally speaking, declarations of the form extern type varname only declare the variable, whereas those of the form type varname define it. A global variable should never be defined in more than one place (e.g. do not define variables in header files), but it may be declared in several places, which makes it okay to declare a variable in a header file and include that header file in multiple source files. The types in a variable's definition and its declarations must be consistent. For example, if you define a variable as an array in a source file, then do not declare it as a pointer in a header file.
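The following sketch illustrates these rules with a hypothetical array named primes and a hypothetical count named n_primes. Neither appears in the assignment.

```c
#include <assert.h>
#include <stddef.h>

/* Declarations (header material): they state the type but allocate
   no storage, and repeating them is harmless. */
extern int primes[];
extern int primes[];
extern const size_t n_primes;

/* Definitions (exactly one of each in the whole program): these
   allocate the storage and supply the initial values. */
int primes[] = { 2, 3, 5, 7 };
const size_t n_primes = sizeof primes / sizeof primes[0];

/* Add up the elements of the array through the declared names. */
int sum_primes(void)
{
    size_t i;
    int total = 0;

    assert(n_primes > 0);
    for (i = 0; i < n_primes; i++)
        total += primes[i];
    return total;
}

/* Inconsistent, and therefore wrong, in some other file:
       extern int *primes;
   An array is not a pointer, so code compiled against that
   declaration would misinterpret the array's storage. */
```

Note that the array is declared as an array (extern int primes[]) and not as a pointer, in keeping with the consistency rule above.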

One more caveat: do not confuse macro definition (e.g., #define) with variable definition. The two are completely different.

`static` global variables (K&R §4.6)

If we wanted to define a global variable in a file but restrict its scope so that it is accessible only by functions defined in that source file, we can define the variable outside all functions (typically, near the top of the file) and define it as static. For example, if we have:

static int  filevar;

near the top of a source file before any function definitions (i.e. a global variable), then all functions in that source file can access that variable, but functions in other source files cannot. Other source files can define their own variables named filevar, but they would be different from the first one. Variables created in this way are sometimes referred to as file-scope variables.
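A small sketch of this idea, using the filevar variable from above together with two hypothetical functions, bump_filevar() and get_filevar(), that stand for the functions sharing it:

```c
#include <assert.h>

/* File-scope variable with internal linkage. Like every global
   variable, it starts out as zero when no initializer is given. */
static int filevar;

/* Add a non-negative amount to the file's private variable. */
void bump_filevar(int amount)
{
    assert(amount >= 0);
    filevar += amount;
}

/* Code in other source files cannot name filevar directly, but it
   can still read the value through this function. */
int get_filevar(void)
{
    return filevar;
}
```

Other source files can reach the variable only through the two functions, which keeps all direct manipulation of filevar in one place.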

Note that a global variable that is defined to be static is very different from a local variable that is defined to be static. In the first case, the static keyword denotes the (internal) linkage of the variable whereas in the second case, static denotes the storage class of the variable. The concept of storage class can be best described in the context of the runtime stack, which we will discuss in the next lecture.
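Ahead of that discussion, the sketch below gives a first impression of how a static local variable behaves compared with an ordinary local variable. The function names next_id() and plain_local() are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* A static local variable is initialized once and keeps its value
   from one call to the next, yet it can be named only inside the
   function that defines it. */
int next_id(void)
{
    static int id = 0;

    assert(id < INT_MAX);   /* guard against overflow */
    id++;
    return id;
}

/* An ordinary local variable is created anew on every call, so this
   function always returns 1. */
int plain_local(void)
{
    int n = 0;

    n++;
    return n;
}
```

Successive calls to next_id() return 1, 2, 3, and so on, whereas plain_local() returns 1 every time.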

Last modified: Sun Feb 2 20:39:01 2003