
January 30, 2004 (Friday)

Multifile compilation (K&R §4.5)

So far, all the programs we have studied have been contained in a single source file. For nontrivial programs, it is helpful to break up the program into several source files. Each file (sometimes called a module or a translation unit) would typically contain a collection of logically related functions, some of which would be callable from other modules.

For example, consider the source code for Assignment #3. The source code is distributed over several different source files:

Filename	Description	Functions/Linkage
`main.c`	contains the `main()` function which calls functions present in other files. It is not uncommon for programs to have a `main.c` file module that contains the `main()` function.	`main()/extern`
`imgdims.c`	contains all the functions that are responsible for calculating the dimensions of various image types.	`bytes_to_num()/static` `gifpng_file_dims()/extern` `jpg_file_dims()/extern`
`imgtype.c`	contains functions related to identifying various image types (PNG, GIF89a, JPG).	`get_img_type()/extern`
`process.c`	contains functions that open files, store relevant image information in a linked list of structures, sort images by dimension, and free the linked list. This module contains functions that are called by `main()` and that call functions present in the other non-`main.c` modules.	`get_basename()/static` `free_images()/extern` `sort_by_pixels()/extern` `process_file()/extern`

(The partitioning of source files in this particular example is a little overly aggressive, as some of the files (main.c and imgtype.c) define only one function. In more practical cases, file modules may contain tens of functions.)

Makefiles and the `make` program

The assignment uses a Makefile to compile the program. The Makefile contains the names of the source files and the rules for building the program. The make program uses the Makefile to determine which source files need to be recompiled. If only a few source files have changed, then, generally speaking, only those have to be recompiled.

For each compiled file, the compiler generates an object file, which has the same name as the original source file, but has a .o or .obj extension. When all the files have been compiled to object files, another program called the linker puts all the object files together to generate the final executable. The make program uses a series of explicit and implicit rules to determine how to compile and build the final executable.

A more detailed description of make is beyond the scope of these notes, but if you are curious, you can read a tutorial on the web.

Most current Integrated Development Environments (IDEs) provide automated program building through the use of project files. These project files are certainly easier to create and modify in the context of an IDE. Unfortunately, they tend to be large, proprietary, binary files which are nearly impossible to use outside of an IDE, making portability difficult (even on the same operating system/hardware architecture). Their formats are also notorious for changing constantly, which makes the transition from one version of a vendor's IDE to the next problematic. Curiously, an IDE may allow one to generate a Makefile from a project file.

Header files

As we learned earlier, before a function is called, it should be declared, either with a prototype or by having the function definition occur in the file before its invocation. This is generally pretty easy to do when we are dealing with a single file. However, in a program which has multiple files, things become a bit trickier. For example, in the assignment, consider the function process_file(). This function is defined in the file process.c, but is called by the main() function (defined in main.c). How (and where) do we declare this function so that the main() function (and possibly other functions) can see process_file()'s prototype? One option would be to declare the function in each file that uses it. Unfortunately, this would be tedious and error-prone. If we later changed the function's interface (i.e. its return type and/or arguments), then we would have to change every occurrence of the function prototype in all the files that declared it.

Instead, what we do is declare the function prototype once in a header file and #include that header file in all files that call this function. In our example, we declare process_file() in img.h and #include "img.h" in all the modules that call process_file(). Note that the filename being included is enclosed in double quotes rather than angle brackets. This tells the preprocessor to look for the file first in the directory of the including source file (typically the current directory) before searching the "standard" places (typically /usr/include on Unix systems). We include img.h in process.c too because this forces us to keep the function definition and declaration consistent. If we change process_file()'s interface and try to recompile it without changing the prototype in the header file, the compiler will generate an error telling us that the prototype declaration and the function definition are inconsistent.
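The following is a minimal sketch of this arrangement, collapsed into a single listing so that it compiles on its own. The names area.h, rect_area() and total_area() are hypothetical and are not part of the assignment; the comments mark which file each piece would live in.

```c
#include <stddef.h>

/* ---- would live in a header, e.g. "area.h" (hypothetical name) ---- */
extern long rect_area(long width, long height);

/* ---- would live in area.c, which also includes "area.h" ----
 * Because the prototype above is visible here, changing this definition's
 * return type or parameters without updating the header is a compile error. */
long rect_area(long width, long height)
{
    return width * height;
}

/* ---- would live in a caller such as main.c, after #include "area.h" ---- */
long total_area(const long *widths, const long *heights, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += rect_area(widths[i], heights[i]);
    return sum;
}
```

If the definition of rect_area() were changed to return double while the prototype still said long, the compiler would reject the file that contains both, which is exactly the consistency check described above.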

This strategy is used for all functions that are defined in one file but called from another file. We collect all these function declarations and store them in img.h, which is then included by all the files. Some of the function declarations in this header file are listed below:

/* Defined in imgtype.c */
extern const t_img_info	*get_img_type(FILE *);

/* Defined in process.c */
extern void	 process_file(const char *, t_image_node **);
extern void	 sort_by_pixels(t_image_node *);
extern void	 free_images(t_image_node *);

Note that these function declarations (and their definitions in the corresponding .c files) require knowledge of the t_image_node and t_img_info structure types, so we declare those types in img.h as well. (img.h also contains a couple of #define macros as well as several typedefs.)
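As a rough illustration of why the type has to appear in the header, here is a sketch with a made-up node type. The type t_point_node and the function list_length() are hypothetical and are not the assignment's actual declarations.

```c
#include <stddef.h>

/* ---- header part: the type must be visible before any prototype uses it ---- */
typedef struct point_node {
    int x;
    int y;
    struct point_node *next;   /* next element, or NULL at the end of the list */
} t_point_node;

extern size_t list_length(const t_point_node *head);

/* ---- .c part: the definition relies on the same type declaration ---- */
size_t list_length(const t_point_node *head)
{
    size_t n = 0;
    for (; head != NULL; head = head->next)
        n++;
    return n;
}
```

Every source file that calls list_length() or builds a list needs the structure layout, so keeping the typedef next to the prototype means one #include supplies both.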

Because all the files in the program rely on img.h, whenever this header file changes, we must recompile all the source files before a new executable can be built. (Incidentally, a Makefile provides a way to specify the reliance of one file on another through dependencies. The Makefile for Assignment #3 demonstrates how to do this.)

Linkage

The keyword extern signifies the linkage of the function. By default, functions have external linkage anyway, so declaring a function extern is redundant in most cases, but is still helpful from a consistency point of view. Note that all functions that we did not declare in the img.h header file are defined as static functions in their respective source files (e.g. bytes_to_num() in imgdims.c and get_basename() in process.c). By declaring a function to be static, we are saying that it has internal linkage (i.e. its name is not visible outside this source file, so functions in other files cannot call it by name). Using static functions helps to make the source files more encapsulated. You can think of the static functions as being "helper" (or private) functions for the other functions in the source file.
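A small sketch of the helper pattern follows. Both functions are hypothetical stand-ins (the assignment's bytes_to_num() is not reproduced here); the example assumes two-byte, big-endian width and height fields purely for illustration.

```c
#include <stddef.h>

/* Internal linkage: this helper can only be named inside this source file. */
static unsigned long be_bytes_to_ulong(const unsigned char *bytes, size_t n)
{
    unsigned long value = 0;
    for (size_t i = 0; i < n; i++)
        value = (value << 8) | bytes[i];   /* most significant byte first */
    return value;
}

/* External linkage: this prototype would normally live in the header. */
extern unsigned long pixel_count(const unsigned char *width_be,
                                 const unsigned char *height_be);

unsigned long pixel_count(const unsigned char *width_be,
                          const unsigned char *height_be)
{
    return be_bytes_to_ulong(width_be, 2) * be_bytes_to_ulong(height_be, 2);
}
```

Another source file could define its own static function called be_bytes_to_ulong() without any conflict at link time, because neither name is exported.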

Global variables are another thing that multiple files may have in common. A global variable can be declared as extern in a header file, e.g.:

extern char  marker;

and subsequently defined in exactly one of the .c files:

char  marker = 'a';

Any file that wants access to this variable just has to include the header file in which the variable is declared. It's okay for a source file to include a header file that contains variable, structure, or function declarations that the source file does not use.
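Putting the two pieces together, a sketch of the pattern might look like the following. It is collapsed into one listing so it compiles on its own; the header name marker.h and the function advance_marker() are hypothetical.

```c
/* ---- header (hypothetical "marker.h"): declaration only, no storage ---- */
extern char marker;

/* ---- exactly one .c file: the definition, which allocates the storage ---- */
char marker = 'a';

/* ---- any other .c file that includes the header can use the variable ---- */
char advance_marker(void)
{
    marker = (char)(marker + 1);   /* modifies the single shared object */
    return marker;
}
```

All files that include the header refer to the same object, so a change made through advance_marker() is visible everywhere marker is used.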

Definition vs. Declaration (K&R §1.10)

Notice the important distinction between declaring a variable and defining it. A declaration merely tells the compiler the type of the variable, whereas a definition actually allocates space for it (and initializes it, if an initializer was specified). Generally speaking, statements of the form extern type varname are declarations, whereas statements of the form type varname are definitions. A global variable should never be defined in more than one place (e.g. do not define variables in header files); otherwise, the linker may generate a multiple definition error. Variables may be declared in several places, so it is okay to declare a variable in a header file and include that header file in multiple source files. The types of a variable's definition and its declarations must be consistent. For example, if you define a variable as an array in a source file, then do not declare it as a pointer in a header file. Remember that arrays and pointers are different types.
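One way to see that arrays and pointers are different types is to compare their sizes. In the sketch below, the variable greeting and the two functions are hypothetical names chosen for illustration.

```c
#include <stddef.h>

/* Declaration: states the type (array of char) but allocates nothing. */
extern char greeting[];

/* Definition: allocates six bytes, 'h' 'e' 'l' 'l' 'o' and the terminating '\0'. */
char greeting[] = "hello";

/* Size of the array object itself. */
size_t array_size(void)
{
    return sizeof greeting;
}

/* Size of a pointer to the array's first element, which is a different type. */
size_t pointer_size(void)
{
    const char *p = greeting;
    return sizeof p;
}
```

Had a header declared extern char *greeting; while a source file defined the array, other files would treat the array's bytes as if they were an address. The compiler can only catch that mismatch if the defining file also includes the header.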

One more caveat: do not confuse macro definition (e.g. #define) with variable definition. The two are completely different.

`static` global variables (K&R §4.6)

If we wanted to define a global variable in a file but restrict its scope so that it is accessible only by functions defined in that source file, we can define the variable outside all functions (typically, near the top of the file) and define it as static. For example, in imgdims.c, we define:

static const    t_img_data jpg_marker   = '\xff';

near the top of the source file, before any function definitions (i.e. as a global variable). Now, all functions in that source file can access that variable, but functions in other source files cannot. Other source files can define their own variables named jpg_marker, but they would be different from the first one. Variables created in this way are sometimes referred to as file-scope variables.
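A common use of a file-scope variable is to keep some state private to one module and expose it only through functions. The sketch below uses hypothetical names (call_count, record_call(), calls_so_far()).

```c
/* File-scope variable with internal linkage: other source files cannot
 * name it, even with an extern declaration. */
static unsigned call_count = 0;

/* Functions in this file can read and modify it freely. */
unsigned record_call(void)
{
    return ++call_count;
}

unsigned calls_so_far(void)
{
    return call_count;
}
```

Code in other files can still observe the count, but only through calls_so_far(), so the variable cannot be modified behind this module's back.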

Note that a global variable that is defined to be static is very different from a local variable that is defined to be static. In the first case, the static keyword denotes the (internal) linkage of the variable whereas in the second case, static denotes the storage class of the variable. The concept of storage class was described in the context of the runtime stack during the previous lecture.
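To make the second case concrete, here is a small sketch contrasting a static local variable with an ordinary (automatic) one. The function names are hypothetical.

```c
/* static local: initialized once, and the value persists between calls. */
int next_id(void)
{
    static int id = 0;
    return ++id;
}

/* automatic local: created and initialized afresh on every call. */
int fresh_id(void)
{
    int id = 0;
    return ++id;
}
```

Successive calls to next_id() return 1, 2, 3 and so on, whereas fresh_id() returns 1 every time. In both functions the name id is visible only inside the function body; the static keyword changes how long the variable lives, not where it can be named.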

Last modified: January 29, 2004 23:50:27 NST (Thursday)