Wednesday, January 29, 2003

Structures (K&R Chapter 6)

A structure is a collection of heterogeneous values of arbitrary type which form a single logical unit in the context of a program. Each value within the structure is called a field and can be accessed by specifying the structure's variable name, followed by a dot followed by the field name within the structure.

Like arrays, structures can be initialized by enclosing the field initializers, (specified in the same order as they were in the structure declaration) within braces. Once an array has been initialized, the only way to change its values is by assigning to its individual fields; the entire structure cannot be reassigned to as it was during initialization (this is similar to arrays). Fields which are not initialized by the initializer are set to zero. If no initializer at all is specified for a structure, then all of its fields are undefined initially (again, like arrays).

Structures may even be nested and its nested fields can be accessed by chaining the dot operators as demonstrated by the following program, which nests a pair structure inside a floor structure:


#include	<stdio.h>

#define	MAX_ROW	10
#define	MAX_COL	10

struct pair {
	int row, col;
};

struct floor {
	char		plan[MAX_ROW][MAX_COL];
	struct pair	position;
};

int
main()
{
	struct pair	pair_var = { 3, 2 };	/* Stand alone structure */
	struct floor	floor_var = {
			/* Initialize the two-dimensional array 'plan' */
				{ ".......",
				  ".*.....",
				  "....*.*",
				  ".*...a*"  },
			/* Initialize 'position' */
				{3, 5} 		/* Nested sub-structure */
			};
	int i;

	printf("pair_var's (row,col) is (%d,%d)\n",
		pair_var.row, pair_var.col);;

	puts("floor_var:");
	for (i = 0; i < sizeof(floor_var.plan)/sizeof(floor_var.plan[0]); i++) {
		if (floor_var.plan[i][0] != '\0')
			puts(floor_var.plan[i]);
	}
	printf("floor_var's 'position' field is (%d,%d)\n",
		floor_var.position.row, floor_var.position.col);
	
	return 0;
}

Typedefs (K&R § 6.7)

In the declaration of the two structures:

struct pair {
        int row, col;
};

struct floor {
        char            plan[MAX_ROW][MAX_COL];
        struct pair     position;
};

pair and floor are sometimes referred to as structure tag names. Note that whenever we define a structure variable, we use the keyword struct followed by the structure's tag name to denote the variable's type. Obviously, having to use struct each time we wish to define a structure variable is tedious. A very common idiom in C is to declare structures using a typedef, which creates a synonym for the specified type:

typedef struct pair {
        int row, col;
} t_pair;

typedef struct floor {
        char       plan[MAX_ROW][MAX_COL];
        t_pair     position;
} t_floor;

Now, whenever we want to define a struct floor, we can simply say t_floor floor_var instead of the more cumbersome struct floor floor_var. Note that the t_ prefix is merely a convention and is not required by C. Its purpose is to provide a way to distinguish between a structure types and structure variables. If we consistently declare all our structures with a leading t_, then we can define structure variables using the same name as the structure typedef, but without the t_ prefix. (e.g. t_floor floor).

Using typedef, we can even declare the structures without the tag name at all:

typedef struct {
        int row, col;
} t_pair;

typedef struct {
        char       plan[MAX_ROW][MAX_COL];
        t_pair     position;
} t_floor;

When we do this, we can still create structure variables using t_floor floor_var, but we can no longer use the original syntax struct floor floor_var because the structure tag no longer exists. Defining structures without a tag name is quite common.

typedef's, while commonly used for structure declaration, can also be used to create simple abstract data types. For example, MUN student IDs can be represented by an array of nine digits:

char student_id[10];	/* Don't forget the nul byte! */

Instead of having to type this everytime we wish to create a variable to hold a student id, we can use a typedef to create a new, more abstract type representing a student ID as follows:

typedef char t_student_id[10];

Now, whenever we wish to create a variable that can store a student id, we can simply use our new type t_student_id:

t_student_id student_id;

Linked lists

We can define a linked list in C by using a self-referential structure. Consider the following example:


#include	<stdio.h>
#include	<stdlib.h>
#include	<string.h>

#define BUFFER_LEN	20
typedef char t_name[BUFFER_LEN];

typedef struct {
	t_name	name;
	int	num;
} t_course;

typedef struct node {
	t_course	 course;
	struct node	*next;
} t_node;

int
main()
{
	t_course course[] = {
		{ "COMP3710",	25 },
		{ "COMP3724",	20 },
		{ "COMP4718",	15 }
	};
	int	i   = 0;
	t_node *list = NULL;	/* Pointer to the front of the list */
	t_node *it = NULL;	/* Iterator */

	/* Fill up the list */
	for (i = 0; i < sizeof(course)/sizeof(course[0]); ++i) {
		t_node *cur = (t_node *) malloc(sizeof(t_node));

		strcpy(cur->course.name, course[i].name);
		cur->course.num = course[i].num;

		cur->next = list;	/* Insert node at front of list */
		list = cur;		/* Update the front of the list */
	}

	/* Display the list */
	for (it = list; it != NULL; it = it->next)
		printf("%s: %d\n", it->course.name, it->course.num);

	/* Free up the list */
	while (list) {
		t_node	*del = list;
		list = list->next;
		free(del);
	}

	return 0;
}

The above program copies a collection of course names and their enrollments to a linked list, traverses the linked list to print out the courses and the frees up the linked list (this is a very round about way of printing out this list, but the code is meant for demonstration purposes only).

Self-Referential structures (K&R § 6.5)

We define the following t_node structure which contains a structure containing the data we wish to store in the linked list (e.g. a t_course structure) and a pointer to another node structure called next. Note that because we refer to the node structure inside the node structure itself, we have to declare the structure with a tag name (node). Also note that we are not actually creating a nested structure inside itself, as that would be impossible. Instead the struct node contains a pointer to another struct node.

typedef struct node {
        t_course         course;
        struct node     *next;
} t_node;

An array of structures (K&R § 6.3)

We can define an array of structures in much the same way as we define an array of any other type. The contents of the entire array is enclosed by braces and the values of the fields of each structure element is similarly enclosed in braces in the initializer list:

t_course course[] = {
	{ "COMP3710",   25 },
	{ "COMP3724",   20 },
	{ "COMP4718",   15 }
};

Unlike initializing a two-dimensional array, the braces surrounding each structure element are required.

The `->` operator (K&R § 6.2)

The program then proceeds to iterate over all the elements in the array. On each iteration, it dynamically allocates a new node for the linked list and copies all the relevant data from the array element to the node. Note that cur represents a pointer to the newly created node. Before we can access its fields, we must dereference cur first. Then we can use the dot notation to access a field. However, because the dot operator has higher precedence that the dereference operator, we cannot refer to the course field name of cur by writing:

*cur.course

because this would try to apply the dot operator to cur and cur is not a structure (it's a pointer to a structure). Instead, we must use parenthesis to force the dereferencing to take place first:

(*cur).course

Syntactically, this is very clumsy to type each time you want to refer to a field of a structure when you have a pointer to the structure available. To make things cleaner, C provides the -> operator which has the same affect as the code above. Therefore, we can get access to the course field of cur by writing:

cur->course

Linking up the linked list

The following two lines add the newly created node to the head of the list. Note that the first node added to the list will have a next pointer of NULL (which is the initial value of list). We use this NULL pointer as a sentinel value when traversing the list.

cur->next = list;       /* Insert node at front of list */
list = cur;             /* Update the front of the list */

By hooking up the list in the manner shown above, we, in effect, build up the list backwards (i.e. the first element in array of courses will be the last element in the linked list).

Traversing and deleting the linked list

The for loop simply iterates over each element of the newly created list by using the t_node pointer it. On each iteration, it displays the course and the current enrollment. This loop stops when we hit the end of the list (i.e. when it equals NULL).

Finally, we traverse over the list a final time using a while loop, deallocating each node as we go. Because the list is being destroyed, there is no need to use the it pointer; we can simply use our original list pointer instead.

Note that we save a temporary pointer to the current linked list node in the del pointer, update the list iterator, then free the node pointed to by del. Attempting to free the list without the temporary node pointer would yield unpredictable results (K&R § 7.8.5):

/* How NOT to free up a linked list */
while (list) {
	free(list);
	list = list->next;
}

The above code fragment attempts to use the pointer list after the memory it pointed to was freed. This is a serious, (but unfortunately, common) mistake in code written by beginners.

Pointers to structures can be used to represent many other data structures as well including trees and hash tables.

Passing structures to functions (K&R § 6.2)

When we pass a structure, all the elements in the structure are copied and assigned to the function's formal parameter. (This includes arrays.) Any changes made to the structure's fields are not reflected in the context of the calling function. Passing structures this way can be quite expensive. It is more common instead to pass the structure's address to the function instead of the entire structure itself. If you do not wish for the function to be able to modify the structure, you can declare the structure parameter as const.

As a demonstration of passing structure pointers to functions, consider the following modularized version of the previous program:


#include	<stdio.h>
#include	<stdlib.h>
#include	<string.h>

#define BUFFER_LEN	20
typedef char t_name[BUFFER_LEN];

typedef struct {
	t_name	name;
	int	num;
} t_course;

typedef struct node {
	t_course	 course;
	struct node	*next;
} t_node;

void store_list(t_node **);
void display_list(const t_node *);
void free_list(t_node *);

int
main()
{
	t_node *list = NULL;	/* Pointer to the front of the list */

	store_list(&list);
	display_list(list);
	free_list(list);
	return 0;
}

void store_list(t_node **list)
{
	t_course course[] = {
		{ "COMP3710",	25 },
		{ "COMP3724",	20 },
		{ "COMP4718",	15 }
	};
	int	i   = 0;

	/* Fill up the list */
	for (i = 0; i < sizeof(course)/sizeof(course[0]); ++i) {
		t_node *cur = (t_node *) malloc(sizeof(t_node));

		strcpy(cur->course.name, course[i].name);
		cur->course.num = course[i].num;

		cur->next = *list;	/* Insert node at front of list */
		*list = cur;		/* Update the front of the list */
	}
}

/* Display the list */
void display_list(const t_node *list)
{
	for (; list != NULL; list = list->next)
		printf("%s: %d\n", list->course.name, list->course.num);
}

/* Free up the list */
void free_list(t_node *list)
{
	while (list) {
		t_node	*del = list;
		list = list->next;
		free(del);
	}

}

There are some notable differences between the two versions of the programs.

The main() function is considerably simpler due to the increased modularity.
Because list is updated by store_list() and because we want this change to be reflected in the main() function, we must pass the address of list to store_list() and dereference list in this function, as necessary.
display_list() uses the list parameter directly when traversing the linked list (i.e. there is no it iterator variable). Because list is passed by value, a copy of the address is passed to display_list(). Changes to the list parameter that take place inside the function are not reflected in the list parameter of the main() function. Because the list parameter is already initialized correctly when it is received by display_list(), the initialization part of the for loop is empty.
Also note that the list parameter is passed in as a const t_node *. This means that display_list() is not permitted to change any of the fields of the structure pointed to by list. The function may change the list parameter itself (list = list->next).

Last modified: Wed Jan 29 15:22:28 2003