Monday, January 13, 2003

Arrays

An array is a collection of data of uniform type that is accessible via an integer index.

For example, we can define an array of integers and iterate over them as follows:


#include	<stdio.h> /* stdio.h contains the declaration of printf() */

#define  MAX_NUM  5

int
main()
{
	int	primes[MAX_NUM];
	int	i;

	primes[0] = 2;
	primes[1] = 3;
	primes[2] = 5;
	primes[3] = 7;
	primes[4] = 11;

	/* "for" loops discussed in K&R sections 1.3, 3.5 */
	for (i = 0; i < MAX_NUM; i++) {
		printf("primes[%d] is %d\n", i, primes[i]);
	}
	printf("MAX_NUM is %d\n", MAX_NUM);

	return 0;
}

Note: In C89, the dialect that K&R describes, all variables must be declared at the beginning of a block (K&R § 4.8); C99 and later relax this rule. So we define the loop index variable i at the same time we define the array.

Macro Substitution (K&R § 1.4, 4.11)

The #define MAX_NUM 5 near the top of the program is another preprocessor directive (like #include). This directive instructs the preprocessor to substitute all occurrences of the text MAX_NUM with the text 5. This is called macro substitution.

Therefore, in the definition of the array, the compiler will actually see int primes[5], and the loop boundary will be translated to for (i = 0; i < 5; i++).

Note that your original source file remains unchanged -- the preprocessor makes the substitutions to a temporary copy of your source.

Using a macro in this program makes it easier to change the maximum number of elements in the array -- we only have to change one number instead of three.

Macro substitution does not take place inside string literals. Therefore, in the line printf("MAX_NUM is %d\n", MAX_NUM), the first MAX_NUM will not be replaced, but the second one will be.

Macro names are commonly written in all upper case to remind the programmer that the text is actually a macro.

One common beginner mistake is to define macros as if they were assignment statements (e.g. #define MAX_NUM = 5;). This will cause the text MAX_NUM in the source code to be replaced with the text = 5; which will result in syntax errors wherever the macro is used.

Array definition (K&R § 1.6)

We specify the type and the number of elements in the array in the declaration of the array: int primes[MAX_NUM].

If we define an array to have n elements, then the elements can be accessed using indices 0 to n - 1 (inclusive). Accessing element n is invalid. Therefore, in the above example, using primes[5] would yield undefined behavior: the compiler is not required to detect the error, and the program may print garbage, crash, or appear to work. This is very important and is a common source of programming errors.

Format Printing (K&R § 1.1, 7.2)

The line:

printf("primes[%d] is %d\n", i, primes[i])

uses the printf() function, which provides formatted output functionality. The function takes a variable number of arguments. The first argument is a format string. It is printed verbatim, character by character, until a conversion specifier (e.g. "%d") is encountered. For each conversion specifier encountered, the function takes the next argument from the argument list and displays the requested representation for it. %d tells printf() that the next argument should be displayed as a decimal integer.

So when i equals 2, the output will be primes[2] is 5, followed by a newline.

There are numerous conversion specifiers; for example %c can be used to print characters, %s can be used to display character strings and %f can be used for displaying floating point numbers. Various width and precision fields may also be specified.

If there are fewer arguments than conversion specifiers in the format string, or an argument's type does not match its specifier, the behavior is undefined. Typical symptoms are garbage or missing output, or a crash.

An equivalent way of expressing the above output in Java would be:

System.out.println("primes[" + i + "] is " + primes[i])

Using printf() takes a little getting used to. Personally, I find printf() to be easier on the eyes (and fingers).

Array Initialization (K&R § 4.9)

The above method for creating an array, while correct, is cumbersome. Instead, when we define an array we can use the following shortcut to initialize its elements.


#include	<stdio.h>

int
main()
{
	int	primes[] = { 2, 3, 5, 7, 11 };
	int	num = sizeof(primes) / sizeof(primes[0]);
	int	i;

	for (i = 0; i < num; i++) {
		printf("%d squared is %d\n", primes[i], primes[i] * primes[i]);
	}

	return 0;
}

We don't have to specify a size for the array -- the compiler will determine the size based upon the number of elements in the list and copy all the elements in the list into the array. If you do specify a dimension for the array and that number is too small to hold all the elements in the list, the compiler will issue a diagnostic (a warning or an error, depending on the compiler). If the dimension specified is larger than the number of elements in the list, then the unfilled positions in the array will be set to zero.

We can define the array elements this way only when we create the array initially. We cannot, for example, set the entire array to a list of new numbers after it has already been initialized. However, we can reassign individual elements (e.g. primes[4] = 13) provided, of course, that we do not try to set an element outside the bounds of the array.

The third argument to the printf() function is an arithmetic expression. This expression will be evaluated and the resultant value will be passed to the printf() function for output.

sizeof is an operator (K&R § 6.3) which yields the number of bytes required to store its operand. For example, if integers require 4 bytes each, then sizeof(primes) equals 20 because primes is an array consisting of five integers. We can use this fact to determine how many elements an array has by dividing the total size of the array by the size of one element (typically the first). In the above code, num will be set to 5, as expected. The sizeof operator can be applied to types as well as to variables.

Multidimensional Arrays (K&R § 5.7)

We can also define "an array of arrays," thereby creating multidimensional arrays, as follows:


#include	<stdio.h>

#define MAX_ROW 3
#define MAX_COL 4

int
main()
{
	int	array[MAX_ROW][MAX_COL] = {	{   1,   2,   3,   4 },
						{  30,  40,  50,  60 },
						{ 500, 600, 700, 800 }  };

	int	i, j;

	for (i = 0; i < MAX_ROW; i++) {
		int	product = 1;

		for (j = 0; j < MAX_COL; j++) {
			product *= array[i][j];
			printf("%3d %c ", array[i][j],
					(j == MAX_COL - 1) ? '=' : '*');
		}
		printf("%d\n", product);
	}

	return 0;
}

The above array is defined with two dimensions. The first dimension can be omitted, but the second (and any subsequent dimensions) are required for computation of element offsets. The array could be defined as int array[][MAX_COL], and the compiler will determine the number of rows using MAX_COL and the number of elements defined in the nested list of numbers.

To traverse the multidimensional array, we now use two loops, one nested inside the other. At the top of the block of the outer loop (which iterates over the rows), we define an integer called product and initialize it to 1. We then start the second loop which accumulates the product of all the elements in the row and outputs each element in the row.

In the printf() format string, we will display a decimal integer inside a field width of three (%3d). This serves to keep the numbers aligned in columns when we display them.

The format string also prints a character, as denoted by the %c conversion specifier. The character that we display depends upon the evaluation of the expression (j == MAX_COL - 1) ? '=' : '*'. This expression uses the ternary ? : operator (K&R § 2.11), which evaluates the boolean expression to the left of the question mark. If this expression is true, then the result of the expression between the ? and : is returned; otherwise, the result of the expression after the : is returned. So in this case, if j == MAX_COL - 1 (that is, we are displaying the last element in the row), then we'll output the '=' character; otherwise we'll output '*'. Note that characters are delimited by single quotes.

The output of the program is:

  1 *   2 *   3 *   4 = 24
 30 *  40 *  50 *  60 = 3600000
500 * 600 * 700 * 800 = 496275456

To make the spaces visible, here is the above output with the spaces replaced with dots. Notice that each of the numbers on the left hand side of the = is displayed with a field width of three characters and that the * and = symbols have a space on either side, as requested by the printf() format string.

..1.*...2.*...3.*...4.=.24
.30.*..40.*..50.*..60.=.3600000
500.*.600.*.700.*.800.=.496275456

Note that the last product is incorrect. The actual product, 168,000,000,000, is too large to fit in a 32-bit signed integer, which can only hold values up to 2,147,483,647. As a result, the product "wrapped around," producing an incorrect result. (Strictly speaking, the C standard leaves the result of signed integer overflow undefined; wrapping around is simply what most machines do.)

If we had used %d inside the printf() format string instead of %3d, then the output would have been:

1 * 2 * 3 * 4 = 24
30 * 40 * 50 * 60 = 3600000
500 * 600 * 700 * 800 = 496275456

(Again, the last product is still incorrect.)

Last modified: Thu Jan 16 16:19:32 2003