January 12, 2004 (Monday)

Character based I/O (K&R § 1.5)

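Simple character input/output can be done using getchar() and putchar(). getchar() reads a single character (byte) from the standard input stream (e.g. the keyboard) and returns it. When the end of file has been reached, getchar() returns EOF, a negative integer constant defined in stdio.h (usually -1). It is important that the variable used to store the value returned by getchar() be of type int and not char. getchar() returns each character as a non-negative value (its unsigned char value converted to int), and an int can hold all of those values as well as EOF, whereas a char cannot reliably tell EOF apart from a real character.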

Characters can be written to standard output (e.g. the screen) using putchar(). putchar() takes a single argument representing a character and writes that character to standard output.

The program below reads characters from standard input and simply echoes them to standard output:

/*
 * stdio.h contains the declaration for getchar() and putchar().
 * It also includes the definition of EOF.
 */
#include	<stdio.h>

int
main()
{
	int	ch;

	while ((ch = getchar()) != EOF)
		putchar(ch);

	return 0;
}
chario.c

After compiling this program (using gcc -o chario -Wall -ansi -pedantic chario.c), we can run it as follows:

$ ./chario
This is a line of text.
This is a line of text.
This is another line of text.
This is another line of text.
^D
$

Notice that the program does not echo the input characters until you press the Enter key. When input comes from the keyboard, it is normally line-buffered by the terminal: the characters you type are collected until a newline is entered, and only then does getchar() receive them. The program then reads and displays each character on the line and waits for the next line. Also notice that on Unix/Linux you can signal end-of-file on standard input by pressing Ctrl-D at the start of a line, as shown in the above output.

We can also tie standard input to a file using the input redirection operator (<) on the command line, so that the program reads its input from that file. getchar() returns EOF when the end of the file has been reached (do not put a ^D in the file). Similarly, we can tie standard output to a file using the output redirection operator (>). For example, we can redirect both standard input and standard output of our chario program as follows:

$ ./chario < chario.c > chario-copy.c
$ cat chario-copy.c
/*
 * stdio.h contains the declaration for getchar() and putchar().
 * It also includes the definition of EOF.
 */
#include	<stdio.h>

int
main()
{
	int	ch;

	while ((ch = getchar()) != EOF)
		putchar(ch);

	return 0;
}

When we execute the chario executable using its own source code as its input, a copy of the chario.c file is created and stored in chario-copy.c. Using redirection in this way, the chario program can generate a copy of any (readable) file on the file system.

Incidentally, cat (used above) is a simple Unix/Linux utility program which will display (on standard output) the contents of the file given on the command line.

Characters are actually represented as small integers in C. We can specify character literals by enclosing a character in single quotes. We can also give a character variable the character's numeric ASCII code, either as a decimal integer or as a hexadecimal (\x) or octal (\) escape sequence inside single quotes. For example, on an ASCII system the following definitions are equivalent:

     ...
     char zero_a = '0';		/* character literal			*/
     char zero_b = 48;		/* decimal ASCII representation		*/
     char zero_c = '\x30';	/* hexadecimal ASCII representation	*/
     char zero_d = '\060';	/* octal ASCII representation		*/
     ...
     /* All of these output the character '0' to the display */
     putchar(zero_a);
     putchar(zero_b);
     putchar(zero_c);
     putchar(zero_d);

Note that char zero = "0" is wrong. "0" is a string literal: an array of two characters ('0' followed by the terminating '\0') which, in this context, is converted to a pointer to its first character. A pointer is incompatible with the type char.

We'll see examples of more sophisticated input/output functions in C in later lectures.

Arrays (K&R § 1.6)

An array is a collection of data of uniform type that is accessible via an integer index.

For example, we can define an array of integers and iterate over them as follows:

#include	<stdio.h> /* stdio.h contains the declaration of printf() */

#define  MAX_NUM  5

int
main()
{
	int	primes[MAX_NUM];
	int	i;

	primes[0] = 2;
	primes[1] = 3;
	primes[2] = 5;
	primes[3] = 7;
	primes[4] = 11;

	/* "for" loops discussed in K&R sections 1.3, 3.5 */
	for (i = 0; i < MAX_NUM; i++) {
		printf("primes[%d] is %d\n", i, primes[i]);
	}
	printf("MAX_NUM is %d\n", MAX_NUM);

	return 0;
}
array1.c

Note: In ANSI C (C89, which the -ansi option selects), all variables must be declared at the beginning of a block (K&R § 4.8); later standards (C99 onward) relax this rule. So we define the loop index variable i together with the array, at the top of main().

Macro Substitution (K&R § 1.4, 4.11)

The #define MAX_NUM 5 near the top of the program is another preprocessor directive (like #include). This directive instructs the preprocessor to substitute all occurrences of the text MAX_NUM with the text 5. This is called macro substitution. Therefore, in the definition of the array, the compiler will actually see:

int primes[5];

and the loop boundary will be translated to:

for (i = 0; i < 5; i++)

Note that your original source file remains unchanged -- the preprocessor makes the substitutions to a temporary copy of your source.

Using a macro in this program makes it easier to change the maximum number of elements in the array -- we only have to change one number instead of three.

Macro substitution does not take place inside string literals. Therefore, in the line printf("MAX_NUM is %d\n", MAX_NUM), the first MAX_NUM will not be replaced, but the second one will be.

Macro names are conventionally written in all upper case to remind the programmer that the text is actually a macro.

One common beginner mistake is to define macros as if they were assignment statements (e.g. #define MAX_NUM = 5;). This causes the text MAX_NUM in the source code to be replaced with the text = 5; which results in syntax errors in the code.

Array definition (K&R § 1.6)

We specify the type and the number of elements in the array in the declaration of the array: int primes[MAX_NUM].

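If we define an array to have n elements, then the elements can be accessed using indices 0 to n - 1 (inclusive). C does not check array bounds, so accessing element n (or any other index outside this range) is not necessarily caught by the compiler, and its behavior is undefined. Therefore, in the above example, using primes[5] could yield a garbage value, overwrite another variable, or crash the program. This is very important and is a common source of programming errors.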

Format Printing (K&R § 1.1, 7.2)

The line:

printf("primes[%d] is %d\n", i, primes[i])

uses the printf() function, which provides formatted output. The function takes a variable number of arguments. The first argument is a format string. It is printed verbatim, character by character, until a conversion specifier (e.g. "%d") is encountered. For each conversion specifier encountered, the function takes the next argument from the argument list and displays the requested representation for it. %d tells printf() that the next argument should be displayed as a decimal integer.

So when i equals 2, the output will be primes[2] is 5, followed by a newline (\n).

There are numerous conversion specifiers; for example %c can be used to print characters, %s can be used to display character strings and %f can be used for displaying floating point numbers. Various width and precision fields may also be specified. Because characters are closely related to numbers, we can output char variables as a number or as a character using the %d or %c conversion specifier respectively (a char argument is converted to int when it is passed to printf(), so %d is valid for it):

...
char ch = '0';

printf("ch as a number is %d\n", ch);		/* ch as a number is 48 */
printf("ch as a character is %c\n", ch);	/* ch as a character is 0 */
...

If the conversion specifiers in the format string do not match the remaining arguments (in either number or type), the behavior is undefined; typical symptoms are garbage or missing output. A good compiler will warn you about this: gcc does so with -Wall when the format string is a string literal.

An equivalent way of expressing the above output in Java would be:

System.out.println("primes[" + i + "] is " + primes[i])

Using printf() takes a little getting used to. Personally, I find printf() to be easier on the eyes (and fingers).

Array Initialization (K&R § 4.9)

The above method for creating an array, while correct, is cumbersome. Instead, when we define an array we can use the following shortcut to initialize the elements of the array.

#include	<stdio.h>

int
main()
{
	int	primes[] = { 2, 3, 5, 7, 11 };
	int	num = sizeof(primes) / sizeof(primes[0]);
	int	i;

	for (i = 0; i < num; i++) {
		printf("%d squared is %d\n", primes[i], primes[i] * primes[i]);
	}

	return 0;
}
array2.c

We don't have to specify a size for the array -- the compiler will determine the size based upon the number of elements in the list and copy all the elements in the list into the array. If you do specify a dimension for the size of the array and that number is too small to hold all the elements on the list, a warning will result during compilation. If the dimension specified is larger than the number of elements in the list, then the unfilled positions in the array will be set to zero.

We can initialize the array elements this way only when we first define the array. We cannot, for example, assign a list of new numbers to the entire array after it has been defined. However, we can assign to individual elements (e.g. primes[4] = 13), provided, of course, that we do not try to set an element outside the bounds of the array.

The third argument to the printf() function is an arithmetic expression. This expression will be evaluated and the resultant value will be passed to the printf() function for output.

sizeof is an operator (K&R § 6.3) that yields the number of bytes required to store its operand. For example, if integers require 4 bytes each, then sizeof(primes) equals 20 because primes is an array of five integers. We can use this fact to determine how many elements an array has by dividing the total size of the array by the size of one element (typically the first). In the above code, num will be set to 5, as expected. The sizeof operator can also be applied to a parenthesized type name, as in sizeof(int), as well as to variables.

Multidimensional Arrays (K&R § 5.7)

We can also define "an array of arrays", thereby creating multidimensional arrays, as follows:

#include	<stdio.h>

#define MAX_ROW 3
#define MAX_COL 4

int
main()
{
	int	array[MAX_ROW][MAX_COL] = {	{   1,   2,   3,   4 },
						{  30,  40,  50,  60 },
						{ 500, 600, 700, 800 }  };

	int	i, j;

	for (i = 0; i < MAX_ROW; i++) {
		int	product = 1;

		for (j = 0; j < MAX_COL; j++) {
			product *= array[i][j];
			printf("%3d %c ", array[i][j],
					(j == MAX_COL - 1) ? '=' : '*');
		}
		printf("%d\n", product);
	}

	return 0;
}
array3.c

The above array is defined with two dimensions. When an initializer list is supplied, the first dimension can be omitted, but the second (and any subsequent) dimension is required so that the compiler can compute element offsets. The array could be defined as int array[][MAX_COL], and the compiler would determine the number of rows from MAX_COL and the nested list of numbers used to initialize the array (here, three inner lists, so three rows).

To traverse the multidimensional array, we now use two loops, one nested inside the other. At the top of the block of the outer loop (which iterates over the rows), we define an integer called product and initialize it to 1. We then start the second loop which accumulates the product of all the elements in the row and outputs each element in the row.

In the printf() format string, we will display a decimal integer inside a field width of three (%3d). This serves to keep the numbers aligned in columns when we display them.

The format string also prints a character, as denoted by the %c conversion specifier. The character that we display depends upon the value of the expression (j == MAX_COL - 1) ? '=' : '*'. This expression uses the conditional (ternary) operator ? : (K&R § 2.11), which first evaluates the condition to the left of the question mark. If the condition is true (nonzero), the value of the whole expression is the value of the expression between the ? and the :; otherwise, it is the value of the expression after the :. So in this case, if j == MAX_COL - 1 (that is, we are displaying the last element in the row), then we output the '=' character; otherwise we output '*'. Again, note that characters are delimited by single quotes.

The output of the program is:

  1 *   2 *   3 *   4 = 24
 30 *  40 *  50 *  60 = 3600000
500 * 600 * 700 * 800 = 496275456

To make the spaces visible, the above output is shown below with each space replaced by a dot. Notice that each number to the left of the = is displayed in a field three characters wide, and that the * and = symbols have a space on either side, as requested by the printf() format string.

..1.*...2.*...3.*...4.=.24
.30.*..40.*..50.*..60.=.3600000
500.*.600.*.700.*.800.=.496275456

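Note that the last product is incorrect. The true product, 168,000,000,000, is too large to fit in an int: on a system where int is a signed 32-bit integer, it can only hold values up to 2,147,483,647. As a result, the product "wrapped around", and the value kept was 168,000,000,000 modulo 2^32, which is 496,275,456. (Strictly speaking, signed integer overflow is undefined behavior in C, so this wraparound is what happened on this machine rather than something the language guarantees.)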

If we had used %d inside the printf() format string instead of %3d, then the output would have been:

1 * 2 * 3 * 4 = 24
30 * 40 * 50 * 60 = 3600000
500 * 600 * 700 * 800 = 496275456

(Again, the last product is still incorrect.)


Last modified: January 12, 2004 15:45:23 NST (Monday)